🚀 The feature, motivation and pitch
It would be nice to support DeepseekV2 models. Unfortunately, the modeling code has not yet been accepted into transformers and requires `trust_remote_code=True`.
I'm monkey-patching it myself for now, and wanted to leave some notes that may be helpful when support is added officially down the road.
```python
import sys

from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM
from liger_kernel.transformers import (
    LigerCrossEntropyLoss, LigerRMSNorm, LigerSwiGLUMLP, liger_rotary_pos_emb,
)

with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        "deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True
    )

# Patch the dynamically imported remote-code module in place.
modeling_mod = sys.modules[model.__class__.__module__]
modeling_mod.apply_rotary_pos_emb = liger_rotary_pos_emb
modeling_mod.DeepseekV2RMSNorm = LigerRMSNorm
modeling_mod.DeepseekV2MLP = LigerSwiGLUMLP
modeling_mod.CrossEntropyLoss = LigerCrossEntropyLoss
modeling_mod.DeepseekV2ForCausalLM.forward = deepseekv2_lce_forward  # custom fused-CE forward
```
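Note that swapping the classes only affects models constructed after the patch; the `init_empty_weights` load above exists just to make transformers import the remote-code module so it can be patched. A minimal follow-up sketch, assuming the dynamic module stays cached in `sys.modules` so a second `from_pretrained` picks up the patched symbols:

```python
import torch

# Re-instantiate with real weights so the patched DeepseekV2RMSNorm /
# DeepseekV2MLP classes are used when the layers are built.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Base",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```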
One initial issue when swapping in the SwiGLU MLP:
```
File "/mnt/ML/huggingface/modules/transformers_modules/deepseek-ai/DeepSeek-Coder-V2-Lite-Base/ea9b066cee82f82906fdd58898cb3788b1c5d770/modeling_deepseek.py", line 555, in <listcomp>
    DeepseekV2MLP(
TypeError: LigerSwiGLUMLP.__init__() got an unexpected keyword argument 'intermediate_size'
```
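The traceback comes from DeepseekV2's MoE block, which builds its experts with extra size keyword arguments (e.g. `intermediate_size=config.moe_intermediate_size`), while `LigerSwiGLUMLP.__init__` only takes a `config`. A minimal sketch of a workaround, assuming `DeepseekV2MLP` accepts `hidden_size`/`intermediate_size` overrides; `LigerDeepseekV2MLP` is a hypothetical name, not part of Liger:

```python
from copy import deepcopy

from liger_kernel.transformers import LigerSwiGLUMLP


class LigerDeepseekV2MLP(LigerSwiGLUMLP):
    # Fold DeepseekV2's per-expert size overrides into a copied config,
    # since LigerSwiGLUMLP reads its sizes from the config alone.
    def __init__(self, config, hidden_size=None, intermediate_size=None):
        config = deepcopy(config)
        if hidden_size is not None:
            config.hidden_size = hidden_size
        if intermediate_size is not None:
            config.intermediate_size = intermediate_size
        super().__init__(config)


modeling_mod.DeepseekV2MLP = LigerDeepseekV2MLP
```

With a shim like this in place, the expert list comprehension at line 555 should construct cleanly, and both the dense and MoE MLPs go through the SwiGLU kernel.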