🚀 The feature, motivation and pitch
It would be nice to support DeepseekV2 models. Unfortunately, the modeling code has not yet been accepted into transformers and requires `trust_remote_code=True`.
I'm monkey-patching it myself for now, and wanted to leave some notes that may be helpful when support is added officially down the road.
```python
import sys

from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM
from liger_kernel.transformers import (
    LigerCrossEntropyLoss, LigerRMSNorm, LigerSwiGLUMLP, liger_rotary_pos_emb,
)

with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        "deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True
    )

# Patch the dynamically imported remote-code module in place.
modeling_mod = sys.modules[model.__class__.__module__]
modeling_mod.apply_rotary_pos_emb = liger_rotary_pos_emb
modeling_mod.DeepseekV2RMSNorm = LigerRMSNorm
modeling_mod.DeepseekV2MLP = LigerSwiGLUMLP
modeling_mod.CrossEntropyLoss = LigerCrossEntropyLoss
modeling_mod.DeepseekV2ForCausalLM.forward = deepseekv2_lce_forward  # custom fused-CE forward
```
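Note that swapping the classes only affects models constructed after the patch; the `init_empty_weights` load above exists just to make transformers import the remote-code module so it can be patched. A minimal follow-up sketch, assuming the dynamic module stays cached in `sys.modules` so a second `from_pretrained` picks up the patched symbols:

```python
import torch

# Re-instantiate with real weights so the patched DeepseekV2RMSNorm /
# DeepseekV2MLP classes are used when the layers are built.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Base",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```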
One initial issue when swapping in the SwiGLU MLP:
```
File "/mnt/ML/huggingface/modules/transformers_modules/deepseek-ai/DeepSeek-Coder-V2-Lite-Base/ea9b066cee82f82906fdd58898cb3788b1c5d770/modeling_deepseek.py", line 555, in <listcomp>
    DeepseekV2MLP(
TypeError: LigerSwiGLUMLP.__init__() got an unexpected keyword argument 'intermediate_size'
```
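The traceback comes from DeepseekV2's MoE block, which builds its experts with extra size keyword arguments (e.g. `intermediate_size=config.moe_intermediate_size`), while `LigerSwiGLUMLP.__init__` only takes a `config`. A minimal sketch of a workaround, assuming `DeepseekV2MLP` accepts `hidden_size`/`intermediate_size` overrides; `LigerDeepseekV2MLP` is a hypothetical name, not part of Liger:

```python
from copy import deepcopy

from liger_kernel.transformers import LigerSwiGLUMLP


class LigerDeepseekV2MLP(LigerSwiGLUMLP):
    # Fold DeepseekV2's per-expert size overrides into a copied config,
    # since LigerSwiGLUMLP reads its sizes from the config alone.
    def __init__(self, config, hidden_size=None, intermediate_size=None):
        config = deepcopy(config)
        if hidden_size is not None:
            config.hidden_size = hidden_size
        if intermediate_size is not None:
            config.intermediate_size = intermediate_size
        super().__init__(config)


modeling_mod.DeepseekV2MLP = LigerDeepseekV2MLP
```

With a shim like this in place, the expert list comprehension at line 555 should construct cleanly, and both the dense and MoE MLPs go through the SwiGLU kernel.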