
[feat] support for DeepseekV2 #129

Open
@tmm1

Description

🚀 The feature, motivation and pitch

It would be nice to support DeepseekV2 models. Unfortunately, the modeling code has not yet been accepted into transformers and requires `trust_remote_code=True`.

I'm monkey-patching it myself for now, and wanted to leave some notes that may be helpful when support is added officially down the road.

```python
import sys

from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM

# Liger kernel building blocks; deepseekv2_lce_forward is a custom
# fused linear-cross-entropy forward, analogous to the existing
# *_lce_forward patches in liger_kernel.
from liger_kernel.transformers import (
    LigerCrossEntropyLoss,
    LigerRMSNorm,
    LigerSwiGLUMLP,
    liger_rotary_pos_emb,
)

with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
    # trust_remote_code loads the modeling file into its own module;
    # patch that module rather than transformers itself
    modeling_mod = sys.modules[model.__class__.__module__]

modeling_mod.apply_rotary_pos_emb = liger_rotary_pos_emb
modeling_mod.DeepseekV2RMSNorm = LigerRMSNorm
modeling_mod.DeepseekV2MLP = LigerSwiGLUMLP
modeling_mod.CrossEntropyLoss = LigerCrossEntropyLoss
modeling_mod.DeepseekV2ForCausalLM.forward = deepseekv2_lce_forward
```

One initial issue when swapping in the SwiGLU MLP:

```
  File "/mnt/ML/huggingface/modules/transformers_modules/deepseek-ai/DeepSeek-Coder-V2-Lite-Base/ea9b066cee82f82906fdd58898cb3788b1c5d770/modeling_deepseek.py", line 555, in <listcomp>
    DeepseekV2MLP(
TypeError: LigerSwiGLUMLP.__init__() got an unexpected keyword argument 'intermediate_size'
```
