Skip to content

[Bug] Testing new Llama-3_3-Nemotron-Super-49B-v1 by Nvidia: "Model architectures ['DeciLMForCausalLM'] are not supported for now." #4689

Open
@didier-durand

Description

@didier-durand

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

I tried to run on SGLang Llama-3_3-Nemotron-Super-49B-v1 recently announced by Nvidia.

It seems not to be yet supported by SGLang since DeciLMForCausalLMis not yet accepted by SGLang. See below.

Can you add corresponding support?

Scheduler hit an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 1748, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 218, in __init__
    self.tp_worker = TpWorkerClass(
                     ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 74, in __init__
    self.model_runner = ModelRunner(
                        ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 166, in __init__
    self.initialize(min_per_gpu_memory)
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 176, in initialize
    self.load_model()
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 361, in load_model
    self.model = get_model(
                 ^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_loader/loader.py", line 358, in load_model
    model = _initialize_model(
            ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_loader/loader.py", line 137, in _initialize_model
    model_class, _ = get_model_architecture(model_config)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_loader/utils.py", line 37, in get_model_architecture
    return ModelRegistry.resolve_model_cls(architectures)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/models/registry.py", line 65, in resolve_model_cls
    return self._raise_for_unsupported(architectures)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/models/registry.py", line 32, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['DeciLMForCausalLM'] are not supported for now. Supported architectures: dict_keys(['BaichuanForCausalLM', 'ChatGLMModel', 'CohereForCausalLM', 'Cohere2ForCausalLM', 'DbrxForCausalLM', 'DeepseekForCausalLM', 'MultiModalityCausalLM', 'DeepseekV3ForCausalLMNextN', 'DeepseekV2ForCausalLM', 'DeepseekV3ForCausalLM', 'ExaoneForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'Gemma2ForSequenceClassification', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GraniteForCausalLM', 'Grok1ForCausalLM', 'Grok1ModelForCausalLM', 'InternLM2ForCausalLM', 'InternLM2ForRewardModel', 'LlamaForCausalLM', 'Phi3ForCausalLM', 'InternLM3ForCausalLM', 'LlamaForClassification', 'LlamaForCausalLMEagle', 'LlamaEmbeddingModel', 'MistralModel', 'LlamaForSequenceClassification', 'LlamaForSequenceClassificationWithNormal_Weights', 'LlavaLlamaForCausalLM', 'LlavaQwenForCausalLM', 'LlavaMistralForCausalLM', 'LlavaVidForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'MiniCPMV', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MllamaForConditionalGeneration', 'OlmoForCausalLM', 'Olmo2ForCausalLM', 'OlmoeForCausalLM', 'Phi3SmallForCausalLM', 'QWenLMHeadModel', 'Qwen2ForCausalLM', 'Qwen2_5_VLForConditionalGeneration', 'Qwen2ForCausalLMEagle', 'Qwen2MoeForCausalLM', 'Qwen2ForRewardModel', 'Qwen2VLForConditionalGeneration', 'StableLmForCausalLM', 'TorchNativeLlamaForCausalLM', 'TorchNativePhi3ForCausalLM', 'XverseForCausalLM', 'XverseMoeForCausalLM', 'YiVLForCausalLM'])

Reproduction

Start SGLang and with nvidia/Llama-3_3-Nemotron-Super-49B-v1 coming from HuggingFace
The message above will appear right after this command

Environment

Amazon Linux 2023
SGLang 0.0.4.post1 = last officially published version as of this writing

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions