[Bug] Testing new Llama-3_3-Nemotron-Super-49B-v1 by Nvidia: "Model architectures ['DeciLMForCausalLM'] are not supported for now."

### Checklist

- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [ ] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x] 5. Please use English, otherwise it will be closed.

### Describe the bug

I tried to run on SGLang Llama-3_3-Nemotron-Super-49B-v1 recently announced by Nvidia.

It seems not to be yet supported by SGLang since `DeciLMForCausalLM`is not yet accepted by SGLang. See below.

Can you add corresponding support?

```
Scheduler hit an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 1748, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 218, in __init__
    self.tp_worker = TpWorkerClass(
                     ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 74, in __init__
    self.model_runner = ModelRunner(
                        ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 166, in __init__
    self.initialize(min_per_gpu_memory)
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 176, in initialize
    self.load_model()
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 361, in load_model
    self.model = get_model(
                 ^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_loader/loader.py", line 358, in load_model
    model = _initialize_model(
            ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_loader/loader.py", line 137, in _initialize_model
    model_class, _ = get_model_architecture(model_config)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_loader/utils.py", line 37, in get_model_architecture
    return ModelRegistry.resolve_model_cls(architectures)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/models/registry.py", line 65, in resolve_model_cls
    return self._raise_for_unsupported(architectures)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/models/registry.py", line 32, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['DeciLMForCausalLM'] are not supported for now. Supported architectures: dict_keys(['BaichuanForCausalLM', 'ChatGLMModel', 'CohereForCausalLM', 'Cohere2ForCausalLM', 'DbrxForCausalLM', 'DeepseekForCausalLM', 'MultiModalityCausalLM', 'DeepseekV3ForCausalLMNextN', 'DeepseekV2ForCausalLM', 'DeepseekV3ForCausalLM', 'ExaoneForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'Gemma2ForSequenceClassification', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GraniteForCausalLM', 'Grok1ForCausalLM', 'Grok1ModelForCausalLM', 'InternLM2ForCausalLM', 'InternLM2ForRewardModel', 'LlamaForCausalLM', 'Phi3ForCausalLM', 'InternLM3ForCausalLM', 'LlamaForClassification', 'LlamaForCausalLMEagle', 'LlamaEmbeddingModel', 'MistralModel', 'LlamaForSequenceClassification', 'LlamaForSequenceClassificationWithNormal_Weights', 'LlavaLlamaForCausalLM', 'LlavaQwenForCausalLM', 'LlavaMistralForCausalLM', 'LlavaVidForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'MiniCPMV', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MllamaForConditionalGeneration', 'OlmoForCausalLM', 'Olmo2ForCausalLM', 'OlmoeForCausalLM', 'Phi3SmallForCausalLM', 'QWenLMHeadModel', 'Qwen2ForCausalLM', 'Qwen2_5_VLForConditionalGeneration', 'Qwen2ForCausalLMEagle', 'Qwen2MoeForCausalLM', 'Qwen2ForRewardModel', 'Qwen2VLForConditionalGeneration', 'StableLmForCausalLM', 'TorchNativeLlamaForCausalLM', 'TorchNativePhi3ForCausalLM', 'XverseForCausalLM', 'XverseMoeForCausalLM', 'YiVLForCausalLM'])
```

### Reproduction

Start SGLang and with `nvidia/Llama-3_3-Nemotron-Super-49B-v1` coming from HuggingFace
The message above will appear right after this command

### Environment

Amazon Linux 2023
SGLang 0.0.4.post1 = last officially published version as of this writing

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug] Testing new Llama-3_3-Nemotron-Super-49B-v1 by Nvidia: "Model architectures ['DeciLMForCausalLM'] are not supported for now." #4689

Checklist

Describe the bug

Reproduction

Environment

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Bug] Testing new Llama-3_3-Nemotron-Super-49B-v1 by Nvidia: "Model architectures ['DeciLMForCausalLM'] are not supported for now." #4689

Description

Checklist

Describe the bug

Reproduction

Environment

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions