
add function call parser for DeepSeek V3 #5224

Merged
merged 9 commits into sgl-project:main on Apr 21, 2025

Conversation

finger92
Contributor

@finger92 finger92 commented Apr 10, 2025

Please see the old PR #5054 for background information.

Update on Apr 10

I have updated the code and fixed the "Token id 129279" problem, but it still does not work with xgrammar. The output seems to be unstable when using xgrammar; for example, the output may be
"<|tool▁calls▁begin|><<<<<<<<<<<<<<<<<<<<<<"
and then the server crashes with this exception:

File "/sgl-workspace/sglang/python/sglang/srt/constrained/xgrammar_backend.py", line 59, in accept_token
    assert self.matcher.accept_token(token)
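
For debugging, here is a minimal sketch (a hypothetical helper, not part of this PR) that assumes only what the traceback shows, namely that matcher.accept_token(token) returns a bool; it logs the rejected token instead of letting the bare assert take the whole server down:

def accept_token_logged(matcher, tokenizer, token: int) -> bool:
    # Same call that the assert above wraps; log the offending token on rejection.
    ok = matcher.accept_token(token)
    if not ok:
        print(f"xgrammar rejected token id={token}, text={tokenizer.decode([token])!r}")
    return ok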

I also added a new chat template, inspired by the "function calling" section of https://huggingface.co/deepseek-ai/DeepSeek-V2.5:

deepseek2.jinja

{% if not add_generation_prompt is defined %}
    {% set add_generation_prompt = false %}
{% endif %}
{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}
{%- for message in messages %}
    {%- if message['role'] == 'system' %}
        {%- if ns.is_first_sp %}
            {% set ns.system_prompt = ns.system_prompt + message['content'] %}
            {% set ns.is_first_sp = false %}
        {%- else %}
            {% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{{ bos_token }}
{{ ns.system_prompt }}
{%- for message in messages %}
    {%- if message['role'] == 'user' %}
        {%- set ns.is_tool = false -%}
        {%- set ns.is_first = false -%}
        {%- set ns.is_last_user = true -%}
        {{'<|User|>' + message['content'] + '<|Assistant|>'}}
    {%- endif %}
    {%- if message['role'] == 'assistant' and tools is defined and tools is not none %}
        {%- set ns.is_last_user = false -%}
        {%- if ns.is_tool %}
            {{'<|tool▁outputs▁end|>'}}
        {%- endif %}
        {%- set ns.is_first = false %}
        {%- set ns.is_tool = false -%}
        {%- set ns.is_output_first = true %}
        {%- if message['content'] is not none %}
            {{message['content']}} 
        {%- endif %} 
        {{'## Tools\n\n### Function\n\nYou have the following functions available:\n\n'}}
        {%- for tool in tools %}
            {{'- `' + tool['function']['name'] + '`:\n```json'}} {{- tool['function'] | tojson }} {{'\n```'}}
            {%- set ns.is_first = true -%}
        {%- endfor %}
    {%- endif %}
    {%- if message['role'] == 'assistant' and (tools is not defined or tools is none)%}
        {%- set ns.is_last_user = false -%}
        {%- if ns.is_tool %}
            {{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}
            {%- set ns.is_tool = false -%}
        {%- else %}
            {% set content = message['content'] %}
            {{content + '<|end▁of▁sentence|>'}}
        {%- endif %}
    {%- endif %}
    {%- if message['role'] == 'tool' %}
        {%- set ns.is_last_user = false -%}
        {%- set ns.is_tool = true -%}
        {%- if ns.is_output_first %}
            {{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
            {%- set ns.is_output_first = false %}
        {%- else %}
            {{'\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
        {%- endif %}
    {%- endif %}
{%- endfor -%}
{% if ns.is_tool %}
    {{'<|tool▁outputs▁end|>'}}
{% endif %}
{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}
    {{'<|Assistant|>'}}
{% endif %}
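
To preview the prompt this template renders for a given request, here is a quick sketch using transformers' apply_chat_template (the model path, tool, and message below are placeholders for illustration, not part of this PR):

from transformers import AutoTokenizer

# Placeholder model path; any tokenizer that ships the DeepSeek special tokens works for previewing.
tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-V3-0324", trust_remote_code=True
)

with open("deepseek2.jinja") as f:
    chat_template = f.read()

tools = [{
    "type": "function",
    "function": {
        "name": "query_weather",
        "description": "Get weather of a location",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Render the conversation with the custom template instead of the tokenizer's built-in one.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "How's the weather in Beijing today?"}],
    tools=tools,
    chat_template=chat_template,
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)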

I ran a function call test with these two templates; the results are:

template rate of success
deepseek.jinja 45%
deepseek2.jinja 100%

Here is the test code:

import requests, json, random, re

test_func = {
    "type": "function",
    "function": {
        "name": "query_weather",
        "description": "Get weather of an location, the user shoud supply a location first",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city, e.g. Beijing"
                }
            },
            "required": [
                "city"
            ]
        }
    }
}


req_base_url = "http://127.0.0.1:30000"
cities = [
    "Beijing", 
    "Chongqing",
    "Chengdu",
    "Dalian",
    "Guangzhou",
    "Hangzhou",
    "Harbin",
    "Hefei",
    "Kunming",
    "Lanzhou",
    "Nanjing",
    "Qingdao",
    "Shanghai",
    "Shenzhen",
    "Suzhou",
    "Tianjin",
    "Wuhan",
    "Xi'an",
    "Xiamen",
    "Zhengzhou"
]
user_content_tp = "Hows the weather like in {} today"
req_body = {
    "messages": [
        {
            "role": "user",
            "content": ""
        }
    ],
    "temperature": 0,
    "max_tokens": 100,
    "model": "deepseek-ai/DeepSeek-V3-0324",
    "tools": [
        test_func
    ]
}

total_failed = 0
for i in range(100):
    req_body["messages"][1]["content"] = user_content_tp.format(random.choice(cities))
    res = requests.post(req_base_url + "/v1/chat/completions", json=req_body)
    res = res.json()
    if len(res["choices"]) > 0:
        if len(res["choices"][0]["message"]["tool_calls"]) > 0:
            print("function call successfull: " + json.dumps(res["choices"][0]["message"]["tool_calls"][0]))
        else:
            total_failed += 1
            print("function call failed: " + json.dumps(res))
            
    # flush cache
    requests.get(req_base_url + "/flush_cache")
print(f"total_failed: {total_failed}")

@finger92 finger92 changed the title Deepseek func call add function call parser for DeepSeek V3 Apr 10, 2025
@minleminzui minleminzui self-assigned this Apr 11, 2025
@minleminzui
Collaborator

Could you add documentation about tool calling for DeepSeek V3? Thanks. https://docs.sglang.ai/references/deepseek.html

@finger92
Contributor Author

Could you add documentation about tool calling for DeepSeek V3? Thanks. https://docs.sglang.ai/references/deepseek.html

Sure, I will add it today

@feng397

feng397 commented Apr 11, 2025

Does it support streaming output?

@finger92
Contributor Author

Does it support streaming output?

No. Is there any scenario where streaming needs to be used during a function call?

@feng397

feng397 commented Apr 11, 2025

Does it support streaming output?

No. Is there any scenario where streaming needs to be used during a function call?

I find it is supported in the official API:

ChatCompletionChunk(id='', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=[ChoiceDeltaToolCall(index=0, id='', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')]), finish_reason=None, index=0, logprobs=None)], created=, model='deepseek-chat', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_3d5141a69a_prod0225', usage=None)

ChatCompletionChunk(id='', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=[ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='{"', name=None), type=None)]), finish_reason=None, index=0, logprobs=None)], created=, model='deepseek-chat', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_3d5141a69a_prod0225', usage=None)

ChatCompletionChunk(id='', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=[ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='location', name=None), type=None)]), finish_reason=None, index=0, logprobs=None)], created=, model='deepseek-chat', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_3d5141a69a_prod0225', usage=None)
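
For reference, a minimal client-side sketch of how these chunks can be stitched back together (this assumes the OpenAI-compatible streaming format shown above and the openai Python package; the endpoint, model, and tool below are placeholders):

from openai import OpenAI

# Placeholder endpoint and tool definition, for illustration only.
client = OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather of a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "How's the weather in Tokyo today?"}],
    tools=[weather_tool],
    stream=True,
)

# The function name usually arrives in the first chunk; the JSON arguments arrive
# piece by piece afterwards, so concatenate them per tool-call index.
calls = {}
for chunk in stream:
    if not chunk.choices:
        continue
    for tc in chunk.choices[0].delta.tool_calls or []:
        entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
        entry["name"] = tc.function.name or entry["name"]
        entry["arguments"] += tc.function.arguments or ""

print(calls)  # e.g. {0: {'name': 'get_weather', 'arguments': '{"location": "Tokyo"}'}}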

@moqimoqidea

Does it support streaming output?

No. Is there any scenario where streaming needs to be used during a function call?

For example, Anthropic's streaming requests with tool use consist of a streaming message containing text content (which may be empty) and a tool-use part, and use partial_json to assemble the tool's full function and parameters across multiple messages.

Thanks for your work. I think it would be awesome if SGLang + DeepSeek V3 function calling could support streaming.

@finger92
Contributor Author

Could you add documentation about tool calling for DeepSeek V3? Thanks. https://docs.sglang.ai/references/deepseek.html

I have added the doc.
I have also added the chat template mentioned in this PR.

@finger92
Contributor Author

@feng397 @moqimoqidea Thanks for pointing this out. I will check these. Maybe I can create another PR to implement function calling with streaming requests, probably next week.

@minleminzui
Collaborator

minleminzui commented Apr 11, 2025

@finger92 Is the chat_template of DeepSeek V3 the same as the one for DeepSeek V2.5? Why did you use the chat_template of V2.5?

@minleminzui
Collaborator

minleminzui commented Apr 11, 2025

@feng397 @moqimoqidea Thanks for pointing this out. I will check these. Maybe I can create another PR to implement function calling with streaming requests, probably next week.

@feng397 @moqimoqidea @finger92 Streaming requests with tool use are already supported in SGLang. No additional implementation is required for DeepSeek V3.

@finger92
Contributor Author

@finger92 Is the chat_template of DeepSeek V3 the same as the one for DeepSeek V2.5? Why did you use the chat_template of V2.5?

Oh, I just checked and they are different. I will update it to V3's template.

@finger92
Contributor Author

finger92 commented Apr 11, 2025

@feng397 @moqimoqidea @finger92 Streaming requests with tool use are already supported in SGLang. No additional implementation is required for DeepSeek V3.

I see. But the "function" part of DeepSeek's output is not pure JSON, so there are some compatibility issues.
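
To illustrate what the parser has to handle, here is a rough sketch (the markers follow the function-calling format in DeepSeek's published chat template; the exact format and this regex are assumptions for illustration, not the implementation in this PR):

import json
import re

# DeepSeek wraps each call in special markers and puts the arguments in a fenced
# ```json block, so the text between the markers is not plain JSON.
SAMPLE = (
    "<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>query_weather\n"
    "```json\n{\"city\": \"Beijing\"}\n```<|tool▁call▁end|><|tool▁calls▁end|>"
)

CALL_RE = re.compile(
    r"<\|tool▁call▁begin\|>function<\|tool▁sep\|>(?P<name>[^\n]+)\n"
    r"```json\n(?P<args>.*?)\n```",
    re.DOTALL,
)

def parse_tool_calls(text: str):
    # Extract (name, arguments) pairs and JSON-decode only the arguments part.
    return [
        {"name": m.group("name").strip(), "arguments": json.loads(m.group("args"))}
        for m in CALL_RE.finditer(text)
    ]

print(parse_tool_calls(SAMPLE))
# [{'name': 'query_weather', 'arguments': {'city': 'Beijing'}}]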

@minleminzui
Collaborator

I see. But the "function" part of DeepSeek's output is not pure JSON, so there are some compatibility issues.

OK, thanks, I'll check it. But please remove the deepseek.jinja in docs/references/, and replace the URL https://github.com/sgl-project/sglang/tree/main/docs/references/deepseek.jinja with https://huggingface.co/deepseek-ai/DeepSeek-V3-0324/blob/main/tokenizer_config.json#L34

@finger92
Contributor Author

OK, thanks, I'll check it. But please remove the deepseek.jinja in docs/references/, and replace the URL https://github.com/sgl-project/sglang/tree/main/docs/references/deepseek.jinja with https://huggingface.co/deepseek-ai/DeepSeek-V3-0324/blob/main/tokenizer_config.json#L34

The chat template I submitted is an adapted version of V3's. If we use the original chat template, my idea is to add a function call prompt to the messages in adapter.py when tools appear in the request. Do you think this is a good idea?

@minleminzui
Collaborator

The chat template I submitted is an adapted version of V3's. If we use the original chat template, my idea is to add a function call prompt to the messages in adapter.py when tools appear in the request. Do you think this is a good idea?

@finger92 I think it is better not to change the original chat template. Please use the original chat template and change the code instead, thanks.

@finger92
Contributor Author

finger92 commented Apr 17, 2025

@zhaochenyang20 Hi, thanks for triggering the CI. I think the failed check is not related to this PR. What do you think?

@zhaochenyang20
Collaborator

@finger92 After rerunning it, we can merge. No need to do anything on your side.

@@ -25,6 +25,7 @@
"<tool_call>",
"<|python_tag|>",
"[TOOL_CALLS]",
"<|tool▁calls▁begin|>",

@zhaochenyang20
Collaborator

@finger92 Could you please rebase with main? I don't have write access, so it fails due to the lack of #5503.

@finger92
Contributor Author

@finger92 Could you please rebase with main? I don't have write access, so it fails due to the lack of #5503.

Thanks for replying, I have updated it

@zhaochenyang20
Collaborator

@finger92 cool, wait for us to run the CI. No need to do anything!

@finger92
Contributor Author

@zhaochenyang20 I saw the CI has passed; do I need to update the branch again?

@minleminzui
Collaborator

@zhaochenyang20 I saw the CI has passed; do I need to update the branch again?

No need. Just wait for it to be merged

@merrymercy merrymercy merged commit fac17ac into sgl-project:main Apr 21, 2025
58 of 72 checks passed
tarinkk pushed a commit to Pb314314/sglang that referenced this pull request Apr 21, 2025
@YCG09

YCG09 commented Apr 23, 2025

@finger92 I encountered a problem when using --tool-call-parser deepseekv3 to enable tool calling: it conflicts with Multi-token Prediction (MTP). I followed the documentation from https://docs.sglang.ai/references/deepseek.html regarding MTP usage for DeepSeek models, with a launch command like:

python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3-0324 \
  --speculative-algorithm EAGLE \
  --speculative-draft-model-path lmsys/DeepSeek-V3-0324-NextN \
  --speculative-num-steps 1 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 2 \
  --trust-remote-code \
  --tp 8 

When MTP is enabled, the tool-call-parser fails to return valid tool call results. Disabling MTP allows tool calling to work correctly.
Do you have any idea why enabling MTP conflicts with tool calling in DeepSeek? Is there any way to make them work together?

@minleminzui
Collaborator

minleminzui commented Apr 23, 2025

@finger92 I encountered a problem when using --tool-call-parser deepseekv3 to enable tool calling: it conflicts with Multi-token Prediction (MTP). When MTP is enabled, the tool-call parser fails to return valid tool call results. Disabling MTP allows tool calling to work correctly. Do you have any idea why enabling MTP conflicts with tool calling in DeepSeek? Is there any way to make them work together?

Please raise an issue, thanks. Our current tool calls are all based on structural tags, which leads to conflicts with speculative decoding.

@stevewein

After deploying deepseekv3-0324 using sglang 0.4.5-post3-cu124 and testing function calls, I found that if the messages contain a system message, the function call behaves abnormally.

For example:


curl "http://XXX/v1/chat/completions"
-H "Content-Type: application/json"
-d '{
"messages": [
{
"role": "system",
"content": "Use the tools to answer the questions."
},
{
"role": "user",
"content": "What is the weather in Tokyo"
}
],
"temperature": 0.3,
"max_tokens": 1000,
"model": "mssgpt_v3",
"tool_choice": "auto",
"tools": [
{
"type": "function",
"function": {
"name": "add",
"description": "Add two numbers",
"parameters": {
"properties": {
"a": {
"title": "A",
"type": "integer"
},
"b": {
"title": "B",
"type": "integer"
}
},
"required": [
"a",
"b"
],
"title": "addArguments",
"type": "object"
}
}
},
{
"type": "function",
"function": {
"name": "get_secret_word",
"description": "",
"parameters": {
"properties": {},
"title": "get_secret_wordArguments",
"type": "object"
}
}
},
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "",
"parameters": {
"properties": {
"city": {
"title": "City",
"type": "string"
}
},
"required": [
"city"
],
"title": "get_current_weatherArguments",
"type": "object"
}
}
}
]
}
'
The function call is wrong: "content" is not null and "tool_calls" is null.

{
    "id": "1d2854e227fe437191ee78b16e1bcfd8",
    "object": "chat.completion",
    "created": 1745811416,
    "model": "mssgpt-v3",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "I'll add the numbers 7 and 22 for you. One moment.???????????????????????????",
                "reasoning_content": null,
                "tool_calls": null
            },
            "logprobs": null,
            "finish_reason": "stop",
            "matched_stop": 1
        }
    ],
    "usage": {
        "prompt_tokens": 278,
        "total_tokens": 349,
        "completion_tokens": 71,
        "prompt_tokens_details": null
    }
}


BUT....


curl "http://xxxx/v1/chat/completions"
-H "Content-Type: application/json"
-d '{
"messages": [

    {
        "role": "user",
        "content": "What is the weather in Tokyo"
    }
],
"temperature": 0.3,
"max_tokens": 1000,
"model": "mssgpt_v3",
"tool_choice": "auto",
"tools": [
    {
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two numbers",
            "parameters": {
                "properties": {
                    "a": {
                        "title": "A",
                        "type": "integer"
                    },
                    "b": {
                        "title": "B",
                        "type": "integer"
                    }
                },
                "required": [
                    "a",
                    "b"
                ],
                "title": "addArguments",
                "type": "object"
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_secret_word",
            "description": "",
            "parameters": {
                "properties": {},
                "title": "get_secret_wordArguments",
                "type": "object"
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "",
            "parameters": {
                "properties": {
                    "city": {
                        "title": "City",
                        "type": "string"
                    }
                },
                "required": [
                    "city"
                ],
                "title": "get_current_weatherArguments",
                "type": "object"
            }
        }
    }
]
}'
The function call is correct and the tool_calls result is returned successfully.

{
    "id": "f387b0fe9e5a40e8b40a585d69dcfeb1",
    "object": "chat.completion",
    "created": 1745811225,
    "model": "mssgpt_v3",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": null,
                "reasoning_content": null,
                "tool_calls": [
                    {
                        "id": "2",
                        "type": "function",
                        "function": {
                            "name": "get_current_weather",
                            "arguments": "{\"city\": \"Tokyo\"}"
                        }
                    }
                ]
            },
            "logprobs": null,
            "finish_reason": "tool_calls",
            "matched_stop": null
        }
    ],
    "usage": {
        "prompt_tokens": 261,
        "total_tokens": 282,
        "completion_tokens": 21,
        "prompt_tokens_details": null
    }
}

@stevewein

@finger92 After deploying deepseekv3-0324 using sglang 0.4.5-post3-cu124:
1. When testing the function call, I found that if the messages contain a system message, the function call behaves abnormally.
2. When testing the function call with a message of role "tool", sglang replies with "Internal Server Error".

@stevewein

@finger92 Please check issues #5814 and #5815.

@lambert0312
Contributor

1. When testing the function call, I found that if the messages contain a system message, the function call behaves abnormally.

I think it's because of the following code:

                if (
                    tools
                    and tokenizer_manager.server_args.tool_call_parser == "deepseekv3"
                ):
                    # add function call prompt to deepseekv3
                    openai_compatible_messages.append(
                        {
                            "role": "system",
                            "content": """You are a helpful Assistant.
                    ## Tools
                    ### Function
                    You have the following functions available:
                    """
                            + "".join(
                                [
                                    f"""
                        - `{tool['name']}`:
                        ```json
                        {json.dumps(tool)}
                        ```
                        """
                                    for tool in tools
                                ]
                            ),
                        }
                    )

If the user adds their own system prompt, it will conflict with this!
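
A minimal sketch of one possible fix (a hypothetical helper, not the code in this PR): merge the generated tool prompt into the user's existing system message instead of appending a second system message.

import json

def inject_tool_prompt(openai_compatible_messages, tools):
    # Build the same kind of tool-description prompt as the snippet above;
    # `tools` here are the function objects with a "name" key, as in that snippet.
    tool_prompt = (
        "## Tools\n\n### Function\n\nYou have the following functions available:\n\n"
        + "".join(
            f"- `{tool['name']}`:\n```json\n{json.dumps(tool)}\n```\n" for tool in tools
        )
    )
    for msg in openai_compatible_messages:
        if msg["role"] == "system":
            # Append to the user's own system prompt instead of adding a second one.
            msg["content"] += "\n\n" + tool_prompt
            return
    # No system message supplied: fall back to prepending one.
    openai_compatible_messages.insert(0, {"role": "system", "content": tool_prompt})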

@ForcewithMe66

Hi @stevewein, I also ran into these problems.

When testing the function call with a message of role "tool", sglang replies with "Internal Server Error".

I can fix this problem by setting --disable-cuda-graph, as suggested in a comment in the sglang code.

However, the TPS drops from 50 to 10 on one H20 (141G) node. @minleminzui

@finger92
Contributor Author

finger92 commented Apr 29, 2025

Hi @stevewein, I also ran into these problems.

When testing the function call with a message of role "tool", sglang replies with "Internal Server Error".

I can fix this problem by setting --disable-cuda-graph, as suggested in a comment in the sglang code.

However, the TPS drops from 50 to 10 on one H20 (141G) node. @minleminzui

I tried with the latest sglang code, also on one H20 node, and the TPS is about 50 tokens/s. My command is

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3-0324 --tp 8 --trust-remote-code --mem-fraction-static 0.9 --tool-call-parser deepseekv3

@@ -938,6 +938,35 @@ def v1_chat_generate_request(

if chat_template_name is None:
openai_compatible_messages = []
if (
tools
and tokenizer_manager.server_args.tool_call_parser == "deepseekv3"

Collaborator

Why do we not add a chat template for this? This kind of prompt engineering is more suitable for a chat template.

Contributor

Why do we not add a chat template for this? This kind of prompt engineering is more suitable for a chat template.

Agreed, we can add a new built-in template.


Contributor Author

Got it, we can create a new PR using a custom chat template.

pi314ever pushed a commit to pi314ever/sglang that referenced this pull request May 16, 2025