Fix incorrect LoRA weight loading for fused gate_up_proj #6734
Conversation
Updated description. MMMU_val for phi4mm increased from 47 to 53, mostly on par with the official benchmark (55). The majority of the gain came from the bug fix for the LoRA shape. Thank you for catching it in my last PR :)
LGTM
Motivation
During testing, I identified two bugs introduced in my previous PR for supporting phi-4-mm.
(Major impact) My previous PR introduced fused LoRA weight support, but it did not handle the gate_up_proj shape correctly. The problem went unnoticed earlier because PyTorch silently absorbed the shape mismatch through broadcasting (see the sketch after this list).
(Minor impact) I had an incorrect understanding of the seqlens calculation for the Idefics embedding. When tgt_sizes is absent, input_embeds should not be masked (see the second sketch below). Reference: https://github.com/vllm-project/vllm/blob/fd7bb88d72ba721d6eb4f9d34198ad930c36c177/vllm/model_executor/models/idefics2_vision_model.py#L34
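To illustrate the first bug, here is a minimal sketch of how a broadcast can hide a gate_up_proj LoRA shape mismatch. The shapes and names below are simplified assumptions for illustration, not SGLang's actual buffer layout:

```python
import torch

rank, intermediate_size = 8, 16

# Fused gate_up_proj LoRA-B buffer: one slice for gate_proj, one for up_proj.
fused_b = torch.zeros(2, intermediate_size, rank)

# Checkpoint provides separate B matrices for the two projections.
b_gate = torch.randn(intermediate_size, rank)
b_up = torch.randn(intermediate_size, rank)

# Buggy pattern: copying a single (intermediate_size, rank) tensor into the
# (2, intermediate_size, rank) buffer succeeds silently because copy_
# broadcasts the source -- both slices end up holding the same weights.
fused_b.copy_(b_gate)
assert torch.equal(fused_b[0], fused_b[1])  # silently wrong, no error raised

# Correct pattern: write each slice explicitly, so a genuine shape mismatch
# would fail loudly instead of being broadcast away.
fused_b[0].copy_(b_gate)
fused_b[1].copy_(b_up)
```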
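For the second fix, a minimal sketch of the intended behavior, assuming a hypothetical maybe_mask_embeds helper and a tgt_sizes layout of (height_patches, width_patches) per image; this is not the actual Idefics code:

```python
import torch

def maybe_mask_embeds(input_embeds: torch.Tensor, tgt_sizes=None):
    # input_embeds: (batch, seq_len, hidden)
    if tgt_sizes is None:
        # No target sizes provided: every position is valid, so do not mask.
        return input_embeds
    batch, seq_len, _ = input_embeds.shape
    # Number of valid patch positions per image, assuming tgt_sizes[i] gives
    # (height_patches, width_patches).
    seqlens = tgt_sizes[:, 0] * tgt_sizes[:, 1]
    positions = torch.arange(seq_len, device=input_embeds.device)
    valid = positions.unsqueeze(0) < seqlens.unsqueeze(1)  # (batch, seq_len)
    return input_embeds * valid.unsqueeze(-1)
```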
The MMMU score for Phi-4-MM increased from 47 to 53.1 after these two fixes, consistent with vLLM (the difference is within random error). Both vLLM and SGLang score slightly lower than the benchmark reported in the original paper, but the difference may be due to different sampling parameters or benchmark scripts.
Modification
Checklist