
Fix incorrect LoRA weight loading for fused gate_up_proj #6734


Merged: 7 commits merged into main from lifuhuang/fix-seqlen on May 31, 2025

Conversation

@lifuhuang (Collaborator) commented May 29, 2025

Motivation

During testing, I identified two bugs introduced in my previous PR that added phi-4-mm support.

  1. (Major impact) My previous PR introduced fused LoRA weight support, but it did not handle the gate_up_proj shape correctly. The problem went uncaught because PyTorch silently absorbed the shape mismatch through broadcasting (see the sketch after this list).

  2. (Minor impact) I misunderstood the seqlens calculation for the Idefics embedding: when tgt_sizes is absent, input_embeds should not be masked. Reference: https://github.com/vllm-project/vllm/blob/fd7bb88d72ba721d6eb4f9d34198ad930c36c177/vllm/model_executor/models/idefics2_vision_model.py#L34
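A minimal sketch of the failure mode in bug 1, assuming hypothetical buffer shapes and names (the actual sglang LoRA buffers differ): a fused gate_up_proj LoRA weight holds the gate and up chunks stacked along the output dimension, and PyTorch's broadcasting lets a mis-shaped copy "succeed" silently instead of raising.

```python
import torch

# Hypothetical sizes for illustration only.
intermediate_size, rank = 16, 8

# Destination buffer laid out as (num_chunks, intermediate_size, rank):
# chunk 0 = gate_proj, chunk 1 = up_proj.
buffer = torch.zeros(2, intermediate_size, rank)

# A mis-shaped source that was never split into its two chunks: copy_ does
# not raise; broadcasting duplicates the single matrix into BOTH chunks,
# silently corrupting the up_proj slot.
wrong = torch.randn(intermediate_size, rank)
buffer.copy_(wrong)  # no error, wrong result

# The fix: explicitly reshape the fused (2 * intermediate_size, rank)
# checkpoint tensor into (2, intermediate_size, rank) before copying,
# so gate and up land in their own chunks.
fused = torch.randn(2 * intermediate_size, rank)
buffer.copy_(fused.reshape(2, intermediate_size, rank))
```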

With these two fixes, the MMMU score for phi-4-mm increased from 47 to 53.1, consistent with vLLM (the difference is within random error). Both vLLM and SGLang score slightly below the number reported in the original paper, but the gap may be due to different sampling parameters or benchmark scripts.

| Framework | MMMU |
| --- | --- |
| SGLang | 53.1 |
| vLLM | 52.5 |
| Original paper | 55 |


Modification

  1. Reshape gate_up_proj to conform to the current weight-layout convention.
  2. Correct the seqlens calculation for the Idefics embedding (see the sketch below).
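A minimal sketch of the corrected seqlens branching, with a hypothetical signature and shapes (the real logic lives in the Idefics embedding path and mirrors the vLLM reference linked above): seqlens come from tgt_sizes when present, and from the patch attention mask otherwise, without masking input_embeds.

```python
from typing import Optional

import torch

def compute_seqlens(
    patch_attention_mask: torch.Tensor,  # (num_images, h_patches, w_patches), bool
    tgt_sizes: Optional[torch.Tensor],   # (num_images, 2) holding (h, w) in patches
) -> torch.Tensor:
    """Number of valid patches per image, used to build cu_seqlens."""
    if tgt_sizes is not None:
        # Valid patch count is simply h * w per image.
        return tgt_sizes[:, 0] * tgt_sizes[:, 1]
    # No tgt_sizes: derive lengths from the mask alone and leave the
    # input embeddings untouched (the earlier bug also masked them here).
    return patch_attention_mask.flatten(start_dim=1).sum(dim=1)
```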


@lifuhuang lifuhuang requested a review from Fridge003 as a code owner May 31, 2025 08:06
@lifuhuang lifuhuang changed the title from "Fix incorrect seqlens calculation for phi4mm." to "Fix incorrect LoRA weight loading for fused gate_up_proj" May 31, 2025
@lifuhuang (Collaborator, Author) commented May 31, 2025

> Hi @lifuhuang , is there any accuracy improvement with this PR?

Updated the description. MMMU_val for phi4mm increased from 47 to 53, mostly on par with the official benchmark (55). The majority of the gain came from the LoRA shape bug fix. Thank you for catching it in my last PR :)

@lifuhuang lifuhuang requested a review from Fridge003 May 31, 2025 18:20
@Fridge003 (Collaborator) left a comment


LGTM

@lifuhuang lifuhuang mentioned this pull request May 28, 2025
@zhyncs zhyncs merged commit 094fbda into main May 31, 2025
62 of 76 checks passed
@zhyncs zhyncs deleted the lifuhuang/fix-seqlen branch May 31, 2025 20:41
Edenzzzz pushed a commit to Edenzzzz/sglang that referenced this pull request Jun 2, 2025
Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025
xwu-intel pushed a commit to xwu-intel/sglang that referenced this pull request Jun 17, 2025