Fix incorrect LoRA weight loading for fused gate_up_proj #6734
Conversation
Updated description. MMMU_val for phi4mm increased from 47 to 53, mostly on par with the official benchmark (55). The majority of the gain came from the bug fix for the LoRA shape. Thank you for catching it in my last PR :)
LGTM
Motivation
During testing, I identified two bugs introduced in my previous PR for supporting phi-4-mm.
(Major impact) My previous PR introduced fused LoRA weight support, but it did not handle the gate_up_proj shape correctly. The problem went unnoticed earlier because PyTorch silently absorbed the shape mismatch through broadcasting (see the sketch after this list).
(Minor impact) I had an incorrect understanding of the seqlens calculation for the Idefics embedding. When tgt_sizes is absent, input_embeds should not be masked (see the second sketch below). Reference: https://github.com/vllm-project/vllm/blob/fd7bb88d72ba721d6eb4f9d34198ad930c36c177/vllm/model_executor/models/idefics2_vision_model.py#L34
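To illustrate the first bug, here is a minimal sketch of how a broadcast can hide a gate_up_proj LoRA shape mismatch. The shapes and names below are simplified assumptions for illustration, not SGLang's actual buffer layout:

```python
import torch

rank, intermediate_size = 8, 16

# Fused gate_up_proj LoRA-B buffer: one slice for gate_proj, one for up_proj.
fused_b = torch.zeros(2, intermediate_size, rank)

# Checkpoint provides separate B matrices for the two projections.
b_gate = torch.randn(intermediate_size, rank)
b_up = torch.randn(intermediate_size, rank)

# Buggy pattern: copying a single (intermediate_size, rank) tensor into the
# (2, intermediate_size, rank) buffer succeeds silently because copy_
# broadcasts the source -- both slices end up holding the same weights.
fused_b.copy_(b_gate)
assert torch.equal(fused_b[0], fused_b[1])  # silently wrong, no error raised

# Correct pattern: write each slice explicitly, so a genuine shape mismatch
# would fail loudly instead of being broadcast away.
fused_b[0].copy_(b_gate)
fused_b[1].copy_(b_up)
```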
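For the second fix, a minimal sketch of the intended behavior, assuming a hypothetical maybe_mask_embeds helper and a tgt_sizes layout of (height_patches, width_patches) per image; this is not the actual Idefics code:

```python
import torch

def maybe_mask_embeds(input_embeds: torch.Tensor, tgt_sizes=None):
    # input_embeds: (batch, seq_len, hidden)
    if tgt_sizes is None:
        # No target sizes provided: every position is valid, so do not mask.
        return input_embeds
    batch, seq_len, _ = input_embeds.shape
    # Number of valid patch positions per image, assuming tgt_sizes[i] gives
    # (height_patches, width_patches).
    seqlens = tgt_sizes[:, 0] * tgt_sizes[:, 1]
    positions = torch.arange(seq_len, device=input_embeds.device)
    valid = positions.unsqueeze(0) < seqlens.unsqueeze(1)  # (batch, seq_len)
    return input_embeds * valid.unsqueeze(-1)
```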
The MMMU score for Phi-4-MM increased from 47 to 53.1 after these two fixes, consistent with vLLM (the difference is within random error). Both vLLM and SGLang score slightly lower than the benchmark reported in the original paper, but the difference may be due to different sampling parameters or benchmark scripts.
Modification
Checklist