Refactor LoRA handling to support adapter tensors in fused format #6585
Conversation
Also, we need a test for the Phi4MM model, which can be put under
Sounds good. Thank you for the suggestion! For now, I have added
LGTM
Motivation
However, there are some models (e.g., Phi4MM) whose LoRA adapter weight tensors come in a pre-fused format, which is not supported as-is by SGL today.
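For context, the following is a hedged sketch (plain PyTorch, not code from this PR) contrasting the per-projection layout most LoRA checkpoints use with a pre-fused layout, where a single A/B pair targets a fused module such as qkv_proj. All tensor names and shapes below are illustrative only.

```python
# Illustrative comparison of LoRA adapter tensor layouts; names/shapes are made up.
import torch

rank = 16
hidden = 3072  # illustrative hidden size

# Typical (unfused) adapter: one lora_A / lora_B pair per projection.
unfused_adapter = {
    "model.layers.0.self_attn.q_proj.lora_A.weight": torch.zeros(rank, hidden),
    "model.layers.0.self_attn.q_proj.lora_B.weight": torch.zeros(3072, rank),
    "model.layers.0.self_attn.k_proj.lora_A.weight": torch.zeros(rank, hidden),
    "model.layers.0.self_attn.k_proj.lora_B.weight": torch.zeros(1024, rank),
    # ... v_proj, o_proj, etc.
}

# Pre-fused adapter: a single pair targets the fused qkv_proj module, so the
# lora_B matrix already spans the concatenated q/k/v output dimensions.
fused_adapter = {
    "model.layers.0.self_attn.qkv_proj.lora_A.weight": torch.zeros(rank, hidden),
    "model.layers.0.self_attn.qkv_proj.lora_B.weight": torch.zeros(3072 + 1024 + 1024, rank),
}
```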
Modifications
map_lora_module_name: right now it only serves as a "filter" to ensure LoRAManager does not incorrectly map LoRA weights to unwanted modules (e.g., vision towers).
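To illustrate the idea, here is a minimal, hedged sketch of how such a per-model hook could act as a filter. The class name, hook signature, and helper function below are assumptions for illustration, not the actual interface introduced in this PR.

```python
# Hedged sketch of a model-level LoRA module-name filter; not the PR's real code.
from typing import Iterable, List, Optional


class Phi4MMForCausalLM:
    """Illustrative stand-in for the real model class."""

    def map_lora_module_name(self, module_name: str) -> Optional[str]:
        # Assumed behavior: return None to skip modules that should never
        # receive LoRA weights (e.g., vision-tower submodules).
        if "vision" in module_name or "image_embed" in module_name:
            return None
        # Otherwise keep the name unchanged; a fuller mapping could also
        # translate fused module names here if an adapter required it.
        return module_name


def filter_lora_target_modules(model, module_names: Iterable[str]) -> List[str]:
    # Hypothetical helper: apply the hook (when present) before mapping
    # adapter weights onto model modules.
    mapper = getattr(model, "map_lora_module_name", None)
    if mapper is None:
        return list(module_names)
    return [mapped for mapped in map(mapper, module_names) if mapped is not None]


# Example usage:
# filter_lora_target_modules(
#     Phi4MMForCausalLM(),
#     ["model.layers.0.self_attn.qkv_proj", "model.vision_embed_tokens.img_projection"],
# )
# -> ["model.layers.0.self_attn.qkv_proj"]
```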
Example
In my local branch, I verified that the Phi4MM LoRA adapter loads successfully and observed a significant increase in MMMU score from 0.38 to 0.472.
It's worth noting that the MMMU score is still lower than what's claimed by the authors (0.55). However, it's difficult to pinpoint the source of the discrepancy, as I am seeing a similar situation for existing non-LoRA models as well. The root cause could be one of the following: (1) incorrect model implementation, (2) issues with the benchmark script, or (3) the paper authors used a different benchmarking setup. I will add this to my follow-up list.
Checklist