Refactor LoRA handling to support adapter tensors in fused format #6585


Merged: 24 commits into sgl-project:main on May 27, 2025

Conversation

@lifuhuang (Collaborator) commented May 25, 2025

Motivation

  1. Currently, SGL expects LoRA weights to come in certain formats (e.g., separate q_proj, k_proj, and v_proj tensors) and stacks them at load time to meet the expectations of the LoRA backends.

However, some models (e.g., Phi4MM) ship their weight tensors in a pre-fused format, which is not supported as-is by SGL today (see the sketch after this list).

  2. The current LoRA implementation in SGL uses the layer name and operation name (e.g., "qkv_proj") to uniquely identify LoRA adapters, which works fine for conventional LLMs. However, for model architectures that have more than one attention stack (e.g., VLMs), we need a more accurate mapping between LoRA weights and the base model's modules.
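For illustration, here is a minimal Python sketch (not SGLang's actual loader code; the tensor names, rank, and hidden size are representative examples) contrasting the adapter layout SGL expects today with the pre-fused layout shipped by adapters such as Phi4MM:

```python
import torch

# Layout SGL expects today: separate per-projection LoRA tensors that the
# loader stacks into a fused qkv_proj tensor before handing them to the
# LoRA backend (rank r = 16 and hidden size 4096 are arbitrary here).
separate_adapter = {
    "model.layers.0.self_attn.q_proj.lora_A.weight": torch.randn(16, 4096),
    "model.layers.0.self_attn.k_proj.lora_A.weight": torch.randn(16, 4096),
    "model.layers.0.self_attn.v_proj.lora_A.weight": torch.randn(16, 4096),
}

# Layout of a pre-fused adapter (e.g., Phi4MM): a single qkv_proj tensor
# that is already stacked, so the stacking step above must be skipped.
fused_adapter = {
    "model.layers.0.self_attn.qkv_proj.lora_A.weight": torch.randn(3 * 16, 4096),
}
```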

Modifications

  1. Generalized the preprocessing function from stacking-only to a "normalization" step that works in both directions (stacking, splitting, replicating, etc.), based on the initial tensor shape (see the sketch after this list).
  2. To limit the scope of this PR, I did not fully implement the accurate mapping; I only introduced a map_lora_module_name function that currently serves as a "filter" to ensure LoRAManager does not incorrectly map LoRA weights to unwanted modules (e.g., vision towers).
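To make the idea concrete, below is a rough sketch of the normalization and filtering logic. The helper signatures are hypothetical and the filter keys on substrings for brevity; the actual change decides the direction based on the initial tensor shape, whereas this sketch simplifies by checking which keys are present.

```python
from typing import Dict, Optional

import torch


def normalize_qkv_lora_A(
    weights: Dict[str, torch.Tensor], prefix: str
) -> torch.Tensor:
    """Return a fused qkv lora_A tensor: pass a pre-fused tensor through,
    or stack the separate q/k/v tensors along dim 0 if that is what the
    adapter provides."""
    fused_key = f"{prefix}.qkv_proj.lora_A.weight"
    if fused_key in weights:
        # Pre-fused adapter (e.g., Phi4MM): already shaped (3 * r, hidden).
        return weights[fused_key]
    # Conventional adapter: stack the per-projection tensors.
    return torch.cat(
        [
            weights[f"{prefix}.{p}.lora_A.weight"]
            for p in ("q_proj", "k_proj", "v_proj")
        ],
        dim=0,
    )


def map_lora_module_name(module_name: str) -> Optional[str]:
    """Act as a filter: return None for modules that should not receive
    LoRA weights (e.g., vision towers), otherwise pass the name through."""
    if "vision" in module_name or "image" in module_name:
        return None
    return module_name
```

Keying the decision on whether a fused tensor is already present (or on its shape) lets a single loading path serve both conventional and pre-fused adapters, while the name filter keeps LoRA weights from being attached to modules such as vision towers.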

Example

On my local branch, I verified that the Phi4MM LoRA adapter loads successfully and observed a significant increase in MMMU score, from 0.38 to 0.472.

It's worth noting that this MMMU score is still lower than the 0.55 claimed by the authors. However, it is difficult to pinpoint the source of the discrepancy, as I see a similar situation for existing non-LoRA models as well. The root cause could be one of the following: (1) an incorrect model implementation, (2) issues with the benchmark script, or (3) a different benchmarking setup used by the paper's authors. I will add this to my follow-up list.


Checklist

@Fridge003: this comment was marked as resolved.

@lifuhuang lifuhuang requested a review from CatherineSue as a code owner May 26, 2025 04:49
@lifuhuang lifuhuang requested a review from Fridge003 May 26, 2025 06:12
@Fridge003 (Collaborator)

Also, we need a test for the Phi4MM model, which can be put under test/srt/models/test_vlm_models.py. This test can be added in a future PR.

@lifuhuang (Collaborator, Author)

Also, we need a test for the Phi4MM model, which can be put under test/srt/models/test_vlm_models.py. This test can be added in a future PR.

Sounds good, thank you for the suggestion! For now, I added TestPhi4MMServer in test_vision_openai_server_b.py. I will look into test_vlm_models and add the test in a follow-up PR.

@lifuhuang lifuhuang reopened this May 27, 2025
@Fridge003 (Collaborator) left a comment


LGTM

@zhyncs zhyncs merged commit 477a101 into sgl-project:main May 27, 2025
15 of 21 checks passed
Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025