Refactor LoRA handling to support adapter tensors in fused format #6585


Merged: 24 commits into sgl-project:main on May 27, 2025

Conversation

@lifuhuang (Collaborator) commented May 25, 2025

Motivation

  1. Currently, SGL expects LoRA weights to come in certain formats (e.g., separate q_proj, k_proj, and v_proj tensors) and stacks them at load time to meet the expectations of the LoRA backends.

However, some models (e.g., Phi4MM) ship their weight tensors in a pre-fused format, which is not supported as-is by SGL today (see the sketch after this list).

  2. The current LoRA implementation in SGL uses the layer name and operation name (e.g., "qkv_proj") to uniquely identify LoRA adapters, which works fine for conventional LLMs. However, for model architectures that have more than one attention stack (e.g., VLMs), we need a more accurate mapping between LoRA weights and the base model's modules.
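For illustration, here is a minimal Python sketch (not SGLang's actual loader code; the tensor names, rank, and hidden size are representative examples) contrasting the adapter layout SGL expects today with the pre-fused layout shipped by adapters such as Phi4MM:

```python
import torch

# Layout SGL expects today: separate per-projection LoRA tensors that the
# loader stacks into a fused qkv_proj tensor before handing them to the
# LoRA backend (rank r = 16 and hidden size 4096 are arbitrary here).
separate_adapter = {
    "model.layers.0.self_attn.q_proj.lora_A.weight": torch.randn(16, 4096),
    "model.layers.0.self_attn.k_proj.lora_A.weight": torch.randn(16, 4096),
    "model.layers.0.self_attn.v_proj.lora_A.weight": torch.randn(16, 4096),
}

# Layout of a pre-fused adapter (e.g., Phi4MM): a single qkv_proj tensor
# that is already stacked, so the stacking step above must be skipped.
fused_adapter = {
    "model.layers.0.self_attn.qkv_proj.lora_A.weight": torch.randn(3 * 16, 4096),
}
```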

Modifications

  1. Generalized the preprocessing function from stacking-only to a "normalization" step that works in both directions (stacking, splitting, replicating, etc.), based on the initial tensor shape (see the sketch after this list).
  2. To limit the scope of this PR, I did not fully implement the accurate mapping; I only introduced a map_lora_module_name function that currently serves as a "filter" to ensure LoRAManager does not incorrectly map LoRA weights to unwanted modules (e.g., vision towers).
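To make the idea concrete, below is a rough sketch of the normalization and filtering logic. The helper signatures are hypothetical and the filter keys on substrings for brevity; the actual change decides the direction based on the initial tensor shape, whereas this sketch simplifies by checking which keys are present.

```python
from typing import Dict, Optional

import torch


def normalize_qkv_lora_A(
    weights: Dict[str, torch.Tensor], prefix: str
) -> torch.Tensor:
    """Return a fused qkv lora_A tensor: pass a pre-fused tensor through,
    or stack the separate q/k/v tensors along dim 0 if that is what the
    adapter provides."""
    fused_key = f"{prefix}.qkv_proj.lora_A.weight"
    if fused_key in weights:
        # Pre-fused adapter (e.g., Phi4MM): already shaped (3 * r, hidden).
        return weights[fused_key]
    # Conventional adapter: stack the per-projection tensors.
    return torch.cat(
        [
            weights[f"{prefix}.{p}.lora_A.weight"]
            for p in ("q_proj", "k_proj", "v_proj")
        ],
        dim=0,
    )


def map_lora_module_name(module_name: str) -> Optional[str]:
    """Act as a filter: return None for modules that should not receive
    LoRA weights (e.g., vision towers), otherwise pass the name through."""
    if "vision" in module_name or "image" in module_name:
        return None
    return module_name
```

Keying the decision on whether a fused tensor is already present (or on its shape) lets a single loading path serve both conventional and pre-fused adapters, while the name filter keeps LoRA weights from being attached to modules such as vision towers.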

Example

On my local branch, I verified that the Phi4MM LoRA adapter loads successfully and observed a significant increase in MMMU score, from 0.38 to 0.472.

It's worth noting that this MMMU score is still lower than the 0.55 claimed by the authors. However, it is difficult to pinpoint the source of the discrepancy, as I see a similar situation for existing non-LoRA models as well. The root cause could be one of the following: (1) an incorrect model implementation, (2) issues with the benchmark script, or (3) a different benchmarking setup used by the paper's authors. I will add this to my follow-up list.


Checklist

@Fridge003: this comment was marked as resolved.

@lifuhuang lifuhuang requested a review from CatherineSue as a code owner May 26, 2025 04:49
@lifuhuang lifuhuang requested a review from Fridge003 May 26, 2025 06:12
@Fridge003 (Collaborator)

Also, we need a test for the Phi4MM model, which can be put under test/srt/models/test_vlm_models.py. This test can be added in a future PR.

@lifuhuang (Collaborator, Author)

Also, we need a test for the Phi4MM model, which can be put under test/srt/models/test_vlm_models.py. This test can be added in a future PR.

Sounds good, thank you for the suggestion! For now, I added TestPhi4MMServer in test_vision_openai_server_b.py. I will look into test_vlm_models and add the test in a follow-up PR.

@lifuhuang lifuhuang reopened this May 27, 2025
@Fridge003 (Collaborator) left a comment


LGTM

@zhyncs zhyncs merged commit 477a101 into sgl-project:main May 27, 2025
15 of 21 checks passed
Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025