Refactor LoRAManager and LoRAMemoryPool state management logic for dynamic LoRA loading support #7412
Conversation
Summary of Changes
Hello @lifuhuang, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request represents the foundational first step towards enabling dynamic loading and unloading of LoRA adapters. I've focused on fundamentally restructuring the `LoRAManager` and `LoRAMemoryPool` classes to decouple their static initialization from the dynamic management of LoRA adapter states. This allows for incremental updates to the system's LoRA configurations and memory buffers, paving the way for future APIs that will expose dynamic LoRA operations to users, ultimately enhancing system flexibility and resource utilization.
Highlights
- **LoRAManager Refactoring:** I've refactored the `LoRAManager` class to clearly separate its mutable and immutable states. This involved removing the initial `lora_paths` parameter from its constructor and introducing a dedicated `init_state` method for mutable components.
- **Dynamic LoRA Operations:** New `load_lora_adapters` and `unload_lora_adapters` methods have been added to `LoRAManager`. These methods enable dynamic addition and removal of LoRA models, allowing for incremental updates to the system's LoRA configurations.
- **Incremental State Updates:** A central `update_state_from_configs` method has been introduced in `LoRAManager`. This orchestrator intelligently updates all relevant internal states, including LoRA adapters in CPU memory, weight names, LoRA modules, and memory buffers, whenever LoRA configurations change.
- **LoRAMemoryPool Adaptations:** The `LoRAMemoryPool` has been modified to support dynamic buffer allocation. It no longer holds `max_lora_dim` or `lora_modules` as static instance variables, receiving them as arguments when needed. Its `init_buffers` method now incrementally allocates space only for newly encountered weight names.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation and chat, as well as code reviews, directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
Code Review
The pull request refactors the LoRAManager and LoRAMemoryPool to support dynamic LoRA loading. The changes involve separating mutable and immutable states, decoupling updates from initialization, and supporting incremental updates. The code includes TODO comments that suggest further improvements and optimizations.
```python
# Please note that the following update operations are "monotonic" by design, meaning that we update
# multiple places to support the new weight names when the first adapter targeting such weight names
# is loaded. However, we never "rollback" the support (e.g., convert LoRA layer back to base layer)
# even if the associated adapters are unloaded later for both simplicity and practicality reasons: the
# list of LoRA weight names is expected to be extremely finite and stable.
```
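The monotonic behavior described in this comment can be illustrated with a minimal sketch: the set of supported LoRA weight names only ever grows, even when the adapters that introduced a name are later unloaded. The class and method names below are illustrative assumptions, not the actual sglang code.

```python
class WeightNameRegistry:
    """Illustrative sketch of monotonic weight-name support."""

    def __init__(self):
        # Weight names for which LoRA support has been enabled
        # (buffers allocated, layers converted, etc.).
        self.supported = set()

    def update_from_configs(self, adapter_target_names):
        # adapter_target_names: iterable of per-adapter lists of target
        # weight names (e.g. ["q_proj", "k_proj"]).
        for target_names in adapter_target_names:
            for name in target_names:
                if name not in self.supported:
                    # First adapter targeting this name: enable support once.
                    self.supported.add(name)
        # Deliberately no removal branch: unloading an adapter never rolls
        # back support, since the list of weight names is small and stable.
```

The payoff of this design is that an unload followed by a reload of an adapter targeting the same weight names performs no redundant buffer or layer work.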
Thanks Lifu, awesome work!
LGTM
…namic LoRA loading support (sgl-project#7412)
Motivation
This is the first PR for supporting #2686.
Due to the complexity of the change, I plan to introduce it in 2-3 pull requests to simplify code reviews, verification, and conflict resolution. At a high level, they are expected to happen in the following steps:
This PR maps to step 1 above.
Modifications
- Refactored `LoRAManager` to simplify mutable state management.
- Decoupled update logic from initialization in `LoRAManager` and `LoRAMemoryPool`, such that they can be called multiple times.
- `LoRAManager` and `LoRAMemoryPool` support incremental updates to avoid excessive perf overheads during loading/unloading; states of loaded adapters can be reused across loads/unloads.

Benchmark
I ran the MMMU benchmark as a sanity check to demonstrate functional equivalency (the difference is randomization error):
Checklist