
Refactor LoRAManager and LoRAMemoryPool state management logic for dynamic LoRA loading support #7412


Merged: 9 commits from lifuhuang/dynamic-lora into main on Jun 21, 2025

Conversation

@lifuhuang (Collaborator) commented on Jun 21, 2025:

Motivation

This is the first PR for supporting #2686.

Due to the complexity of the change, I plan to introduce it in 2-3 pull requests to simplify code reviews, verification, and conflict resolution. At a high level, they are expected to proceed in the following steps:

  1. (This PR) Refactor LoRAManager and LoRAMemoryPool for cleaner separation of mutable and immutable state in a way that's functionally equivalent to the existing code.
  2. Expose loading and unloading APIs in engine/server and add e2e CI tests.
  3. Perf optimization and corner case handling.

This PR corresponds to step 1 above.

Modifications

  1. Separate mutable and immutable states in LoRAManager to simplify mutable state management (see the sketch after this list).
  2. Decouple updates of mutable states (e.g., adapters, memory buffers) from the initialization of LoRAManager and LoRAMemoryPool, so that they can be invoked multiple times.
  3. Ensure LoRAManager and LoRAMemoryPool support incremental updates to avoid excessive perf overhead during loading/unloading: the state of already-loaded adapters is reused across loads/unloads.
  4. Minor clean-ups.
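
To make the intended structure concrete, here is a minimal sketch. The method names (`init_state`, `load_lora_adapters`, `unload_lora_adapters`, `update_state_from_configs`) come from this PR, but the constructor signature, field names, and method bodies are simplified assumptions for illustration, not the actual sglang implementation:

```python
class LoRAManager:
    def __init__(self, base_model, max_loras_per_batch: int, device: str):
        # Immutable state: fixed at construction time.
        self.base_model = base_model
        self.max_loras_per_batch = max_loras_per_batch
        self.device = device
        # Mutable state is set up in a separate step so it can be rebuilt
        # and incrementally updated after construction.
        self.init_state()

    def init_state(self):
        # Mutable state: changes as adapters are loaded and unloaded.
        self.configs = {}  # adapter name -> adapter config
        self.loras = {}    # adapter name -> adapter weights in CPU memory

    def load_lora_adapters(self, lora_paths: dict):
        # Register the new adapters, then reconcile all derived state.
        for name, path in lora_paths.items():
            self.configs[name] = self._load_config(path)
        self.update_state_from_configs()

    def unload_lora_adapters(self, lora_names: set):
        # Deregister, then reconcile; adapters that stay registered keep
        # their already-loaded state instead of being reloaded.
        for name in lora_names:
            self.configs.pop(name, None)
        self.update_state_from_configs()

    def _load_config(self, path: str):
        ...  # assumed helper: read the adapter config from disk

    def update_state_from_configs(self):
        ...  # reconcile self.loras and memory buffers with self.configs
```

Note how both load and unload funnel into a single reconciliation step, which is what makes the operations safe to call repeatedly after initialization.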

Benchmark

I ran the MMMU benchmark as a sanity check to demonstrate functional equivalence (the difference is within randomization error):

| Framework | MMMU |
|-----------|------|
| This PR   | 54.7 |
| main      | 54.4 |



@gemini-code-assist (bot) left a comment:

Summary of Changes

Hello @lifuhuang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request represents the foundational first step towards enabling dynamic loading and unloading of LoRA adapters. I've focused on fundamentally restructuring the LoRAManager and LoRAMemoryPool classes to decouple their static initialization from the dynamic management of LoRA adapter states. This allows for incremental updates to the system's LoRA configurations and memory buffers, paving the way for future APIs that will expose dynamic LoRA operations to users, ultimately enhancing system flexibility and resource utilization.

Highlights

  • LoRAManager Refactoring: I've refactored the LoRAManager class to clearly separate its mutable and immutable states. This involved removing the initial lora_paths parameter from its constructor and introducing a dedicated init_state method for mutable components.
  • Dynamic LoRA Operations: New load_lora_adapters and unload_lora_adapters methods have been added to LoRAManager. These methods enable dynamic addition and removal of LoRA models, allowing for incremental updates to the system's LoRA configurations.
  • Incremental State Updates: A central update_state_from_configs method has been introduced in LoRAManager. This orchestrator intelligently updates all relevant internal states, including LoRA adapters in CPU memory, weight names, LoRA modules, and memory buffers, whenever LoRA configurations change.
  • LoRAMemoryPool Adaptations: The LoRAMemoryPool has been modified to support dynamic buffer allocation. It no longer holds max_lora_dim or lora_modules as static instance variables, receiving them as arguments when needed. Its init_buffers method now incrementally allocates space only for newly encountered weight names (see the sketch after this list).
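
A hedged sketch of the incremental buffer allocation described in the last highlight: the method name `init_buffers` comes from this PR, but the simplified signature, buffer layout, and dtype here are assumptions for illustration only:

```python
import torch


class LoRAMemoryPool:
    def __init__(self, max_loras_per_batch: int, device: str):
        self.max_loras_per_batch = max_loras_per_batch
        self.device = device
        # weight name -> preallocated buffer; grown incrementally.
        self.buffers: dict[str, torch.Tensor] = {}

    def init_buffers(self, lora_weight_names: set, hidden_size: int, max_lora_dim: int):
        # max_lora_dim is passed per call rather than stored on the pool,
        # mirroring the PR's move away from static instance variables.
        for name in lora_weight_names:
            if name not in self.buffers:  # incremental: only new names
                self.buffers[name] = torch.empty(
                    (self.max_loras_per_batch, max_lora_dim, hidden_size),
                    dtype=torch.float16,
                    device=self.device,
                )
```

Under these assumptions, a second init_buffers call with an already-seen weight name is a no-op, so buffers survive repeated load/unload cycles instead of being reallocated.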

@gemini-code-assist (bot) left a comment:

Code Review

The pull request refactors the LoRAManager and LoRAMemoryPool to support dynamic LoRA loading. The changes involve separating mutable and immutable states, decoupling updates from initialization, and supporting incremental updates. The code includes TODO comments that suggest further improvements and optimizations.

Comment on lines +334 to +338
```python
# Please note that the following update operations are "monotonic" by design, meaning that we update
# multiple places to support the new weight names when the first adapter targeting such weight names
# is loaded. However, we never "rollback" the support (e.g., convert LoRA layer back to base layer)
# even if the associated adapters are unloaded later for both simplicity and practicality reasons: the
# list of LoRA weight names is expected to be extremely finite and stable.
```


Severity: medium

This comment is very long and complex. Consider simplifying the explanation or breaking it into smaller, more manageable comments. It might also be helpful to provide a link to a design document or issue that explains the reasoning behind this design choice.
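
For readers following along, the "monotonic" behavior the quoted comment describes can be illustrated with a small sketch (assumed semantics for illustration, not the actual sglang code):

```python
class LoRAWeightNameTracker:
    """Hypothetical illustration: supported weight names only ever grow."""

    def __init__(self):
        self.supported_weight_names: set[str] = set()

    def on_load(self, adapter_weight_names: set[str]) -> set[str]:
        # Returns the names that became newly supported, so callers can
        # convert the corresponding base layers to LoRA layers exactly once.
        new_names = adapter_weight_names - self.supported_weight_names
        self.supported_weight_names |= new_names
        return new_names

    def on_unload(self, adapter_weight_names: set[str]) -> None:
        # Intentionally a no-op: support is never rolled back, because the
        # set of LoRA weight names is expected to be small and stable.
        pass
```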

@Fridge003 (Collaborator) commented:

Thanks Lifu, awesome work!

@Fridge003 (Collaborator) left a review:

LGTM

@Fridge003 mentioned this pull request on Jun 10, 2025.
@zhyncs merged commit 1998ce4 into main on Jun 21, 2025 (74 of 98 checks passed) and deleted the lifuhuang/dynamic-lora branch.
whybeyoung pushed a commit to whybeyoung/sglang that referenced this pull request on Jun 24, 2025.
yilian49 pushed a commit to yilian49/sglang that referenced this pull request on Jun 24, 2025.