Refactor LoRAManager and LoRAMemoryPool state management logic for dynamic LoRA loading support #7412
Conversation
Summary of Changes
Hello @lifuhuang, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request represents the foundational first step towards enabling dynamic loading and unloading of LoRA adapters. I've focused on fundamentally restructuring the `LoRAManager` and `LoRAMemoryPool` classes to decouple their static initialization from the dynamic management of LoRA adapter states. This allows for incremental updates to the system's LoRA configurations and memory buffers, paving the way for future APIs that will expose dynamic LoRA operations to users, ultimately enhancing system flexibility and resource utilization.
Highlights
- **LoRAManager Refactoring:** I've refactored the `LoRAManager` class to clearly separate its mutable and immutable states. This involved removing the initial `lora_paths` parameter from its constructor and introducing a dedicated `init_state` method for mutable components.
- **Dynamic LoRA Operations:** New `load_lora_adapters` and `unload_lora_adapters` methods have been added to `LoRAManager`. These methods enable dynamic addition and removal of LoRA models, allowing for incremental updates to the system's LoRA configurations.
- **Incremental State Updates:** A central `update_state_from_configs` method has been introduced in `LoRAManager`. This orchestrator intelligently updates all relevant internal states, including LoRA adapters in CPU memory, weight names, LoRA modules, and memory buffers, whenever LoRA configurations change.
- **LoRAMemoryPool Adaptations:** The `LoRAMemoryPool` has been modified to support dynamic buffer allocation. It no longer holds `max_lora_dim` or `lora_modules` as static instance variables, receiving them as arguments when needed. Its `init_buffers` method now incrementally allocates space only for newly encountered weight names.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation and chat, as well as code reviews, directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
Code Review
The pull request refactors the LoRAManager and LoRAMemoryPool to support dynamic LoRA loading. The changes involve separating mutable and immutable states, decoupling updates from initialization, and supporting incremental updates. The code includes TODO comments that suggest further improvements and optimizations.
```python
# Please note that the following update operations are "monotonic" by design, meaning that we update
# multiple places to support the new weight names when the first adapter targeting such weight names
# is loaded. However, we never "rollback" the support (e.g., convert LoRA layer back to base layer)
# even if the associated adapters are unloaded later for both simplicity and practicality reasons: the
# list of LoRA weight names is expected to be extremely finite and stable.
```
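The monotonic behavior described in this comment can be illustrated with a minimal sketch: the set of supported LoRA weight names only ever grows, even when the adapters that introduced a name are later unloaded. The class and method names below are illustrative assumptions, not the actual sglang code.

```python
class WeightNameRegistry:
    """Illustrative sketch of monotonic weight-name support."""

    def __init__(self):
        # Weight names for which LoRA support has been enabled
        # (buffers allocated, layers converted, etc.).
        self.supported = set()

    def update_from_configs(self, adapter_target_names):
        # adapter_target_names: iterable of per-adapter lists of target
        # weight names (e.g. ["q_proj", "k_proj"]).
        for target_names in adapter_target_names:
            for name in target_names:
                if name not in self.supported:
                    # First adapter targeting this name: enable support once.
                    self.supported.add(name)
        # Deliberately no removal branch: unloading an adapter never rolls
        # back support, since the list of weight names is small and stable.
```

The payoff of this design is that an unload followed by a reload of an adapter targeting the same weight names performs no redundant buffer or layer work.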
Thanks Lifu, awesome work!
LGTM
…namic LoRA loading support (sgl-project#7412)
Motivation
This is the first PR for supporting #2686.
Due to the complexity of the change, I plan to introduce it in 2-3 pull requests to simplify code reviews, verification, and conflict resolution. At a high level, they are expected to happen in the following steps:
This PR maps to step 1 above.
Modifications
- Refactored `LoRAManager` to simplify mutable state management.
- Decoupled update logic from initialization in `LoRAManager` and `LoRAMemoryPool`, such that they can be called multiple times.
- `LoRAManager` and `LoRAMemoryPool` support incremental updates to avoid excessive perf overheads during loading/unloading; states of loaded adapters can be reused across loads/unloads.

Benchmark
I ran the MMMU benchmark as a sanity check to demonstrate functional equivalency (the difference is randomization error):
Checklist