Fixing DoRA docs, adding to mem opt tutorial #1918
Merged

Commits (9):

73ee454 adding docs (SalmanMohammadi)
1c3cc39 whoops (SalmanMohammadi)
9bc6350 whoops2 (SalmanMohammadi)
847bee3 fixing one more thing (SalmanMohammadi)
318500f missed one more thing (SalmanMohammadi)
b208c94 whoops...aroonie? (SalmanMohammadi)
0448ed7 OnElast thing (SalmanMohammadi)
9c4f0bb removing nightly ref (SalmanMohammadi)
0d76a65 cmon chief (SalmanMohammadi)
@@ -21,6 +21,7 @@ To make things easy, we've summarized these components in the following table:

":ref:`glossary_opt_in_bwd`", "Helps reduce memory usage when using stateful optimizers, particularly when full-finetuning large models with high gradient memory usage. This is not compatible with ``gradient_accumulation_steps``, so training may slow down due to reduced model throughput."
":ref:`glossary_lora`", "When you want to significantly reduce the number of trainable parameters, saving gradient and optimizer memory during training, and significantly speeding up training."
":ref:`glossary_qlora`", "When you need even more memory savings than LoRA, at the potential cost of some training speed. Useful for very large models or limited hardware."
":ref:`glossary_dora`", "Like LoRA, DoRA can provide significant memory savings and training speed-ups. DoRA may improve performance over LoRA, particularly when using small rank updates."

.. note::
@@ -108,7 +109,7 @@ checkpointing, where all activations will either be recomputed later in the back

To enable activation offloading, use the ``enable_activation_offloading`` config entry or flag
in our lora finetuning single device recipe, e.g. ``enable_activation_offloading=True``. To allow
usage of streams, make sure you are on a torch version later than PyTorch 2.5.0.dev20240907.
usage of streams, make sure you are on a torch version later than PyTorch 2.5.0.
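For example, a minimal sketch following the same ``tune run`` pattern used elsewhere on this page (the config name is simply the LoRA single-device config that appears in the LoRA examples below):

.. code-block:: bash

    tune run lora_finetune_single_device --config llama3/8B_lora_single_device \
        enable_activation_offloading=True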
.. _glossary_grad_accm:
@@ -278,6 +279,7 @@ These are all specified under the ``model`` flag or config entry, i.e:

.. code-block:: yaml

    model:
      _component_: torchtune.models.llama3.lora_llama3_8b
      apply_lora_to_mlp: True
      lora_attn_modules: ["q_proj", "k_proj", "v_proj"]
@@ -292,7 +294,24 @@ Secondly, parameters which control the scale of the impact of LoRA on the model:

  to your specific use case. Typically, one jointly changes ``lora_rank`` and ``lora_alpha`` together, where ``lora_alpha ~= 2*lora_rank``.
* ``lora_dropout`` introduces dropout in the LoRA layers to help regularize training. We default to 0.0 for all of our models.

As above, these parameters are also specified under the ``model`` flag or config entry.
As above, these parameters are also specified under the ``model`` flag or config entry:

.. code-block:: bash

    tune run lora_finetune_single_device --config llama3/8B_lora_single_device \
        model.apply_lora_to_mlp=True \
        model.lora_attn_modules=["q_proj","k_proj","v_proj"] \
        model.lora_rank=32 \
        model.lora_alpha=64

.. code-block:: yaml

    model:
      _component_: torchtune.models.llama3.lora_llama3_8b
      apply_lora_to_mlp: True
      lora_attn_modules: ["q_proj", "k_proj", "v_proj"]
      lora_rank: 32
      lora_alpha: 64

.. note::
@@ -323,18 +342,98 @@ You can finetune using QLoRA with any of our LoRA recipes, i.e. recipes with the

QLoRA-enabled model builders, which we support for all our models, and also use the ``qlora_`` prefix, e.g.
the :func:`torchtune.models.llama3.llama3_8b` model has a corresponding :func:`torchtune.models.llama3.qlora_llama3_8b`.
We aim to provide a comprehensive set of configurations to allow you to get started with training with QLoRA quickly,
just specify any config with ``_qlora`` in its name, e.g:
just specify any config with ``_qlora`` in its name.

All the rest of the LoRA parameters remain the same for QLoRA - check out the section above on :ref:`LoRA <glossary_lora>`
to see how to configure these parameters.

To configure from the command line:

.. code-block:: bash

    tune run lora_finetune_single_device --config llama3/8B_qlora_single_device
    tune run lora_finetune_single_device --config llama3/8B_qlora_single_device \
        model.apply_lora_to_mlp=True \
        model.lora_attn_modules=["q_proj","k_proj","v_proj"] \
        model.lora_rank=32 \
        model.lora_alpha=64

All the rest of the LoRA parameters remain the same for QLoRA - check out the section above on :ref:`LoRA <glossary_lora>`
to see how to configure.
or, by modifying a config:

.. code-block:: yaml
    model:
      _component_: torchtune.models.llama3.qlora_llama3_8b
      apply_lora_to_mlp: True
      lora_attn_modules: ["q_proj", "k_proj", "v_proj"]
      lora_rank: 32
      lora_alpha: 64
.. _glossary_dora:

Weight-Decomposed Low-Rank Adaptation (DoRA)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*What's going on here?*
`DoRA <https://arxiv.org/abs/2402.09353>`_ is another PEFT technique which builds on top of LoRA by
further decomposing the pre-trained weights into two components: magnitude and direction. The magnitude component
is a learnable vector which rescales each column of the weight matrix, while the direction component corresponds to the original LoRA decomposition and
updates the orientation of the weights.
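In symbols (a paraphrase of the decomposition from the DoRA paper, not a description of torchtune internals), with :math:`W_0` the pre-trained weight, :math:`BA` the usual LoRA update, :math:`m` the learned magnitude vector, and :math:`\lVert \cdot \rVert_c` the column-wise norm:

.. math::

   W' = m \, \frac{W_0 + BA}{\lVert W_0 + BA \rVert_c}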
DoRA adds a small overhead to LoRA training due to the addition of the magnitude parameter, but it has been shown to
improve the performance of LoRA, particularly at low ranks.

[Review comment] Perf or memory overhead?
[Author reply] Not 100% sure, but there's an added parameter and extra computation, so I'd say both.
*Sounds great! How do I use it?*

Much like LoRA and QLoRA, you can finetune using DoRA with any of our LoRA recipes. We use the same model builders for LoRA
as we do for DoRA, so you can use the ``lora_`` version of any model builder with ``use_dora=True``. For example, to finetune
:func:`torchtune.models.llama3.llama3_8b` with DoRA, you would use :func:`torchtune.models.llama3.lora_llama3_8b` with ``use_dora=True``:
.. code-block:: bash

    tune run lora_finetune_single_device --config llama3/8B_lora_single_device \
        model.use_dora=True

.. code-block:: yaml

    model:
      _component_: torchtune.models.llama3.lora_llama3_8b
      use_dora: True
Since DoRA extends LoRA, the parameters for :ref:`customizing LoRA <glossary_lora>` are identical. You can also quantize the base model weights like in :ref:`glossary_qlora` by using ``quantize_base=True`` to reap
even more memory savings!
.. code-block:: bash

    tune run lora_finetune_single_device --config llama3/8B_lora_single_device \
        model.apply_lora_to_mlp=True \
        model.lora_attn_modules=["q_proj","k_proj","v_proj"] \
        model.lora_rank=16 \
        model.lora_alpha=32 \
        model.use_dora=True \
        model.quantize_base=True

.. code-block:: yaml

    model:
      _component_: torchtune.models.llama3.lora_llama3_8b
      apply_lora_to_mlp: True
      lora_attn_modules: ["q_proj", "k_proj", "v_proj"]
      lora_rank: 16
      lora_alpha: 32
      use_dora: True
      quantize_base: True
.. note::

   Under the hood, we've enabled DoRA by adding the :class:`~torchtune.modules.peft.DoRALinear` module, which we swap
   out for :class:`~torchtune.modules.peft.LoRALinear` when ``use_dora=True``.

.. _glossary_distrib:

.. TODO

.. Distributed
[Review comment] Do you know when someone would choose DoRA over LoRA?
[Author reply] Honestly not sure; according to the paper it's just straight up better.