Fix for Qwen with Yarn #85


Merged
merged 9 commits into from Jul 7, 2025

Conversation

giulio98
Contributor

@giulio98 giulio98 commented Jun 21, 2025

PR description

Fixes #84

Checklist

  • Tests are working (make test)
  • Code is formatted correctly (make style, on errors try fix with make format)
  • Copyright header is included
  • All commits are signed-off using git commit -s
  • (new press) mypress_press.py is in the presses directory
  • (new press) MyPress is in __init__.py
  • (new press) README.md is updated with a 1 liner about the new press in the Available presses section
  • (new press) new press is in the default_presses list in tests/default_presses.py

@alessiodevoto
Collaborator

Hi @giulio98, thanks for opening this PR! 🙂 I think the issue and the way you are addressing it make sense 👍 I left some comments concerning the rerotation logic and some style fixes, let me know what you think!

@alessiodevoto
Collaborator

@giulio98 we updated main, so please don't forget to pull before the next commit :)

@giulio98 giulio98 requested a review from alessiodevoto July 4, 2025 14:30
@alessiodevoto
Collaborator

@giulio98 thanks for updating the code! I left a couple of additional comments; after addressing those and the style issues (make style is failing), we'll be able to merge!

Signed-off-by: giulio98 <[email protected]>
@giulio98
Contributor Author

giulio98 commented Jul 7, 2025

Hi @alessiodevoto, I ran make style; however, I cannot see your latest additional comments.

@alessiodevoto
Collaborator

alessiodevoto commented Jul 7, 2025

@giulio98 forgot to publish the review 🤦 here it is

Signed-off-by: giulio98 <[email protected]>
@giulio98
Contributor Author

giulio98 commented Jul 7, 2025

@alessiodevoto Thanks for the feedback! I renamed the variables using the convention used in other files:
B -> bsz
H -> num_key_value_heads
L -> n_kept
D -> d

@giulio98
Contributor Author

giulio98 commented Jul 7, 2025

@alessiodevoto I noticed here: https://github.com/huggingface/transformers/blob/b283d52f7f89d9cf3c77cfef233c4cbf700959ff/src/transformers/models/llama/modeling_llama.py#L98 that I have to multiply the cos and sin by attention_scaling.

Edit: by multiplying cos and sin by attention_scaling I again get a drop in performance (7.47 F1 on NarrativeQA).
@alessiodevoto is it correct in this context to multiply by the attention scaling?

My thought is: I don't have to apply attention_scaling at this stage because it was already applied in the prefill, and if I apply it again it is like applying the scaling twice. But let me know your thoughts.
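For reference, a minimal NumPy sketch of the pattern being discussed (illustrative values and names, not the transformers source): in a YaRN-style setup, attention_scaling is folded into cos and sin once, when they are computed for the rotation.

```python
import numpy as np

# Illustrative sketch: attention_scaling is baked into cos/sin once,
# at the point where the rotary tables are built (YaRN-style setup).
def rotary_cos_sin(positions, dim, attention_scaling, base=10000.0):
    inv_freq = 1.0 / (base ** (np.arange(0, dim // 2) / (dim // 2)))
    angles = positions[:, None] * inv_freq
    emb = np.concatenate([angles, angles], axis=-1)  # tile half-dim angles to full head dim
    # attention_scaling applied here, so downstream rotations already carry it
    return np.cos(emb) * attention_scaling, np.sin(emb) * attention_scaling

cos, sin = rotary_cos_sin(np.arange(4, dtype=float), 8, 0.7)
assert cos.shape == (4, 8)
```

Any key rotated with these tables already includes one factor of attention_scaling, which is the crux of the double-scaling question above.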

@alessiodevoto
Collaborator

alessiodevoto commented Jul 7, 2025

Hi @giulio98, nice catch, let's see. When we get the keys, they are already rotated with
k_rope = (k * cos * attention_scaling) + (rotate_half(k) * sin * attention_scaling) = ((k * cos) + (rotate_half(k) * sin)) * attention_scaling.
We then compute a new sine and cosine, new_sin and new_cos, that shift the rotation again. If we apply the scaling again, we have
k_embed = ((k_rope * new_cos) + (rotate_half(k_rope) * new_sin)) * attention_scaling. Given that attention_scaling was already accounted for in k_rope, multiplying by it again we would end up with k_embed = ((k * composed_cos) + (rotate_half(k) * composed_sin)) * attention_scaling^2, so we would be applying a power of the scaling. Therefore, I think we should remove it!
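The argument above can be checked numerically. A minimal sketch (the scaling value, head dim, and positions are illustrative assumptions, not the kvpress implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
k = rng.standard_normal(d)

def rotate_half(x):
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([-x2, x1], axis=-1)

def rope_cos_sin(pos, d):
    inv_freq = 1.0 / (10000 ** (np.arange(0, d // 2) / (d // 2)))
    angles = pos * inv_freq
    ang = np.concatenate([angles, angles])  # tile half-dim angles to full head dim
    return np.cos(ang), np.sin(ang)

scaling = 0.7  # illustrative YaRN attention_scaling value

# Prefill: rotate k to position 5, with attention_scaling applied once
cos5, sin5 = rope_cos_sin(5.0, d)
k_rope = (k * cos5 * scaling) + (rotate_half(k) * sin5 * scaling)

# Rerotation: shift from position 5 back to position 2 (delta = -3)
cosd, sind = rope_cos_sin(-3.0, d)

# Correct: no extra scaling during the rerotation
k_rerot = (k_rope * cosd) + (rotate_half(k_rope) * sind)

# Reference: rotate the original k directly to position 2 (scaled once)
cos2, sin2 = rope_cos_sin(2.0, d)
k_ref = (k * cos2 * scaling) + (rotate_half(k) * sin2 * scaling)
assert np.allclose(k_rerot, k_ref)

# Wrong: multiplying by the scaling again squares it relative to the reference
k_wrong = ((k_rope * cosd) + (rotate_half(k_rope) * sind)) * scaling
assert np.allclose(k_wrong, k_ref * scaling)
```

The two assertions mirror the derivation: rerotating without extra scaling recovers the directly rotated reference, while applying the scaling again leaves an extra factor of attention_scaling.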

@giulio98
Contributor Author

giulio98 commented Jul 7, 2025

Hi @alessiodevoto, thanks! I thought the same. Maybe to be sure we should extend the test in tests/presses/test_key_rerotation_press_rope.py to also test other RoPE variants (like YaRN).

The test should:

  1. create random unrotated_keys of length q_len (e.g., context length 8).
  2. define selected_indices of length n_kept (e.g., [0, 1, 5, 7]).

Then compute rerotated_keys in two ways.
First way (reference)

  1. gather from unrotated_keys using selected_indices.
  2. apply rope to get reference_rotated_keys.

Second way (with KeyRerotationPress)

  1. apply rope on the full unrotated_keys to have rotated_keys.
  2. call the KeyRerotationPress rotate_keys function, passing rotated_keys and selected_indices, to obtain obtained_rotated_keys.

Compare reference_rotated_keys with obtained_rotated_keys.
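Under stated assumptions (standalone NumPy RoPE helpers, not the actual kvpress KeyRerotationPress API), the test logic above can be sketched as:

```python
import numpy as np

def rotate_half(x):
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([-x2, x1], axis=-1)

def rope(k, pos, d):
    # Rotate each row of k by its position's angle (standard RoPE, base 10000)
    inv_freq = 1.0 / (10000 ** (np.arange(0, d // 2) / (d // 2)))
    ang = np.concatenate([pos[:, None] * inv_freq] * 2, axis=-1)
    return k * np.cos(ang) + rotate_half(k) * np.sin(ang)

rng = np.random.default_rng(42)
q_len, d = 8, 16
unrotated = rng.standard_normal((q_len, d))
selected = np.array([0, 1, 5, 7])

# First way (reference): gather first, then apply RoPE at the compacted positions
new_pos = np.arange(len(selected), dtype=float)
reference = rope(unrotated[selected], new_pos, d)

# Second way: rotate the full sequence, gather, then rerotate by the position delta
old_pos = np.arange(q_len, dtype=float)
rotated = rope(unrotated, old_pos, d)
delta = new_pos - old_pos[selected]  # shift each kept key back to its compacted position
obtained = rope(rotated[selected], delta, d)

assert np.allclose(reference, obtained)
```

This works because RoPE rotations compose additively: rotating by old_pos and then by (new_pos - old_pos) equals rotating by new_pos directly, with no scaling factor involved in the rerotation step.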

@giulio98
Contributor Author

giulio98 commented Jul 7, 2025

Hi @alessiodevoto I extended the test to include a case with YaRN as well. As expected, the test passes when I don’t multiply by attention_scaling during the rerotation, and it fails when I do multiply.

This confirms our reasoning: since attention_scaling was already included in k_rope, applying it again in the rerotation would effectively square it (i.e., attention_scaling^2), which is not desired.

@alessiodevoto
Collaborator

LGTM! Thanks for adding this!

@alessiodevoto alessiodevoto merged commit 2770562 into NVIDIA:main Jul 7, 2025
3 checks passed
Successfully merging this pull request may close these issues.

Performance Drop with Qwen using Yarn