Fix for Qwen with Yarn #85
Signed-off-by: giulio98 <[email protected]>
Hi @giulio98, thanks for opening this PR! 🙂 I think the issue and the way you are addressing it make sense 👍 I left some comments concerning the rerotation logic and some style fixes, let me know what you think!
@giulio98 we updated main, so please don't forget to pull before your next commit :)
@giulio98 thanks for updating the code! I left a couple of additional comments; after addressing those and the style (
Hi @alessiodevoto, I ran `make style`; however, I cannot see your latest additional comments.
@giulio98 I forgot to publish the review 🤦, here it is.
@alessiodevoto Thanks for the feedback! I renamed the variables following the convention used in other files:
@alessiodevoto I noticed here: https://github.com/huggingface/transformers/blob/b283d52f7f89d9cf3c77cfef233c4cbf700959ff/src/transformers/models/llama/modeling_llama.py#L98 that I have to multiply the cos and sin by
Edit: Multiplying cos and sin by the attention scaling causes a drop in performance again (7.47 F1 score on NarrativeQA). My reasoning is that I should not apply the attention scaling at this stage, because it was already applied during prefill; applying it again amounts to applying the scaling twice. Let me know your thoughts.
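The double-scaling argument can be checked numerically. This is a minimal NumPy sketch, not the kvpress or transformers code: `rotate_half`, `cos_sin` and `apply_rope` are hypothetical helpers mimicking the rotate-half RoPE convention, and the value `s = 0.1 * ln(factor) + 1` is an assumed YaRN-style attention factor (any s ≠ 1 shows the same effect). Because the rotation itself preserves norms, scaling cos and sin by s scales the rotated key by exactly s, so rotating an already-cached key again with scaled cos/sin leaves a factor of s².

```python
import math

import numpy as np


def rotate_half(x):
    # Map the two halves (x1, x2) of the last dimension to (-x2, x1)
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)


def cos_sin(positions, inv_freq, attention_scaling=1.0):
    # One angle per (position, frequency) pair, duplicated to cover both halves,
    # then multiplied by the attention scaling (hypothetical helper)
    freqs = np.outer(positions, inv_freq)
    emb = np.concatenate([freqs, freqs], axis=-1)
    return np.cos(emb) * attention_scaling, np.sin(emb) * attention_scaling


def apply_rope(x, cos, sin):
    # Rotate-half form of RoPE
    return x * cos + rotate_half(x) * sin


head_dim = 8
inv_freq = 1.0 / (10000 ** (np.arange(0, head_dim, 2) / head_dim))
s = 0.1 * math.log(4.0) + 1.0  # assumed YaRN-style attention factor for factor=4

rng = np.random.default_rng(0)
k = rng.standard_normal((1, head_dim))

# Prefill: the cached key is rotated with already-scaled cos/sin
cos, sin = cos_sin(np.array([5]), inv_freq, s)
k_cached = apply_rope(k, cos, sin)
ratio_once = np.linalg.norm(k_cached) / np.linalg.norm(k)

# Rotating the cached key again with scaled cos/sin compounds the factor
cos2, sin2 = cos_sin(np.array([3]), inv_freq, s)
k_twice = apply_rope(k_cached, cos2, sin2)
ratio_twice = np.linalg.norm(k_twice) / np.linalg.norm(k)

print(f"s = {s:.4f}, once: {ratio_once:.4f}, twice: {ratio_twice:.4f}")
```

The norm ratio is used only because it isolates the scaling: the rotation part never changes the norm, so any deviation from 1 comes from the factor applied to cos and sin.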
Hi @giulio98, nice catch, let's see. When we get the keys, they are already rotated with
Hi @alessiodevoto, thanks! I thought the same; to be sure, maybe we should extend the test in
The test should:
then compute
Second way (with
Compare
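A sketch of the two-way comparison such a test could perform, under stated assumptions: plain NumPy rather than the kvpress test suite, hypothetical helper names (`cos_sin`, `apply_rope`, `rerotate`), an arbitrary set of kept indices standing in for a press's pruning, and the same assumed YaRN-style factor `s`. The first way embeds the kept raw keys directly at their new contiguous positions; the second rotates all keys at their original positions (as in prefill), keeps a subset, and re-rotates them by the position difference. Since RoPE rotations compose additively, the two agree when the re-rotation uses unscaled cos/sin, while re-rotating with scaled cos/sin leaves an extra factor of s.

```python
import math

import numpy as np


def rotate_half(x):
    # Map the two halves (x1, x2) of the last dimension to (-x2, x1)
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)


def cos_sin(positions, inv_freq, attention_scaling=1.0):
    # Hypothetical helper: per-position angles, optionally scaled
    freqs = np.outer(positions, inv_freq)
    emb = np.concatenate([freqs, freqs], axis=-1)
    return np.cos(emb) * attention_scaling, np.sin(emb) * attention_scaling


def apply_rope(x, cos, sin):
    return x * cos + rotate_half(x) * sin


def rerotate(keys, old_pos, new_pos, inv_freq, attention_scaling=1.0):
    # Rotate cached keys by (new_pos - old_pos): R(a) R(b) = R(a + b)
    cos, sin = cos_sin(new_pos - old_pos, inv_freq, attention_scaling)
    return apply_rope(keys, cos, sin)


head_dim, seq_len = 8, 6
inv_freq = 1.0 / (10000 ** (np.arange(0, head_dim, 2) / head_dim))
s = 0.1 * math.log(4.0) + 1.0  # assumed YaRN-style attention factor
keys = np.random.default_rng(0).standard_normal((seq_len, head_dim))
kept = np.array([0, 2, 5])      # hypothetical indices surviving a press
new_pos = np.arange(len(kept))  # contiguous positions after pruning

# First way: embed the kept raw keys directly at their new positions
cos, sin = cos_sin(new_pos, inv_freq, s)
expected = apply_rope(keys[kept], cos, sin)

# Second way: prefill-style rotation at original positions, prune, then re-rotate
cos, sin = cos_sin(np.arange(seq_len), inv_freq, s)
cached = apply_rope(keys, cos, sin)
rerotated = rerotate(cached[kept], kept, new_pos, inv_freq)  # unscaled cos/sin
double_scaled = rerotate(cached[kept], kept, new_pos, inv_freq, s)

print("unscaled rerotation matches:", np.allclose(rerotated, expected))
print("scaled rerotation matches:  ", np.allclose(double_scaled, expected))
```

Re-rotating by `new_pos - old_pos` in one step is a design choice for brevity; undoing the old rotation and then applying the new one would lead to the same conclusion about where the scaling belongs.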
Hi @alessiodevoto, I extended the test to include a YaRN case as well. As expected, the test passes when I don't multiply by
This confirms our reasoning: since
LGTM! Thanks for adding this!
PR description
Fixes #84
Checklist
- `git commit -s`
- `mypress_press.py` is in the `presses` directory
- `MyPress` is in `__init__.py`
- `README.md` is updated with a 1 liner about the new press in the Available presses section
- `default_presses` list in `tests/default_presses.py`