fix: remove unnecessary movement of eval logits to cpu #2824
base: main
Conversation
Walkthrough
The update removes the explicit assignment of the `eval_accumulation_steps` parameter in the training arguments within the causal builder module. This parameter, previously set to mirror `gradient_accumulation_steps`, is no longer included during the construction of training arguments.
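For concreteness, a minimal before/after sketch of the kind of change described; `build_training_args` and the `cfg` fields are hypothetical names for illustration, not the repository's actual code:

```python
from transformers import TrainingArguments

def build_training_args(cfg):
    # Hypothetical builder: constructs TrainingArguments from a config object.
    return TrainingArguments(
        output_dir=cfg.output_dir,
        per_device_train_batch_size=cfg.micro_batch_size,
        per_device_eval_batch_size=cfg.eval_batch_size,
        gradient_accumulation_steps=cfg.gradient_accumulation_steps,
        # Removed by this PR: previously mirrored gradient accumulation.
        # eval_accumulation_steps=cfg.gradient_accumulation_steps,
    )
```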
Codecov Report
All modified and coverable lines are covered by tests ✅
From reading the trainer evaluation loop, I don't think we need this config. The VRAM consumed by just an eval pass is very small compared to training, so this config would just cause unnecessary gpu->cpu movement. However, it should not be the cause of increased VRAM usage as initially stated.
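To illustrate the point, here is a rough, simplified sketch of the accumulation behavior; it is not the actual transformers implementation (the real `Trainer.evaluation_loop` also handles gathering, padding, losses, and labels):

```python
import torch

def evaluation_loop_sketch(model, eval_dataloader, eval_accumulation_steps=None):
    # Simplified illustration only: with eval_accumulation_steps set, accumulated
    # logits are flushed to the CPU every N steps, adding device-to-host copies
    # during eval; without it, logits stay on the accelerator until the end.
    host_logits = []    # chunks already moved to CPU
    device_logits = []  # chunks still on the accelerator
    for step, batch in enumerate(eval_dataloader):
        with torch.no_grad():
            device_logits.append(model(**batch).logits)
        if eval_accumulation_steps and (step + 1) % eval_accumulation_steps == 0:
            host_logits.extend(t.cpu() for t in device_logits)
            device_logits = []
    # Without the setting, everything is moved off the device only here.
    host_logits.extend(t.cpu() for t in device_logits)
    return host_logits  # downstream code would concatenate / compute metrics
```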
Description
Edit: See #2824 (comment)
There have been reports about eval taking more VRAM than training. I suspect it's this config, which we set to match gradient accumulation. Would love some tests for this!
This config does not seem to run mini-batches the way we think it does:
https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.eval_accumulation_steps
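Per the linked docs, `eval_accumulation_steps` does not change the eval mini-batch size (that is governed by `per_device_eval_batch_size`); it only controls how often accumulated predictions are offloaded to the CPU. A small illustrative config (values are arbitrary):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_eval_batch_size=8,  # actual eval mini-batch size per device
    eval_accumulation_steps=4,     # only how often accumulated predictions are
                                   # moved from the accelerator to the CPU
)
```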