
Created GRPOTrainerWithEval subclass for different evaluation reward functions #9


Open · wants to merge 3 commits into base: working-grpo-2025-03-12

Conversation

jamesbraza

This PR creates a GRPOTrainer subclass GRPOTrainerWithEval that adds support for optional eval_reward_processing_classes.

It should be backwards compatible with GRPOTrainer.

The only caveat here is that I haven't comprehensively thought through args.reward_weights.


@Copilot Copilot AI left a comment


PR Overview

This PR introduces a new subclass, GRPOTrainerWithEval, which extends the GRPOTrainer functionality to support evaluation reward functions while maintaining backward compatibility.

  • New GRPOTrainerWithEval subclass accepts separate evaluation reward functions and processing classes.
  • Configuration handling is unified through the use of an instance attribute (_model_init_kwargs) and a dedicated helper method (_make_reward_processing_classes).
  • The diff adds strict checking in zip calls to enforce matching lengths of reward functions and processing classes.

Reviewed Changes

File: trl/trainer/grpo_trainer.py — introduces GRPOTrainerWithEval and refactors reward processing and model-init kwargs handling

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

@jamesbraza jamesbraza changed the base branch from working-grpo-2025-03-10 to working-grpo-2025-03-11 March 11, 2025 19:22
@jamesbraza jamesbraza force-pushed the working-grpo-2025-03-11 branch from 8a73624 to d9c185a Compare March 11, 2025 23:19
@jamesbraza jamesbraza changed the base branch from working-grpo-2025-03-11 to working-grpo-2025-03-12 March 12, 2025 18:37
@shirinyamani

Have you tested this against the new multi-task reward_func setup?
Because strict=True will not work with the current setup, which allows reward_funcs to return None

@jamesbraza
Author

> Have you tested this against the new multi-task reward_func setup? Because strict=True will not work with the current setup, which allows reward_funcs to return None

Hi @shirinyamani, thanks for the comment. No, we stopped rebasing atop trl's main because we were getting broken too often.

Perhaps if we rebased to pick up newer trl features, splitting out the eval reward functions (what this PR does) would have to change to accommodate this strict=True behavior.

Labels: enhancement (New feature or request)
2 participants