Created GRPOTrainerWithEval subclass for different evaluation reward functions #9
base: working-grpo-2025-03-12
Conversation
PR Overview
This PR introduces a new subclass, GRPOTrainerWithEval, which extends the GRPOTrainer functionality to support evaluation reward functions while maintaining backward compatibility.
- New GRPOTrainerWithEval subclass accepts separate evaluation reward functions and processing classes.
- Configuration handling is unified through the use of an instance attribute (_model_init_kwargs) and a dedicated helper method (_make_reward_processing_classes).
- The diff adds strict checking in zip calls to enforce matching lengths of reward functions and processing classes (see the sketch below).
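As a rough illustration only, not the PR's actual diff, a subclass along these lines could wire eval-specific reward functions into the trainer. The constructor signature, the `eval_reward_funcs` keyword, and the attribute names are assumptions; only `GRPOTrainerWithEval` and `_make_reward_processing_classes` come from the overview above.

```python
# Rough sketch, assuming trl's GRPOTrainer exposes reward_funcs and
# reward_processing_classes attributes; not the actual implementation.
from trl import GRPOTrainer


class GRPOTrainerWithEval(GRPOTrainer):
    def __init__(self, *args, eval_reward_funcs=None,
                 eval_reward_processing_classes=None, **kwargs):
        super().__init__(*args, **kwargs)
        if eval_reward_funcs is None:
            # No eval-specific reward functions: fall back to the training ones,
            # which keeps the subclass backward compatible with GRPOTrainer.
            self.eval_reward_funcs = self.reward_funcs
            self.eval_reward_processing_classes = self.reward_processing_classes
        else:
            if not isinstance(eval_reward_funcs, list):
                eval_reward_funcs = [eval_reward_funcs]
            self.eval_reward_funcs = eval_reward_funcs
            self.eval_reward_processing_classes = self._make_reward_processing_classes(
                eval_reward_funcs, eval_reward_processing_classes
            )

    @staticmethod
    def _make_reward_processing_classes(reward_funcs, reward_processing_classes=None):
        # Pair one processing class (e.g. a tokenizer) with each reward function.
        if reward_processing_classes is None:
            reward_processing_classes = [None] * len(reward_funcs)
        elif not isinstance(reward_processing_classes, list):
            reward_processing_classes = [reward_processing_classes]
        # strict=True raises ValueError when the two lists differ in length,
        # mirroring the stricter zip checks described in the overview.
        return [cls for _, cls in zip(reward_funcs, reward_processing_classes, strict=True)]
```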
Reviewed Changes
| File | Description |
| --- | --- |
| trl/trainer/grpo_trainer.py | Introduces GRPOTrainerWithEval and refactors reward processing and model init kwargs |
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
Have you tested this against the new multi-task reward_func setup?
Hi @shirinyamani, thanks for the comment. No, we stopped rebasing atop … Perhaps if we rebased for newer features in …
This PR creates a `GRPOTrainer` subclass `GRPOTrainerWithEval` that adds support for optional `eval_reward_processing_classes`. It should be backwards compatible with `GRPOTrainer`. The only caveat here is I didn't comprehensively think about `args.reward_weights`.
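To make the intended interface concrete, here is a hypothetical usage sketch. The reward functions, the toy dataset, and the `eval_reward_funcs` keyword are illustrative assumptions rather than part of the PR; it presumes `GRPOTrainerWithEval` is importable from the modified `trl/trainer/grpo_trainer.py`.

```python
# Hypothetical usage, not taken from the PR; keyword names mirror the
# GRPOTrainer API, and eval_reward_funcs is an assumed argument.
from datasets import Dataset
from trl import GRPOConfig


def train_reward(completions, **kwargs):
    # Toy training reward: favor completions that mention "answer".
    return [1.0 if "answer" in c else 0.0 for c in completions]


def eval_reward(completions, **kwargs):
    # Toy evaluation-only reward: favor completions that end with a period.
    return [float(c.strip().endswith(".")) for c in completions]


dataset = Dataset.from_dict({"prompt": ["What is 2 + 2?", "Name a color."]})

trainer = GRPOTrainerWithEval(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=train_reward,
    eval_reward_funcs=eval_reward,  # assumed keyword for eval-time rewards
    args=GRPOConfig(output_dir="grpo-eval-demo"),
    train_dataset=dataset,
    eval_dataset=dataset,
)
trainer.train()
```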