Calculate advantage in Explorer#148
Conversation
There was a problem hiding this comment.
Summary of Changes
Hello @pan-x-c, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've been working on enhancing how advantages are calculated within the Explorer component. My main focus has been on introducing a new GroupAdvantageStrategy that allows for advantage calculations on grouped experiences. To support this, I've refactored existing advantage functions (GRPO and OPMD) to be compatible with this new strategy and updated the core AddStrategy interface to provide more detailed metrics. Additionally, I've made a small but important improvement to how vLLM versions are handled to increase robustness.
Highlights
- New Group Advantage Strategy: I've introduced a new abstract base class,
GroupAdvantageStrategy, which provides a structured way to calculate advantages for groups of experiences. This class defines abstract methods for grouping experiences and calculating group-specific advantages. - Concrete Implementations for Grouped Advantages: I've added two concrete implementations of
GroupAdvantageStrategy:GRPOAddStrategyandOPMDAddStrategy. These new strategies leverage the existingGRPOAdvantageFnandOPMDAdvantageFnto calculate advantages within grouped experiences. - Refactored Advantage Functions: I've refactored the
GRPOAdvantageFnandOPMDAdvantageFnto be more flexible. They now accept aList[Experience]as input and return aTuplecontaining the modified experiences and a dictionary of metrics, making them compatible with the newAddStrategyinterface. - Updated AddStrategy Interface: The
AddStrategyinterface has been updated to include metrics in its return type. Theaddmethod now returns aTupleof the number of experiences added and a dictionary of metrics, providing more comprehensive feedback on the addition process. - Improved vLLM Version Handling: I've improved the handling of vLLM versions by introducing a new
get_vllm_versionfunction. This function robustly parses the vLLM version, including a fallback for self-compiled versions that might not have a standard version string, ensuring better compatibility and error handling.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
This pull request introduces a GroupAdvantageStrategy to calculate advantages on the Explorer side, along with concrete implementations for GRPO and OPMD. It also refactors existing advantage functions to be compatible with the new strategy. The changes are a good step towards unifying advantage calculations. I've found a few critical issues and a high-severity issue that should be addressed.
There was a problem hiding this comment.
Pull Request Overview
This PR refactors the advantage calculation system to move advantage computation from the trainer to the explorer side. The key changes involve introducing new GroupAdvantageAddStrategy classes and simplifying the configuration by removing algorithm-specific advantage functions.
- Introduces
GroupAdvantageAddStrategywith concrete implementations for GRPO and OPMD algorithms - Moves advantage calculation logic from trainer to explorer's add strategy
- Consolidates utility functions by moving
to_data_protoandcompute_data_metricsfrom converter to utils module
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| trinity/trainer/verl_trainer.py | Updates imports and removes advantage-related conditional logic |
| trinity/trainer/verl/utils.py | New utility module containing data conversion and metrics computation functions |
| trinity/trainer/verl/converter.py | Deleted file with functions moved to utils module |
| trinity/trainer/trainer.py | Removes monitor close call in shutdown method |
| trinity/explorer/explorer.py | Updates add strategy call to handle returned metrics |
| trinity/common/models/api/vllm_patch.py | Refactors VLLM version handling with error handling for invalid versions |
| trinity/cli/launcher.py | Updates launcher output messages formatting |
| trinity/algorithm/algorithm.py | Changes algorithm defaults to use new add_strategy configuration |
| trinity/algorithm/add_strategy/add_strategy.py | Major refactor introducing GroupAdvantageStrategy and concrete implementations |
| tests/trainer/trainer_test.py | Updates test configuration to use new add_strategy parameter |
|
/unittest-all |
|
/unittest-all |
Summary
Tests
Github Test Reporter by CTRF 💚 |
yanxi-chen
left a comment
There was a problem hiding this comment.
Please see the inline comments, otherwise lgtm
Description
GroupAdvantageAddStrategyto calculate advantages in Explorer sideAdvantageFnto be compatible withAddStrategy.AdvantageFnChecklist
Please check the following items before code is ready to be reviewed.