Renaming and Organization of RL algorithms in preparation for Development #83

Merged: 37 commits merged into main on Jun 16, 2025

Conversation

@jdchang1 commented Jun 5, 2025

I apologize in advance, since this shuffling/renaming will require a bit of updating of YAMLs and dependencies. My main goal is to make the organization of the code more semantically meaningful and, hopefully, easier to onboard onto. Here is a summary of the main changes:

  • Removed algorithm-specific references such as DPO and PPO where we use these as general-purpose abstractions over families of algorithms
  • Moved algorithm code into an algorithms directory
  • Shuffled some files into more meaningful directories, e.g., buffers.py => data rather than keeping it only in online
  • Unified the offline, online, and reward_modeling directories to use model_methods.py for implementing forward and loss variants
  • Split online_rl_loss into policy_loss and critic_loss
  • Added OnPolicyEnum and Algorithm_Type enums for loss_type, similar to PairwiseOfflineEnum
  • Made online_rl_loss just return a return_dict, matching the other algorithm pipelines (see the sketch below)
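
For concreteness, here is a minimal sketch of what the split loss interface and the new enum could look like. Only the names OnPolicyEnum, policy_loss, critic_loss, online_rl_loss, loss_type, and return_dict come from the bullets above; the enum members, signatures, and loss formulas below are illustrative assumptions, not the actual repository code.

```python
from enum import Enum

import torch
import torch.nn.functional as F


class OnPolicyEnum(Enum):
    """Hypothetical loss_type enum for the online RL family (members are assumptions)."""
    PPO = 'ppo'
    GRPO = 'grpo'


def policy_loss(
    log_probs: torch.Tensor,
    old_log_probs: torch.Tensor,
    advantages: torch.Tensor,
    clip_range: float = 0.2,
) -> torch.Tensor:
    """Clipped surrogate policy objective (illustrative only)."""
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_range, 1 + clip_range) * advantages
    return -torch.min(unclipped, clipped).mean()


def critic_loss(values: torch.Tensor, returns: torch.Tensor) -> torch.Tensor:
    """Value-function regression loss (illustrative only)."""
    return F.mse_loss(values, returns)


def online_rl_loss(batch: dict, loss_type: OnPolicyEnum) -> dict:
    """Compute the named losses and return them in a return_dict,
    mirroring the other algorithm pipelines."""
    return_dict = {
        'policy_loss': policy_loss(
            batch['log_probs'],
            batch['old_log_probs'],
            batch['advantages'],
        ),
    }
    # Group-relative methods like GRPO typically have no learned critic,
    # so only add the critic term for PPO-style losses in this sketch.
    if loss_type == OnPolicyEnum.PPO:
        return_dict['critic_loss'] = critic_loss(batch['values'], batch['returns'])
    return_dict['total'] = sum(return_dict.values())
    return return_dict
```

With this shape, callers can log each named loss separately and backprop through return_dict['total'], which is what makes the online pipeline line up with the offline and reward-modeling ones.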

MCLI run names for each pipeline:

Reward Training: reward-reorg-aWY7m3
Offline Training: rebel-reorg-nboVUO
Online Training: grpo-reorg-fK2HDm

MLFlow Link

@jdchang1 marked this pull request as ready for review June 5, 2025 18:16

@gupta-abhay left a comment

Looks good. It would be good to run a test for DPO/GRPO with main and with this branch (with the appropriate YAML changes) to show things still look good.


@bcui-db left a comment

LGTM! Thank you! As mentioned above, let's just test RMs, DPO, and an online RL training flow to make sure things still work.

@bcui-db left a comment

Discussed offline, and overall LGTM! Thank you for this! I'd also want @gupta-abhay or @dakinggg to stamp just to be sure, but from my end LGTM.

@gupta-abhay commented Jun 16, 2025

LGTM! Please update the YAMLs here (and let's figure out a path away from the backward-compatible stuff ASAP, unless TAO production is hinging on that old setup).

Huge thanks for doing this :)

@jdchang1 merged commit 6bcfeba into main Jun 16, 2025
4 checks passed