Eladlev/R1-reason-action

 
 

Reason-Action Online RL

The goal of this project is to demonstrate the effectiveness of reason-action online RL training compared with reasoning-only training.

Background

The DeepSeek R1 paper demonstrates the effectiveness of training with a very basic CoT template prompt plus online RL against a simple reward function. Recent work shows that this training paradigm also works well for small models trained on a small amount of data. ReAct prompting has known advantages over simple CoT. The hypothesis is that applying the same online RL recipe with ReAct will outperform a reasoning-only model, and even a reasoning-only model combined with ReAct prompting at inference time.
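To make the idea of a "simple reward function" concrete for the reason-action setting, here is a minimal sketch of a rule-based reward. It is an illustration only, not this repository's implementation: the tag names (`<think>`, `<action>`, `<answer>`), the function name `reason_action_reward`, and the score values are all assumptions chosen for the example.

```python
import re

# Hypothetical tag names; the project's actual prompt template may differ.
STEP_PATTERN = re.compile(
    r"<think>(.*?)</think>\s*<action>(.*?)</action>", re.DOTALL
)
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def reason_action_reward(completion: str, reference: str) -> float:
    """Return a scalar reward for one sampled completion.

    Illustrative scoring (assumed values):
      1.0  well-formed and the final answer matches the reference
      0.1  well-formed but the final answer is wrong
      0.0  no reason-action step or no final answer
    """
    steps = STEP_PATTERN.findall(completion)
    answer = ANSWER_PATTERN.search(completion)

    # Require at least one reason-action pair and a final answer.
    if not steps or answer is None:
        return 0.0

    # Exact string match after trimming whitespace.
    if answer.group(1).strip() == reference.strip():
        return 1.0
    return 0.1


if __name__ == "__main__":
    sample = "<think>Add the numbers.</think><action>calc(2+2)</action><answer>4</answer>"
    print(reason_action_reward(sample, "4"))
```

In an online RL loop, a function like this would score each sampled completion, and those scores would drive the policy update. A reasoning-only baseline could use the same check with the `<action>` requirement dropped.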

The code is based on simpleRL-reason.

About

Training according to the DeepSeek R1 recipe with a reason-action model
