- [2025/06/27] 🎉 Code released
- [2025/06/13] 🎉 HomePage released
This repository is based on verl commit 81a15ed7 (2025/04/03) and requires FSDP with vLLM>=0.8.2. Follow the verl installation instructions for setup, then install Math-Verify, which is used as the verifier:
pip install math-verify
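To confirm that the installed vLLM meets the stated minimum before launching a run, a small preflight check can help. The following is a minimal sketch using only the Python standard library; the helper names (`release_tuple`, `meets_minimum`, `check_vllm`) are illustrative and not part of this repository, and the parser only compares the leading numeric release segment, so pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib import metadata

MIN_VLLM = (0, 8, 2)  # minimum version stated in this README


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release part of a version string as ints.

    Suffixes such as ".post1", "rc1" or "+cu118" are ignored (assumption:
    comparing the numeric release segment is enough for a quick check).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version: str, minimum: tuple = MIN_VLLM) -> bool:
    """True if the numeric release of `version` is at least `minimum`."""
    return release_tuple(version) >= minimum


def check_vllm() -> str:
    """Describe whether the installed vLLM satisfies the minimum version."""
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vllm is not installed"
    if meets_minimum(installed):
        return f"vllm {installed} satisfies >=0.8.2"
    return f"vllm {installed} is older than 0.8.2"


if __name__ == "__main__":
    print(check_vllm())
```

Running the file prints a one-line status for the current environment; it does not check the verl commit or the FSDP setup.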
We provide training scripts for PAG and baseline methods including SCoRe and Direct_MultiTurn:
- PAG:
bash quick_start/qwen1p5b_pag.sh
- SCoRe:
bash quick_start/qwen1p5b_SCoRe.sh
- Direct_MultiTurn:
bash quick_start/qwen1p5b_multiturn.sh
The evaluation pipeline follows the same procedure as training; see quick_start/evaluation.sh for details.
For debugging purposes, we provide two multi-turn test scripts:
tests/multi_turn/run_vllm_spmd_pag_rollout.py
tests/multi_turn/run_vllm_spmd_direct_multiturn.py
If you encounter CUDA errors during debugging, try commenting out self.inference_engine.sleep(level=1) in:
verl/workers/rollout/vllm_rollout/vllm_pag_rollout_spmd.py
verl/workers/rollout/vllm_rollout/vllm_multiturn_rollout_spmd.py
Note that this workaround is intended for debugging only.
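As an alternative to editing the rollout files by hand, the call could be gated behind an environment variable so it is easy to re-enable. The following is only a sketch of that idea: the `PAG_DEBUG_SKIP_SLEEP` variable, the `maybe_sleep` helper, and the `StubEngine` stand-in are all hypothetical and do not exist in this repository; only the `sleep(level=1)` call itself comes from the text above.

```python
import os


class StubEngine:
    """Stand-in for the real inference engine; records calls to sleep()."""

    def __init__(self):
        self.sleep_calls = []

    def sleep(self, level=1):
        self.sleep_calls.append(level)


def maybe_sleep(engine, env=os.environ):
    """Call engine.sleep(level=1) unless the hypothetical debug flag is set.

    Returns True if sleep() was called and False if it was skipped.
    """
    if env.get("PAG_DEBUG_SKIP_SLEEP") == "1":  # hypothetical variable name
        return False
    engine.sleep(level=1)
    return True


if __name__ == "__main__":
    engine = StubEngine()
    called = maybe_sleep(engine)
    print("sleep called" if called else "sleep skipped")
```

With this pattern, the default behavior is unchanged, and the call is skipped only when the flag is explicitly set for a debugging session.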
If you find this project helpful, please cite:
@article{jiang2025pag,
title={PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier},
author={Jiang, Yuhua and Xiong, Yuwen and Yuan, Yufeng and Xin, Chao and Xu, Wenyuan and Yue, Yu and Zhao, Qianchuan and Yan, Lin},
journal={arXiv preprint arXiv:2506.10406},
year={2025}
}