Jackory/Policy-As-GenVerifier

PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier

Paper HomePage

News

  • [2025/06/27] 🎉 Code released
  • [2025/06/13] 🎉 HomePage released

Installation

This repository is based on verl commit 81a15ed7 (2025/04/03) and requires FSDP with vLLM>=0.8.2. Follow the verl installation instructions for setup, then install Math-Verify, which is used as the verifier: pip install math-verify
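Before launching a run, it can help to confirm that the installed vLLM satisfies the vLLM>=0.8.2 requirement. The following is a minimal sketch using only the Python standard library; the helper names (`version_tuple`, `meets_minimum`, `check_vllm`) are illustrative and not part of this repository, and the comparison ignores pre-release suffixes such as `rc1`.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '0.8.2' or '0.8.3.post1' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.8.2") -> bool:
    """Return True when the installed version is at least the minimum.

    Assumption: suffixes such as 'rc1' or '+cu121' are ignored.
    """
    return version_tuple(installed) >= version_tuple(minimum)


def check_vllm() -> str:
    """Report whether the locally installed vLLM satisfies vLLM>=0.8.2."""
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vllm is not installed"
    status = "ok" if meets_minimum(installed) else "too old"
    return f"vllm {installed}: {status}"


if __name__ == "__main__":
    print(check_vllm())
```

Running the file as a script prints a one-line status for the current environment.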

Quick Start

We provide training scripts for PAG and baseline methods including SCoRe and Direct_MultiTurn:

  • PAG: bash quick_start/qwen1p5b_pag.sh
  • SCoRe: bash quick_start/qwen1p5b_SCoRe.sh
  • Direct_MultiTurn: bash quick_start/qwen1p5b_multiturn.sh
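When switching between the three methods often, a small launcher can keep the script paths in one place. This is only a sketch: the paths are the ones listed above, while `QUICK_START_SCRIPTS`, `build_command`, and `launch` are hypothetical helpers that the repository does not provide. It assumes it is run from the repository root.

```python
import subprocess

# Script paths exactly as listed in Quick Start.
QUICK_START_SCRIPTS = {
    "PAG": "quick_start/qwen1p5b_pag.sh",
    "SCoRe": "quick_start/qwen1p5b_SCoRe.sh",
    "Direct_MultiTurn": "quick_start/qwen1p5b_multiturn.sh",
}


def build_command(method: str) -> list:
    """Return the argv list that runs the training script for a method."""
    if method not in QUICK_START_SCRIPTS:
        known = ", ".join(sorted(QUICK_START_SCRIPTS))
        raise ValueError(f"unknown method {method!r}; expected one of: {known}")
    return ["bash", QUICK_START_SCRIPTS[method]]


def launch(method: str, dry_run: bool = True) -> int:
    """Print the command (dry run) or execute it and return its exit code."""
    cmd = build_command(method)
    if dry_run:
        print(" ".join(cmd))
        return 0
    # check=False: surface the script's exit code instead of raising.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    launch("PAG")  # dry run by default; pass dry_run=False to start training
```

The dry-run default is a deliberate choice here, so that importing or testing the helper never starts a training job by accident.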

The evaluation pipeline follows the same procedure as training; see quick_start/evaluation.sh for details.

For debugging purposes, we provide two multi-turn test scripts:

  • tests/multi_turn/run_vllm_spmd_pag_rollout.py
  • tests/multi_turn/run_vllm_spmd_direct_multiturn.py

If you encounter CUDA errors during debugging, try commenting out self.inference_engine.sleep(level=1) in:

  • verl/workers/rollout/vllm_rollout/vllm_pag_rollout_spmd.py
  • verl/workers/rollout/vllm_rollout/vllm_multiturn_rollout_spmd.py

Note that this is only for debugging purposes.
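As an alternative to editing the files by hand, the call could be guarded by an environment variable so the debugging workaround is easy to switch off again. The sketch below is an assumption about how one might do this, not how the repository does it: `PAG_DEBUG_SKIP_SLEEP`, `skip_sleep_requested`, `maybe_sleep`, and `FakeEngine` are all hypothetical names, and `FakeEngine` merely stands in for the real inference engine.

```python
import os


def skip_sleep_requested(env=None) -> bool:
    """True when the hypothetical PAG_DEBUG_SKIP_SLEEP variable is truthy."""
    env = os.environ if env is None else env
    return env.get("PAG_DEBUG_SKIP_SLEEP", "0").lower() in {"1", "true", "yes"}


class FakeEngine:
    """Stand-in for the inference engine; it only records sleep calls."""

    def __init__(self):
        self.sleep_calls = []

    def sleep(self, level=1):
        self.sleep_calls.append(level)


def maybe_sleep(engine, env=None) -> bool:
    """Call engine.sleep(level=1) unless the debug flag asks to skip it.

    Returns True if sleep was called, False if it was skipped.
    """
    if skip_sleep_requested(env):
        return False
    engine.sleep(level=1)
    return True


if __name__ == "__main__":
    engine = FakeEngine()
    maybe_sleep(engine, env={})                             # sleeps
    maybe_sleep(engine, env={"PAG_DEBUG_SKIP_SLEEP": "1"})  # skipped
    print(engine.sleep_calls)
```

In the rollout files, the equivalent change would wrap the existing `self.inference_engine.sleep(level=1)` line in such a check; as with commenting the line out, this is meant for debugging only.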

Citation

If you find this project helpful, please cite:

@article{jiang2025pag,
  title={PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier},
  author={Jiang, Yuhua and Xiong, Yuwen and Yuan, Yufeng and Xin, Chao and Xu, Wenyuan and Yue, Yu and Zhao, Qianchuan and Yan, Lin},
  journal={arXiv preprint arXiv:2506.10406},
  year={2025}
}
