PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier

News

[2025/06/27] 🎉 Code released
[2025/06/13] 🎉 Homepage released

Installation

This repository is based on verl commit 81a15ed7 (2025/04/03) and requires FSDP with vLLM>=0.8.2. Please refer to the verl installation guide for setup instructions. Additionally, install Math-Verify, which serves as the answer verifier:

pip install math-verify
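
Math-Verify parses model answers and checks them for mathematical equivalence against the reference (its documented entry points are `math_verify.parse` and `math_verify.verify`). To illustrate the kind of binary outcome reward such a rule-based verifier yields, here is a simplified stdlib-only stand-in. It is NOT the Math-Verify API and not this repository's reward code: the function names `extract_last_boxed`, `normalize`, and `outcome_reward` are hypothetical, and it only does string matching after light normalization.

```python
import re
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, honoring nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len("\\boxed{"):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


def normalize(ans: str) -> str:
    """Light, purely cosmetic normalization (no symbolic simplification)."""
    ans = re.sub(r"\s+", "", ans)
    for token in ("\\left", "\\right", "\\!", "$"):
        ans = ans.replace(token, "")
    return ans


def outcome_reward(response: str, gold: str) -> float:
    """Binary reward: 1.0 if the last boxed answer matches `gold` after normalization."""
    pred = extract_last_boxed(response)
    if pred is None:
        return 0.0
    return 1.0 if normalize(pred) == normalize(gold) else 0.0
```

For example, a response ending in `\boxed{\frac{1}{2}}` scores 1.0 against the gold answer `\frac{1}{2}`, while `\boxed{0.5}` scores 0.0. Recognizing such equivalent forms is exactly what a symbolic verifier like Math-Verify adds over string matching.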

Quick Start

We provide training scripts for PAG and for the baseline methods SCoRe and Direct_MultiTurn:

PAG: bash quick_start/qwen1p5b_pag.sh
SCoRe: bash quick_start/qwen1p5b_SCoRe.sh
Direct_MultiTurn: bash quick_start/qwen1p5b_multiturn.sh
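
As we read the PAG paper, the key difference from a direct multi-turn baseline is that PAG has the policy verify its own answer and revise only when that self-verification judges the answer wrong, whereas the baseline revises unconditionally. The sketch below illustrates only this control flow with toy callables. `AnswerFn`, `VerifyFn`, `direct_multiturn_rollout`, `pag_style_rollout`, and the toy policy are hypothetical; the real rollouts (e.g. verl/workers/rollout/vllm_rollout/vllm_pag_rollout_spmd.py) batch prompts through vLLM and differ in detail, and SCoRe is not modeled here.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces; in PAG both roles are played by the same LLM policy.
AnswerFn = Callable[[str, List[str]], str]  # (question, previous answers) -> new answer
VerifyFn = Callable[[str, str], bool]       # (question, answer) -> policy judges it correct


def direct_multiturn_rollout(answer: AnswerFn, question: str, max_turns: int = 2) -> List[str]:
    """Baseline control flow: produce a new (revised) answer at every turn unconditionally."""
    answers: List[str] = []
    for _ in range(max_turns):
        answers.append(answer(question, answers))
    return answers


def pag_style_rollout(
    answer: AnswerFn, verify: VerifyFn, question: str, max_turns: int = 2
) -> List[Tuple[str, bool]]:
    """Verify-then-revise: stop as soon as the policy's own verification accepts its answer."""
    answers: List[str] = []
    trace: List[Tuple[str, bool]] = []
    for _ in range(max_turns):
        candidate = answer(question, answers)
        answers.append(candidate)
        judged_correct = verify(question, candidate)
        trace.append((candidate, judged_correct))
        if judged_correct:
            break  # selective revision: no further turns once the answer is accepted
    return trace


# Toy stand-ins for the policy: wrong on the first try, right after one revision.
def toy_answer(question: str, previous: List[str]) -> str:
    return "4" if previous else "5"


def toy_verify(question: str, candidate: str) -> bool:
    return candidate == "4"


print(direct_multiturn_rollout(toy_answer, "2+2?", max_turns=3))  # ['5', '4', '4']
print(pag_style_rollout(toy_answer, toy_verify, "2+2?", max_turns=3))  # [('5', False), ('4', True)]
```

The early exit is the point of the design: once the policy's own check accepts an answer, no further revision turn is spent, so a correct first answer cannot be revised into a wrong one.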

The evaluation pipeline follows the same procedure as training; see quick_start/evaluation.sh for details.
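
The metrics actually reported are defined by quick_start/evaluation.sh. For orientation, self-correction work such as SCoRe commonly reports first-turn accuracy, final accuracy, and the fractions of problems that revision flips from incorrect to correct and from correct to incorrect. A minimal sketch, assuming you already have per-problem correctness booleans (e.g. from the verifier) for the first and final turns; the function name and metric keys are hypothetical.

```python
from typing import Dict, Sequence


def self_correction_metrics(first: Sequence[bool], final: Sequence[bool]) -> Dict[str, float]:
    """Summarize per-problem correctness at the first turn and at the final turn."""
    if len(first) != len(final) or len(first) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    n = len(first)
    pairs = list(zip(first, final))
    return {
        "acc@t1": sum(first) / n,
        "acc@final": sum(final) / n,
        "delta": (sum(final) - sum(first)) / n,
        "delta_i2c": sum(1 for a, b in pairs if not a and b) / n,  # fixed by revision
        "delta_c2i": sum(1 for a, b in pairs if a and not b) / n,  # broken by revision
    }


print(self_correction_metrics([True, False, False, True], [True, True, False, False]))
# {'acc@t1': 0.5, 'acc@final': 0.5, 'delta': 0.0, 'delta_i2c': 0.25, 'delta_c2i': 0.25}
```

Note that `delta` always equals `delta_i2c - delta_c2i`, so an unchanged final accuracy can hide revisions that fix some problems while breaking others.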

For debugging purposes, we provide two multi-turn test scripts:

tests/multi_turn/run_vllm_spmd_pag_rollout.py
tests/multi_turn/run_vllm_spmd_direct_multiturn.py

If you encounter CUDA errors during debugging, try commenting out self.inference_engine.sleep(level=1) in:

verl/workers/rollout/vllm_rollout/vllm_pag_rollout_spmd.py
verl/workers/rollout/vllm_rollout/vllm_multiturn_rollout_spmd.py

Note that this workaround is intended for debugging only.
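
In vLLM, `LLM.sleep(level=1)` (available when the engine is created with `enable_sleep_mode=True`) offloads the model weights to CPU memory and discards the KV cache; in these rollout classes it presumably frees GPU memory for the training side, so skipping it raises GPU memory usage. As an alternative to commenting the line out, one could gate the call behind an environment variable. The sketch below is illustrative only: `PAG_DISABLE_VLLM_SLEEP`, `maybe_sleep`, and `FakeEngine` are hypothetical and do not exist in this repository.

```python
import os


class FakeEngine:
    """Stand-in for a vLLM `LLM` built with enable_sleep_mode=True (records calls only)."""

    def __init__(self) -> None:
        self.sleep_calls = []

    def sleep(self, level: int = 1) -> None:
        self.sleep_calls.append(level)


def maybe_sleep(engine, level: int = 1) -> bool:
    """Call engine.sleep(level) unless the hypothetical debug flag disables it."""
    if os.environ.get("PAG_DISABLE_VLLM_SLEEP", "0") == "1":
        return False  # debugging: keep the engine awake (uses more GPU memory)
    engine.sleep(level=level)
    return True
```

In the rollout code, `self.inference_engine.sleep(level=1)` would then become `maybe_sleep(self.inference_engine, level=1)`, so regular training keeps the default behavior unless the flag is set explicitly.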

Citation

If you find this project helpful, please cite:

@article{jiang2025pag,
  title={PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier},
  author={Jiang, Yuhua and Xiong, Yuwen and Yuan, Yufeng and Xin, Chao and Xu, Wenyuan and Yue, Yu and Zhao, Qianchuan and Yan, Lin},
  journal={arXiv preprint arXiv:2506.10406},
  year={2025}
}

