SPA-RL-Agent

arXiv Paper · Homepage

This repository contains the code for our paper "SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution".

This paper introduces Stepwise Progress Attribution (SPA), a novel reward redistribution framework that provides fine-grained intermediate rewards by decomposing delayed rewards into incremental, per-step contributions, enabling more effective reinforcement learning for complex, multi-step agent tasks (WebShop, ALFWorld, and VirtualHome).

✨Key advantages include:

  1. Fine-grained Reward Redistribution: Effectively decomposes delayed rewards into intermediate rewards, reflecting incremental progress toward task completion (illustrated in the toy sketch below).
  2. Effective Multi-turn RL Training: Utilizes these intermediate rewards in PPO, achieving outstanding performance on complex long-horizon tasks.
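
The core idea can be pictured with a small toy sketch (purely illustrative, not the repository's implementation): a progress estimator assigns each step a contribution score, and the delayed final reward is redistributed over the steps in proportion to those scores, so every action receives an intermediate reward during PPO training.

# Toy sketch of reward redistribution; progress_scores here are hypothetical
# estimator outputs, not produced by any script in this repository.
from typing import List

def redistribute_reward(progress_scores: List[float], final_reward: float) -> List[float]:
    """Scale per-step progress contributions so they sum to the delayed final reward."""
    total = sum(progress_scores)
    if total == 0:
        # No estimated progress: fall back to crediting only the final step.
        return [0.0] * (len(progress_scores) - 1) + [final_reward]
    return [final_reward * p / total for p in progress_scores]

# A 4-step trajectory whose episode-level reward is 1.0 (task completed):
step_progress = [0.1, 0.4, 0.2, 0.3]
print(redistribute_reward(step_progress, final_reward=1.0))  # [0.1, 0.4, 0.2, 0.3]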

🎉News

  • [2025.05.28] 🔥 Released our paper on arXiv. See here.
  • [2025.05.26] 🚀 SPA-RL-Agent Repo launched!

📝Contents

  • ⚙️ Setup
  • ⛏️ Usage
  • 🌹 Acknowledgement
  • 🙏 Citation

⚙️ Setup

# Create virtual environment for agentic evaluation
conda create -n SPA python=3.9
conda activate SPA
pip install -r requirements.txt

# Download webshop environment setting
cd envs/webshop
pip install -e .
python -m spacy download en_core_web_lg
conda install -y -c conda-forge openjdk=11

# Download data for WebShop environment
gdown https://drive.google.com/uc?id=1G_0ccLWn5kZE5rpeyAdh_YuoNzvBUjT9
gdown https://drive.google.com/uc?id=11zOUDkJSgGhYin9NxQtG8PVpDsika86y
unzip data.zip
mkdir search_index
unzip indexes.zip -d search_index/

# Download data for ALFWorld environment
cd ../..
cd eval_agent/data/alfworld
gdown https://drive.google.com/uc?id=1y7Vqeo0_xm9d3I07vZaP6qbPFtyuJ6kI
unzip alfworld_data.zip

# Download data for VirtualHome environment
cd ../..
gdown https://drive.google.com/uc?id=1kZKWkWhtJ-DneqfS1Nb_FybR1RBxPeef
unzip virtualhome_master.zip

# Download expert trajectories for Webshop, ALFWorld and VirtualHome environment
cd ../..
gdown https://drive.google.com/uc?id=1_tBMDixZcIjKuv-LExNllha-YIRxhKIq
unzip data.zip

Create another virtual environment for RL training due to package conflicts:

conda create -n RL_train python=3.10
conda activate RL_train
pip install -r ppo/requirements.txt

⛏️ Usage

🤖 Base Agent SFT Training

conda activate SPA
cd sft
# For ALFWorld environment
bash alfworld_llama3b.sh
# For Webshop environment
bash webshop_llama3b.sh
# For VirtualHome environment
bash virtualhome_llama3b.sh

⚠️ Note that the bash scripts provide the hyperparameters used to reproduce our results. Modify settings such as the model path to match your own environment.

🕹️ Explored Trajectories Collection

cd ..
# For ALFWorld environment
bash exploration/alfworld/my_generate_response.sh
# For WebShop environment
bash exploration/webshop/my_generate_response_webshop.sh

📈 Progress Estimator Training

To organize the exploration data for progress estimator training, first run the following script.

python prm/data_org.py

Then run the following script to train the progress estimator.

deepspeed --include=localhost:0,1,2,3 prm/train_our_progress_model.py
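
As a rough intuition for what the progress estimator learns (the actual architecture and loss live in prm/train_our_progress_model.py and may differ from this sketch), one simple formulation predicts a per-step contribution and regresses the sum of those contributions onto the episode's final reward:

# Hypothetical sketch of a progress-estimation objective; the real training code
# in prm/train_our_progress_model.py may use a different head and loss.
import torch
import torch.nn as nn

class ProgressHead(nn.Module):
    """Maps a per-step hidden state to a scalar progress contribution in [0, 1]."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, step_states: torch.Tensor) -> torch.Tensor:
        # step_states: (num_steps, hidden_size) -> (num_steps,) contributions
        return torch.sigmoid(self.proj(step_states)).squeeze(-1)

def progress_loss(contributions: torch.Tensor, final_reward: float) -> torch.Tensor:
    # The summed per-step contributions should reconstruct the delayed final reward.
    return (contributions.sum() - final_reward) ** 2

# Toy usage with random features standing in for per-step LLM hidden states.
head = ProgressHead(hidden_size=16)
loss = progress_loss(head(torch.randn(5, 16)), final_reward=1.0)
loss.backward()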

🤷‍♂️ Stepwise Progress Prediction

python prm/inference_prm.py

💪🏽 RL Training

To organize the inference data for RL training, please run the following script first.

python prm/rl_data_org.py

Next, execute the following script to perform reinforcement learning training using LoRA:

conda activate RL_train
bash ppo/train_ppo.sh
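
Conceptually, the redistributed rewards change what PPO optimizes: instead of a single delayed reward on the last step, every step carries credit. The following sketch (an assumption for illustration, not the code in ppo/) shows how that affects discounted returns:

# Illustrative only: compare discounted returns under a delayed terminal reward
# versus the same reward redistributed across steps as intermediate rewards.
from typing import List

def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.insert(0, running)
    return returns

print(discounted_returns([0.0, 0.0, 0.0, 1.0]))   # sparse: early steps get only discounted credit
print(discounted_returns([0.1, 0.4, 0.2, 0.3]))   # redistributed: every step is rewarded directly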

🎮 Evaluation

Before evaluation, we need to merge the LoRA weights with the original LLM weights to obtain the final model:

python ppo/merge.py
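
For reference, merging LoRA adapters into a base model typically looks like the following peft-based sketch; the paths are placeholders, and ppo/merge.py may handle additional details such as dtype or tokenizer handling:

# Minimal peft-based merging sketch; paths are placeholders, and ppo/merge.py
# may differ in its exact arguments and handling.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/base_llm")       # placeholder path
model = PeftModel.from_pretrained(base, "path/to/lora_adapter")       # placeholder path
merged = model.merge_and_unload()                                     # fold LoRA deltas into base weights
merged.save_pretrained("path/to/merged_model")
AutoTokenizer.from_pretrained("path/to/base_llm").save_pretrained("path/to/merged_model")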

Then run the evaluation scripts:

conda activate SPA
# For ALFWorld environment
bash eval/llama3_2_3b_eval_alfworld.sh
# For WebShop environment
bash eval/llama3_2_3b_eval_webshop.sh
# For VirtualHome environment
bash eval/llama3_2_3b_eval_virtualhome.sh

Running Baselines

TODO

🌹 Acknowledgement

Our implementation is based on ETO and StepTool. We thank the authors for their great work.

Many thanks as well to my wonderful co-authors: Chak Tou Leong, Jiashuo Wang, Jian Wang, and Wenjie Li.

🙏 Citation

If you find our work useful in your research, please consider citing our paper:

@article{wang2025spa,
  title={SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution},
  author={Wang, Hanlin and Leong, Chak Tou and Wang, Jiashuo and Wang, Jian and Li, Wenjie},
  journal={arXiv preprint arXiv:2505.20732},
  year={2025}
}
