RL training for terminal agents. The agent interacts with Docker-hosted environments and is trained with GRPO, optionally with a process reward model (PRM).
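For readers new to GRPO: it does not use a learned value function. Instead, it scores each rollout relative to the other rollouts sampled for the same task. The sketch below shows the standard group-normalized advantage from the GRPO formulation. It is illustrative only, and this repo's trainer may differ in details such as standard-deviation normalization or how optional PRM scores are combined with the outcome reward. The function name `grpo_advantages` is hypothetical.

```python
from statistics import fmean, pstdev


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages for one task's group of sampled rollouts.

    Each reward is centered on the group mean and scaled by the group's
    (population) standard deviation; eps avoids division by zero when all
    rollouts in the group received the same reward.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts of one task: two succeeded (1.0), two failed (0.0).
    print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

Successful rollouts get a positive advantage and failed ones a negative advantage. A group where every rollout gets the same reward contributes zero advantage and therefore no learning signal.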
This workflow has two independent components:
- Training machine: runs the task router, Ray, and the training scripts, and connects to the workers via WORKER_URLS.
- Remote workers: run the pool server and execute tasks (Docker required). See remote/README.md.

Requirements:
- Training machine: a GPU node or cluster with the required training dependencies.
- Remote workers: Docker-capable hosts reachable from the training machine (default pool server port: 18081). Setup: remote/README.md.
Follow remote/README.md on each worker to start pool_server. It should then be reachable at, for example, http://<worker-ip>:18081.
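Before launching training, it can help to confirm that every pool server listed in WORKER_URLS accepts connections. The following is a minimal sketch and is not part of the repo. `parse_worker_urls` and `is_reachable` are hypothetical helpers. The script only tests that a TCP connection to each host and port succeeds, so it assumes no particular HTTP endpoint on pool_server. When a URL omits the port, it falls back to the documented default port 18081.

```python
import os
import socket
from urllib.parse import urlparse

DEFAULT_POOL_PORT = 18081  # documented default pool server port


def parse_worker_urls(value: str) -> list[tuple[str, int]]:
    """Split a comma-separated WORKER_URLS string into (host, port) pairs."""
    workers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and empty items
        parsed = urlparse(entry)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            raise ValueError(f"not an http(s) URL: {entry!r}")
        workers.append((parsed.hostname, parsed.port or DEFAULT_POOL_PORT))
    return workers


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for host, port in parse_worker_urls(os.environ.get("WORKER_URLS", "")):
        status = "reachable" if is_reachable(host, port) else "UNREACHABLE"
        print(f"{host}:{port} {status}")
```

Run it on the training machine after exporting WORKER_URLS. An UNREACHABLE entry suggests that pool_server is not running on that worker or that the port is blocked between the machines.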
From a directory of your choice:
git clone https://github.com/Gen-Verse/OpenClaw-RL.git
cd OpenClaw-RL

Download a supported dataset into terminal-rl/dataset/:
export DATASET_DIR="terminal-rl/dataset"
python terminal-rl/data_utils/download.py seta_env

Convert the tasks into a training JSONL file:
python terminal-rl/data_utils/convert_task_to_dataset.py \
  --tasks_dir terminal-rl/dataset/seta_env

The seta_env dataset corresponds to the task dataset published as camel-ai/seta-env.
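A quick sanity check on the converted output can catch a malformed file before a long run. This hypothetical `validate_jsonl` helper only verifies that each non-blank line is a JSON object and counts the records. It deliberately assumes no field names, because the converter's output schema is not described here. Point it at the converted file, that is, the path you later set as ROLLOUT_PROMPT_DATA.

```python
import json
import sys
from pathlib import Path


def validate_jsonl(path: str) -> int:
    """Count records in a JSONL file, failing on the first malformed line.

    Only checks that every non-blank line parses as a JSON object; no
    particular keys are required.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{lineno}: invalid JSON ({exc.msg})") from exc
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{lineno}: expected a JSON object")
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    print(validate_jsonl(sys.argv[1]), "records")
```

For example, save it as validate_jsonl.py (a hypothetical file name) and run `python validate_jsonl.py /path/to/train.jsonl`.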
On the training machine, set the required environment variables:
# Hugging Face cache / model paths
export HF_HOME="/path/to/huggingface"
export MODEL_CKPT="/path/to/model"
export REF_LOAD="/path/to/reference_model_dir"
export SAVE_CKPT="/path/to/save/checkpoints"
# Dataset + workers
export ROLLOUT_PROMPT_DATA="/path/to/train.jsonl"
export WORKER_URLS="http://worker1:18081,http://worker2:18081"
# Logging
export WANDB_KEY="your-wandb-key"

Then run (from the repo root):
bash terminal-rl/terminal_qwen3_8b_rl.sh

To enable PRM scoring with the 2-node script, also set:
export PRM_ENABLE=1
export PRM_MODEL_PATH="/path/to/prm-model"
export PRM_M=3
export PRM_STEP_COEF=1.0
export PRM_TEMPERATURE=0.0
export PRM_MAX_NEW_TOKENS=4096
# Optional: use an external PRM endpoint instead of framework-hosted engines
export PRM_SGLANG_URL="http://<prm-router-ip>:<prm-router-port>"

Then run:
bash terminal-rl/terminal_qwen3_8b_prm_rl_2nodes.sh

Notes:
- WORKER_URLS must point to already-running pool servers.
- As an example, one rollout agent implementation in this repo is based on CAMEL (see terminal-rl/agent/camel_agent.py and CAMEL).
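The launch scripts read the environment variables listed above. The sketch below is a hypothetical pre-flight check and does not ship with the repo. It reports unset variables and, when PRM_ENABLE=1, PRM settings that do not parse as numbers. Which variables each script strictly requires is an assumption here. For example, the sketch treats PRM_MODEL_PATH as required whenever PRM is enabled, even though setting PRM_SGLANG_URL may change that. The expected numeric types are inferred from the example values above.

```python
import os
from collections.abc import Mapping

# Names come from this README; treating all of them as required is an assumption.
REQUIRED = ["HF_HOME", "MODEL_CKPT", "REF_LOAD", "SAVE_CKPT",
            "ROLLOUT_PROMPT_DATA", "WORKER_URLS", "WANDB_KEY"]
# Assumed to be required only when PRM_ENABLE=1.
PRM_REQUIRED = ["PRM_MODEL_PATH"]
# Types inferred from the README's example values (e.g. PRM_M=3, PRM_TEMPERATURE=0.0).
PRM_NUMERIC = {"PRM_M": int, "PRM_STEP_COEF": float,
               "PRM_TEMPERATURE": float, "PRM_MAX_NEW_TOKENS": int}


def preflight(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems found in an os.environ-like mapping."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    if env.get("PRM_ENABLE") == "1":
        problems += [f"{name} is not set" for name in PRM_REQUIRED if not env.get(name)]
        for name, cast in PRM_NUMERIC.items():
            value = env.get(name)
            if value is None:
                continue  # unset here; not checked further
            try:
                cast(value)
            except ValueError:
                problems.append(f"{name}={value!r} is not a valid {cast.__name__}")
    return problems


if __name__ == "__main__":
    issues = preflight(os.environ)
    for issue in issues:
        print("preflight:", issue)
    print("preflight:", "OK" if not issues else f"{len(issues)} problem(s)")
```

Running it just before `bash terminal-rl/terminal_qwen3_8b_rl.sh` (or the PRM 2-node script) surfaces a missing export immediately rather than after the cluster has started.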