
Terminal RL

RL training for terminal agents. The agent interacts with Docker-hosted environments and is trained with GRPO, optionally with a process reward model (PRM).

This workflow has two independent components:

  • Training machine: runs the task router, Ray, and the training scripts; connects to the workers via WORKER_URLS.
  • Remote workers: run the pool server and execute tasks (Docker required). Setup: remote/README.md.

Prerequisites

  • Training machine: GPU node/cluster with the required training dependencies.
  • Remote workers: Docker-capable hosts reachable from the training machine (default pool server port 18081). Setup: remote/README.md.

Instructions

0. Start remote workers (pool server)

Follow remote/README.md on each worker to start pool_server; it should then be reachable from the training machine, e.g. at http://<worker-ip>:18081.
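Before moving on, it can help to confirm each worker is reachable from the training machine. A minimal sketch (the exact health endpoint is deployment-specific; here we simply probe each base URL with curl, and the two example hostnames are placeholders):

```shell
# Probe each comma-separated worker URL; any HTTP response counts as reachable,
# a connection failure or timeout is reported as UNREACHABLE.
check_workers() {
  IFS=',' read -ra urls <<< "$1"
  for u in "${urls[@]}"; do
    if curl -sS --max-time 5 -o /dev/null "$u" 2>/dev/null; then
      echo "reachable: $u"
    else
      echo "UNREACHABLE: $u"
    fi
  done
}

check_workers "http://worker1:18081,http://worker2:18081"
```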

1. Clone the repo

From a directory of your choice:

git clone https://github.com/Gen-Verse/OpenClaw-RL.git
cd OpenClaw-RL

2. Prepare dataset (download + convert)

Download a supported dataset under terminal-rl/dataset/:

export DATASET_DIR="terminal-rl/dataset"
python terminal-rl/data_utils/download.py seta_env

Convert tasks into training JSONL:

python terminal-rl/data_utils/convert_task_to_dataset.py \
  --tasks_dir terminal-rl/dataset/seta_env

The seta_env dataset corresponds to the task dataset published as camel-ai/seta-env.
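A quick sanity check on the converted output: every line of a JSONL file should parse as a standalone JSON object. A sketch (the path below is illustrative; use whatever file the converter actually produced):

```shell
# Illustrative path -- substitute the JSONL file written by the converter.
TRAIN_JSONL="terminal-rl/dataset/seta_env/train.jsonl"

# Count records and verify each line parses as JSON (requires python3).
count_jsonl() {
  python3 - "$1" <<'EOF'
import json, sys
n = 0
with open(sys.argv[1]) as f:
    for line in f:
        json.loads(line)
        n += 1
print(n, "records OK")
EOF
}

if [ -f "$TRAIN_JSONL" ]; then
  count_jsonl "$TRAIN_JSONL"
fi
```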

3. Run training

On the training machine, set the required environment variables:

# Hugging Face cache / model paths
export HF_HOME="/path/to/huggingface"
export MODEL_CKPT="/path/to/model"
export REF_LOAD="/path/to/reference_model_dir"
export SAVE_CKPT="/path/to/save/checkpoints"

# Dataset + workers
export ROLLOUT_PROMPT_DATA="/path/to/train.jsonl"
export WORKER_URLS="http://worker1:18081,http://worker2:18081"

# Logging
export WANDB_KEY="your-wandb-key"
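A small guard like the following can catch unset variables before a long launch fails partway through (a sketch; the variable names are taken from the exports above, and the check uses bash indirect expansion):

```shell
# Report any required variables that are still unset or empty.
check_env() {
  missing=0
  for v in "$@"; do
    if [ -z "${!v:-}" ]; then   # bash indirect expansion: value of the var named $v
      echo "missing: $v"
      missing=1
    fi
  done
  return $missing
}

check_env HF_HOME MODEL_CKPT REF_LOAD SAVE_CKPT ROLLOUT_PROMPT_DATA WORKER_URLS \
  || echo "set the variables listed above before launching training"
```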

Then run (from repo root):

bash terminal-rl/terminal_qwen3_8b_rl.sh

PRM training (optional)

To enable PRM scoring with the 2-node script, add:

export PRM_ENABLE=1
export PRM_MODEL_PATH="/path/to/prm-model"
export PRM_M=3
export PRM_STEP_COEF=1.0
export PRM_TEMPERATURE=0.0
export PRM_MAX_NEW_TOKENS=4096
# Optional: use an external PRM endpoint instead of framework-hosted engines
export PRM_SGLANG_URL="http://<prm-router-ip>:<prm-router-port>"

Then run:

bash terminal-rl/terminal_qwen3_8b_prm_rl_2nodes.sh

Notes

  • WORKER_URLS must point to already-running pool servers.
  • As an example, one rollout agent implementation in this repo is built on the CAMEL framework (see terminal-rl/agent/camel_agent.py).