# Fast-WAM: Do World Action Models Need Test-time Future Imagination?

Official codebase for Fast-WAM. This repository contains the training and evaluation code for Fast-WAM on LIBERO and RoboTwin.
- File Structure
- Environment Setup
- Model Preparation
- Dataset Download
- Inference with Released Checkpoints
- Training
- Inference with Your Trained Checkpoints
- Acknowledgements
- BibTeX
## File Structure

```
FastWAM/
├── configs/
│   ├── data/        # Dataset configs (LIBERO, RoboTwin, etc.)
│   ├── model/       # Model architecture and component configs
│   └── task/        # Task-level configs (training task names)
├── scripts/
│   ├── train.py
│   ├── train_zero1.sh                       # DeepSpeed ZeRO-1 training entrypoint
│   ├── preprocess_action_dit_backbone.py    # Preprocess ActionDiT backbone before training
│   └── precompute_text_embeds.py            # Precompute T5 text embedding cache before training
├── experiments/
│   ├── libero/
│   │   └── run_libero_manager.py
│   └── robotwin/
│       └── run_robotwin_manager.py
├── src/fastwam/          # Core code
├── runs/                 # Training outputs (checkpoints, logs)
├── checkpoints/          # Pretrained or external checkpoints
├── data/                 # Data directory
└── evaluate_results/     # Inference / evaluation results
```
## Environment Setup

```
conda create -n fastwam python=3.10 -y
conda activate fastwam
pip install -U pip
pip install torch==2.7.1+cu128 torchvision==0.22.1+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
pip install -e .
```

## Model Preparation

This step is required before both training and inference.
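A quick, hedged post-install sanity check (it prints a fallback message instead of failing when torch is not yet installed):

```shell
# Verify the core dependency: torch should import and report CUDA availability
python - <<'EOF'
try:
    import torch
    print("torch", torch.__version__, "| cuda available:", torch.cuda.is_available())
except ImportError:
    print("torch not installed; rerun the pip install commands above")
EOF
```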
Step 1: set the Wan model directory (optional, default `./checkpoints`):

```
mkdir -p checkpoints
export DIFFSYNTH_MODEL_BASE_PATH="$(pwd)/checkpoints"
```

Step 2: pre-generate the ActionDiT backbone (interpolated from the Wan2.2 DiT):
```
# uncond (fastwam)
python scripts/preprocess_action_dit_backbone.py \
    --model-config configs/model/fastwam.yaml \
    --output checkpoints/ActionDiT_linear_interp_Wan22_alphascale_1024hdim.pt \
    --device cuda \
    --dtype bfloat16
```

## Dataset Download

The preprocessed LIBERO dataset used by Fast-WAM is available at:
Download all compressed files first, then extract them all:
```
mkdir -p data/libero_mujoco3.3.2
cd data/libero_mujoco3.3.2
# Run after downloading all 4 tar.gz files
for f in *.tar.gz; do
    tar -xzf "$f"
done
```

The extracted directory structure should be:
```
data/libero_mujoco3.3.2/
├── libero_10_no_noops_lerobot/
├── libero_goal_no_noops_lerobot/
├── libero_object_no_noops_lerobot/
└── libero_spatial_no_noops_lerobot/
```
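The `for f in *.tar.gz` extraction loop can be sanity-checked on throwaway synthetic archives (the scratch directory and file names below are made up for the demo):

```shell
# Build two tiny .tar.gz files, then extract them with the same glob loop
tmp=$(mktemp -d) && cd "$tmp"
for name in part1 part2; do
    mkdir "suite_$name" && echo "$name" > "suite_$name/x.txt"
    tar -czf "$name.tar.gz" "suite_$name" && rm -r "suite_$name"
done
for f in *.tar.gz; do
    tar -xzf "$f"
done
ls suite_part1/x.txt suite_part2/x.txt   # both files restored
```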
The preprocessed RoboTwin dataset used by Fast-WAM is available at:
Download all split archive files first, then concatenate and extract:
```
mkdir -p data/robotwin2.0
cd data/robotwin2.0
# Run after downloading all robotwin2.0.tar.gz.part-* files
cat robotwin2.0.tar.gz.part-* | tar -xzf -
```

The extracted directory structure should be:
```
data/robotwin2.0/
└── robotwin2.0/
    ├── data/
    ├── meta/
    └── videos/
```
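The `cat … | tar -xzf -` step works because the split parts concatenate back into the original archive byte-for-byte. A throwaway demo with synthetic names:

```shell
# Create a small archive, split it into parts, then reassemble-and-extract in one pipe
tmp=$(mktemp -d) && cd "$tmp"
mkdir payload && echo "hello" > payload/file.txt
tar -czf archive.tar.gz payload
split -b 512 archive.tar.gz archive.tar.gz.part-   # fixed-size chunks, lexically ordered suffixes
rm -r archive.tar.gz payload                       # keep only the parts
cat archive.tar.gz.part-* | tar -xzf -             # concatenating the parts restores the stream
cat payload/file.txt                               # -> hello
```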
If you also keep `data/robotwin2.0/dataset_stats.json` (relative to the repository root), it can be used directly as the statistics file for the current configs in this repo. You can also recompute it.
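Before pointing the configs at it, a hedged way to confirm the stats file is at the expected path and is well-formed JSON (path taken from above):

```shell
python - <<'EOF'
import json, pathlib
p = pathlib.Path("data/robotwin2.0/dataset_stats.json")
if p.exists():
    json.load(p.open())          # raises if the file is corrupt
    print("dataset_stats.json parses OK")
else:
    print("dataset_stats.json not found; re-download or recompute it")
EOF
```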
## Inference with Released Checkpoints

The released checkpoints and their corresponding dataset stats are available on Hugging Face. Optional: download them with `huggingface-cli`:
```
pip install -U huggingface_hub
huggingface-cli download yuanty/fastwam \
    libero_uncond_2cam224.pt \
    libero_uncond_2cam224_dataset_stats.json \
    robotwin_uncond_3cam_384.pt \
    robotwin_uncond_3cam_384_dataset_stats.json \
    --local-dir ./checkpoints/fastwam_release
```

After downloading, the local directory is expected to contain:
```
checkpoints/fastwam_release/
├── libero_uncond_2cam224.pt
├── libero_uncond_2cam224_dataset_stats.json
├── robotwin_uncond_3cam_384.pt
└── robotwin_uncond_3cam_384_dataset_stats.json
```
Before running the LIBERO benchmark, install the official LIBERO environment from the LIBERO repository. Then run this final step:

```
pip install mujoco==3.3.2
```

The mujoco version should ideally stay consistent with the LIBERO data version.
We have already copied the RoboTwin evaluation-related code into `third_party/RoboTwin`.
You still need to follow the official RoboTwin instructions from the RoboTwin repository to finish environment installation and download the required assets, then create the policy symlink:

```
ln -sfn "$(pwd)/experiments/robotwin/fastwam_policy" "$(pwd)/third_party/RoboTwin/policy/fastwam_policy"
```

Optional: evaluate the released LIBERO checkpoint:
The released LIBERO / RoboTwin evaluation managers default to 8 GPUs (`MULTIRUN.num_gpus=8` in `configs/sim_libero.yaml` and `configs/sim_robotwin.yaml`). To evaluate with fewer GPUs, pass a smaller value such as `MULTIRUN.num_gpus=4`.
```
python experiments/libero/run_libero_manager.py \
    task=libero_uncond_2cam224_1e-4 \
    ckpt=./checkpoints/fastwam_release/libero_uncond_2cam224.pt \
    EVALUATION.dataset_stats_path=./checkpoints/fastwam_release/libero_uncond_2cam224_dataset_stats.json \
    MULTIRUN.num_gpus=8
```

Optional: evaluate the released RoboTwin checkpoint:
```
python experiments/robotwin/run_robotwin_manager.py \
    task=robotwin_uncond_3cam_384_1e-4 \
    ckpt=./checkpoints/fastwam_release/robotwin_uncond_3cam_384.pt \
    EVALUATION.dataset_stats_path=./checkpoints/fastwam_release/robotwin_uncond_3cam_384_dataset_stats.json \
    MULTIRUN.num_gpus=8
```

For faster RoboTwin evaluation, we enable `EVALUATION.skip_get_obs_within_replan=true` in `configs/sim_robotwin.yaml`.
This skips RGB rendering while consecutively executing the actions of a chunk within one replan window, which speeds up evaluation but makes the saved video look very low-FPS. Set it to `false` if you want to save a fully rendered video.
Note: We evaluate with unseen instructions, following Motus, whereas Lingbot-VA uses seen instructions. You can pass `EVALUATION.instruction_type=seen` to evaluate with seen instructions, which should in principle improve performance by one or two points.
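The two evaluation options above can also be pinned in the config rather than overridden per run. Assuming both keys live under the `EVALUATION` block of `configs/sim_robotwin.yaml` (a hedged sketch; check the actual file), the fragment would look like:

```
# configs/sim_robotwin.yaml (sketch, not the full file)
EVALUATION:
  skip_get_obs_within_replan: false  # re-enable per-step RGB rendering for saved videos
  instruction_type: seen             # evaluate with seen instead of unseen instructions
```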
## Training

Use `scripts/precompute_text_embeds.py` to precompute T5 text embeddings for each training task:

```
# LIBERO
python scripts/precompute_text_embeds.py task=libero_uncond_2cam224_1e-4
# RoboTwin
python scripts/precompute_text_embeds.py task=robotwin_uncond_3cam_384_1e-4
```

For multi-GPU precomputation:

```
torchrun --standalone --nproc_per_node=8 scripts/precompute_text_embeds.py task=libero_uncond_2cam224_1e-4
```

When running a new task for the first time, set `pretrained_norm_stats` in the corresponding `configs/data/*.yaml` to `null` first.
After one training run, a `dataset_stats.json` file will be generated in the current run directory (for example, `runs/{task_name}/{run_id}/dataset_stats.json`). You can then point `pretrained_norm_stats` at that file path for subsequent runs.
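A sketch of that two-phase workflow (the exact file name under `configs/data/` depends on your task and is hypothetical here):

```
# configs/data/<your_dataset>.yaml -- first run: compute stats from scratch
pretrained_norm_stats: null

# subsequent runs: reuse the stats generated by the first run, e.g.
# pretrained_norm_stats: runs/{task_name}/{run_id}/dataset_stats.json
```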
```
# LIBERO
bash scripts/train_zero1.sh 8 task=libero_uncond_2cam224_1e-4
# RoboTwin
bash scripts/train_zero1.sh 8 task=robotwin_uncond_3cam_384_1e-4
```

For LIBERO, we train on a single node with 8 GPUs. For RoboTwin, we use 64 GPUs to accelerate training. You can try reducing the GPU count or the number of training epochs.
## Inference with Your Trained Checkpoints

As above, the mujoco version should ideally stay consistent with the LIBERO data version. Then run LIBERO evaluation:
```
# LIBERO
python experiments/libero/run_libero_manager.py task={task_name} ckpt={ckpt_path}
```

For RoboTwin, we have already copied the evaluation-related code into `third_party/RoboTwin`.
You still need to follow the official RoboTwin instructions from the RoboTwin repository to finish installation and download the required assets, then create the policy symlink:

```
ln -sfn "$(pwd)/experiments/robotwin/fastwam_policy" "$(pwd)/third_party/RoboTwin/policy/fastwam_policy"
```

Then run RoboTwin evaluation:
```
python experiments/robotwin/run_robotwin_manager.py task={task_name} ckpt={ckpt_path}
```

Common `task_name` examples:

- `libero_uncond_2cam224_1e-4`
- `robotwin_uncond_3cam_384_1e-4`
## Acknowledgements

The RoboTwin evaluation code in this repository is adapted from the official RoboTwin repository. We thank the RoboTwin team for releasing their codebase and assets.
## BibTeX

If you find our work helpful, please consider citing:

```
@article{yuan2026fastwam,
  title={Fast-WAM: Do World Action Models Need Test-time Future Imagination?},
  author={Tianyuan Yuan and Zibin Dong and Yicheng Liu and Hang Zhao},
  journal={arXiv preprint arXiv:2603.16666},
  year={2026},
  url={https://arxiv.org/abs/2603.16666}
}
```