# Fast-WAM: Do World Action Models Need Test-time Future Imagination?

Official codebase for Fast-WAM. This repository contains the training and evaluation code for Fast-WAM on LIBERO and RoboTwin.
- File Structure
- Environment Setup
- Model Preparation
- Dataset Download
- Inference with Released Checkpoints
- Training
- Inference with Your Trained Checkpoints
- Acknowledgements
- BibTeX
## File Structure

```
FastWAM/
├── configs/
│   ├── data/        # Dataset configs (LIBERO, RoboTwin, etc.)
│   ├── model/       # Model architecture and component configs
│   └── task/        # Task-level configs (training task names)
├── scripts/
│   ├── train.py
│   ├── train_zero1.sh                       # DeepSpeed ZeRO-1 training entrypoint
│   ├── preprocess_action_dit_backbone.py    # Preprocess ActionDiT backbone before training
│   └── precompute_text_embeds.py            # Precompute T5 text embedding cache before training
├── experiments/
│   ├── libero/
│   │   └── run_libero_manager.py
│   └── robotwin/
│       └── run_robotwin_manager.py
├── src/fastwam/          # Core code
├── runs/                 # Training outputs (checkpoints, logs)
├── checkpoints/          # Pretrained or external checkpoints
├── data/                 # Data directory
└── evaluate_results/     # Inference / evaluation results
```
## Environment Setup

```
conda create -n fastwam python=3.10 -y
conda activate fastwam
pip install -U pip
pip install torch==2.7.1+cu128 torchvision==0.22.1+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
pip install -e .
```

## Model Preparation

This step is required before both training and inference.
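A quick, hedged post-install sanity check (it prints a fallback message instead of failing when torch is not yet installed):

```shell
# Verify the core dependency: torch should import and report CUDA availability
python - <<'EOF'
try:
    import torch
    print("torch", torch.__version__, "| cuda available:", torch.cuda.is_available())
except ImportError:
    print("torch not installed; rerun the pip install commands above")
EOF
```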
Step 1: set the Wan model directory (optional, default `./checkpoints`):

```
mkdir -p checkpoints
export DIFFSYNTH_MODEL_BASE_PATH="$(pwd)/checkpoints"
```

Step 2: pre-generate the ActionDiT backbone (interpolated from the Wan2.2 DiT):
```
# uncond (fastwam)
python scripts/preprocess_action_dit_backbone.py \
    --model-config configs/model/fastwam.yaml \
    --output checkpoints/ActionDiT_linear_interp_Wan22_alphascale_1024hdim.pt \
    --device cuda \
    --dtype bfloat16
```

## Dataset Download

The preprocessed LIBERO dataset used by Fast-WAM is available at:
Download all compressed files first, then extract them all:
```
mkdir -p data/libero_mujoco3.3.2
cd data/libero_mujoco3.3.2
# Run after downloading all 4 tar.gz files
for f in *.tar.gz; do
    tar -xzf "$f"
done
```

The extracted directory structure should be:
```
data/libero_mujoco3.3.2/
├── libero_10_no_noops_lerobot/
├── libero_goal_no_noops_lerobot/
├── libero_object_no_noops_lerobot/
└── libero_spatial_no_noops_lerobot/
```
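The `for f in *.tar.gz` extraction loop can be sanity-checked on throwaway synthetic archives (the scratch directory and file names below are made up for the demo):

```shell
# Build two tiny .tar.gz files, then extract them with the same glob loop
tmp=$(mktemp -d) && cd "$tmp"
for name in part1 part2; do
    mkdir "suite_$name" && echo "$name" > "suite_$name/x.txt"
    tar -czf "$name.tar.gz" "suite_$name" && rm -r "suite_$name"
done
for f in *.tar.gz; do
    tar -xzf "$f"
done
ls suite_part1/x.txt suite_part2/x.txt   # both files restored
```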
The preprocessed RoboTwin dataset used by Fast-WAM is available at:
Download all split archive files first, then concatenate and extract:
```
mkdir -p data/robotwin2.0
cd data/robotwin2.0
# Run after downloading all robotwin2.0.tar.gz.part-* files
cat robotwin2.0.tar.gz.part-* | tar -xzf -
```

The extracted directory structure should be:
```
data/robotwin2.0/
└── robotwin2.0/
    ├── data/
    ├── meta/
    └── videos/
```
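The `cat … | tar -xzf -` step works because the split parts concatenate back into the original archive byte-for-byte. A throwaway demo with synthetic names:

```shell
# Create a small archive, split it into parts, then reassemble-and-extract in one pipe
tmp=$(mktemp -d) && cd "$tmp"
mkdir payload && echo "hello" > payload/file.txt
tar -czf archive.tar.gz payload
split -b 512 archive.tar.gz archive.tar.gz.part-   # fixed-size chunks, lexically ordered suffixes
rm -r archive.tar.gz payload                       # keep only the parts
cat archive.tar.gz.part-* | tar -xzf -             # concatenating the parts restores the stream
cat payload/file.txt                               # -> hello
```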
If you also keep `data/robotwin2.0/dataset_stats.json` (relative to the repository root), it can be used directly as the statistics file for the current configs in this repo. You can also recompute it.
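Before pointing the configs at it, a hedged way to confirm the stats file is at the expected path and is well-formed JSON (path taken from above):

```shell
python - <<'EOF'
import json, pathlib
p = pathlib.Path("data/robotwin2.0/dataset_stats.json")
if p.exists():
    json.load(p.open())          # raises if the file is corrupt
    print("dataset_stats.json parses OK")
else:
    print("dataset_stats.json not found; re-download or recompute it")
EOF
```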
## Inference with Released Checkpoints

The released checkpoints and their corresponding dataset stats are available on Hugging Face. Optional: download them with `huggingface-cli`:
```
pip install -U huggingface_hub
huggingface-cli download yuanty/fastwam \
    libero_uncond_2cam224.pt \
    libero_uncond_2cam224_dataset_stats.json \
    robotwin_uncond_3cam_384.pt \
    robotwin_uncond_3cam_384_dataset_stats.json \
    --local-dir ./checkpoints/fastwam_release
```

After downloading, the local directory is expected to contain:
```
checkpoints/fastwam_release/
├── libero_uncond_2cam224.pt
├── libero_uncond_2cam224_dataset_stats.json
├── robotwin_uncond_3cam_384.pt
└── robotwin_uncond_3cam_384_dataset_stats.json
```
Before running the LIBERO benchmark, install the official LIBERO environment from the LIBERO repository. Then run this final step:

```
pip install mujoco==3.3.2
```

The mujoco version should ideally stay consistent with the LIBERO data version.
We have already copied the RoboTwin evaluation-related code into `third_party/RoboTwin`.
You still need to follow the official RoboTwin instructions from the RoboTwin repository to finish environment installation and download the required assets, then create the policy symlink:

```
ln -sfn "$(pwd)/experiments/robotwin/fastwam_policy" "$(pwd)/third_party/RoboTwin/policy/fastwam_policy"
```

Optional: evaluate the released LIBERO checkpoint:
The released LIBERO / RoboTwin evaluation managers default to 8 GPUs (`MULTIRUN.num_gpus=8` in `configs/sim_libero.yaml` and `configs/sim_robotwin.yaml`). To evaluate with fewer GPUs, pass a smaller value such as `MULTIRUN.num_gpus=4`.
```
python experiments/libero/run_libero_manager.py \
    task=libero_uncond_2cam224_1e-4 \
    ckpt=./checkpoints/fastwam_release/libero_uncond_2cam224.pt \
    EVALUATION.dataset_stats_path=./checkpoints/fastwam_release/libero_uncond_2cam224_dataset_stats.json \
    MULTIRUN.num_gpus=8
```

Optional: evaluate the released RoboTwin checkpoint:
```
python experiments/robotwin/run_robotwin_manager.py \
    task=robotwin_uncond_3cam_384_1e-4 \
    ckpt=./checkpoints/fastwam_release/robotwin_uncond_3cam_384.pt \
    EVALUATION.dataset_stats_path=./checkpoints/fastwam_release/robotwin_uncond_3cam_384_dataset_stats.json \
    MULTIRUN.num_gpus=8
```

For faster RoboTwin evaluation, we enable `EVALUATION.skip_get_obs_within_replan=true` in `configs/sim_robotwin.yaml`.
This skips RGB rendering while consecutively executing the actions of a chunk within one replan window, which speeds up evaluation but makes the saved video look very low-FPS. Set it to `false` if you want to save a fully rendered video.
Note: We evaluate with unseen instructions, following Motus, whereas Lingbot-VA uses seen instructions. You can pass `EVALUATION.instruction_type=seen` to evaluate with seen instructions, which should in principle improve performance by one or two points.
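The two evaluation options above can also be pinned in the config rather than overridden per run. Assuming both keys live under the `EVALUATION` block of `configs/sim_robotwin.yaml` (a hedged sketch; check the actual file), the fragment would look like:

```
# configs/sim_robotwin.yaml (sketch, not the full file)
EVALUATION:
  skip_get_obs_within_replan: false  # re-enable per-step RGB rendering for saved videos
  instruction_type: seen             # evaluate with seen instead of unseen instructions
```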
## Training

Use `scripts/precompute_text_embeds.py` to precompute T5 text embeddings for each training task:

```
# LIBERO
python scripts/precompute_text_embeds.py task=libero_uncond_2cam224_1e-4
# RoboTwin
python scripts/precompute_text_embeds.py task=robotwin_uncond_3cam_384_1e-4
```

For multi-GPU precomputation:

```
torchrun --standalone --nproc_per_node=8 scripts/precompute_text_embeds.py task=libero_uncond_2cam224_1e-4
```

When running a new task for the first time, set `pretrained_norm_stats` in the corresponding `configs/data/*.yaml` to `null` first.
After one training run, a `dataset_stats.json` file will be generated in the current run directory (for example, `runs/{task_name}/{run_id}/dataset_stats.json`). You can then point `pretrained_norm_stats` at that file path for subsequent runs.
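A sketch of that two-phase workflow (the exact file name under `configs/data/` depends on your task and is hypothetical here):

```
# configs/data/<your_dataset>.yaml -- first run: compute stats from scratch
pretrained_norm_stats: null

# subsequent runs: reuse the stats generated by the first run, e.g.
# pretrained_norm_stats: runs/{task_name}/{run_id}/dataset_stats.json
```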
```
# LIBERO
bash scripts/train_zero1.sh 8 task=libero_uncond_2cam224_1e-4
# RoboTwin
bash scripts/train_zero1.sh 8 task=robotwin_uncond_3cam_384_1e-4
```

For LIBERO, we train on a single node with 8 GPUs. For RoboTwin, we use 64 GPUs to accelerate training. You can try reducing the GPU count or the number of training epochs.
## Inference with Your Trained Checkpoints

As above, the mujoco version should ideally stay consistent with the LIBERO data version. Then run LIBERO evaluation:
```
# LIBERO
python experiments/libero/run_libero_manager.py task={task_name} ckpt={ckpt_path}
```

For RoboTwin, we have already copied the evaluation-related code into `third_party/RoboTwin`.
You still need to follow the official RoboTwin instructions from the RoboTwin repository to finish installation and download the required assets, then create the policy symlink:

```
ln -sfn "$(pwd)/experiments/robotwin/fastwam_policy" "$(pwd)/third_party/RoboTwin/policy/fastwam_policy"
```

Then run RoboTwin evaluation:
```
python experiments/robotwin/run_robotwin_manager.py task={task_name} ckpt={ckpt_path}
```

Common `task_name` examples:

- `libero_uncond_2cam224_1e-4`
- `robotwin_uncond_3cam_384_1e-4`
## Acknowledgements

The RoboTwin evaluation code in this repository is adapted from the official RoboTwin repository. We thank the RoboTwin team for releasing their codebase and assets.
## BibTeX

If you find our work helpful, please consider citing:

```
@article{yuan2026fastwam,
  title={Fast-WAM: Do World Action Models Need Test-time Future Imagination?},
  author={Tianyuan Yuan and Zibin Dong and Yicheng Liu and Hang Zhao},
  journal={arXiv preprint arXiv:2603.16666},
  year={2026},
  url={https://arxiv.org/abs/2603.16666}
}
```