CALVIN Benchmark

CALVIN is a benchmark for evaluating vision-language models on long-horizon robotic manipulation tasks.

Method	Mode	Setting	AVG	CKPT
UniVLA	video sft	ABCD->D	4.63 (5x:4.71)	huggingface

Environment Setup

We follow the RoboVLMs repository for environment setup; this environment is needed only for evaluation. Set it up as follows:

# Install dependencies from the RoboVLMs reference directory
cd reference/RoboVLMs

# Install the required environment and download the CALVIN dataset
bash scripts/setup_calvin.sh

# Additional setup, needed only for the rendering environment
bash scripts/setup_calvin_vla.sh

# Check if the environment is set up correctly
python eval/calvin/env_test.py

Dataset Preparation

# 1. Process the dataset
python tools/process/calvin_process.py

# 2. Extract the VQ tokens (set the dataset and output paths first)
bash scripts/tokenizer/extract_vq_emu3.sh

# 3. Generate the pickle files for training
python tools/pickle_gen/pickle_generation_calvin.py

Model Training

FAST Tokenizer

Fit the FAST tokenizer on the corresponding dataset. You can also adjust the tokenizer's scale for more fine-grained tokenization.

python tools/action_tokenizer/fit_fast.py

bash scripts/simulator/calvin/train_calvin_abcd_video.sh
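To illustrate why the tokenizer scale controls granularity, here is a minimal sketch. It assumes the scale acts as the scale-and-round factor applied to DCT coefficients of an action chunk, as in the published FAST scheme; it is not the code in `fit_fast.py`. The function names, chunk shape, and scale values are hypothetical, and the BPE compression step is omitted.

```python
import numpy as np


def dct2_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index
    t = np.arange(n)[None, :]  # time index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    mat[0, :] /= np.sqrt(2.0)
    return mat


def quantize_chunk(actions: np.ndarray, scale: float) -> np.ndarray:
    """DCT along the time axis, multiply by scale, round to integers."""
    dct = dct2_matrix(actions.shape[0])
    return np.rint(scale * (dct @ actions)).astype(np.int64)


def reconstruct_chunk(coeffs: np.ndarray, scale: float) -> np.ndarray:
    """Invert the quantization: divide by scale, apply the inverse DCT."""
    dct = dct2_matrix(coeffs.shape[0])
    return dct.T @ (coeffs / scale)


# Hypothetical action chunk: 10 time steps, 3 action dimensions.
rng = np.random.default_rng(0)
steps = np.linspace(0.0, 1.0, 10)[:, None]
chunk = 0.5 * np.sin(2 * np.pi * steps * np.array([1.0, 2.0, 0.5]))
chunk = chunk + 0.01 * rng.standard_normal(chunk.shape)

errors = {}
for scale in (10.0, 100.0):  # illustrative values only
    coeffs = quantize_chunk(chunk, scale)
    recon = reconstruct_chunk(coeffs, scale)
    errors[scale] = float(np.abs(recon - chunk).max())
    print(f"scale={scale:g}: nonzero coeffs={np.count_nonzero(coeffs)}, "
          f"max abs error={errors[scale]:.5f}")
```

Under these assumptions, a larger scale keeps more nonzero coefficients and bounds the reconstruction error more tightly (each coefficient is off by at most 0.5 / scale), at the cost of a larger integer vocabulary to compress.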

Model Evaluation

cd reference/RoboVLMs

# Run inference on 8 GPUs
bash scripts/run_eval_calvin_univla.sh ${CKPT_PATH}

# The command above writes one result file per GPU (8 when using 8 GPUs) to the `results` folder.
# Compute the final average score:
python tools/evaluation/calvin_score.py
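The aggregation step can be sketched as follows. This assumes each per-GPU result reduces to a list with one integer per evaluation sequence, giving the number of consecutive tasks completed (0 to 5, as in the standard CALVIN protocol). The on-disk format read by `calvin_score.py` is not documented here, so the input structure and the `summarize` helper are hypothetical.

```python
from typing import Dict, List

NUM_TASKS_PER_CHAIN = 5  # CALVIN chains five instructions per sequence


def summarize(results_per_shard: List[List[int]]) -> Dict[str, float]:
    """Pool per-sequence results from all shards and compute summary metrics.

    Each inner list holds, for one shard (e.g. one GPU), the number of
    consecutive tasks completed in each evaluation sequence.
    """
    pooled = [n for shard in results_per_shard for n in shard]
    if not pooled:
        raise ValueError("no evaluation results found")
    total = len(pooled)
    summary = {"avg_len": sum(pooled) / total}
    # Success rate for completing at least k tasks in a row.
    for k in range(1, NUM_TASKS_PER_CHAIN + 1):
        summary[f"success@{k}"] = sum(n >= k for n in pooled) / total
    return summary


# Hypothetical results from three shards of unequal size.
shards = [[5, 5, 3], [4, 5], [0, 5, 2]]
summary = summarize(shards)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Pooling all sequences before averaging, rather than averaging the per-shard means, keeps the score correct when shards contain different numbers of sequences. The average length also equals the sum of the five success rates, which is a quick consistency check.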
