CALVIN Benchmark

CALVIN is a benchmark for evaluating vision-language models on long-horizon, language-conditioned robotic manipulation tasks.

| Method | Mode | Setting | AVG | CKPT |
|--------|------|---------|-----|------|
| UniVLA | video sft | ABCD->D | 4.63 (5x:4.71) | huggingface |
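For readers unfamiliar with the AVG column: CALVIN's long-horizon protocol chains five instructions per rollout and reports the average number of tasks completed in a row, so the maximum is 5. The sketch below shows how that number relates to the per-step success rates. It is a minimal illustration, not the benchmark's own scoring code; `chain_metrics` and the sample counts are hypothetical.

```python
from typing import List, Sequence, Tuple


def chain_metrics(completed: Sequence[int], chain_len: int = 5) -> Tuple[List[float], float]:
    """Return (success_rates, avg_len) for a set of chained-task rollouts.

    completed[i] is the number of consecutive tasks solved in rollout i
    (0..chain_len). success_rates[k-1] is the fraction of rollouts that
    solved at least k tasks in a row.
    """
    if not completed:
        raise ValueError("need at least one rollout")
    n = len(completed)
    rates = [sum(c >= k for c in completed) / n for k in range(1, chain_len + 1)]
    avg_len = sum(completed) / n
    return rates, avg_len


# Hypothetical rollout results, for illustration only.
rates, avg_len = chain_metrics([5, 5, 3, 5, 0, 4, 5, 5])
print(rates, avg_len)
```

The average length equals the sum of the five success rates, which is why the two are usually reported together.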

Environment Setup

Environment setup follows the RoboVLMs repository and is needed only for evaluation. Run the following steps:

# Install dependencies
cd reference/RoboVLMs

# Install the required environment and download the CALVIN dataset.
bash scripts/setup_calvin.sh

# Only needed for the rendering environment.
bash scripts/setup_calvin_vla.sh

# Check if the environment is set up correctly
python eval/calvin/env_test.py

Dataset Preparation

# 1. Process the dataset
python tools/process/calvin_process.py

# 2. Extract the VQ tokens (edit the dataset and output paths in the script first)
bash scripts/tokenizer/extract_vq_emu3.sh

# 3. Generate the pickle files used for training
python tools/pickle_gen/pickle_generation_calvin.py

Model Training

FAST Tokenizer

You can fit the FAST tokenizer on the corresponding dataset. You can also adjust the scale in the tokenizer to get more fine-grained tokenization.

# Fit the FAST action tokenizer
python tools/action_tokenizer/fit_fast.py

# Train on CALVIN ABCD in video mode
bash scripts/simulator/calvin/train_calvin_abcd_video.sh
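To give a feel for what the tokenizer scale does, here is a minimal sketch of the quantization stage as described for FAST: a DCT over the action sequence, followed by multiplying by a scale and rounding. It assumes the scale mentioned above is that multiplier. It is not the code in `fit_fast.py`, it omits the BPE compression step, and the helper names and the sample trajectory are made up for illustration.

```python
import math
from typing import List


def dct_ortho(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D signal (pure Python, O(n^2))."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        norm = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(norm * s)
    return coeffs


def idct_ortho(c: List[float]) -> List[float]:
    """Inverse of dct_ortho."""
    n = len(c)
    signal = []
    for t in range(n):
        s = math.sqrt(1.0 / n) * c[0]
        s += sum(
            math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        signal.append(s)
    return signal


def quantize(actions: List[float], scale: float) -> List[int]:
    """Scale the DCT coefficients and round them to integers."""
    return [round(c * scale) for c in dct_ortho(actions)]


def max_roundtrip_error(actions: List[float], scale: float) -> float:
    """Largest absolute error after quantizing and reconstructing."""
    restored = idct_ortho([q / scale for q in quantize(actions, scale)])
    return max(abs(a - b) for a, b in zip(actions, restored))


# Made-up 1-D action trajectory (one dimension of one action chunk).
actions = [0.0, 0.1, 0.25, 0.4, 0.5, 0.55, 0.5, 0.4]
for scale in (1, 10, 100):
    print(scale, quantize(actions, scale), round(max_roundtrip_error(actions, scale), 4))
```

A larger scale keeps more non-zero coefficients and lowers the reconstruction error, at the cost of a wider range of integer values to encode.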

Model Evaluation

cd reference/RoboVLMs

# Run inference on 8 GPUs
bash scripts/run_eval_calvin_univla.sh ${CKPT_PATH}

# The command above writes one result per GPU (8 when using 8 GPUs) to the `results` folder.
# Compute the final average score from them:
python tools/evaluation/calvin_score.py
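When combining per-GPU results by hand, note that a plain mean of the per-GPU averages is only correct if every GPU evaluated the same number of sequences. The sketch below weights each shard by its sequence count. It is an illustration under assumptions: the `(num_sequences, avg_len)` input format and the `merge_shards` helper are hypothetical and do not reflect the actual files in `results` or the internals of `calvin_score.py`.

```python
from typing import Iterable, Tuple


def merge_shards(shards: Iterable[Tuple[int, float]]) -> float:
    """Combine per-shard average lengths into one overall average.

    Each shard is (num_sequences, avg_len); shards are weighted by the
    number of sequences they evaluated.
    """
    shards = list(shards)
    total = sum(n for n, _ in shards)
    if total <= 0:
        raise ValueError("no evaluated sequences")
    return sum(n * avg for n, avg in shards) / total


# Hypothetical shard results, for illustration only.
print(merge_shards([(100, 4.0), (300, 5.0)]))
```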