MMRL: Multi-Modal Representation Learning for Vision-Language Models (CVPR 2025) & MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models (arXiv)
This repository provides the official PyTorch implementation for our CVPR 2025 paper:
MMRL: Multi-Modal Representation Learning for Vision-Language Models
and our arXiv extension:
MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models
📄 MMRL Paper Link
📄 MMRL++ Paper Link
- 🗓️ 2025/05/21: MMRL++ code is released!
- 🗓️ 2025/05/15: MMRL++ arXiv version is available.
- 🗓️ 2025/03/11: MMRL arXiv version is available.
- 🗓️ 2025/03/04: MMRL code is released!
- 🗓️ 2025/02/27: MMRL is accepted by CVPR 2025 🎉
MMRL and MMRL++ build upon CoOp and MaPLe. Please refer to the CoOp repository for dataset setup instructions. We sincerely appreciate their contributions!
To set up the runtime environment, you can follow the guidelines in the CoOp repository or use the step-by-step instructions below (recommended).
- Set up a conda environment (recommended).

```bash
# Create a conda environment
conda create -y -n mmrl python=3.10

# Activate the environment
conda activate mmrl

# Install torch (requires version >= 1.8.1) and torchvision
# Please refer to https://pytorch.org/ if you need a different CUDA version
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
```
- Install the Dassl library.

```bash
# Instructions borrowed from https://github.com/KaiyangZhou/Dassl.pytorch#installation

# Clone this repo
git clone https://github.com/KaiyangZhou/Dassl.pytorch.git
cd Dassl.pytorch/

# Install dependencies
pip install -r requirements.txt

# Install this library (no need to rebuild even if the source code is modified)
python setup.py develop
cd ..
```
- Clone the MMRL code repository.

```bash
git clone https://github.com/yunncheng/MMRL.git
cd MMRL/
```
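After the three steps above, you can optionally sanity-check the environment. The one-liner below is a minimal sketch that only assumes the packages installed above (`torch`, `torchvision`, and the `dassl` package provided by Dassl.pytorch):

```bash
# Optional check: confirm torch/torchvision are installed, Dassl is importable,
# and report whether a CUDA device is visible in this environment.
python -c "import torch, torchvision, dassl; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"
```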
We provide various scripts for different experimental settings. The main scripts are:

- `base_to_novel.sh` (Base-to-Novel Generalization)
- `cross_datasets.sh` (Cross-Dataset Evaluation and Domain Generalization)
- `few_shot.sh` (Few-Shot Learning)
- Detailed bash scripts in `scripts/mmrl` and `scripts/mmrlpp`

To run the experiments, navigate to the MMRL root directory and execute the corresponding script. Make sure to replace `DATA` with the path to your dataset in `scripts/mmrl` and `scripts/mmrlpp`.
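For example, a minimal sketch of pointing the scripts at your data, assuming each script defines a `DATA=...` variable near its top (as in CoOp-style scripts); `/path/to/datasets` is a placeholder:

```bash
# Either edit DATA by hand in each script you plan to run, or patch them in bulk
# (GNU sed shown; on macOS use `sed -i ''`). This assumes lines of the form
# `DATA=...` near the top of the scripts.
sed -i 's#^DATA=.*#DATA=/path/to/datasets#' scripts/mmrl/*.sh scripts/mmrlpp/*.sh
```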
Run the following command:
```bash
bash base_to_novel.sh
```
You can modify configurations in:
- `trainer/config.py`
- `configs/trainers/MMRL/vit_b16.yaml`
- `configs/trainers/MMRL/vit_b16_imagenet.yaml`
- `configs/trainers/MMRLpp/vit_b16.yaml`
- `configs/trainers/MMRLpp/vit_b16_imagenet.yaml`
Run the following command:
```bash
bash cross_datasets.sh
```
You can adjust configurations in:
- `trainer/config.py`
- `configs/trainers/MMRL/vit_b16_cross_datasets.yaml`
- `configs/trainers/MMRLpp/vit_b16_cross_datasets.yaml`
- `scripts/mmrl/cross_datasets_train.sh`
- `scripts/mmrl/cross_datasets_test.sh`
Note: Ensure that the `REP_DIM` value remains consistent between training on ImageNet and testing on other datasets when running MMRL.
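A quick way to check this (a sketch, run from the repository root; `REP_DIM` may be set in the trainer config, the YAML files, or the cross-dataset scripts, so this simply searches all of them):

```bash
# Locate every place REP_DIM is set, so the ImageNet training run and the
# cross-dataset/domain-generalization test runs can be kept on the same value.
grep -rn "REP_DIM" trainer/ configs/trainers/ scripts/mmrl/ scripts/mmrlpp/
```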
Run the following command:
```bash
bash few_shot.sh
```
Configurations can be adjusted in:
- `trainer/config.py`
- `configs/trainers/MMRL/vit_b16_few_shot.yaml`
- `configs/trainers/MMRL/vit_b16_imagenet.yaml`
- `configs/trainers/MMRLpp/vit_b16_few_shot.yaml`
- `configs/trainers/MMRLpp/vit_b16_imagenet.yaml`
MMRL++ is an extension of MMRL that introduces:
- Shared-Residual Representation Aligner (SRRA): A parameter-efficient design for gradient and information sharing.
- Progressive Representation Composition (PRC): Enhances intra-modal interaction via inter-layer instance-specific semantic flow.
It achieves stronger generalization with fewer trainable parameters while maintaining or improving performance across multiple benchmarks.
📄 Read the MMRL++ paper here: https://arxiv.org/abs/2505.10088
You can find the trained MMRL and MMRL++ model weights and corresponding log files at Model / Logs.
Please Note: We have fixed some naming bugs in the code while uploading the weights. Therefore, if you wish to use our trained weights, please ensure you are using the latest open-source code.
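As a reference, below is a hedged sketch of evaluating a downloaded checkpoint. Since MMRL builds on CoOp and MaPLe, it assumes this repository keeps CoOp's Dassl-style `train.py` interface (`--eval-only`, `--model-dir`, `--load-epoch`, and related flags); the dataset config file, epoch number, and paths are placeholders, and the bash scripts under `scripts/mmrl` and `scripts/mmrlpp` show the exact invocation used here.

```bash
# Sketch only: evaluate a downloaded MMRL checkpoint with a CoOp-style train.py.
# Adjust the dataset config, --load-epoch, and all paths to match your setup.
python train.py \
    --root /path/to/datasets \
    --trainer MMRL \
    --dataset-config-file configs/datasets/caltech101.yaml \
    --config-file configs/trainers/MMRL/vit_b16.yaml \
    --output-dir output/eval_example \
    --model-dir /path/to/downloaded/weights \
    --load-epoch 5 \
    --eval-only
```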
If you find this repository useful for your research, please consider citing:
@inproceedings{guo2025mmrl,
title={{MMRL}: Multi-Modal Representation Learning for Vision-Language Models},
author={Guo, Yuncheng and Gu, Xiaodong},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={25015--25025},
year={2025}
}
@misc{guo2025mmrlparameterefficientinteractionawarerepresentation,
title={MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models},
author={Yuncheng Guo and Xiaodong Gu},
year={2025},
eprint={2505.10088},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.10088},
}