Paper | Project Page | Model | Data
- 2025.3.16 The RAP dataset is now available. Access it here.🔥🔥
- 2025.2.27 RAP is accepted by CVPR 2025!🎉🎉
- 2024.11.24 Release code and model weights.
Introduce user-specific concepts to RAP-MLLM, and it can remember them and achieve excellent performance in a variety of personalized multimodal generation tasks.
Visit our Project Page for more demonstrations.
- Clone the repo into a local folder.
```bash
git clone https://github.com/Hoar012/RAP-MLLM.git
cd RAP-MLLM
```
- Install packages.
```bash
conda create -n rap python=3.10 -y
conda activate rap
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
pip install -r requirements.txt
```
Pretrained model weights are available on Hugging Face:
- RAP-LLaVA: RAP-LLaVA-13b
- RAP-Phi3-V: RAP-Phi3-mini
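If you prefer to fetch the weights programmatically, here is a minimal sketch using the `huggingface_hub` package (using it at all is an assumption; downloading through the Hugging Face web UI works just as well):

```python
# Minimal sketch: download RAP-LLaVA weights from Hugging Face.
# Assumes `pip install huggingface_hub`; returns the local cache path.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Hoar012/RAP-LLaVA-13b")
print(f"Weights downloaded to: {local_dir}")
```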
Build Your Personal Database:
Each concept record in the database can be structured with the following format:
```json
{
    "concept_dict": {
        "<concept>": {
            "name": "concept_name",
            "image": "image_path",
            "info": "",
            "category": ""
        }
    },
    "path_to_concept": {
        "image_path": "<concept>"
    }
}
```
We provide an example of the database in `example_database`.
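For reference, here is a minimal Python sketch that assembles a database file in the format above. The concept token, image path, and output location are placeholders, and storing the database as a single JSON file is an assumption; see `example_database` for the layout the repo actually uses:

```python
import json
import os

# Minimal sketch: build a personal database matching the schema above.
# The concept token, paths, and filenames here are placeholders.
concepts = {
    "<my-dog>": {
        "name": "my-dog",
        "image": "images/my-dog/0.jpg",
        "info": "A golden retriever named Bo.",
        "category": "dog",
    },
}

database = {
    "concept_dict": concepts,
    # Reverse index from image path to concept token, per the schema above.
    "path_to_concept": {v["image"]: k for k, v in concepts.items()},
}

os.makedirs("my_database", exist_ok=True)
with open("my_database/database.json", "w") as f:
    json.dump(database, f, indent=4)
```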
CLI Demo:
```bash
python cli.py --model-path Hoar012/RAP-LLaVA-13b --image-file /path/to/test_image --retrieval --database example_database --topK 1
```
Please check Data for more details.
We provide the training scripts with DeepSpeed below. Try training on your own dataset!
Model | RAP-LLaVA | RAP-Phi3-V | LLaVA-LoRA |
---|---|---|---|
Script | script | script | script |
Please download the test data used in the paper from the repositories of MyVLM and Yo'LLaVA.
```bash
python eval/caption.py --eval-file /path/to/eval_file --model-path Hoar012/RAP-LLaVA-13b --retrieval --database /path/to/database --topK 2
```
The `eval-file` records the image paths to be evaluated and their corresponding target concepts, formatted as follows:
```json
{
    "/path/to/image": [
        "target_concept"
    ]
}
```
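Such an eval file can be generated with a short script; the image paths and concept names below are placeholders:

```python
import json

# Minimal sketch: write an eval file mapping each test image to its
# target concept(s). All paths and names here are placeholders.
eval_entries = {
    "/data/test/my-dog/1.jpg": ["my-dog"],
    "/data/test/my-mug/3.jpg": ["my-mug"],
}

with open("eval_file.json", "w") as f:
    json.dump(eval_entries, f, indent=4)
```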
```bash
python eval/VQA.py --eval-file eval/yollava-visual-qa.json --model-path Hoar012/RAP-LLaVA-13b --retrieval --database /path/to/database --topK 1
```

```bash
python eval/recognition.py --eval-file eval/recognition_test.json --model-path Hoar012/RAP-LLaVA-13b --retrieval --database /path/to/database --topK 1
```
```bibtex
@InProceedings{Hao_2025_CVPR,
    author    = {Hao, Haoran and Han, Jiaming and Li, Changsheng and Li, Yu-Feng and Yue, Xiangyu},
    title     = {RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {14538-14548}
}
```