Reproducible scaling laws for contrastive language-image learning [arXiv]
by Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, Jenia Jitsev [arXiv:2212.07143] (CVPR 2023)
In this repository, we provide the code for reproducing the experiments on large-scale CLIP pre-training and transfer to various downstream tasks from the paper "Reproducible scaling laws for contrastive language-image learning", together with pointers to the open artefacts from this study (pre-trained models and all intermediate checkpoints).
For follow-up work, see "Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets" (arXiv:2506.04598) and its research repo.
Pre-trained openCLIP models across various scales, final checkpoints: https://huggingface.co/laion/scaling-laws-openclip/tree/main
All intermediate checkpoints for the pre-trained models (useful for studying training dynamics): https://huggingface.co/laion/scaling-laws-openclip/tree/main/full_checkpoints
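For programmatic access (e.g. to study training dynamics across checkpoints), the files can also be fetched with the huggingface_hub library. The snippet below is only a sketch under the assumption that the intermediate checkpoints live under full_checkpoints/ in the repository linked above; it lists the available files first instead of hard-coding any file name:

```python
from huggingface_hub import hf_hub_download, list_repo_files

# List checkpoint files in the model repository (intermediate checkpoints
# are assumed to sit under full_checkpoints/, as linked above).
files = [f for f in list_repo_files("laion/scaling-laws-openclip")
         if f.startswith("full_checkpoints/")]
print(files[:10])  # inspect the naming scheme, then pick the checkpoint you need

# Download one checkpoint; the resulting local path can then be passed to
# open_clip.create_model_and_transforms(..., pretrained=local_path).
local_path = hf_hub_download("laion/scaling-laws-openclip", files[0])
print("Downloaded to:", local_path)
```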
You may also want to check out these related resources:
- the OpenCLIP repository, which points to the pre-trained models used in this study
- the LAION-400m and LAION-5B composition instructions; these are the datasets used for openCLIP pre-training in this study
- The instructions are also valid for Re-LAION, the updated safety revision of those datasets. Follow Re-LAION-research or Re-LAION-research-safe to obtain the Re-LAION datasets for openCLIP experiments.
- CLIP Benchmarking, the transfer evaluation used in this study
To reproduce scaling plots from the paper, see the figures notebook.
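The scaling fits in the paper take the form of a power law relating downstream error E to pre-training compute C, roughly E = β·C^α. As a minimal, generic sketch of how such a fit works (the data points below are made-up placeholders, and this is not the exact fitting procedure used in the figures notebook), the fit reduces to linear regression in log-log space:

```python
import numpy as np

# Hypothetical (compute, error) pairs -- placeholders, not results from the paper.
compute = np.array([1e9, 1e10, 1e11, 1e12])   # total pre-training compute (arbitrary units)
error = np.array([0.60, 0.45, 0.34, 0.26])    # downstream zero-shot error

# A power law E = beta * C**alpha is linear in log-log space:
# log(E) = log(beta) + alpha * log(C), so fit a degree-1 polynomial.
alpha, log_beta = np.polyfit(np.log(compute), np.log(error), 1)
print(f"alpha = {alpha:.3f}, beta = {np.exp(log_beta):.3g}")
```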
First, you need to clone the repo and install the requirements.
git clone https://github.com/LAION-AI/scaling-laws-openclip
cd scaling-laws-openclip
pip install -r requirements.txt
We provide a script, download_models.py, to download all pre-trained models used in the paper. To download all 29 models, use:
python download_models.py
You can also download a subset of the models. For instance:
python download_models.py --samples_seen 3B 13B --model ViT-B-32 --data 80M 400M 2B
will download only the ViT-B/32 models trained for 3B or 13B samples seen on any of the 80M/400M/2B LAION datasets.
Once you have downloaded the pre-trained models, you can use them directly in OpenCLIP. The following is an example with ViT-H/14.
First, you need to download the model:
> python download_models.py --samples_seen 34B --model ViT-H-14 --data 2B
'Model-H-14_Data-2B_Samples-34B_lr-5e-4_bs-79k.pt' downloaded.
Once the model is downloaded, it can be used directly in OpenCLIP:
import torch
import open_clip

# Load the downloaded checkpoint into an OpenCLIP ViT-H/14 model,
# together with the matching image preprocessing transforms.
model, _, preprocess = open_clip.create_model_and_transforms('ViT-H-14', pretrained='Model-H-14_Data-2B_Samples-34B_lr-5e-4_bs-79k.pt')
For a complete example, see the inference notebook.
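As a minimal sketch of zero-shot classification with this checkpoint (the image path and candidate labels below are placeholders, and this mirrors the standard OpenCLIP usage pattern rather than the exact code in the notebook):

```python
import torch
from PIL import Image
import open_clip

# Load the downloaded ViT-H/14 checkpoint and its preprocessing transforms.
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-H-14', pretrained='Model-H-14_Data-2B_Samples-34B_lr-5e-4_bs-79k.pt')
tokenizer = open_clip.get_tokenizer('ViT-H-14')
model.eval()

# Placeholder inputs: replace with your own image and candidate labels.
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between normalized embeddings, turned into probabilities.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probabilities:", probs)
```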
If you find this work helpful, please cite our paper:
@inproceedings{cherti2023reproducible,
title={Reproducible scaling laws for contrastive language-image learning},
author={Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={2818--2829},
year={2023}
}
We would like to express our gratitude to all the people who work on making code, models and data publicly available, advancing community-based research and making research more reproducible. In particular, we thank all the members of the LAION Discord server community, whose effort to compose the LAION-400m and LAION-5B datasets was pivotal for this study, and OpenAI for making their pre-trained CLIP models publicly available. We also thank Hugging Face for providing hosting space for open datasets and models, and Stability AI for providing supercomputing resources and storage space.
The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. for funding this work by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS Booster at Jülich Supercomputing Centre (JSC). We also acknowledge storage resources on JUST granted and operated by JSC, as well as computing resources from the Helmholtz Data Federation (HDF). Further thanks go to the JSC supercomputing facility administration team for their support, especially to Damian Alvarez for his endurance and patience during the long "de-micing" sessions on JUWELS Booster.
Mehdi Cherti and Jenia Jitsev acknowledge partial funding by the Federal Ministry of Education and Research of Germany under BMBF grant no. 01IS22094B WestAI - AI Service Center West.
Special thanks also go to Richard Vencu (LAION, Stability AI) for his ongoing dedication to enabling an HPC system, and the infrastructure around it, that can be used by a broad community of researchers and citizen scientists.