diff --git a/docs/source/models/supported_models.md b/docs/source/models/supported_models.md
index 0fdffbeefd0..13161cb477f 100644
--- a/docs/source/models/supported_models.md
+++ b/docs/source/models/supported_models.md
@@ -1107,6 +1107,10 @@ This limitation exists because the model's mixed attention pattern (bidirectiona
 To use `TIGER-Lab/Mantis-8B-siglip-llama3`, you have to pass `--hf_overrides '{"architectures": ["MantisForConditionalGeneration"]}'` when running vLLM.
 :::
 
+:::{warning}
+For improved output quality of `AllenAI/Molmo-7B-D-0924` (especially in object localization tasks), we recommend using the pinned dependency versions listed in <gh-file:requirements/molmo.txt> (including `vllm==0.7.0`). These versions match the environment that achieved consistent results on both A10 and L40 GPUs.
+:::
+
 :::{note}
 The official `openbmb/MiniCPM-V-2` doesn't work yet, so we need to use a fork (`HwwwH/MiniCPM-V-2`) for now. For more details, please see:
diff --git a/requirements/molmo.txt b/requirements/molmo.txt
new file mode 100644
index 00000000000..8450e29b6e7
--- /dev/null
+++ b/requirements/molmo.txt
@@ -0,0 +1,20 @@
+# Core dependencies for the Molmo accuracy setup with vLLM (tested on L40)
+torch==2.5.1
+torchvision==0.20.1
+transformers==4.48.1
+tokenizers==0.21.0
+tiktoken==0.7.0
+vllm==0.7.0
+
+# Optional but recommended for improved performance and stability
+triton==3.1.0
+xformers==0.0.28.post3
+uvloop==0.21.0
+protobuf==5.29.3
+openai==1.60.2
+opencv-python-headless==4.11.0.86
+pillow==10.4.0
+
+# FlashAttention (used for float16 only)
+flash-attn>=2.5.6  # Not used for float32, but pinned here for completeness
+
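
As a quick sanity check of the pinned environment (installed with `pip install -r requirements/molmo.txt`), a minimal offline-inference sketch is below. This is an illustrative smoke test, not part of the diff: the prompt and sampling values are assumptions, and `allenai/Molmo-7B-D-0924` is the Hugging Face repo id for the model named in the warning.

```python
# Illustrative smoke test for the pinned Molmo setup; the prompt and
# sampling values are assumptions, not part of this PR.
from vllm import LLM, SamplingParams

llm = LLM(
    model="allenai/Molmo-7B-D-0924",  # model named in the warning above
    trust_remote_code=True,           # Molmo ships custom modeling code
    dtype="float16",                  # flash-attn applies to float16 only (see molmo.txt)
)

outputs = llm.generate(
    "Describe the scene in one sentence.",
    SamplingParams(temperature=0.0, max_tokens=64),  # deterministic, short output
)
print(outputs[0].outputs[0].text)
```

For the object localization use case mentioned in the warning, pair the prompt with an image via vLLM's multimodal inputs; the text-only call above only verifies that the pinned versions load and generate.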