vllm-project · vllm-bot · Apr 24, 2025 · Apr 23, 2025
@@ -1081,6 +1081,12 @@ Pan-and-scan image pre-processing is currently supported on V0 (but not V1).
 You can enable it by passing `--mm-processor-kwargs '{"do_pan_and_scan": True}'`.
 :::
 
+### AllenAI Molmo-7B-D-0924 (multi-modal)
+
+⚠️ Accuracy Note: For better output quality, especially in object localization tasks, we recommend installing the pinned dependency versions listed in [`requirements/molmo.txt`](https://github.com/vllm-project/vllm/blob/main/requirements/molmo.txt).  
+These versions match the environment that produced consistent results on both A10 and L40 GPUs.  
+_Note: These pins include `vllm==0.7.0`, the version this setup currently works with._
+
 :::{warning}
 Both V0 and V1 support `Gemma3ForConditionalGeneration` for text-only inputs.
 However, there are differences in how they handle text + image inputs:

diff --git a/requirements/molmo.txt b/requirements/molmo.txt
@@ -0,0 +1,20 @@
+# Core dependencies pinned for Molmo accuracy with vLLM (tested on L40)
+torch==2.5.1
+torchvision==0.20.1
+transformers==4.48.1
+tokenizers==0.21.0
+tiktoken==0.7.0
+vllm==0.7.0
+
+# Optional but recommended for improved performance and stability
+triton==3.1.0
+xformers==0.0.28.post3
+uvloop==0.21.0
+protobuf==5.29.3
+openai==1.60.2
+opencv-python-headless==4.11.0.86
+pillow==10.4.0
+
+# FlashAttention (used for float16 only; not used in float32)
+flash-attn>=2.5.6
+
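
Since the accuracy note depends on the environment matching the pins exactly, a quick check of the installed versions can help before running Molmo. The following is a minimal sketch, not part of this PR: the helper names (`parse_pins`, `installed_version`, `find_mismatches`) are hypothetical, and the pin list is copied from the core section of `requirements/molmo.txt` above. It assumes that a local build suffix such as `+cu124` on an installed version should be ignored when comparing.

```python
from importlib import metadata

# Core pins copied from requirements/molmo.txt in this PR.
PINS_TEXT = """\
torch==2.5.1
torchvision==0.20.1
transformers==4.48.1
tokenizers==0.21.0
tiktoken==0.7.0
vllm==0.7.0
"""


def parse_pins(text):
    """Return {package: version} for every `name==version` line."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blanks and non-exact specifiers such as `>=`
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version without a local suffix, or None."""
    try:
        # Assumption: "2.5.1+cu124" should count as matching "2.5.1".
        return metadata.version(name).split("+", 1)[0]
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (pinned, found)} for every package that differs."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(parse_pins(PINS_TEXT)).items():
        print(f"{name}: pinned {wanted}, installed {found or 'not installed'}")
```

Running the script prints one line per package whose installed version differs from the pin and prints nothing when the environment matches. The `lookup` parameter exists only so the comparison can be exercised without the packages installed.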