[vLLM] add vLLM offline inference #47

Merged
aikx merged 6 commits into baaivision:main from zhaoyinglia:inference_vllm
Nov 19, 2025
Conversation


@zhaoyinglia zhaoyinglia commented Nov 19, 2025

PR Description

Add vLLM backend support to enable efficient inference for Emu3.5 AR.
New features include:

  • A batch scheduler that coordinates cond_input and uncond_input requests.
  • A custom logits processor: ClassifierFreeGuidanceLogitsForVisualTokenProcessor.
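
The processor's name suggests standard classifier-free guidance applied to the visual-token logits. As a minimal sketch (the function and argument names here are illustrative, not taken from the PR), the core blend of conditional and unconditional logits looks like this:

```python
def cfg_combine(cond_logits, uncond_logits, guidance_scale):
    """Classifier-free guidance blend:
    guided = uncond + scale * (cond - uncond).

    With scale == 1.0 this reduces to the conditional logits;
    larger scales push further toward the conditional branch.
    Plain lists are used here for clarity; the real processor
    operates on torch tensors inside vLLM's sampling loop.
    """
    return [u + guidance_scale * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]
```

This is why the batch scheduler must keep cond_input and uncond_input in lockstep: both sets of logits are needed at every decoding step before a visual token can be sampled.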

Performance is presented below, as reported in the technical report.

[Screenshot (2025-11-19): performance table from the technical report]
  • Usage
# Requires Python 3.12 or higher.
pip install -r requirements/vllm.txt # vllm==0.11.0, torch==2.8.0+cu128
pip install flash_attn==2.8.3 --no-build-isolation

cd Emu3.5
python src/patch/apply.py # apply all *.patch files based on vllm-0.11.0
# 🖼️ Text-to-Image (T2I) task
CUDA_VISIBLE_DEVICES=0,1 python inference_vllm.py --cfg configs/example_config_t2i.py

# 🔄 Any-to-Image (X2I) task
CUDA_VISIBLE_DEVICES=0,1 python inference_vllm.py --cfg configs/example_config_x2i.py

# 🎯 Visual Guidance task
CUDA_VISIBLE_DEVICES=0,1 python inference_vllm.py --cfg configs/example_config_visual_guidance.py

# 📖 Visual Narrative task
CUDA_VISIBLE_DEVICES=0,1 python inference_vllm.py --cfg configs/example_config_visual_narrative.py
  • Note:
    vLLM's gpu_memory_utilization for kv_cache defaults to 0.7 on an 80GiB device. Adjust as needed for your hardware.
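
For reference, vLLM's engine exposes gpu_memory_utilization as a constructor argument (the fraction of GPU memory reserved for weights plus KV cache). How Emu3.5's example configs surface this knob is not shown in the PR, so the snippet below is only an illustrative config fragment with assumed values:

```python
# Illustrative only: the model path and parallelism values are assumptions.
# gpu_memory_utilization is a real vLLM LLM/EngineArgs parameter; lowering it
# shrinks the KV-cache reservation on smaller GPUs.
from vllm import LLM

llm = LLM(
    model="path/to/emu3.5",      # hypothetical checkpoint path
    tensor_parallel_size=2,      # matches CUDA_VISIBLE_DEVICES=0,1 above
    gpu_memory_utilization=0.5,  # lower than the 0.7 used on 80 GiB devices
)
```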

@aikx aikx merged commit 1e40e5b into baaivision:main Nov 19, 2025