fix: complete RayClusterFleet example for multi-node vLLM inference #954
Summary
This PR updates the existing RayClusterFleet example for qwen-coder-7b-instruct to make it fully functional for multi-node inference with vLLM and Ray.
What's fixed or added:
Corrected the command and args for the Ray head pod so that vllm serve starts correctly (see the pod spec sketch below)
Explicitly set tensor-parallel-size and distributed-executor-backend so vLLM shards the model across Ray workers
Added the aibrix-runtime sidecar container
Added the missing Service exposing the vLLM port (8000) (see the Service sketch below)
Added an HTTPRoute to route OpenAI-compatible requests through the Gateway API (see the HTTPRoute sketch below)
Added the environment variable PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to mitigate GPU memory fragmentation
Ensured Ray workers are correctly bootstrapped and terminated
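For orientation, here is a minimal sketch of the head-pod container spec after this change, assuming the head container starts Ray and then launches vllm serve in the same shell. The image tags, model name, and GPU counts are illustrative, not copied verbatim from the example:

```yaml
# Sketch of the Ray head pod template (fragment; values illustrative).
containers:
  - name: ray-head
    image: vllm/vllm-openai:latest            # assumption: any vLLM image with Ray support
    command: ["/bin/bash", "-lc"]
    args:
      # Start the Ray head, then launch the OpenAI-compatible server.
      # --tensor-parallel-size should match the number of GPUs taking part
      # in serving; the ray backend lets vLLM place workers on other pods.
      - >
        ray start --head --port=6379 --dashboard-host=0.0.0.0 &&
        vllm serve Qwen/Qwen2.5-Coder-7B-Instruct
        --served-model-name qwen-coder-7b-instruct
        --tensor-parallel-size 2
        --distributed-executor-backend ray
        --port 8000
    env:
      # Keeps the CUDA caching allocator from fragmenting on long runs.
      - name: PYTORCH_CUDA_ALLOC_CONF
        value: "expandable_segments:True"
    ports:
      - containerPort: 8000
    resources:
      limits:
        nvidia.com/gpu: 1
  # Sidecar added by this PR; provides AIBrix runtime functionality
  # (metrics/model management) alongside the engine.
  - name: aibrix-runtime
    image: aibrix/runtime:latest              # assumption: actual tag may differ
```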
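The Service is a plain ClusterIP service on port 8000. The name and selector below follow AIBrix's model.aibrix.ai/name label convention but are assumptions rather than the exact values in the diff:

```yaml
# Sketch of the Service exposing vLLM's OpenAI-compatible port.
apiVersion: v1
kind: Service
metadata:
  name: qwen-coder-7b-instruct
  labels:
    model.aibrix.ai/name: qwen-coder-7b-instruct
spec:
  selector:
    model.aibrix.ai/name: qwen-coder-7b-instruct   # assumption: pods carry this label
  ports:
    - name: serve
      protocol: TCP
      port: 8000
      targetPort: 8000
```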
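The HTTPRoute then attaches that Service to the cluster's Gateway so OpenAI-compatible requests reach vLLM. The gateway name aibrix-eg and the /v1 path match are assumptions for illustration:

```yaml
# Sketch of the HTTPRoute wiring the Service into the Gateway API.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: qwen-coder-7b-instruct-router      # illustrative name
spec:
  parentRefs:
    - name: aibrix-eg                      # assumption: AIBrix's shared Envoy gateway
      namespace: aibrix-system
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1                     # OpenAI-compatible endpoints
      backendRefs:
        - name: qwen-coder-7b-instruct
          port: 8000
```

With the route in place, a request such as POST /v1/chat/completions against the gateway address should be forwarded to the vLLM server on port 8000.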
Why this matters
The original example was missing necessary components, namely the Service and the routing layer, and lacked Ray-aware vLLM initialization, so it could not be used as-is. This update makes it a solid reference for users running distributed inference across multiple GPU nodes.
Tested and validated on an EKS cluster with 3x g4dn.2xlarge nodes.