
Conversation

@ModiIntel (Contributor)

Summary
This PR updates the existing RayClusterFleet example for qwen-coder-7b-instruct to make it fully functional for multi-node inference with vLLM and Ray.

What's fixed or added (a minimal sketch of these pieces follows the list):

- Corrected the command and args for the Ray head pod so it properly starts vllm serve
- Explicitly set tensor-parallel-size and distributed-executor-backend
- Added the aibrix-runtime sidecar container
- Added the missing Service to expose port 8000 (vLLM)
- Added an HTTPRoute to route OpenAI-compatible requests via the Gateway API
- Added the environment variable PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to mitigate memory fragmentation
- Ensured Ray workers are correctly bootstrapped and terminated
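
For readers who want to see how these pieces fit together, here is a minimal sketch under stated assumptions: the model ID, image tags, Service/HTTPRoute names, the Gateway reference (aibrix-eg in aibrix-system), label selectors, and the tensor-parallel size are illustrative and may differ from what was merged; the example file in the repository remains the authoritative reference.

```yaml
# Sketch only: field values below are illustrative assumptions, not copied
# verbatim from the merged example.

# Fragment of the Ray head-group pod template: start vLLM with Ray as the
# distributed executor and set the allocator env var to reduce fragmentation.
containers:
- name: ray-head
  image: vllm/vllm-openai:latest            # assumed image
  command: ["/bin/bash", "-lc", "--"]
  args:
  - >-
    vllm serve Qwen/Qwen2.5-Coder-7B-Instruct
    --port 8000
    --tensor-parallel-size 2
    --distributed-executor-backend ray
  env:
  - name: PYTORCH_CUDA_ALLOC_CONF
    value: expandable_segments:True
- name: aibrix-runtime                      # management/metrics sidecar
  image: aibrix/runtime:nightly             # assumed tag
---
# Service exposing the vLLM OpenAI-compatible server running on the head pod.
apiVersion: v1
kind: Service
metadata:
  name: qwen-coder-7b-instruct              # assumed; must match the HTTPRoute backendRef
spec:
  selector:
    ray.io/node-type: head                  # assumed label selecting the Ray head pod
  ports:
  - name: serve
    port: 8000
    targetPort: 8000
---
# HTTPRoute forwarding OpenAI-compatible paths through the Gateway API.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: qwen-coder-7b-instruct
spec:
  parentRefs:
  - name: aibrix-eg                         # assumed AIBrix Gateway name
    namespace: aibrix-system                # assumed namespace
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1
    backendRefs:
    - name: qwen-coder-7b-instruct
      port: 8000
```

With these in place, a client can POST to the gateway's /v1/chat/completions (or /v1/completions) endpoint and the request is routed to port 8000 on the Ray head pod, where vllm serve coordinates the worker nodes through Ray.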

Why this matters
The original example didn't include necessary components like the Service or routing layer and lacked Ray-aware vLLM initialization. This update makes it a solid reference for users running distributed inference across multiple GPU nodes.

Tested and validated on an EKS cluster with 3x g4dn.2xlarge nodes.

@ModiIntel force-pushed the feat/rayclusterfleet-distributed-inference-example branch from 033292a to 014c2a5 on April 8, 2025 09:16
@ModiIntel (Contributor, Author)

fix: complete RayClusterFleet example for multi-node vLLM inference

@Jeffwan (Collaborator) left a comment:


thanks! this is super helpful.

/lgtm

@Jeffwan merged commit c94029b into vllm-project:main on Apr 8, 2025
3 checks passed
@ModiIntel deleted the feat/rayclusterfleet-distributed-inference-example branch on April 9, 2025 07:44
gangmuk pushed a commit to gangmuk/aibrix-gangmuk that referenced this pull request Jun 21, 2025
Yaegaki1Erika pushed a commit to Yaegaki1Erika/aibrix that referenced this pull request Jul 23, 2025