Your current environment
4 DGX nodes with 8×H100 each, running current main of vLLM.
🐛 Describe the bug
I am trying to run DeepSeek-R1 (DSR1) with 32-way data-parallel expert parallelism (DEP32) using the DeepEP high-throughput (HT) kernels. My launch command looks like this (on node 2):
VLLM_ALL2ALL_BACKEND="deepep_high_throughput" \
VLLM_USE_DEEP_GEMM=1 \
VLLM_RANDOMIZE_DP_DUMMY_INPUTS=1 \
vllm serve deepseek-ai/DeepSeek-R1 \
--data_parallel_size 32 \
--data-parallel-size-local 8 \
--enable-expert-parallel \
--max-model-len 10240 \
--enforce-eager \
--data-parallel-address eos0391 \
--data-parallel-rpc-port 13345 \
--data-parallel-start-rank 8 \
--headless \
| tee ./dsr1_dep32_node2.log
I am able to run with a max model len of 128, but not even 10240 on 32 H100s due to OOM, which doesn't seem right. The low-latency kernel works fine. Is this expected?
I saw that #19298 addressed this issue, but I still get OOM errors @varun-sundar-rabindranath.
I have attached the logs; the OOM error appears in the node 4 log.
dsr1_dep32_node4.log
dsr1_dep32_node3.log
dsr1_dep32_node2.log
dsr1_dep32_node1.log
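For reference, here is a back-of-envelope estimate of the KV-cache cost of the requested context length. This is only a sketch: the MLA dimensions (kv_lora_rank=512, qk_rope_head_dim=64, 61 layers) are taken from the published DeepSeek-V3/R1 architecture, and bf16 storage is an assumption (an fp8 KV cache would halve it). It suggests the jump from 128 to 10240 tokens should cost well under 1 GiB per sequence, so the OOM is unlikely to be explained by KV cache alone:

```python
# Hedged sketch: MLA KV-cache size per token for DeepSeek-R1.
# MLA caches only the compressed latent (kv_lora_rank) plus the
# decoupled RoPE key (qk_rope_head_dim) per layer, not full K/V heads.
KV_LORA_RANK = 512       # from the DeepSeek-V3/R1 config
QK_ROPE_HEAD_DIM = 64    # from the DeepSeek-V3/R1 config
NUM_LAYERS = 61          # from the DeepSeek-V3/R1 config
BYTES_PER_ELEM = 2       # assumption: bf16 cache (fp8 would be 1 byte)

per_token = (KV_LORA_RANK + QK_ROPE_HEAD_DIM) * NUM_LAYERS * BYTES_PER_ELEM
per_seq = per_token * 10240  # one full 10240-token sequence

print(f"per token: {per_token} bytes (~{per_token / 1024:.1f} KiB)")
print(f"per 10240-token sequence: {per_seq / 2**30:.2f} GiB")
# prints roughly 68.6 KiB per token and ~0.67 GiB per sequence
```

If this arithmetic is right, the extra memory pressure with the HT kernels more likely comes from activation/dispatch buffers sized by max-model-len rather than from the KV cache itself.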
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.