
[V1][Frontend] Improve Shutdown And Logs #11737


Merged: 172 commits into vllm-project:main on Apr 17, 2025

Conversation

@robertgshaw2-redhat (Collaborator) commented Jan 4, 2025

SUMMARY:

  • Prior to this PR, if we encountered an error in a background process, we killed the process tree immediately, which meant that we could not clean up resources or return good status codes to clients. This PR overhauls the error handling to instead shut down the background processes and raise errors that let us return proper HTTP status codes to users (a rough sketch of this behavior follows this list).
  • Prior to this PR, we did not shut down properly when errors occurred during startup, especially in the TP case.
  • Prior to this PR, we used signals to catch errors from background processes. Due to limitations of Python, this prevented us from running outside the main thread, which is a problem for deployments in Triton Inference Server.
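A minimal sketch of the intended client-facing behavior, assuming a FastAPI app and an illustrative EngineDeadError exception type (neither name is taken from the actual vLLM code): an engine failure surfaces as a well-formed HTTP response instead of a killed process tree.

```python
# Hypothetical sketch, not the actual vLLM code: map an "engine dead" failure
# to a proper HTTP status code so clients see an error response rather than a
# dropped connection, and the server can then shut down cleanly.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()


class EngineDeadError(RuntimeError):
    """Illustrative error type raised when the background engine process has died."""


@app.exception_handler(EngineDeadError)
async def engine_dead_handler(request: Request, exc: EngineDeadError):
    # Return a well-formed 500 to the client instead of killing the process tree.
    return JSONResponse(status_code=500, content={"error": str(exc)})


@app.post("/v1/completions")
async def completions(request: Request):
    # Simulated failure path for the sketch.
    raise EngineDeadError("engine background process failed")
```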

DESIGN:

  • For errors during startup, we wrap the __init__ code in try/except and push FAILED over the ready pipe. This works well since the parent processes are already waiting for confirmation (see the sketch after this list).
  • For errors during runtime, we wrap the busy loops in try/except and push failure messages over the existing IPC mechanisms.
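
A minimal sketch of the two error paths using the standard multiprocessing module; the READY/FAILED markers, worker_main, and the simulated failure are illustrative names, not the actual vLLM identifiers.

```python
import multiprocessing as mp

READY, FAILED = b"READY", b"FAILED"


def worker_main(ready_writer, outputs):
    # Startup path: wrap the __init__-style work in try/except and report the
    # result over the ready pipe, since the parent is blocked waiting for it.
    try:
        state = {"step": 0}  # stand-in for expensive engine/worker initialization
        ready_writer.send_bytes(READY)
    except Exception:
        ready_writer.send_bytes(FAILED)
        return
    finally:
        ready_writer.close()

    # Runtime path: wrap the busy loop in try/except and push a failure message
    # over the existing IPC channel instead of dying silently.
    try:
        while True:
            state["step"] += 1
            if state["step"] == 3:
                raise RuntimeError("simulated engine failure")
            outputs.put(("OK", state["step"]))
    except Exception as e:
        outputs.put(("ENGINE_DEAD", repr(e)))


if __name__ == "__main__":
    reader, writer = mp.Pipe(duplex=False)
    outputs = mp.Queue()
    proc = mp.Process(target=worker_main, args=(writer, outputs))
    proc.start()
    writer.close()

    # Parent blocks on the ready pipe; a FAILED marker surfaces startup errors.
    if reader.recv_bytes() != READY:
        proc.join()
        raise RuntimeError("background process failed during startup")

    # Drain outputs; an ENGINE_DEAD message becomes an error the frontend can
    # translate into a proper HTTP status code before shutting down.
    while True:
        kind, payload = outputs.get(timeout=10)
        if kind == "ENGINE_DEAD":
            proc.join()
            raise RuntimeError(f"background process failed: {payload}")
```

The key property is that the parent waits on the ready pipe during startup and on the output queue during runtime, so both kinds of failure are observed promptly and can be raised as errors rather than handled by killing the process tree.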

One weakness is that failures of the IPC mechanisms themselves are not handled explicitly.

  • Curious if anyone has ideas on this.
  • This can be a follow-on task.

TEST MATRIX:

  • AsyncLLM, TP=1 and TP>1 --- runtime and startup
  • LLM (MP), TP=1 and TP>1 --- runtime and startup
  • LLM (no-MP), TP=1 and TP>1 --- runtime and startup

Fixes: #12690


github-actions bot commented Jan 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which starts a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to be added to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added the frontend label Jan 4, 2025

mergify bot commented Jan 4, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @robertgshaw2-neuralmagic.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 4, 2025
@mergify mergify bot removed the needs-rebase label Jan 4, 2025
@robertgshaw2-redhat robertgshaw2-redhat marked this pull request as ready for review January 4, 2025 16:29
@robertgshaw2-redhat robertgshaw2-redhat changed the title [Frontend] Improve API Server Error Messages [Frontend] Improve API Server Error Logs Jan 4, 2025
@robertgshaw2-redhat robertgshaw2-redhat changed the title [Frontend] Improve API Server Error Logs [V1][Frontend] Improve Error Handling Shutdown And Logs Jan 4, 2025
@robertgshaw2-redhat (Collaborator, Author) commented:

Here is what the server logs look like for:

  • TP=2, 1000 concurrent streaming requests
  • Simulated illegal memory access on rank 1 after 200 engine steps (a rough reconstruction of the injected fault appears after the log)
...
INFO:     127.0.0.1:45354 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:45360 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:45368 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:45372 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:45388 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:45394 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 01-04 17:21:02 core.py:247] RUNNING: 306 | WAITING: 628
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401] WorkerProc hit an exception: %s
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401] Traceback (most recent call last):
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/vllm/v1/executor/multiproc_executor.py", line 397, in worker_busy_loop
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     output = getattr(self.worker, method)(*args, **kwargs)
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/vllm/v1/worker/gpu_worker.py", line 204, in execute_model
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     output = self.model_runner.execute_model(scheduler_output)
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/vllm/v1/worker/gpu_model_runner.py", line 615, in execute_model
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     hidden_states = self.model(
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]                     ^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/vllm/model_executor/models/llama.py", line 571, in forward
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     raise RuntimeError("ERROR IN LLAMA!")
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401] RuntimeError: ERROR IN LLAMA!
ERROR 01-04 17:21:04 core.py:200] EngineCore hit an exception: Traceback (most recent call last):
ERROR 01-04 17:21:04 core.py:200]   File "/home/rshaw/vllm/vllm/v1/engine/core.py", line 193, in run_engine_core
ERROR 01-04 17:21:04 core.py:200]     engine_core.run_busy_loop()
ERROR 01-04 17:21:04 core.py:200]   File "/home/rshaw/vllm/vllm/v1/engine/core.py", line 231, in run_busy_loop
ERROR 01-04 17:21:04 core.py:200]     outputs = self.step()
ERROR 01-04 17:21:04 core.py:200]               ^^^^^^^^^^^
ERROR 01-04 17:21:04 core.py:200]   File "/home/rshaw/vllm/vllm/v1/engine/core.py", line 124, in step
ERROR 01-04 17:21:04 core.py:200]     output = self.model_executor.execute_model(scheduler_output)
ERROR 01-04 17:21:04 core.py:200]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-04 17:21:04 core.py:200]   File "/home/rshaw/vllm/vllm/v1/executor/multiproc_executor.py", line 167, in execute_model
ERROR 01-04 17:21:04 core.py:200]     model_output = self.collective_rpc("execute_model",
ERROR 01-04 17:21:04 core.py:200]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-04 17:21:04 core.py:200]   File "/home/rshaw/vllm/vllm/v1/executor/multiproc_executor.py", line 161, in collective_rpc
ERROR 01-04 17:21:04 core.py:200]     raise e
ERROR 01-04 17:21:04 core.py:200]   File "/home/rshaw/vllm/vllm/v1/executor/multiproc_executor.py", line 150, in collective_rpc
ERROR 01-04 17:21:04 core.py:200]     raise result
ERROR 01-04 17:21:04 core.py:200] RuntimeError: ERROR IN LLAMA!
ERROR 01-04 17:21:04 core.py:200] 
CRITICAL 01-04 17:21:04 async_llm.py:65] AsyncLLM got fatal signal from worker process, shutting down. See stack trace for root cause.
CRITICAL 01-04 17:21:05 launcher.py:91] Engine failed, terminating server.
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1067793]
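
The injected fault in the log above is a RuntimeError("ERROR IN LLAMA!") raised from forward in llama.py. A rough, hypothetical reconstruction of that kind of fault injection follows; the step counter and rank check are assumptions, not the actual test code.

```python
# Hypothetical fault-injection helper: raise after 200 forward passes on
# rank 1 to exercise the shutdown path demonstrated in the log above.
import torch.distributed as dist

_step_count = 0


def maybe_inject_failure() -> None:
    """Raise a deliberate error on rank 1 once enough steps have elapsed."""
    global _step_count
    _step_count += 1
    rank = dist.get_rank() if dist.is_initialized() else 0
    if rank == 1 and _step_count > 200:
        raise RuntimeError("ERROR IN LLAMA!")
```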

njhill added 3 commits April 11, 2025 18:42
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
@njhill njhill added the ready label Apr 12, 2025
@njhill njhill mentioned this pull request Apr 14, 2025
@njhill (Member) commented Apr 14, 2025

I think this is ready to land now, with an issue to be opened for some remaining follow-on tasks. The currently failing CI tests (kernel-related, etc.) are, I'm fairly certain, unrelated and are issues on main. Let's agree to merge as soon as the main branch issues are fixed and the tests are green again. Thanks for all of the great work @robertgshaw2-redhat @afeldman-nm

@njhill (Member) commented Apr 16, 2025

Kernel CI test failures are unrelated.

@vllm-bot vllm-bot merged commit 2b05b8c into vllm-project:main Apr 17, 2025
64 of 69 checks passed
@njhill njhill deleted the api-server-error-handling branch April 17, 2025 03:23
lionelvillard pushed a commit to lionelvillard/vllm that referenced this pull request Apr 17, 2025
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Andrew Feldman <[email protected]>
Co-authored-by: afeldman-nm <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Andrew Feldman <[email protected]>
Co-authored-by: afeldman-nm <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: Yang Wang <[email protected]>
@JaheimLee commented:

I encountered a timeout error when using torch.compile. Why use timeout: Optional[float] = 180.0?

@njhill (Member) commented Apr 22, 2025

@JaheimLee sorry about this. Yes, the default timeout here is too low in some cases; we will fix it shortly.

Fix: #17000
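
For reference, a hypothetical sketch of making such a startup timeout overridable so that long torch.compile warmups can finish; the environment variable name and behavior here are assumptions, and the actual change is in #17000.

```python
import os
from typing import Optional


def get_startup_timeout(default: Optional[float] = 180.0) -> Optional[float]:
    """Return the startup timeout, allowing an override of the 180s default.

    Setting the (hypothetical) STARTUP_TIMEOUT_SECONDS variable to "none"
    disables the timeout entirely.
    """
    value = os.environ.get("STARTUP_TIMEOUT_SECONDS")
    if value is None:
        return default
    return None if value.lower() == "none" else float(value)
```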

jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Andrew Feldman <[email protected]>
Co-authored-by: afeldman-nm <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Andrew Feldman <[email protected]>
Co-authored-by: afeldman-nm <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
adobrzyn pushed a commit to HabanaAI/vllm-fork that referenced this pull request Apr 30, 2025
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Andrew Feldman <[email protected]>
Co-authored-by: afeldman-nm <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: Agata Dobrzyniewicz <[email protected]>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Andrew Feldman <[email protected]>
Co-authored-by: afeldman-nm <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: Mu Huai <[email protected]>
Labels: ci/build, frontend, ready, v1
Development

Successfully merging this pull request may close these issues.

[Bug]: V1 cannot be run in Triton Inference Server Backend