-
-
Notifications
You must be signed in to change notification settings - Fork 3.4k
[Feat] Performance - Don't create 1 task for every hanging request alert #11385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a unified mechanism to batch and check for hanging LLM requests instead of spawning an alert task per request.
- Removed the old per-request hanging alert test
- Introduced
AlertingHangingRequestCheck
with in-memory caching and periodic checks - Refactored
response_taking_too_long
to enqueue requests rather than sleep per call
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
tests/logging_callback_tests/test_alerting.py | Removed the legacy hanging request test |
litellm/types/integrations/slack_alerting.py | Added constants and HangingRequestData model |
litellm/proxy/utils.py | Scheduled single background task for hanging-request checks |
litellm/integrations/SlackAlerting/slack_alerting.py | Refactored response_taking_too_long to use the new checker |
litellm/integrations/SlackAlerting/hanging_request_check.py | New class handling batching, TTL, and alert dispatch |
litellm/caching/in_memory_cache.py | Added async_get_oldest_n_keys helper |
Comments suppressed due to low confidence (3)
tests/logging_callback_tests/test_alerting.py:146
- There is no test covering the new
AlertingHangingRequestCheck
workflow; consider adding tests that enqueue requests and simulate time passage to verify alerts are sent correctly.
@pytest.mark.asyncio
litellm/integrations/SlackAlerting/slack_alerting.py:458
- The signature of
response_taking_too_long
no longer accepts atype
parameter, but existing tests or call sites may still pass it. Update or overload this method to prevent breaking consumers.
async def response_taking_too_long(
litellm/caching/in_memory_cache.py:238
- The
List
type is used but not imported; addfrom typing import List
to avoid aNameError
.
async def async_get_oldest_n_keys(self, n: int) -> List[str]:
…ert (#11385) * feat: add async_get_oldest_n_keys in memory cache * fix: add add_request_to_hanging_request_check * test: alerting * feat: v2 hanging request check * fix: HangingRequestData * fix: AlertingHangingRequestCheck * fix: check_for_hanging_requests * fix: use correct metadata location for hanging requests * fix: formatting alert * test hanging request check * fix: add guard flags for background tasks alerting
…ert (BerriAI#11385) * feat: add async_get_oldest_n_keys in memory cache * fix: add add_request_to_hanging_request_check * test: alerting * feat: v2 hanging request check * fix: HangingRequestData * fix: AlertingHangingRequestCheck * fix: check_for_hanging_requests * fix: use correct metadata location for hanging requests * fix: formatting alert * test hanging request check * fix: add guard flags for background tasks alerting
[Feat] Performance - Don't create 1 task for every hanging request alert
This PR adds a unified mechanism to batch and check for hanging LLM requests instead of spawning an alert task per request.
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
tests/litellm/
directory, Adding at least 1 test is a hard requirement - see detailsmake test-unit
Type
🧹 Refactoring
✅ Test
Changes