Skip to content

[Feat] Performance - Don't create 1 task for every hanging request alert #11385

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 4, 2025

Conversation

ishaan-jaff
Copy link
Contributor

@ishaan-jaff ishaan-jaff commented Jun 3, 2025

[Feat] Performance - Don't create 1 task for every hanging request alert

This PR adds a unified mechanism to batch and check for hanging LLM requests instead of spawning an alert task per request.

  • Removed the old per-request hanging alert test
  • Refactored response_taking_too_long to enqueue requests rather than sleep per call

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🧹 Refactoring
✅ Test

Changes

Copy link

vercel bot commented Jun 3, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
litellm ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jun 4, 2025 2:20am

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a unified mechanism to batch and check for hanging LLM requests instead of spawning an alert task per request.

  • Removed the old per-request hanging alert test
  • Introduced AlertingHangingRequestCheck with in-memory caching and periodic checks
  • Refactored response_taking_too_long to enqueue requests rather than sleep per call

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/logging_callback_tests/test_alerting.py Removed the legacy hanging request test
litellm/types/integrations/slack_alerting.py Added constants and HangingRequestData model
litellm/proxy/utils.py Scheduled single background task for hanging-request checks
litellm/integrations/SlackAlerting/slack_alerting.py Refactored response_taking_too_long to use the new checker
litellm/integrations/SlackAlerting/hanging_request_check.py New class handling batching, TTL, and alert dispatch
litellm/caching/in_memory_cache.py Added async_get_oldest_n_keys helper
Comments suppressed due to low confidence (3)

tests/logging_callback_tests/test_alerting.py:146

  • There is no test covering the new AlertingHangingRequestCheck workflow; consider adding tests that enqueue requests and simulate time passage to verify alerts are sent correctly.
@pytest.mark.asyncio

litellm/integrations/SlackAlerting/slack_alerting.py:458

  • The signature of response_taking_too_long no longer accepts a type parameter, but existing tests or call sites may still pass it. Update or overload this method to prevent breaking consumers.
async def response_taking_too_long(

litellm/caching/in_memory_cache.py:238

  • The List type is used but not imported; add from typing import List to avoid a NameError.
async def async_get_oldest_n_keys(self, n: int) -> List[str]:

@ishaan-jaff ishaan-jaff merged commit a1f3a1c into main Jun 4, 2025
42 of 46 checks passed
ishaan-jaff added a commit that referenced this pull request Jun 4, 2025
…ert (#11385)

* feat: add async_get_oldest_n_keys in memory cache

* fix: add add_request_to_hanging_request_check

* test: alerting

* feat: v2 hanging request check

* fix: HangingRequestData

* fix: AlertingHangingRequestCheck

* fix: check_for_hanging_requests

* fix: use correct metadata location for hanging requests

* fix: formatting alert

* test hanging request check

* fix: add guard flags for background tasks alerting
stefan-- pushed a commit to stefan--/litellm that referenced this pull request Jun 12, 2025
…ert (BerriAI#11385)

* feat: add async_get_oldest_n_keys in memory cache

* fix: add add_request_to_hanging_request_check

* test: alerting

* feat: v2 hanging request check

* fix: HangingRequestData

* fix: AlertingHangingRequestCheck

* fix: check_for_hanging_requests

* fix: use correct metadata location for hanging requests

* fix: formatting alert

* test hanging request check

* fix: add guard flags for background tasks alerting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant