[Feat] Performance - Don't create 1 task for every hanging request alert #11385

ishaan-jaff · 2025-06-03T23:57:56Z

This PR adds a unified mechanism to batch and check for hanging LLM requests instead of spawning an alert task per request.

Removed the old per-request hanging alert test
Refactored response_taking_too_long to enqueue requests rather than sleep per call

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement - see details)
I have added a screenshot of my new test passing locally
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible; it only solves 1 specific problem

Type

🧹 Refactoring
✅ Test

Changes

vercel · 2025-06-03T23:58:00Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
litellm	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Jun 4, 2025 2:20am

Copilot

Pull Request Overview

This PR adds a unified mechanism to batch and check for hanging LLM requests instead of spawning an alert task per request.

Removed the old per-request hanging alert test
Introduced AlertingHangingRequestCheck with in-memory caching and periodic checks
Refactored response_taking_too_long to enqueue requests rather than sleep per call
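
The change from one sleeping task per request to a shared queue plus a single periodic check can be sketched roughly as follows. This is a minimal illustration of the pattern, not the code in this PR: `PendingRequest`, `HangingRequestChecker`, and all of their methods and fields are hypothetical names, and the real `AlertingHangingRequestCheck` stores entries in LiteLLM's in-memory cache rather than a plain dict.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PendingRequest:
    # Hypothetical stand-in for the PR's HangingRequestData model.
    request_id: str
    model: str
    started_at: float


class HangingRequestChecker:
    """Illustrative only: tracks in-flight requests and checks them in batches."""

    def __init__(self, threshold_s: float, clock: Callable[[], float] = time.monotonic):
        self.threshold_s = threshold_s
        self.clock = clock
        self.pending: Dict[str, PendingRequest] = {}
        self.alerts: List[str] = []

    def add_request(self, request_id: str, model: str) -> None:
        # Enqueueing is a dict write; no task is created and nothing sleeps.
        self.pending[request_id] = PendingRequest(request_id, model, self.clock())

    def mark_done(self, request_id: str) -> None:
        # Completed requests leave the queue and are never alerted on.
        self.pending.pop(request_id, None)

    def check_once(self) -> List[str]:
        # One pass over everything still pending; returns the ids alerted on.
        now = self.clock()
        hanging = [r for r in self.pending.values()
                   if now - r.started_at >= self.threshold_s]
        for r in hanging:
            self.alerts.append(f"Request {r.request_id} ({r.model}) is hanging")
            del self.pending[r.request_id]  # alert at most once per request
        return [r.request_id for r in hanging]

    async def run_forever(self, interval_s: float) -> None:
        # The single background task shared by all requests.
        while True:
            self.check_once()
            await asyncio.sleep(interval_s)
```

With this shape, the number of background tasks stays at one regardless of request volume; the cost moves into the periodic scan, which is bounded by how many requests are still pending.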

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

File	Description
tests/logging_callback_tests/test_alerting.py	Removed the legacy hanging request test
litellm/types/integrations/slack_alerting.py	Added constants and `HangingRequestData` model
litellm/proxy/utils.py	Scheduled single background task for hanging-request checks
litellm/integrations/SlackAlerting/slack_alerting.py	Refactored `response_taking_too_long` to use the new checker
litellm/integrations/SlackAlerting/hanging_request_check.py	New class handling batching, TTL, and alert dispatch
litellm/caching/in_memory_cache.py	Added `async_get_oldest_n_keys` helper
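
For orientation, a helper like `async_get_oldest_n_keys` could look something like the sketch below. Only the method name and signature come from this PR; the `TinyTTLCache` class, its `cache_dict` and `ttl_dict` attributes, the `async_set_cache` method shown here, and the choice to treat "oldest" as "earliest expiry time" are assumptions made for illustration.

```python
import asyncio
import heapq
import time
from typing import Any, Dict, List


class TinyTTLCache:
    """Hypothetical miniature TTL cache, not LiteLLM's InMemoryCache."""

    def __init__(self) -> None:
        self.cache_dict: Dict[str, Any] = {}
        self.ttl_dict: Dict[str, float] = {}  # key -> absolute expiry time

    async def async_set_cache(self, key: str, value: Any, ttl: float) -> None:
        self.cache_dict[key] = value
        self.ttl_dict[key] = time.time() + ttl

    async def async_get_oldest_n_keys(self, n: int) -> List[str]:
        # Assumption: "oldest" means the keys that expire soonest.
        if n <= 0:
            return []
        # nsmallest avoids sorting the whole dict when n is small.
        return heapq.nsmallest(n, self.ttl_dict, key=self.ttl_dict.__getitem__)


async def demo() -> List[str]:
    cache = TinyTTLCache()
    await cache.async_set_cache("a", 1, ttl=30)
    await cache.async_set_cache("b", 2, ttl=10)
    await cache.async_set_cache("c", 3, ttl=20)
    return await cache.async_get_oldest_n_keys(2)
```

Fetching only the `n` oldest keys lets a periodic checker cap the work it does per pass instead of scanning every cached entry.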

Comments suppressed due to low confidence (3)

tests/logging_callback_tests/test_alerting.py:146

There is no test covering the new AlertingHangingRequestCheck workflow; consider adding tests that enqueue requests and simulate time passage to verify alerts are sent correctly.

@pytest.mark.asyncio
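
A test of the kind suggested here could simulate time with an injectable clock instead of real sleeps. The sketch below is self-contained and hypothetical: `FakeClock`, `MiniChecker`, and `run_hanging_alert_scenario` are illustrative names, and `MiniChecker` is a stand-in rather than the PR's `AlertingHangingRequestCheck`. In the repository's test suite the scenario would presumably be an async test decorated with `@pytest.mark.asyncio`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class FakeClock:
    """Manually advanced clock so the test never sleeps."""

    def __init__(self) -> None:
        self.now = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds

    def __call__(self) -> float:
        return self.now


class MiniChecker:
    """Hypothetical stand-in for a batched hanging-request checker."""

    def __init__(self, threshold_s: float, clock: Callable[[], float],
                 send_alert: Callable[[str], Awaitable[None]]) -> None:
        self.threshold_s = threshold_s
        self.clock = clock
        self.send_alert = send_alert
        self.started: Dict[str, float] = {}

    def enqueue(self, request_id: str) -> None:
        self.started[request_id] = self.clock()

    async def check(self) -> None:
        now = self.clock()
        overdue = [rid for rid, t in self.started.items()
                   if now - t >= self.threshold_s]
        for rid in overdue:
            del self.started[rid]
            await self.send_alert(rid)


async def run_hanging_alert_scenario() -> List[str]:
    clock = FakeClock()
    sent: List[str] = []

    async def fake_send(request_id: str) -> None:
        sent.append(request_id)  # record instead of calling Slack

    checker = MiniChecker(600.0, clock, fake_send)
    checker.enqueue("req-1")

    clock.advance(599)
    await checker.check()
    assert sent == []  # still under the threshold, no alert yet

    clock.advance(1)
    await checker.check()
    assert sent == ["req-1"]  # threshold reached, exactly one alert
    return sent
```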

litellm/integrations/SlackAlerting/slack_alerting.py:458

The signature of response_taking_too_long no longer accepts a type parameter, but existing tests or call sites may still pass it. Update or overload this method to prevent breaking consumers.

async def response_taking_too_long(

litellm/caching/in_memory_cache.py:238

The List type is used but not imported; add from typing import List to avoid a NameError.

async def async_get_oldest_n_keys(self, n: int) -> List[str]:

litellm/proxy/utils.py

…ert (#11385)
* feat: add async_get_oldest_n_keys in memory cache
* fix: add add_request_to_hanging_request_check
* test: alerting
* feat: v2 hanging request check
* fix: HangingRequestData
* fix: AlertingHangingRequestCheck
* fix: check_for_hanging_requests
* fix: use correct metadata location for hanging requests
* fix: formatting alert
* test hanging request check
* fix: add guard flags for background tasks alerting

ishaan-jaff added 4 commits June 3, 2025 16:54

feat: add async_get_oldest_n_keys in memory cache

d3fe402

fix: add add_request_to_hanging_request_check

2911ff0

test: alerting

fafb880

feat: v2 hanging request check

00fb143

vercel bot deployed to Preview June 3, 2025 23:57 View deployment

ishaan-jaff added 3 commits June 3, 2025 19:00

fix: HangingRequestData

457d027

fix: AlertingHangingRequestCheck

7ecacaf

fix: check_for_hanging_requests

366126f

vercel bot deployed to Preview June 4, 2025 02:04 View deployment

ishaan-jaff requested a review from Copilot June 4, 2025 02:04

Copilot AI reviewed Jun 4, 2025

litellm/proxy/utils.py (resolved)

fix: use correct metadata location for hanging requests

33ee0f4

vercel bot deployed to Preview June 4, 2025 02:08 View deployment

fix: formatting alert

feac7cf

vercel bot deployed to Preview June 4, 2025 02:10 View deployment

test hanging request check

0d0036e

vercel bot deployed to Preview June 4, 2025 02:15 View deployment

fix: add guard flags for background tasks alerting

0ff3ea8

vercel bot deployed to Preview June 4, 2025 02:20 View deployment

ishaan-jaff merged commit a1f3a1c into main Jun 4, 2025
42 of 46 checks passed
