fix(litellm): Avoid double span exits when streaming #5933
alexander-alderman-webb wants to merge 10 commits into master
Semver Impact of This PR: 🟢 Patch (bug fixes)
Codecov Results 📊
✅ 48 passed | Total: 48 | Pass Rate: 100% | Execution Time: 11.07s
📊 Comparison with Base Branch
✨ No test changes detected. All tests are passing successfully.
❌ Patch coverage is 0.00%. Project has 15724 uncovered lines. Files with missing lines (1)
Coverage diff
@@ Coverage Diff @@
## main #PR +/-##
==========================================
- Coverage 25.96% 25.95% -0.01%
==========================================
Files 191 191 —
Lines 21230 21235 +5
Branches 6980 6984 +4
==========================================
+ Hits 5511 5511 —
- Misses 15719 15724 +5
- Partials 524 524 —
Generated by Codecov Action
is_streaming = kwargs.get("stream")
# Callback is fired multiple times when streaming a response.
# Streaming flag checked at https://github.com/BerriAI/litellm/blob/33c3f13443eaf990ac8c6e3da78bddbc2b7d0e7a/litellm/litellm_core_utils/litellm_logging.py#L1603
if is_streaming is not True or "complete_streaming_response" in kwargs:
Failure callback lacks streaming-aware span handling
The _success_callback now correctly pops the span from metadata to prevent double exits during streaming, but _failure_callback still uses .get() instead of .pop() and doesn't check for streaming status. If failure_callback is also invoked multiple times during streaming (similar to success callbacks), or if both callbacks fire for the same request, this could still cause double span exits or resource leaks.
Verification
Read the full litellm.py file (lines 1-332). Compared _success_callback (lines 173-240) with _failure_callback (lines 243-264). The success callback now uses metadata.pop("_sentry_span", None) at line 238 and checks the streaming status at lines 234-237, while the failure callback at line 250 still uses _get_metadata_dict(kwargs).get("_sentry_span") without popping and has no streaming check. Verified that the PR specifically addresses streaming double-exit issues.
Identified by Warden code-review · CHR-G5Z
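A minimal sketch of the concern raised above, not the integration's actual code. It assumes the span is stored under kwargs["metadata"]["_sentry_span"]; FakeSpan, get_metadata_dict, and simulate_stream are hypothetical stand-ins introduced only for illustration. It shows how popping the span in both callbacks makes the exit idempotent, and how the streaming check defers the exit to the final invocation:

```python
from typing import Any, Dict


class FakeSpan:
    """Hypothetical stand-in for an SDK span; counts how often it is exited."""

    def __init__(self) -> None:
        self.exit_count = 0

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        self.exit_count += 1


def get_metadata_dict(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    # Assumption: metadata lives directly under kwargs["metadata"].
    return kwargs.setdefault("metadata", {})


def success_callback(kwargs: Dict[str, Any]) -> None:
    is_streaming = kwargs.get("stream")
    # While streaming, only the final invocation carries the assembled response.
    if is_streaming is not True or "complete_streaming_response" in kwargs:
        # pop() removes the span, so a repeated invocation finds nothing to exit.
        span = get_metadata_dict(kwargs).pop("_sentry_span", None)
        if span is not None:
            span.__exit__(None, None, None)


def failure_callback(kwargs: Dict[str, Any], exception: BaseException) -> None:
    # pop() instead of get(): whichever callback runs first takes the span.
    span = get_metadata_dict(kwargs).pop("_sentry_span", None)
    if span is not None:
        span.__exit__(type(exception), exception, None)


def simulate_stream(chunks: int) -> FakeSpan:
    """Drive the callbacks the way a streamed response might."""
    span = FakeSpan()
    kwargs: Dict[str, Any] = {"stream": True, "metadata": {"_sentry_span": span}}
    for _ in range(chunks):
        success_callback(kwargs)  # per-chunk invocation: span stays open
    kwargs["complete_streaming_response"] = object()
    success_callback(kwargs)  # final invocation: span exits here
    success_callback(kwargs)  # duplicate invocation: nothing left to exit
    failure_callback(kwargs, RuntimeError("late failure"))  # also a no-op
    return span
```

In this sketch the failure path exits the span immediately, even while streaming, on the assumption that a failure ends the request. Whether litellm can invoke the failure callback more than once per request is not established here; popping simply makes that case harmless.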
Description
Only exit the span on the final invocation of _success_callback when litellm streams a response.

The litellm.success_callback callbacks are fired multiple times when streaming a response with litellm.

Avoid an unhandled SDK exception by using litellm like a user would in the relevant test.

Issues