fix(litellm): Avoid double span exits when streaming #5933

Draft
alexander-alderman-webb wants to merge 10 commits into master from webb/litellm/close-spans

@alexander-alderman-webb alexander-alderman-webb commented Apr 1, 2026

Description

Only exit the span on the final invocation of _success_callback when litellm streams a response.

The callbacks registered in litellm.success_callback fire multiple times when litellm streams a response.

Use litellm in the relevant test the way a user would, which avoids an unhandled SDK exception.
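
A minimal sketch of the guard described above, not the integration's actual code. `FakeSpan`, the `success_callback` signature, and the flat `kwargs["metadata"]` layout are hypothetical stand-ins; only the `stream` flag, the `complete_streaming_response` key, and the `_sentry_span` metadata key are taken from this PR.

```python
class FakeSpan:
    """Hypothetical stand-in for an SDK span; counts how often it is exited."""

    def __init__(self):
        self.exit_count = 0

    def __exit__(self, exc_type, exc_value, traceback):
        self.exit_count += 1


def success_callback(kwargs):
    """Exit the stored span at most once, and only on the final invocation."""
    metadata = kwargs.setdefault("metadata", {})  # assumed flat layout
    is_streaming = kwargs.get("stream")
    # Non-streaming calls fire once; a streaming call is treated as final
    # only when the assembled response is present in kwargs.
    if is_streaming is not True or "complete_streaming_response" in kwargs:
        span = metadata.pop("_sentry_span", None)  # pop so a repeat is a no-op
        if span is not None:
            span.__exit__(None, None, None)


span = FakeSpan()
kwargs = {"stream": True, "metadata": {"_sentry_span": span}}

# Simulate two per-chunk invocations followed by the final one.
success_callback(kwargs)
success_callback(kwargs)
kwargs["complete_streaming_response"] = {"choices": []}
success_callback(kwargs)
success_callback(kwargs)  # a stray extra call finds no span to exit

print(span.exit_count)  # prints 1
```

Popping the span rather than reading it means that even an unexpected extra invocation after the final one cannot exit the same span twice.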

github-actions bot commented Apr 1, 2026

Semver Impact of This PR

🟢 Patch (bug fixes)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

  • (ai) Redact base64 data URLs in image_url content blocks by ericapisani in #5953
  • (integrations) Instrument pyreqwest tracing by servusdei2018 in #5682

Bug Fixes 🐛

Anthropic

  • Capture exceptions for stream() calls by alexander-alderman-webb in #5950
  • Stop setting transaction status when child span fails by alexander-alderman-webb in #5717
  • Only finish relevant spans in .create() patches by alexander-alderman-webb in #5716

Pydantic Ai

  • Adapt import for new library versions by alexander-alderman-webb in #5984
  • Use first-class hooks when available by alexander-alderman-webb in #5947

Other

  • (litellm) Avoid double span exits when streaming by alexander-alderman-webb in #5933
  • (wsgi) Respect HTTP_X_FORWARDED_PROTO in request.url construction by sl0thentr0py in #5963

Internal Changes 🔧

  • (ai) Remove gen_ai.tool.type span attribute by ericapisani in #5964
  • (anthropic) Separate sync and async .create() patches by alexander-alderman-webb in #5715
  • (openai) Split token counting by API for easier deprecation by ericapisani in #5930
  • (opentelemetry) Ignore mypy error by alexander-alderman-webb in #5927
  • 🤖 Update test matrix with new releases (04/13) by github-actions in #5983
  • Fix license metadata in setup.py by sl0thentr0py in #5934
  • Update validate-pr workflow by stephanie-anderson in #5931

Other

  • Handle None span context in the span processor and pin tokenizers version for anthropic tests on Python 3.8 by alexander-alderman-webb in #5967

🤖 This preview updates automatically when you update the PR.

github-actions bot commented Apr 1, 2026

Codecov Results 📊

48 passed | Total: 48 | Pass Rate: 100% | Execution Time: 11.07s

📊 Comparison with Base Branch

Metric Change
Total Tests
Passed Tests
Failed Tests
Skipped Tests

✨ No test changes detected

All tests are passing successfully.

❌ Patch coverage is 0.00%. Project has 15724 uncovered lines.
❌ Project coverage is 25.95%. Comparing base (base) to head (head).

Files with missing lines (1)
File | Patch % | Lines
litellm.py | 0.00% | ⚠️ 143 Missing
Coverage diff
@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
- Coverage    25.96%    25.95%    -0.01%
==========================================
  Files          191       191         —
  Lines        21230     21235        +5
  Branches      6980      6984        +4
==========================================
+ Hits          5511      5511         —
- Misses       15719     15724        +5
- Partials       524       524         —

Generated by Codecov Action

@alexander-alderman-webb alexander-alderman-webb marked this pull request as ready for review April 1, 2026 13:56
@alexander-alderman-webb alexander-alderman-webb requested a review from a team as a code owner April 1, 2026 13:56
@alexander-alderman-webb alexander-alderman-webb marked this pull request as draft April 1, 2026 14:01
@alexander-alderman-webb alexander-alderman-webb marked this pull request as ready for review April 1, 2026 14:10
@alexander-alderman-webb alexander-alderman-webb marked this pull request as draft April 1, 2026 14:20
@alexander-alderman-webb alexander-alderman-webb marked this pull request as ready for review April 10, 2026 14:00
@alexander-alderman-webb alexander-alderman-webb marked this pull request as draft April 10, 2026 14:47
is_streaming = kwargs.get("stream")
# Callback is fired multiple times when streaming a response.
# Streaming flag checked at https://github.com/BerriAI/litellm/blob/33c3f13443eaf990ac8c6e3da78bddbc2b7d0e7a/litellm/litellm_core_utils/litellm_logging.py#L1603
if is_streaming is not True or "complete_streaming_response" in kwargs:

Failure callback lacks streaming-aware span handling

The _success_callback now correctly pops the span from metadata to prevent double exits during streaming, but _failure_callback still uses .get() instead of .pop() and doesn't check for streaming status. If failure_callback is also invoked multiple times during streaming (similar to success callbacks), or if both callbacks fire for the same request, this could still cause double span exits or resource leaks.

Verification

Read the full litellm.py file (lines 1-332). Compared _success_callback (lines 173-240) with _failure_callback (lines 243-264). The success callback now uses metadata.pop("_sentry_span", None) at line 238 and checks streaming status at lines 234-237, while the failure callback at line 250 still uses _get_metadata_dict(kwargs).get("_sentry_span") without popping and has no streaming checks. Verified that the PR specifically addresses streaming double-exit issues.

Identified by Warden code-review · CHR-G5Z
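
One way the concern above could be addressed, sketched under assumptions and not part of this PR's diff. `FakeSpan`, the `failure_callback` signature, and the flat `kwargs["metadata"]` layout are hypothetical; whether litellm actually invokes failure callbacks more than once per request is not established here.

```python
class FakeSpan:
    """Hypothetical stand-in for an SDK span; records the error type on exit."""

    def __init__(self):
        self.exits = []

    def __exit__(self, exc_type, exc_value, traceback):
        self.exits.append(exc_type)


def failure_callback(kwargs, exception):
    """Exit the stored span with the error; a repeated call becomes a no-op."""
    metadata = kwargs.get("metadata") or {}  # assumed flat layout
    span = metadata.pop("_sentry_span", None)  # pop instead of get
    if span is None:
        return
    span.__exit__(type(exception), exception, None)


span = FakeSpan()
kwargs = {"stream": True, "metadata": {"_sentry_span": span}}
error = RuntimeError("provider error")

failure_callback(kwargs, error)
failure_callback(kwargs, error)  # second invocation finds nothing to exit

print(len(span.exits))  # prints 1
```

Because the span is removed on first use, the same guard would also cover the case where a success and a failure callback both fire for one request, provided both read from the same metadata dict.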
