fix: remove api_key_alias label from high-cardinality prometheus metrics #12099


Open · colesmcintosh wants to merge 3 commits into main
Conversation

colesmcintosh (Collaborator)

Title

Remove api_key_alias label from high-cardinality prometheus metrics

Relevant issues

Fixes LIT-48

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🐛 Bug Fix

Changes

  • Removed api_key_alias label from 4 high-cardinality Prometheus metrics to prevent Datadog throttling (a sketch of the label change follows this list):
    • litellm_deployment_failure_responses
    • litellm_deployment_total_requests
    • litellm_proxy_total_requests
    • litellm_proxy_failed_requests
  • Updated corresponding unit tests to reflect the label removal
  • This improves reliability of system health dashboards by reducing metric cardinality
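
A minimal sketch of the change pattern, using prometheus_client Counters; the label sets shown here are illustrative and not the exact label lists from the LiteLLM source:

```python
from prometheus_client import Counter

# Before (illustrative label set): including api_key_alias makes every distinct
# alias a separate time series on a counter that is bumped on every request.
# litellm_proxy_total_requests = Counter(
#     "litellm_proxy_total_requests",
#     "Total number of requests made to the proxy server",
#     labelnames=["hashed_api_key", "api_key_alias", "requested_model", "team"],
# )

# After: api_key_alias dropped; cardinality is bounded by the remaining labels.
litellm_proxy_total_requests = Counter(
    "litellm_proxy_total_requests",
    "Total number of requests made to the proxy server",
    labelnames=["hashed_api_key", "requested_model", "team"],
)
```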

Test Results

  • All prometheus integration tests pass: poetry run pytest tests/test_litellm/integrations/test_prometheus.py -v (14 passed)
  • Updated unit tests pass: fixed 4 of the 5 failing tests related to the label removal

Remove api_key_alias label from the following metrics to reduce cardinality
and prevent Datadog throttling:
- litellm_deployment_failure_responses
- litellm_deployment_total_requests
- litellm_proxy_total_requests
- litellm_proxy_failed_requests

This change helps create more reliable system health dashboards by reducing
metric cardinality caused by unique API key aliases.
vercel bot commented Jun 27, 2025

litellm ✅ Ready (Preview updated Jun 27, 2025 4:13pm UTC)

colesmcintosh marked this pull request as ready for review June 27, 2025 11:44
krrishdholakia (Contributor)

Why do this?

colesmcintosh (Collaborator, Author)

@krrishdholakia This fixes LIT-48. The api_key_alias label is causing high cardinality issues with these 4 metrics:

  1. litellm_proxy_total_requests_metric - Tracks every client request
  2. litellm_proxy_failed_requests_metric - Tracks every failed request
  3. litellm_deployment_failure_responses - Tracks every LLM API failure
  4. litellm_deployment_total_requests - Tracks every LLM API call

Since these metrics fire on every single request, and each unique api_key_alias value creates a new time series, the cardinality explodes quickly (a rough calculation is sketched after the list below):

  • 100 unique API key aliases × 4 metrics = 400 base time series
  • With other label combinations (model, team, etc.), this multiplies further
  • Result: Potentially millions of unique time series
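
A back-of-envelope calculation, with hypothetical per-label counts chosen only to show the multiplicative effect:

```python
# Hypothetical per-label counts; real numbers depend on the deployment.
api_key_aliases = 100    # unique api_key_alias values
models = 20              # unique model values
teams = 10               # unique team values
high_volume_metrics = 4  # the four request-level metrics listed above

with_alias = api_key_aliases * models * teams * high_volume_metrics
without_alias = models * teams * high_volume_metrics

print(f"series with api_key_alias:    {with_alias:,}")     # 80,000
print(f"series without api_key_alias: {without_alias:,}")  # 800
```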

This causes:

  • Datadog throttling - Metrics get dropped when cardinality limits are exceeded
  • Incomplete dashboards - System health monitoring becomes unreliable
  • Increased costs - More time series = higher monitoring costs

The api_key_alias is still available on lower-volume metrics like spend tracking and budget metrics, so you can still track per-key usage where it matters most.
