Skip to content

Exemplars UX improvements#5158

Merged
ruslan-mikhailov merged 15 commits intografana:mainfrom
ruslan-mikhailov:exemplars-improvements
Jun 2, 2025
Merged

Exemplars UX improvements#5158
ruslan-mikhailov merged 15 commits intografana:mainfrom
ruslan-mikhailov:exemplars-improvements

Conversation

@ruslan-mikhailov
Copy link
Copy Markdown
Contributor

@ruslan-mikhailov ruslan-mikhailov commented May 23, 2025

What this PR does:

  1. Distribute exemplars over time.
    a. Change the approach to calculating requested exemplars from frontend to querier. Instead of equal share, calculate it based on block time range. The biggest block will receive more exemplars than a small block
    b. Change sampling algorithm by fixing number of sampling buckets

Before the change:
image

After the change:
image

Results:
Before:
Pasted image 20250523145320

After:
Pasted image 20250523144347

  1. Hard limit returned exemplars from query-frontend during final aggregation.

Which issue(s) this PR fixes:
Fixes #4856
Fixes #4632

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@ruslan-mikhailov ruslan-mikhailov force-pushed the exemplars-improvements branch 7 times, most recently from 74e9718 to 147cca9 Compare May 23, 2025 14:33
@ruslan-mikhailov ruslan-mikhailov marked this pull request as ready for review May 23, 2025 14:48

exemplarsPerBlock := s.exemplarsPerShard(uint32(len(metas)), searchReq.Exemplars)
for _, m := range metas {
exemplarsPerBlock := s.exemplarsPerShard(metas, searchReq.Exemplars)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also scale the number of exemplars for generatorRequest based on the time range? I think it sends over the whole limit, which was also a reason exemplars seemed to clump in recent data.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! With bucket fix, it still should distribute fairly, but by cutting the exemplars parameter, we can reduce the load to generators.

I pushed exemplar splitting between generator and querier in a separate commit. Could you please take a look?

@knylander-grafana
Copy link
Copy Markdown
Contributor

knylander-grafana commented May 23, 2025

@ruslan-mikhailov Do we need to update the docs so people understand the changes you've made in this PR? Maybe this doc? https://grafana.com/docs/tempo/latest/traceql/metrics-queries/#exemplars

Also, have you seen this PR? #5129

Comment thread integration/e2e/api/query_range_test.go
Comment thread integration/e2e/limits_test.go
Copy link
Copy Markdown
Contributor

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Determine the number of requested exemplars from a block
based on its time range, not equally
@ruslan-mikhailov ruslan-mikhailov force-pushed the exemplars-improvements branch from f7d231c to 4b72208 Compare June 2, 2025 08:37
@ruslan-mikhailov
Copy link
Copy Markdown
Contributor Author

+ rebase from latest main

@ruslan-mikhailov
Copy link
Copy Markdown
Contributor Author

+ linter fixes

@ruslan-mikhailov ruslan-mikhailov force-pushed the exemplars-improvements branch from 833a0e8 to 6b9bafe Compare June 2, 2025 08:46
@ruslan-mikhailov ruslan-mikhailov merged commit a75b0bb into grafana:main Jun 2, 2025
20 checks passed
@ruslan-mikhailov ruslan-mikhailov deleted the exemplars-improvements branch June 2, 2025 09:52
ruslan-mikhailov added a commit to ruslan-mikhailov/tempo that referenced this pull request Jun 2, 2025
* TraceQL Metrics: more fair exemplars distribution

Determine the number of requested exemplars from a block
based on its time range, not equally

* TraceQL Metrics: distribute exemplars over time

UX improvement

* TraceQL Metrics: Parametrise exemplars limit

* TraceQL Metrics: pass exemplars limit to combiners

* TraceQL Metrics: hard limit exemplars

* TraceQL Metrics: restore default behaviour

* TraceQL Metrics: e2e test for number of returned exemplars

* TraceQL Metrics: state bug in quantile_over_time

* Changelog

* Stabilise flaky test

* nolint for integer overflow

* minor linter fixes

* TraceQL Metrics: share exemplars between generator and backend

This will reduce load on generators

* Add comment with link to bug

* minor linter fixes
ruslan-mikhailov added a commit that referenced this pull request Jun 2, 2025
* TraceQL Metrics: more fair exemplars distribution

Determine the number of requested exemplars from a block
based on its time range, not equally

* TraceQL Metrics: distribute exemplars over time

UX improvement

* TraceQL Metrics: Parametrise exemplars limit

* TraceQL Metrics: pass exemplars limit to combiners

* TraceQL Metrics: hard limit exemplars

* TraceQL Metrics: restore default behaviour

* TraceQL Metrics: e2e test for number of returned exemplars

* TraceQL Metrics: state bug in quantile_over_time

* Changelog

* Stabilise flaky test

* nolint for integer overflow

* minor linter fixes

* TraceQL Metrics: share exemplars between generator and backend

This will reduce load on generators

* Add comment with link to bug

* minor linter fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TraceQL Metrics exemplar distribution is skewed Query exemplars field not respected

4 participants