Vulture: fix for query_end_cutoff #7018
      resp, err := client.MetricsQueryRange(
          fmt.Sprintf(`{.%s = "%s"} | count_over_time()`, attr.Key, util.StringifyAnyValue(attr.Value)),
    -     start, end, "1m", 0,
    +     start, end, "10s", 0,
I considered a config parameter, but wouldn't that make the configuration too complex? I think 10 seconds is a good middle ground.
Pull request overview
Adjusts tempo-vulture’s TraceQL metrics range query resolution to avoid false negatives when Tempo’s query_end_cutoff truncates the most recent bucket, particularly for traces near the long write backoff window.
Changes:
- Change the `MetricsQueryRange` step from `1m` to `10s` in vulture's metrics checker.
      resp, err := client.MetricsQueryRange(
          fmt.Sprintf(`{.%s = "%s"} | count_over_time()`, attr.Key, util.StringifyAnyValue(attr.Value)),
    -     start, end, "1m", 0,
    +     start, end, "10s", 0,
      )
Changing the metrics query step from 1m to 10s increases the number of samples returned per call by ~6x (for the 1h window used here), and vulture runs this on a ticker (default 10s). That can noticeably increase query load/response sizes in environments with many vulture instances. Would it make sense to (a) derive the step from a config/flag (or at least a named constant with a short rationale about query_end_cutoff), or (b) avoid a dense range query by switching this check to an instant query / smaller time window so the cutoff workaround doesn’t translate into extra steady-state load?
What this PR does: Fixes the vulture metrics check when `query_end_cutoff` is enabled. With `step=1m`, a request for traces close to `longWriteBackoff` (default 1 minute) hits the cutoff, which removes the last bucket. To solve this, I reduced the step duration to 10 seconds. I ran it for some time to confirm that it solves the problem.

Which issue(s) this PR fixes:
Fixes #
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`