enhancement: Improve live store readiness state management #6238
oleg-kozlyuk-grafana merged 11 commits into grafana:main
| // ReadinessMaxWait is the maximum time to wait for catching up at startup. | ||
| // If this timeout is exceeded, the live-store becomes ready anyway. | ||
| // Only used if ReadinessTargetLag > 0. Default: 30m. |
Can this create a read outage if both zones have lag?
It should not. The wait is bounded by ReadinessMaxWait, so if WarpStream is very slow, live-store simply falls back to the old behavior after a while.
EDIT: also, this behavior is only triggered at startup, before live-store is marked ready, so any behavior after the startup sequence is unchanged.
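To illustrate why the wait cannot turn into an outage, here is a minimal sketch of a bounded catch-up wait. It is not the code from this PR: the function signature, the `lagFn` callback, the `poll` interval, and `demoTimeout` are all assumptions made for the example. Only the idea comes from the discussion above: the wait is skipped when the target lag is not positive, and it ends when the lag is low enough or the maximum wait expires.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForCatchUp is a hypothetical sketch, not the implementation in this PR.
// lagFn returns nil while the lag is still unknown.
// The returned bool reports whether the wait ended because maxWait expired.
func waitForCatchUp(ctx context.Context, lagFn func() *time.Duration,
	targetLag, maxWait, poll time.Duration) (bool, error) {
	if targetLag <= 0 {
		return false, nil // gating disabled: ready immediately
	}
	deadline := time.NewTimer(maxWait)
	defer deadline.Stop()
	ticker := time.NewTicker(poll)
	defer ticker.Stop()

	for {
		if lag := lagFn(); lag != nil && *lag <= targetLag {
			return false, nil // caught up
		}
		select {
		case <-ctx.Done():
			return false, ctx.Err() // shutting down
		case <-deadline.C:
			return true, nil // bound reached: become ready anyway
		case <-ticker.C:
			// poll the lag again
		}
	}
}

// demoTimeout runs the wait against a lag source that never reports a value.
func demoTimeout() bool {
	unknown := func() *time.Duration { return nil }
	timedOut, _ := waitForCatchUp(context.Background(), unknown,
		time.Second, 50*time.Millisecond, 10*time.Millisecond)
	return timedOut
}

func main() {
	fmt.Println("timed out:", demoTimeout())
}
```

Even when the lag never becomes known, the loop exits once the timer fires, which is the "falls back to the old behavior" case described in the reply.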
| } | ||
| // Wait for catch-up before marking ready (if enabled) | ||
| if err := s.waitForCatchUp(ctx); err != nil { |
Should this not be in the starting method?
starting should finish before running begins, and a lot of live-store init is done in running. I would have to move a lot of other logic there as well, and I'm not yet confident I can pull this off without breaking some subtle behavior.
| // Calculate current lag | ||
| lag := s.calculateTimeLag() | ||
| if lag == nil { | ||
| level.Debug(s.logger).Log("msg", "catch-up lag could not be determined, waiting") |
I think this can block forever. If there is no lag (nil), we should set the status to ready to serve.
The wait time is bounded by ReadinessMaxWait. Nil does not mean "no lag"; it means "no data". If there is actually no lag (the partition is empty or the reader has caught up), the function returns 0.
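The nil-versus-zero distinction can be sketched as follows. This is an illustration under assumptions, not the PR's `calculateTimeLag`: the `partitionState` type, its fields, and the `classify` helper are invented for the example.

```go
package main

import (
	"fmt"
	"time"
)

// partitionState is a hypothetical snapshot of what the reader knows.
type partitionState struct {
	fetched      bool      // has any fetch response been seen yet?
	endOffset    int64     // offset the next produced record will get
	nextOffset   int64     // offset of the next record the reader will consume
	lastRecordTS time.Time // timestamp of the last consumed record
}

// timeLag returns nil for "no data", a pointer to 0 for "no lag",
// and a pointer to a positive duration when the reader is behind.
func timeLag(s partitionState, now time.Time) *time.Duration {
	if !s.fetched {
		return nil // nothing observed yet: lag is unknown
	}
	if s.endOffset <= s.nextOffset {
		zero := time.Duration(0)
		return &zero // empty partition or caught up
	}
	if s.lastRecordTS.IsZero() {
		return nil // behind, but no record consumed yet to measure against
	}
	lag := now.Sub(s.lastRecordTS)
	if lag < 0 {
		lag = 0 // guard against clock skew
	}
	return &lag
}

// classify turns the three possible outcomes into a label.
func classify(s partitionState) string {
	lag := timeLag(s, time.Now())
	switch {
	case lag == nil:
		return "unknown"
	case *lag == 0:
		return "caught up"
	default:
		return "behind"
	}
}

func main() {
	fmt.Println(classify(partitionState{}))
	fmt.Println(classify(partitionState{fetched: true, endOffset: 10, nextOffset: 10}))
	fmt.Println(classify(partitionState{
		fetched: true, endOffset: 10, nextOffset: 3,
		lastRecordTS: time.Now().Add(-time.Minute),
	}))
}
```

A caller that waits on nil keeps polling only until the maximum wait expires, while a zero result lets it proceed immediately.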
| // Technically shaped as a nested loop, but we only have one partition (at the moment) | ||
| for _, fetch := range fetches { | ||
| for _, topic := range fetch.Topics { | ||
| for _, partition := range topic.Partitions { |
Since we only handle one partition, wouldn't it be more precise to do a direct lookup like this?
lag = fetch[topic][partition]
The effect would be the same, unless there is a specific functional concern?
| if err := t.liveStore.CheckReady(r.Context()); err != nil { | ||
| http.Error(w, "LiveStore not ready: "+err.Error(), http.StatusServiceUnavailable) | ||
| return | ||
| } |
Annotations don't catch e2e tests yet
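For readers who want to see the readiness check from the snippet above in isolation, here is a small self-contained sketch. The `readyChecker` interface, `fakeStore`, `readyHandler`, `probe`, and the `/ready` path are assumptions for illustration; only the `CheckReady(ctx)` call and the 503 response mirror the diff.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// readyChecker is a hypothetical interface matching the CheckReady call in the diff.
type readyChecker interface {
	CheckReady(ctx context.Context) error
}

// fakeStore stands in for the live-store; a nil err means "ready".
type fakeStore struct{ err error }

func (f fakeStore) CheckReady(context.Context) error { return f.err }

var errCatchingUp = errors.New("catching up")

// readyHandler answers 503 while the checker reports an error and 200 otherwise.
func readyHandler(c readyChecker) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := c.CheckReady(r.Context()); err != nil {
			http.Error(w, "LiveStore not ready: "+err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ready")
	}
}

// probe sends one in-memory request and returns the status code.
func probe(c readyChecker) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/ready", nil)
	readyHandler(c).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(probe(fakeStore{err: errCatchingUp})) // not ready yet
	fmt.Println(probe(fakeStore{}))                   // ready
}
```

Using `httptest` keeps the check in memory, which makes the two states easy to cover in a unit test in addition to e2e tests.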
@codex can you review this for me as well?
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 86722a4ffe
| for _, partition := range topic.Partitions { | ||
| // HighWatermark guaranteed to be >= LastStableOffset | ||
| lag := partition.HighWatermark - partition.LastStableOffset | ||
| r.lag.Store(lag) |
Compute consumer lag from committed/fetch offsets
Here the lag is computed as HighWatermark - LastStableOffset, but LastStableOffset is a transactional commit boundary rather than the consumer’s position. For non‑transactional topics (the common case), LSO equals the HighWatermark, so this sets lag to 0 even when the consumer is far behind. That makes waitForCatchUp treat the partition as caught up (lag <= 1000) and the live store can report ready immediately, defeating the readiness gating. Consider deriving lag from the committed offset (or current fetch offset) vs end offset instead.
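The difference between the two calculations can be shown with plain integers. This is a sketch under assumptions: the function names and the sample offsets are invented, and `nextOffset` stands for whichever committed or processed position the reader tracks. It relies on the review comment's statement that LSO equals the high watermark on non-transactional topics.

```go
package main

import "fmt"

// lsoLag reproduces the calculation under review, for comparison only.
func lsoLag(highWatermark, lastStableOffset int64) int64 {
	return highWatermark - lastStableOffset
}

// consumerLag is a hypothetical alternative: the distance between the end of
// the partition and the next offset the reader will consume, clamped at zero.
func consumerLag(endOffset, nextOffset int64) int64 {
	lag := endOffset - nextOffset
	if lag < 0 {
		return 0 // the position can briefly appear ahead of a stale end offset
	}
	return lag
}

func main() {
	// Sample values for a non-transactional topic, where LSO == high watermark.
	var highWatermark, lastStableOffset, nextOffset int64 = 500_000, 500_000, 200_000

	fmt.Println("LSO-based lag:     ", lsoLag(highWatermark, lastStableOffset))
	fmt.Println("consumer-based lag:", consumerLag(highWatermark, nextOffset))
}
```

With these sample values the LSO-based figure is 0 even though the reader is 300,000 records behind, which is the failure mode the comment describes.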
@codex create a PR fixing this to correct calculation
Summary
- Updated live store lag calculation to use the committed/processed offset (with clamping) instead of LastStableOffset, ensuring readiness reflects true consumer lag. (modules/livestore/partition_reader.go L82-L146)
Testing
⚠️ not run (not requested)
mapno left a comment
Nice improvement. I have one Q about the initial lag calculation.
| // Check if we are near end or partition is empty | ||
| // Arbitrary value picked to shortcut calculations | ||
| if lag <= 1000 { | ||
| level.Debug(s.logger).Log( | ||
| "msg", "At or close to partition end", | ||
| "lag", lag) | ||
| return &zero | ||
| } |
This can be any amount of time, right? Should it be based on the time since the last record's timestamp?
This is a heuristic to shortcut the wait in cases where the lag may appear higher than it really is, for example because of clock skew. The reasoning is that live-store should be able to process this many traces in roughly a second: in our prod metrics, a lag of ~300k records corresponds to ~200s of timestamp lag. Hence, once we are within ~1000 records, we are already within the target that this change pursues.
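The arithmetic behind the 1000-record threshold can be checked with a short sketch. It assumes, as the reply does, that timestamp lag scales roughly linearly with record lag; the function names are invented for the example.

```go
package main

import (
	"fmt"
	"time"
)

// approxTimeLag scales an observed (records, time lag) pair linearly to
// estimate the time lag for a different record count. Hypothetical helper.
func approxTimeLag(records, observedRecords int64, observedLag time.Duration) time.Duration {
	return time.Duration(float64(observedLag) * float64(records) / float64(observedRecords))
}

// heuristicMillis applies the numbers quoted in the reply:
// ~300k records of lag correspond to ~200s of timestamp lag.
func heuristicMillis() int64 {
	return approxTimeLag(1000, 300_000, 200*time.Second).Milliseconds()
}

func main() {
	// 300k records / 200s is about 1500 records per second,
	// so 1000 records is about two thirds of a second.
	fmt.Println(approxTimeLag(1000, 300_000, 200*time.Second))
	fmt.Println(heuristicMillis(), "ms")
}
```

Under that linear assumption, 1000 records works out to well under one second, which is consistent with treating the reader as caught up at that point.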
# Conflicts: # CHANGELOG.md # example/docker-compose/ingest-storage/docker-compose.yaml
enhancement: Improve live store readiness state management (#6238)
* Add readiness_target_lag/readiness_max_wait parameters
* Add ready check for livestore into app readiness endpoint
* Adding e2e test to check new behaviors
* Add readiness_target_lag to ingest-storage example
* e2e fix attempt
* Relaxed threshold, clarified .gitignore
* Test & docker fixes
* Improved handling of empty partition
* Review notes
* Do not set error on shutdown
What this PR does:
This PR improves live-store behavior on initialization in 2 dimensions:
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]