[TraceQL Metrics] Add local disk caching for generator completed blocks #3799

Merged
mdisibio merged 6 commits into grafana:main from mdisibio:generator-query-range-cache
Jun 21, 2024

@mdisibio mdisibio commented Jun 20, 2024

What this PR does:
Generators serve metrics queries against recent data. These queries are frequent and directly affect user-facing performance. This PR adds a cache of query responses for local completed blocks, which typically cover the last 5-20 minutes.

The generators already have a similar cache for the metrics summary API, but this takes a different approach. Whereas the summary API cache is in-memory, this one is disk-based and writes new cache files to the block folder. These files are unique to each request (query + params) and exist for the lifetime of the block. They aren't flushed to object storage, and they are cleaned up automatically when the block is deleted. (Note: this has precedent in the "flushed" file that is written to block folders to track their flushed status.) The reason to avoid an in-memory cache is that generators are usually memory-intensive already, and that cache is often inadequate as-is. Example block contents:

pwd = /tempo/generator/traces/single-tenant/blocks/single-tenant/
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/bloom-0
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/cache_query_range_14197044629227963207.buf
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/cache_query_range_18089096181261108941.buf
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/cache_query_range_433499706678913880.buf
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/cache_query_range_4989462974766826143.buf
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/data.parquet
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/flushed
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/index
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/meta.json

There are additional changes in this PR which proved necessary:

  • Generator RF1 block meta times weren't right. Traces were always flushed with a timestamp of "now", so the metas were off by roughly trace_idle_period + trace_flush_period. This PR fixes that so blocks are flushed with the proper times.
  • As part of that, the generator traces WAL now needs a real ingestion_slack time. This changes the default to 2 minutes, matching the ingesters. NOTE: this is different from the slack time for metrics.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Comment thread modules/generator/processor/localblocks/query_range.go Outdated
Comment thread modules/generator/processor/localblocks/query_range.go Outdated
@mdisibio mdisibio enabled auto-merge (squash) June 21, 2024 14:57
@mdisibio mdisibio merged commit b29d56c into grafana:main Jun 21, 2024
mapno pushed a commit that referenced this pull request Jun 24, 2024
…ks (#3799)

* working version

* Fix start/end meta of generator-flushed blocks, and config default. Cleanup/dedupe timerange logic.

* Add tests

* lint

* changelog

* review feedback