Description
Context
Six months ago, @fpetkovski presented Deep Dive into Long Term Metrics for Planet-Scale Commerce, a talk on how Shopify leveraged Thanos and Parquet to deliver strong performance at scale. The talk made an impression on the community, and efforts began to design a similar system. One such effort is @MichaHoffmann's https://github.com/cloudflare/parquet-tsdb-poc, which demonstrated how such a system could work. That project in turn led to a community effort, https://github.com/prometheus-community/parquet-common, which aims to create a common layer for representing Prometheus metrics and storage in Parquet files.
This project is currently in alpha but is starting to see adoption, for instance in Cortex: cortexproject/cortex#6743.
The aim of this issue is to start formalizing discussions around using https://github.com/prometheus-community/parquet-common in Thanos.
Ideas were exchanged around having two logical components:
- parquet-compactor, which would compact Prometheus blocks into Parquet files
- parquet-querier, which would query Parquet files
To do
- Create a proposal
References
- https://github.com/prometheus-community/parquet-common
- Deep Dive into Long Term Metrics for Planet-Scale Commerce by @fpetkovski
- Implementing Parquet Queryable with fallback cortexproject/cortex#6743
- https://github.com/cloudflare/parquet-tsdb-poc
- https://cloud-native.slack.com/archives/CL25937SP/p1746801292045059
- LFX: Research querying Apache Parquet files promql-engine#167