[vparquet3] Add command to tempo-cli to analyse blocks for dedicated columns #2622

Merged: stoewer merged 17 commits into grafana:main from mapno:vparquet3-cli on Jul 28, 2023

@mapno (Contributor) commented Jul 6, 2023

What this PR does:

NOTE: Depends on vparquet3 being merged to main

Adds two new commands to tempo-cli that analyse Parquet blocks and output summaries of their generic attribute columns:

Analyse block

Analyses a block and outputs a summary of the block's generic attributes.
It's of particular use when trying to determine what attributes to configure for dedicated columns in vParquet3.

Arguments:

  • tenant-id The tenant ID. Use single-tenant for single tenant setups.
  • block-id The block ID as UUID string.

Options:

  • Backend options
  • --num-attr <value> Number of attributes to output (default: 10)

Example:

tempo-cli analyse block --backend=local --bucket=./cmd/tempo-cli/test-data/ single-tenant b18beca6-4d7f-4464-9f72-f343e688a4a0
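
The description above says the command prints a summary limited to the top `--num-attr` attributes. As a rough illustration only, here is a minimal Go sketch of that kind of top-N report. It assumes the summary ranks attribute keys by the total bytes their values occupy, which the PR description does not state; `attrSummary`, `topAttributes`, and the sample keys are hypothetical names, not code from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// attrSummary is a hypothetical record: one generic attribute key and the
// total bytes its values occupy in a block (an assumed ranking metric).
type attrSummary struct {
	Key   string
	Bytes uint64
}

// topAttributes returns the n largest attributes by total bytes, largest
// first. A negative n returns every attribute.
func topAttributes(sizes map[string]uint64, n int) []attrSummary {
	out := make([]attrSummary, 0, len(sizes))
	for k, b := range sizes {
		out = append(out, attrSummary{Key: k, Bytes: b})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Bytes != out[j].Bytes {
			return out[i].Bytes > out[j].Bytes
		}
		return out[i].Key < out[j].Key // deterministic order for ties
	})
	if n >= 0 && n < len(out) {
		out = out[:n]
	}
	return out
}

func main() {
	// Sample data invented for the demo.
	sizes := map[string]uint64{
		"http.url":      4096,
		"db.statement":  8192,
		"net.peer.name": 512,
	}
	for _, a := range topAttributes(sizes, 2) {
		fmt.Printf("%-16s %d bytes\n", a.Key, a.Bytes)
	}
}
```

Attributes that dominate such a ranking would be the natural candidates to configure as dedicated columns.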

Analyse blocks

Analyses all blocks in a given time range and outputs a summary of the blocks' generic attributes.
It's of particular use when trying to determine what attributes to configure for dedicated columns in vParquet3.

Arguments:

  • tenant-id The tenant ID. Use single-tenant for single tenant setups.

Options:

  • Backend options
  • --num-attr <value> Number of attributes to output (default: 10)
  • --min-compaction-level <value> Minimum compaction level to include in the analysis (default: 3)
  • --max-blocks <value> Maximum number of blocks to analyse (default: 10)

Example:

tempo-cli analyse blocks --backend=local --bucket=./cmd/tempo-cli/test-data/ single-tenant
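
The `--min-compaction-level` and `--max-blocks` options above describe a filter followed by a cap. The following Go sketch shows one way that selection could work. It assumes blocks are considered in listing order and that the cap applies after filtering; `blockMeta` and `selectBlocks` are hypothetical stand-ins, not types from Tempo.

```go
package main

import "fmt"

// blockMeta is a hypothetical, trimmed-down stand-in for a block's metadata.
type blockMeta struct {
	ID              string
	CompactionLevel int
}

// selectBlocks keeps blocks whose compaction level is at least minLevel and
// stops once maxBlocks blocks have been picked.
func selectBlocks(metas []blockMeta, minLevel, maxBlocks int) []blockMeta {
	var picked []blockMeta
	for _, m := range metas {
		if len(picked) >= maxBlocks {
			break
		}
		if m.CompactionLevel < minLevel {
			continue // below the requested minimum compaction level
		}
		picked = append(picked, m)
	}
	return picked
}

func main() {
	metas := []blockMeta{
		{ID: "a", CompactionLevel: 1},
		{ID: "b", CompactionLevel: 3},
		{ID: "c", CompactionLevel: 4},
		{ID: "d", CompactionLevel: 5},
	}
	// Mirrors --min-compaction-level=3 --max-blocks=2.
	for _, m := range selectBlocks(metas, 3, 2) {
		fmt.Println(m.ID, m.CompactionLevel)
	}
}
```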

Which issue(s) this PR fixes:
Fixes #2630

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@stoewer (Contributor) left a comment

Works like a charm :)

Comment on lines +60 to +64
	case vparquet.VersionString:
		return vparquet.FieldSpanAttrKey, vparquetSpanAttrs
	case vparquet2.VersionString:
		return vparquet2.FieldSpanAttrKey, vparquet2SpanAttrs
	}
Contributor

I assume this will be vParquet2 and vParquet3 as soon as vParquet3 is released?

Contributor Author

Certainly 👍

Comment thread cmd/tempo-cli/cmd-analyse-block.go Outdated
Comment thread cmd/tempo-cli/cmd-analyse-block.go Outdated
Comment thread tempodb/encoding/vparquet/block.go Outdated

-func newBackendBlock(meta *backend.BlockMeta, r backend.Reader) *backendBlock {
-	return &backendBlock{
+func NewBackendBlock(meta *backend.BlockMeta, r backend.Reader) *BackendBlock {
@stoewer (Contributor) Jul 14, 2023

I'm wondering whether there is an alternative to making this function and BackendBlock public. Maybe the CLI could have a thin wrapper around backend.ContextReader that implements io.ReaderAt and can be passed into parquet.OpenFile().

Contributor Author

Used NewBackendReaderAt() and passed the reader to parquet.OpenFile() instead. No need to export BackendBlock or any of its methods. Nice call.

@mapno mapno mentioned this pull request Jul 17, 2023
3 tasks

## Analyse blocks
Analyses all blocks in a given time range and outputs a summary of the blocks' generic attributes.
It's of particular use when trying to determine what attributes to configure for dedicated columns in vParquet3.
Contributor

Maybe link to the dedicated columns page (PR#2664)?

Comment thread docs/sources/tempo/operations/tempo_cli.md Outdated
Comment thread docs/sources/tempo/operations/tempo_cli.md Outdated
@knylander-grafana (Contributor) left a comment

Approving the doc portion of the PR. Thank you for adding doc!

mapno and others added 3 commits July 18, 2023 09:55
@stoewer (Contributor) left a comment

LGTM

@stoewer stoewer merged commit 41dfbc4 into grafana:main Jul 28, 2023
@mapno mapno deleted the vparquet3-cli branch November 25, 2024 12:18
Development

Successfully merging this pull request may close these issues.

Method to identify candidates for dedicated columns

3 participants