Skip to content

vParquet3: add dedicated columns to parquet schema and block meta#2517

Merged
stoewer merged 8 commits intografana:main-vparquet3from
stoewer:vparquet3-dedicated-col-schema
Jun 1, 2023
Merged

vParquet3: add dedicated columns to parquet schema and block meta#2517
stoewer merged 8 commits intografana:main-vparquet3from
stoewer:vparquet3-dedicated-col-schema

Conversation

@stoewer
Copy link
Copy Markdown
Contributor

@stoewer stoewer commented May 31, 2023

What this PR does:

This PR is a first step towards dynamically assignable dedicated columns for resource and span attributes. Attributes written to and read from spare dedicated columns depending on a column assignment in the block metadata.

Which issue(s) this PR fixes:
Fixes #2528

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@stoewer stoewer self-assigned this May 31, 2023
@stoewer stoewer closed this May 31, 2023
@stoewer stoewer deleted the vparquet3-dedicated-col-schema branch May 31, 2023 11:11
@stoewer stoewer restored the vparquet3-dedicated-col-schema branch May 31, 2023 11:11
@stoewer stoewer reopened this May 31, 2023
@stoewer stoewer changed the base branch from main-vparquet3 to main May 31, 2023 11:26
@stoewer stoewer changed the base branch from main to main-vparquet3 May 31, 2023 11:26
// Column paths for spare dedicated attribute columns
dedicatedResourceColumnsByType = map[string][]string{
dedicatedColumnTypeString: {
"rs.list.element.Resource.DedicatedAttributes.String01",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The column paths are actually not needed yet, but will be needed later for the querying part. For now only the number of entries is relevant.

@stoewer stoewer marked this pull request as ready for review June 1, 2023 07:24
@stoewer stoewer assigned mapno and unassigned stoewer Jun 1, 2023
@stoewer stoewer linked an issue Jun 1, 2023 that may be closed by this pull request
Copy link
Copy Markdown
Contributor

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +9 to +11
dedicatedColumnTypeString = "string"
dedicatedColumnScopeResource = "resource"
dedicatedColumnScopeSpan = "span"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: these could be their own type, instead of strings. Makes for a tighter contract.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that can be beneficial to make types for the column scope and type. However, similar fields / types will exist in different places: config/overrides, block meta, and encoding. Adding types makes more sense if the definitions are shared across those parts. I’m not quite sure if we want to do this in the end. I’d rather wait until we have the other parts and then see if sharing types makes sense.

Comment thread tempodb/backend/block_meta_test.go Outdated
Co-authored-by: Mario <mariorvinas@gmail.com>
@stoewer stoewer merged commit d240f3e into grafana:main-vparquet3 Jun 1, 2023
stoewer added a commit that referenced this pull request Jun 22, 2023
)

* Re-order schema to keep columns affected by column index changes low

* Add spare columns for dedicated attributes to schema struct

* Add dedicated column config to block meta

* Read and write attributes in dedicated columns

* Make order of dedicated attributes predictable when reading

* Fix existing tests and benchmark

* Run exiting benchmarks and tests with dedicated columns

Co-authored-by: Mario <mariorvinas@gmail.com>
mapno added a commit to mapno/tempo that referenced this pull request Jun 27, 2023
…afana#2517)

* Re-order schema to keep columns affected by column index changes low

* Add spare columns for dedicated attributes to schema struct

* Add dedicated column config to block meta

* Read and write attributes in dedicated columns

* Make order of dedicated attributes predictable when reading

* Fix existing tests and benchmark

* Run exiting benchmarks and tests with dedicated columns

Co-authored-by: Mario <mariorvinas@gmail.com>
mapno added a commit to mapno/tempo that referenced this pull request Jun 28, 2023
…afana#2517)

* Re-order schema to keep columns affected by column index changes low

* Add spare columns for dedicated attributes to schema struct

* Add dedicated column config to block meta

* Read and write attributes in dedicated columns

* Make order of dedicated attributes predictable when reading

* Fix existing tests and benchmark

* Run exiting benchmarks and tests with dedicated columns

Co-authored-by: Mario <mariorvinas@gmail.com>
stoewer added a commit that referenced this pull request Jul 13, 2023
)

* Re-order schema to keep columns affected by column index changes low

* Add spare columns for dedicated attributes to schema struct

* Add dedicated column config to block meta

* Read and write attributes in dedicated columns

* Make order of dedicated attributes predictable when reading

* Fix existing tests and benchmark

* Run exiting benchmarks and tests with dedicated columns

Co-authored-by: Mario <mariorvinas@gmail.com>
stoewer added a commit to stoewer/tempo that referenced this pull request Jul 13, 2023
…afana#2517)

* Re-order schema to keep columns affected by column index changes low

* Add spare columns for dedicated attributes to schema struct

* Add dedicated column config to block meta

* Read and write attributes in dedicated columns

* Make order of dedicated attributes predictable when reading

* Fix existing tests and benchmark

* Run exiting benchmarks and tests with dedicated columns

Co-authored-by: Mario <mariorvinas@gmail.com>
stoewer added a commit that referenced this pull request Jul 14, 2023
)

* Re-order schema to keep columns affected by column index changes low

* Add spare columns for dedicated attributes to schema struct

* Add dedicated column config to block meta

* Read and write attributes in dedicated columns

* Make order of dedicated attributes predictable when reading

* Fix existing tests and benchmark

* Run exiting benchmarks and tests with dedicated columns

Co-authored-by: Mario <mariorvinas@gmail.com>
@stoewer stoewer deleted the vparquet3-dedicated-col-schema branch July 14, 2023 01:25
stoewer added a commit to stoewer/tempo that referenced this pull request Jul 26, 2023
…afana#2517)

* Re-order schema to keep columns affected by column index changes low

* Add spare columns for dedicated attributes to schema struct

* Add dedicated column config to block meta

* Read and write attributes in dedicated columns

* Make order of dedicated attributes predictable when reading

* Fix existing tests and benchmark

* Run exiting benchmarks and tests with dedicated columns

Co-authored-by: Mario <mariorvinas@gmail.com>
mapno added a commit that referenced this pull request Jul 26, 2023
)

* [vParquet3] create new block encoding by copying vParquet2

* vParquet3: add dedicated columns to parquet schema and block meta (#2517)

* Re-order schema to keep columns affected by column index changes low

* Add spare columns for dedicated attributes to schema struct

* Add dedicated column config to block meta

* Read and write attributes in dedicated columns

* Make order of dedicated attributes predictable when reading

* Fix existing tests and benchmark

* Run exiting benchmarks and tests with dedicated columns

Co-authored-by: Mario <mariorvinas@gmail.com>

* Add dedicated columns to overrides module (#2551)

* [vParquet3] Write path (#2555)

* Add dedicated columns to overrides and blocks

* Improvements

* Change test

* Fix tests

* Extend ingester_test:

* Add dedicated columns config to storage block

* Review comments

* Add comment

* [vParquet3] dedicated columns read path (#2592)

* Refactor and rename function blockMetaToDedicatedColumnMapping

* Query dedicated attribute columns with TraceQL

* Search tag values in dedicated attribute columns

* Search tags in dedicated attribute columns

* Search for values in dedicated attribute columns in tests

* More consistent naming

* Update block and meta.json in vparquet2/test-data

* Test dedicated column in traceToParquet test

* Format Go code

* Introduce types for dedicated column type and scope

Replace StaticTypeFromString() with DedicatedColumnType.ToStaticType()

* The function dedicatedColumnsToColumnMapping() can receive multiple scopes

* [vParquet3] Add support for dedicated columns in compactor (#2561)

* Re-order schema to keep columns affected by column index changes low

* Add spare columns for dedicated attributes to schema struct

* Add dedicated column config to block meta

* Read and write attributes in dedicated columns

* Make order of dedicated attributes predictable when reading

* Fix existing tests and benchmark

* Run exiting benchmarks and tests with dedicated columns

* Add dedicated columns to overrides and blocks

* Support dedicated columns in compactor block selection

* Changes to hash

* More tests

---------

Co-authored-by: A. Stoewer <adrian@stoewer.me>

* [vParquet3] pass dedicated columns to querier (#2603)

* Add dedicated columns to SearchBlockRequest message

* Assign SearchBlockRequest dedicated cols from BlockMeta and vice versa

* Encode SearchBlockRequest to http request and vice versa

* Don't add empty dedicated columns when building a search request

* Unit tests with dedicated columns

* Implement dedicated column scope and type as protobuf enums

* [vParquet3] validate dedicated columns configuration (#2616)

* Add validate function

* Refactor: use DedicatedColumns type instead of []DedicatedColumn

* Initialize logger before verifying the config

This fixes the config verification output

* Check for invalid dedicated columns with '-config.verify true'

* Use ToTempopb() to validate dedicated column scope and type

* [vParquet3] mention feature in CHANGELOG.md

* [vParquet3] Address review comments

* Remove TODO comment about caching the dedicated column hash

* Shorten url param for dedicated columns to 'dc'

* Add function to get latest encoding and use it in tests

* Fix name DedicateColumnsFromTempopb

* [vParquet3] Address more review comments

* Remove 'Test' columns from vParquet3 schema

* Rename async iterator environment variable

* Do not export methods of dedicatedColumnMapping

* Skip dedicated attribute lookup depending on scope in searchTagValues

* Validate maximum number of configured dedicated columns

* Test data for vparquet3 uses dedicated columns

* Reduce size of block meta JSON

* Use 'parquet_' prefix for dedicated column configuration

* [vParquet3] Integration tests with dedicated attribute columns

* Add e2e tests for encodings and dedicated attribute columns

* Use dedicated attribute columns in TestSearchCompleteBlock

* Add support for v2 in encodings test

---------

Co-authored-by: Mario <mariorvinas@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

vParquet3: Add dedicated columns to schema and block meta

2 participants