vParquet3: add dedicated columns to parquet schema and block meta#2517
Conversation
| // Column paths for spare dedicated attribute columns | ||
| dedicatedResourceColumnsByType = map[string][]string{ | ||
| dedicatedColumnTypeString: { | ||
| "rs.list.element.Resource.DedicatedAttributes.String01", |
There was a problem hiding this comment.
The column paths are actually not needed yet, but will be needed later for the querying part. For now only the number of entries is relevant.
| dedicatedColumnTypeString = "string" | ||
| dedicatedColumnScopeResource = "resource" | ||
| dedicatedColumnScopeSpan = "span" |
There was a problem hiding this comment.
Nit: these could be their own type, instead of strings. Makes for a tighter contract.
There was a problem hiding this comment.
I agree that can be beneficial to make types for the column scope and type. However, similar fields / types will exist in different places: config/overrides, block meta, and encoding. Adding types makes more sense if the definitions are shared across those parts. I’m not quite sure if we want to do this in the end. I’d rather wait until we have the other parts and then see if sharing types makes sense.
Co-authored-by: Mario <mariorvinas@gmail.com>
) * Re-order schema to keep columns affected by column index changes low * Add spare columns for dedicated attributes to schema struct * Add dedicated column config to block meta * Read and write attributes in dedicated columns * Make order of dedicated attributes predictable when reading * Fix existing tests and benchmark * Run exiting benchmarks and tests with dedicated columns Co-authored-by: Mario <mariorvinas@gmail.com>
…afana#2517) * Re-order schema to keep columns affected by column index changes low * Add spare columns for dedicated attributes to schema struct * Add dedicated column config to block meta * Read and write attributes in dedicated columns * Make order of dedicated attributes predictable when reading * Fix existing tests and benchmark * Run exiting benchmarks and tests with dedicated columns Co-authored-by: Mario <mariorvinas@gmail.com>
…afana#2517) * Re-order schema to keep columns affected by column index changes low * Add spare columns for dedicated attributes to schema struct * Add dedicated column config to block meta * Read and write attributes in dedicated columns * Make order of dedicated attributes predictable when reading * Fix existing tests and benchmark * Run exiting benchmarks and tests with dedicated columns Co-authored-by: Mario <mariorvinas@gmail.com>
) * Re-order schema to keep columns affected by column index changes low * Add spare columns for dedicated attributes to schema struct * Add dedicated column config to block meta * Read and write attributes in dedicated columns * Make order of dedicated attributes predictable when reading * Fix existing tests and benchmark * Run exiting benchmarks and tests with dedicated columns Co-authored-by: Mario <mariorvinas@gmail.com>
…afana#2517) * Re-order schema to keep columns affected by column index changes low * Add spare columns for dedicated attributes to schema struct * Add dedicated column config to block meta * Read and write attributes in dedicated columns * Make order of dedicated attributes predictable when reading * Fix existing tests and benchmark * Run exiting benchmarks and tests with dedicated columns Co-authored-by: Mario <mariorvinas@gmail.com>
) * Re-order schema to keep columns affected by column index changes low * Add spare columns for dedicated attributes to schema struct * Add dedicated column config to block meta * Read and write attributes in dedicated columns * Make order of dedicated attributes predictable when reading * Fix existing tests and benchmark * Run exiting benchmarks and tests with dedicated columns Co-authored-by: Mario <mariorvinas@gmail.com>
…afana#2517) * Re-order schema to keep columns affected by column index changes low * Add spare columns for dedicated attributes to schema struct * Add dedicated column config to block meta * Read and write attributes in dedicated columns * Make order of dedicated attributes predictable when reading * Fix existing tests and benchmark * Run exiting benchmarks and tests with dedicated columns Co-authored-by: Mario <mariorvinas@gmail.com>
) * [vParquet3] create new block encoding by copying vParquet2 * vParquet3: add dedicated columns to parquet schema and block meta (#2517) * Re-order schema to keep columns affected by column index changes low * Add spare columns for dedicated attributes to schema struct * Add dedicated column config to block meta * Read and write attributes in dedicated columns * Make order of dedicated attributes predictable when reading * Fix existing tests and benchmark * Run exiting benchmarks and tests with dedicated columns Co-authored-by: Mario <mariorvinas@gmail.com> * Add dedicated columns to overrides module (#2551) * [vParquet3] Write path (#2555) * Add dedicated columns to overrides and blocks * Improvements * Change test * Fix tests * Extend ingester_test: * Add dedicated columns config to storage block * Review comments * Add comment * [vParquet3] dedicated columns read path (#2592) * Refactor and rename function blockMetaToDedicatedColumnMapping * Query dedicated attribute columns with TraceQL * Search tag values in dedicated attribute columns * Search tags in dedicated attribute columns * Search for values in dedicated attribute columns in tests * More consistent naming * Update block and meta.json in vparquet2/test-data * Test dedicated column in traceToParquet test * Format Go code * Introduce types for dedicated column type and scope Replace StaticTypeFromString() with DedicatedColumnType.ToStaticType() * The function dedicatedColumnsToColumnMapping() can receive multiple scopes * [vParquet3] Add support for dedicated columns in compactor (#2561) * Re-order schema to keep columns affected by column index changes low * Add spare columns for dedicated attributes to schema struct * Add dedicated column config to block meta * Read and write attributes in dedicated columns * Make order of dedicated attributes predictable when reading * Fix existing tests and benchmark * Run exiting benchmarks and tests with dedicated columns * Add dedicated columns to overrides and blocks * Support dedicated columns in compactor block selection * Changes to hash * More tests --------- Co-authored-by: A. Stoewer <adrian@stoewer.me> * [vParquet3] pass dedicated columns to querier (#2603) * Add dedicated columns to SearchBlockRequest message * Assign SearchBlockRequest dedicated cols from BlockMeta and vice versa * Encode SearchBlockRequest to http request and vice versa * Don't add empty dedicated columns when building a search request * Unit tests with dedicated columns * Implement dedicated column scope and type as protobuf enums * [vParquet3] validate dedicated columns configuration (#2616) * Add validate function * Refactor: use DedicatedColumns type instead of []DedicatedColumn * Initialize logger before verifying the config This fixes the config verification output * Check for invalid dedicated columns with '-config.verify true' * Use ToTempopb() to validate dedicated column scope and type * [vParquet3] mention feature in CHANGELOG.md * [vParquet3] Address review comments * Remove TODO comment about caching the dedicated column hash * Shorten url param for dedicated columns to 'dc' * Add function to get latest encoding and use it in tests * Fix name DedicateColumnsFromTempopb * [vParquet3] Address more review comments * Remove 'Test' columns from vParquet3 schema * Rename async iterator environment variable * Do not export methods of dedicatedColumnMapping * Skip dedicated attribute lookup depending on scope in searchTagValues * Validate maximum number of configured dedicated columns * Test data for vparquet3 uses dedicated columns * Reduce size of block meta JSON * Use 'parquet_' prefix for dedicated column configuration * [vParquet3] Integration tests with dedicated attribute columns * Add e2e tests for encodings and dedicated attribute columns * Use dedicated attribute columns in TestSearchCompleteBlock * Add support for v2 in encodings test --------- Co-authored-by: Mario <mariorvinas@gmail.com>
What this PR does:
This PR is a first step towards dynamically assignable dedicated columns for resource and span attributes. Attributes written to and read from spare dedicated columns depending on a column assignment in the block metadata.
Which issue(s) this PR fixes:
Fixes #2528
Checklist
Documentation addedCHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]