vParquet5 - Dedicated event attributes and detection/support of blobs #5946

Merged
mdisibio merged 32 commits into grafana:main from mdisibio:vp5-newcolumns-50
Dec 14, 2025

Conversation


@mdisibio mdisibio commented Nov 14, 2025

What this PR does:
Several new vParquet 5 features:

  1. Dedicated attributes at the event level. This is straightforward, and they are now included in the tempo-cli analyse block output. Does not affect previous formats.

  2. Blobs: Detection and support for "blob" attributes. These are attributes with high cardinality and/or long values, such as UUIDs or stack traces, where the current dictionary encoding is a hindrance. Now tempo-cli analyse block can detect these and mark the dedicated column mapping accordingly. When reading and writing these columns, the dictionary encoding is turned off (and we swap compression algorithms), for better overall performance and much reduced memory pressure, because we aren't encoding/decoding large dictionaries.

Example CLI Output:
$ go build && ./tempo-cli analyse block -c ~/tempo.debug.yaml --num-attr=10 single-tenant 1d9e6492-8953-4bdc-936a-50e8701ee8c1 --blob-threshold=200kb 
Scanning block contents.  Press CRTL+C to quit ...

Top 10 span attributes by size
name: http.url                         size: 6.0 MB   (29.44%)  count: 178400   distinct: 1582   avg reuse: 112.77      avg rowgroup content (dict + body): 237 kB (blob)
name: net.host.name (dedicated)        size: 3.4 MB   (16.37%)  count: 214080   distinct: 6      avg reuse: 35680.00    avg rowgroup content (dict + body): 214 kB (blob)
name: net.sock.host.addr (dedicated)   size: 3.0 MB   (14.78%)  count: 214080   distinct: 2936   avg reuse: 72.92       avg rowgroup content (dict + body): 267 kB (blob)
name: net.transport (dedicated)        size: 1.5 MB   (7.32%)   count: 249760   distinct: 1      avg reuse: 249760.00   avg rowgroup content (dict + body): 250 kB (blob)
name: http.target                      size: 1.1 MB   (5.40%)   count: 214080   distinct: 3      avg reuse: 71360.00    avg rowgroup content (dict + body): 214 kB (blob)
name: http.scheme                      size: 1.1 MB   (5.23%)   count: 214080   distinct: 1      avg reuse: 214080.00   avg rowgroup content (dict + body): 214 kB (blob)
name: net.sock.family                  size: 999 kB   (4.88%)   count: 249760   distinct: 1      avg reuse: 249760.00   avg rowgroup content (dict + body): 250 kB (blob)
name: http.flavor                      size: 749 kB   (3.66%)   count: 249760   distinct: 1      avg reuse: 249760.00   avg rowgroup content (dict + body): 250 kB (blob)
name: http.method                      size: 749 kB   (3.66%)   count: 178400   distinct: 5      avg reuse: 35680.00    avg rowgroup content (dict + body): 178 kB 
name: net.peer.name                    size: 642 kB   (3.14%)   count: 35680    distinct: 1      avg reuse: 35680.00    avg rowgroup content (dict + body): 36 kB 
Top 2 span array attributes by size
name: http.response.header.content-type   size: 3.6 MB   (17.36%)
name: http.request.header.accept          size: 592 kB   (2.89%)

Top 6 resource attributes by size
name: k6                       size: 855 kB   (92.43%)  count: 213678   distinct: 1   avg reuse: 213678.00   avg rowgroup content (dict + body): 214 kB (blob)
name: host.name                size: 33 kB    (3.60%)   count: 1666     distinct: 1   avg reuse: 1666.00     avg rowgroup content (dict + body): 1.7 kB 
name: telemetry.sdk.name       size: 22 kB    (2.34%)   count: 1666     distinct: 1   avg reuse: 1666.00     avg rowgroup content (dict + body): 1.7 kB 
name: telemetry.sdk.version    size: 10 kB    (1.08%)   count: 1666     distinct: 1   avg reuse: 1666.00     avg rowgroup content (dict + body): 1.7 kB 
name: telemetry.sdk.language   size: 3.3 kB   (0.36%)   count: 1666     distinct: 1   avg reuse: 1666.00     avg rowgroup content (dict + body): 1.7 kB 
name: service.version          size: 1.7 kB   (0.18%)   count: 1666     distinct: 1   avg reuse: 1666.00     avg rowgroup content (dict + body): 1.7 kB 

Top 8 event attributes by size
name: server.address            size: 191 B   (29.03%)  count: 14   distinct: 2    avg reuse: 7.00    avg rowgroup content (dict + body): 45 B 
name: http.conn.idletime        size: 155 B   (23.56%)  count: 12   distinct: 12   avg reuse: 1.00    avg rowgroup content (dict + body): 215 B 
name: http.local                size: 143 B   (21.73%)  count: 13   distinct: 1    avg reuse: 13.00   avg rowgroup content (dict + body): 28 B 
name: http.remote               size: 140 B   (21.28%)  count: 14   distinct: 1    avg reuse: 14.00   avg rowgroup content (dict + body): 28 B 
name: http.dns.addrs            size: 13 B    (1.98%)   count: 1    distinct: 1    avg reuse: 1.00    avg rowgroup content (dict + body): 18 B 
name: http.conn.done.addr       size: 10 B    (1.52%)   count: 1    distinct: 1    avg reuse: 1.00    avg rowgroup content (dict + body): 15 B 
name: http.conn.done.network    size: 3 B     (0.46%)   count: 1    distinct: 1    avg reuse: 1.00    avg rowgroup content (dict + body): 8 B 
name: http.conn.start.network   size: 3 B     (0.46%)   count: 1    distinct: 1    avg reuse: 1.00    avg rowgroup content (dict + body): 8 B 

The blob annotation is ignored by vParquet4, and since we also didn't change the selection of the top N columns, there will be no effect on vParquet4 or earlier. The dedicated columns analysis and configuration for those tenants will still remain optimal.

  3. Automatic column removal: automatically drop any unused dedicated columns from the output files. For example, there are currently no assignments for the new integer columns, so they are always unused. This allows them to be dropped entirely, instead of writing a column entirely of nulls. This saves on space and overhead. A next step for this feature is to have a different number of dedicated columns for each level (resource, span, event, etc.), optimized per tenant.

For example, running parquet-tools schema on a block with only 4 dedicated columns shows the remaining 6 columns dropped entirely:

                    required group DedicatedAttributes {
                      repeated binary String01 (STRING);
                      repeated binary String02 (STRING);
                      repeated binary String03 (STRING);
                      repeated binary String04 (STRING);
                    }
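
As a rough illustration of the effect on the schema, the following Go sketch lists which dedicated slots would remain. It assumes slots are assigned in order starting at 01, which is consistent with the schema output above but is not confirmed here; `keptSlots` and its parameters are hypothetical names, not Tempo's actual API.

```go
package main

import "fmt"

// keptSlots returns the names of the dedicated column slots that remain in
// the schema. Assumption: slots are filled in order, so the first `assigned`
// slots are kept and the rest (which would be all nulls) are dropped.
func keptSlots(prefix string, assigned, total int) []string {
	if assigned < 0 {
		assigned = 0
	}
	if assigned > total {
		assigned = total
	}
	kept := make([]string, 0, assigned)
	for i := 1; i <= assigned; i++ {
		kept = append(kept, fmt.Sprintf("%s%02d", prefix, i))
	}
	return kept
}

func main() {
	// 4 assigned string columns out of 10 available slots.
	fmt.Println(keptSlots("String", 4, 10)) // [String01 String02 String03 String04]
}
```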

Notes

  1. Blob detection logic: What defines a blob, and when is it worth swapping encoding/compression? This is an interesting subproblem, and here is what I arrived at experimentally: when the total estimated content size for the attribute exceeds 4MB per row group (~100MB), consider it a blob. The content size is an estimate of the uncompressed dictionary and page content: the sum of the string lengths of all distinct values (plus 4 bytes per entry), plus 1 byte for every occurrence. This led to the smallest total file size (even smaller than vp4 with all dictionary encoding), which is a decent heuristic for "optimal layout". Other approaches that tried to key off of the cardinality or max length did not work. Taking a step back, they were not addressing the real issue, which is to prevent runaway dictionaries. We can continue to experiment in this area.
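
The estimate described above can be sketched in a few lines of Go. This is a minimal illustration of the formula only, not the code in this PR: `estimateContentSize` and `isBlob` are hypothetical names, the input is assumed to be all values of one attribute within a single row group, and the threshold is passed in rather than hard-coded.

```go
package main

import "fmt"

// estimateContentSize approximates the uncompressed dictionary plus page
// content for one attribute in one row group: the length of each distinct
// value plus 4 bytes per dictionary entry, plus 1 byte per occurrence.
func estimateContentSize(values []string) int {
	distinct := make(map[string]struct{})
	size := 0
	for _, v := range values {
		if _, seen := distinct[v]; !seen {
			distinct[v] = struct{}{}
			size += len(v) + 4 // new dictionary entry
		}
		size++ // one byte for this occurrence
	}
	return size
}

// isBlob reports whether the estimate exceeds the given threshold in bytes.
func isBlob(values []string, thresholdBytes int) bool {
	return estimateContentSize(values) > thresholdBytes
}

func main() {
	// Low cardinality: one short value repeated 1000 times.
	scheme := make([]string, 1000)
	for i := range scheme {
		scheme[i] = "https"
	}
	// High cardinality: 1000 unique 35-byte values.
	ids := make([]string, 1000)
	for i := range ids {
		ids[i] = fmt.Sprintf("id-%032d", i)
	}
	// The 20000-byte threshold is for illustration only.
	fmt.Println(estimateContentSize(scheme), isBlob(scheme, 20000)) // 1009 false
	fmt.Println(estimateContentSize(ids), isBlob(ids, 20000))       // 40000 true
}
```

Because the estimate grows with both the number of distinct values and their length, it tracks dictionary size directly rather than cardinality or max length alone.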

Which issue(s) this PR fixes:
Related: #4694

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@mdisibio mdisibio mentioned this pull request Dec 4, 2025
3 tasks
@mdisibio mdisibio changed the title [WIP] vParquet5 - Dedicated event attributes and detection/support of blobs vParquet5 - Dedicated event attributes and detection/support of blobs Dec 5, 2025
@mdisibio mdisibio mentioned this pull request Dec 11, 2025
Comment thread cmd/tempo-cli/cmd-analyse-block.go Outdated
  fmt.Printf("%s attributes: ", scope)
  for _, a := range attrList {
-     fmt.Printf("\"%s\", ", a.name)
+     fmt.Printf("\"%s.%s\" ", scope, a.name)

if there's not a strong preference for this, can we go back to not printing the scope? or we will need to update the dedicated automation since it expects just a name here

@mdisibio mdisibio Dec 11, 2025

Good catch, reverted. We also aren't printing the event attributes in the simple summary, so effectively the new blob and event columns aren't useable by that automation. Which I think is OK. Adrian and I have been chatting and we think rather than try to update all of the output formats in this command to work with those features (and the new array option), a totally new command like tempo-cli suggest-columns with machine-readable output sounds better. Then update the automation to use that.

@javiermolinar javiermolinar added this to the v2.10 milestone Dec 12, 2025
@ie-pham ie-pham left a comment

Lgtm :)

@mdisibio mdisibio merged commit da90618 into grafana:main Dec 14, 2025
21 checks passed

3 participants