Skip to content

[Parquet] Add traceid index to vparquet2 and vparquet3 and use it when finding trace by id#2697

Merged
mdisibio merged 15 commits intografana:mainfrom
mdisibio:vparquet2-index
Aug 24, 2023
Merged

[Parquet] Add traceid index to vparquet2 and vparquet3 and use it when finding trace by id#2697
mdisibio merged 15 commits intografana:mainfrom
mdisibio:vparquet2-index

Conversation

@mdisibio
Copy link
Copy Markdown
Contributor

@mdisibio mdisibio commented Jul 25, 2023

What this PR does:
This re-adds an index file to vparquet2 and vparquet3 blocks to help find traces by ID. It contains the max trace ID of each row group and allows us to jump straight to the correct one, instead of the current method which performs i/o to test each group. Added as optional to avoid breaking changes. New blocks will create and use the index, and old blocks will fallback to the prior method.

Benchmarks don't show any improvement unless we simulate some latency for the i/o operations. Numbers below are for a level-0 block with 11 rowgroups. It saves around log2(rgs) i/o operations, so would be even better on larger blocks. Additionally the index file will be cached in practice but not included here.

0 ms:

name              old time/op    new time/op    delta
FindTraceByID-12    24.2ms ±48%    24.2ms ±48%   ~     (p=0.739 n=10+10)

50ms

name              old time/op    new time/op    delta
FindTraceByID-12     321ms ± 1%     167ms ± 1%  -47.85%  (p=0.000 n=9+8)

100ms

name              old time/op    new time/op    delta
FindTraceByID-12     622ms ± 1%     318ms ± 2%  -48.82%  (p=0.000 n=9+10)

Which issue(s) this PR fixes:
Fixes n/a

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Copy Markdown
Contributor

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread tempodb/encoding/vparquet2/block_findtracebyid_test.go
@joe-elliott
Copy link
Copy Markdown
Collaborator

Before merging should we include this code in vparquet3?

@mdisibio
Copy link
Copy Markdown
Contributor Author

Before merging should we include this code in vparquet3?

Yep, added as mandatory feature in vParquet3. Please take a look.

Copy link
Copy Markdown
Collaborator

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I'm concerned about is that, since our block min/max ids are currently incorrect the same bug will exist in this logic unwittingly.

Other than that I'm good.

Comment thread .gitignore
@mdisibio
Copy link
Copy Markdown
Contributor Author

mdisibio commented Aug 4, 2023

Since vParquet3 is already being used in some installs, we decided it needs to not be a breaking change after all. Now it is optional like in vParquet2.

@mdisibio
Copy link
Copy Markdown
Contributor Author

since our block min/max ids are currently incorrect the same bug will exist in this logic

That's a good thought and I don't have a great answer. Since we never determined the root cause or prevalence of that, it is hard to know this feature will be affected or not. I added an environment variable VPARQUET_INDEX=0 to turn off the index lookup in case we are seeing issues. Sound good?

@joe-elliott
Copy link
Copy Markdown
Collaborator

joe-elliott commented Aug 23, 2023

That's a good thought and I don't have a great answer. Since we never determined the root cause or prevalence of that, it is hard to know this feature will be affected or not. I added an environment variable VPARQUET_INDEX=0 to turn off the index lookup in case we are seeing issues. Sound good?

Yeah, I'm good. I suppose we could write a script that validates the index (and the min/max block size) after this is in a testing cluster to see if we find any inconsistencies. Let's keep an eye on the vult(ch)!

@mdisibio mdisibio merged commit 14da7b8 into grafana:main Aug 24, 2023
@mdisibio mdisibio changed the title [Parquet] Add traceid index to vparquet2 and use it when finding trace by id [Parquet] Add traceid index to vparquet2 and vparquet3 and use it when finding trace by id Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants