
[clickhouse] Implement FindTraceIDs for ClickHouse Storage For Primitive Parameters#7648

Merged
yurishkuro merged 24 commits into jaegertracing:main from mahadzaryab1:find-trace-ids-1
Dec 20, 2025

Conversation

@mahadzaryab1
Collaborator

@mahadzaryab1 mahadzaryab1 commented Nov 14, 2025

Which problem is this PR solving?

Description of the changes

  • This PR implements the FindTraceIDs function for ClickHouse Storage. It only adds the primitive query parameters (service name and operation name). The other query parameters will be added in follow-up PRs.
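
As a rough illustration of what filtering on the two primitive parameters could look like, here is a minimal sketch. It is not the code from this PR: the `traceQuery` type, the `buildFindTraceIDsQuery` helper, the table name `spans`, the column names, and the `?` placeholder style are all assumptions made for the example.

```go
package main

import "strings"

// traceQuery is a hypothetical stand-in for the query parameters; only the
// two primitive filters named in the description are modelled.
type traceQuery struct {
	ServiceName   string
	OperationName string
}

// buildFindTraceIDsQuery returns a SQL string and its positional arguments.
// The table and column names (spans, trace_id, service_name, name) are
// assumptions for illustration, not the schema used by the PR.
func buildFindTraceIDsQuery(q traceQuery) (string, []any) {
	var conds []string
	var args []any
	// Each filter is optional: an empty value adds no condition.
	if q.ServiceName != "" {
		conds = append(conds, "service_name = ?")
		args = append(args, q.ServiceName)
	}
	if q.OperationName != "" {
		conds = append(conds, "name = ?")
		args = append(args, q.OperationName)
	}
	sql := "SELECT DISTINCT trace_id FROM spans"
	if len(conds) > 0 {
		sql += " WHERE " + strings.Join(conds, " AND ")
	}
	return sql, args
}
```

Passing values as arguments rather than concatenating them into the SQL text keeps user-supplied service and operation names out of the query string; follow-up parameters could be added as further optional conditions in the same way.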

How was this change tested?

  • Unit tests

Checklist

Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
@mahadzaryab1 mahadzaryab1 added the changelog:experimental Change to an experimental part of the code label Nov 14, 2025
@codecov

codecov bot commented Nov 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.58%. Comparing base (3501efe) to head (1d8fae0).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7648      +/-   ##
==========================================
+ Coverage   95.54%   95.58%   +0.03%     
==========================================
  Files         307      307              
  Lines       15392    15440      +48     
==========================================
+ Hits        14707    14759      +52     
+ Misses        537      534       -3     
+ Partials      148      147       -1     
Flag Coverage Δ
badger_v1 9.68% <0.00%> (-0.06%) ⬇️
badger_v2 2.02% <0.00%> (-0.02%) ⬇️
cassandra-4.x-v1-manual 14.02% <0.00%> (-0.08%) ⬇️
cassandra-4.x-v2-auto 2.01% <0.00%> (-0.02%) ⬇️
cassandra-4.x-v2-manual 2.01% <0.00%> (-0.02%) ⬇️
cassandra-5.x-v1-manual 14.02% <0.00%> (-0.08%) ⬇️
cassandra-5.x-v2-auto 2.01% <0.00%> (-0.02%) ⬇️
cassandra-5.x-v2-manual 2.01% <0.00%> (-0.02%) ⬇️
clickhouse 1.94% <0.00%> (-0.02%) ⬇️
elasticsearch-6.x-v1 18.50% <0.00%> (-0.11%) ⬇️
elasticsearch-7.x-v1 18.53% <0.00%> (-0.11%) ⬇️
elasticsearch-8.x-v1 18.69% <0.00%> (-0.11%) ⬇️
elasticsearch-8.x-v2 2.02% <0.00%> (-0.02%) ⬇️
elasticsearch-9.x-v2 2.02% <0.00%> (-0.02%) ⬇️
grpc_v1 9.56% <0.00%> (-0.06%) ⬇️
grpc_v2 2.02% <0.00%> (-0.02%) ⬇️
kafka-3.x-v2 2.02% <0.00%> (-0.02%) ⬇️
memory_v2 2.02% <0.00%> (-0.02%) ⬇️
opensearch-1.x-v1 ?
opensearch-2.x-v1 18.58% <0.00%> (-0.11%) ⬇️
opensearch-2.x-v2 2.02% <0.00%> (-0.02%) ⬇️
opensearch-3.x-v2 2.02% <0.00%> (-0.02%) ⬇️
query 2.02% <0.00%> (-0.02%) ⬇️
tailsampling-processor 0.58% <0.00%> (-0.01%) ⬇️
unittests 94.15% <100.00%> (+0.04%) ⬆️

@github-actions

github-actions bot commented Nov 14, 2025

Metrics Comparison Summary

Total changes across all snapshots: 53

Detailed changes per snapshot

summary_metrics_snapshot_cassandra

📊 Metrics Diff Summary

Total Changes: 53

  • 🆕 Added: 0 metrics
  • ❌ Removed: 53 metrics
  • 🔄 Modified: 0 metrics

❌ Removed Metrics

  • http_server_request_body_size_bytes (18 variants)
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="0",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="100",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="1000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="25",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
...
  • http_server_request_duration_seconds (17 variants)
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.005",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.01",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.025",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.05",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.075",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.1",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
...
  • http_server_response_body_size_bytes (18 variants)
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="0",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="100",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="1000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="25",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="localhost",server_port="13133",url_scheme="http"}
...


Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
@mahadzaryab1 mahadzaryab1 marked this pull request as ready for review November 19, 2025 03:50
@mahadzaryab1 mahadzaryab1 requested a review from a team as a code owner November 19, 2025 03:50
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Member

@yurishkuro yurishkuro left a comment

We need to revisit the schema & index design. Looking at the spans table, all I see is

    ) ENGINE = MergeTree PRIMARY KEY (trace_id)

which doesn't make sense to me. Do we have a design doc about how we want CH to lay out the data?

At minimum there needs to be ORDER BY. The typical way to organize telemetry data is to group data by time range.

@yurishkuro yurishkuro removed the go label Nov 23, 2025
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
)

const (
defualtProtocol = "native"
Contributor

Typo in constant name: defualtProtocol should be defaultProtocol. This typo is propagated to the test file (config_test.go line 90) and should be fixed in both places.

Spotted by Graphite Agent


Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
@mahadzaryab1
Collaborator Author

mahadzaryab1 commented Nov 23, 2025

> We need to revisit the schema & index design. Looking at the spans table, all I see is
>
>     ) ENGINE = MergeTree PRIMARY KEY (trace_id)
>
> which doesn't make sense to me. Do we have a design doc about how we want CH to lay out the data?
>
> At minimum there needs to be ORDER BY. The typical way to organize telemetry data is to group data by time range.

@yurishkuro For the most part, the primary key and the sort key are the same (https://clickhouse.com/docs/best-practices/choosing-a-primary-key), and a primary key in ClickHouse doesn't define uniqueness. That said, I did change the definition to specify the sort key instead, since that looks to be the convention in ClickHouse.

I'm going to create a design document to optimize the schema via the sort key and data-skipping indexes. I can make those optimizations in a follow-up PR.
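
To make the terms in this exchange concrete, here is a sketch of a table definition that declares a sort key, groups data by time range, and adds a data-skipping index. It is purely illustrative and is not the schema from this PR or from any design document: every column name, the partitioning expression, and the index parameters are assumptions. The DDL is held in a Go constant only so that it sits alongside the rest of the storage code.

```go
package main

// spansTableDDL is an illustrative ClickHouse table definition. It is NOT the
// schema used in this PR; all column names, the partition expression, and the
// index parameters are assumptions chosen for the example.
const spansTableDDL = `
CREATE TABLE IF NOT EXISTS spans (
    trace_id     String,
    service_name String,
    name         String,
    start_time   DateTime64(9),
    -- Data-skipping index so lookups by trace_id can skip unrelated granules.
    INDEX idx_trace_id trace_id TYPE bloom_filter(0.01) GRANULARITY 1
) ENGINE = MergeTree
PARTITION BY toDate(start_time)
ORDER BY (service_name, name, start_time)
`
```

When a MergeTree table declares ORDER BY without a separate PRIMARY KEY, ClickHouse uses the sort key as the primary key, which is why the two are usually the same. Which columns belong in the sort key depends on the dominant query patterns, so the ordering shown here is only one possible choice.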

Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Comment on lines +187 to +195
b, err := hex.DecodeString(traceID)
if err != nil {
    if !yield(nil, fmt.Errorf("failed to decode trace ID: %w", err)) {
        return
    }
    continue
}

if !yield([]tracestore.FoundTraceID{{TraceID: pcommon.TraceID(b)}}, nil) {
Contributor

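Type conversion concern: hex.DecodeString(traceID) returns a slice ([]byte), while pcommon.TraceID is a fixed-size array ([16]byte). Converting the slice directly fails to compile on Go versions before 1.20; on Go 1.20 and later it compiles but panics at run time if the decoded slice is shorter than 16 bytes, and silently drops any bytes beyond the 16th.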

Fix by copying the slice into a fixed-size array:

b, err := hex.DecodeString(traceID)
if err != nil {
    if !yield(nil, fmt.Errorf("failed to decode trace ID: %w", err)) {
        return
    }
    continue
}

var traceIDArray [16]byte
if len(b) != 16 {
    if !yield(nil, fmt.Errorf("invalid trace ID length: expected 16 bytes, got %d", len(b))) {
        return
    }
    continue
}
copy(traceIDArray[:], b)

if !yield([]tracestore.FoundTraceID{{TraceID: pcommon.TraceID(traceIDArray)}}, nil) {
    return
}
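
The decode, validate, and copy steps can also be folded into one small function. The sketch below is a self-contained illustration and not the code merged in this PR: `TraceID` is a local stand-in for pcommon.TraceID, and `parseTraceID` is a hypothetical helper name.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// TraceID is a local stand-in for pcommon.TraceID, which the review comment
// describes as a [16]byte array. It is defined here only to keep the sketch
// self-contained.
type TraceID [16]byte

// parseTraceID (hypothetical helper) decodes a hex string into a TraceID and
// rejects input that does not decode to exactly 16 bytes.
func parseTraceID(s string) (TraceID, error) {
	b, err := hex.DecodeString(s)
	if err != nil {
		return TraceID{}, fmt.Errorf("failed to decode trace ID: %w", err)
	}
	// Check the length explicitly so short input becomes an error, not a panic.
	if len(b) != len(TraceID{}) {
		return TraceID{}, fmt.Errorf("invalid trace ID length: expected %d bytes, got %d", len(TraceID{}), len(b))
	}
	var id TraceID
	copy(id[:], b)
	return id, nil
}
```

For example, `parseTraceID("000102030405060708090a0b0c0d0e0f")` returns a TraceID and a nil error, while `parseTraceID("abcd")` returns a length error. Checking the length before copying turns malformed stored values into ordinary errors that the caller can yield.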

Spotted by Graphite Agent


mahadzaryab1 and others added 3 commits December 13, 2025 08:48
Signed-off-by: Mahad Zaryab <43658574+mahadzaryab1@users.noreply.github.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
for rows.Next() {
    var traceID string
    if err := rows.Scan(&traceID); err != nil {
        if !yield(nil, fmt.Errorf("failed to scan row: %w", err)) {
Member

nit: it's distracting that there are so many places you need to call yield(). I would recommend moving the logic of reading the row into a helper function such that here you simply do

traceID, err := readRowIntoTraceID(row)
if !yield(traceID, err) {
  return
}
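
One way this refactoring could look is sketched below. Only the name readRowIntoTraceID comes from the comment above; the `rowScanner` interface, the `sliceRows` fake, the simplified `TraceID` type, and `streamTraceIDs` are assumptions made to keep the example self-contained, and the merged code may differ.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// TraceID is a local stand-in for pcommon.TraceID ([16]byte).
type TraceID [16]byte

// rowScanner is a hypothetical minimal view of a database result set.
type rowScanner interface {
	Next() bool
	Scan(dest ...any) error
}

// readRowIntoTraceID scans the current row and converts it to a TraceID.
// All failure modes are returned as errors, so the caller yields only once.
func readRowIntoTraceID(rows rowScanner) (TraceID, error) {
	var s string
	if err := rows.Scan(&s); err != nil {
		return TraceID{}, fmt.Errorf("failed to scan row: %w", err)
	}
	b, err := hex.DecodeString(s)
	if err != nil {
		return TraceID{}, fmt.Errorf("failed to decode trace ID: %w", err)
	}
	if len(b) != len(TraceID{}) {
		return TraceID{}, errors.New("invalid trace ID length")
	}
	var id TraceID
	copy(id[:], b)
	return id, nil
}

// streamTraceIDs calls yield once per row and stops when yield returns false.
func streamTraceIDs(rows rowScanner, yield func(TraceID, error) bool) {
	for rows.Next() {
		if !yield(readRowIntoTraceID(rows)) {
			return
		}
	}
}

// sliceRows is an in-memory rowScanner used only to exercise the sketch.
type sliceRows struct {
	vals []string
	pos  int
}

func (r *sliceRows) Next() bool { r.pos++; return r.pos <= len(r.vals) }

func (r *sliceRows) Scan(dest ...any) error {
	if len(dest) != 1 {
		return errors.New("expected exactly one destination")
	}
	p, ok := dest[0].(*string)
	if !ok {
		return errors.New("destination must be *string")
	}
	*p = r.vals[r.pos-1]
	return nil
}
```

With the row handling moved into the helper, the loop body has a single yield call and a single early return, and each error path can be unit-tested through readRowIntoTraceID without driving the whole iterator.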

Collaborator Author

@yurishkuro Done!

Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
@yurishkuro yurishkuro enabled auto-merge December 20, 2025 21:16
@yurishkuro yurishkuro added this pull request to the merge queue Dec 20, 2025
Merged via the queue into jaegertracing:main with commit c0b1db9 Dec 20, 2025
82 of 83 checks passed

Labels

area/storage changelog:experimental Change to an experimental part of the code enhancement


2 participants