Skip to content

feat: return total size in metric size Allow method for Log#5628

Merged
joe-elliott merged 9 commits intografana:mainfrom
sienna011022:feat/changeToReturnSize
Sep 25, 2025
Merged

feat: return total size in metric size Allow method for Log#5628
joe-elliott merged 9 commits intografana:mainfrom
sienna011022:feat/changeToReturnSize

Conversation

@sienna011022
Copy link
Copy Markdown
Contributor

@sienna011022 sienna011022 commented Sep 9, 2025

What this PR does

  • Implements feedback from #5625
    Returns the trace size directly from the .Allow() method instead, ensuring a thread-safe approach.

  • Adds total size logging functionality
    Track and log the cumulative size of traces processed by the ingester.
    → Provides better observability into trace data volume and helps with monitoring resource usage.


Comparison of Approaches

Performance Impact

Command used:

go test -bench=BenchmarkInstancePushBytesExistingTrace -benchmem -run=^$ -v -benchtime=30s

At This PR

goos: darwin
goarch: arm64
pkg: [github.com/grafana/tempo/modules/ingester](http://github.com/grafana/tempo/modules/ingester)
cpu: Apple M3 Pro
BenchmarkInstancePushBytesExistingTrace
BenchmarkInstancePushBytesExistingTrace-12         99746            301778 ns/op        11026808 B/op          7 allocs/op
PASS
ok      [github.com/grafana/tempo/modules/ingester](http://github.com/grafana/tempo/modules/ingester)       34.555s

At #5625

goos: darwin
goarch: arm64
pkg: [github.com/grafana/tempo/modules/ingester](http://github.com/grafana/tempo/modules/ingester)
cpu: Apple M3 Pro
BenchmarkInstancePushBytesExistingTrace
BenchmarkInstancePushBytesExistingTrace-12        116062            343547 ns/op        11026809 B/op          7 allocs/op
PASS
ok      [github.com/grafana/tempo/modules/ingester](http://github.com/grafana/tempo/modules/ingester)       43.814s
 ~/Documents/SRE/tempo/modules/ingester  feat/addTotalSizeLogging *2

Analysis:
Latency increase: ~42μs per operation (+13.84%)

Result

  • Result: Returning size in .Allow() method is more efficient and maintains thread safety.

Which issue(s) this PR fixes:
Addresses feedback from #5625 - implements trace size logging via .Allow() method return value for thread-safe operation.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@sienna011022 sienna011022 changed the title feat: return total size in Allow method feat: return total size in metric size Allow method for Log Sep 9, 2025
@joe-elliott
Copy link
Copy Markdown
Collaborator

Thank you for this improvement, but for any future changes please don't open a new PR. We can continue working in this one.

I have one more change to request. The code you've improved is going to go away in Tempo 3.0. We've moved this logic to a new spot. Adding better support for logging max traces sizes in 3.0 should be quite simple. Can you wrap this error with the appropriate info:

https://github.com/grafana/tempo/blob/main/pkg/livetraces/livetraces.go#L92

@sienna011022
Copy link
Copy Markdown
Contributor Author

sienna011022 commented Sep 11, 2025

Thank you for this improvement, but for any future changes please don't open a new PR. We can continue working in this one.

I have one more change to request. The code you've improved is going to go away in Tempo 3.0. We've moved this logic to a new spot. Adding better support for logging max traces sizes in 3.0 should be quite simple. Can you wrap this error with the appropriate info:

https://github.com/grafana/tempo/blob/main/pkg/livetraces/livetraces.go#L92

@joe-elliott Thank you for the feedback! I've been exploring different approaches for wrapping the error with appropriate info:

Option 1: fmt.Errorf() with detailed info (Current implementation)

return fmt.Errorf("%w: traceID=%x currentSize=%d batchSize=%d maxSize=%d", 
    ErrMaxTraceSizeExceeded, traceID, tr.sz, sz, maxTraceSize)

Pros: Simple, maintains errors.Is() compatibility, minimal changes to existing code
Cons: String-based, requires parsing for structured access

Option 2: Return AllowResult struct instead of error from PushWithTimestampAndLimits

func PushWithTimestampAndLimits(...) *AllowResult

Pros: Structured information, handles both success/failure cases
Cons: Too many places to modify, completely different from existing error handling pattern

Option 3: Inject logger into PushWithTimestampAndLimits for internal logging

func PushWithTimestampAndLimits(ts time.Time, traceID []byte, batch T, 
    maxLiveTraces, maxTraceSize uint64, logger log.Logger) error {
    // Handle logging directly inside
    if maxTraceSize > 0 && (tr.sz+sz > maxTraceSize) {
        level.Error(logger).Log("msg", "trace too large", "traceID", hex.EncodeToString(traceID))
        return ErrMaxTraceSizeExceeded
    }
}

Pros: Logging handled at the point of occurrence, minimal caller code changes
Cons: LiveTraces takes on logging responsibility, increased dependencies

Which approach would you prefer for Tempo 3.0?

@joe-elliott
Copy link
Copy Markdown
Collaborator

These lines can log an enormous amount. I think we should pass in a logger when creating the livetraces struct, but we need it to be rate limited like here: https://github.com/grafana/tempo/blob/main/modules/ingester/instance.go#L131

@sienna011022
Copy link
Copy Markdown
Contributor Author

These lines can log an enormous amount. I think we should pass in a logger when creating the livetraces struct, but we need it to be rate limited like here: https://github.com/grafana/tempo/blob/main/modules/ingester/instance.go#L131

@joe-elliott thanks I added log on livetraces.go too.

Copy link
Copy Markdown
Collaborator

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good, but looks like theres a lint issue?

@sienna011022
Copy link
Copy Markdown
Contributor Author

this looks good, but looks like theres a lint issue?

@joe-elliott I fixed it thanks 👍

@sienna011022
Copy link
Copy Markdown
Contributor Author

@joe-elliott Hi can u check this PR please?

@sienna011022
Copy link
Copy Markdown
Contributor Author

@joe-elliott hi I checked ci workflow again can u check this pr I really appreciate your time

Copy link
Copy Markdown
Collaborator

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sticking with this! Nice addition 🙏

@joe-elliott joe-elliott merged commit ab3c39e into grafana:main Sep 25, 2025
37 of 38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants