@rs-unity (Contributor) commented Sep 10, 2025

Enable UseCachedSize in proto marshal to eliminate redundant size computation

Fixes: #8570

The proto message size was previously being computed twice: once before marshalling and again during the marshalling call itself. In high-throughput workloads, this duplicated computation is expensive.

By enabling UseCachedSize on MarshalOptions, we reuse the size calculated immediately before marshalling, avoiding the second call to proto.Size.

In our application, the redundant size call accounted for ~12% of total CPU time. With this change, we eliminate that overhead while preserving correctness.



RELEASE NOTES:

  • encoding/proto: Avoid redundant message size calculation when marshalling.


linux-foundation-easycla bot commented Sep 10, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.


codecov bot commented Sep 10, 2025

Codecov Report

❌ Patch coverage is 88.23529% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.56%. Comparing base (e60a04b) to head (ec0c47e).
⚠️ Report is 4 commits behind head on master.

Files with missing lines | Patch % | Lines
encoding/proto/proto.go  | 88.23%  | 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8569      +/-   ##
==========================================
- Coverage   81.64%   80.56%   -1.09%     
==========================================
  Files         413      413              
  Lines       40621    40843     +222     
==========================================
- Hits        33167    32906     -261     
- Misses       5991     6260     +269     
- Partials     1463     1677     +214     
Files with missing lines | Coverage Δ
encoding/proto/proto.go  | 55.55% <88.23%> (-7.24%) ⬇️

... and 31 files with indirect coverage changes


@eshitachandwani (Member) commented Sep 11, 2025

The code seems okay, but I am not sure we want to do this, especially since the description of UseCachedSize says:

// There is absolutely no guarantee that Size followed by Marshal with
// UseCachedSize set will perform equivalently to Marshal alone.

On the other hand, it also says that setting the option asserts nothing has changed in between:

// Setting this option asserts that:
//
// 1. Size has previously been called on this message with identical
// options (except for UseCachedSize itself).
//
// 2. The message and all its submessages have not changed in any
// way since the Size call. For lazily decoded messages, accessing
// a message results in decoding the message, which is a change.

So I would like other maintainers opinion on this. cc @dfawley @easwars @arjan-bal

return nil, fmt.Errorf("proto: failed to marshal, message is %T, want proto.Message", v)
}

size := proto.Size(vv)
Contributor commented:

Since this seems to rely on a runtime behaviour of calling proto.Size(vv) before using the cached size (please correct me if I'm wrong), we should call this out in a code comment just above this to serve as a warning for future code authors.

Contributor commented:

I think if we don't call proto.Size(vv) here, we could end up using a stale cached size.

@rs-unity (Contributor, Author) commented Sep 11, 2025

Would this be cleaner? rs-unity@188bece rs-unity@ec0c47e

Contributor commented:

The docstring of UseCachedSize does say the following:

	// UseCachedSize indicates that the result of a previous Size call
	// may be reused.

And that if that condition is not met, bad things might happen.

Contributor (Author) commented:

I just want to flag an edge case that normally shouldn't happen: if another part of the stack keeps a reference to the proto message and it's shared across multiple goroutines, one goroutine might modify the message while another is marshalling it through this codec. In that situation, enabling UseCachedSize becomes risky.

That said, the issue I’m describing would still cause (other) problems even if UseCachedSize isn’t enabled.

Contributor (Author) commented:

I've updated the comment: rs-unity@ec0c47e

@arjan-bal arjan-bal assigned rs-unity and unassigned arjan-bal Sep 11, 2025
@easwars (Contributor) left a comment:

Would you also be able to run the benchmarks here: https://github.com/grpc/grpc-go/tree/master/benchmark and report the results? Thanks.


@arjan-bal (Contributor) commented:
> Would you also be able to run the benchmarks here: https://github.com/grpc/grpc-go/tree/master/benchmark and report the results? Thanks.

Here are the commands to run them:

# get the baseline performance
git checkout master
RUN_NAME="unary-before"
go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary \
   -compression=off -maxConcurrentCalls=120 -trace=off \
   -reqSizeBytes=1024 -respSizeBytes=1024 -networkMode=Local \
   -resultFile="${RUN_NAME}" -recvBufferPool=simple

# get the performance with the improvements
git checkout use-cached-size
RUN_NAME="unary-after"
go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary \
   -compression=off -maxConcurrentCalls=120 -trace=off \
   -reqSizeBytes=1024 -respSizeBytes=1024 -networkMode=Local \
   -resultFile="${RUN_NAME}" -recvBufferPool=simple

# generate a report
go run benchmark/benchresult/main.go unary-before unary-after

I suspect the benchmarks may not show a significant improvement since they use a very simple proto with a shallow schema.

@rs-unity (Contributor, Author) commented:
In our workload, the improvement had a big impact: we were able to cut node count by 10–20%. That said, our use case is fairly unusual, since we're serving over 200M payloads per second.


Here are the results of the benchmarks run with @arjan-bal's commands:

               Title       Before        After Percentage
            TotalOps     10403961     10458046     0.52%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op     13863.13     13864.13     0.01%
           Allocs/op       156.05       156.05     0.00%
             ReqT/op 1420487475.20 1427871880.53     0.52%
            RespT/op 1420487475.20 1427871880.53     0.52%
            50th-Lat    642.875µs      638.5µs    -0.68%
            90th-Lat    841.792µs    838.833µs    -0.35%
            99th-Lat   1.447792ms   1.459792ms     0.83%
             Avg-Lat    691.469µs    687.897µs    -0.52%
           GoVersion     go1.25.1     go1.25.1
         GrpcVersion   1.76.0-dev   1.76.0-dev

Here are the results of a benchmark closer to our workload:

                     │   old.txt   │               new.txt               │
                     │   sec/op    │   sec/op     vs base                │
ProtoMarshal_2000-16   621.3µ ± 2%   448.4µ ± 4%  -27.83% (p=0.000 n=10)

                     │   old.txt    │               new.txt               │
                     │     B/op     │     B/op      vs base               │
ProtoMarshal_2000-16   1.094Mi ± 0%   1.063Mi ± 0%  -2.79% (p=0.000 n=10)

                     │   old.txt   │               new.txt               │
                     │  allocs/op  │  allocs/op   vs base                │
ProtoMarshal_2000-16   6.007k ± 0%   4.007k ± 0%  -33.29% (p=0.000 n=10)

@arjan-bal (Contributor) left a comment:

LGTM, thanks for the improvement! We'll wait for a second approval before merging.

@arjan-bal arjan-bal changed the title proto: enable use cached size option encoding/proto: enable use cached size option Sep 12, 2025
@easwars easwars removed their assignment Sep 12, 2025
@arjan-bal arjan-bal merged commit 0f45079 into grpc:master Sep 15, 2025
25 of 26 checks passed
Labels
Type: Performance Performance improvements (CPU, network, memory, etc)
Successfully merging this pull request may close these issues.

Improvement: Enable UseCachedSize in proto marshal to eliminate redundant size computation
5 participants