
feat(otlp-transformer): add custom protobuf logs serializer #6228

Closed

pichlermarc wants to merge 1 commit into open-telemetry:main from dynatrace-oss-contrib:feat/custom-logs-serializer

Conversation

@pichlermarc
Member

@pichlermarc pichlermarc commented Dec 16, 2025

Which problem is this PR solving?

Important

Draft PR; things that are still missing:

  • comparing allocations.
  • the way it grows the buffer may not yet be exactly what we want.
    NOTE: I'm out of office until Jan 7 2026 and will not be able to drive this forward until then.

This PR replaces the generated code for serializing logs to OTLP/protobuf with hand-rolled code. The idea is to eliminate most of the intermediate steps we needed to conform to the generated code's layout and to write directly to a Uint8Array. I made this to address the performance regression discovered in #6221.
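
To illustrate what "write directly to a Uint8Array" means here, below is a minimal sketch of hand-rolled protobuf wire-format writing. This is not the PR's actual ProtobufWriter; the MiniWriter class, its growth strategy, and the field numbers are placeholder assumptions.

class MiniWriter {
  private buf: Uint8Array;
  public pos = 0;

  constructor(initialSize = 65536) {
    this.buf = new Uint8Array(initialSize);
  }

  private ensureCapacity(extra: number): void {
    if (this.pos + extra <= this.buf.length) return;
    // double until it fits; the exact growth strategy is still an open question in this PR
    let size = this.buf.length * 2;
    while (size < this.pos + extra) size *= 2;
    const grown = new Uint8Array(size);
    grown.set(this.buf);
    this.buf = grown;
  }

  writeVarint(value: number): void {
    this.ensureCapacity(5); // a 32-bit varint needs at most 5 bytes
    while (value > 0x7f) {
      this.buf[this.pos++] = (value & 0x7f) | 0x80;
      value >>>= 7;
    }
    this.buf[this.pos++] = value;
  }

  // wire type 2 (length-delimited) covers strings, bytes and nested messages
  writeStringField(fieldNumber: number, value: string): void {
    const bytes = new TextEncoder().encode(value);
    this.writeVarint((fieldNumber << 3) | 2); // tag
    this.writeVarint(bytes.length); // length prefix
    this.ensureCapacity(bytes.length);
    this.buf.set(bytes, this.pos);
    this.pos += bytes.length;
  }

  finish(): Uint8Array {
    return this.buf.subarray(0, this.pos); // view over the written bytes
  }
}

The tricky part for nested messages (resource, scope, log records) is the length prefix, which needs either a pre-pass to compute sizes or backpatching; that is out of scope for this sketch.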

I added some perf tests to show the differences:

protobuf.js (currently on main - this is the baseline we're comparing against)

--- Logs Serialization Benchmarks ---
serialize 1 log (protobuf) x 219,135 ops/sec ±0.14% (101 runs sampled)
serialize 100 logs (protobuf) x 3,051 ops/sec ±0.19% (100 runs sampled)
serialize 256 logs (protobuf) x 1,230 ops/sec ±0.16% (98 runs sampled)
serialize 512 logs (protobuf) x 468 ops/sec ±0.78% (74 runs sampled)
serialize 1024 logs (protobuf) x 237 ops/sec ±0.25% (87 runs sampled)
serialize 1 log (json) x 215,862 ops/sec ±0.17% (101 runs sampled)
serialize 100 logs (json) x 3,778 ops/sec ±0.46% (92 runs sampled)
serialize 256 logs (json) x 1,460 ops/sec ±0.43% (98 runs sampled)
serialize 512 logs (json) x 743 ops/sec ±0.14% (99 runs sampled)
serialize 1024 logs (json) x 368 ops/sec ±0.29% (95 runs sampled)

custom implementation (from this PR; it turns out to be pretty much on par with the current OTLP/json serializer, and better than what we had with protobuf.js)

--- Logs Serialization Benchmarks ---
serialize 1 log (protobuf) x 200,625 ops/sec ±0.15% (97 runs sampled)
serialize 100 logs (protobuf) x 4,196 ops/sec ±0.19% (100 runs sampled)
serialize 256 logs (protobuf) x 1,608 ops/sec ±0.54% (98 runs sampled)
serialize 512 logs (protobuf) x 807 ops/sec ±0.27% (94 runs sampled)
serialize 1024 logs (protobuf) x 397 ops/sec ±0.41% (92 runs sampled)
serialize 1 log (json) x 217,570 ops/sec ±0.14% (97 runs sampled)
serialize 100 logs (json) x 3,874 ops/sec ±0.17% (100 runs sampled)
serialize 256 logs (json) x 1,470 ops/sec ±0.26% (100 runs sampled)
serialize 512 logs (json) x 745 ops/sec ±0.14% (99 runs sampled)
serialize 1024 logs (json) x 371 ops/sec ±0.30% (96 runs sampled)
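
The ops/sec lines above are the output format of Benchmark.js. For reference, a rough sketch of how such a suite can be wired up is below; the import path and setup are assumptions, and the serializeRequest calls mirror the benchmark snippet quoted later in this thread.

import Benchmark = require('benchmark');
import {
  ProtobufLogsSerializer,
  JsonLogsSerializer,
} from '@opentelemetry/otlp-transformer';

// `logs` stands in for an array of 512 ReadableLogRecords built with the Logs
// SDK elsewhere in the test; elided here since this only shows the suite wiring.
declare const logs: any[];

new Benchmark.Suite('Logs Serialization Benchmarks')
  .add('serialize 512 logs (protobuf)', () => {
    ProtobufLogsSerializer.serializeRequest(logs);
  })
  .add('serialize 512 logs (json)', () => {
    JsonLogsSerializer.serializeRequest(logs);
  })
  .on('cycle', (event: Benchmark.Event) => {
    console.log(String(event.target)); // e.g. "serialize 512 logs (protobuf) x 807 ops/sec ..."
  })
  .run();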

Disclosure of AI use: I used GitHub Copilot with Claude Sonnet 4.5 to fill in most of the logic from an interface that I initially set up. I applied a number of optimizations afterwards to make it my own, but it was a significant help in getting the basics up and running.

Ref #6100
Ref #6221

Short description of the changes

Type of change

  • Refactor

How Has This Been Tested?

  • existing unit tests
  • existing e2e tests
  • added unit tests

@codecov

codecov Bot commented Dec 16, 2025

Codecov Report

❌ Patch coverage is 91.05431% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.26%. Comparing base (b2025ca) to head (99da24c).
⚠️ Report is 178 commits behind head on main.

Files with missing lines Patch % Lines
...transformer/src/common/protobuf/protobuf-writer.ts 80.15% 26 Missing ⚠️
...p-transformer/src/logs/protobuf/logs-serializer.ts 98.11% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6228      +/-   ##
==========================================
- Coverage   95.40%   95.26%   -0.15%     
==========================================
  Files         317      320       +3     
  Lines        9514     9823     +309     
  Branches     2192     2235      +43     
==========================================
+ Hits         9077     9358     +281     
- Misses        437      465      +28     
Files with missing lines Coverage Δ
...ansformer/src/common/protobuf/common-serializer.ts 100.00% <100.00%> (ø)
...ackages/otlp-transformer/src/logs/protobuf/logs.ts 100.00% <100.00%> (ø)
...p-transformer/src/logs/protobuf/logs-serializer.ts 98.11% <98.11%> (ø)
...transformer/src/common/protobuf/protobuf-writer.ts 80.15% <80.15%> (ø)

@trentm
Contributor

trentm commented Dec 17, 2025

Pretty cool, Marc.

Comment on lines +126 to +132
suite.add('transform 512 logs (protobuf)', function () {
ProtobufLogsSerializer.serializeRequest(logs);
});

function createSpan() {
const span = tracer.startSpan('span');
span.setAttribute('aaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaaaaaa');
span.setAttribute('bbbbbbbbbbbbbbbbbbbb', 'bbbbbbbbbbbbbbbbbbbb');
span.setAttribute('cccccccccccccccccccc', 'cccccccccccccccccccc');
span.setAttribute('dddddddddddddddddddd', 'dddddddddddddddddddd');
span.setAttribute('eeeeeeeeeeeeeeeeeeee', 'eeeeeeeeeeeeeeeeeeee');
span.setAttribute('ffffffffffffffffffff', 'ffffffffffffffffffff');
span.setAttribute('gggggggggggggggggggg', 'gggggggggggggggggggg');
span.setAttribute('hhhhhhhhhhhhhhhhhhhh', 'hhhhhhhhhhhhhhhhhhhh');
span.setAttribute('iiiiiiiiiiiiiiiiiiii', 'iiiiiiiiiiiiiiiiiiii');
span.setAttribute('jjjjjjjjjjjjjjjjjjjj', 'jjjjjjjjjjjjjjjjjjjj');
span.end();
suite.add('transform 512 logs (json)', function () {
JsonLogsSerializer.serializeRequest(logs);
});

Member Author

note: 512 logs is probably the most common case, since it's the default batch size - so I opted to keep just these here.


Comment on protobuf-writer.ts (excerpt):

public pos: number = 0;

constructor(initialSize = 65536) {

Member Author

note: I assume this is not yet ready for production, as it allocates one big object on every export. As such, this PR is not marked as ready yet. Unfortunately I will be out of office until Jan 7 2026, so I won't get back to this until then - I do believe this PR is almost there, though.

Possible improvements:

  • We could keep the buffer around and re-use it between exports, but depending on its size and how long it stays alive, I would assume it would be moved to old-space and negatively impact GC performance. I'm not sure how much of a problem that is.
  • Also, if we want to re-use the buffer, we will want to change the transformer consts to factory functions instead, so that each exporter gets its own buffer to work with instead of sharing one (see the sketch after this list).
  • Another thing I noticed while working on this: now that we serialize our own OTLP messages, we could implement export request size limiting by splitting the message into multiple Uint8Arrays, each including the resource and scope but with the LogRecords split across messages. So if the endpoint has a size limit we can deal with that - we'd also be able to stay under 64KB for sendBeacon/fetch with keepalive. That'd need exporter work plus changes to the ISerializer#serializeRequest() interface, but that's something to look at later.
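
As a rough sketch of the factory-function idea (hypothetical names; it assumes a growable writer like the MiniWriter sketch in the PR description, and that ISerializer keeps roughly its current serializeRequest/deserializeResponse shape):

interface ISerializer<Request, Response> {
  serializeRequest(request: Request): Uint8Array | undefined;
  deserializeResponse(data: Uint8Array): Response;
}

// Each exporter calls the factory once and gets its own serializer, and
// therefore its own long-lived, reusable buffer (nothing shared between exporters).
function createProtobufLogsSerializer<Response>(): ISerializer<unknown, Response> {
  const writer = new MiniWriter(); // kept alive between exports; see GC caveat above
  return {
    serializeRequest(_request) {
      writer.pos = 0; // reuse the buffer instead of allocating one per export
      // ... write the ExportLogsServiceRequest fields for the record batch here ...
      // copy out, because the finish() view would otherwise alias the reused buffer
      return writer.finish().slice();
    },
    deserializeResponse(_data) {
      throw new Error('response deserialization is out of scope for this sketch');
    },
  };
}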

cc @trentm since you were talking about possibly having some time to look into exporter performance while I'm out. Feel free to pick this up if you like 🙂

@trentm
Contributor

trentm commented Jan 12, 2026

Marc,

I played with this a little bit last week, mostly trying to grok the perf impact, if any, and then generally going down rabbit holes trying to grok OTel JS perf overhead. My recollection is that I cannot push to your work branches, so I pushed to my own feature branch here:

https://github.com/trentm/opentelemetry-js/tree/trentm-custom-otlp-serializers

Also, my changes have many TODO and HACK comments, and code that makes them far from a complete proposal. :)

I have a couple of commits:
main...trentm:opentelemetry-js:trentm-custom-otlp-serializers

  • The first is a small addition of literal HACK_... vars to your logs-serializer.ts for turning behaviour on/off during load-testing perf runs.
  • The second adds an ExportTraceServiceRequest serializer.

I'll describe some limited performance impressions in a separate comment.

@trentm
Contributor

trentm commented Jan 13, 2026

Here are some results from the macrobenchmarking perf testing I did.

This is all run on my macOS MBP (M3 Pro, 36 GB), using node v20.19.1 (I should really be testing with more recent Node.js versions).

Intro to my run-bench script

First, I have a small run-bench script to assist with runs. It takes a config file like:

// http-pino-oj.bench.jsonc
{
  "app": {
    "command": "node",
    "args": ["--import", "./telemetry-oj.mjs", "app-http-pino.js"],
    "env": {
      "OTEL_SERVICE_NAME": "app-http-pino",
    },
    "pause": 2
  },
  "loadgen": {
    "command": "hey",
    "args": ["-c", "10", "-q", "100", "-z", "180s", "http://localhost:3000/"],
    "eta": 180, // used for a progress bar
  }
}

A run will: start the app, pause a (lame) 2s for startup, use psrecord to monitor the app's CPU and memory, apply some load (currently using hey for that), then present results. Currently it is the caller's responsibility to have any other requisite services running (e.g. a database if used, an OTel collector). I'm currently using a mock collector (mockotlpserver).

An example run looks something like:

% ./bin/run-bench http-pino-oj.bench.jsonc
Bench run ID: 20260106n005
Start app under test:
  $ node --import ./telemetry-oj.mjs app-http-pino.js >results/20260106n005/app.log
Pause for 2s for app (pid=54654) to startup.
Start app process monitor (procmon):
  $ psrecord 54654 --include-children --interval 0.1 --plot results/20260106n005/procmon.png --log-format csv --log results/20260106n005/procmon.csv
Attaching to process 54654
Start loadgen:
  $ hey -c 10 -q 100 -z 180s http://localhost:3000/ >results/20260106n005/loadgen.log

App closed gracefully.
Procmon closed gracefully.
Done bench run "20260106n005". Results in "results/20260106n005/"

=== Summary (for now just dump loadgen.log) ===
...

I'll try to get this into a repo soon to show exactly what I'm running.

Small app

A small app-http-pino.js app that serves an HTTP response and logs a single pino log record per request. The point of doing some logging was to test your logs-serializer.

'use strict';
const http = require('http');
const pino = require('pino');

const log = pino();
const server = http.createServer((req, res) => {
    // Not reading the req body. Should perhaps do that.
    log.info({method: req.method, headers: req.headers}, 'http request');
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('OK\n');
});

server.listen(3000, () => {
  console.log('Server running at http://localhost:3000/');
});
Baseline run

A baseline run of the small app with no instrumentation:

% bin/run-bench http-pino.bench.jsonc
Bench run ID: 20260106n002
Start app under test:
  $ node app-http-pino.js >results/20260106n002/app.log
Pause for 2s for app (pid=53987) to startup.
Start app process monitor (procmon):
  $ psrecord 53987 --include-children --interval 0.1 --plot results/20260106n002/procmon.png --log-format csv --log results/20260106n002/procmon.csv
Attaching to process 53987
Start loadgen:
  $ hey -c 10 -q 100 -z 180s http://localhost:3000/ >results/20260106n002/loadgen.log

App closed gracefully.
Procmon closed gracefully.
Done bench run "20260106n002". Results in "results/20260106n002/"

=== Summary (for now just dump loadgen.log) ===

Summary:
  Total:	180.0024 secs
  Slowest:	0.0174 secs
  Fastest:	0.0001 secs
  Average:	0.0004 secs
  Requests/sec:	999.9869


Response time histogram:
  0.000 [1]	|
  0.002 [179960]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.004 [29]	|
  0.005 [0]	|
  0.007 [0]	|
  0.009 [0]	|
  0.010 [0]	|
  0.012 [1]	|
  0.014 [2]	|
  0.016 [3]	|
  0.017 [4]	|


Latency distribution:
  10% in 0.0002 secs
  25% in 0.0002 secs
  50% in 0.0003 secs
  75% in 0.0005 secs
  90% in 0.0007 secs
  95% in 0.0008 secs
  99% in 0.0011 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0000 secs, 0.0001 secs, 0.0174 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0033 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0011 secs
  resp wait:	0.0003 secs, 0.0000 secs, 0.0108 secs
  resp read:	0.0000 secs, 0.0000 secs, 0.0014 secs

Status code distribution:
  [200]	180000 responses
[procmon CPU/memory chart for this run]
Instrumented with current OTel JS main

Instrumented with current OTel JS on main (telemetry-oj.mjs):

% cat telemetry-oj.mjs
// Setup telemetry using OpenTelemetry JS

import * as os from 'os';
import {NodeSDK} from '@opentelemetry/sdk-node';
import {getNodeAutoInstrumentations} from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
    instrumentations: getNodeAutoInstrumentations(),
});

process.on('SIGTERM', async () => {
    try {
        await sdk.shutdown();
    } catch (err) {
        console.warn('warning: error shutting down OTel SDK', err);
    }
    process.exit(128 + os.constants.signals.SIGTERM);
});

process.once('beforeExit', async () => {
    try {
        await sdk.shutdown();
    } catch (err) {
        console.warn('warning: error shutting down OTel SDK', err);
    }
});

sdk.start();

Results:

% ./bin/run-bench http-pino-oj.bench.jsonc
Bench run ID: 20260106n005
Start app under test:
  $ node --import ./telemetry-oj.mjs app-http-pino.js >results/20260106n005/app.log
Pause for 2s for app (pid=54654) to startup.
Start app process monitor (procmon):
  $ psrecord 54654 --include-children --interval 0.1 --plot results/20260106n005/procmon.png --log-format csv --log results/20260106n005/procmon.csv
Attaching to process 54654
Start loadgen:
  $ hey -c 10 -q 100 -z 180s http://localhost:3000/ >results/20260106n005/loadgen.log

App closed gracefully.
Procmon closed gracefully.
Done bench run "20260106n005". Results in "results/20260106n005/"

=== Summary (for now just dump loadgen.log) ===

Summary:
  Total:	180.0028 secs
  Slowest:	0.0338 secs
  Fastest:	0.0001 secs
  Average:	0.0007 secs
  Requests/sec:	999.8288


Response time histogram:
  0.000 [1]	|
  0.003 [177798]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.007 [665]	|
  0.010 [499]	|
  0.014 [724]	|
  0.017 [235]	|
  0.020 [33]	|
  0.024 [2]	|
  0.027 [3]	|
  0.030 [3]	|
  0.034 [9]	|


Latency distribution:
  10% in 0.0002 secs
  25% in 0.0003 secs
  50% in 0.0005 secs
  75% in 0.0007 secs
  90% in 0.0011 secs
  95% in 0.0014 secs
  99% in 0.0048 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0000 secs, 0.0001 secs, 0.0338 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0031 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0006 secs
  resp wait:	0.0006 secs, 0.0001 secs, 0.0338 secs
  resp read:	0.0000 secs, 0.0000 secs, 0.0015 secs

Status code distribution:
  [200]	179972 responses
[procmon CPU/memory chart for this run]
Instrumented with the custom logs-serializer.ts in this PR
% diff -u http-pino-oj.bench.jsonc http-pino-oj-custom-serializer.bench.jsonc
--- http-pino-oj.bench.jsonc	2026-01-05 16:44:38
+++ http-pino-oj-custom-serializer.bench.jsonc	2026-01-05 16:44:45
@@ -1,7 +1,7 @@
 {
   "app": {
     "command": "node",
-    "args": ["--import", "./telemetry-oj.mjs", "app-http-pino.js"],
+    "args": ["--import", "./telemetry-oj-custom-serializer.mjs", "app-http-pino.js"],
     "env": {
       "OTEL_SERVICE_NAME": "app-http-pino",
     },

where "telemetry-oj-custom-serializer.mjs" points to the local working copy with your feature branch:

...
import {NodeSDK} from './oj8/experimental/packages/opentelemetry-sdk-node/build/src/index.js';
...

Run results:

% ./bin/run-bench http-pino-oj-custom-serializer.bench.jsonc
Bench run ID: 20260106n006
Start app under test:
  $ node --import ./telemetry-oj-custom-serializer.mjs app-http-pino.js >results/20260106n006/app.log
Pause for 2s for app (pid=54964) to startup.
Start app process monitor (procmon):
  $ psrecord 54964 --include-children --interval 0.1 --plot results/20260106n006/procmon.png --log-format csv --log results/20260106n006/procmon.csv
Attaching to process 54964
Start loadgen:
  $ hey -c 10 -q 100 -z 180s http://localhost:3000/ >results/20260106n006/loadgen.log

App closed gracefully.
Procmon closed gracefully.
Done bench run "20260106n006". Results in "results/20260106n006/"

=== Summary (for now just dump loadgen.log) ===

Summary:
  Total:	180.0029 secs
  Slowest:	0.0333 secs
  Fastest:	0.0001 secs
  Average:	0.0007 secs
  Requests/sec:	999.8841


Response time histogram:
  0.000 [1]	|
  0.003 [178172]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.007 [517]	|
  0.010 [791]	|
  0.013 [342]	|
  0.017 [134]	|
  0.020 [13]	|
  0.023 [2]	|
  0.027 [4]	|
  0.030 [0]	|
  0.033 [6]	|


Latency distribution:
  10% in 0.0002 secs
  25% in 0.0004 secs
  50% in 0.0005 secs
  75% in 0.0007 secs
  90% in 0.0011 secs
  95% in 0.0014 secs
  99% in 0.0035 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0000 secs, 0.0001 secs, 0.0333 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0016 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0005 secs
  resp wait:	0.0006 secs, 0.0001 secs, 0.0333 secs
  resp read:	0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
  [200]	179982 responses
[procmon CPU/memory chart for this run]
Flamegraphs for the above scenarios

Here are some flamegraphs (captured via the excellent 0x) for the three cases above. Capturing stacks impacts performance (at least on macOS), so these are from separate runs from the latency runs above.

[flamegraph screenshots for the baseline, main, and custom logs-serializer runs]

The main thing I'm showing here is that the logs-serialization shows a visible perf improvement (the narrower stack).

Conclusions so far

Latency

Here are the latency numbers ("99%" is p99, for example) for the 3 runs shown above, plus one with my added custom trace-serializer.ts side-by-side for comparison.

Latency distribution:
  BASELINE          MAIN              LOGS-SERIALIZER   TRACE-SERIALIZER
  10% in 0.0002s    10% in 0.0002s    10% in 0.0002s    10% in 0.0003s
  25% in 0.0002s    25% in 0.0003s    25% in 0.0004s    25% in 0.0004s
  50% in 0.0003s    50% in 0.0005s    50% in 0.0005s    50% in 0.0005s
  75% in 0.0005s    75% in 0.0007s    75% in 0.0007s    75% in 0.0008s
  90% in 0.0007s    90% in 0.0011s    90% in 0.0011s    90% in 0.0012s
  95% in 0.0008s    95% in 0.0014s    95% in 0.0014s    95% in 0.0016s
  99% in 0.0011s    99% in 0.0048s    99% in 0.0035s    99% in 0.0028s

  • Instrumentation definitely has a latency impact. For example, p95 goes from 0.8ms (baseline) to 1.4ms (main). A warning that there is some variance between test runs here, so there are error bars. Also, it isn't clear to me whether sub-1ms measurements are within the accuracy of this tooling.
  • p99 latency can be very noticeable with instrumentation. My guess is that this is due to latency bubbles caused by slow, synchronous OTLP serialization during export.
  • The custom OTLP serializers here noticeably improved p99. Again, my guess is that this is because faster serialization reduces those latency bubbles.
  • The custom OTLP logs-serializer shows a noticeable improvement in the flamegraphs above. (I don't have a flamegraph of a run with the trace-serializer.)

CPU

  • The psrecord-based charts above show an obvious increase in average CPU with instrumentation, from roughly 8% to roughly 15% (eyeballing the numbers), along with noticeable spikes in CPU. It is possible those CPU spikes occur when the psrecord interval coincides with an OTLP export, but I don't have evidence of that. Mostly, I think, this just shows that more work is being done with instrumentation.
  • We don't know how CPU overhead scales with load.

Memory

  • In my various runs, the memory usage went from a stable 40MB to a stable ~130MB -- including over a longer 15 minute load run.
  • We don't know how memory overhead scales with load. Is it a roughly constant 90MB overhead to have telemetry? Does it scale with number of spans? log-records? metrics cardinality?
  • Some test runs re-using the 64k+ buffer for serialization did not have a measurable impact that I could see. I haven't done memory profiling, but my guess is that the 64k churn is not that significant an amount of memory compared to the other memory usage/churn in the instrumentation.

Note that I've done a number of other testing runs locally that I haven't presented, including:

  • Re-using the 64k+ buffer for serialization.
  • With the custom trace-serializer.ts I added.
  • Longer runs to see if 120-130MB memory usage stays stable.

Subsequent work

Lots that could be done. To get this PR over the line:

  • Complete this custom OTLP serialization (testing, verifying correctness, a metrics implementation, custom OTLP response deserialization?).
  • Discuss whether the maintenance burden of a custom OTLP serializer is acceptable.

To get more confidence in macrobenchmarking numbers for OTel JS:

  • Do perf runs on Linux (more representative of production usage), likely on cloud deployments for reproducibility (modulo noisy neighbours).
  • Do multiple of the above runs to get some statistical confidence in the latency values.
  • Test with other load generators to gain confidence. I know autocannon gets used a lot in Node.js-land. It would be nice to see somewhat reproducible numbers with other load generators.
  • Test with other loads to see if (a) CPU overhead scales with load, and (b) if memory overhead scales with load.
  • Test with an app that generates more spans/logs for the same load to see if overhead scales with telemetry data.
  • Test with apps using different functionality and instrumentations. Currently I've effectively only tested the core SDK and instrumentation-http + instrumentation-pino.
  • Test with more recent versions of node to see if there are improvements. E.g. AsyncLocalStorage overhead was meant to have improved in Node.js v24.

@trentm
Contributor

trentm commented Jan 13, 2026

Here is a screenshot of flamegraphs before (current main) and after (with both the custom logs-serializer and trace-serializer changes), showing the improvement (narrow stacks) for both:

[flamegraph screenshot comparing current main against the run with the custom logs-serializer and trace-serializer]

@pichlermarc
Member Author

Thank you @trentm for the in-depth analysis - we were able to land these changes (slightly modified) in #6530 🙂
I also opened #6530 for deserialization. Once landed, we can also start migrating the other signals and remove protobuf.js as a dependency.

