feat(otlp-transformer): add custom protobuf logs serializer #6228

pichlermarc wants to merge 1 commit into open-telemetry:main
Conversation
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

@@ Coverage Diff @@
## main #6228 +/- ##
==========================================
- Coverage 95.40% 95.26% -0.15%
==========================================
Files 317 320 +3
Lines 9514 9823 +309
Branches 2192 2235 +43
==========================================
+ Hits 9077 9358 +281
- Misses 437 465 +28
Pretty cool, Marc.
```js
suite.add('transform 512 logs (protobuf)', function () {
  ProtobufLogsSerializer.serializeRequest(logs);
});

// …

function createSpan() {
  const span = tracer.startSpan('span');
  span.setAttribute('aaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaaaaaa');
  span.setAttribute('bbbbbbbbbbbbbbbbbbbb', 'bbbbbbbbbbbbbbbbbbbb');
  span.setAttribute('cccccccccccccccccccc', 'cccccccccccccccccccc');
  span.setAttribute('dddddddddddddddddddd', 'dddddddddddddddddddd');
  span.setAttribute('eeeeeeeeeeeeeeeeeeee', 'eeeeeeeeeeeeeeeeeeee');
  span.setAttribute('ffffffffffffffffffff', 'ffffffffffffffffffff');
  span.setAttribute('gggggggggggggggggggg', 'gggggggggggggggggggg');
  span.setAttribute('hhhhhhhhhhhhhhhhhhhh', 'hhhhhhhhhhhhhhhhhhhh');
  span.setAttribute('iiiiiiiiiiiiiiiiiiii', 'iiiiiiiiiiiiiiiiiiii');
  span.setAttribute('jjjjjjjjjjjjjjjjjjjj', 'jjjjjjjjjjjjjjjjjjjj');
  span.end();
}

// …

suite.add('transform 512 logs (json)', function () {
  JsonLogsSerializer.serializeRequest(logs);
});
```
note: probably 512 logs is the most common case, since it's the default batch size - so I opted to keep just these here.
```ts
public pos: number = 0;

// …

constructor(initialSize = 65536) {
```
note: I assume this is not yet ready for production as it allocates one big object on every export. As such, this PR is not marked as ready yet. Unfortunately I will be ooo until Jan 7 2026, so I won't get back to this until then - I do believe that this PR is almost there though.

Possible improvements:

- We could keep the buffer around and re-use it between exports, but depending on its size and how long it stays around, I would assume it would be moved to old-space and negatively impact GC performance. I'm not sure how much of a problem that is.
- Also, if we want to re-use the buffer, we will want to change the transformer consts to be factory functions instead, to give each exporter its own buffer to work with instead of having it shared (a rough sketch of this idea follows below).
- Another thing I noticed while working on this: now that we serialize our own OTLP messages, we could implement export request size limiting, where we split the message into multiple `Uint8Array`s that each include a resource and scope but split the `LogRecord`s across messages. So if the endpoint has a limit we can deal with that - we would also be able to limit to 64KB for `sendBeacon`/`fetch` with keepalive. That'd need exporter work plus changes to the `ISerializer#serializeRequest()` interface. But that's something to look at later.
cc @trentm since you were talking about possibly having some time to look into exporter performance while I'm out. Feel free to pick this up if you like 🙂
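Below is a minimal sketch of the buffer re-use / factory-function idea described above, assuming a growable buffer with a `pos` write cursor and a `constructor(initialSize = 65536)` like the fragment quoted in this thread. `GrowingBuffer`, `createProtobufLogsSerializer`, and the `encode`/`decode` callbacks are hypothetical names for illustration, not the PR's actual API.

```ts
// Hypothetical sketch: per-exporter serializer instances that re-use one buffer
// across exports instead of allocating a new one each time.
class GrowingBuffer {
  public pos: number = 0;
  public bytes: Uint8Array;

  constructor(initialSize = 65536) {
    this.bytes = new Uint8Array(initialSize);
  }

  reset(): void {
    // Keep the allocation; just rewind the write position for the next export.
    this.pos = 0;
  }
}

interface ISerializer<Request, Response> {
  serializeRequest(request: Request): Uint8Array | undefined;
  deserializeResponse(data: Uint8Array): Response;
}

// A factory instead of a shared transformer const: each exporter gets its own
// buffer, so re-using it between exports cannot interfere with other exporters.
function createProtobufLogsSerializer<Request, Response>(
  encode: (request: Request, buffer: GrowingBuffer) => void,
  decode: (data: Uint8Array) => Response
): ISerializer<Request, Response> {
  const buffer = new GrowingBuffer();
  return {
    serializeRequest(request: Request): Uint8Array | undefined {
      buffer.reset();
      encode(request, buffer);
      // Copy out only the written bytes so callers never hold the shared buffer.
      return buffer.bytes.slice(0, buffer.pos);
    },
    deserializeResponse(data: Uint8Array): Response {
      return decode(data);
    },
  };
}
```

The trade-off mentioned above still applies: a long-lived buffer will likely be promoted to old-space, so whether keeping it around is a net win depends on GC behavior under real export loads.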
Marc, I played with this a little bit last week, mostly trying to grok perf impact if any, and then generally down rabbit holes trying to grok OTel JS perf overhead. My recollection is that I cannot push to your work branches, so I pushed to my own feature branch here: https://github.com/trentm/opentelemetry-js/tree/trentm-custom-otlp-serializers Besides, my changes have many TODO and HACK comments and code that make it far from a complete proposal. :) I have a couple commits:
I'll describe some limited performance impressions in a separate comment.
Here are some results from some macrobenchmarking perf testing I did. This is all run on my macOS MBP (M3 Pro, 36 GB), using node v20.19.1 (should really be testing with more recent Node.js versions).

#### Intro to my run-bench script

First, I have a small run-bench script that drives a run from a config file like this:

```jsonc
// http-pino-oj.bench.jsonc
{
"app": {
"command": "node",
"args": ["--import", "./telemetry-oj.mjs", "app-http-pino.js"],
"env": {
"OTEL_SERVICE_NAME": "app-http-pino",
},
"pause": 2
},
"loadgen": {
"command": "hey",
"args": ["-c", "10", "-q", "100", "-z", "180s", "http://localhost:3000/"],
"eta": 180, // used for a progress bar
}
}
```

A run will: start the app, lame 2s pause for startup, then use `hey` (per the config) to drive load. An example run looks something like: …

I'll try to get this into a repo soon to show exactly what I'm running.

#### Small app

A small HTTP app that logs each incoming request with pino:

```js
'use strict';
const http = require('http');
const pino = require('pino');
const log = pino();
const server = http.createServer((req, res) => {
// Not reading the req body. Should perhaps do that.
log.info({method: req.method, headers: req.headers}, 'http request');
res.writeHead(200, { 'Content-Type': 'text/plain' });
res.end('OK\n');
});
server.listen(3000, () => {
console.log('Server running at http://localhost:3000/');
});
```

#### Baseline run

A baseline run of the small app with no instrumentation:
#### Instrumented with current OTel JS main

Instrumented with current OTel JS on main (…):

```
% cat telemetry-oj.mjs
// Setup telemetry using OpenTelemetry JS
import * as os from 'os';
import {NodeSDK} from '@opentelemetry/sdk-node';
import {getNodeAutoInstrumentations} from '@opentelemetry/auto-instrumentations-node';
const sdk = new NodeSDK({
instrumentations: getNodeAutoInstrumentations(),
});
process.on('SIGTERM', async () => {
try {
await sdk.shutdown();
} catch (err) {
console.warn('warning: error shutting down OTel SDK', err);
}
process.exit(128 + os.constants.signals.SIGTERM);
});
process.once('beforeExit', async () => {
try {
await sdk.shutdown();
} catch (err) {
console.warn('warning: error shutting down OTel SDK', err);
}
});
sdk.start();
```

Results:
#### Instrumented with the custom logs-serializer.ts in this PR

```diff
% diff -u http-pino-oj.bench.jsonc http-pino-oj-custom-serializer.bench.jsonc
--- http-pino-oj.bench.jsonc 2026-01-05 16:44:38
+++ http-pino-oj-custom-serializer.bench.jsonc 2026-01-05 16:44:45
@@ -1,7 +1,7 @@
{
"app": {
"command": "node",
- "args": ["--import", "./telemetry-oj.mjs", "app-http-pino.js"],
+ "args": ["--import", "./telemetry-oj-custom-serializer.mjs", "app-http-pino.js"],
"env": {
"OTEL_SERVICE_NAME": "app-http-pino",
},
```

where "telemetry-oj-custom-serializer.mjs" points to the local working copy with your feature branch:

```js
...
import {NodeSDK} from './oj8/experimental/packages/opentelemetry-sdk-node/build/src/index.js';
...
```

Run results:
#### Flamegraphs for the above scenarios

Here are some flamegraphs (captured via the excellent …):
The main thing I'm showing here is that the logs-serialization shows a visible perf improvement (the narrower stack).

#### Conclusions so far

Latency

Here are the latency numbers ("99%" is p99, for example) for the 3 runs shown above, plus one with my added custom trace-serializer.ts, side-by-side for comparison.
CPU
Memory
Note that I've done a number of other testing runs locally that I haven't presented, including:
#### Subsequent work

Lots that could be done. To get this PR over the line:
To get more confidence in macrobenchmarking numbers for OTel JS:







Which problem is this PR solving?
Important
Draft PR; things that are still missing:
NOTE: I'm ooo until Jan 7 2026, will not be able to drive this forward until then.
This PR replaces the generated code for serializing logs to OTLP/protobuf with hand-rolled code. The idea is to eliminate most of the intermediate steps we needed to conform to the generated code's layout and to write directly to a `Uint8Array` (a rough sketch of the general technique is shown below). I made this to address the performance regression discovered in #6221. I added some perf tests to show the differences:
- `protobuf.js` (currently on `main` - this is the baseline we're comparing against)
- custom implementation (from this PR; it turns out to be pretty much on par with the current OTLP/json serializer, and better than what we had with `protobuf.js`)

Disclosure of AI use: I used GitHub Copilot with Claude Sonnet 4.5 to fill in most of the logic from an interface that I initially set up. I applied a bunch of optimizations afterwards to make it my own, but it was a significant help to get the basics up and running.
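As a rough sketch of the general technique (not the code in this PR): hand-rolled protobuf serialization writes field tags, varints, and length-delimited values straight into a `Uint8Array`, skipping intermediate message objects entirely. `ProtoWriter` and the example field below are made up for illustration.

```ts
// Illustrative hand-rolled protobuf encoding into a Uint8Array.
// Wire format basics: tag = (fieldNumber << 3) | wireType;
// wireType 0 = varint, wireType 2 = length-delimited (strings, bytes, sub-messages).
class ProtoWriter {
  private buf: Uint8Array;
  public pos = 0;

  constructor(initialSize = 65536) {
    this.buf = new Uint8Array(initialSize);
  }

  private ensure(extra: number): void {
    if (this.pos + extra <= this.buf.length) return;
    // Grow the backing buffer and copy the bytes written so far.
    const next = new Uint8Array(Math.max(this.buf.length * 2, this.pos + extra));
    next.set(this.buf.subarray(0, this.pos));
    this.buf = next;
  }

  // Base-128 varint: 7 bits per byte, MSB set on every byte except the last.
  writeVarint(value: number): void {
    this.ensure(5);
    while (value > 0x7f) {
      this.buf[this.pos++] = (value & 0x7f) | 0x80;
      value >>>= 7;
    }
    this.buf[this.pos++] = value;
  }

  // Length-delimited string field: tag, UTF-8 byte length, then the bytes.
  writeStringField(fieldNumber: number, value: string): void {
    const bytes = new TextEncoder().encode(value);
    this.writeVarint((fieldNumber << 3) | 2);
    this.writeVarint(bytes.length);
    this.ensure(bytes.length);
    this.buf.set(bytes, this.pos);
    this.pos += bytes.length;
  }

  finish(): Uint8Array {
    return this.buf.subarray(0, this.pos);
  }
}

// Example: encode a hypothetical `string body = 1;` field directly to bytes.
const writer = new ProtoWriter();
writer.writeStringField(1, 'example-log-body');
const payload = writer.finish();
```

Per field, this is only a few bounds checks and byte writes, which is the kind of overhead the perf tests above compare against the `protobuf.js` and JSON serializers.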
Ref #6100
Ref #6221
Short description of the changes
Type of change
How Has This Been Tested?