
OTLP HTTP exporter default agentFactory should honor HTTPS_PROXY / NODE_USE_ENV_PROXY env vars #6637

@mayank6136

Description

What happened?

The default agentFactory returned by getNodeHttpConfigurationDefaults (in otlp-exporter-base/src/configuration/otlp-node-http-configuration.ts) is httpAgentFactoryFromOptions({ keepAlive: true }), which constructs new https.Agent({ keepAlive: true }) per export. This bypasses https.globalAgent — the agent that Node replaces with EnvHttpProxyAgent when NODE_USE_ENV_PROXY=1 (or, on Node 22+ with appropriate flags, when HTTPS_PROXY/HTTP_PROXY are set).
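
To make the bypass concrete, here is a simplified paraphrase of the current default behavior (my sketch, not the exact SDK source): the factory always constructs a fresh `Agent`, so the possibly proxy-aware `https.globalAgent` is never consulted, whatever the env vars say.

```typescript
import * as https from 'https';

// Simplified paraphrase of today's default factory: always a fresh Agent,
// never the (possibly EnvHttpProxyAgent-backed) globalAgent.
function defaultAgentFactory(): https.Agent {
  return new https.Agent({ keepAlive: true });
}

const agent = defaultAgentFactory();
// The fresh agent is a distinct object from the global one, so any proxy
// configuration attached to globalAgent is bypassed.
console.log(agent === https.globalAgent); // false
```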

Effect: in any runtime that injects a proxy via env vars (Kubernetes pods behind an egress proxy, sandboxed runtimes that L7-proxy outbound HTTP, dev environments with a local capture proxy), the default OTLP HTTP exporter silently fails to traverse the proxy. Users only discover this when their backend stays empty.

This is acknowledged territory — #5835 was closed pointing at the merged work in #5711 / #5719, which DID add httpAgentOptions factory support so users can pass a proxy-aware agent themselves. That feature is great. This issue is the follow-up: the default behavior still ignores env-proxy, so users have to read the docs, learn that the SDK overrides globalAgent, find a proxy-agent library, and wire it in. The diagnostic surface is a silent failure — the exporter dies with ECONNREFUSED or EAI_AGAIN and the SDK's diag logger isn't wired by default, so operators see nothing.

In plain English: if the user's environment sets HTTPS_PROXY=…, the OTel exporter ignores it and tries to connect directly. In sandboxed/cloud environments where the proxy was the only path out, spans stop landing — and the user gets no visible error.

Steps to Reproduce

Tested against:

  • @opentelemetry/sdk-trace-node (SDK v2)
  • @opentelemetry/otlp-exporter-base v0.203.0
  • @opentelemetry/exporter-trace-otlp-http v0.203.0
  • Node v22.22.2

Setup:

  1. Stand up an OTel collector reachable via a public HTTPS hostname (e.g. through a Cloudflare quick tunnel: cloudflared tunnel --url http://localhost:4318).
  2. Run any OTLP-emitting process in a container that has HTTPS_PROXY=http://<your-proxy-host>:<port> injected into its environment. The proxy is the only path that can reach the public collector hostname; direct connect/DNS fails.
  3. Use the default OTel SDK configuration (no explicit httpAgentOptions, no factory override) and point it at the public collector hostname.
  4. Generate spans.
  5. Watch the backend stay empty. With default log levels, the SDK produces no error.
  6. Add diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG) and observe Error: getaddrinfo EAI_AGAIN <hostname> (or ECONNREFUSED) — proving the SDK is bypassing the proxy.
  7. As a control, in the same environment, node -e 'https.request({hostname: "<same-host>", ...})' succeeds (200) — Node's default https.globalAgent IS replaced with EnvHttpProxyAgent when the env vars are set, so the standard https module works. Only the SDK's default-factory path fails.

Expected Result

When the user has not explicitly provided an agentFactory AND the runtime has env-proxy configured (any of NODE_USE_ENV_PROXY=1, HTTPS_PROXY, https_proxy, HTTP_PROXY, http_proxy), the default factory should return globalAgent (proxy-aware) instead of a fresh Agent.
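
The env-var matrix above can be captured in a small pure helper (the function name is mine, not the SDK's; the proposed patch below inlines the same condition):

```typescript
// Hypothetical helper: true when Node's env-proxy configuration is active,
// per the matrix described above. Takes env as a parameter so it is testable.
function envProxyConfigured(env: NodeJS.ProcessEnv): boolean {
  return (
    env.NODE_USE_ENV_PROXY === '1' ||
    Boolean(env.HTTPS_PROXY || env.https_proxy || env.HTTP_PROXY || env.http_proxy)
  );
}

console.log(envProxyConfigured({ HTTPS_PROXY: 'http://proxy.internal:3128' })); // true
console.log(envProxyConfigured({ NODE_USE_ENV_PROXY: '1' })); // true
console.log(envProxyConfigured({})); // false
```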

Actual Result

Default factory always returns new Agent({ keepAlive: true }), which is not proxy-aware regardless of env vars. The exporter then attempts direct egress, fails, and silently retries.

Proposed fix

In httpAgentFactoryFromOptions (or in the default-config path), conditionally fall back to globalAgent when env-proxy is configured:

 export function httpAgentFactoryFromOptions(
   options: http.AgentOptions | https.AgentOptions
 ): HttpAgentFactory {
   return async protocol => {
     const isInsecure = protocol === 'http:';
     const module = isInsecure ? import('http') : import('https');
     const { Agent } = await module;
+
+    // Honor Node's env-proxy configuration: globalAgent IS the EnvHttpProxyAgent
+    // when NODE_USE_ENV_PROXY=1 or HTTPS_PROXY/http_proxy are set. A fresh
+    // `new Agent(...)` skips it. Users who want proxy-aware default behavior
+    // shouldn't have to wire a proxy-agent library by hand.
+    if (
+      process.env.NODE_USE_ENV_PROXY === '1' ||
+      process.env.HTTPS_PROXY || process.env.https_proxy ||
+      process.env.HTTP_PROXY || process.env.http_proxy
+    ) {
+      // `module` is already resolved above, and globalAgent exists on both
+      // http and https, so no cast is needed.
+      return (await module).globalAgent;
+    }
+
     if (isInsecure) {
       // eslint-disable-next-line @typescript-eslint/no-unused-vars
       const { ca, cert, key, ...insecureOptions } = options as https.AgentOptions;
       return new Agent(insecureOptions);
     }
     return new Agent(options);
   };
 }

A user who explicitly opts out (passes their own factory or wants the non-proxy behavior) is not affected — only the default changes.

Validated locally as a runtime patch on v0.203.0 in a sandbox where openshell injects HTTPS_PROXY=http://10.200.0.1:3128 and the only reachable hostname is on the proxied side. Spans flow through the proxy as expected after the patch. Without the patch, exports silently fail with EAI_AGAIN against the openshell DNS proxy.

Alternatives considered

  • Always use globalAgent from the default factory. Simpler, but loses the SDK's keepAlive: true default. The patch above preserves keepAlive when env-proxy is not set.
  • Document that users must wire https-proxy-agent themselves. This is the existing path since feat(exporter-otlp-*)!: support custom HTTP agents (#5719). It works, but it is high-friction, and the silent-failure mode means most users will only discover the gap when their backend is empty.
  • Auto-detect NO_PROXY patterns inside the factory (since EnvHttpProxyAgent already does this when used). Out of scope here — the proposed patch delegates to globalAgent which already honors NO_PROXY / no_proxy.
  • Add OTEL_EXPORTER_OTLP_PROXY as a sibling env var. Plausible follow-up but not a substitute — node's standard env vars are what most container/orchestration platforms inject by default.

Test plan

  • Add a unit test for httpAgentFactoryFromOptions: with NODE_USE_ENV_PROXY=1 set in process.env, the returned factory should return globalAgent. Without it, it should return a fresh Agent.
  • Add an integration test: spin a small HTTP-CONNECT proxy + mock OTLP receiver in the existing test harness; assert spans flow through the proxy when HTTPS_PROXY points at it.
  • Regression: confirm no existing httpAgentFactoryFromOptions callers break.
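
A standalone sketch of the first unit test (https-only for brevity; `patchedFactory` is an inline copy of the proposed logic, where the real test would import `httpAgentFactoryFromOptions` from otlp-exporter-base instead):

```typescript
import * as https from 'https';

type HttpsAgentFactory = () => Promise<https.Agent>;

// Inline copy of the proposed default-factory behavior (hypothetical name).
function patchedFactory(options: https.AgentOptions): HttpsAgentFactory {
  return async () => {
    if (
      process.env.NODE_USE_ENV_PROXY === '1' ||
      process.env.HTTPS_PROXY || process.env.https_proxy ||
      process.env.HTTP_PROXY || process.env.http_proxy
    ) {
      return https.globalAgent;
    }
    return new https.Agent(options);
  };
}

async function main(): Promise<void> {
  const factory = patchedFactory({ keepAlive: true });

  // Clear any proxy vars the host environment may have set, then test both branches.
  for (const k of ['NODE_USE_ENV_PROXY', 'HTTPS_PROXY', 'https_proxy', 'HTTP_PROXY', 'http_proxy']) {
    delete process.env[k];
  }
  console.log((await factory()) === https.globalAgent); // false: fresh keepAlive Agent

  process.env.NODE_USE_ENV_PROXY = '1';
  console.log((await factory()) === https.globalAgent); // true: env-proxy path
}

main();
```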

Risk / blast radius

  • Behavioral change: users behind a proxy who ALSO relied on the SDK's keepAlive: true default will see that default superseded by globalAgent's own settings (proxy-aware via EnvHttpProxyAgent, but not necessarily pinning keepAlive the same way). Mitigation: env-proxy users are broken at the default level today, so any working configuration they have goes through explicit factory injection, and that path is unaffected.
  • Detectable side effect: users who set HTTPS_PROXY for other tools and didn't expect OTel to honor it will now have OTel honor it. Mitigation: this matches Node 22+ default behavior for fetch — bringing OTel into line with standard Node networking conventions.
  • Backwards compatibility: the explicit-factory escape hatch added in feat(exporter-otlp-*)!: support custom HTTP agents (#5719) still lets users opt out by passing agentFactory: () => new https.Agent({...}) directly.
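
The opt-out path in the last bullet can be sketched as follows. The config shape is illustrative only: the `agentFactory` option name follows this issue's text and should be checked against the exporter docs for your release, and the endpoint is a placeholder.

```typescript
import * as https from 'https';

// A user-supplied factory always wins over the default, so env-proxy detection
// never applies: this pins a direct (non-proxy) keepAlive Agent regardless of
// HTTPS_PROXY / NODE_USE_ENV_PROXY.
const exporterConfig = {
  url: 'https://collector.example.com/v1/traces', // placeholder endpoint
  agentFactory: async () => new https.Agent({ keepAlive: true, maxSockets: 16 }),
};

exporterConfig.agentFactory().then(agent => {
  console.log(agent === https.globalAgent); // false: proxy deliberately bypassed
});
```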

Open questions

  1. Is httpAgentFactoryFromOptions the right home for the conditional, or should the default-config builder (getNodeHttpConfigurationDefaults) check env-proxy and select between two factories? The former is narrower; the latter is more explicit.
  2. Should the SDK emit a warning at start when it detects env-proxy and falls back to globalAgent, so operators have a visible signal? (I'd say yes — silent magic is exactly the failure mode this fixes.)
  3. Preferred env-var matrix — just HTTPS_PROXY/http_proxy, or also OTEL_EXPORTER_OTLP_PROXY / similar OTel-specific override?
  4. Independent of this fix: would the maintainers entertain wiring diag.setLogger automatically when OTEL_LOG_LEVEL is set in the env? Operators who hit silent-failure modes have to add a code change today to surface the underlying SDK error. Out of scope here — flagging for context.

Real-world repro context: this surfaced while debugging a sandbox-runtime deployment where HTTPS_PROXY is the only egress path, the OTel SDK's exporter silently failed, and root-causing took ~3 hours of investigation through the SDK source. Happy to share full ClickHouse query output, gateway log, and SDK debug logs if useful.
