Skip to content

Conversation

@jensneuse
Copy link
Member

@jensneuse jensneuse commented Nov 26, 2025

Summary by CodeRabbit

  • New Features

    • Configurable header propagation across subgraphs and header-aware subscriptions; new inbound/request deduplication controls exposed.
  • Performance

    • Increased default HTTP connection pooling for higher concurrency.
  • Tests

    • Expanded deduplication and subscription test coverage; updated telemetry and Prometheus expectations; added header-propagation unit tests.
  • Chores

    • Dependency upgrades and added tracing attribute to mark resolver deduplications; various test and cleanup refinements.

✏️ Tip: You can customize this high-level summary in your review settings.

# Conflicts:
#	router/core/graph_server.go
#	router/core/graphql_prehandler.go
#	router/pkg/config/config.schema.json
# Conflicts:
#	router/core/graphql_prehandler.go
# Conflicts:
#	demo/pkg/subgraphs/employees/employees.go
#	router-tests/testenv/testdata/config.json
#	router-tests/testenv/testdata/configWithEdfs.json
#	router-tests/testenv/testdata/configWithPlugins.json
# Conflicts:
#	router/pkg/pubsub/kafka/engine_datasource.go
#	router/pkg/pubsub/kafka/engine_datasource_test.go
#	router/pkg/pubsub/nats/engine_datasource.go
#	router/pkg/pubsub/nats/engine_datasource_test.go
#	router/pkg/pubsub/redis/engine_datasource.go
#	router/pkg/pubsub/redis/engine_datasource_test.go
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router/core/supervisor.go (1)

137-150: Startup error wrapping now favors the sentinel over the underlying cause

Using fmt.Errorf("%w: ...: %v", ErrStartupFailed, err) makes ErrStartupFailed the only unwrap target, and the original err is now only present in the message text. That’s good if you mainly want errors.Is(startErr, ErrStartupFailed) to work, but it also means callers can no longer use errors.Is/As to inspect the underlying failure returned by loadConfig, createRouter, or startRouter.

If any code paths currently rely on unwrapping the original cause from Start(), this is a behavioral change worth double‑checking. If you’d like to keep both:

  • Sentinel + cause both discoverable via errors.Is while still having a single %w, you could do something like:
return fmt.Errorf("failed to load config: %w", errors.Join(ErrStartupFailed, err))

and similarly for the other two sites.

This preserves structured cause chains without losing the ErrStartupFailed sentinel semantics.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d9090f and 999c868.

📒 Files selected for processing (3)
  • router-tests/prometheus_test.go (4 hunks)
  • router/core/supervisor.go (1 hunks)
  • router/pkg/grpcconnector/grpcpluginoci/image_unpack.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • router-tests/prometheus_test.go
  • router/pkg/grpcconnector/grpcpluginoci/image_unpack.go
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-router
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
  • GitHub Check: build_test

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router-tests/telemetry/telemetry_test.go (1)

3948-3961: Consider asserting the resolver‑dedup attribute anywhere you bump attribute counts

In these other Operation - Execute span checks you only increase require.Len(t, sn[7].Attributes(), …) but don’t assert that otel.WgResolverDeduplicatedRequest is actually present. For stronger, less “magic number” style coverage, you could mirror the earlier pattern and also:

require.Contains(t, sn[7].Attributes(), otel.WgResolverDeduplicatedRequest.Bool(false))

in each of these contexts.

Also applies to: 4403-4415, 6384-6387, 8837-8843

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 999c868 and f201b2c.

⛔ Files ignored due to path filters (2)
  • router-tests/go.sum is excluded by !**/*.sum
  • router/go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • router-tests/go.mod (2 hunks)
  • router-tests/telemetry/telemetry_test.go (6 hunks)
  • router-tests/testenv/testenv.go (4 hunks)
  • router/core/graphql_handler.go (6 hunks)
  • router/go.mod (2 hunks)
  • router/pkg/otel/attributes.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router-tests/testenv/testenv.go
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.

Applied to files:

  • router/core/graphql_handler.go
  • router-tests/go.mod
  • router-tests/telemetry/telemetry_test.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/core/graphql_handler.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router/go.mod
  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router/go.mod
  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.25 minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router/go.mod
  • router-tests/go.mod
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/go.mod
  • router-tests/telemetry/telemetry_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router-tests/telemetry/telemetry_test.go
🧬 Code graph analysis (2)
router/core/graphql_handler.go (3)
router/core/header_rule_engine.go (2)
  • HeaderPropagation (137-146)
  • HeaderPropagationWriter (93-105)
router/core/context.go (1)
  • SubgraphHeadersBuilder (298-347)
router/pkg/otel/attributes.go (2)
  • WgAcquireResolverWaitTimeMs (37-37)
  • WgResolverDeduplicatedRequest (38-38)
router-tests/telemetry/telemetry_test.go (1)
router/pkg/otel/attributes.go (1)
  • WgResolverDeduplicatedRequest (38-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-router
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (12)
router-tests/go.mod (2)

208-214: Critical issue from previous review appears to be resolved.

The graphql-go-tools/v2 replace directive is now properly commented out (line 213), which addresses the previous review's critical finding about the invalid path breaking builds. This aligns with the project's guidance in CONTRIBUTING.md to use personal go.work files for local development with external repositories rather than active replace directives in committed go.mod files.


16-16: Dependency updates look appropriate.

  • mcp-go pinned to v0.36.0 (stable version, not v0.38.0 which has regressions per learnings)
  • redis/go-redis/v9 upgraded to v9.7.3
  • WunderGraph dependencies updated consistently (astjson, demo, router, graphql-go-tools)
  • go-arena added as indirect dependency (v1.1.0)

Also applies to: 20-20, 25-25, 26-26, 28-28, 30-30, 154-154

router/go.mod (2)

82-83: Address production readiness concerns for go-arena dependency.

A prior review flagged that wundergraph/go-arena v1.1.0 is a new, minimally-maintained package with few commits and limited stars, making it a vendor-lock risk before production use. Ensure the team has evaluated this dependency and is comfortable with its maturity level for the memory management improvements this PR introduces.

Can you confirm that the codebase changes sufficiently justify adding this new dependency and that team consensus supports production adoption? If this is a critical path feature, consider whether fallback options or gradual rollout are needed.


34-34: The graphql-go-tools/v2 pseudoversion is intentional development pinning.

The commit hash 85774faf3ed4 exists on the feature branch feat/improve-memory-usage-with-arenas in the graphql-go-tools repository. This is coordinated with the addition of the go-arena dependency (line 83) for memory optimization. The pseudoversion is stable and intentional; however, ensure this feature branch is merged to a release tag before moving to production, as the current pinning targets a development branch rather than a stable release.

router/core/graphql_handler.go (6)

80-80: LGTM: HeaderPropagation field wiring follows standard pattern.

The HeaderPropagation field is correctly added to HandlerOptions and wired through to GraphQLHandler's internal field following the established pattern in this codebase.

Also applies to: 102-102, 128-128


150-150: LGTM: Resolve context correctly populated with hash values.

The VariablesHash and Request.ID fields are appropriately populated from the operation context, aligning with the PR's objective to move variables hashing into the operation context.

Also applies to: 154-154


193-193: LGTM: HeaderPropagationWriter correctly used.

The call to HeaderPropagationWriter correctly passes the response writer, resolve context, and setContentLength flag. The function signature matches the implementation in router/core/header_rule_engine.go.


195-196: LGTM: Arena-based resolution improves memory management.

The switch from ResolveGraphQLResponse to ArenaResolveGraphQLResponse aligns with the PR's memory management objectives. The returned info object provides timing metrics for observability, and extracting data sources after resolution ensures we capture subgraph names even when errors occur.


202-203: LGTM: Timing metrics correctly updated.

The timing metrics are now properly sourced from the info struct returned by ArenaResolveGraphQLResponse. The attributes WgAcquireResolverWaitTimeMs and WgResolverDeduplicatedRequest are correctly defined in the otel package.


163-166: No issue - BuildRequestHeaderForSubgraph safely handles nil headerPropagation.

The BuildRequestHeaderForSubgraph method (router/core/header_rule_engine.go:314) includes a nil receiver guard that returns safe defaults when headerPropagation is nil:

if h == nil || h.rules == nil || ctx == nil || ctx.Request() == nil {
  return http.Header{}, 0
}

Passing h.headerPropagation to SubgraphHeadersBuilder at line 163 is safe even when the field is nil, as the method handles this case defensively.

router/pkg/otel/attributes.go (1)

37-42: New resolver dedup OTEL attribute looks consistent

The WgResolverDeduplicatedRequest key name and string value follow the existing naming and namespacing conventions for engine resolver metrics. No issues from an API or compatibility perspective.

router-tests/telemetry/telemetry_test.go (1)

3225-3238: Resolver dedup telemetry assertion on Operation‑Execute span looks solid

Bumping the attribute count to 12 and explicitly asserting otel.WgResolverDeduplicatedRequest.Bool(false) on the Operation - Execute span is consistent with the new OTEL attribute and gives clear coverage of the new behavior.

# Conflicts:
#	router-tests/testenv/testenv.go
#	router/core/graph_server.go
#	router/core/graphql_handler.go
#	router/core/operation_processor.go
#	router/pkg/config/config.go
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
router-tests/go.mod (1)

213-213: Verify the graphql-go-tools/v2 replace path is correct.

The replace directive for graphql-go-tools/v2 is now properly commented out, which addresses the prior critical issue flagged in review. However, the path shows ../../graphql-go-tools/v2 in the annotated code, while earlier shell output showed ../../../graphql-go-tools/v2. If this path needs to be uncommented in the future, it must correctly resolve from router-tests/ directory. From the repo root structure, the path should be ../../../graphql-go-tools/v2 (three levels up). Ensure consistency with any local development documentation.

router/core/router.go (1)

82-95: Router‑level headerPropagation initialization and nil‑guard look correct

NewRouter now builds r.headerPropagation once via NewHeaderPropagation, and the subsequent if r.headerPropagation != nil && r.headerPropagation.HasResponseRules() guard before appending OnOriginResponse prevents the earlier nil‑pointer risk when no header rules are configured. Storing the propagation engine on Router and passing it down keeps the design coherent.

Also applies to: 311-321

🧹 Nitpick comments (6)
router-tests/telemetry/telemetry_test.go (1)

4688-4701: Optionally assert the dedup attribute instead of relying only on count

Here you rely on require.Len(t, sn[7].Attributes(), 12) but only assert 11 specific attributes; the 12th is implicitly the new resolver‑dedup flag. For clearer coverage and future‑proofing, consider asserting it explicitly (as in the earlier test):

 require.Len(t, sn[7].Attributes(), 12)
 ...
 require.Contains(t, sn[7].Attributes(), otel.WgRouterConfigVersion.String(xEnv.RouterConfigVersionMain()))
 require.Contains(t, sn[7].Attributes(), otel.WgAcquireResolverWaitTimeMs.Int64(0))
+require.Contains(t, sn[7].Attributes(), otel.WgResolverDeduplicatedRequest.Bool(false))
router/core/context.go (1)

298-347: SubgraphHeadersBuilder handles execution plan types appropriately.

The function correctly handles both SynchronousResponsePlan and SubscriptionResponsePlan, computing per-subgraph headers and aggregating hashes using LittleEndian encoding. The default case returns an empty builder, which is safe but could indicate an unhandled plan type.

Consider logging a warning for unhandled plan types to aid debugging if new plan types are added in the future.

 	}
+	// Unhandled plan type - could log warning here if needed for debugging
 	return &headerBuilder{
 		headers: make(map[string]*HeaderWithHash),
 	}
router-tests/singleflight_test.go (1)

17-159: Broad single‑flight / inbound dedup test matrix looks solid

The added subtests cover the key combinations of single‑flight, inbound deduplication, mutations vs queries, header and variable uniqueness, and subscription flows (including multi‑subgraph NATS‑driven subscriptions). The use of SubgraphRequestCount per subgraph provides a clear observable for dedup behavior, and the expectations (dedup vs no‑dedup) match the documented semantics.

Also applies to: 213-871

router/core/router.go (2)

323-359: CORS default headers now systematically include client and trace headers

Using defaultCorsHeaders (including common headers, GraphQL client headers, Apollo headers, ART headers, W3C trace headers, and feature‑flag header) and then deduping into r.corsOptions.AllowHeaders gives a clear, centralized CORS surface. The CorsDefaultOptions remain minimal and the extra headers are opt‑in via router initialization, which is a good split of concerns.

Also applies to: 1687-1700


1934-1947: Updated transport defaults significantly raise concurrency limits—ensure they match expected deployments

DefaultTransportRequestOptions now increases MaxConnsPerHost to 1024 and scales idle connection limits (MaxIdleConnsPerHost = 64, MaxIdleConns = 640). This should reduce connection churn and improve throughput, but also increases potential file‑descriptor and memory usage under heavy load. If some environments are connection‑constrained, consider documenting these new defaults or steering operators towards explicit traffic_shaping overrides where needed.

Also applies to: 2181-2200

router/core/graphql_prehandler.go (1)

1207-1234: Consider simplifying the deduplication flag logic for readability.

The order-dependent deduplication logic is correct but complex. The hierarchical relationships (general flags override specific flags, PreOriginHandlers disable both, force flags can re-enable) could be clearer. Consider extracting this logic into a helper function with explicit documentation of the precedence rules.

For example:

func (h *PreHandler) computeDeduplicationFlags() (disableSubgraph, disableInbound bool) {
    // Start with config defaults
    disableSubgraph = !h.enableRequestDeduplication
    disableInbound = !h.enableInboundRequestDeduplication
    
    // General flag overrides specific inbound flag
    if !h.enableRequestDeduplication {
        disableInbound = true
    }
    
    // PreOriginHandlers disable both (headers might affect dedup keys)
    if h.hasPreOriginHandlers {
        disableSubgraph = true
        disableInbound = true
    }
    
    // Force flags can override
    if h.forceEnableRequestDeduplication {
        disableSubgraph = false
    }
    if h.forceEnableInboundRequestDeduplication {
        disableInbound = false
    }
    
    return disableSubgraph, disableInbound
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f201b2c and 6984392.

⛔ Files ignored due to path filters (2)
  • router-tests/go.sum is excluded by !**/*.sum
  • router/go.sum is excluded by !**/*.sum
📒 Files selected for processing (19)
  • router-tests/go.mod (2 hunks)
  • router-tests/singleflight_test.go (1 hunks)
  • router-tests/telemetry/telemetry_test.go (6 hunks)
  • router-tests/testenv/testenv.go (4 hunks)
  • router/core/cache_warmup.go (1 hunks)
  • router/core/context.go (5 hunks)
  • router/core/graph_server.go (11 hunks)
  • router/core/graphql_handler.go (6 hunks)
  • router/core/graphql_prehandler.go (9 hunks)
  • router/core/operation_processor.go (5 hunks)
  • router/core/operation_processor_test.go (1 hunks)
  • router/core/router.go (9 hunks)
  • router/core/websocket.go (3 hunks)
  • router/go.mod (2 hunks)
  • router/pkg/config/config.go (2 hunks)
  • router/pkg/config/config.schema.json (57 hunks)
  • router/pkg/config/testdata/config_defaults.json (2 hunks)
  • router/pkg/config/testdata/config_full.json (2 hunks)
  • router/pkg/otel/attributes.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • router/core/cache_warmup.go
  • router/core/operation_processor_test.go
  • router/go.mod
  • router/pkg/config/testdata/config_defaults.json
  • router/pkg/otel/attributes.go
  • router/pkg/config/config.schema.json
🧰 Additional context used
🧠 Learnings (17)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/core/router.go
  • router/pkg/config/testdata/config_full.json
  • router/core/graphql_handler.go
  • router-tests/testenv/testenv.go
  • router/core/graphql_prehandler.go
  • router/pkg/config/config.go
  • router-tests/singleflight_test.go
  • router/core/graph_server.go
  • router/core/context.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router/core/router.go
  • router/pkg/config/testdata/config_full.json
  • router-tests/telemetry/telemetry_test.go
  • router/core/graphql_prehandler.go
  • router/pkg/config/config.go
  • router/core/context.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: In the Cosmo router codebase, JSON schema validation prevents null values in TrafficShapingRules subgraph configurations, making nil checks unnecessary when dereferencing subgraph rule pointers in NewSubgraphTransportOptions.

Applied to files:

  • router/core/router.go
  • router/pkg/config/testdata/config_full.json
  • router/pkg/config/config.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/pkg/config/testdata/config_full.json
  • router-tests/testenv/testenv.go
  • router-tests/go.mod
  • router/core/websocket.go
  • router-tests/telemetry/telemetry_test.go
  • router/core/graphql_prehandler.go
  • router-tests/singleflight_test.go
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.

Applied to files:

  • router/core/graphql_handler.go
  • router/core/websocket.go
  • router/core/graphql_prehandler.go
  • router/core/graph_server.go
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.

Applied to files:

  • router-tests/testenv/testenv.go
  • router-tests/singleflight_test.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.25 minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.

Applied to files:

  • router/core/operation_processor.go
  • router/core/graphql_prehandler.go
  • router/core/graph_server.go
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.

Applied to files:

  • router/core/operation_processor.go
  • router/core/graphql_prehandler.go
  • router/core/graph_server.go
📚 Learning: 2025-09-08T20:57:07.946Z
Learnt from: JivusAyrus
Repo: wundergraph/cosmo PR: 2156
File: controlplane/src/core/repositories/SubgraphRepository.ts:1746-1763
Timestamp: 2025-09-08T20:57:07.946Z
Learning: In the SubgraphRepository class, both byId() and byName() methods are equally valid for fetching subgraph data since they both use the same underlying getSubgraph() helper method. When getLinkedSubgraph() returns targetSubgraphName and targetSubgraphNamespace, using byName() is the appropriate choice rather than byId().

Applied to files:

  • router/core/graphql_prehandler.go
📚 Learning: 2025-08-28T09:59:19.999Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/core/supervisor_instance.go:196-204
Timestamp: 2025-08-28T09:59:19.999Z
Learning: In router/pkg/config/config.go, the BackoffJitterRetry struct uses envDefault tags to provide default values for Algorithm ("backoff_jitter") and Expression ("IsRetryableStatusCode() || IsConnectionError() || IsTimeout()"), ensuring that even when YAML configs omit these fields, valid defaults are applied automatically, preventing ProcessRetryOptions from rejecting empty values.

Applied to files:

  • router/pkg/config/config.go
📚 Learning: 2025-09-29T10:28:07.122Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2192
File: router/pkg/config/config.go:1028-1029
Timestamp: 2025-09-29T10:28:07.122Z
Learning: The deprecation strategy for IntrospectionEnabled field in router/pkg/config/config.go is to first mark it as deprecated, then migrate all call sites to use IntrospectionConfig.Enabled, and finally remove the deprecated field. The envDefault tag should be kept during the deprecation period for backward compatibility.

Applied to files:

  • router/pkg/config/config.go
📚 Learning: 2025-11-12T18:38:46.619Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2327
File: graphqlmetrics/core/metrics_service.go:435-442
Timestamp: 2025-11-12T18:38:46.619Z
Learning: In graphqlmetrics/core/metrics_service.go, the opGuardCache TTL (30 days) is intentionally shorter than the gql_metrics_operations database retention period (90 days). This design reduces memory usage while relying on database constraints to prevent duplicate inserts after cache expiration.

Applied to files:

  • router/core/graph_server.go
📚 Learning: 2025-11-16T11:52:34.064Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/pkg/pubsub/redis/engine_datasource.go:37-42
Timestamp: 2025-11-16T11:52:34.064Z
Learning: In router/pkg/pubsub/redis/engine_datasource.go (and similar files for kafka/nats), the MutableEvent.GetData() method intentionally returns the internal Data slice without cloning, while Event.GetData() clones the slice. This is by design: MutableEvent is designed to be mutable (so callers can modify the data), while Event is immutable (cloning prevents modification). This difference is part of the Cosmo Streams v1 hook interface design.

Applied to files:

  • router/core/context.go
🧬 Code graph analysis (9)
router/core/router.go (3)
router/core/header_rule_engine.go (2)
  • HeaderPropagation (137-146)
  • NewHeaderPropagation (157-194)
router/internal/rediscloser/rediscloser.go (2)
  • NewRedisCloser (26-81)
  • RedisCloserOptions (19-24)
router/pkg/config/config.go (2)
  • RateLimitConfiguration (541-551)
  • SubgraphErrorPropagationConfiguration (794-806)
router/core/graphql_handler.go (2)
router/core/header_rule_engine.go (2)
  • HeaderPropagation (137-146)
  • HeaderPropagationWriter (93-105)
router/core/context.go (1)
  • SubgraphHeadersBuilder (298-347)
router-tests/testenv/testenv.go (1)
router-plugin/httpclient/client.go (1)
  • WithTimeout (81-85)
router/core/operation_processor.go (2)
router/internal/expr/expr.go (1)
  • Request (66-75)
shared/src/router-config/builder.ts (1)
  • Input (39-46)
router/core/websocket.go (5)
router/core/operation_metrics.go (1)
  • OperationProtocolWS (22-22)
router/internal/expr/expr.go (2)
  • Context (35-39)
  • Request (66-75)
router-plugin/tracing/tracer.go (1)
  • TracingOptions (21-28)
router/core/errors.go (1)
  • Extensions (48-54)
router/core/context.go (1)
  • SubgraphHeadersBuilder (298-347)
router-tests/telemetry/telemetry_test.go (1)
router/pkg/otel/attributes.go (1)
  • WgResolverDeduplicatedRequest (38-38)
router/core/graphql_prehandler.go (5)
router/core/executor.go (1)
  • Executor (42-52)
router/core/router_metrics.go (1)
  • RouterMetrics (15-22)
router/core/operation_processor.go (1)
  • OperationProcessor (133-145)
router/core/operation_planner.go (1)
  • OperationPlanner (26-31)
router/internal/expr/expr.go (1)
  • Request (66-75)
router-tests/singleflight_test.go (6)
router/pkg/config/config.go (6)
  • Config (1039-1114)
  • EngineExecutionConfiguration (384-423)
  • HeaderRules (261-266)
  • GlobalHeaderRule (268-272)
  • RequestHeaderRule (286-309)
  • HeaderRuleOperationPropagate (277-277)
demo/pkg/subgraphs/subgraphs.go (2)
  • Config (79-84)
  • Subgraphs (86-89)
router/core/router.go (3)
  • Option (174-174)
  • WithEngineExecutionConfig (1809-1813)
  • WithHeaderRules (1779-1783)
router/internal/requestlogger/requestlogger.go (1)
  • Option (37-37)
router/internal/recoveryhandler/recovery.go (1)
  • Option (17-17)
router/core/operation_processor.go (1)
  • GraphQLRequest (195-200)
router/core/context.go (1)
router/core/header_rule_engine.go (1)
  • HeaderPropagation (137-146)
🔇 Additional comments (31)
router-tests/go.mod (2)

20-20: Dependency version updates align with PR memory management improvements.

The updated dependencies appear well-coordinated with the PR objective to improve memory management:

  • astjson promoted to v1.0.0 (stable release)
  • router and demo versions reflect recent development
  • graphql-go-tools/v2 updated to rc.240 with specific build timestamp

These updates are consistent with the PR's focus on deterministic header propagation and variables hashing.

Also applies to: 25-26, 28-28, 30-30


154-154: New indirect dependency addition is appropriate for memory management improvements.

The addition of github.com/wundergraph/go-arena v1.1.0 as an indirect dependency aligns well with the PR's focus on improving memory management. Arena allocators are a suitable tool for memory optimization. This is likely pulled in transitively from the updated router or graphql-go-tools dependencies.

router-tests/telemetry/telemetry_test.go (4)

3965-3979: Operation - Execute span now validates resolver dedup attribute explicitly

Asserting WgResolverDeduplicatedRequest in addition to bumping the attribute count to 12 keeps this test tightly coupled to the new telemetry contract and matches the list of 12 require.Contains checks.


5143-5156: Attribute count update for Execute span matches expected set

The length change to 12 for sn[7] in this scenario is consistent with the added resolver‑dedup attribute; the rest of the assertions still describe 11 attributes, leaving one slot for that new flag, which is acceptable for this test’s purpose.


7123-7127: Feature‑flag Execute span length aligns with extra attributes

Bumping the Execute span to 13 attributes in the feature‑flag path is consistent with the base 12 attributes plus the additional WgFeatureFlag (and updated config‑version) assertions you check here.


9593-9600: Custom‑exporter Execute span length kept in sync with core telemetry

Updating the Execute span attribute count to 12 in this custom‑exporter test path matches the new resolver‑dedup field while preserving the existing 11 explicit assertions, so the expectations remain aligned with the core tracing behavior.

router-tests/testenv/testenv.go (4)

430-436: Good addition: Early Kafka connectivity validation.

Adding a Ping check with a 1-second timeout provides fast-fail behavior during test setup, preventing tests from hanging when Kafka is unavailable.


865-871: LGTM - Consistent Kafka health check.

This mirrors the health check added in CreateTestSupervisorEnv, maintaining consistency across the test environment setup functions.


1329-1329: Verify test implications of disabling inbound request deduplication.

Setting EnableInboundRequestDeduplication: false in test environments may cause tests to behave differently from production. Ensure this is intentional and that any tests specifically covering deduplication behavior account for this default.


1892-1895: Good defensive check for already-closed listeners.

Skipping the log for net.ErrClosed prevents noisy test output when servers are intentionally closed before shutdown (e.g., via CloseOnStart configurations).

router/core/operation_processor.go (5)

63-63: New public field for variables hashing.

Adding VariablesHash to ParsedOperation enables efficient deduplication and caching based on variable content. This aligns with the broader hash-based approach introduced in this PR.


301-306: Simplified unmarshalling removes redundant processing.

Replacing the manual json.Compact flow with direct json.Unmarshal is cleaner. The previous compaction step was likely redundant since variables are parsed separately via fastjson later in unmarshalOperation.


392-396: Hash computation method is clear and concise.

The method correctly writes variables to the key generator, stores the hash, and resets the generator. This pattern is consistent with other hashing operations in this file.


932-938: Pre-normalization hash capture for change detection.

Computing hashes of Variables and RawBytes before normalization enables efficient comparison to detect if normalization modified the operation, avoiding expensive byte-slice comparisons.


976-983: Hash-based change detection after normalization.

Using VariablesHash == variablesBefore && operationRawBytesAfter == operationRawBytesBefore to determine if reprinting is needed is more efficient than comparing byte slices directly.

router/core/context.go (3)

146-149: New public type for header-hash pairing.

HeaderWithHash cleanly encapsulates both the header data and its hash, enabling efficient comparisons and cache key generation downstream.


282-296: Header builder implementation for subgraph header resolution.

The headerBuilder struct and its methods correctly implement the resolve.SubgraphHeadersBuilder interface. Note that HeadersForSubgraph returns a cloned header, which is good for preventing unintended mutations.


601-605: Extended operationContext with variablesHash.

Adding variablesHash uint64 to operationContext enables the hash to be propagated through the request lifecycle, supporting the new deduplication and caching mechanisms.

router/pkg/config/testdata/config_full.json (2)

343-345: Significant connection pool configuration changes.

The changes to connection pooling are substantial:

  • MaxConnsPerHost: 0 removes the per-host connection limit (unlimited)
  • MaxIdleConns: 10240 (10x increase from 1024)
  • MaxIdleConnsPerHost: 2048 (100x increase from 20)

These are test configuration values, but verify these reflect intended production defaults or are test-specific overrides.


755-757: New engine execution configuration fields for deduplication control.

The new fields ForceEnableSingleFlight, EnableInboundRequestDeduplication, and ForceEnableInboundRequestDeduplication provide fine-grained control over request deduplication behavior. The test config shows:

  • EnableInboundRequestDeduplication: true (enabled by default)
  • Both Force* flags are false

This aligns with the PR's objective to improve memory management through better request deduplication control.

router/core/graphql_handler.go (3)

83-83: Header propagation wiring is complete.

The HeaderPropagation field is properly added to HandlerOptions, wired in NewGraphQLHandler, and stored in GraphQLHandler. This follows the established pattern for handler configuration.

Also applies to: 103-103, 129-129


146-165: Resolve context enriched with hashing and header building.

The resolve context now includes:

  • VariablesHash from the operation context for deduplication
  • Request.ID set to internalHash for request identification
  • SubgraphHeadersBuilder initialized with header propagation config

This provides all necessary data for the engine to perform header-aware deduplication.


192-202: Switched to ArenaResolveGraphQLResponse with metrics collection.

The change from ResolveGraphQLResponse to ArenaResolveGraphQLResponse enables:

  1. Header propagation via HeaderPropagationWriter
  2. Timing metrics (ResolveAcquireWaitTime)
  3. Deduplication tracking (ResolveDeduplicated)

Both metrics are correctly added as span attributes for observability.

router/core/graph_server.go (2)

79-106: Header propagation wiring from Router → graphServer → HandlerOptions looks consistent

graphServer.headerPropagation is sourced from Router.headerPropagation in newGraphServer and passed into HandlerOptions.HeaderPropagation, which is the expected single source of truth for the new header propagation pipeline. Struct ownership and lifetimes are clear and nil is handled gracefully at the server construction point.

Also applies to: 150-169, 1389-1400


1276-1294: Operation processor, cache warmup, and pre‑handler config are correctly aligned with flattened config

Using s.introspection, s.securityConfiguration.ParserLimits, s.cacheWarmup, s.cdnConfig, and the extended PreHandlerOptions (including both single‑flight and inbound request dedup flags) keeps all execution behavior driven from the central router config without additional indirection. The wiring here is coherent and matches the new execution/dedup semantics.

Also applies to: 1310-1335, 1449-1484

router/core/websocket.go (1)

339-351: WS path now correctly carries execution/trace options and variables hash

Using parseExecutionAndTraceOptions in handleUpgradeRequest, propagating those via PlanOptions, and then setting opContext.variablesHash and the extended resolve.Context (Variables, RemapVariables, VariablesHash, ExecutionOptions, TracingOptions, Extensions) brings websocket subscriptions in line with the HTTP pipeline for deduplication and telemetry. This is coherent and matches the new execution semantics.

Also applies to: 817-929, 1023-1035

router/core/router.go (2)

887-897: Rate‑limit configuration and logging are wired through the new r.rateLimit field safely

Redis is only constructed when r.rateLimit != nil && r.rateLimit.Enabled, and later logging in Start checks r.redisClient != nil before dereferencing r.rateLimit.SimpleStrategy.*. Given redisClient is only set in the same gated block, this keeps the new top‑level rateLimit usage nil‑safe while simplifying configuration access.

Also applies to: 1351-1357, 1880-1883


1424-1426: Using r.Usage() for router config telemetry keeps usage tracking encapsulated

Switching trackRouterConfigUsage to r.usage.TrackRouterConfigUsage(r.Usage()) lets the Router control how its usage metadata is assembled, instead of exposing embedded Config internals. That’s a small but clean encapsulation win.

router/pkg/config/config.go (1)

384-396: LGTM! Clear deduplication control flags.

The new fields provide fine-grained control over request deduplication behavior with clear documentation explaining their purpose and interaction with PreOriginHandlers. This aligns well with the PR's objectives.

router/core/graphql_prehandler.go (2)

56-116: LGTM! Proper struct expansion for deduplication controls.

The PreHandlerOptions and PreHandler structs have been expanded with new deduplication control fields that align with the config changes. All fields are properly initialized in the NewPreHandler constructor.


567-574: LGTM! Correct dual variable parsing approach.

Variables are correctly parsed twice: once before normalization (lines 567-568) and once after normalization (lines 919-920) when they may have been modified. This ensures the operation context always has the current variables. The simplified handleOperation signature improves API clarity.

Also applies to: 919-928

# Conflicts:
#	router-tests/go.mod
#	router-tests/go.sum
#	router/go.mod
#	router/go.sum
#	router/pkg/graphqlschemausage/schemausage_test.go
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router-tests/go.mod (1)

213-213: Fix the commented replace directive path to match router/go.mod.

The path has been changed from ../../../graphql-go-tools/v2 to ../../graphql-go-tools/v2, which is inconsistent with the router/go.mod comment. If this line is ever uncommented, it will point to an incorrect location. Either use ../../../graphql-go-tools/v2 to match the router module, or verify the directory structure to ensure the path is correct.

Apply this diff to restore the correct path:

-// github.com/wundergraph/graphql-go-tools/v2 => ../../graphql-go-tools/v2
+// github.com/wundergraph/graphql-go-tools/v2 => ../../../graphql-go-tools/v2
🧹 Nitpick comments (6)
router/pkg/graphqlschemausage/schemausage_test.go (1)

205-218: Arena-based MergeValues usage looks correct; consider reuse if you ever chase test allocs

The arena‑first call merged, _, err := astjson.MergeValues(arena.NewMonotonicArena(), vars, inputVariables) matches the new API and preserves the previous merge order. If you ever need to micro‑optimize test allocations, you could reuse a single arena per test instead of allocating one inline, but that’s purely optional here.

router-tests/telemetry/telemetry_test.go (4)

4688-4700: Consider asserting the new dedup telemetry attribute explicitly, not only via count

Here the attribute count was updated from 11→12, but only 11 specific attributes are asserted; the extra slot is implicitly assumed to be WgResolverDeduplicatedRequest. That will pass even if some other attribute occupies the new slot.

If you want this test to also pin dedup behavior, consider adding:

-			require.Len(t, sn[7].Attributes(), 12)
+			require.Len(t, sn[7].Attributes(), 12)
+			require.Contains(t, sn[7].Attributes(), otel.WgResolverDeduplicatedRequest.Bool(false))

(and keep the existing Contains checks as-is).


5143-5155: Mirror the explicit dedup-attribute assertion here as well

Same pattern as the previous test: attribute count is increased to 12 but the new attribute isn’t asserted directly. For symmetry with the main telemetry test and to guard against accidental extra attributes, you could also assert the concrete key/value:

-			require.Len(t, sn[7].Attributes(), 12)
+			require.Len(t, sn[7].Attributes(), 12)
+			require.Contains(t, sn[7].Attributes(), otel.WgResolverDeduplicatedRequest.Bool(false))

7124-7126: Attribute-count bump looks correct; optional: assert dedup flag explicitly

Increasing the execute-span attribute count to 13 here is consistent with adding one more attribute on top of the existing feature-flag fields. If you intend this test to validate dedup telemetry as well (not just feature flags), you could also add a Contains check for WgResolverDeduplicatedRequest.Bool(false) alongside the existing router-config/feature-flag assertions.


9577-9577: Keep this test aligned with the others by checking the actual dedup attribute

This length change (11→12) matches the new attribute set on execute spans, but like the other cases it doesn’t assert which attribute was added. To keep “custom metric attributes don’t affect tracing” focused while still pinning behavior, consider also asserting WgResolverDeduplicatedRequest.Bool(false) here so future changes can’t silently swap out the extra attribute.

router/core/graphql_prehandler.go (1)

566-574: Avoid parsing variables twice and reassigning variablesHash in handleOperation

handleOperation currently does:

  • Early on (lines 566-574): set requestContext.operation.variablesHash and parse Request.Variables into requestContext.operation.variables.
  • Later, after normalization (lines 919-928): set variablesHash again and re‑parse the (now normalized) Request.Variables into the same field.

Between these two points, requestContext.operation.variables and variablesHash are not read, so the first parse and hash assignment are effectively discarded. This adds an extra astjson.ParseBytes call and allocations per request without a functional benefit.

Given that by the time you export or attach variables (e.g. for tracing) you seem to intend to use the normalized representation, you can safely drop the earlier parse and keep only the post‑normalization block.

Suggested diff:

-    requestContext.operation.extensions = operationKit.parsedOperation.Request.Extensions
-    requestContext.operation.variablesHash = operationKit.parsedOperation.VariablesHash
-    requestContext.operation.variables, err = astjson.ParseBytes(operationKit.parsedOperation.Request.Variables)
-    if err != nil {
-        return &httpGraphqlError{
-            message:    fmt.Sprintf("error parsing variables: %s", err),
-            statusCode: http.StatusBadRequest,
-        }
-    }
+    requestContext.operation.extensions = operationKit.parsedOperation.Request.Extensions

and keep the later block (lines 919-928) as the single place where variablesHash and variables are populated from the normalized variables.

Also applies to: 917-928

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6984392 and a2ac0e2.

📒 Files selected for processing (5)
  • router-tests/go.mod (2 hunks)
  • router-tests/telemetry/telemetry_test.go (6 hunks)
  • router/core/graphql_prehandler.go (9 hunks)
  • router/pkg/graphqlschemausage/schemausage_bench_test.go (1 hunks)
  • router/pkg/graphqlschemausage/schemausage_test.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router/pkg/graphqlschemausage/schemausage_bench_test.go
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/telemetry/telemetry_test.go
  • router-tests/go.mod
  • router/core/graphql_prehandler.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router-tests/telemetry/telemetry_test.go
  • router/core/graphql_prehandler.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.25 minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.

Applied to files:

  • router/core/graphql_prehandler.go
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.

Applied to files:

  • router/core/graphql_prehandler.go
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.

Applied to files:

  • router/core/graphql_prehandler.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/core/graphql_prehandler.go
📚 Learning: 2025-09-08T20:57:07.946Z
Learnt from: JivusAyrus
Repo: wundergraph/cosmo PR: 2156
File: controlplane/src/core/repositories/SubgraphRepository.ts:1746-1763
Timestamp: 2025-09-08T20:57:07.946Z
Learning: In the SubgraphRepository class, both byId() and byName() methods are equally valid for fetching subgraph data since they both use the same underlying getSubgraph() helper method. When getLinkedSubgraph() returns targetSubgraphName and targetSubgraphNamespace, using byName() is the appropriate choice rather than byId().

Applied to files:

  • router/core/graphql_prehandler.go
🧬 Code graph analysis (1)
router/core/graphql_prehandler.go (3)
router/pkg/config/config.go (2)
  • ClientHeader (976-979)
  • ApolloCompatibilityFlags (955-965)
router/internal/expr/expr_manager.go (1)
  • Manager (15-17)
router/core/context.go (1)
  • ClientInfo (44-51)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-router
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_test
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_push_image
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (8)
router-tests/go.mod (2)

20-20: Verify dependency version updates are compatible.

Multiple key dependencies have been updated:

  • github.com/redis/go-redis/v9: v9.4.0 → v9.7.3
  • github.com/wundergraph/astjson: v0.0.0-... → v1.0.0 (major version bump)
  • github.com/wundergraph/cosmo/router: updated to v0.0.0-20251126113850-ce5751591be9
  • github.com/wundergraph/graphql-go-tools/v2: rc.240 → rc.240.0.20251208175714-70eb5187e12c

Ensure backward compatibility, especially for the astjson major version bump.

Also applies to: 25-25, 26-26, 28-28, 30-30


154-154: New indirect dependency supports memory management improvements.

The addition of github.com/wundergraph/go-arena v1.1.0 (indirect) aligns with the PR objective to improve memory management. This arena allocation pattern is commonly used for efficient memory reuse.

router/pkg/graphqlschemausage/schemausage_test.go (2)

3-26: Imports updated to support new APIs

Adding net/http and github.com/wundergraph/go-arena cleanly reflects the new header-aware loader and arena‑based MergeValues API; no issues here.


4017-4027: FakeDataSource now matches header-aware loader interface

Updating Load and LoadWithFiles to accept http.Header and return (data []byte, err error) aligns this test stub with the new data source contract. Returning nil, nil is fine given these tests only exercise planning/usage logic and never inspect the payload.

router-tests/telemetry/telemetry_test.go (1)

3965-3978: Good coverage of new resolver-dedup attribute on Operation - Execute

Bumping the attribute count to 12 and explicitly asserting otel.WgResolverDeduplicatedRequest.Bool(false) on the execute span makes the new telemetry behavior observable and stable in this base test. This looks correct and aligned with the rest of the span-attribute checks.

router/core/graphql_prehandler.go (3)

56-78: New PreHandlerOptions / PreHandler fields and wiring look consistent

The added telemetry, compatibility, expression, and (inbound/subgraph) deduplication fields in PreHandlerOptions / PreHandler are mapped 1:1 in NewPreHandler, and the types line up with the referenced config and expr manager types. I don't see miswired or missing assignments here.

Also applies to: 81-116, 157-177


261-276: Execution/trace option parsing and dedup flag precedence are coherent

Handler’s switch to parseExecutionAndTraceOptions and the layering inside that helper (internalParseRequestOptions → global flags → pre‑origin handler safety → force‑enable overrides) results in a clear precedence chain:

  • Base options from request (headers/query/JWT) via internalParseRequestOptions.
  • Global knobs (alwaysIncludeQueryPlan, alwaysSkipLoader, queryPlansEnabled).
  • Default dedup disabling via EnableRequestDeduplication / EnableInboundRequestDeduplication.
  • Safety override when HasPreOriginHandlers is set (disables both dedup paths).
  • Final “escape hatch” via ForceEnableRequestDeduplication and ForceEnableInboundRequestDeduplication.

This ordering matches the comments and should be predictable to configure.

Also applies to: 1193-1235


484-507: handleOperation refactor (signature + early unmarshalling) looks correct

The updated handleOperation(req *http.Request, httpOperation *httpOperation) signature and the GET/POST unmarshalling paths still enforce:

  • URL vs body based GraphQL request decoding.
  • File upload association on POST via httpOperation.files into requestContext.operation.files.
  • Proper kit lifecycle management via the defer operationKit.Free() block.

I don’t see new correctness or lifecycle issues introduced by this refactor.

@jensneuse jensneuse requested a review from StarpTech December 17, 2025 11:37
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
router-tests/singleflight_test.go (1)

161-212: Previous loop variable capture issue has been resolved.

The goroutine now correctly captures i as a parameter: go func(i int64) { ... }(i), ensuring each request uses a distinct Authorization header.

router/core/websocket.go (1)

1041-1050: Nil guard for headerPropagation now properly in place.

The code now correctly checks h.graphqlHandler.headerPropagation != nil before calling SubgraphHeadersBuilder, preventing panics when header rules are not configured for WebSocket subscriptions.

🧹 Nitpick comments (3)
router-tests/testenv/testenv.go (2)

430-436: LGTM! Kafka connection health check improves test reliability.

The added health check validates Kafka connectivity before proceeding with test setup, providing clear early failure feedback. The 1-second timeout and error handling are appropriate.

Optional: Extract duplicate health check logic.

The same Kafka ping logic appears in both CreateTestSupervisorEnv (lines 430-436) and CreateTestEnv (lines 860-866). Consider extracting this into a helper function to reduce duplication:

+func pingKafkaClient(client *kgo.Client) error {
+	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
+	defer cancel()
+	if err := client.Ping(ctx); err != nil {
+		return fmt.Errorf("could not connect to kafka: %w", err)
+	}
+	return nil
+}

Then call pingKafkaClient(client) in both functions.

Also applies to: 860-866


1319-1319: Inbound request deduplication disabled for tests.

The change sets EnableInboundRequestDeduplication: false as the default for test environments, aligning with the PR's dedup refactoring objectives.

Consider adding configurability for test flexibility.

The hardcoded false value means all tests will have deduplication disabled. If future tests need to verify dedup behavior, consider making this configurable via the Config struct:

+	ModifyEngineExecutionConfiguration func(engineExecutionConfiguration *config.EngineExecutionConfiguration)

Then tests can override when needed:

cfg.ModifyEngineExecutionConfiguration = func(eec *config.EngineExecutionConfiguration) {
	eec.EnableInboundRequestDeduplication = true
}

Note: The ModifyEngineExecutionConfiguration callback already exists (line 1342-1344), so tests can already override this if needed.

router-tests/telemetry/telemetry_test.go (1)

4688-4701: Consider explicitly asserting the new dedup flag where only the attribute count changed

In these blocks you’ve bumped require.Len(t, sn[7].Attributes(), …) to account for the extra attribute, but you don’t assert otel.WgResolverDeduplicatedRequest (or whatever the new field is for those paths). That’s fine functionally, but relying only on Len makes failures a bit opaque and slightly more brittle to future attribute additions.

If these Operation - Execute spans are also expected to carry the resolver‑dedup flag, you could optionally assert it as well, e.g.:

-           require.Len(t, sn[7].Attributes(), 12)
+           require.Len(t, sn[7].Attributes(), 12)
+           require.Contains(t, sn[7].Attributes(), otel.WgResolverDeduplicatedRequest.Bool(false))

(and similarly for the other updated Operation - Execute spans with counts 12/13). This keeps the tests aligned with the primary span in the first subtest and makes regressions in the new attribute easier to diagnose.

Also applies to: 5143-5155, 7124-7126, 9573-9577

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2ac0e2 and 9eaf180.

⛔ Files ignored due to path filters (6)
  • demo/go.sum is excluded by !**/*.sum
  • demo/pkg/subgraphs/projects/generated/service.pb.go is excluded by !**/*.pb.go, !**/generated/**
  • demo/pkg/subgraphs/projects/generated/service.proto is excluded by !**/generated/**
  • demo/pkg/subgraphs/projects/generated/service_grpc.pb.go is excluded by !**/*.pb.go, !**/generated/**
  • router-tests/go.sum is excluded by !**/*.sum
  • router/go.sum is excluded by !**/*.sum
📒 Files selected for processing (9)
  • demo/go.mod (2 hunks)
  • router-tests/go.mod (2 hunks)
  • router-tests/singleflight_test.go (1 hunks)
  • router-tests/telemetry/telemetry_test.go (6 hunks)
  • router-tests/testenv/testenv.go (4 hunks)
  • router/core/graphql_handler.go (5 hunks)
  • router/core/websocket.go (3 hunks)
  • router/go.mod (2 hunks)
  • router/pkg/config/config.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • router/go.mod
  • demo/go.mod
  • router/pkg/config/config.go
🧰 Additional context used
🧠 Learnings (13)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.

Applied to files:

  • router/core/graphql_handler.go
  • router/core/websocket.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/core/graphql_handler.go
  • router-tests/testenv/testenv.go
  • router-tests/singleflight_test.go
  • router/core/websocket.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/testenv/testenv.go
  • router-tests/singleflight_test.go
  • router-tests/telemetry/telemetry_test.go
  • router-tests/go.mod
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.

Applied to files:

  • router-tests/testenv/testenv.go
  • router-tests/singleflight_test.go
📚 Learning: 2025-08-28T09:17:49.477Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:17-55
Timestamp: 2025-08-28T09:17:49.477Z
Learning: The Cosmo router uses a custom, intentionally rigid multipart implementation for GraphQL subscriptions. The multipart parsing in test files should remain strict and not be made more tolerant, as this rigidity is by design.

Applied to files:

  • router-tests/singleflight_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router-tests/telemetry/telemetry_test.go
  • router/core/websocket.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.25 minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: In the Cosmo router codebase, JSON schema validation prevents null values in TrafficShapingRules subgraph configurations, making nil checks unnecessary when dereferencing subgraph rule pointers in NewSubgraphTransportOptions.

Applied to files:

  • router/core/websocket.go
📚 Learning: 2025-09-01T13:44:13.045Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/router_config.go:221-221
Timestamp: 2025-09-01T13:44:13.045Z
Learning: The IsEnabled() method on SubgraphRetryOptions in router/core/router.go safely handles nil receivers by returning false as the first check, so no external nil checks are needed when calling c.retryOptions.IsEnabled(). This is a consistent pattern used across multiple IsEnabled() methods in the codebase.

Applied to files:

  • router/core/websocket.go
📚 Learning: 2025-09-01T13:44:13.045Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/router_config.go:221-221
Timestamp: 2025-09-01T13:44:13.045Z
Learning: The IsEnabled() method on SubgraphRetryOptions in router/core/router.go safely handles nil receivers by returning false, so no external nil checks are needed when calling c.retryOptions.IsEnabled().

Applied to files:

  • router/core/websocket.go
🧬 Code graph analysis (3)
router/core/graphql_handler.go (3)
router/core/header_rule_engine.go (2)
  • HeaderPropagation (137-146)
  • HeaderPropagationWriter (93-105)
router/core/context.go (1)
  • SubgraphHeadersBuilder (298-347)
router/pkg/otel/attributes.go (2)
  • WgAcquireResolverWaitTimeMs (37-37)
  • WgResolverDeduplicatedRequest (38-38)
router-tests/telemetry/telemetry_test.go (1)
router/pkg/otel/attributes.go (1)
  • WgResolverDeduplicatedRequest (38-38)
router/core/websocket.go (2)
router/core/operation_metrics.go (1)
  • OperationProtocolWS (22-22)
router/core/context.go (1)
  • SubgraphHeadersBuilder (298-347)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-router
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (15)
router-tests/testenv/testenv.go (1)

1866-1869: LGTM! Hardened shutdown logic handles already-closed listeners gracefully.

The check for net.ErrClosed prevents unnecessary error logging when a listener has already been closed (e.g., through manual intervention during tests). This improves test cleanup robustness.

router-tests/go.mod (4)

19-19: Dependency update approved: redis/go-redis v9.7.3.

Minor version bump from v9.4.0 appears backward compatible and should not introduce issues.


25-27: Internal module versions updated to match memory management improvements.

Updated versions for cosmo/demo, cosmo/router, and graphql-go-tools/v2 reflect the changes in the broader PR. The graphql-go-tools pseudo-version includes a specific commit hash (258377327dcd), which ensures reproducible builds.

Confirm that the graphql-go-tools pseudo-version v2.0.0-rc.241.0.20251217104010-258377327dcd is the intended commit for this PR and that the timestamp aligns with the PR branch.

Also applies to: 29-29


154-154: New indirect dependency: go-arena v1.1.0 aligns with memory management objectives.

The addition of github.com/wundergraph/go-arena v1.1.0 as an indirect dependency is consistent with the PR's memory management improvements. This arena-based allocation library likely supports deterministic memory handling.

Verify that go-arena is actively used by one of the updated dependencies (likely graphql-go-tools or astjson) and not a transitive artifact. Confirm it's declared as indirect rather than direct if it's not explicitly imported in router-tests code.


24-24: Verify astjson v1.0.0 compatibility.

Review the astjson v1.0.0 changelog to confirm there are no breaking API changes compared to the previous version. Check the wundergraph/astjson GitHub releases for any behavioral changes if the prior version was a pre-release.

router-tests/telemetry/telemetry_test.go (1)

3965-3978: New resolver‑dedup attribute assertion on Operation‑Execute looks correct

The updated attribute count (12) matches the explicit assertions, and adding otel.WgResolverDeduplicatedRequest.Bool(false) here gives good coverage of the new telemetry field without loosening any existing guarantees.

router-tests/singleflight_test.go (3)

265-348: Comprehensive subscription deduplication test with proper synchronization.

The test correctly:

  • Waits for all subscriptions to be established before triggering via NATS
  • Uses proper WebSocket lifecycle (subscribe → read message → complete → read complete)
  • Validates deduplication by asserting actualSubgraphRequests < numOfOperations

788-815: Robust variable-based deduplication test with proper JSON handling.

The test correctly handles the case where an employee might not exist (index 0 would query for an employee that doesn't exist) by checking emp.Type() == astjson.TypeObject before accessing fields. This ensures the test validates deduplication behavior rather than data existence.


715-765: LGTM! Validates layered deduplication across multiple subgraphs.

This test effectively verifies that both EnableSingleFlight and EnableInboundRequestDeduplication work together, checking that all subgraph request counts (global, family, availability, mood) are deduplicated.

router/core/graphql_handler.go (3)

161-166: Proper nil guard for header propagation.

The conditional check if h.headerPropagation != nil correctly prevents panics when header rules are not configured, matching the pattern established in the websocket handler.


146-160: Context enrichment supports improved deduplication.

Adding VariablesHash and using internalHash for Request.ID enables the resolver to compute deduplication keys that account for both operation content and variables. This aligns with the header-based and variable-based deduplication test scenarios.


192-202: Arena-based resolution with header propagation and telemetry is correctly implemented.

The integration of ArenaResolveGraphQLResponse with HeaderPropagationWriter is correct. HeaderPropagationWriter returns an io.Writer matching the expected parameter type, and the timing and deduplication metrics from the returned info struct are properly recorded on the span using the correct attribute types.

router/core/websocket.go (3)

1019-1035: WebSocket context setup aligned with HTTP handler.

The context enrichment now mirrors the HTTP handler pattern:

  • VariablesHash set for variable-based deduplication
  • Request.ID uses operationCtx.internalHash for consistent deduplication keys
  • ExecutionOptions and TraceOptions properly propagated

This ensures deduplication behavior is consistent between HTTP and WebSocket operations.


926-932: Variables hash captured for deduplication.

The variablesHash is correctly captured from operationKit.parsedOperation.VariablesHash, enabling consistent variable-based deduplication for WebSocket operations.


339-351: Execution and trace options parsing refactored.

The change from parseRequestOptions to parseExecutionAndTraceOptions aligns with the broader refactoring of execution/trace option handling in the prehandler. The function correctly returns resolve.ExecutionOptions and resolve.TraceOptions, which are properly unpacked and used to populate PlanOptions.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 086ad19 and b6821d2.

📒 Files selected for processing (2)
  • router/pkg/config/testdata/config_defaults.json (1 hunks)
  • router/pkg/config/testdata/config_full.json (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router/pkg/config/testdata/config_defaults.json
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router/pkg/config/testdata/config_full.json
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/pkg/config/testdata/config_full.json
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: In the Cosmo router codebase, JSON schema validation prevents null values in TrafficShapingRules subgraph configurations, making nil checks unnecessary when dereferencing subgraph rule pointers in NewSubgraphTransportOptions.

Applied to files:

  • router/pkg/config/testdata/config_full.json
📚 Learning: 2025-07-03T10:33:25.778Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2009
File: router/pkg/config/config.go:0-0
Timestamp: 2025-07-03T10:33:25.778Z
Learning: The CardinalityLimit field in the Metrics struct (router/pkg/config/config.go) is validated at the JSON schema level in config.schema.json with a minimum value constraint of 1, preventing zero or negative values without requiring runtime validation.

Applied to files:

  • router/pkg/config/testdata/config_full.json
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/pkg/config/testdata/config_full.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/core/factoryresolver.go (1)

92-96: Replace panic with error return for missing subgraph transport.

Using panic when a subgraph is not found in subgraphTransports is too severe and makes error recovery impossible. This appears to be a configuration mismatch that should be surfaced as a proper error.

Consider refactoring NewDefaultFactoryResolver to return an error:

 func NewDefaultFactoryResolver(
 	ctx context.Context,
 	transportOptions *TransportOptions,
 	subscriptionClientOptions *SubscriptionClientOptions,
 	baseTransport http.RoundTripper,
 	subgraphTransports map[string]http.RoundTripper,
 	connector *grpcconnector.Connector,
 	log *zap.Logger,
 	enableNetPoll bool,
 	instanceData InstanceData,
-) *DefaultFactoryResolver {
+) (*DefaultFactoryResolver, error) {
 	transportFactory := NewTransport(transportOptions)
 
 	subgraphHTTPClients := map[string]*http.Client{}
 
 	for subgraph, subgraphOpts := range transportOptions.SubgraphTransportOptions.SubgraphMap {
 		subgraphTransport, ok := subgraphTransports[subgraph]
 		if !ok {
-			panic(fmt.Sprintf("subgraph %s not found in subgraphTransports", subgraph))
+			return nil, fmt.Errorf("subgraph %s not found in subgraphTransports", subgraph)
 		}
🧹 Nitpick comments (3)
router/core/plan_generator.go (1)

340-341: Remove unused NetPollConfiguration.

The netPollConfig variable is created and configured but never used after the refactoring that removed subscription client initialization from this path. This is dead code.

Apply this diff to remove the unused code:

-	var netPollConfig graphql_datasource.NetPollConfiguration
-	netPollConfig.ApplyDefaults()
-
 	ctx, cancel := context.WithCancel(context.Background())
router/core/factoryresolver.go (1)

167-181: Consider caching subscription clients per subgraph to avoid recreation.

ResolveGraphqlFactory creates new defaultHTTPClient, streamingClient, and subscriptionClient instances on every invocation. While the subgraph-specific HTTP clients are cached in d.subgraphHTTPClients (line 183), the subscription clients are recreated each time. If ResolveGraphqlFactory is called multiple times for the same subgraph (e.g., multiple datasources using the same subgraph), this could create duplicate subscription clients, potentially wasting resources.

Consider caching subscription clients similar to how subgraphHTTPClients are cached, or verify that ResolveGraphqlFactory is only called once per subgraph during initialization.

Run this script to check how often ResolveGraphqlFactory is called:

#!/bin/bash
# Description: Analyze ResolveGraphqlFactory call patterns

# Find all calls to ResolveGraphqlFactory
rg -B5 -A5 'ResolveGraphqlFactory\(' router/core/

# Check if it's called in a loop
ast-grep --pattern $'for $_ := range $_ {
  $$$
  ResolveGraphqlFactory($$$)
  $$$
}'
router/core/header_rule_engine.go (1)

359-388: Consider pooling the keys slice for hot-path performance.

The hash function is correct and produces deterministic results. However, it allocates a new slice on every call (line 366). If profiling shows this is a hot path, consider using sync.Pool to reuse allocations.

Example optimization (only if profiling warrants it):

var keysPool = sync.Pool{
	New: func() interface{} { return make([]string, 0, 16) },
}

func hashHeaderStable(hdr http.Header) uint64 {
	if len(hdr) == 0 {
		return 0
	}

	keys := keysPool.Get().([]string)
	keys = keys[:0]
	defer func() {
		keysPool.Put(keys)
	}()
	
	// ... rest of function
}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6821d2 and 91fd20b.

📒 Files selected for processing (3)
  • router/core/factoryresolver.go (5 hunks)
  • router/core/header_rule_engine.go (12 hunks)
  • router/core/plan_generator.go (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router/core/plan_generator.go
  • router/core/header_rule_engine.go
  • router/core/factoryresolver.go
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.

Applied to files:

  • router/core/header_rule_engine.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/core/header_rule_engine.go
  • router/core/factoryresolver.go
🧬 Code graph analysis (1)
router/core/factoryresolver.go (2)
router-plugin/httpclient/client.go (1)
  • Client (23-31)
router/core/router.go (1)
  • SubgraphTransportOptions (117-120)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-router
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_test
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (10)
router/core/factoryresolver.go (2)

54-57: LGTM: Interface simplified by removing single-flight parameter.

The removal of enableSingleFlight from the RoundTripper signature aligns with the broader refactoring to remove the single-flight mechanism from transport initialization.


59-75: LGTM: Struct refactored to support per-subgraph transport configuration.

The new fields (baseTransport, transportFactory, defaultSubgraphRequestTimeout, subscriptionClientOptions) enable proper per-subgraph HTTP client and subscription configuration, replacing the previous single-flight approach.

router/core/header_rule_engine.go (8)

11-20: LGTM: New imports support deterministic header hashing.

The new imports are correctly used throughout the file for the header hashing and propagation improvements.


124-126: LGTM: Proper cache control for error responses.

Setting no-store, no-cache when subgraph errors are present correctly prevents caching of error responses.


140-142: LGTM: Smart precomputation optimization.

The precomputed hasAllRequestRules and subgraphHasRequestRules fields enable O(1) rule-applicability checks in hasRequestRulesForSubgraph, avoiding repeated rule iteration. The logic correctly distinguishes between global rules (affecting all subgraphs) and subgraph-specific rules.

Also applies to: 170-180


307-340: LGTM: Deterministic header building for deduplication.

The method correctly builds outbound headers by applying rules to a fresh map, ensuring deterministic results independent of Go's map iteration order. The fast-path check (line 317) and stable hash computation (line 338) enable efficient deduplication of requests with identical header configurations.


342-357: LGTM: Consistent with precomputation logic.

The method correctly uses the precomputed flags to determine rule applicability, matching the initialization logic in lines 170-180.


627-627: LGTM: Cleaner operation type checking.

Replacing inline cache-control logic with resolve.GetOperationTypeFromContext centralizes the check and improves maintainability. The logic correctly prevents caching of mutations.


489-606: No action needed. The refactored method correctly modifies a passed http.Header instead of the request's header directly, enabling deterministic header building for deduplication.

At line 492, nil is passed as the third parameter (err any) to getCustomDynamicAttributeValue, not the second parameter. The err parameter is designed to be optional and can safely be nil—it's only used for error-related context field checks that explicitly compare err != nil. The reqContext parameter (second parameter, passed as ctx) is not nil.


93-102: All callers have been updated to the new HeaderPropagationWriter signature.

Verification confirms the single call site at router/core/graphql_handler.go:192 correctly invokes the function with the new parameters: HeaderPropagationWriter(w, resolveCtx, true).

Copy link
Contributor

@StarpTech StarpTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
router/core/factoryresolver.go (1)

176-186: Subscription transport configuration still bypasses subgraph-specific settings.

This is the same issue flagged in the previous review. While lines 184-185 add an explanation claiming this is intentional, the justification appears insufficient. The comment states that "custom subgraph clients are intended to be used for custom timeouts, which is not relevant for subscriptions."

However, subgraph-specific transports can include:

  • Custom authentication headers
  • Custom routing or proxy configurations
  • Custom TLS settings
  • Custom retry logic
  • Custom middleware/interceptors

All of these could be relevant for subscriptions, not just request timeouts. Using the base transport for subscriptions means these subgraph-specific configurations are silently ignored, which could lead to authentication failures, routing issues, or other transport-layer problems for subscription operations.

🧹 Nitpick comments (1)
router/core/factoryresolver.go (1)

163-181: Consider caching default HTTP clients for efficiency.

The defaultHTTPClient, streamingClient, and subscriptionClient are created fresh on every call to ResolveGraphqlFactory for subgraphs without specific configurations. If this method is called multiple times (e.g., during query planning), consider caching these clients to avoid unnecessary allocations.

💡 Suggested approach

Cache the default clients in DefaultFactoryResolver during initialization, similar to how subgraph-specific clients are handled:

 type DefaultFactoryResolver struct {
 	static *staticdatasource.Factory[staticdatasource.Configuration]
 	log    *zap.Logger
 
 	engineCtx context.Context
 
 	subgraphHTTPClients map[string]*http.Client
+	defaultHTTPClient *http.Client
+	defaultStreamingClient *http.Client
+	defaultSubscriptionClient *graphql_datasource.SubscriptionClient
 	connector           *grpcconnector.Connector
 
 	factoryLogger abstractlogger.Logger
 	instanceData  InstanceData
 
 	baseTransport                 http.RoundTripper
 	transportFactory              ApiTransportFactory
 	defaultSubgraphRequestTimeout time.Duration
 	subscriptionClientOptions     []graphql_datasource.Options
 }

Then initialize them in NewDefaultFactoryResolver and reuse them in ResolveGraphqlFactory.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91fd20b and 6702744.

📒 Files selected for processing (4)
  • router-tests/modules/context_error_field_test.go (1 hunks)
  • router/core/context.go (5 hunks)
  • router/core/factoryresolver.go (5 hunks)
  • router/core/header_rule_engine.go (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router/core/context.go
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/core/factoryresolver.go
  • router/core/header_rule_engine.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router/core/factoryresolver.go
  • router/core/header_rule_engine.go
📚 Learning: 2025-08-28T09:17:49.477Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:17-55
Timestamp: 2025-08-28T09:17:49.477Z
Learning: The Cosmo router uses a custom, intentionally rigid multipart implementation for GraphQL subscriptions. The multipart parsing in test files should remain strict and not be made more tolerant, as this rigidity is by design.

Applied to files:

  • router/core/factoryresolver.go
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.

Applied to files:

  • router/core/factoryresolver.go
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.

Applied to files:

  • router/core/header_rule_engine.go
🧬 Code graph analysis (3)
router-tests/modules/context_error_field_test.go (1)
router/pkg/config/config.go (2)
  • SubgraphErrorPropagationConfiguration (794-806)
  • SubgraphErrorPropagationModePassthrough (791-791)
router/core/factoryresolver.go (1)
router/core/router.go (1)
  • SubgraphTransportOptions (117-120)
router/core/header_rule_engine.go (3)
router/core/modules.go (1)
  • EnginePostOriginHandler (132-136)
router/pkg/config/config.go (2)
  • RequestHeaderRule (286-309)
  • HeaderRuleOperationSet (278-278)
router/core/context.go (1)
  • OperationTypeMutation (589-589)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-router
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: build_test
🔇 Additional comments (13)
router/core/factoryresolver.go (2)

10-10: LGTM: API simplification aligns with transport refactoring.

The removal of enableSingleFlight from the RoundTripper signature and the addition of the time import are consistent with the PR's objectives to simplify transport management.

Also applies to: 54-57


71-74: LGTM: Per-subgraph transport handling correctly implemented.

The new fields in DefaultFactoryResolver and the per-subgraph HTTP client creation logic properly support subgraph-specific transport configurations. The initialization correctly wires up the transport factory and subscription options.

Also applies to: 92-105, 138-150

router-tests/modules/context_error_field_test.go (1)

100-124: LGTM! Test validates Cache-Control behavior on subgraph errors.

The test correctly validates that when subgraph errors occur in passthrough mode, the Cache-Control header is set to "no-store, no-cache, must-revalidate" as implemented in router/core/header_rule_engine.go (lines 128-132). The test setup, assertions, and expected error payload structure are all appropriate.

router/core/header_rule_engine.go (10)

11-19: LGTM! Imports support new hashing and header propagation features.

The new imports are appropriate:

  • sort and xxhash enable deterministic header hashing in hashHeaderStable
  • strconv supports Content-Length conversion
  • ast provides operation type checking (mutation detection)
  • graphql_datasource supplies types for header propagation patterns

34-34: LGTM! Interface change aligns with response header handling.

The change from EnginePreOriginHandler to EnginePostOriginHandler correctly reflects that HeaderPropagation.OnOriginResponse (line 394) operates on responses after origin requests, making this the appropriate interface.


128-132: LGTM! Subgraph error handling correctly sets Cache-Control.

The logic properly:

  • Detects subgraph errors via resolveCtx.SubgraphErrors()
  • Sets Cache-Control to "no-store, no-cache, must-revalidate" to prevent caching of error responses
  • Uses a flag to avoid redundant header sets
  • Tracks errors for observability via trackFinalResponseError

This integrates correctly with the test in router-tests/modules/context_error_field_test.go (lines 100-124).


144-146: LGTM! Precomputation optimizes runtime rule checks.

The precomputed fields efficiently determine rule applicability:

  • hasAllRequestRules: cached check for global rules that apply to all subgraphs
  • subgraphHasRequestRules: per-subgraph map built only when no global rules exist
  • Precomputation occurs once during initialization, avoiding repeated iterations

This optimization is correctly used by hasRequestRulesForSubgraph (lines 346-361) for fast-path decisions in BuildRequestHeaderForSubgraph.

Also applies to: 174-184


311-344: LGTM! BuildRequestHeaderForSubgraph is well-optimized and correct.

The method efficiently builds per-subgraph headers with:

  • Fast-path optimization (lines 321-323) that skips processing when no rules apply
  • Fresh header map construction without modifying the original request
  • Application of both global and subgraph-specific rules in the correct order
  • Deterministic hash computation for caching/deduplication

The early return (nil, 0) when no rules apply is appropriate, as callers can distinguish between "no headers to add" (nil) and "empty headers after rules" (empty map with non-zero hash).


346-361: LGTM! hasRequestRulesForSubgraph provides efficient rule applicability checks.

The method correctly implements fast O(1) lookups using precomputed state:

  • Returns early for nil cases
  • Checks global rules first (apply to all subgraphs)
  • Falls back to per-subgraph map lookup only when needed
  • Handles empty subgraph names appropriately

This enables efficient fast-path optimization in BuildRequestHeaderForSubgraph.


363-392: LGTM! hashHeaderStable provides deterministic and efficient header hashing.

The implementation is excellent:

  • Deterministic: Sorts keys before hashing to ensure consistent results regardless of map iteration order
  • Efficient: Uses xxhash (fast non-cryptographic hash) and minimizes allocations by direct slice indexing (line 384)
  • Correct: Properly delimits keys (\x00), values (\x00), and key groups (\x01) to avoid collisions
  • Complete: Handles multi-value headers by iterating all values

The delimiter scheme ensures that {"A": ["B", "C"]} produces a different hash than {"AB": ["", "C"]}.


493-610: LGTM! applyRequestRuleToHeader refactoring enables per-subgraph header building.

The refactoring successfully:

  • Changes from in-place request modification to operating on a provided header http.Header parameter
  • Updates error propagation to use ctx.SetError instead of returning errors
  • Preserves all original rule application logic (set, propagate, rename, regex matching)
  • Maintains edge case handling (ignored headers, defaults, case-insensitive matching)

This enables BuildRequestHeaderForSubgraph to build headers in a fresh map, supporting deterministic per-subgraph header construction and hashing.


631-631: LGTM! Operation type check is more explicit.

The change from single-flight cache-control logic to context-based operation type checking (resolve.GetOperationTypeFromContext(ctx) == ast.OperationTypeMutation) is clearer and more maintainable. This ensures mutations are never cached, which is the correct behavior for data-mutating operations.


114-118: Content-Length header relies on single-write assumption, which is properly enforced by design.

The comment correctly documents that Write() is called exactly once with the entire response body. This assumption is safe in practice because:

  1. HeaderPropagationWriter is only used for non-streaming operations — it's instantiated in graphql_handler.go:192 for normal GraphQL queries/mutations through ArenaResolveGraphQLResponse.

  2. Streaming responses use a separate code path — subscriptions and streaming responses use GetSubscriptionResponseWriter() which returns an HttpFlushWriter, not HeaderPropagationWriter.

  3. Even if Write() were called multiple times, the flag-based guard (h.setContentLength set to false after the first call on line 119) ensures Content-Length is only set once; subsequent writes delegate directly to the underlying http.ResponseWriter.

The assumption is sound for this code path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants