@zhaohuabing zhaohuabing commented Nov 21, 2025

What type of PR is this?

Currently, when calculating CPU usage, the duration passed to rate() is the time elapsed from the start of the test to the current sampling time.

`rate(container_cpu_usage_seconds_total{namespace="envoy-gateway-system", container="envoy"}[%DURATIONs])*100`

Because this %DURATION continuously grows, the query ends up averaging CPU over the entire test run instead of a fixed window for each sample, which makes the reported CPU usage incorrect.

This PR uses a fixed 30s window for each sample. It also avoids reading values from the previous test: when time.Since(startTime) < benchmarkCPURateWindow, the window is time.Since(startTime) instead of the full 30s.

Before:

Envoy Proxy CPU (min : mean : max)

| Test Name | Envoy Gateway CPU (%) min / max / mean | Averaged Envoy Proxy CPU (%) min / max / mean |
| --- | --- | --- |
| scaling up httproutes to 10 with 2 routes per hostname | 0.00 / 0.80 / 0.47 | 0.00 / 60.52 / 31.28 |
| scaling up httproutes to 50 with 10 routes per hostname | 0.77 / 1.97 / 1.13 | 0.68 / 48.90 / 24.82 |
| scaling up httproutes to 100 with 20 routes per hostname | 1.02 / 2.42 / 1.50 | 0.76 / 38.87 / 22.23 |
| scaling up httproutes to 300 with 60 routes per hostname | 2.78 / 4.03 / 3.35 | 1.67 / 15.27 / 10.25 |
| scaling up httproutes to 500 with 100 routes per hostname | 3.95 / 6.35 / 5.04 | 2.80 / 14.06 / 11.15 |
| scaling up httproutes to 1000 with 200 routes per hostname | 8.38 / 10.53 / 9.40 | 4.34 / 9.59 / 8.24 |
| scaling down httproutes to 500 with 100 routes per hostname | 0.00 / 3.00 / 1.41 | 0.00 / 79.69 / 20.46 |
| scaling down httproutes to 300 with 60 routes per hostname | 0.00 / 1.13 / 0.80 | 0.00 / 88.97 / 36.48 |
| scaling down httproutes to 100 with 20 routes per hostname | 0.00 / 3.93 / 1.66 | 0.00 / 99.94 / 33.64 |
| scaling down httproutes to 50 with 10 routes per hostname | 0.00 / 1.33 / 0.83 | 0.00 / 100.00 / 26.95 |
| scaling down httproutes to 10 with 2 routes per hostname | 0.00 / 1.27 / 0.89 | 0.00 / 70.81 / 19.99 |

After:

Envoy Proxy CPU (min : mean : max)

| Test Name | Envoy Gateway CPU (%) min / max / mean | Averaged Envoy Proxy CPU (%) min / max / mean |
| --- | --- | --- |
| scaling up httproutes to 10 with 2 routes per hostname | 0.00 / 1.13 / 0.42 | 0.00 / 88.05 / 8.13 |
| scaling up httproutes to 50 with 10 routes per hostname | 0.40 / 2.00 / 0.64 | 0.00 / 97.94 / 14.41 |
| scaling up httproutes to 100 with 20 routes per hostname | 0.40 / 2.60 / 0.75 | 0.00 / 98.78 / 3.86 |
| scaling up httproutes to 300 with 60 routes per hostname | 0.40 / 4.93 / 1.07 | 0.00 / 99.22 / 10.20 |
| scaling up httproutes to 500 with 100 routes per hostname | 0.53 / 7.73 / 1.39 | 0.00 / 77.85 / 4.59 |
| scaling up httproutes to 1000 with 200 routes per hostname | 0.60 / 13.20 / 2.00 | 0.00 / 99.10 / 4.02 |
| scaling down httproutes to 500 with 100 routes per hostname | 0.00 / 1.20 / 0.76 | 0.00 / 98.43 / 4.00 |
| scaling down httproutes to 300 with 60 routes per hostname | 0.00 / 1.07 / 0.78 | 0.00 / 99.39 / 3.12 |
| scaling down httproutes to 100 with 20 routes per hostname | 0.00 / 1.13 / 0.72 | 0.00 / 77.76 / 6.60 |
| scaling down httproutes to 50 with 10 routes per hostname | 0.00 / 1.20 / 0.77 | 0.00 / 57.04 / 1.16 |
| scaling down httproutes to 10 with 2 routes per hostname | 0.00 / 1.00 / 0.68 | 0.00 / 98.58 / 6.95 |

@zhaohuabing zhaohuabing requested a review from a team as a code owner November 21, 2025 14:59
@zhaohuabing zhaohuabing marked this pull request as draft November 21, 2025 14:59
codecov bot commented Nov 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.30%. Comparing base (84cd21c) to head (536b6f8).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7581      +/-   ##
==========================================
+ Coverage   72.29%   72.30%   +0.01%     
==========================================
  Files         232      232              
  Lines       34114    34114              
==========================================
+ Hits        24662    24667       +5     
+ Misses       7676     7673       -3     
+ Partials     1776     1774       -2     


@zhaohuabing zhaohuabing removed the request for review from a team November 22, 2025 02:47
@zhaohuabing zhaohuabing force-pushed the cpu-time branch 6 times, most recently from 2360539 to a657980 Compare November 22, 2025 10:17
@zhaohuabing zhaohuabing changed the title from "cacluate cpu time" to "bechmark: fix cpu sampling" Nov 22, 2025
@zhaohuabing zhaohuabing force-pushed the cpu-time branch 3 times, most recently from 49840fb to beedcdd Compare November 22, 2025 11:49
@zhaohuabing zhaohuabing marked this pull request as ready for review November 23, 2025 01:45
@zhaohuabing zhaohuabing added the cherrypick/release-v0.6 and cherrypick/release-v1.6.1 labels and removed the cherrypick/release-v0.6 label Nov 24, 2025
@zirain zirain merged commit 536486f into envoyproxy:main Nov 24, 2025
52 of 55 checks passed
zhaohuabing added a commit to zhaohuabing/gateway that referenced this pull request Dec 5, 2025
use fixed duration for cpu rate

Signed-off-by: Huabing Zhao <[email protected]>
(cherry picked from commit 536486f)
Signed-off-by: Huabing Zhao <[email protected]>
zhaohuabing added a commit that referenced this pull request Dec 5, 2025
* fix: oidc authentication endpoint was overwritten by discovered value (#7460)

fix: oid authentication endpoint was overriden by discovered value

Signed-off-by: Huabing Zhao <[email protected]>
Signed-off-by: Huabing (Robin) Zhao <[email protected]>
(cherry picked from commit 50dcb15)
Signed-off-by: Huabing Zhao <[email protected]>

* fix: do not return 500 for all requests when part of BackendRefs are invalid (#7488)

* do not return 500 for all requests when part of BackendRefs are invalid

Signed-off-by: Huabing Zhao <[email protected]>
Signed-off-by: Huabing (Robin) Zhao <[email protected]>
(cherry picked from commit 2899416)
Signed-off-by: Huabing Zhao <[email protected]>

* fix: prevent skeleton route status entries for unmanaged GatewayClasses (#7536)

* fix: prevent skeleton route status entries for unmanaged GatewayClasses

When processing policies (EnvoyExtensionPolicy, SecurityPolicy), the translator
was calling GetRouteParentContext for ALL parentRefs in a route, even those
referencing gateways with different GatewayClasses not managed by this translator.

GetRouteParentContext creates a skeleton RouteParentStatus entry with just the
controllerName when called on a parentRef that hasn't been processed yet. Since
all GatewayClass instances share the same controller name, these skeleton entries
persisted in status without conditions.

The fix checks if a parentRef context already exists before attempting to apply
policy configuration to it. If the context doesn't exist, it means this parentRef
wasn't processed by this translator and should be skipped.

Signed-off-by: Raj Singh <[email protected]>

* fix: also prevent skeleton entries in BackendTrafficPolicy processing

The same issue exists in BackendTrafficPolicy route processing - calling
GetRouteParentContext for all parentRefs creates skeleton status entries.

Apply the same fix: check if parentRef context exists before adding to list.

Signed-off-by: Raj Singh <[email protected]>

---------

Signed-off-by: Raj Singh <[email protected]>
(cherry picked from commit ff13742)
Signed-off-by: Huabing Zhao <[email protected]>

* treat too many addresses as programmed (#7542)

Signed-off-by: cong <[email protected]>
(cherry picked from commit 7cb5f72)
Signed-off-by: Huabing Zhao <[email protected]>

* bechmark: fix cpu sampling (#7581)

use fixed duration for cpu rate

Signed-off-by: Huabing Zhao <[email protected]>
(cherry picked from commit 536486f)
Signed-off-by: Huabing Zhao <[email protected]>

* chore: bump golang.org/x/crypto (#7588)

* chore: bump golang.org/x/crypto

Signed-off-by: zirain <[email protected]>

* fix gen

Signed-off-by: zirain <[email protected]>

---------

Signed-off-by: zirain <[email protected]>
(cherry picked from commit 70fa59a)
Signed-off-by: Huabing Zhao <[email protected]>

* findOwningGateway should return controller based on linked GatewayClass (#7611)

* fix: filter Gateway by controller in findOwningGateway

Prevent cross-controller Gateway mutations by validating GatewayClass

Signed-off-by: Sudipto Baral <[email protected]>
(cherry picked from commit ba8e0e2)
Signed-off-by: Huabing Zhao <[email protected]>

* fix: use default when namespace is unset (#7612)

* fix: use default when namespace is unset

Signed-off-by: zirain <[email protected]>

* fix

Signed-off-by: zirain <[email protected]>

* fix test

Signed-off-by: zirain <[email protected]>

---------

Signed-off-by: zirain <[email protected]>
(cherry picked from commit be2cc73)
Signed-off-by: Huabing Zhao <[email protected]>

* bump Gateway API v1.4.1 (#7653)

Signed-off-by: zirain <[email protected]>
(cherry picked from commit 0fa26d7)
Signed-off-by: Huabing Zhao <[email protected]>

* update release note

Signed-off-by: Huabing Zhao <[email protected]>

* fix gen check

Signed-off-by: Huabing Zhao <[email protected]>

* ci: add script to free disk space (#7534)

* feat: free disk space

Signed-off-by: Shreemaan Abhishek <[email protected]>

* lint

Signed-off-by: Shreemaan Abhishek <[email protected]>

* cleanup

Signed-off-by: Shreemaan Abhishek <[email protected]>

* make target and tools/hack

Signed-off-by: Shreemaan Abhishek <[email protected]>

* lint

Signed-off-by: Shreemaan Abhishek <[email protected]>

* modular action

Signed-off-by: Shreemaan Abhishek <[email protected]>

---------

Signed-off-by: Shreemaan Abhishek <[email protected]>
(cherry picked from commit 4312f38)
Signed-off-by: Huabing Zhao <[email protected]>

---------

Signed-off-by: Huabing Zhao <[email protected]>
Signed-off-by: Huabing (Robin) Zhao <[email protected]>
Signed-off-by: Raj Singh <[email protected]>
Signed-off-by: cong <[email protected]>
Signed-off-by: zirain <[email protected]>
Signed-off-by: Sudipto Baral <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
Co-authored-by: Raj Singh <[email protected]>
Co-authored-by: 聪 <[email protected]>
Co-authored-by: zirain <[email protected]>
Co-authored-by: Sudipto Baral <[email protected]>
Co-authored-by: shreealt <[email protected]>
zhaohuabing added a commit to zhaohuabing/gateway that referenced this pull request Dec 17, 2025
use fixed duration for cpu rate

Signed-off-by: Huabing Zhao <[email protected]>
(cherry picked from commit 536486f)
Signed-off-by: Huabing Zhao <[email protected]>