
CNTRLPLANE-3238: pathological events: bump KMS ScalingReplicaSet threshold to 150 #31209

Open
gangwgr wants to merge 1 commit into openshift:main from gangwgr:fix-kms-scaling-threshold

Conversation

@gangwgr
Contributor

@gangwgr gangwgr commented May 22, 2026

The KMS encryption tests trigger cascading rollouts across openshift-apiserver and openshift-oauth-apiserver. The previous threshold of 100 was exceeded in CI (observed 106 events), causing spurious pathological event failures. Bump to 150 to provide adequate headroom.

Summary by CodeRabbit

  • Chores
    • Increased tolerance for repeated scaling-related events during KMS-encryption test rollouts to reduce false failures from higher observed repeat counts.
    • Updated validation thresholds and explanatory notes to reflect the new allowance, improving test stability for affected control-plane components.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@coderabbitai

coderabbitai Bot commented May 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 577ed80c-5752-48ae-9d34-85d26bbecefb

📥 Commits

Reviewing files that changed from the base of the PR and between aa7c66b and cf9ccf9.

📒 Files selected for processing (1)
  • pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go

Walkthrough

This pull request increases the repeat threshold tolerance for ScalingReplicaSet events in the KMS-encryption test matcher. The change updates both the comment documentation and the repeatThresholdOverride value to allow more repeated events during test execution.

Changes

KMS Encryption ScalingReplicaSet Matcher

| Layer / File(s) | Summary |
| --- | --- |
| Increase ScalingReplicaSet repeat threshold for KMS encryption tests (pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go) | Comment updated to reflect new observed range, and repeatThresholdOverride increased for newKMSEncryptionTestScalingReplicaSetMatcher to tolerate higher repeat counts for ScalingReplicaSet events. |

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested labels: approved, lgtm, verified

Suggested reviewers:

  • kaleemsiddiqu
🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: bumping the KMS ScalingReplicaSet threshold to 150, which directly aligns with the file-level changes and PR objectives. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR modifies library code for pathological event matching, not Ginkgo test definitions. No Ginkgo tests (It, Describe, Context, When) are present in the changes, making the check not applicable. |
| Test Structure And Quality | ✅ Passed | The modified file duplicated_event_patterns.go is a library configuration file, not Ginkgo test code. The check for Ginkgo test quality is not applicable to non-test code. |
| Microshift Test Compatibility | ✅ Passed | No Ginkgo e2e tests are added in this PR. The file modified is a pathological event matcher library containing no test declarations, only configuration updates. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests are added in this PR. The change only updates event validation thresholds in duplicated_event_patterns.go, which is not a test definition file. Check not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies only a test monitoring library file that adjusts thresholds for pathological event detection. No deployment manifests, operator code, scheduling constraints, or topologies are affected. |
| Ote Binary Stdout Contract | ✅ Passed | The modified file is a library containing event matcher definitions with no process-level stdout writes, klog calls, or other OTE contract violations. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR modifies only duplicated_event_patterns.go, a library file with no Ginkgo e2e tests (It/Describe/Context/When). No new e2e tests are added. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Contributor

openshift-ci Bot commented May 22, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gangwgr
Once this PR has been reviewed and has the lgtm label, please assign neisw for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot requested review from deads2k and p0lyn0mial May 22, 2026 07:21
@openshift-ci openshift-ci Bot added the ready-for-human-review label (Indicates a PR has been reviewed by automated tools and is ready for human review) May 22, 2026
@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@gangwgr
Contributor Author

gangwgr commented May 25, 2026

/retest-required

@gangwgr
Contributor Author

gangwgr commented May 25, 2026

/test e2e-vsphere-ovn-upi

@gangwgr
Contributor Author

gangwgr commented May 25, 2026

/verified by ci runs

@openshift-ci-robot openshift-ci-robot added the verified label (Signifies that the PR passed pre-merge verification criteria) May 25, 2026
@openshift-ci-robot

@gangwgr: This PR has been marked as verified by ci runs.

Details

In response to this:

/verified by ci runs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@gangwgr gangwgr changed the title from "pathological events: bump KMS ScalingReplicaSet threshold to 150" to "No-JIRA: pathological events: bump KMS ScalingReplicaSet threshold to 150" May 25, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type) May 25, 2026
@openshift-ci-robot

@gangwgr: This pull request explicitly references no jira issue.

Details

In response to this:

The KMS encryption tests trigger cascading rollouts across openshift-apiserver and openshift-oauth-apiserver. The previous threshold of 100 was exceeded in CI (observed 106 events), causing spurious pathological event failures. Bump to 150 to provide adequate headroom.

Summary by CodeRabbit

  • Chores
    • Updated event validation thresholds in KMS encryption test scenarios to accommodate a higher frequency of scaling-related events, improving test reliability during rollouts.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@gangwgr gangwgr changed the title from "No-JIRA: pathological events: bump KMS ScalingReplicaSet threshold to 150" to "CNTRLPLANE-3238: pathological events: bump KMS ScalingReplicaSet threshold to 150" May 25, 2026
@openshift-ci-robot

openshift-ci-robot commented May 25, 2026

@gangwgr: This pull request references CNTRLPLANE-3238 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

The KMS encryption tests trigger cascading rollouts across openshift-apiserver and openshift-oauth-apiserver. The previous threshold of 100 was exceeded in CI (observed 106 events), causing spurious pathological event failures. Bump to 150 to provide adequate headroom.

Summary by CodeRabbit

  • Chores
    • Updated event validation thresholds in KMS encryption test scenarios to accommodate a higher frequency of scaling-related events, improving test reliability during rollouts.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

 },
 messageReasonRegex: regexp.MustCompile(`^ScalingReplicaSet$`),
-repeatThresholdOverride: 100,
+repeatThresholdOverride: 150,
Member


IMO, 150 is too high. If 106 is the expected count, maybe we should set this to 110.

Contributor Author


Added 120 since 110 is very close to the current threshold.

Member


I think, in that scenario, it is better to stay close to the threshold. I'd like to hear opinions from @p0lyn0mial

@gangwgr gangwgr force-pushed the fix-kms-scaling-threshold branch from e5cce4a to aa7c66b Compare May 25, 2026 12:28
@openshift-ci-robot openshift-ci-robot removed the verified label (Signifies that the PR passed pre-merge verification criteria) May 25, 2026
The KMS encryption tests trigger cascading rollouts across
openshift-apiserver and openshift-oauth-apiserver. The previous
threshold of 100 was exceeded in CI (observed 106 events),
causing spurious pathological event failures. Bump to 150 to
provide adequate headroom.
@gangwgr gangwgr force-pushed the fix-kms-scaling-threshold branch from aa7c66b to cf9ccf9 Compare May 25, 2026 12:30

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`:
- Line 1403: The value assigned to repeatThresholdOverride in
duplicated_event_patterns.go (repeatThresholdOverride: 120) contradicts the PR
description that says the threshold was increased to 150 and ignores prior
feedback suggesting 110; fix by making the intent consistent: either update the
code to set repeatThresholdOverride to 150 to match the PR title/description, or
if 120 (or 110) is preferred, update the PR description and justification to
state that value and why (e.g., observed max = 106 with chosen headroom). Locate
the repeatThresholdOverride assignment in duplicated_event_patterns.go and
adjust the numeric constant or the PR text accordingly so code and PR messaging
align.
- Line 1394: The comment above the repeated-event threshold is inconsistent with
the code: update the comment to match the implemented value used by
repeatThresholdOverride (currently 120) or change repeatThresholdOverride to 150
if the intended threshold is 150; locate the occurrence of
repeatThresholdOverride in duplicated_event_patterns.go and either (A) edit the
comment "threshold set to 150 with headroom." to "threshold set to 120 with
headroom." or (B) change the constant/assignment for repeatThresholdOverride
from 120 to 150 so code and comment match.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 22625b1d-902e-4f29-8146-5e76de51a887

📥 Commits

Reviewing files that changed from the base of the PR and between e5cce4a and aa7c66b.

📒 Files selected for processing (1)
  • pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go

Comment thread pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go Outdated
 },
 messageReasonRegex: regexp.MustCompile(`^ScalingReplicaSet$`),
-repeatThresholdOverride: 100,
+repeatThresholdOverride: 120,


⚠️ Potential issue | 🟠 Major | ⚖️ Poor tradeoff

Threshold value (120) contradicts PR description (150) and ignores prior feedback.

The repeatThresholdOverride is set to 120, but the PR title and description state the threshold is being increased to 150. Additionally, a past reviewer suggested that 110 would be more appropriate given the observed maximum of 106 events.

The current value of 120 provides reasonable headroom (~13% above the observed max), but the PR messaging is inconsistent and prior feedback appears unaddressed.
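
The headroom figure quoted above can be reproduced, and compared across the three values discussed in this thread (110, 120, 150), with a short Go sketch. The helper name `headroomPercent` is hypothetical. The only inputs taken from the PR are the observed maximum of 106 and the candidate thresholds.

```go
package main

import "fmt"

// headroomPercent returns how far threshold sits above observedMax,
// expressed as a percentage of observedMax.
func headroomPercent(observedMax, threshold int) float64 {
	return float64(threshold-observedMax) / float64(observedMax) * 100
}

func main() {
	const observedMax = 106 // repeat count observed in CI, per the PR description

	// Candidate thresholds raised in the review discussion.
	for _, threshold := range []int{110, 120, 150} {
		fmt.Printf("threshold %d: %.1f%% headroom over %d\n",
			threshold, headroomPercent(observedMax, threshold), observedMax)
	}
}
```

This gives roughly 3.8% headroom for 110, 13.2% for 120, and 41.5% for 150. That is the tradeoff under discussion: a tighter threshold catches regressions sooner but is more likely to fail again on normal run-to-run variance.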

🔍 Suggested clarification

Either:

  1. Update the PR description to reflect the actual threshold of 120, or
  2. Change the code to 150 if that's the intended value, or
  3. Address the past review feedback suggesting 110 is sufficient
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`
at line 1403, The value assigned to repeatThresholdOverride in
duplicated_event_patterns.go (repeatThresholdOverride: 120) contradicts the PR
description that says the threshold was increased to 150 and ignores prior
feedback suggesting 110; fix by making the intent consistent: either update the
code to set repeatThresholdOverride to 150 to match the PR title/description, or
if 120 (or 110) is preferred, update the PR description and justification to
state that value and why (e.g., observed max = 106 with chosen headroom). Locate
the repeatThresholdOverride assignment in duplicated_event_patterns.go and
adjust the numeric constant or the PR text accordingly so code and PR messaging
align.

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@openshift-ci
Contributor

openshift-ci Bot commented May 25, 2026

@gangwgr: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
ready-for-human-review: Indicates a PR has been reviewed by automated tools and is ready for human review


3 participants