OTA-1637: ClusterOperators should not go Progressing only for a node reboot #30296
base: main
@hongkailiu: This pull request references OTA-1637 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.
22125c5 to dbb8c76
/test e2e-aws-ovn-upgrade
I am a bit surprised that there are no failures. For example, this job: 🤔
Job Failure Risk Analysis for sha: dbb8c76
Risk analysis has seen new tests most likely introduced by this PR. New tests seen in this PR at sha: dbb8c76
/payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade #30296
@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1adde290-995e-11f0-9d57-8ba68c7fd024-0
@hongkailiu: This PR was included in a payload test run from #30296
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1adde290-995e-11f0-9d57-8ba68c7fd024-0
dbb8c76 to bad744e
The payload command above is working. The triggered job shows that the code spotted some violations but also missed a few (compared to the COs with [image omitted]).
Refining the code to see why they were missed.
/payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade #30296
@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/05382750-998d-11f0-963e-c3b711331506-0
@hongkailiu: This PR was included in a payload test run from #30296
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/05382750-998d-11f0-963e-c3b711331506-0
Job Failure Risk Analysis for sha: bad744e
Risk analysis has seen new tests most likely introduced by this PR. New tests seen in this PR at sha: bad744e
bad744e to d415550
The same list was caught as before, so something is still missing. Let us try again with the total number of events:
/payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade #30296
@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5e26f440-9a14-11f0-8659-71352787cde2-0
@hongkailiu: This PR was included in a payload test run from #30296
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5e26f440-9a14-11f0-8659-71352787cde2-0
Job Failure Risk Analysis for sha: d415550
Risk analysis has seen new tests most likely introduced by this PR. New tests seen in this PR at sha: d415550
This is to cover the node rebooting case from the rule [1] that was introduced recently:

```
Operators should not report Progressing only because DaemonSets owned by them are adjusting to a new node from cluster scaleup or a node rebooting from cluster upgrade.
```

The test fails if
- `co/machine-config` never became Progressing=True during a cluster upgrade, or
- some CO left Progressing=False during the upgrade after `machine-config` became Progressing=True. This should not have taken place, as `machine-config` was rebooting the nodes, which was the only activity in the cluster during that time.

[1]. https://github.com/openshift/api/blob/61248d910ff74aef020492922d14e6dadaba598b/config/v1/types_cluster_operator.go#L163-L164
d415550 to 969fcc5
Let us see if https://github.com/openshift/origin/compare/d41555013e72e50e055e6fc1ecf38229560c5b35..969fcc5a0bf55a5242c3e57a302f9e2fd2a04370 helps.
/payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade #30296
@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7f025ef0-9a48-11f0-971e-3b18f6cdf426-0
@hongkailiu: This PR was included in a payload test run from #30296
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7f025ef0-9a48-11f0-971e-3b18f6cdf426-0
@hongkailiu: This PR was included in a payload test run from #30325
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/013427e0-9e46-11f0-90a3-59eae64d5fbf-0
New changes are detected. LGTM label has been removed.
b97312c to d253c23
/payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade #30296
@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/dab66540-9ed3-11f0-9b1b-2118d6080ae8-0
@hongkailiu: This PR was included in a payload test run from #30296
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/dab66540-9ed3-11f0-9b1b-2118d6080ae8-0
d253c23 to 352fc56
/payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade #30296
@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c401b990-9ed7-11f0-8d28-0a344ca93541-0
@hongkailiu: This PR was included in a payload test run from #30296
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c401b990-9ed7-11f0-8d28-0a344ca93541-0
352fc56 to 603cedb
/payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade #30296
@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/af957770-9f00-11f0-8373-dac78478a6e5-0
@hongkailiu: This PR was included in a payload test run from #30296
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/af957770-9f00-11f0-8373-dac78478a6e5-0
Job Failure Risk Analysis for sha: 603cedb
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: 603cedb
New tests seen in this PR at sha: 603cedb
The new tests pass on the micro upgrade and are flaky (as expected) on the 9 components with exceptions in the minor upgrade. OLM is still not there; it is waiting for #30325. Same on minor upgrade.
/hold cancel
/test e2e-aws-ovn-serial-2of2
/test ?
/test e2e-aws-ovn-upgrade
@hongkailiu: The following tests failed, say
This is to cover the node rebooting case from the rule [1] that was introduced recently:

> Operators should not report Progressing only because DaemonSets owned by them are adjusting to a new node from cluster scaleup or a node rebooting from cluster upgrade.

The test fails if
- `co/machine-config` never became Progressing=True during a cluster upgrade, or
- some CO left Progressing=False during the upgrade after `machine-config` became Progressing=True. This should not have taken place, as `machine-config` was rebooting the nodes, which was the only activity in the cluster during that time.

[1]. https://github.com/openshift/api/blob/61248d910ff74aef020492922d14e6dadaba598b/config/v1/types_cluster_operator.go#L163-L164
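
The two failure conditions above can be illustrated with a small Python sketch. This is not the Go test added by the PR; it only models the described logic. The `ProgressingEvent` type, the `find_violations` function, and the time unit are hypothetical names chosen for illustration, and the sketch assumes every event it receives was observed during the upgrade window.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Name of the operator that reboots nodes during an upgrade.
MCO = "machine-config"


@dataclass(frozen=True)
class ProgressingEvent:
    """One observed change of a ClusterOperator's Progressing condition (hypothetical type)."""
    time: float    # seconds since the start of the upgrade (assumed unit)
    operator: str  # ClusterOperator name, e.g. "machine-config"
    status: str    # "True", "False", or "Unknown"


def find_violations(events: Iterable[ProgressingEvent]) -> List[str]:
    """Return one message per failure condition hit by the given events."""
    ordered = sorted(events, key=lambda e: e.time)

    # Condition 1: machine-config must become Progressing=True at some point.
    mco_start = next(
        (e.time for e in ordered if e.operator == MCO and e.status == "True"),
        None,
    )
    if mco_start is None:
        return [f"co/{MCO} never became Progressing=True during the upgrade"]

    # Condition 2: after that point, no other operator may leave Progressing=False.
    failures = []
    for e in ordered:
        if e.operator == MCO:
            continue
        if e.time > mco_start and e.status != "False":
            failures.append(
                f"co/{e.operator} left Progressing=False "
                f"(Progressing={e.status}) after co/{MCO} started progressing"
            )
    return failures


if __name__ == "__main__":
    sample = [
        ProgressingEvent(10, "etcd", "True"),            # before node reboots: allowed
        ProgressingEvent(20, "etcd", "False"),
        ProgressingEvent(100, "machine-config", "True"),  # node reboots begin
        ProgressingEvent(150, "dns", "True"),             # flagged by the sketch
        ProgressingEvent(400, "machine-config", "False"),
    ]
    for message in find_violations(sample):
        print(message)
```

The sketch deliberately treats any status other than `False` as leaving Progressing=False, which is one possible reading of the second bullet. It does not model the per-component exceptions mentioned in the conversation.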