Description
What happened:
Test 1:
step 1: create a cloneset with following setting (replicas=4, paused = false):
step 2: update the cloneset.spec.template to release a new version [As expected]:
get the cloneset:
the log:
I0719 09:37:04.282526 1 cloneset_sync_utils.go:124] Calculate diffs for CloneSet bg/echoserver, replicas=4, partition=0, maxSurge=4, maxUnavailable=0, allPods=4, newRevisionPods=0, newRevisionActivePods=0, oldRevisionPods=4, oldRevisionActivePods=4, unavailableNewRevisionCount=0, unavailableOldRevisionCount=4, preDeletingNewRevisionCount=0, preDeletingOldRevisionCount=0, toDeleteNewRevisionCount=0, toDeleteOldRevisionCount=0, enabledPreparingUpdateAsUpdate=false, useDefaultIsPodUpdate=false. Result: {scaleUpNum:4 scaleUpNumOldRevision:0 scaleDownNum:0 scaleDownNumOldRevision:0 scaleUpLimit:4 deleteReadyLimit:0 useSurge:4 useSurgeOldRevision:0 updateNum:4 updateMaxUnavailable:0}
Test 2:
step 1: create a cloneset with following setting (replicas=4, paused = false):
(similar to test 1, but with maxSurge = 50%
step 2: update the cloneset.spec.template to release a new version [As expected]:
get the cloneset:
the log:
I0719 09:28:51.366992 1 cloneset_sync_utils.go:124] Calculate diffs for CloneSet bg/echoserver, replicas=4, partition=0, maxSurge=2, maxUnavailable=0, allPods=4, newRevisionPods=0, newRevisionActivePods=0, oldRevisionPods=4, oldRevisionActivePods=4, unavailableNewRevisionCount=0, unavailableOldRevisionCount=4, preDeletingNewRevisionCount=0, preDeletingOldRevisionCount=0, toDeleteNewRevisionCount=0, toDeleteOldRevisionCount=0, enabledPreparingUpdateAsUpdate=false, useDefaultIsPodUpdate=false. Result: {scaleUpNum:2 scaleUpNumOldRevision:0 scaleDownNum:0 scaleDownNumOldRevision:0 scaleUpLimit:2 deleteReadyLimit:0 useSurge:2 useSurgeOldRevision:0 updateNum:4 updateMaxUnavailable:0}
so far so good
step 3: update the maxSurge from 50% to 100%
get the cloneset:
complete yaml:
apiVersion: apps.kruise.io/v1alpha1 kind: CloneSet metadata: annotations: kubectl.kubernetes.io/last-applied-configuration: | {"apiVersion":"apps.kruise.io/v1alpha1","kind":"CloneSet","metadata":{"annotations":{},"labels":{"app":"echoserver"},"name":"echoserver","namespace":"bg"},"spec":{"minReadySeconds":99999,"replicas":4,"selector":{"matchLabels":{"app":"echoserver"}},"template":{"metadata":{"labels":{"app":"echoserver","version":"base"}},"spec":{"containers":[{"env":[{"name":"version","value":"base"},{"name":"PORT","value":"8080"},{"name":"POD_NAME","valueFrom":{"fieldRef":{"fieldPath":"metadata.name"}}},{"name":"POD_NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}},{"name":"POD_IP","valueFrom":{"fieldRef":{"fieldPath":"status.podIP"}}},{"name":"NODE_NAME","value":"version2"}],"image":"cilium/echoserver:latest","imagePullPolicy":"IfNotPresent","name":"echoserver","ports":[{"containerPort":8080,"name":"http"}]}]}},"updateStrategy":{"maxSurge":"100%","maxUnavailable":0,"partition":0,"paused":false,"type":"ReCreate"}}} creationTimestamp: "2024-07-19T09:28:24Z" generation: 3 labels: app: echoserver name: echoserver namespace: bg resourceVersion: "14120354" uid: e9b9f64c-eb05-48a8-89be-4db735f2a011 spec: minReadySeconds: 99999 replicas: 4 revisionHistoryLimit: 10 scaleStrategy: {} selector: matchLabels: app: echoserver template: metadata: creationTimestamp: null labels: app: echoserver version: base spec: containers: - env: - name: version value: base - name: PORT value: "8080" - name: POD_NAME valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.name - name: POD_NAMESPACE valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.namespace - name: POD_IP valueFrom: fieldRef: apiVersion: v1 fieldPath: status.podIP - name: NODE_NAME value: version2 image: cilium/echoserver:latest imagePullPolicy: IfNotPresent name: echoserver ports: - containerPort: 8080 name: http protocol: TCP resources: {} terminationMessagePath: /dev/termination-log terminationMessagePolicy: File dnsPolicy: ClusterFirst restartPolicy: Always schedulerName: default-scheduler securityContext: {} terminationGracePeriodSeconds: 30 updateStrategy: maxSurge: 100% maxUnavailable: 0 partition: 0 type: ReCreate status: availableReplicas: 0 collisionCount: 0 currentRevision: echoserver-54bf67658f expectedUpdatedReplicas: 4 labelSelector: app=echoserver observedGeneration: 3 readyReplicas: 6 replicas: 6 updateRevision: echoserver-5464fddcfb updatedReadyReplicas: 2 updatedReplicas: 2
the log:
I0719 09:32:09.518530 1 cloneset_sync_utils.go:124] Calculate diffs for CloneSet bg/echoserver, replicas=4, partition=0, maxSurge=4, maxUnavailable=0, allPods=6, newRevisionPods=2, newRevisionActivePods=2, oldRevisionPods=4, oldRevisionActivePods=4, unavailableNewRevisionCount=2, unavailableOldRevisionCount=4, preDeletingNewRevisionCount=0, preDeletingOldRevisionCount=0, toDeleteNewRevisionCount=0, toDeleteOldRevisionCount=0, enabledPreparingUpdateAsUpdate=false, useDefaultIsPodUpdate=false. Result: {scaleUpNum:0 scaleUpNumOldRevision:0 scaleDownNum:0 scaleDownNumOldRevision:0 scaleUpLimit:0 deleteReadyLimit:0 useSurge:2 useSurgeOldRevision:0 updateNum:2 updateMaxUnavailable:2}
we can find that, the useSurge
is still 2, instead of 4 as test 1. the cloneset now still has 50% pods of new version instead of 100% as expected.
What you expected to happen:
the test 2 should has the consistent result with the test 1, ie. when changing maxSurge from 50% to 100%, controller is expected to scale 100% replicas of pods of new version, instead of not working like now.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
the "bug" is probably caused by this logic in function calculateDiffsWithExpectation
the
useSurge
is calculated from min(maxSurge, max(scaleSurge, updateSurge))
, where maxSurge is 4, scaleSurge is 0, updateSurge
is:
- in test 1, it is min(|4|, |-4|), and thus the final
useSurge
is 4 - in test 2, it is min(|4|,|-2|), and thus the final
useSurge
is 2, which caused this bug
Environment:
- Kruise version:
- Kubernetes version (use
kubectl version
): - Install details (e.g. helm install args):
- Others: