Add support for parallel node updates within a statefulset #283
Conversation
This commit allows multiple nodes to be updated and deployed at the same time within a statefulset. Opting into this behavior requires the user to update the ClusterSpec to set OnDeleteUpdateStrategy to true. When this is set, the operator will watch for the operator.m3db.io/parallel-update annotation on statefulsets. When found, the operator will attempt to update each statefulset, rolling N pods at a time, where N is the value of the parallel-update annotation.
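As a rough sketch of the "rolling N pods at a time" behavior (illustrative only; the operator's actual pod-selection logic may differ, and the revision label used here is the standard statefulset pod label rather than anything specific to this PR):

package sketch

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// podsToRoll picks at most maxParallel pods that are not yet on the
// statefulset's update revision, using the standard controller-revision-hash
// pod label to detect stale pods.
func podsToRoll(sts *appsv1.StatefulSet, pods []corev1.Pod, maxParallel int) []corev1.Pod {
	var stale []corev1.Pod
	for _, pod := range pods {
		if pod.Labels["controller-revision-hash"] != sts.Status.UpdateRevision {
			stale = append(stale, pod)
		}
	}
	if len(stale) > maxParallel {
		stale = stale[:maxParallel]
	}
	return stale
}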
mu.Unlock()

if seen != len(opts.expectedStatefulSets) {
	time.Sleep(100 * time.Millisecond)
	continue
This continue and break logic actually caused the test method to exit before the statefulset updates were finished, so I just removed it and rely on the iterations limit instead.
generally LG
pkg/controller/controller.go
Outdated
zap.Int64("generation", expected.Generation), | ||
zap.Int64("observed", expected.Status.ObservedGeneration), | ||
) | ||
} else if _, ok := actual.Annotations[annotations.ParallelUpdateInProgress]; !ok { |
It seems like this method is doing a lot with several different flows. Is there any way we can break this up?
It seems updating the sts is the same between the two different rollout strategies, so that's probably something we could break out.
Yeah, simplified this to make it a little more clear.
LGTM! Just had one edge question on how the StatefulSet controller handles changes to a set's update strategy.
pkg/controller/controller.go
Outdated
	expected *appsv1.StatefulSet,
	update bool,
) (bool, error) {
	if !update {
nit: would it make sense to move the logic for checking the update bool to where we call updateWithRollingUpdateStrategy or updateWithOnDeleteStrategy? Seems if update is false, both functions become no-ops, so just wondering if it would be simpler to only call them when update is true.
We actually still want to call updateWithOnDeleteStrategy even if we're not updating the statefulset, as we may still need to roll nodes in the set. Simplified the logic to make this a little more clear.
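A rough sketch of the resulting dispatch, with stand-in types and hypothetical signatures (the real methods in controller.go may look different):

package sketch

import appsv1 "k8s.io/api/apps/v1"

// Stand-ins for the real controller and cluster spec types.
type clusterSpec struct{ OnDeleteUpdateStrategy bool }
type controller struct{}

func (c *controller) updateWithOnDeleteStrategy(actual, expected *appsv1.StatefulSet, update bool) (bool, error) {
	// ... may delete a batch of pods even when update == false ...
	return false, nil
}

func (c *controller) updateWithRollingUpdateStrategy(expected *appsv1.StatefulSet) (bool, error) {
	// ... push the new spec and let k8s roll the pods ...
	return false, nil
}

// applyUpdateStrategy sketches the flow described above: the OnDelete path
// always runs because pods may still need rolling even without a spec change,
// while the RollingUpdate path only acts when the spec changed.
func (c *controller) applyUpdateStrategy(
	spec clusterSpec,
	actual, expected *appsv1.StatefulSet,
	update bool,
) (bool, error) {
	if spec.OnDeleteUpdateStrategy {
		return c.updateWithOnDeleteStrategy(actual, expected, update)
	}
	if !update {
		return false, nil
	}
	return c.updateWithRollingUpdateStrategy(expected)
}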
pkg/controller/controller.go
Outdated
// The expectation is that these update methods will return false when they have updated
// actual pods in the statefulset. This will stop any further processing of pods until
// the updated ones are back healthy
if cluster.Spec.OnDeleteUpdateStrategy {
Do we know, if we were to update a StatefulSet's template and its updateStrategy, whether the controller will use the new updateStrategy to update the set or the old one? Just wondering about a potential edge case where we update both, and the StatefulSet controller uses the old strategy for the update, but because we're checking cluster.Spec.OnDeleteUpdateStrategy here we expect the new one to be used. I would think the controller would use the new strategy that's in the update, but Kube has surprised me before 😄
Yeah, so I just tested this out and it does respect the new updateStrategy. So when you change the template and set OnDelete, k8s (correctly, imo) waits for you to roll the statefulset pods to pick up the update.
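If extra defensiveness were ever wanted, one option (a sketch only, not something this PR necessarily does) is to key off the update strategy on the statefulset spec itself in addition to the cluster spec flag:

package sketch

import appsv1 "k8s.io/api/apps/v1"

// usesOnDelete reports whether the statefulset spec being applied is set to
// the OnDelete update strategy.
func usesOnDelete(sts *appsv1.StatefulSet) bool {
	return sts.Spec.UpdateStrategy.Type == appsv1.OnDeleteStatefulSetStrategyType
}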
Codecov Report
@@            Coverage Diff             @@
##           master     #283      +/-   ##
==========================================
+ Coverage   75.99%   76.09%   +0.10%
==========================================
  Files          32       32
  Lines        2416     2548     +132
==========================================
+ Hits         1836     1939     +103
+ Misses        432      430       -2
- Partials      148      179      +31

Continue to review full report at Codecov.
pkg/controller/controller.go
Outdated
	c.logger.Error(err.Error())
	return err
}
processNext = false
Rather than have processNext = false here and continue this loop, can we just return nil so that we exit + re-trigger the entire reconciliation loop? That way we'll run all of our safety checks again.
AFAICT this would also match the old post-update behavior.
lol, yes, that's much simpler. Will update.
Actually, wait. Only if the updateStrategy is rolling update can we do that. Still probably nice to bail early if using rolling update, so will modify for that.
OK, pushed a change to clarify this logic even more. It's worth noting that none of the semantics around this loop have changed here. In both the RollingUpdate and OnDelete cases, if changes are made to a statefulset, the operator exits this loop and relies on event updates (i.e. pod ready) to continue updating.
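A sketch of those semantics with illustrative names (not the actual controller code): the loop stops as soon as any pods are changed and relies on later events to re-enter reconciliation with all safety checks.

package sketch

import appsv1 "k8s.io/api/apps/v1"

// processStatefulSet is a stand-in for the per-set update logic; it reports
// whether it changed any pods.
func processStatefulSet(sts *appsv1.StatefulSet) (bool, error) {
	return false, nil
}

// reconcileSets exits as soon as a set's pods are changed (spec pushed for
// RollingUpdate, or a batch of pods deleted for OnDelete) so the next event,
// e.g. pod ready, re-triggers reconciliation from the top.
func reconcileSets(sets []*appsv1.StatefulSet) error {
	for _, sts := range sets {
		changed, err := processStatefulSet(sts)
		if err != nil {
			return err
		}
		if changed {
			return nil
		}
	}
	return nil
}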
pkg/controller/controller.go
Outdated
	}
	names = append(names, pod.Name)
}
logger.Info("restarting pods", zap.Any("pods", names))
Nit: you can use zap.Strings here since names is a []string.
Nice, TIL!
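For reference, a minimal example of the suggested typed field constructor (zap.Strings takes a []string directly):

package sketch

import "go.uber.org/zap"

// logRestartingPods logs the pod names with the typed []string field.
func logRestartingPods(logger *zap.Logger, names []string) {
	logger.Info("restarting pods", zap.Strings("pods", names))
}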
pkg/controller/controller.go
Outdated
// NB(nate): K8s handles this for you when using the RollingUpdate update strategy.
// However, since OnDelete removes k8s from the pod update process, it's our
// responsibility to set this once done rolling out.
set.Status.CurrentReplicas = set.Status.UpdatedReplicas
set.Status.CurrentRevision = set.Status.UpdateRevision
This is surprising / also concerning to have to reconcile k8s' status for it. Are there docs somewhere on this?
If we have to do this, I would also prefer we use Update here rather than a patch. That way, if k8s has modified the statefulset in between when we decided to change this field and when we made the patch, we'll get a conflict and have to re-reconcile.
Yeah, it's very unfortunate. I pieced it together from a bug report (kubernetes/kubernetes#73492) and looking at the update code (https://github.com/kubernetes/kubernetes/blob/053e7f4b0430e1f3814bb16bdb506cd59bb26593/pkg/controller/statefulset/stateful_set_utils.go#L377). But definitely let me know if I'm off base here.
Also, was looking closer at the StatefulSetInterface and saw this method:
UpdateStatus(*v1.StatefulSet) (*v1.StatefulSet, error)
What are your thoughts on this change then?
sts.Status.CurrentReplicas = sts.Status.UpdatedReplicas
sts.Status.CurrentRevision = sts.Status.UpdateRevision
if sts, err = c.kubeClient.AppsV1().StatefulSets(cluster.Namespace).UpdateStatus(sts); err != nil {
	return false, err
}

if err = c.patchStatefulSet(sts, func(set *appsv1.StatefulSet) {
	delete(set.Annotations, annotations.ParallelUpdateInProgress)
}); err != nil {
	return false, err
}
Would that also generate a conflict since it's no longer patching for the revision update? Also, seems safe to patch to remove the annotation, but correct me if I'm wrong.
This SGTM 👍 .
pkg/controller/controller.go
Outdated
	return false, nil
}

func (c *M3DBController) getMaxPodsToUpdate(actual *appsv1.StatefulSet) (int, error) {
Can this be a free function rather than a method on *M3DBController?
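A sketch of what the free-function form might look like, assuming the annotation value is a plain integer (the PR's actual parsing, defaults, and error handling may differ):

package sketch

import (
	"fmt"
	"strconv"

	appsv1 "k8s.io/api/apps/v1"
)

const parallelUpdateAnnotation = "operator.m3db.io/parallel-update"

// getMaxPodsToUpdate reads the parallel-update annotation and returns how many
// pods may be rolled at once. It needs nothing from the controller, so it can
// be a free function.
func getMaxPodsToUpdate(actual *appsv1.StatefulSet) (int, error) {
	v, ok := actual.Annotations[parallelUpdateAnnotation]
	if !ok {
		return 0, nil
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return 0, fmt.Errorf("parsing %s annotation: %w", parallelUpdateAnnotation, err)
	}
	return n, nil
}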
LGTM. Very nice! The changes make this easy to follow.
thanks for iterating. LG!
pkg/controller/controller.go
Outdated
}

// updateStatefulSetNodes returns true if it updates any pods
func (c *M3DBController) updateStatefulSetNodes(
nit: it would be nice to be consistent with nodes vs pods in the terminology.
I lean towards pods since this is k8s code.
* master:
  Backwards compatibility when using the original update annoation with an OnDelete update strategy (#284)
  Add support for parallel node updates within a statefulset (#283)
  Support namespace ExtendedOptions in cluster spec (#282)
  [controller] Support multi instance placement add (#275)
  [gomod] Update M3DB dependency (#277)
  [cmd] Fix instrument package name (#280)

# Conflicts:
#	pkg/k8sops/annotations/annotations.go