Improve node management. #1569
1. Introduce a pending state. Pending nodes must be validated before moving to the healthy state. This resolves the duplicate ID and dead node drop issues.
2. Expose the error and last update time in docker info.
3. Use connect success/failure to drive the state transition between healthy and unhealthy.

Signed-off-by: Dong Chen <dongluo.chen@docker.com>
|
Design LGTM. Will review the code and test shortly. ping @vieux for design approval. @dongluochen This might need a set of integration tests (or improvements to the existing ones) to validate that the state machine works and that a node transitions correctly from one state to another. At least for |
|
@abronan thanks Alex! I'll update test cases soon. |
|
@dongluochen small comment, but could we please put |
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
|
@vieux I updated no error to |
…l. Each pending node has its own validation interval according to its failure count, so reducing the sleep interval does not increase the validation frequency for unreachable nodes. Signed-off-by: Dong Chen <dongluo.chen@docker.com>
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
|
LGTM |
|
ping @docker/swarm-maintainers. |
cluster/engine.go
Outdated
@dongluochen This is just for convenience but can we add a TODO and reference to the issue (#1508) here? Because the comment is there anyway :)
|
Tested and seems to work as expected, just very small nits but LGTM |
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
|
Thanks @abronan! I've updated code accordingly. |
|
Sweet! Thanks @dongluochen! |
|
🎉 |
Improve node management.
This change improves swarm node management according to proposal #1486.
Signed-off-by: Dong Chen <dongluo.chen@docker.com>