[experimental] Simple container rescheduling on node failure by vieux · Pull Request #1578 · docker-archive/classicswarm

vieux · 2016-01-04T22:08:41Z

See #1488 for details

When a node goes down, swarm will try to reschedule its containers on another machine.

Depends on moby/moby#19001 for IP stability.
Depends on #1569 for proper cleanup of old containers.
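To illustrate the behavior described above, here is a minimal Go sketch of the placement step only. The `Node` and `Container` types and the `rescheduleFrom` function are hypothetical, not Swarm's actual code, and choosing the healthy node with the fewest containers is an assumed spread-style heuristic. Actually recreating the containers, keeping their IPs stable, and cleaning up the old ones (the concerns of the dependencies listed above) are out of scope.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Node is a hypothetical, minimal view of a cluster machine.
type Node struct {
	Name    string
	Healthy bool
}

// Container is a hypothetical record of where a container runs.
type Container struct {
	ID   string
	Node string
}

// ErrNoHealthyNode is returned when no healthy node can take the workload.
var ErrNoHealthyNode = errors.New("no healthy node available")

// rescheduleFrom moves every container that ran on the failed node to the
// healthy node currently running the fewest containers (an assumed spread
// heuristic). It returns the updated placements and leaves the input intact.
func rescheduleFrom(failed string, containers []Container, nodes []Node) ([]Container, error) {
	load := map[string]int{}
	var candidates []string
	for _, n := range nodes {
		if n.Healthy && n.Name != failed {
			candidates = append(candidates, n.Name)
			load[n.Name] = 0
		}
	}
	sort.Strings(candidates) // deterministic tie-breaking

	// Count the containers already running on each healthy node.
	for _, c := range containers {
		if _, ok := load[c.Node]; ok {
			load[c.Node]++
		}
	}

	out := make([]Container, len(containers))
	copy(out, containers)
	for i, c := range out {
		if c.Node != failed {
			continue
		}
		if len(candidates) == 0 {
			return nil, ErrNoHealthyNode
		}
		best := candidates[0]
		for _, name := range candidates[1:] {
			if load[name] < load[best] {
				best = name
			}
		}
		out[i].Node = best
		load[best]++
	}
	return out, nil
}

func main() {
	nodes := []Node{{"node-1", false}, {"node-2", true}, {"node-3", true}}
	containers := []Container{{"web", "node-1"}, {"db", "node-1"}, {"cache", "node-2"}}
	moved, err := rescheduleFrom("node-1", containers, nodes)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range moved {
		fmt.Printf("%s -> %s\n", c.ID, c.Node)
	}
}
```

With `node-1` down, `web` goes to the empty `node-3`, after which both healthy nodes hold one container and the tie goes to `node-2` for `db`.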

dongluochen · 2016-01-07T01:30:07Z

cluster/config.go

nit: update comment.

dongluochen · 2016-01-08T22:23:15Z

cluster/watchdog.go

Is there any reason this triggers a whole-cluster reschedule instead of just this engine? There could be a lot of events in a big cluster. Isn't checking just this container enough?
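A minimal Go sketch of the trade-off raised in this review comment, assuming a hypothetical per-engine index of containers: `toReschedule` reacts only to the engine named in the event, while `fullScan` visits every engine. The `Event`, `EventKind`, `EngineDown`, and `Cluster` names are illustrative assumptions, not Swarm's API.

```go
package main

import "fmt"

// EventKind is a hypothetical classification of cluster events.
type EventKind int

const (
	EngineUp EventKind = iota
	EngineDown
)

// Event is a hypothetical cluster event naming the engine it concerns.
type Event struct {
	Engine string
	Kind   EventKind
}

// Cluster keeps a hypothetical index from engine name to container IDs.
type Cluster struct {
	containersByEngine map[string][]string
}

// toReschedule inspects only the engine named in the event, so its cost
// grows with that engine's containers rather than with the cluster size.
// The result is a copy so callers cannot mutate the index.
func (c *Cluster) toReschedule(e Event) []string {
	if e.Kind != EngineDown {
		return nil
	}
	return append([]string(nil), c.containersByEngine[e.Engine]...)
}

// fullScan is the cluster-wide alternative: it visits every engine and
// collects the containers of all engines currently marked down.
func (c *Cluster) fullScan(down map[string]bool) []string {
	var out []string
	for engine, ids := range c.containersByEngine {
		if down[engine] {
			out = append(out, ids...)
		}
	}
	return out
}

func main() {
	c := &Cluster{containersByEngine: map[string][]string{
		"node-1": {"web", "db"},
		"node-2": {"cache"},
	}}
	fmt.Println(c.toReschedule(Event{Engine: "node-1", Kind: EngineDown})) // [web db]
	fmt.Println(c.fullScan(map[string]bool{"node-1": true}))              // [web db]
}
```

The engine-scoped lookup relies on every failure producing an event, whereas a full scan also catches anything a missed event would leave behind, at a cost proportional to the cluster size.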

mavenugo · 2016-01-09T21:45:36Z

@vieux we are working on the forced cleanup fix in moby/libnetwork#862.
You can use my private docker branch (https://github.com/mavenugo/docker/tree/epcleanup), where this is fully integrated with the docker engine API, which swarm can use to reschedule containers while reclaiming the locked resources. Would you like to give it a try in parallel while we make progress on getting these PRs merged across multiple repos?


moxiegirl · 2016-01-12T02:12:10Z

experimental/README.md~

@vieux This looks like it shouldn't be here; it is a backup file.


moxiegirl · 2016-01-12T13:57:09Z

LGTM

dongluochen · 2016-01-12T22:33:52Z

LGTM.

abronan · 2016-01-12T23:00:19Z

LGTM


vieux added the area/API label Jan 4, 2016

vieux self-assigned this Jan 4, 2016

vieux added this to the 1.1.0 milestone Jan 4, 2016

GordonTheTurtle added the status/0-triage label Jan 4, 2016

vieux force-pushed the rescheduling branch from 4f796f6 to 2dffe17 on January 4, 2016 23:15

vieux added status/1-design-review priority/P2 and removed status/0-triage labels Jan 4, 2016

dongluochen reviewed Jan 7, 2016

vieux force-pushed the rescheduling branch 2 times, most recently from ea7afd3 to b72d27f on January 8, 2016 22:17

dongluochen reviewed Jan 8, 2016

vieux force-pushed the rescheduling branch 2 times, most recently from 9352993 to 0220f36 on January 8, 2016 23:02

mavenugo mentioned this pull request Jan 9, 2016

Force endpoint delete moby/libnetwork#862 (Merged)

mavenugo mentioned this pull request Jan 9, 2016

an option to disconnect an endpoint from a network forcefully docker-archive-public/docker.engine-api#29 (Merged)

vieux force-pushed the rescheduling branch from 0220f36 to c316586 on January 11, 2016 19:44

aluzzardi and others added 3 commits January 11, 2016 15:59

cluster: Support multiple event handlers.

56941d0

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>

Add support for container rescheduling on node failure.

13f6021

Add rescheduling integration tests.

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>

add doc

78008f4

fix tests and keep swarm id

remove duplicate on node reconnect

explicit failure

Signed-off-by: Victor Vieux <vieux@docker.com>

vieux force-pushed the rescheduling branch from c316586 to 78008f4 on January 11, 2016 23:59

improve eventHandlers locking

a2018c1

Signed-off-by: Victor Vieux <vieux@docker.com>
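The commits "cluster: Support multiple event handlers." and "improve eventHandlers locking" point at a fan-out of cluster events to several handlers. Below is a minimal Go sketch of that general pattern, assuming a `sync.RWMutex`-guarded slice. The name `eventHandlers` is borrowed from the commit title, but its fields and methods, along with `EventHandler`, `Event`, and `recorder`, are hypothetical; this is not the code from those commits.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical cluster event.
type Event struct{ Name string }

// EventHandler is a hypothetical interface for anything reacting to events.
type EventHandler interface {
	Handle(e *Event) error
}

// eventHandlers fans each event out to every registered handler. The
// RWMutex lets many emitters deliver concurrently (read lock) while
// registration takes the exclusive write lock.
type eventHandlers struct {
	mu       sync.RWMutex
	handlers []EventHandler
}

// Register adds a handler under the write lock.
func (eh *eventHandlers) Register(h EventHandler) {
	eh.mu.Lock()
	defer eh.mu.Unlock()
	eh.handlers = append(eh.handlers, h)
}

// Handle delivers e to every handler and returns the first error, if any;
// a failing handler does not stop delivery to the others.
func (eh *eventHandlers) Handle(e *Event) error {
	eh.mu.RLock()
	defer eh.mu.RUnlock()
	var first error
	for _, h := range eh.handlers {
		if err := h.Handle(e); err != nil && first == nil {
			first = err
		}
	}
	return first
}

// recorder is a toy handler that remembers the names of events it saw.
type recorder struct {
	mu   sync.Mutex
	seen []string
}

func (r *recorder) Handle(e *Event) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.seen = append(r.seen, e.Name)
	return nil
}

func main() {
	var eh eventHandlers
	a, b := &recorder{}, &recorder{}
	eh.Register(a)
	eh.Register(b)
	if err := eh.Handle(&Event{Name: "node-1 down"}); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(a.seen, b.seen) // [node-1 down] [node-1 down]
}
```

Holding the read lock while calling handlers keeps the list stable during delivery, but a handler that calls `Register` from inside `Handle` would deadlock; copying the slice under the lock and invoking handlers after releasing it avoids that.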

vieux changed the title ~~[experimental] [WIP] Container Rescheduling on node failure~~ [experimental] Simple container rescheduling on node failure Jan 12, 2016

moxiegirl reviewed Jan 12, 2016

move doc to experimental/

74dfe8b

Signed-off-by: Victor Vieux <vieux@docker.com>

vieux force-pushed the rescheduling branch from 2c735bb to 74dfe8b on January 12, 2016 02:16

abronan added a commit that referenced this pull request Jan 12, 2016

Merge pull request #1578 from aluzzardi/rescheduling

e121338

[experimental] Simple container rescheduling on node failure

abronan merged commit e121338 into docker-archive:master Jan 12, 2016

vieux deleted the rescheduling branch January 12, 2016 23:00

abronan mentioned this pull request Jan 12, 2016

Proposal Node failover #755

Closed

ChristianKniep pushed a commit to ChristianKniep/swarm that referenced this pull request Jul 27, 2017

Merge pull request docker-archive#1578 from aluzzardi/rescheduling

15a3e1e

[experimental] Simple container rescheduling on node failure

abronan merged 5 commits into docker-archive:master from aluzzardi:rescheduling


8 participants
