This repository was archived by the owner on Feb 1, 2021. It is now read-only.

[experimental] Simple container rescheduling on node failure #1578

Merged: abronan merged 5 commits into docker-archive:master from aluzzardi:rescheduling on Jan 12, 2016


@vieux vieux commented Jan 4, 2016

See #1488 for details

When a node goes down, Swarm will try to reschedule its containers on another machine.

Depends on moby/moby#19001 for IP stability.
Depends on #1569 for proper cleanup of old containers.
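
For readers following along, here is a minimal sketch of the idea described above: when a node is marked unhealthy, its containers are moved to another machine. This is not Swarm's implementation. The `Node` class, the `reschedule_failed` function, and the least-loaded placement rule are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a cluster machine; not a Swarm type.
    name: str
    healthy: bool = True
    containers: list = field(default_factory=list)


def reschedule_failed(nodes):
    """Move containers off unhealthy nodes onto the least-loaded healthy node.

    Returns a list of (container, old_node, new_node) tuples, one per move.
    Least-loaded placement is an assumption made for this sketch.
    """
    moves = []
    healthy = [n for n in nodes if n.healthy]
    for node in nodes:
        if node.healthy:
            continue
        # Iterate over a copy because the list is mutated inside the loop.
        for container in list(node.containers):
            if not healthy:
                # No machine can take the container; leave it in place.
                break
            target = min(healthy, key=lambda n: len(n.containers))
            node.containers.remove(container)
            target.containers.append(container)
            moves.append((container, node.name, target.name))
    return moves


if __name__ == "__main__":
    cluster = [Node("node-1", containers=["web"]), Node("node-2")]
    cluster[0].healthy = False  # simulate a node failure
    print(reschedule_failed(cluster))
```

The sketch ignores the concerns this PR depends on elsewhere, such as IP stability and cleanup of the old containers once the failed node reconnects.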


nit: update comment.

@vieux vieux force-pushed the rescheduling branch 2 times, most recently from ea7afd3 to b72d27f, January 8, 2016 22:17

Any reason this triggers a whole-cluster reschedule instead of one for just this engine? I think there could be a lot of events in a big cluster. Isn't checking just this container enough?

@vieux vieux force-pushed the rescheduling branch 2 times, most recently from 9352993 to 0220f36, January 8, 2016 23:02

mavenugo commented Jan 9, 2016

@vieux we are working on the forced cleanup fix in moby/libnetwork#862.
You can use my private docker branch (https://github.com/mavenugo/docker/tree/epcleanup), where this is fully integrated with the Docker Engine API, which Swarm can use to reschedule a container and reclaim the locked resources. Would you like to give it a try in parallel while we work on getting these PRs merged in multiple repos?

aluzzardi and others added 3 commits January 11, 2016 15:59
Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
Add rescheduling integration tests.

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
fix tests and keep swarm id
remove duplicate on node reconnect
explicit failure

Signed-off-by: Victor Vieux <vieux@docker.com>
Signed-off-by: Victor Vieux <vieux@docker.com>
@vieux vieux changed the title from "[experimental] [WIP] Container Rescheduling on node failure" to "[experimental] Simple container rescheduling on node failure" Jan 12, 2016

@vieux This looks like it shouldn't be here; it is a backup file.

Signed-off-by: Victor Vieux <vieux@docker.com>
@moxiegirl

LGTM

@dongluochen

LGTM.


abronan commented Jan 12, 2016

LGTM

abronan added a commit that referenced this pull request Jan 12, 2016
[experimental] Simple container rescheduling on node failure
@abronan abronan merged commit e121338 into docker-archive:master Jan 12, 2016
@vieux vieux deleted the rescheduling branch January 12, 2016 23:00
@abronan abronan mentioned this pull request Jan 12, 2016
ChristianKniep pushed a commit to ChristianKniep/swarm that referenced this pull request Jul 27, 2017
[experimental] Simple container rescheduling on node failure

8 participants