Skip to content

Use BasicLifecycler for distributors and auto-forget#2154

Merged
56quarters merged 4 commits intomainfrom
56quarters/distributor-lifecycle
Jun 24, 2022
Merged

Use BasicLifecycler for distributors and auto-forget#2154
56quarters merged 4 commits intomainfrom
56quarters/distributor-lifecycle

Conversation

@56quarters
Copy link
Copy Markdown
Contributor

@56quarters 56quarters commented Jun 21, 2022

What this PR does

Use the BasicLifecycler in distributors for managing their lifecycle so
that we can take advantage of the "auto-forget" delegates feature. This
prevents the ring from filling up with "unhealthy" distributors that are
never removed. This wasn't a bug but it was confusing for users and
operators.

Which issue(s) this PR fixes or relates to

Fixes #2138

Signed-off-by: Nick Pillitteri nick.pillitteri@grafana.com

Checklist

  • Tests updated
  • [na] Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Comment thread pkg/distributor/distributor.go Outdated
@56quarters 56quarters force-pushed the 56quarters/distributor-lifecycle branch 2 times, most recently from e461b3e to 2a5c10c Compare June 21, 2022 14:13
@56quarters 56quarters marked this pull request as ready for review June 21, 2022 14:13
@56quarters 56quarters requested a review from pstibrany June 21, 2022 18:11
@56quarters 56quarters force-pushed the 56quarters/distributor-lifecycle branch 3 times, most recently from 100669f to a096530 Compare June 22, 2022 15:56
@56quarters 56quarters requested a review from colega June 22, 2022 16:00
@56quarters 56quarters marked this pull request as draft June 22, 2022 21:32
@56quarters 56quarters force-pushed the 56quarters/distributor-lifecycle branch from c05e273 to 83c065d Compare June 23, 2022 00:03
Comment thread pkg/distributor/distributor_ring.go
Comment thread pkg/distributor/distributor.go
Comment thread pkg/distributor/distributor.go Outdated
@colega
Copy link
Copy Markdown
Contributor

colega commented Jun 23, 2022

LGTM, left some nitpicks, waiting to be un-drafted.

@krajorama
Copy link
Copy Markdown
Contributor

Hi, let me know if there's anything useful here: #684 , I'd like to close that old PR of mine - I never had time to finish it, plus lifecycler had some feature that I didn't know how to emulate.

Comment thread pkg/distributor/distributor.go Outdated
Comment thread pkg/distributor/distributor.go Outdated
Comment thread pkg/distributor/distributor.go Outdated
Comment thread pkg/distributor/instance_count.go Outdated
Comment thread pkg/distributor/distributor_ring.go Outdated
@56quarters 56quarters marked this pull request as ready for review June 23, 2022 15:35
@56quarters
Copy link
Copy Markdown
Contributor Author

Hi, let me know if there's anything useful here: #684 , I'd like to close that old PR of mine - I never had time to finish it, plus lifecycler had some feature that I didn't know how to emulate.

Definitely some useful parts!

I see some similarity with needing a number of healthy ring members:

  • Your PR adds a healthy instances + zone count to the BasicLifecycler.
  • This PR uses a BasicLifecyclerDelegate implementation to count healthy instances.

I'm not sure what the "MinReadyDuration" feature would look like as a delegate.

@56quarters 56quarters requested review from colega and pstibrany June 23, 2022 16:27
Copy link
Copy Markdown
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Nice work.

Comment thread pkg/distributor/instance_count_test.go Outdated
Use the BasicLifecycler in distributors for managing their lifecycle so
that we can take advantage of the "auto-forget" delegates feature. This
prevents the ring from filling up with "unhealthy" distributors that are
never removed. This wasn't a bug but it was confusing for users and
operators.

Fixes #2138

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
@56quarters 56quarters force-pushed the 56quarters/distributor-lifecycle branch from 315bc07 to ed3350f Compare June 24, 2022 15:15
Comment thread pkg/distributor/instance_count_test.go Outdated
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
@56quarters 56quarters merged commit 7e4accd into main Jun 24, 2022
@56quarters 56quarters deleted the 56quarters/distributor-lifecycle branch June 24, 2022 17:08
rlex added a commit to rlex/mimir that referenced this pull request Jun 28, 2022
* main: (63 commits)
  Add new section on website for links to blog posts, podcasts and talks. (grafana#2216)
  Rename codified errors to errors catalog (grafana#2256)
  Helm: add a step to contributing doc (grafana#2257)
  Signal that 2.2 release is now in progress. (grafana#2254)
  Removed migration of alertmanager local state files from old hierarchy (Cortex 1.8 and earlier) (grafana#2253)
  operations/mimir: Change multi_zone_ingester_max_unavailable to 25 (grafana#2251)
  Helm: weekly release (grafana#2252)
  Jsonnet: Configure ingester max global metadata per user and per metric (grafana#2250)
  Helm: metamonitor naming (grafana#2236)
  Mimir documentation about out-of-order (grafana#2183)
  Vendor latest mimir-prometheus/main (grafana#2243)
  Set CODEOWNERS to primary technical writer (grafana#2242)
  Use BasicLifecycler for distributors and auto-forget (grafana#2154)
  Docs: Basic documentation for deploying the ruler using jsonnet. (grafana#2127)
  Fix post merge reviews on 2187 (grafana#2230)
  Add tests for user metadata in the ingester (grafana#2184)
  Change the error message template for per-tenant limits (grafana#2234)
  helm: meta-monitoring (grafana#2068)
  Article about migrating from Consul to memberlist. Added documentation for /memberlist endpoint. (grafana#2166)
  Update runbooks to mention possibility to investigate memberlist KV store in various alerts (grafana#2158)
  ...
masonmei pushed a commit to udmire/mimir that referenced this pull request Jul 11, 2022
Use the BasicLifecycler in distributors for managing their lifecycle so
that we can take advantage of the "auto-forget" delegates feature. This
prevents the ring from filling up with "unhealthy" distributors that are
never removed. This wasn't a bug but it was confusing for users and
operators.

Fixes grafana#2138

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

// ringAutoForgetUnhealthyPeriods is how many consecutive timeout periods an unhealthy instance
// in the ring will be automatically removed after.
ringAutoForgetUnhealthyPeriods = 10
Copy link
Copy Markdown
Collaborator

@pracucci pracucci Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very high. I would be more aggressive with distributors, like 2 should be enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate distributors hash ring to BasicLifecycler and add auto-forget feature

5 participants