Skip to content

FIX-699: Flush all the traces during stopping stage#2515

Closed
mghildiy wants to merge 3 commits intografana:mainfrom
mghildiy:fix699
Closed

FIX-699: Flush all the traces during stopping stage#2515
mghildiy wants to merge 3 commits intografana:mainfrom
mghildiy:fix699

Conversation

@mghildiy
Copy link
Copy Markdown
Contributor

@mghildiy mghildiy commented May 30, 2023

[Ingester] Add ability to completely flush on shutdown(WIP)

Which issue(s) this PR fixes:
Fixes #699

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@zalegrala
Copy link
Copy Markdown
Contributor

This looks like a good start to me. Thanks for putting this up. Our CI has been struggling a little, so I kicked the failed jobs again.

@joe-elliott
Copy link
Copy Markdown
Collaborator

So the concern with this PR is that there is a chance that flushqueues will be empty even if there is still work to do as detailed here:

#699 (comment)

I see there are more comments in that issue about a potential approach. I will review when I get a chance.

Copy link
Copy Markdown
Collaborator

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally had time to read your comment here: #699 (comment)

After digging through the code I agree with your analysis. Nice work deep diving this. Let's do some cleanup and then we're good to merge this.

Comment thread modules/ingester/ingester.go Outdated
i.markUnavailable()

if i.cfg.FlushAllOnShutdown {
i.lifecycler.SetUnregisterOnShutdown(true)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we should set this. allow this value to be configured independently of flush_all_on_shutdown.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But i see it being done in flush.ShutDownHandler. Shouldn't we have same behaviour for stopping too?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shutdown handler is called when we want to manually scale down the ingester statefulset. We call this endpoint to force flush all data from an ingester and remove all of its tokens from the ring. Setting UnregisterOnShutdown to true forces the ingester to remove all of its tokens from the ring.

https://grafana.com/docs/tempo/latest/api_docs/#shutdown

For this change it's possible to want an ingester to shutdown and completely flush all of its traces to the backend while leaving its tokens in the ring. The reason we sometimes leave tokens in the ring during shutdown is b/c it reduces ring churn while doing a rollout of the ingesters.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now ShutDownHandler too would flush only if this new flag is true?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Shutdown should continue to do exactly what it does. It should not require a flag to set set.

@mghildiy
Copy link
Copy Markdown
Contributor Author

mghildiy commented Jun 2, 2023

Made changes, ran usual commands. And I see many unexpected files changed.
Am I missing something?
Also, when I run test locally, I see error:

make: gotestsum: No such file or directory

PS: Seems mac updates have disturbed things. I see segmentation fault 11.

@joe-elliott
Copy link
Copy Markdown
Collaborator

I'm unsure what caused the large changes. They should definitely not be present.

make: gotestsum: No such file or directory

We swapped to gotestsum recently for better test output in CI. It should be automatically installed if you run make tools

Copy link
Copy Markdown
Contributor

@knylander-grafana knylander-grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding doc.

@mghildiy
Copy link
Copy Markdown
Contributor Author

mghildiy commented Jun 5, 2023

Thank you for adding doc.

Better PR is here.

@joe-elliott
Copy link
Copy Markdown
Collaborator

Closing in favor of: #2538

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Ingester] Add ability to completely flush on shutdown

4 participants