FIX-699: Flush all the traces during stopping stage #2515
mghildiy wants to merge 3 commits into grafana:main from
This looks like a good start to me. Thanks for putting this up. Our CI has been struggling a little, so I kicked the failed jobs again.
The concern with this PR is that the flush queues may be empty even when there is still work to do, as detailed here: I see there are more comments in that issue about a potential approach. I will review when I get a chance.
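To make the concern concrete, here is a toy Go model. It is hypothetical and is not Tempo's code: `toyIngester`, `naiveDrained`, and `fullyDrained` are made-up names. It assumes that traces first sit in memory and only reach the flush queue once a block is cut. In that case, a shutdown check that only asks "is the flush queue empty?" reports success while data is still pending.

```go
package main

import "fmt"

// toyIngester is a hypothetical model, not Tempo's ingester. Traces accumulate
// in memory and are only pushed onto the flush queue once a block is cut.
type toyIngester struct {
	liveTraces int   // traces held in memory, not yet cut into a block
	flushQueue []int // blocks waiting to be flushed to the backend
}

// naiveDrained reports "done" as soon as the flush queue is empty.
func (t *toyIngester) naiveDrained() bool {
	return len(t.flushQueue) == 0
}

// fullyDrained also accounts for traces that have not been enqueued yet.
func (t *toyIngester) fullyDrained() bool {
	return len(t.flushQueue) == 0 && t.liveTraces == 0
}

func main() {
	// Nothing is queued yet, but 3 traces still live in memory.
	ing := &toyIngester{liveTraces: 3}
	fmt.Println("naive check says drained:", ing.naiveDrained()) // true, yet 3 traces would be lost
	fmt.Println("full check says drained:", ing.fullyDrained())  // false
}
```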
joe-elliott left a comment
I finally had time to read your comment here: #699 (comment)
After digging through the code, I agree with your analysis. Nice work on the deep dive. Let's do some cleanup and then this is good to merge.
- Since the added lines are the same as what's in the ShutdownHandler, let's move them into a new function and use it in both places. Please add a comment explaining why it works, because it's a little unintuitive.
- Adding a test for this new method would be nice. I'm not sure how feasible that is.
- Add a changelog entry
- Document the new field in the ingester docs here: https://github.com/grafana/tempo/blob/main/docs/sources/tempo/configuration/_index.md#ingester
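The suggested refactor could look roughly like the following dependency-free Go sketch. Everything here is hypothetical: `ingester`, `flushRemaining`, `shutdownHandler`, and `stopping` are stand-ins rather than Tempo's real types or methods, and flushing is done synchronously instead of by background flush workers. The "why it works" comment is an assumption based on the flush-queue concern raised earlier in the thread: force-cutting all in-memory traces first means an empty queue afterwards really means no work is left.

```go
package main

import "fmt"

// ingester is a hypothetical, dependency-free stand-in. None of these names
// are Tempo's real API; they only illustrate the suggested refactor.
type ingester struct {
	flushAllOnShutdown bool     // the new config field from this PR
	unregister         bool     // whether tokens are removed from the ring
	liveTraces         int      // traces still held in memory
	queue              []string // blocks waiting to be flushed
	backend            []string // blocks already written to storage
}

// flushRemaining is the shared helper the review asks for (name is made up).
// Why it works (assumed): cutting every live trace into a block first means
// nothing is hidden from the flush queue, so an empty queue afterwards means
// "no work left" rather than "no work enqueued yet".
func (i *ingester) flushRemaining() {
	for ; i.liveTraces > 0; i.liveTraces-- {
		i.queue = append(i.queue, fmt.Sprintf("block-%d", i.liveTraces))
	}
	// Drain the queue. A real ingester flushes asynchronously and would wait here.
	for len(i.queue) > 0 {
		i.backend = append(i.backend, i.queue[0])
		i.queue = i.queue[1:]
	}
}

// shutdownHandler keeps its existing behavior: always leave the ring, always flush.
func (i *ingester) shutdownHandler() {
	i.unregister = true
	i.flushRemaining()
}

// stopping reuses the same helper, but only when the new flag is enabled.
func (i *ingester) stopping() {
	if i.flushAllOnShutdown {
		i.flushRemaining()
	}
}

func main() {
	i := &ingester{flushAllOnShutdown: true, liveTraces: 2}
	i.stopping()
	fmt.Println(len(i.backend), i.unregister) // 2 false
}
```

Keeping the cut-then-drain logic in one helper also gives the requested test a single target: it can be exercised directly, independently of either shutdown path.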
```go
i.markUnavailable()

if i.cfg.FlushAllOnShutdown {
	i.lifecycler.SetUnregisterOnShutdown(true)
	// ...
}
```
I don't think we should set this. Allow this value to be configured independently of flush_all_on_shutdown.
But I see it being done in flush.ShutDownHandler. Shouldn't stopping have the same behaviour?
The shutdown handler is called when we want to manually scale down the ingester StatefulSet. We call this endpoint to force-flush all data from an ingester and remove all of its tokens from the ring. Setting UnregisterOnShutdown to true is what makes the ingester remove its tokens from the ring.
https://grafana.com/docs/tempo/latest/api_docs/#shutdown
With this change, an operator may want an ingester to shut down and completely flush all of its traces to the backend while leaving its tokens in the ring. We sometimes leave tokens in the ring during shutdown because it reduces ring churn during a rollout of the ingesters.
So would ShutDownHandler now also flush only if this new flag is true?
No. Shutdown should continue to do exactly what it does today. It should not require a flag to be set.
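The behavior agreed on in this thread can be modeled as two independent settings. This is a hypothetical Go sketch: `config`, `outcome`, `manualShutdown`, and `normalStop` are made-up names, and the yaml tags only mirror `flush_all_on_shutdown` from this PR and the lifecycler's `unregister_on_shutdown` setting (assumed here). The manual `/shutdown` path ignores both settings, while the stopping stage honors each one separately, so "flush everything but keep tokens" remains possible during a rollout.

```go
package main

import "fmt"

// config mirrors the two settings discussed in the review. The yaml tags are
// illustrative assumptions, not copied from Tempo's source.
type config struct {
	FlushAllOnShutdown   bool `yaml:"flush_all_on_shutdown"`
	UnregisterOnShutdown bool `yaml:"unregister_on_shutdown"`
}

// outcome describes what an ingester does with its data and its ring tokens.
type outcome struct {
	FlushAll    bool // flush every trace to the backend before exiting
	LeaveTokens bool // keep tokens in the ring (less churn during a rollout)
}

// manualShutdown models the /shutdown endpoint: no flag is required, it
// always flushes everything and removes the ingester's tokens.
func manualShutdown(_ config) outcome {
	return outcome{FlushAll: true, LeaveTokens: false}
}

// normalStop models the stopping stage: each setting is honored on its own.
func normalStop(c config) outcome {
	return outcome{FlushAll: c.FlushAllOnShutdown, LeaveTokens: !c.UnregisterOnShutdown}
}

func main() {
	rollout := config{FlushAllOnShutdown: true, UnregisterOnShutdown: false}
	fmt.Printf("stop during rollout: %+v\n", normalStop(rollout))     // {FlushAll:true LeaveTokens:true}
	fmt.Printf("manual /shutdown:    %+v\n", manualShutdown(rollout)) // {FlushAll:true LeaveTokens:false}
}
```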
I made the changes and ran the usual commands, but I see many unexpected files changed. I also get `make: gotestsum: No such file or directory`. PS: It seems macOS updates have disturbed things; I see "segmentation fault 11".
I'm unsure what caused the large changes. They should definitely not be present.
We recently switched to gotestsum for better test output in CI. It should be installed automatically if you run …
knylander-grafana left a comment
Thank you for adding the docs.
A better PR is here.
Closing in favor of #2538.
[Ingester] Add ability to completely flush on shutdown (WIP)
Which issue(s) this PR fixes:
Fixes #699
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]