Sort traces on flush to ensure consistent payloads in the backend #606
Merged
joe-elliott merged 9 commits into grafana:master on Mar 23, 2021
…letion, and logging
…ix to metric on wal deletion errors
…nic, handle error
…less have to be combined by the compactor
joe-elliott
approved these changes
Mar 23, 2021
What this PR does:
We have noticed that with RF3 the compactors seem to be combining more traces than expected. In theory, all ingesters should receive identical payloads and flush identical bytes to the backend, so other than for long-running traces the compactors should not have to recombine much. Research shows that the assumption that ingesters flush identical bytes is incorrect. The ingesters all flushed the same data in total for a trace, but internally the batches were not in the same order. This produces differing bytes, which forces the compactor to recombine the trace.
This PR internally sorts the traces as they are flushed by the ingesters. The sort order doesn't really matter, as long as it is consistent. Right now it sorts bottom up by span start time, then span id.
Also considered possible causes for the different batch order: it is possible that `ring.DoBatch` in the distributor is not issuing the batches for a given trace in the same order to each ingester. Tried sorting `indexes` to fix this, but it did not solve the issue.
Which issue(s) this PR fixes:
n/a
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`