Implement polling tenants concurrently #3647
db642c7 to ee6de7c
I'm aware of a race in the tests when we check the error message after exceeding the threshold during polling. Since we now poll concurrently, we can't know which error will be the final one, so the tests need to assert that the error is one of n rather than one specific error. I'll follow up. In the meantime, feedback is welcome.

My thought for following up this PR is to modify the tenant loop so that we iterate over tenants sorted by some metric, like block size or last poll duration. This way we can start work on the tenants that require the most effort first.
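One way the "one of n" assertion could look is sketched below. This is only an illustration: the sentinel errors and the `isOneOf` helper are hypothetical names, not part of the PR, and the real tests may compare error messages instead of wrapped errors.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for per-tenant poll failures.
var (
	errTenantA = errors.New("tenant a: poll failed")
	errTenantB = errors.New("tenant b: poll failed")
)

// isOneOf reports whether err matches any of the candidate errors,
// following wrapped errors via errors.Is.
func isOneOf(err error, candidates ...error) bool {
	for _, c := range candidates {
		if errors.Is(err, c) {
			return true
		}
	}
	return false
}

func main() {
	// With concurrent polling, either tenant may be the one that trips
	// the threshold first, so the test accepts any of the known errors.
	got := fmt.Errorf("poll aborted: %w", errTenantB)
	fmt.Println(isOneOf(got, errTenantA, errTenantB))                     // true
	fmt.Println(isOneOf(errors.New("unrelated"), errTenantA, errTenantB)) // false
}
```

A test would then assert `isOneOf(err, expected...)` instead of comparing against a single expected error.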
joe-elliott left a comment
this lgtm. not approving b/c it's still marked draft
9f3c737 to e642282
I've modified the error handling to be consistent, which required keeping of a
9ad2b78 to 8a99b99
joe-elliott left a comment
looking good. one thought.
can we add the new config option to the docs?
8a99b99 to b83b0a2
cd02cba to 52fd063
This change reworks the error handling to account for the additional complexity introduced by tenant concurrency. The blocklist_poll_tolerate_consecutive_errors configuration now applies to a single tenant: the poller retries that tenant until the threshold is met. A new configuration parameter, blocklist_poll_tolerate_tenant_failures, sets the number of failing tenants that will be tolerated. This preserves parts of the old behavior, scoped to a single tenant, while also accounting for the global picture: by default, a single failing tenant will not stop the entire polling process. Tests have been updated to cover this additional logic.
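The interaction of the two thresholds can be sketched roughly as follows. This is a minimal illustration and not the PR's implementation: `pollAll`, `pollFunc`, and `errBackend` are hypothetical names, the exact comparison used for each threshold is an assumption, and this sketch waits for every tenant to finish rather than aborting early.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errBackend is a hypothetical error used by the example below.
var errBackend = errors.New("backend unavailable")

// pollFunc is a hypothetical stand-in for polling one tenant's blocklist.
type pollFunc func(tenant string) error

// pollAll polls every tenant in its own goroutine. A tenant is retried until
// it has failed more than tolerateConsecutiveErrors times in a row, at which
// point it counts as a failed tenant. The whole poll fails only when more
// than tolerateTenantFailures tenants have failed (assumed semantics).
func pollAll(tenants []string, poll pollFunc, tolerateConsecutiveErrors, tolerateTenantFailures int) error {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex // guards failed
		failed []error
	)
	for _, tenant := range tenants {
		wg.Add(1)
		go func(tenant string) {
			defer wg.Done()
			consecutive := 0
			for {
				err := poll(tenant)
				if err == nil {
					return
				}
				consecutive++
				if consecutive > tolerateConsecutiveErrors {
					mu.Lock()
					failed = append(failed, fmt.Errorf("tenant %s: %w", tenant, err))
					mu.Unlock()
					return
				}
			}
		}(tenant)
	}
	wg.Wait()

	if len(failed) > tolerateTenantFailures {
		return fmt.Errorf("too many failing tenants (%d): %w", len(failed), errors.Join(failed...))
	}
	return nil
}

func main() {
	poll := func(tenant string) error {
		if tenant == "bad" {
			return errBackend
		}
		return nil
	}
	tenants := []string{"a", "bad", "b"}

	// One tenant always fails; tolerating one failing tenant keeps the poll alive.
	fmt.Println(pollAll(tenants, poll, 1, 1)) // <nil>
	// With zero tolerated tenant failures the same failure aborts the poll.
	fmt.Println(pollAll(tenants, poll, 1, 0) != nil) // true
}
```

Because the goroutines finish in no fixed order, the order of the joined errors is not deterministic, which is why tests need to accept any of the possible errors.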
abef2bd to 2daab09
What this PR does:
This PR implements basic concurrent polling of tenants. It is another step
towards making the polling more efficient. The next step will be to implement
a weighted polling strategy that polls tenants by some priority. For now, I think
this is a reasonable start and should stand on its own.
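
As a rough illustration of the priority idea mentioned as the next step, tenants could be ordered by a cost metric before their polls are dispatched. The sketch below is an assumption, not part of this PR: `tenantStats` and `byCostDescending` are hypothetical names, and last poll duration is just one possible metric.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// tenantStats is a hypothetical record of what the previous poll observed.
type tenantStats struct {
	Name             string
	LastPollDuration time.Duration
}

// byCostDescending returns tenant names ordered so that the tenant that took
// longest in the previous cycle comes first. The input slice is not modified.
func byCostDescending(stats []tenantStats) []string {
	sorted := make([]tenantStats, len(stats))
	copy(sorted, stats)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].LastPollDuration > sorted[j].LastPollDuration
	})

	names := make([]string, len(sorted))
	for i, s := range sorted {
		names[i] = s.Name
	}
	return names
}

func main() {
	order := byCostDescending([]tenantStats{
		{Name: "small", LastPollDuration: 200 * time.Millisecond},
		{Name: "large", LastPollDuration: 5 * time.Second},
		{Name: "medium", LastPollDuration: time.Second},
	})
	fmt.Println(order) // [large medium small]
}
```

Starting the most expensive tenants first lets their work overlap with the cheaper ones, which tends to shorten the overall poll cycle when concurrency is bounded.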
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]