[Rhythm] Live-store: downscale endpoint #5600
```go
//
// - DELETE
//   Sets partition back from INACTIVE to ACTIVE state.
func (s *LiveStore) PreparePartitionDownscaleHandler(w http.ResponseWriter, r *http.Request) {
```
Is there a risk of multiple parallel invocations causing issues here? I.e., in the HTTP POST, where the partition state might change between line 53 and line 60?
This endpoint is called by the rollout-operator, which runs as a singleton. The handler is not thread-safe, but I'm not sure that is a real problem in practice.
```jsonnet
pvc_storage_class: error 'Must specify a live-store pvc storage class',
replicas: 0,
max_unavailable: 25,
downscale_delay: '30m',
```
I picked this value based on the default of query_backend_after (30 min). Should we increase it to 45 min to avoid edge cases?
Updated to 35m
```jsonnet
live_store: {
  partition_ring: {
    delete_inactive_partition_after: $._config.live_store.downscale_delay,
  },
},
```
Can you change this to tempo_live_store_config (line 112) so the blast radius is smaller? Otherwise it is propagated to all components.
Changed; it should be better now.
+ review and doc fixes
```jsonnet
if !primary then {
  // Disable updates of ReplicaTemplate/live-store from non-primary (zone-b) statefulsets.
  'grafana.com/rollout-mirror-replicas-from-resource-write-back': 'false',
}
```
Can you briefly explain how this works? I thought that replicas were updated from the ReplicaTemplate to the StatefulSet. Does the ReplicaTemplate follow the replicas in one zone instead?
I copied this from Mimir's implementation.
Here is the PR where it was introduced: grafana/rollout-operator#169
My understanding is that when it is set to true (the default), changes to a StatefulSet's replica count are written back to the ReplicaTemplate. If I understand correctly, a change to zone-a's StatefulSet (due to autoscaling, for example) updates the ReplicaTemplate, which in turn changes zone-b as well.
What's the strategy for updating replicas then? Update the primary sts?
+ rebase from latest main to resolve conflicts
What this PR does: adds the endpoint `/live-store/prepare-partition-downscale` to prepare for downscaling.

Live-store downscale flow:
1. `/live-store/prepare-partition-downscale` switches the partition to the `INACTIVE` state. In this state it is read-only and does not receive new records.
2. `/live-store/prepare-downscale` allows live-store to remove itself from the partition owners.

The implementation of the endpoint is similar to the Mimir ingester's endpoint, with slight changes.
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]