-
Notifications
You must be signed in to change notification settings - Fork 1.5k
backup-finalizer: do not set backup phase to Completed in-memory before PutBackupMetadata succeeds #9645
Description
Problem
In backup_finalizer_controller.go, the Reconcile function sets backup.Status.Phase = BackupPhaseCompleted in-memory before calling PutBackupMetadata. A defer registered at the top of the function unconditionally patches the backup CR to the cluster on any return path — including error returns.
This means: when PutBackupMetadata fails (e.g. due to object lock / immutability blocking the overwrite of velero-backup.json from Finalizing to Completed), the function returns an error and requeues — but the deferred patch has already written Phase=Completed to the cluster. Object storage retains Finalizing state while the cluster shows Completed.
Code path (as of main)
Ordering in Reconcile():
// 1. defer registered — fires on ALL return paths
defer func() {
client.RetryOnErrorMaxBackOff(r.resourceTimeout, func() error {
return r.client.Patch(ctx, backup, kbclient.MergeFrom(original))
})
}()
// ... (much later) ...
// 2. Phase set in-memory BEFORE upload
backup.Status.Phase = velerov1api.BackupPhaseCompleted // ~L196
// 3. Upload attempted
err = backupStore.PutBackupMetadata(backup.Name, backupJSON) // ~L226
if err != nil {
return ctrl.Result{}, errors.Wrap(err, "error uploading backup json")
// ^^^ deferred patch fires here with Phase=Completed already set
}
Fix
Move the in-memory phase assignment to after PutBackupMetadata (and PutBackupContents) succeed. This way, if either upload fails, the deferred patch will see the original Finalizing phase and patch the cluster back to (or keep it at) Finalizing, allowing the reconcile requeue to retry.
// encode and upload first
err = backupStore.PutBackupMetadata(backup.Name, backupJSON)
if err != nil {
return ctrl.Result{}, errors.Wrap(err, "error uploading backup json")
}
if len(operations) > 0 {
err = backupStore.PutBackupContents(backup.Name, outBackupFile)
if err != nil {
return ctrl.Result{}, errors.Wrap(err, "error uploading backup final contents")
}
}
// Only set phase after both uploads succeed
switch backup.Status.Phase {
case velerov1api.BackupPhaseFinalizing:
backup.Status.Phase = velerov1api.BackupPhaseCompleted
...
case velerov1api.BackupPhaseFinalizingPartiallyFailed:
backup.Status.Phase = velerov1api.BackupPhasePartiallyFailed
...
}Note: recordBackupMetrics and backup.Status.CompletionTimestamp should also move after the uploads, or be made idempotent, since they currently sit between the phase set and the upload.
Acceptance criteria
- When
PutBackupMetadatareturns an error inbackup-finalizer, the cluster backup CR must not be patched toCompletedorPartiallyFailed— it should remain inFinalizing/FinalizingPartiallyFailedso the reconcile can retry. - Same applies to
PutBackupContentsfailure. - Unit tests covering the deferred patch phase value when
PutBackupMetadatareturns an error.