Skip to content

backup-finalizer: do not set backup phase to Completed in-memory before PutBackupMetadata succeeds #9645

@kaovilai

Description

@kaovilai

Problem

In backup_finalizer_controller.go, the Reconcile function sets backup.Status.Phase = BackupPhaseCompleted in-memory before calling PutBackupMetadata. A defer registered at the top of the function unconditionally patches the backup CR to the cluster on any return path — including error returns.

This means: when PutBackupMetadata fails (e.g. due to object lock / immutability blocking the overwrite of velero-backup.json from Finalizing to Completed), the function returns an error and requeues — but the deferred patch has already written Phase=Completed to the cluster. Object storage retains Finalizing state while the cluster shows Completed.

Code path (as of main)

https://github.com/vmware-tanzu/velero/blob/main/pkg/controller/backup_finalizer_controller.go

Ordering in Reconcile():

// 1. defer registered — fires on ALL return paths
defer func() {
    client.RetryOnErrorMaxBackOff(r.resourceTimeout, func() error {
        return r.client.Patch(ctx, backup, kbclient.MergeFrom(original))
    })
}()

// ... (much later) ...

// 2. Phase set in-memory BEFORE upload
backup.Status.Phase = velerov1api.BackupPhaseCompleted  // ~L196

// 3. Upload attempted
err = backupStore.PutBackupMetadata(backup.Name, backupJSON)  // ~L226
if err != nil {
    return ctrl.Result{}, errors.Wrap(err, "error uploading backup json")
    // ^^^ deferred patch fires here with Phase=Completed already set
}

Fix

Move the in-memory phase assignment to after PutBackupMetadata (and PutBackupContents) succeed. This way, if either upload fails, the deferred patch will see the original Finalizing phase and patch the cluster back to (or keep it at) Finalizing, allowing the reconcile requeue to retry.

// encode and upload first
err = backupStore.PutBackupMetadata(backup.Name, backupJSON)
if err != nil {
    return ctrl.Result{}, errors.Wrap(err, "error uploading backup json")
}
if len(operations) > 0 {
    err = backupStore.PutBackupContents(backup.Name, outBackupFile)
    if err != nil {
        return ctrl.Result{}, errors.Wrap(err, "error uploading backup final contents")
    }
}

// Only set phase after both uploads succeed
switch backup.Status.Phase {
case velerov1api.BackupPhaseFinalizing:
    backup.Status.Phase = velerov1api.BackupPhaseCompleted
    ...
case velerov1api.BackupPhaseFinalizingPartiallyFailed:
    backup.Status.Phase = velerov1api.BackupPhasePartiallyFailed
    ...
}

Note: recordBackupMetrics and backup.Status.CompletionTimestamp should also move after the uploads, or be made idempotent, since they currently sit between the phase set and the upload.

Acceptance criteria

  • When PutBackupMetadata returns an error in backup-finalizer, the cluster backup CR must not be patched to Completed or PartiallyFailed — it should remain in Finalizing / FinalizingPartiallyFailed so the reconcile can retry.
  • Same applies to PutBackupContents failure.
  • Unit tests covering the deferred patch phase value when PutBackupMetadata returns an error.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions