Skip to content

fix: fail backup early when CSI VolumeSnapshot error persists in async poll#9727

Draft
priyansh17 wants to merge 4 commits intovelero-io:mainfrom
priyansh17:user/priyansh/issue-9674
Draft

fix: fail backup early when CSI VolumeSnapshot error persists in async poll#9727
priyansh17 wants to merge 4 commits intovelero-io:mainfrom
priyansh17:user/priyansh/issue-9674

Conversation

@priyansh17
Copy link
Copy Markdown
Collaborator

Please add a summary of your change

  • This bug can be reproduced by triggering a scenario where cloud snapshot provisioning fails (e.g., make Azure Disk run out of available capacity or permissions).
  • The problematic code path is in pkg/backup/actions/csi/volumesnapshot_action.go, in the Progress() method of volumeSnapshotBackupItemAction:
if boolptr.IsSetToTrue(vs.Status.ReadyToUse) {
    // ... continue to check VSC
} else if vs.Status.Error != nil {
    errorMessage := ""
    if vs.Status.Error.Message != nil {
        errorMessage = *vs.Status.Error.Message
    }
    p.log.Warnf("VolumeSnapshot has a temporary error %s. Snapshot controller will retry later.", errorMessage)
    return progress, nil  // <-- Returns NOT completed, no error propagated
}
  • This returns progress.Completed=false even for permanent errors, so the backup controller keeps polling Progress() for the full itemOperationTimeout (could be hours), instead of failing immediately.
  • The synchronous code path (WaitUntilVSCHandleIsReady) properly fails on timeout, but async Progress() does not.

The fix is to wait for a certain period of time before failing it/ skipping it.

Does your change fix a particular issue?

Fixes #(issue)
#9674

Please indicate you've done the following:

…CSISnapshotTimeout

Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
…ation

Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
…r handling in VolumeSnapshot progress

Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.00%. Comparing base (c5fa50b) to head (a6256ae).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9727      +/-   ##
==========================================
+ Coverage   60.97%   61.00%   +0.02%     
==========================================
  Files         384      384              
  Lines       36609    36629      +20     
==========================================
+ Hits        22324    22344      +20     
  Misses      12677    12677              
  Partials     1608     1608              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant