Auto-exclude Velero namespace from backups instead of hard error #9615

Open
achinmishra wants to merge 3 commits into velero-io:main from achinmishra:achinmishra/velero-ns-backup-validation

Conversation

@achinmishra

@achinmishra achinmishra commented Mar 13, 2026

Thank you for contributing to Velero!

Summary

Auto-exclude the Velero namespace from backups with a warning instead of returning a hard validation error. This addresses reviewer feedback that a hard error would be a breaking change (plain velero backup create would fail).

Approach:

  • During prepareBackupRequest, if the Velero namespace (determined by request.Backup.Namespace) would be included by the backup's namespace filter, it is automatically appended to ExcludedNamespaces
  • A warning is logged so the user is aware of the auto-exclusion
  • This follows the existing precedent at backup_controller.go where namespaces with the velero.io/exclude-from-backup=true label are silently auto-excluded
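
A minimal sketch of the approach listed above, not the PR's actual code. It assumes the namespace filter can be approximated with the standard library's `path.Match` (Velero has its own glob handling), and the names `matchesAny` and `autoExclude` are hypothetical.

```go
package main

import (
	"log"
	"path"
)

// matchesAny reports whether ns matches any pattern in patterns.
// path.Match is a standard-library stand-in for Velero's own glob handling.
func matchesAny(patterns []string, ns string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, ns); err == nil && ok {
			return true
		}
	}
	return false
}

// autoExclude returns the excludes list to use for the backup. If veleroNs
// would be selected by includes (an empty list means "all namespaces") and is
// not already excluded, it is appended and a warning is logged.
func autoExclude(includes, excludes []string, veleroNs string) []string {
	included := len(includes) == 0 || matchesAny(includes, veleroNs)
	if !included || matchesAny(excludes, veleroNs) {
		return excludes
	}
	log.Printf("warning: namespace %q is the Velero namespace and was excluded from the backup", veleroNs)
	// Copy before appending so the caller's slice is never modified in place.
	return append(append([]string{}, excludes...), veleroNs)
}
```

With this sketch, `autoExclude(nil, nil, "velero")` and `autoExclude([]string{"vel*"}, nil, "velero")` both return a list containing `velero`, while `autoExclude([]string{"default"}, nil, "velero")` returns the excludes unchanged.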

Changes from the original approach:

  • Replaced hard validation error with auto-exclude + warning log
  • Removed the component=velero label lookup (unreliable across install methods per reviewer feedback) — only request.Backup.Namespace is used
  • Updated tests to verify auto-exclusion behavior instead of validation errors
  • Updated changelog wording

Test coverage includes: explicit include, implicit include (all namespaces), already excluded, glob patterns (include and exclude), single-char wildcards, and character class patterns.

Does your change fix a particular issue?

Fixes #9573

Please indicate you've done the following:

@github-actions github-actions Bot requested a review from blackpiglet March 13, 2026 06:31
@github-actions github-actions Bot requested a review from ywk253100 March 13, 2026 06:31
@achinmishra achinmishra marked this pull request as draft March 13, 2026 06:33
@achinmishra achinmishra marked this pull request as ready for review March 13, 2026 18:27
@github-actions github-actions Bot requested review from reasonerjt and sseago March 13, 2026 18:27
@reasonerjt reasonerjt requested review from Lyndon-Li and removed request for blackpiglet, reasonerjt and ywk253100 March 15, 2026 16:09
Comment thread pkg/controller/backup_controller.go
Collaborator

@shubham-pampattiwar shubham-pampattiwar left a comment

Thanks for working on this @achinmishra, preventing accidental self-backup is definitely something worth addressing.

I do have some concerns about the overall approach. The big one is that this is a breaking change for the default backup flow. When IncludedNamespaces is empty, Velero treats that as "back up everything." So after this PR, a plain velero backup create my-backup with no namespace flags would fail validation. Every existing user would need to go add --exclude-namespaces=velero to their backups and schedules after upgrade. That's pretty disruptive.

The change to defaultBackup() adding .ExcludedNamespaces(velerov1api.DefaultNamespace) kind of masks this, it's why ~17 test cases needed ExcludedNamespaces added to their expected specs, and why the assertion in Test_prepareBackupRequest_BackupStorageLocation had to be loosened from an exact error check to assert.Contains. When you have to modify the shared test infrastructure to make your change pass, that's usually a signal the change has a larger blast radius than intended.

I think the goal here is right, but the approach needs some rethinking. A few options worth considering:

  • Auto-exclude the Velero namespace during backup processing and log a warning, so nothing breaks
  • Only error when the Velero namespace is explicitly listed in IncludedNamespaces, not when the list is empty
  • Treat it as a warning rather than a hard validation error

Happy to discuss. What do you guys think? @achinmishra @Lyndon-Li

Comment thread pkg/controller/backup_controller.go Outdated
Comment thread pkg/controller/backup_controller.go Outdated
@blackpiglet
Contributor

I think the goal here is right, but the approach needs some rethinking. A few options worth considering:

  • Auto-exclude the Velero namespace during backup processing and log a warning, so nothing breaks
  • Only error when the Velero namespace is explicitly listed in IncludedNamespaces, not when the list is empty
  • Treat it as a warning rather than a hard validation error

I also think there is no need to error out when the Velero namespace is included.
I vote for auto-excluding it and logging a warning in all cases, including both wildcard matches and explicitly specified namespaces.

@Lyndon-Li
Contributor

@shubham-pampattiwar @blackpiglet @achinmishra
Auto-excluding the Velero namespace may not be easy to implement and may be risky: if it identifies the wrong namespace, the backup is not what the creator expected, yet it still succeeds, so the creator never finds out.
And I don't think erroring out when the Velero namespace is included is a breaking change, because such a backup doesn't work at all at present. Even though a backup with includeNamespace=* can be submitted successfully, the backup will always fail. We are just reporting the error earlier.

@achinmishra
Author

@shubham-pampattiwar @blackpiglet @Lyndon-Li

Thanks for the feedback everyone, really appreciate the thoughtful discussion here. I've been thinking through both solutions and wanted to lay out a few things as I see them:

Hard validation error: It's explicit; the user knows exactly what's wrong and gets an actionable fix. It's not technically a breaking change, since backups that include the "velero" namespace already fail during execution, and we're just surfacing the failure earlier with a clear message. But it does disrupt existing workflows: "velero backup create my-backup" with no flags would start failing after upgrade, and every existing schedule without --exclude-namespaces would need updating.

Auto-exclude + warning: It would be non-breaking; existing commands and schedules will keep working. There's already a precedent for this pattern in the codebase: we do the same thing for namespaces labeled "velero.io/exclude-from-backup=true" (they get silently appended to "ExcludedNamespaces"). The downside is that it's a silent behavior change. If someone explicitly passes "--include-namespaces=velero", quietly excluding it feels wrong, since the user has specifically asked for it.

Having said that, I'm leaning toward going with auto-exclude + warning to keep things non-breaking. I'll also drop the "component=velero" label lookup as @shubham-pampattiwar suggested — "request.Backup.Namespace" is the only reliable way to identify the velero namespace across install methods.

Let me know if you all agree and I'll update the PR.

@achinmishra achinmishra force-pushed the achinmishra/velero-ns-backup-validation branch from ed937d5 to 9f32e6b Compare March 26, 2026 17:32
@achinmishra achinmishra changed the title Validate that Velero namespace is not included in backup Auto-exclude Velero namespace from backups instead of hard error Mar 26, 2026
@achinmishra achinmishra force-pushed the achinmishra/velero-ns-backup-validation branch from 9f32e6b to ea25c18 Compare March 26, 2026 17:40
@kaovilai
Collaborator

"request.Backup.Namespace" is the only reliable way to identify the velero namespace across install methods.

Use Kubernetes’ standard mechanism: read the pod’s own namespace from /var/run/secrets/kubernetes.io/serviceaccount/namespace
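
A minimal sketch of that mechanism, for illustration only. The file path is the standard service account mount point; the helper names `parseNamespace` and `ownNamespace` are hypothetical, and the sketch assumes the service account token is mounted into the pod (the file is absent when automounting is disabled or when running outside a cluster).

```go
package main

import (
	"errors"
	"os"
	"strings"
)

// Standard path at which Kubernetes mounts the pod's namespace alongside the
// service account token.
const serviceAccountNamespaceFile = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"

// parseNamespace trims surrounding whitespace from the file contents and
// rejects an empty result.
func parseNamespace(data []byte) (string, error) {
	ns := strings.TrimSpace(string(data))
	if ns == "" {
		return "", errors.New("namespace file is empty")
	}
	return ns, nil
}

// ownNamespace (hypothetical helper) returns the namespace the current pod
// runs in, or an error if the file is missing or empty.
func ownNamespace() (string, error) {
	data, err := os.ReadFile(serviceAccountNamespaceFile)
	if err != nil {
		return "", err
	}
	return parseNamespace(data)
}
```

A caller would likely want a fallback (for example the namespace of the Backup object) when `ownNamespace` returns an error.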

@codecov

codecov Bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 82.60870% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.98%. Comparing base (df2686c) to head (5965bcf).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/controller/backup_controller.go | 82.60% | 3 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9615   +/-   ##
=======================================
  Coverage   60.97%   60.98%           
=======================================
  Files         384      384           
  Lines       36609    36632   +23     
=======================================
+ Hits        22324    22341   +17     
- Misses      12677    12681    +4     
- Partials     1608     1610    +2     

@achinmishra achinmishra force-pushed the achinmishra/velero-ns-backup-validation branch 2 times, most recently from 8133faf to 8fa687d Compare March 31, 2026 02:36
Comment thread pkg/controller/backup_controller.go
@achinmishra
Author

"request.Backup.Namespace" is the only reliable way to identify the velero namespace across install methods.

Use Kubernetes’ standard mechanism: read the pod’s own namespace from /var/run/secrets/kubernetes.io/serviceaccount/namespace

@kaovilai it seems like both should be OK. Do you have a strong preference for one over the other? From what I gathered, request.Backup.Namespace is the existing pattern in the codebase, so I am trying to reuse it.

@kaovilai
Collaborator

kaovilai commented Apr 7, 2026

If there are already many places that treat backup.Namespace as "this is the Velero namespace", we could keep the current change.

@achinmishra achinmishra force-pushed the achinmishra/velero-ns-backup-validation branch from 5d85e1f to 73067cf Compare April 14, 2026 19:37
Velero does not support backing up its own namespace. Previously this
would fail during execution with no clear signal at request time.

This change auto-excludes the Velero namespace during backup preparation:
- When included via wildcard or empty includes (all namespaces): adds
  the Velero namespace to ExcludedNamespaces with a warning log.
- When explicitly listed alongside other namespaces: removes it from
  IncludedNamespaces, adds to ExcludedNamespaces, and logs a warning.
- When it is the only included namespace: returns a validation error
  to prevent a silent empty backup.

Signed-off-by: Achin Mishra <achinmishra@meta.com>
@achinmishra achinmishra force-pushed the achinmishra/velero-ns-backup-validation branch from 73067cf to 5568ccf Compare April 14, 2026 19:38
@kaovilai
Collaborator

need lint fix

Comment thread pkg/controller/backup_controller_test.go
Comment thread pkg/controller/backup_controller.go Outdated
Velero does not support backing up its own namespace. Previously this
would fail during execution with no clear signal at request time.

This change auto-excludes the Velero namespace during backup preparation:
- When included via wildcard or empty includes (all namespaces): adds
  the Velero namespace to ExcludedNamespaces with a warning log.
- When explicitly listed alongside other namespaces: removes it from
  IncludedNamespaces, adds to ExcludedNamespaces, and logs a warning.
- When it is the only included namespace: returns a validation error
  to prevent a silent empty backup.
- Avoids duplicate entries in ExcludedNamespaces if the namespace was
  already excluded by prior logic (e.g. exclude-from-backup label).

Signed-off-by: Achin Mishra <urs.achin.007@gmail.com>
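
The cases in the commit message above can be sketched roughly as follows. This is an illustration, not the PR's code: it only handles literal names plus the `*` wildcard (other glob patterns are out of scope here), and `excludeVeleroNamespace` and `errOnlyVeleroNamespace` are hypothetical names.

```go
package main

import (
	"errors"
	"log"
)

// errOnlyVeleroNamespace is a hypothetical sentinel error for this sketch.
var errOnlyVeleroNamespace = errors.New("the Velero namespace is the only included namespace; the backup would be empty")

// contains reports whether list holds the exact string s.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// excludeVeleroNamespace returns the adjusted include and exclude lists.
func excludeVeleroNamespace(includes, excludes []string, veleroNs string) ([]string, []string, error) {
	listed := contains(includes, veleroNs)
	implicit := len(includes) == 0 || contains(includes, "*")
	if !listed && !implicit {
		// The Velero namespace is not selected; leave the spec untouched.
		return includes, excludes, nil
	}

	// Remove any literal entry for the Velero namespace from the includes.
	filtered := make([]string, 0, len(includes))
	for _, ns := range includes {
		if ns != veleroNs {
			filtered = append(filtered, ns)
		}
	}
	// Nothing else was requested: fail instead of producing an empty backup.
	if listed && len(filtered) == 0 {
		return nil, nil, errOnlyVeleroNamespace
	}

	// Add to the excludes only if an earlier step has not already done so.
	newExcludes := append([]string{}, excludes...)
	if !contains(newExcludes, veleroNs) {
		newExcludes = append(newExcludes, veleroNs)
	}
	log.Printf("warning: excluding Velero namespace %q from the backup", veleroNs)
	return filtered, newExcludes, nil
}
```

For example, includes of `default`, `velero`, `kube-system` come back as `default`, `kube-system` with `velero` in the excludes, while includes of just `velero` return the error.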
@achinmishra achinmishra force-pushed the achinmishra/velero-ns-backup-validation branch from e5b3392 to 9bcf1c5 Compare April 14, 2026 21:57
@Lyndon-Li
Contributor

So after this PR, a plain velero backup create my-backup with no namespace flags would fail validation

I am not sure, but I think including all namespaces never works, because Velero cannot be used to back up or restore system state such as CSI and CNI: Velero itself requires them to be running as a precondition.

Auto-exclude + warning

This still doesn't address my concern. If we exclude something on behalf of users, then once there is any mistake, Velero will silently break the SLA, and users will not notice until the day a disaster happens and they find that something that should be in the backup is missing. That is to say, it creates a severe problem on the server side just to fix a problem that should be handled on the client side.


// Auto-exclude the Velero namespace from the backup.
// Velero does not support backing up its own namespace.
veleroNs := request.Backup.Namespace
Contributor

I see two problems with putting the code here:

  1. This code is called on every reconcile, so it could become a performance bottleneck in large-scale environments.
  2. At present, filtering happens in the collection stage. Even if we want to modify the filter result automatically, we should do it in that same collection stage.

@shubham-pampattiwar
Collaborator

So after this PR, a plain velero backup create my-backup with no namespace flags would fail validation

I am not sure, but I think including all namespaces never works, because Velero cannot be used to back up or restore system state such as CSI and CNI: Velero itself requires them to be running as a precondition.

Auto-exclude + warning

This still doesn't address my concern. If we exclude something on behalf of users, then once there is any mistake, Velero will silently break the SLA, and users will not notice until the day a disaster happens and they find that something that should be in the backup is missing. That is to say, it creates a severe problem on the server side just to fix a problem that should be handled on the client side.

@Lyndon-Li I think you raise a fair point about explicit intent. How about this compromise: when a user explicitly lists the velero namespace in --include-namespaces, we return a hard error so they know it's not supported. We only auto-exclude with a warning when it's implicitly included via the default "all namespaces" or a wildcard pattern. That way we're never silently overriding something the user specifically asked for. Would that address your concern?
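
To make the proposed compromise concrete, here is a rough sketch of the decision only. It assumes glob matching can be approximated with the standard library's `path.Match`; the `action` type and `decide` function are hypothetical names, not part of Velero.

```go
package main

import "path"

// action is the outcome of checking a backup's include list.
type action int

const (
	actionNone        action = iota // Velero namespace is not selected
	actionAutoExclude               // selected implicitly: exclude it and warn
	actionReject                    // listed by exact name: return a hard error
)

// decide applies the compromise: an exact name is rejected, while an empty
// list ("all namespaces") or a matching wildcard pattern is auto-excluded.
func decide(includes []string, veleroNs string) action {
	if len(includes) == 0 {
		return actionAutoExclude
	}
	result := actionNone
	for _, p := range includes {
		if p == veleroNs {
			// Explicit intent always wins over any wildcard in the list.
			return actionReject
		}
		if ok, err := path.Match(p, veleroNs); err == nil && ok {
			result = actionAutoExclude
		}
	}
	return result
}
```

The exact-name check runs before the pattern match on purpose: a literal `velero` also matches itself as a pattern, and the point of the compromise is that an explicit request is never silently overridden.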

hadExplicitIncludes := len(request.Spec.IncludedNamespaces) > 0
request.Spec.IncludedNamespaces = filtered

// If the Velero namespace was the only included namespace, the backup would
Collaborator

@achinmishra minor note: after #9684, empty IncludedNamespaces gets normalized to ["*"] before this code runs, so hadExplicitIncludes is always true. The logic still works correctly because the wildcard string stays in the filtered list, but the variable name is misleading. Consider renaming it to something like hadIncludesBeforeFiltering, or adding a brief comment explaining the interaction with the normalization above.
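
A small sketch of the interaction described in this comment, under the assumption that normalization simply turns an empty list into `["*"]`. The function names `normalizeIncludes`, `filterOut`, and `soleIncludeWasVelero` are hypothetical and only illustrate why the check still behaves correctly.

```go
package main

// normalizeIncludes mirrors the behaviour described in the review comment:
// an empty include list becomes ["*"].
func normalizeIncludes(includes []string) []string {
	if len(includes) == 0 {
		return []string{"*"}
	}
	return includes
}

// filterOut returns list without any literal occurrence of ns.
func filterOut(list []string, ns string) []string {
	out := make([]string, 0, len(list))
	for _, v := range list {
		if v != ns {
			out = append(out, v)
		}
	}
	return out
}

// soleIncludeWasVelero reports whether removing veleroNs emptied the list.
func soleIncludeWasVelero(includes []string, veleroNs string) bool {
	normalized := normalizeIncludes(includes)
	// Always true once the list has been normalized, hence the suggested rename.
	hadIncludesBeforeFiltering := len(normalized) > 0
	filtered := filterOut(normalized, veleroNs)
	return hadIncludesBeforeFiltering && len(filtered) == 0
}
```

With no includes, the normalized `*` entry survives filtering, so the "only included namespace" branch is not taken; it is taken only when every entry was the Velero namespace itself.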

},
{
name: "character class wildcard matching velero namespace auto-excludes it",
backup: builder.ForBackup(velerov1api.DefaultNamespace, "backup-1").Phase(velerov1api.BackupPhaseReadyToStart).IncludedNamespaces("[vV]elero").Result(),
Collaborator

@achinmishra the character class test case ([vV]elero) sets expectValidationError: true, but that error comes from the existing namespace name validation, not from the auto-exclude logic. It might be clearer to either add a comment explaining why the validation error is expected, or split it into two separate test cases so each behavior is tested independently.
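
To illustrate why a pattern like `[vV]elero` can be rejected before the auto-exclude logic is ever involved, here is a sketch of a name check. It is an assumption, not Velero's actual validation code: it tolerates `*` as a wildcard and otherwise applies the Kubernetes DNS-1123 label rule for namespace names. `validNamespacePattern` is a hypothetical name.

```go
package main

import (
	"regexp"
	"strings"
)

// dns1123Label is the Kubernetes DNS-1123 label pattern that namespace names
// must satisfy (lowercase alphanumerics and '-', at most 63 characters).
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validNamespacePattern reports whether p is acceptable as an include or
// exclude entry under the assumption stated above.
func validNamespacePattern(p string) bool {
	// Assumption: "*" is tolerated as a wildcard, so substitute a legal
	// character for it; every other character must be legal in a name.
	candidate := strings.ReplaceAll(p, "*", "x")
	return len(candidate) <= 63 && dns1123Label.MatchString(candidate)
}
```

Under this sketch `velero` and `vel*` pass, while `[vV]elero` fails because brackets and uppercase letters are not valid in a namespace name, which would explain a validation error that is unrelated to auto-exclusion.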

expectedIncludedNamespaces: []string{"default", "kube-system"},
expectedExcludedNamespaces: []string{"velero"},
},
}
Collaborator

@achinmishra could you add a test case where Velero is installed in a custom namespace (e.g. builder.ForBackup("openshift-adp", "backup-1"))? This would verify the auto-exclude works correctly for non-default installations like OADP.

@Lyndon-Li
Contributor

Lyndon-Li commented Apr 24, 2026

@shubham-pampattiwar @achinmishra Here are my suggestions:

  1. Don't try to extract Velero filters in the core workflow, because it is going to become increasingly complex, in terms of both code and time
  2. Don't try to modify the core workflow, because this should be done at the frontend, e.g., during installation or backup submission
  3. Only try to fix it automatically for new Velero installations, and do it during installation: have the installation CLI add the velero.io/exclude-from-backup=true label to Velero's namespace
  4. For old installations, don't do anything, so the backups will fail in the old way, or users need to add the same label manually
  5. For helm chart installations, open a similar issue in the Velero helm chart repo for item 3
  6. For other installation methods, the installers need to handle item 3 themselves
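
A rough sketch of the label-based mechanism these suggestions rely on. The label key and value come from this thread; the function names and the plain-map representation of namespaces are assumptions made to keep the example self-contained, not Velero's implementation.

```go
package main

import "sort"

// Label that marks a namespace as excluded from backups.
const excludeFromBackupLabel = "velero.io/exclude-from-backup"

// excludedByLabel reports whether a namespace's labels opt it out of backups.
func excludedByLabel(labels map[string]string) bool {
	return labels[excludeFromBackupLabel] == "true"
}

// labelledNamespaces returns, in sorted order, the names of all namespaces
// carrying the label. The input maps a namespace name to its labels.
func labelledNamespaces(namespaces map[string]map[string]string) []string {
	var out []string
	for name, labels := range namespaces {
		if excludedByLabel(labels) {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}
```

For an existing installation, the manual step in item 4 would presumably be something like `kubectl label namespace velero velero.io/exclude-from-backup=true`, with `velero` replaced by the actual install namespace.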

Development

Successfully merging this pull request may close these issues.

Error out when Velero namespace is included in the backup

5 participants