Auto-exclude Velero namespace from backups instead of hard error#9615
Auto-exclude Velero namespace from backups instead of hard error#9615achinmishra wants to merge 3 commits intovelero-io:mainfrom
Conversation
shubham-pampattiwar
left a comment
There was a problem hiding this comment.
Thanks for working on this @achinmishra, preventing accidental self-backup is definitely something worth addressing.
I do have some concerns about the overall approach. The big one is that this is a breaking change for the default backup flow. When IncludedNamespaces is empty, Velero treats that as "back up everything." So after this PR, a plain velero backup create my-backup with no namespace flags would fail validation. Every existing user would need to go add --exclude-namespaces=velero to their backups and schedules after upgrade. That's pretty disruptive.
The change to defaultBackup() adding .ExcludedNamespaces(velerov1api.DefaultNamespace) kind of masks this, it's why ~17 test cases needed ExcludedNamespaces added to their expected specs, and why the assertion in Test_prepareBackupRequest_BackupStorageLocation had to be loosened from an exact error check to assert.Contains. When you have to modify the shared test infrastructure to make your change pass, that's usually a signal the change has a larger blast radius than intended.
I think the goal here is right, but the approach needs some rethinking. A few options worth considering:
- Auto-exclude the Velero namespace during backup processing and log a warning, so nothing breaks
- Only error when the Velero namespace is explicitly listed in IncludedNamespaces, not when the list is empty
- Treat it as a warning rather than a hard validation error
Happy to discus, what do you guys think? @achinmishra @Lyndon-Li
I also think there is no need to error out when the velero namespace is included. |
|
@shubham-pampattiwar @blackpiglet @achinmishra |
|
@shubham-pampattiwar @blackpiglet @Lyndon-Li Thanks for the feedback everyone, really appreciate the thoughtful discussion here. I've been thinking through both solutions and wanted to lay out few things as I see them: Hard validation error: It's explicit — the user knows exactly what's wrong and gets an actionable fix. It's not technically a breaking change since backups that include the "velero" namespace already fail during execution. We're just surfacing the failure earlier with a clear message but it does disrupt existing workflows — "velero backup create my-backup" with no flags would start failing after upgrade, and every existing schedule without --exclude-namespaces would need updating. Auto-exclude + warning: It would be non-breaking, existing commands and schedules will keep working. There's already a precedent for this pattern in the codebase — we do the same thing for namespaces labeled "velero.io/exclude-from-backup=true" (they get silently appended to "ExcludedNamespaces"). The downside is it's a silent behavior change. If someone explicitly passes "--include-namespaces=velero", quietly excluding it feels wrong — I am guessing the users have specifically asked for it Having said that, I'm leaning toward going with auto-exclude + warning to keep things non-breaking. I'll also drop the "component=velero" label lookup as @shubham-pampattiwar suggested — "request.Backup.Namespace" is the only reliable way to identify the velero namespace across install methods. Let me know if you all agree and I'll update the PR. |
ed937d5 to
9f32e6b
Compare
9f32e6b to
ea25c18
Compare
Use Kubernetes’ standard mechanism: read the pod’s own namespace from |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9615 +/- ##
=======================================
Coverage 60.97% 60.98%
=======================================
Files 384 384
Lines 36609 36632 +23
=======================================
+ Hits 22324 22341 +17
- Misses 12677 12681 +4
- Partials 1608 1610 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
8133faf to
8fa687d
Compare
@kaovilai seems like both should be ok. Do you have a strong preference using one over another? From what I gathered, I am trying to reuse the existing pattern of the usage. |
|
If there exists a lot of "this is velero namespace" from backup.Namespace, could keep current change. |
5d85e1f to
73067cf
Compare
Velero does not support backing up its own namespace. Previously this would fail during execution with no clear signal at request time. This change auto-excludes the Velero namespace during backup preparation: - When included via wildcard or empty includes (all namespaces): adds the Velero namespace to ExcludedNamespaces with a warning log. - When explicitly listed alongside other namespaces: removes it from IncludedNamespaces, adds to ExcludedNamespaces, and logs a warning. - When it is the only included namespace: returns a validation error to prevent a silent empty backup. Signed-off-by: Achin Mishra <achinmishra@meta.com>
73067cf to
5568ccf
Compare
|
need lint fix |
Velero does not support backing up its own namespace. Previously this would fail during execution with no clear signal at request time. This change auto-excludes the Velero namespace during backup preparation: - When included via wildcard or empty includes (all namespaces): adds the Velero namespace to ExcludedNamespaces with a warning log. - When explicitly listed alongside other namespaces: removes it from IncludedNamespaces, adds to ExcludedNamespaces, and logs a warning. - When it is the only included namespace: returns a validation error to prevent a silent empty backup. - Avoids duplicate entries in ExcludedNamespaces if the namespace was already excluded by prior logic (e.g. exclude-from-backup label). Signed-off-by: Achin Mishra <urs.achin.007@gmail.com>
e5b3392 to
9bcf1c5
Compare
I am not sure, but I think including all namespaces never works because Velero cannot be used to backup/restore system states like CSI, CNI, because the running of Velero requiring them as the precondition.
This still doesn't address my concern, if we exclude something on behave of users, once this is any mistake, Velero will silently break the SLA and users would not notice this until one day disaster happens and they see that something that should be in the backup has been missed. That is to say, it is making a severe problem from server just for fixing a problem that should be handled from client. |
|
|
||
| // Auto-exclude the Velero namespace from the backup. | ||
| // Velero does not support backing up its own namespace. | ||
| veleroNs := request.Backup.Namespace |
There was a problem hiding this comment.
I see two problems for making the code here:
- This code is called by every reconcile, it would be a performance bottleneck for large scale envs.
- At present, we have the filter things in the collection stage, even if we want to auto modify the filter result, we should do it in the same collection stage
@Lyndon-Li I think you raise a fair point about explicit intent. How about this compromise: when a user explicitly lists the velero namespace in |
| hadExplicitIncludes := len(request.Spec.IncludedNamespaces) > 0 | ||
| request.Spec.IncludedNamespaces = filtered | ||
|
|
||
| // If the Velero namespace was the only included namespace, the backup would |
There was a problem hiding this comment.
@achinmishra minor note: after #9684, empty IncludedNamespaces gets normalized to ["*"] before this code runs, so hadExplicitIncludes is always true. The logic still works correctly because the wildcard string stays in the filtered list, but the variable name is misleading. Consider renaming to something like hadIncludesBeforeFiltering, or adding a brief comment explaining the interaction with the normalization above
| }, | ||
| { | ||
| name: "character class wildcard matching velero namespace auto-excludes it", | ||
| backup: builder.ForBackup(velerov1api.DefaultNamespace, "backup-1").Phase(velerov1api.BackupPhaseReadyToStart).IncludedNamespaces("[vV]elero").Result(), |
There was a problem hiding this comment.
@achinmishra the character class test case ([vV]elero) sets expectValidationError: true, but that error comes from the existing namespace name validation, not from the auto-exclude logic. It might be clearer to either add a comment explaining why the validation error is expected, or split it into two separate test cases so each behavior is tested independently.
| expectedIncludedNamespaces: []string{"default", "kube-system"}, | ||
| expectedExcludedNamespaces: []string{"velero"}, | ||
| }, | ||
| } |
There was a problem hiding this comment.
@achinmishra could you add a test case where Velero is installed in a custom namespace (e.g. builder.ForBackup("openshift-adp", "backup-1"))? This would verify the auto-exclude works correctly for non-default installations like OADP.
|
@shubham-pampattiwar @achinmishra Here are my suggestions:
|
Thank you for contributing to Velero!
Summary
Auto-exclude the Velero namespace from backups with a warning instead of returning a hard validation error. This addresses reviewer feedback that a hard error would be a breaking change (plain
velero backup createwould fail).Approach:
prepareBackupRequest, if the Velero namespace (determined byrequest.Backup.Namespace) would be included by the backup's namespace filter, it is automatically appended toExcludedNamespacesbackup_controller.gowhere namespaces with thevelero.io/exclude-from-backup=truelabel are silently auto-excludedChanges from the original approach:
component=velerolabel lookup (unreliable across install methods per reviewer feedback) — onlyrequest.Backup.Namespaceis usedTest coverage includes: explicit include, implicit include (all namespaces), already excluded, glob patterns (include and exclude), single-char wildcards, and character class patterns.
Does your change fix a particular issue?
Fixes #9573
Please indicate you've done the following:
make new-changelog) or comment/kind changelog-not-requiredon this PR.site/content/docs/main.