Velero v1.12.3 Fails to Ignore Resources in Terminating Phase #7777

Open
nwakalka opened this issue May 7, 2024 · 2 comments
Labels: area/fs-backup, Icebox (We see the value, but it is not slated for the next couple releases.), Needs investigation

Comments


nwakalka commented May 7, 2024

What steps did you take and what happened:

  • We run a suite of E2E test cases that exercises regular backups and restores as well as cluster backups.
  • As soon as a regular backup and restore completes (leaving its namespace in the Terminating phase), a cluster backup is triggered, and that cluster backup picks up the terminating namespace.
  • During the cluster backup, Velero failed to exclude resources, specifically namespaces, that were in the Terminating phase (see the sketch after this list for one way to detect such namespaces up front).
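For reference, here is a minimal client-go sketch (illustrative only, not part of our E2E suite; the kubeconfig path and program structure are assumptions) that lists namespaces still in the Terminating phase so they could be excluded from the cluster backup as a workaround:

```go
// Illustrative helper: print namespaces that are still Terminating so they
// can be passed to `velero backup create --exclude-namespaces ...`.
package main

import (
	"context"
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes the default kubeconfig location; adjust as needed.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	nsList, err := client.CoreV1().Namespaces().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	var terminating []string
	for _, ns := range nsList.Items {
		if ns.Status.Phase == corev1.NamespaceTerminating {
			terminating = append(terminating, ns.Name)
		}
	}
	fmt.Printf("--exclude-namespaces=%s\n", strings.Join(terminating, ","))
}
```

The printed flag could then be appended to the velero backup create command that starts the cluster backup.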

What did you expect to happen:

  • Velero should ignore resources that are in the Terminating phase.

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue; for more options, please refer to velero debug --help

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
[root@runner-jgnwu6xf-project-14702-concurrent-0 tmp]# kubectl exec -it mcs-velero-69b6f59bdc-tr7p9 -c mcs-velero -n mcs-backup -- /velero backup logs cb-e2e-klu-tgphvd --insecure-skip-tls-verify|grep level=error
time="2024-04-26T09:55:26Z" level=error msg="Error backing up item" backup=mcs-backup/cb-e2e-klu-tgphvd error="error getting persistent volume claim for volume: persistentvolumeclaims \"e2eapp-pv-claim-new\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/podvolume/backupper.go:218" error.function="github.com/vmware-tanzu/velero/pkg/podvolume.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:448" name=new-label-app-845dbc7d96-t7h46
time="2024-04-26T09:55:27Z" level=error msg="Error backing up item" backup=mcs-backup/cb-e2e-klu-tgphvd error="error getting persistent volume claim for volume: persistentvolumeclaims \"e2eapp-pv-claim\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/podvolume/backupper.go:218" error.function="github.com/vmware-tanzu/velero/pkg/podvolume.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:448" name=label-app-585cccb667-tjbtn
time="2024-04-26T09:55:28Z" level=error msg="Error backing up item" backup=mcs-backup/cb-e2e-klu-tgphvd error="error getting persistent volume claim for volume: persistentvolumeclaims \"e2eapp-pv-claim-new\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/podvolume/backupper.go:218" error.function="github.com/vmware-tanzu/velero/pkg/podvolume.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:448" name=new-label-app-845dbc7d96-t7h46
[root@runner-jgnwu6xf-project-14702-concurrent-0 tmp]# kubectl exec -it mcs-velero-69b6f59bdc-tr7p9 -c mcs-velero -n mcs-backup -- /velero backup describe cb-e2e-klu-tgphvd

  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:

Steps Taken:

  • Initiated a cluster backup using Velero.
  • During the backup, observed that an old namespace and its associated pod were still in the Terminating phase.
  • Velero resolved the pod and identified a Persistent Volume (PV) mount associated with it.
  • Velero then attempted to fetch the Persistent Volume Claim (PVC) referenced by that volume.
  • The PVC lookup failed because the namespace it belonged to had already been deleted and was no longer available.

What Happened:

The cluster backup was initiated while certain resources, including a namespace and its associated pod, were still in the Terminating phase. Velero proceeded with the backup and resolved those resources: it identified the PV mount associated with the pod, but failed when fetching the PVC referenced by that volume, because the namespace the PVC belonged to had already been deleted by then. As a result, Velero could not complete the pod volume backup for that PVC, leading to potential inconsistencies in the backup data.
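For context, a minimal client-go sketch of the kind of lookup that fails here (the function name pvcsForPod and the exact error wording are illustrative assumptions, not Velero's actual code at pkg/podvolume/backupper.go): for each PVC-backed volume on a pod, the PVC is fetched from the pod's namespace, and the Get returns NotFound once that namespace has been deleted.

```go
// Hedged sketch, not Velero's implementation: resolve the PVCs behind a pod's
// volumes. When the pod's namespace has already been deleted, the PVC Get
// returns NotFound, which surfaces as the "error getting persistent volume
// claim for volume" errors in the logs above.
package sketch

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func pvcsForPod(ctx context.Context, client kubernetes.Interface, pod *corev1.Pod) ([]*corev1.PersistentVolumeClaim, error) {
	var pvcs []*corev1.PersistentVolumeClaim
	for _, vol := range pod.Spec.Volumes {
		if vol.PersistentVolumeClaim == nil {
			continue // not a PVC-backed volume
		}
		pvc, err := client.CoreV1().PersistentVolumeClaims(pod.Namespace).
			Get(ctx, vol.PersistentVolumeClaim.ClaimName, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			// The PVC was garbage-collected together with its terminating namespace.
			return nil, fmt.Errorf("error getting persistent volume claim for volume: %w", err)
		}
		if err != nil {
			return nil, err
		}
		pvcs = append(pvcs, pvc)
	}
	return pvcs, nil
}
```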

Environment:

  • Velero version (use velero version): v1.12.3

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
qiuming-best (Contributor) commented:

If you back up a namespace whose resources are being deleted, the error reported by Velero is expected; we should not ignore such errors.

blackpiglet (Contributor) commented May 9, 2024

I agree with @qiuming-best.
The reason is that Velero does not understand dependencies between k8s resources. In most cases, Velero collects the resources to back up in alphabetical order.

As a result, Velero can skip resources that already have a DeletionTimestamp set, but it does not take the DeletionTimestamp of a namespace-scoped resource's namespace into account.
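To make the point concrete, a minimal sketch of the extra check that would be needed (the function shouldSkipItem and its signature are illustrative assumptions, not existing Velero code): skipping an item whose own DeletionTimestamp is set is straightforward, but skipping an item whose namespace is terminating requires an additional namespace lookup per item.

```go
// Hedged sketch of the additional check described above; names are illustrative.
package sketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func shouldSkipItem(ctx context.Context, client kubernetes.Interface, obj metav1.Object) (bool, error) {
	// Case 1: the item itself is already being deleted.
	if obj.GetDeletionTimestamp() != nil {
		return true, nil
	}
	// Case 2: a namespace-scoped item whose owning namespace is being deleted.
	if nsName := obj.GetNamespace(); nsName != "" {
		ns, err := client.CoreV1().Namespaces().Get(ctx, nsName, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // namespace already gone; nothing consistent left to back up
		}
		if err != nil {
			return false, err
		}
		if ns.DeletionTimestamp != nil {
			return true, nil
		}
	}
	return false, nil
}
```

Case 2 would require an extra namespace Get (or informer cache lookup) per namespaced item, which matches the point above that Velero currently only considers the item's own DeletionTimestamp.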

qiuming-best added the Icebox (We see the value, but it is not slated for the next couple releases.) label on May 13, 2024