**What steps did you take and what happened:**
Upgraded Velero from 1.4.2 to 1.5.1; the controller starts printing in a loop that the backup storage location cannot be found. This setup has been working since pre-Velero days.
**What did you expect to happen:**
Velero would continue working as expected
**The output of the following commands will help us better understand what's going on:**
(Pasting long output into a GitHub gist or other pastebin is fine.)
`kubectl logs deployment/velero -n velero`:

```
time="2020-09-17T13:16:05Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2020-09-17T13:16:06Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2020-09-17T13:16:06Z" level=info msg="No backup locations were ready to be verified" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:120"
time="2020-09-17T13:17:06Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2020-09-17T13:17:07Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2020-09-17T13:17:07Z" level=info msg="No backup locations were ready to be verified" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:120"
time="2020-09-17T13:17:39Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2020-09-17T13:17:40Z" level=error msg="Error updating backup location phase" backupstoragelocation=default controller=backupstoragelocation error="backupstoragelocations.velero.io \"default\" not found" logSource="pkg/controller/backupstoragelocation_controller.go:115"
`kubectl get backupstoragelocations.velero.io default -oyaml`:

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  creationTimestamp: "2019-04-26T12:10:41Z"
  generation: 422143
  name: default
  namespace: velero
  resourceVersion: "396650219"
  selfLink: /apis/velero.io/v1/namespaces/velero/backupstoragelocations/default
  uid: 4f840d3b-681c-11e9-a059-42010aa4003f
spec:
  objectStorage:
    bucket: development-velero
  provider: gcp
status:
  lastSyncedRevision: 98023057-3395-4e86-9576-a61613299b65
  lastSyncedTime: "2020-09-17T13:19:30.516388304Z"
  phase: Available
```
I exported the backupstoragelocation manifest after rolling back, which is why the lastSyncedTime is later than in the logs shared above.
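For completeness, the location also looks fine when queried directly; the checks I mean are roughly these (just a sketch, assuming the default `velero` namespace):

```console
# Velero's own view of the configured locations
velero backup-location get -n velero

# the same object straight from the API server, plus just its phase
kubectl get backupstoragelocations.velero.io -n velero
kubectl get backupstoragelocation default -n velero -o jsonpath='{.status.phase}'
```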
**Anything else you would like to add:**
Rolling back to 1.4.2 mitigates this problem. I assume this is related to the changes introduced here: https://github.com/vmware-tanzu/velero/pull/2870
**Environment:**

- Velero version (use `velero version`): 1.5.1
- Velero features (use `velero client config get features`):
- Kubernetes version (use `kubectl version`): v1.16.13-gke.401

**Vote on this issue!**
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.
time="2020-09-17T13:17:07Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
is not necessarily an error, as the controller loops more often than individual backup locations are checked. I think we can look into turning down the frequency of the message, or changing the wording.
However, the error is concerning. I see that you have a manifest for the default location - were you able to make any backups at all on v1.5.1?
Actually, I assumed from these messages that it just didn't work, so I quickly rolled back. I tested now and triggered a manual backup - it worked as expected. So I guess this message **"Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available"** should be reworded? It makes it sound like there isn't a backup location available at all...
That error, though ("Error updating backup location phase"), appeared only once; I'm wondering if it has something to do with the pod shutting down during this loop or something?
I just made sure that a scheduled backup goes through as expected.
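For anyone wanting to run the same check, it amounts to something like this (the backup name is just an example):

```console
# trigger an ad-hoc backup and wait for it to finish
velero backup create test-upgrade-check --wait

# confirm the phase is Completed and skim the logs for errors
velero backup describe test-upgrade-check
velero backup logs test-upgrade-check
```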
Yeah, I agree that the wording could be revisited, so I'm going to leave this open to address it.
The error may have been due to the shutdown, yes, though it seems like it's related to the BSL not being present in the Kubernetes API server at all. However, if it's continuing to work and finding it, then perhaps it was a race condition with the pod shutdown.
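If anyone wants to confirm or rule that out, restarts around the error's timestamp would show up with checks along these lines (sketch; `<velero-pod>` is a placeholder for the actual pod name):

```console
# restart count and age of the Velero pod
kubectl get pods -n velero

# if it did restart, the previous container's logs show what it was doing on shutdown
kubectl logs <velero-pod> -n velero --previous | tail -n 50
```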
Just found this issue as a new user trying to set Velero up against an AWS cluster that I'm looking at sending offline to cold storage for a few weeks, hoping to have this as a safeguard that it will come back up on the other side of that window. I'd been beating my head against the wall wondering why I couldn't complete basic setup before attempting to use the product, but after reading this thread, perhaps I'll just move on to the documentation about scheduling and see what happens if I disregard this error, which has been cycling roughly every 60 seconds for the past 30+ hours.
I ended up on this GitHub issue because I interpreted this message not as debug output, but as an error I'm supposed to be worried about.
Maybe that should be logged at debug level instead of info, no?
I do not think Velero server version 1.5.1 is causing the issue here, because we have been using 1.5.1 since it was released and we are not hitting this issue.
We only faced this issue once recently, and to overcome it we reinstalled the Velero server, which worked.
I'm having the same issue: I updated from 1.4.2 to 1.5.2 and am getting the same error.
velero time="2020-10-23T19:38:34Z" level=error msg="Unable to update the request" controller=serverstatusrequest error="serverstatusrequests.velero.io \"velero-cli-kvq2h\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/internal/velero/serverstatusrequest.go:49" error.function="github.com/vmware-tanzu/velero/internal/velero.(*ServerStatus).PatchStatusProcessed" logSource="pkg/controller/server_status_request_controller.go:89" phase= serverStatusRequest=velero/velero-cli-kvq2h
velero time="2020-10-23T19:38:34Z" level=error msg="Unable to update the request" controller=serverstatusrequest error="serverstatusrequests.velero.io \"velero-cli-mf6mm\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/internal/velero/serverstatusrequest.go:49" error.function="github.com/vmware-tanzu/velero/internal/velero.(*ServerStatus).PatchStatusProcessed" logSource="pkg/controller/server_status_request_controller.go:89" phase= serverStatusRequest=velero/velero-cli-mf6mm
velero time="2020-10-23T19:38:34Z" level=error msg="Unable to update the request" controller=serverstatusrequest error="serverstatusrequests.velero.io \"velero-cli-zjtzn\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/internal/velero/serverstatusrequest.go:49" error.function="github.com/vmware-tanzu/velero/internal/velero.(*ServerStatus).PatchStatusProcessed" logSource="pkg/controller/server_status_request_controller.go:89" phase= serverStatusRequest=velero/velero-cli-zjtzn
velero time="2020-10-23T19:38:34Z" level=error msg="Unable to update the request" controller=serverstatusrequest error="serverstatusrequests.velero.io \"velero-cli-8hqg4\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/internal/velero/serverstatusrequest.go:49" error.function="github.com/vmware-tanzu/velero/internal/velero.(*ServerStatus).PatchStatusProcessed" logSource="pkg/controller/server_status_request_controller.go:89" phase= serverStatusRequest=velero/velero-cli-8hqg4
velero time="2020-10-23T19:38:38Z" level=error msg="Error patching backup location's last-synced time" backupLocation=default controller=backup-sync error="backupstoragelocations.velero.io \"default\" not found" error.file="/go/src/github.com/vmware -tanzu/velero/pkg/controller/backup_sync_controller.go:311" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:311"
For me, this is stopping Velero from working at all. When I run `velero version` from the client, it just times out getting the server version. I can't run any other `velero` commands even though the Pod does seem to be up and running.
Reverting to 1.4.2 seems to work fine.
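In case it helps, the server version can still be confirmed without the server round-trip; roughly this (a sketch, assuming the standard `velero` deployment name and namespace):

```console
# client version only, no call to the server
velero version --client-only

# server version straight from the deployment's image tag
kubectl get deployment velero -n velero \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```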
@longwa This looks like a separate error based on the messages; they're from a different controller.
@nrb Yeah, I guess you are right. It's very similar in behavior though. I'll open a separate issue for it.
I'm getting the same issue, and it's spamming the log every 5-10 seconds.
It is a fresh install using the Azure provider. Backups are working fine, though; I just can't get rid of it filling the log (workaround sketch after the excerpt below).
time="2020-10-28T12:39:59Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2020-10-28T12:39:59Z" level=info msg="No backup locations were ready to be verified" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:120"
time="2020-10-28T12:40:00Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2020-10-28T12:40:00Z" level=info msg="No backup locations were ready to be verified" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:120"
time="2020-10-28T12:40:03Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2020-10-28T12:40:03Z" level=info msg="No backup locations were ready to be verified" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:120"
time="2020-10-28T12:40:08Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2020-10-28T12:40:08Z" level=info msg="No backup locations were ready to be verified" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:120"
time="2020-10-28T12:40:18Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2020-10-28T12:40:19Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2020-10-28T12:40:19Z" level=info msg="No backup locations were ready to be verified" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:120"
@cdobinsky This is an expected message at this point in time, not an error.
These log messages have been improved with #3047 so I am going to close this issue.
I also have this error message but manual backup execution doesn't work either and prints the same error message during backup.
Upgraded from 1.4.0.
Reverting the installation back to 1.4.0 resolves the problem. The Helm chart is installed in both cases with Helm 3 and the same chart-values.yaml.
+1 on spending a few hours wondering what I was doing wrong, when in fact the system was working
+1 on this - I wasted several hours trying to figure out what was wrong.