What steps did you take and what happened:
We're using velero 1.2.0 to back up our EKS cluster and some EFS PVs. The velero service account uses AWS's eks.amazonaws.com/role-arn annotation to attach an IAM role to it. This works fine for backing up data from velero itself. However, restic does not seem to pick up the AWS token through that mechanism, so it falls back to anonymous authentication and fails to store data in S3.
What did you expect to happen:
Restic should be able to back up using the same credentials velero uses.
The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)
Error message from backup logs:
time="2020-01-14T22:06:29Z" level=error msg="Error backing up item" backup=velero/infra-test-efs-backup-for-realz3 error="restic repository is not ready: error running command=restic init --repo=s3:s3-us-east-1.amazonaws.com/velero-platform-v2-backup/restic/kube-system --password-file=/tmp/velero-restic-credentials-kube-system355348533 --cache-dir=/scratch/.cache/restic, stdout=, stderr=Fatal: create key in repository at s3:s3-us-east-1.amazonaws.com/velero-platform-v2-backup/restic/kube-system failed: client.PutObject: Access Denied\n\n: exit status 1" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/repository_ensurer.go:144" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*repositoryEnsurer).EnsureRepo" group=v1 logSource="pkg/backup/resource_backupper.go:288" name=test-pod2 namespace= resource=pods
Restic pod logs:
time="2020-01-13T21:26:09Z" level=info msg="Setting log-level to INFO"
time="2020-01-13T21:26:09Z" level=info msg="Starting Velero restic server v1.2.0 (5d008491bbf681658d3e372da1a9d3a21ca4c03c)" logSource="pkg/cmd/cli/restic/server.go:62"
time="2020-01-13T21:26:10Z" level=info msg="Starting controllers" logSource="pkg/cmd/cli/restic/server.go:156"
time="2020-01-13T21:26:10Z" level=info msg="Starting controller" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:76"
time="2020-01-13T21:26:10Z" level=info msg="Waiting for caches to sync" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:79"
time="2020-01-13T21:26:10Z" level=info msg="Controllers started successfully" logSource="pkg/cmd/cli/restic/server.go:199"
time="2020-01-13T21:26:10Z" level=info msg="Starting controller" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:76"
time="2020-01-13T21:26:10Z" level=info msg="Waiting for caches to sync" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:79"
time="2020-01-13T21:26:10Z" level=info msg="Caches are synced" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:83"
time="2020-01-13T21:26:10Z" level=info msg="Caches are synced" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:83"
W0114 10:25:17.438939 1 reflector.go:302] github.com/vmware-tanzu/velero/pkg/cmd/cli/restic/server.go:197: watch of *v1.Secret ended with: too old resource version: 11493244 (11502245)
W0114 17:48:06.516931 1 reflector.go:302] github.com/vmware-tanzu/velero/pkg/cmd/cli/restic/server.go:197: watch of *v1.Secret ended with: too old resource version: 11609586 (11656444)
AWS Error message:
{
  "recipientAccountId": "REDACTED",
  "userIdentity": {
    "type": "AWSAccount",
    "principalId": "",
    "accountId": "ANONYMOUS_PRINCIPAL"
  },
  "responseElements": null,
  "errorMessage": "Access Denied",
  "requestID": "REDACTED",
  "managementEvent": false,
  "eventID": "REDACTED",
  "userAgent": "[Minio (linux; amd64) minio-go/v6.0.14]",
  "eventName": "PutObject",
  "resources": [
    {
      "type": "AWS::S3::Object",
      "ARN": "REDACTED/restic/kube-system/keys/165975608458061c33f1fd9d74c71589a39c1c603bd680ed027a26a7a42b3aa7"
    },
    {
      "type": "AWS::S3::Bucket",
      "ARN": "REDACTED",
      "accountId": "REDACTED"
    }
  ],
  "readOnly": false,
  "eventVersion": "1.06",
  "eventTime": "2020-01-14T22:05:04Z",
  "vpcEndpointId": "REDACTED",
  "requestParameters": {
    "key": "restic/kube-system/keys/165975608458061c33f1fd9d74c71589a39c1c603bd680ed027a26a7a42b3aa7",
    "bucketName": "REDACTED",
    "Host": "REDACTED"
  },
  "awsRegion": "us-east-1",
  "eventSource": "s3.amazonaws.com",
  "sharedEventID": "REDACTED",
  "additionalEventData": {
    "CipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
    "bytesTransferredOut": 243,
    "bytesTransferredIn": 0,
    "x-amz-id-2": "REDACTED"
  },
  "sourceIPAddress": "REDACTED",
  "errorCode": "AccessDenied",
  "eventType": "AwsApiCall"
}
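For anyone triaging similar CloudTrail events, the telling field in the event above is userIdentity.accountId being ANONYMOUS_PRINCIPAL, which means the request reached S3 without any credentials attached. A minimal sketch of checking for that in Python follows. The function name is hypothetical, and the sample is a trimmed copy of the event above, not a complete CloudTrail record.

```python
import json


def is_anonymous_request(event: dict) -> bool:
    """Return True when the event's caller is the anonymous principal.

    Hypothetical helper: it only inspects the userIdentity.accountId field,
    matching the value seen in the event pasted in this issue.
    """
    identity = event.get("userIdentity") or {}
    return identity.get("accountId") == "ANONYMOUS_PRINCIPAL"


# Trimmed copy of the event from this issue (most fields omitted).
sample = json.loads("""
{
  "userIdentity": {
    "type": "AWSAccount",
    "principalId": "",
    "accountId": "ANONYMOUS_PRINCIPAL"
  },
  "eventName": "PutObject",
  "errorCode": "AccessDenied"
}
""")

print(is_anonymous_request(sample))
```

If this reports an anonymous caller for restic's requests while velero's own requests show the assumed role, the credentials are not reaching restic at all, as opposed to the role lacking a permission.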
Anything else you would like to add:
This was an issue for us last fall with velero. The solution was to update the AWS SDK to the latest version, and things magically worked afterwards. Not sure what might be needed since restic is using minio though. Looks like the latest version of restic uses minio 6.0.43 so maybe updating to that is enough?
For more info on the service-account IAM role stuff:
https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
https://docs.aws.amazon.com/eks/latest/userguide/specify-service-account-role.html
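As a quick sanity check, with IAM roles for service accounts EKS injects the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables into pods that use the annotated service account. The sketch below, meant to be run inside the restic pod, reports whether those are present. The function name is hypothetical, and it only confirms that the pod received the IRSA settings, not that restic or minio-go knows how to use them.

```python
import os

# Environment variables that EKS injects for IAM roles for service accounts.
IRSA_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def irsa_status(env=os.environ) -> dict:
    """Report which IRSA variables are set and whether the token file exists.

    Hypothetical helper for debugging; `env` can be any mapping so the
    check can be exercised outside a cluster.
    """
    status = {name: bool(env.get(name)) for name in IRSA_VARS}
    token_path = env.get("AWS_WEB_IDENTITY_TOKEN_FILE", "")
    status["token_file_exists"] = bool(token_path) and os.path.isfile(token_path)
    return status


print(irsa_status())
```

If all three values come back true in the restic pod and S3 still sees anonymous requests, the gap is most likely in the client library rather than in the cluster setup.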
Environment:
velero version: 1.2.0
velero client config get features:
kubectl version: 1.14.9
/etc/os-release: amazon-linux
Tried building a local image of velero using restic 0.9.6 and that doesn't appear to have solved it for us, so not sure what would be necessary here.
Hi @geofffranks -- I think this is a limitation of restic itself, as it uses the minio SDK rather than the AWS one, so this mode of authentication likely isn't supported. You could try bringing it up there (https://github.com/restic/restic or https://forum.restic.net/), though I'm not sure how successful that would be. Sorry I don't have a better option for you.
@skriss Did you find a way to enable IAM for service accounts while using restic/velero?
Restic is using minio which already has support for IAM for service accounts (https://github.com/minio/minio-go/pull/1183). I created a feature request for restic to enable this feature as well (https://github.com/restic/restic/issues/2703).
@skriss According to https://github.com/restic/restic/pull/2733, restic now contains the fix for using IRSA.
Great, now just need to wait for a new restic release that contains it! cc @vmware-tanzu/velero-maintainers
@skriss Looks like restic have made a new release 0.10.0 in the last couple of days that contains restic/restic#2733
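To check whether the restic binary bundled in a given velero image is new enough, something like the sketch below could be used. It assumes the output of `restic version` starts with text of the form "restic 0.10.0 ..."; that format and the function name are assumptions, and the 0.10.0 threshold is taken from the comment above.

```python
import re

# First restic release reported in this thread to include restic/restic#2733.
MIN_VERSION = (0, 10, 0)


def restic_has_irsa_fix(version_output: str) -> bool:
    """Return True if the reported restic version is at least 0.10.0.

    Hypothetical helper; expects text shaped like "restic X.Y.Z ...".
    """
    match = re.search(r"restic (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"could not find a restic version in: {version_output!r}")
    version = tuple(int(part) for part in match.groups())
    return version >= MIN_VERSION


print(restic_has_irsa_fix("restic 0.9.6 compiled with go1.13.4 on linux/amd64"))
```

Comparing integer tuples rather than strings matters here, since a plain string comparison would rank "0.9.6" above "0.10.0".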
Is there an ETA for a velero docker image release bundled with restic 0.10.0?
@skriss
Any update on this yet? Or is it still dead in the water?
We need to upgrade Restic and test it. https://github.com/vmware-tanzu/velero/issues/3490
I did a test with Velero 1.6.0 and IRSA, and I was able to do the backups to S3 using the IAM role.
This issue may be closed...