Hi, I'm trying to solve the nginx use-case but using S3 for object storage.
I'm trying to restore a backup that was created by running ark backup create nginx-backup --selector app=nginx --snapshot-volumes. The command used for restoring is ark restore create nginx-backup --restore-volumes.
The backup itself is created successfully: the backup files are uploaded to object storage and the snapshot is created. The issue I'm facing is that the restore points to the same PV when restoring into a different k8s cluster, and the restored pod is stuck in STATUS ContainerCreating. Is there any way I can get it to create a new PV while restoring into a different cluster?
Output of ark backup describe nginx-backup
Name: nginx-backup
Namespace: heptio-ark
Labels: <none>
Annotations: <none>
Namespaces:
  Included: *
  Excluded: <none>
Resources:
  Included: *
  Excluded: <none>
  Cluster-scoped: auto
Label selector: app=nginx
Snapshot PVs: true
TTL: 720h0m0s
Hooks: <none>
Phase: Completed
Backup Format Version: 1
Expiration: 2018-04-07 19:10:11 +0000 UTC
Validation errors: <none>
Persistent Volumes:
  pvc-2db45bb0-22f8-11e8-82f2-0e21f011a24c:
    Snapshot ID: snap-0c9aad251b280516d
    Type: gp2
    Availability Zone: us-east-1a
    IOPS: <N/A>
Output of ark restore describe nginx-backup-20180308194129
Name: nginx-backup-20180308194129
Namespace: heptio-ark
Labels: <none>
Annotations: <none>
Backup: nginx-backup
Namespaces:
  Included: *
  Excluded: <none>
Resources:
  Included: *
  Excluded: nodes
  Cluster-scoped: auto
Namespace mappings: <none>
Label selector: <none>
Restore PVs: true
Phase: Completed
Validation errors: <none>
Warnings: <none>
Errors: <none>
YAML file used to create nginx-example:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: nginx-example
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
        - name: nginx-logs
          persistentVolumeClaim:
            claimName: nginx-logs
      containers:
        - image: nginx:1.7.9
          name: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: "/var/log/nginx"
              name: nginx-logs
              readOnly: false
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: my-nginx
  namespace: nginx-example
spec:
  ports:
    - port: 80
      targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer
NOTE: I'm using the k8s nodes' IAM instance profile to provide EC2 and S3 access to the Ark server, instead of a secret as mentioned in the example use-case.
Hi @bhargavpss, thanks for your report. FYI we're in the Kubernetes Slack in #ark-dr if you want real-time Q&A :-)
Are your 2 clusters in the same cloud region?
Could you please show kubectl -n nginx-example describe pod?
Hey @ncdc,
Yes, the two clusters are in the same region us-east-1 and the same account as well.
Output of kubectl -n nginx-example describe pod
Warning FailedMount 29m attachdetach AttachVolume.Attach failed for volume "pvc-2db45bb0-22f8-11e8-82f2-0e21f011a24c" : Error attaching EBS volume "vol-03d0039cdcca2fc9a" to instance "i-09b03f4a4bf903960": "UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: S-a2ASLzwJ9nBVU6f0aibSyj2D3pZgyJOyINWAq2dPNTNh9zIfR18JmUHovsgW3V5v0VprZLG7Ih-Xr_7TDgntbeITrxxG1jJ6kLAtNYpAOnjvPmBvuc8skXP5NSg5-lx8NHXtDM6URR0-SX3QjY1bAUtrVEs9iCEbB5TyiIpcxJ9KB8d25XFSDjCrxD6pW69zsEZwiDz3roPizswzWKTdu-zg2J3C0N-IoqtgkTMfwVPwQXigJBdDWYbzO2JkbIGOq_3T-i46fDEJaBR_7MpXRjymYyHC-QAmAoriQU4Ompmlg9cJFBj0hhjPimHr1By9xgy0O7KdrBpPpGj40FVK5XgPBTMrpaAhGk6oJJEGiXCPiuwubl4l3APYGtYtVj6SGWiqJfODq3PyNj7R2g6JiyAEMW2r36E-c56Ezo-8cqdmGkKEeFuFSyVebpabnQpO9IkZTqCUX7RViI73yH7bHBtboKT0gwPd7zg76_IpfUYkiaoqbkLWUyP03E1VWL7HGXfUSe6K2Ix_bM8-qHl6VCUB8oxbVDleHO0uiH6Jm-PmSe-I3WDMpF8MlJTsgOxLQp2Yx9pnEA6Js-MGe9UdsjYpyJ_pKC2vpyMJxRZ-fKszyhVon5frBsbtZS48Isam38BgIK-qZBc7_41B7rBKbBjdG6NZJV5R-VYhYW-dE-R5VHcEcRxBiqXtCoSw7X3Ix2zkOyqvdCNAM\n\tstatus code: 403, request id: 4b8cf55d-872c-4103-b0b1-7255c540a1c9"
Warning FailedMount 29m attachdetach AttachVolume.Attach failed for volume "pvc-2db45bb0-22f8-11e8-82f2-0e21f011a24c" : Error attaching EBS volume "vol-03d0039cdcca2fc9a" to instance "i-09b03f4a4bf903960": "UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: w0GQMhnWgO3aIGEe35lnsOpwS63v21ouwE-BPntKb_uhYZuj5NVwrPzs0vSFE-2Fqh2XKqrHAuMZ220gsjP51yxmLmdcRK4_4ZmfY9UyKG5PTamND34W65_6uADj2C_kZVrh28E5ut4N5gDd8WNF0jpjabBu8D27Ym8P_f2C0rHe3Vu-1IpuwHHMrjashgEW-7FK8-yymrhQW3822swy8ycGQQbViBiG8Himybev2wz5-ni5MN0LVZS8ifoTTOq4l7JiTBogQQ9bSPFWA96F_h0Rt-czFhZdyljXKxdPXgmffCG-PDoFCEo0zqWzoMccmdQd8M458Pz8YfUW0EI4WM64SdkVIwhX5yqsADWODU9ZDAS_9ClLca4tA5Th92HhV-2p4k2VbpDNLxL9jw_kLSY9K-SUtBn0woGiR9Mjljfx303XrlyFppSIYd0yptkyxLGWxm2Dk_cbBLvZf26iGt0fv0RzOVE4LA0jHu8jGT1YbMp62Cqz9-qtIwnwl18kA8FbQL0oGwPznhrKqTVlCjkAuUeWChZs2XpOpzvb13v-d2Ttg1SGjt8i2W7qJBrM2sV5X6Ir2CyxRqRKbOr4EwEa4QQRKrTkRcp5qVb3_p7QPQ7zdYMPxLmzMn7zpvUaLYbi_hatBG4c5ptyJQ4cHuj8Yl8Klk8d4EjN3AWU7nTKqPme5xHwh8s8xLG01N8z1bybv-9U9IbfkgA\n\tstatus code: 403, request id: f56d6588-a117-4ded-b02c-8b5b868d82ba"
Warning FailedMount 28m attachdetach AttachVolume.Attach failed for volume "pvc-2db45bb0-22f8-11e8-82f2-0e21f011a24c" : Error attaching EBS volume "vol-03d0039cdcca2fc9a" to instance "i-09b03f4a4bf903960": "UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: Nw_DiAjFtK-wC88Xzi5dC_1BUdncs1YycTIhGUfX9jwLfOowJveeMlR9dCL5zQV_whY30_UHejiXy39cAGnNs0Pr8YkatBuPHBcBngiy_cSxhzp2wNbhNyJxx3jZ3pF_0HMNuzJmn5oSRl2v7NBcHR9WQuADr7fzrAJs21nZBGgZjGT201_AE8iNZyu2yGqPNMmJX0ZaOdaMqy2XDPvR795WUGnaqrqr7D_mQNQohrNLAzQvN0iJa30iCGQYtg4FnDJy0QV65Zlr-U640hIOf08lVfMjc8Tn5X_a3XkqOhBtittW9tYoN0YO_W0_YuGvrnZM5xx0py1Vk57saQaBfmZz_Ii1wAqjmaVhLSJz7vwrgUJQrF6OUfqfvSrWOEIxisC3hXDHOYu3chJMD63UOFyhdoUuu3hGxyjb-LfgO2TaMZl0bnXlj1p4YXAq2wTP1xQbv2bOXsdanOUvrlwhpt_0p2SIDCWZkCs2y92SfQV6rUAhRIRBvyi_sCc2v6dbae9or6hyMLlqDRAoZuh5bivW4JrbkbilfU01A-tQ46PHgcsWYWzYOb6_aNgVaigaZSF4DLbORAl1HG41u5SpjtF8Om1lGOiCtb4EFenEHCdjqp7MpL744jbxd2Se45yuKhKfVh0IMXcF75zPt42cg5TnA0lXYOQ-J9EwTA63gV-ELx2n_hh5SL0_0c98PxevXIP7eIOqklkw0s4\n\tstatus code: 403, request id: c004f9d0-cee8-4979-927c-508f986f29b9"
Warning FailedMount 27m attachdetach AttachVolume.Attach failed for volume "pvc-2db45bb0-22f8-11e8-82f2-0e21f011a24c" : Error attaching EBS volume "vol-03d0039cdcca2fc9a" to instance "i-09b03f4a4bf903960": "UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: pZtXYGQz9FOeEercFJxYzcLJnueLAZWw6ztjrziTUym342AL4B4LL7qqWrTTLQSnq3tYHtQHeCPZlteda4C6VLGKOjvC62dIxJxW88b27jfwlDpYp3nUYQApLqRbxWezafaBjpkCdyS9t2-jJ5HX39fzHiIWQLFsJ67xG1H9n7TVoHGMTeUDkqml3eY8wLZa4C_KUC0XwKODU-A9Eo57sp-EXgddfSwAthGTsdK0oM_ncJu8KlWWpOoTf9qb0D1BDtDRBdSVZ3W82D8Lg5T35zX5ENdcFWcfAeF7mMwOcxMT36aA8Gs5s88isXjzR7aMvrrmRpy0gWtqPLHbq3thOp1070NMCVj6YdOSb8ya5wp4CYNUsj_kG8d_caDD-D4DKOzXgKHs7ads72UK05H5z04wcBIhKZmC7GfijBl_c_ySWv4q-UiKtOSwLb0ANTWqgPoF98XfdmdNAwfoo070hQTlc_wfaab6hNx2epFcglu1mEN-cA2nEBlm7t4LRt8b2Sk9LyqPWv_Fw7OajH_mVgq20G_y0lBo_8eOWW0F9pL_bjkcAWQHCDNBMqyGcbk8ja_VPfEDOvTQW8MtDQqEJo-MlPuItgouRs_thvPHevwuMwrflVFq6ewklfmwmgsnHVPb84OcFQQnCDJIbu4pp7So2po0c35VHjtEWRhrgiMhJknEhNq47jE0JMP6nvG0RtnODMpm5i-PJnQ\n\tstatus code: 403, request id: 6bbb9bfd-8c92-4ce1-bf03-2e16d071969d"
Warning FailedMount 1m (x13 over 25m) attachdetach (combined from similar events): AttachVolume.Attach failed for volume "pvc-2db45bb0-22f8-11e8-82f2-0e21f011a24c" : Error attaching EBS volume "vol-03d0039cdcca2fc9a" to instance "i-09b03f4a4bf903960": "UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: VJ9jW-_EPhGBd2Syur9KvvqPClxfPZib3vjnh7r31sDaeaoviZPb0dajrJSSm8WOeIJAFzdOfN-Qd8l7lXMY1fUP1VOWLsr1oEV-ziS-cgePKSoaFEEV2Qa7uLGO0n2EUOc-YWqz9mECbUST756o8P1uIJwnhtCliDU2iTzs7CxXY_Q2W8k4rkhQiYYRoIJ7ATbbS-oB69vU8ZCugflr09TG0JhPTHybCaLVtJHUnSJiSNuuTGw8YpYmtiiz_6TRtxbN6hGkZJtnb8OD1bluvqRPb9w8Pb9f58vs6du-Y9K-bCz7gJo7qK1sW9WW3Xf8fvU0nP_iqgOhJ_RQbRcIylibtZzDOfpLE8jFKySy5Nay5Z3QnBDQy4L5FeJWk0PrXhsRcZJIS8CMteJdh6G1D1VI0k9c2DuMS-dNEOfu1pnG8FH0i0oEL_pR4ZDm_ArsgZVw6n7bT0Elz8a4XhEiKDme_mWlNVA8VQPRSebZL9vL6IG9edD-sJdXHZQCrmKsowbeyhWBXKeIGJjUQFaRBj_NkditbNWvbVoes2BIgiNW37VsSedzX0SoZe3Kt0dB3RMgPQcN_Q7Mft0JJ5R8BCym9DwQE2AKT0eIhSWB7tMxqBaX1xou-CGHIbJ-stt0Xt077XghB9EFAk2pkmMXBpydvZZuXDvJJQyj3Z00CVf-iCS6UKgzAvfN8jB7SeWVvpKyNNanVn4t\n\tstatus code: 403, request id: 8447989d-7052-4b0d-9102-b5c6d1be36c2"
Warning FailedMount 44s (x13 over 28m) kubelet, ip-172-20-49-240.ec2.internal Unable to mount volumes for pod "nginx-deployment-644cf84b7f-6m6wh_nginx-example(b599b5ee-2308-11e8-82f8-0e8c84c7abe0)": timeout expired waiting for volumes to attach/mount for pod "nginx-example"/"nginx-deployment-644cf84b7f-6m6wh". list of unattached/unmounted volumes=[nginx-logs]
Warning FailedSync 44s (x13 over 28m) kubelet, ip-172-20-49-240.ec2.internal Error syncing pod
This seems to be the reason why: Error attaching EBS volume "vol-03d0039cdcca2fc9a" to instance "i-09b03f4a4bf903960": "UnauthorizedOperation: You are not authorized to perform this operation." Maybe something is wrong with ip-172-20-49-240.ec2.internal's cloud credentials / IAM instance profile?
@mattmoyer do you have any AWS queries you can suggest to see why this is happening?
Or if not the node, then the controller manager.
No. The instance profile for ip-172-20-49-240.ec2.internal has full EC2 access. I think the logs say UnauthorizedOperation because it is trying to attach the volume pvc-2db45bb0-22f8-11e8-82f2-0e21f011a24c, which is not possible because it is already in use by the first cluster.
If you look at the Persistent Volumes section in the output of ark backup describe nginx-backup, the restore is pointing at the same volume, which shouldn't be possible.
The name of the PV/PVC can be identical between 2 clusters - that's just a Kubernetes identifier. Ark changes the EC2 volume ID in the PV to be the ID of the new volume that is created from the snapshot.
You can probably look at the IDs of the volumes in the EC2 console - I would expect them to be different.
You can also decode the Encoded authorization failure message to get more info about what went wrong. See https://docs.aws.amazon.com/cli/latest/reference/sts/decode-authorization-message.html for more details.
e.g. the ID of the volume that's failing to attach is vol-03d0039cdcca2fc9a. You can check the ID of the volume in the original cluster by doing kubectl describe pv/<name of pv> and you should see a different vol-* ID.
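A quick sketch of both checks (the encoded message below is a placeholder for one of the blobs from your events, and the jsonpath assumes the PV uses the in-tree awsElasticBlockStore volume source):
$ # Decode the authorization failure to see exactly which API call and resource were denied
$ aws sts decode-authorization-message --encoded-message <encoded-message-from-the-event> --query DecodedMessage --output text
$ # Compare the EC2 volume ID recorded in the PV in each cluster
$ kubectl describe pv pvc-2db45bb0-22f8-11e8-82f2-0e21f011a24c
$ kubectl get pv pvc-2db45bb0-22f8-11e8-82f2-0e21f011a24c -o jsonpath='{.spec.awsElasticBlockStore.volumeID}'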
Hey, thank you so much for the help.
The volume IDs are indeed different. The issue was that EC2 permissions on the node alone are not sufficient, because this particular action is carried out by the master. I gave the required permissions to the master (roughly sketched below), but there seems to be another issue.
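For reference, a rough sketch of the kind of statement I added to the master's instance profile (the role and policy names are placeholders, and the action list is just my guess at what the controller manager needs for attach/detach, not an official policy):
$ aws iam put-role-policy --role-name <master-role-name> --policy-name allow-ebs-attach \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "ec2:AttachVolume",
            "ec2:DetachVolume",
            "ec2:DescribeVolumes",
            "ec2:DescribeInstances"
          ],
          "Resource": "*"
        }
      ]
    }'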
The volume that is created from the snapshot is getting attached as device /dev/sdf (the default), while the instance is looking for /dev/xvdf. I'm not sure if we can specify this somewhere as a flag, but I have tried manually attaching a new volume created from the snapshot and it seems to be working.
I probably need to customize Ark for this purpose. It would be good if we had some kind of flag/config where we could specify this.
The volume that is created from the snapshot is getting attached as device /dev/sdf (default) while the instance is looking for /dev/xvdf
What component specifically is looking for /dev/xvdf?
Also, what EC2 instance type(s) are you using?
Did you create the original EBS volume by hand, or use a PVC for dynamic provisioning?
Did you do anything to influence the device to be /dev/xvdf or was everything handled by the system in an automated fashion?
The instance that the volume is getting attached to.
Here's the output of kubectl describe pod -n nginx-example after giving the required permissions to the master instance profile:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 22m default-scheduler Successfully assigned nginx-deployment-644cf84b7f-h9mnb to ip-172-20-49-240.ec2.internal
Normal SuccessfulMountVolume 22m kubelet, ip-172-20-49-240.ec2.internal MountVolume.SetUp succeeded for volume "default-token-kv5pc"
Warning FailedMount 9m (x7 over 21m) attachdetach AttachVolume.Attach failed for volume "pvc-2db45bb0-22f8-11e8-82f2-0e21f011a24c" : disk attachment of "vol-03d0039cdcca2fc9a" to "ip-172-20-49-240.ec2.internal" failed: requested device "/dev/xvdf" but found "/dev/sdf"
The instance type is t2.medium.
Yes. I deleted the volume that was attached as device /dev/sdf, manually attached a new volume created from the snapshot, explicitly specifying the device /dev/xvdf, and it is working now (roughly the steps sketched below).
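For anyone curious, the manual workaround boils down to something like this (the IDs are the ones from this thread; treat them as placeholders for your own snapshot, instance, and new volume):
$ # Create a fresh volume from the Ark snapshot in the node's availability zone
$ aws ec2 create-volume --availability-zone us-east-1a --snapshot-id snap-0c9aad251b280516d --volume-type gp2
$ # Attach it, explicitly requesting the device name the node expects
$ aws ec2 attach-volume --volume-id <new-volume-id> --instance-id i-09b03f4a4bf903960 --device /dev/xvdf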
What kubernetes version? Kubernetes is in charge of attaching the volume and selecting the device name, so I would expect all this to work automatically and transparently.
Any chance your instance was using paravirtualization instead of HVM?
The Kubernetes version is 1.8.6.
I have deployed the cluster using kops. The AMI used for the nodes is k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14 (ami-8ec0e1f4).
Can you confirm, either in the EC2 web console or using the CLI, that the instance's virtualization type is listed as hvm?
Basically at this point you're running into what is potentially a Kubernetes bug, as best I can tell. Kubernetes is 100% responsible for attaching volumes to instances and requesting a device name, specifically in the /dev/xvd** format. Can you maybe check your CloudTrail logs to see if something else was responsible for attaching the disk as /dev/sdf?
We'll see if we can reproduce with a kops deployment, as well.
Here is the output of aws ec2 describe-instances --filters "Name=instance-id,Values=i-09b03f4a4bf903960" --query Reservations[].Instances[].[BlockDeviceMappings,RootDeviceName,RootDeviceType,VirtualizationType]
[
    [
        [
            {
                "DeviceName": "/dev/xvda",
                "Ebs": {
                    "Status": "attached",
                    "DeleteOnTermination": true,
                    "VolumeId": "vol-0081ff8b3ebc74cb1",
                    "AttachTime": "2018-03-08T19:31:19.000Z"
                }
            },
            {
                "DeviceName": "/dev/xvdf",
                "Ebs": {
                    "Status": "attached",
                    "DeleteOnTermination": false,
                    "VolumeId": "vol-0d3cc93a2d2324d8f",
                    "AttachTime": "2018-03-08T20:55:49.000Z"
                }
            },
            {
                "DeviceName": "/dev/xvdcv",
                "Ebs": {
                    "Status": "attached",
                    "DeleteOnTermination": false,
                    "VolumeId": "vol-03d0039cdcca2fc9a",
                    "AttachTime": "2018-03-08T21:01:11.000Z"
                }
            }
        ],
        "/dev/xvda",
        "ebs",
        "hvm"
    ]
]
OK, and this is after you manually attached it as xvdf, I assume.
yes
OK, I just created a cluster using kops and can reproduce the UnauthorizedOperation. We'll dig in and see if we can figure it out.
OK, I gave my master full EC2 access. I now see it attaching the volume to the node using /dev/xvd**, although for some reason it's timing out waiting for it to attach (not sure why).
OK, it finally attached. I was able to verify that the volume created from the snapshot had the data I expected (i.e. what Ark backed up).
If you have a reliable reproducer, we'll be happy to investigate further. At this time I honestly don't know why your disk was attached to /dev/sdf when Kubernetes requested /dev/xvdf.
Do you have logs from the controller-manager pod? Here is a sample from mine:
$ kubectl -n kube-system logs kube-controller-manager-ip-xxx-xxx-xxx-xxx.ec2.internal
I0309 16:31:33.057885 5 aws.go:1435] Assigned mount device cm -> volume vol-xxxxxxxxxxxxxxxxx
I0309 16:31:33.248799 5 aws.go:1708] AttachVolume volume="vol-xxxxxxxxxxxxxxxxx" instance="i-xxxxxxxxxxxxxxxxx" request returned {
AttachTime: 2018-03-09 16:31:33.262 +0000 UTC,
Device: "/dev/xvdcm",
InstanceId: "i-xxxxxxxxxxxxxxxxx",
State: "attaching",
VolumeId: "vol-xxxxxxxxxxxxxxxxx"
}
I0309 16:31:33.430355 5 aws.go:1557] Waiting for volume "vol-xxxxxxxxxxxxxxxxx" state: actual=attaching, desired=attached
I0309 16:31:43.534104 5 aws.go:1458] Releasing in-process attachment entry: cm -> volume vol-xxxxxxxxxxxxxxxxx
I0309 16:31:43.534133 5 operation_generator.go:278] AttachVolume.Attach succeeded for volume "pvc-84cf2e47-23b4-11e8-93ed-0e5a3b306290" (UniqueName: "kubernetes.io/aws-ebs/vol-xxxxxxxxxxxxxxxxx") from node "ip-xxx-xxx-xxx-xxx.ec2.internal"
According to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html:
Depending on the block device driver of the kernel, the device could be attached with a different name than you specified. For example, if you specify a device name of /dev/sdh, your device could be renamed /dev/xvdh or /dev/hdh. In most cases, the trailing letter remains the same. In some versions of Red Hat Enterprise Linux (and its variants, such as CentOS), even the trailing letter could change (/dev/sda could become /dev/xvde). In these cases, the trailing letter of each device name is incremented the same number of times. For example, if /dev/sdb is renamed /dev/xvdf, then /dev/sdc is renamed /dev/xvdg. Amazon Linux AMIs create a symbolic link for the name you specified to the renamed device. Other AMIs could behave differently.
@bhargavpss can you retry restoring once or twice and see if you get the same /dev/sdf type of behavior consistently? (FWIW, if you do, that's a Kubernetes issue, not an Ark one, unfortunately.)
Also, FYI, I think #341 should fix the UnauthorizedOperation issue we're seeing when the master tries to attach a volume created from a snapshot.
@bhargavpss we're going to use this issue to track the fact that a kops master isn't allowed to attach Ark-restored EBS volumes. Once #341 merges, it'll close out this issue.
If you continue to encounter problems with the device naming, please reach out either here or on Slack and we'll do everything we can to help. Thanks!
I just ran into this same problem 😄 Thanks to Steve, the fix has been merged into master. I'm upgrading my deployment to point to master and testing this out.
If you're using an IAM policy for Ark in AWS, make sure you add ec2:DescribeSnapshots to the policy.
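For example, one way to add it (the role and policy names are placeholders; adjust to however your Ark policy is managed):
$ aws iam put-role-policy --role-name <ark-node-role> --policy-name ark-describe-snapshots \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["ec2:DescribeSnapshots"],
          "Resource": "*"
        }
      ]
    }'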
Will I be able to recover from old backups? It seems like the ec2:DescribeSnapshots permission is used to store some metadata while creating backups.
If you have EBS snapshots that were created by an older version of Ark that doesn't have #341, you'll need to manually add the KubernetesCluster tag to them.
Name=KubernetesCluster?
Yes, with the value set appropriately.
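For example, something along these lines for each old snapshot (the snapshot ID is the one from this backup; the tag value is a placeholder and should match the cluster name your target cluster's AWS resources are tagged with):
$ aws ec2 create-tags --resources snap-0c9aad251b280516d --tags Key=KubernetesCluster,Value=<your-cluster-name>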