Rke: Failed to download etcd snapshot from GCS (S3-compatible backend)

Created on 25 Apr 2020 · 6 Comments · Source: rancher/rke

RKE version: 1.0.4

Docker version: (docker version,docker info preferred) 19.03

Operating system and kernel: (cat /etc/os-release, uname -r preferred) Ubuntu 18.04

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) GCE

cluster.yml file:

nodes:
- address: 10.148.72.59
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: ""
  user: rke_user
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/rke_user
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: 10.148.72.60
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: rke_user
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/rke_user
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: 10.148.72.61
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: rke_user
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/rke_user
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    snapshot: null
    retention: ""
    creation: ""
    backup_config:
      interval_hours: 1
      retention: 6
ssh_key_path: ~/.ssh/rke_user
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: false
kubernetes_version: "v1.15.9-rancher1-1"
private_registries: []
cluster_name: "rancher-ha-cluster"
cloud_provider:
  name: ""
prefix_path: ""
addon_job_timeout: 30
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: ""
  options: {}
  node_selector: {}
restore:
  restore: false
  snapshot_name: ""
dns: null

Steps to Reproduce:

1) Bring up an RKE cluster with 3 nodes and take a snapshot with rke etcd snapshot-save to GCS (S3-compatible backend). The backup is successful and the file is uploaded to GCS.

2) Try to restore the backup with the following command:

rke etcd snapshot-restore --config cluster.yaml --name "snap1" --s3 --access-key $accesskey --secret-key $secretkey --bucket-name "my-bucket" --s3-endpoint "storage.googleapis.com" --folder "backups"

Results:

INFO[0000] Running RKE version: v1.0.4
INFO[0000] Restoring etcd snapshot bkp1
INFO[0000] Successfully Deployed state file at [./cluster.rkestate]
INFO[0000] [dialer] Setup tunnel for host [10.148.72.60]
INFO[0000] [dialer] Setup tunnel for host [10.148.72.61]
INFO[0000] [dialer] Setup tunnel for host [10.148.72.59]
INFO[0000] Checking if container [cert-deployer] is running on host [10.148.72.59], try #1
INFO[0000] Image [rancher/rke-tools:v0.1.52] exists on host [10.148.72.59]
INFO[0000] Starting container [cert-deployer] on host [10.148.72.59], try #1
INFO[0001] Checking if container [cert-deployer] is running on host [10.148.72.59], try #1
INFO[0006] Checking if container [cert-deployer] is running on host [10.148.72.59], try #1
INFO[0006] Removing container [cert-deployer] on host [10.148.72.59], try #1
INFO[0006] [etcd] Get snapshot [bkp1] on host [10.148.72.59]
INFO[0006] Image [rancher/rke-tools:v0.1.52] exists on host [10.148.72.59]
INFO[0006] Starting container [etcd-download-backup] on host [10.148.72.59], try #1
INFO[0006] [etcd] Successfully started [etcd-download-backup] container on host [10.148.72.59]
INFO[0006] Waiting for [etcd-download-backup] container to exit on host [10.148.72.59]
INFO[0006] Container [etcd-download-backup] is still running on host [10.148.72.59]
INFO[0007] Waiting for [etcd-download-backup] container to exit on host [10.148.72.59]
INFO[0007] Removing container [etcd-download-backup] on host [10.148.72.59], try #1
FATA[0007] Failed to download etcd snapshot from s3, exit code [1]: time="2020-04-24T11:56:13Z" level=fatal msg="A header or query you provided requested a function that is not implemented."

All 6 comments

Okay, so this seems to be an issue with rancher/rke-tools, which uses the ListObjectsV2 API call to download the snapshot from the S3-compatible backend (GCS in my case). Since GCS doesn't support the ListObjectsV2 call yet, the download fails.

Could we handle this with a condition: if the S3 endpoint is storage.googleapis.com, use ListObjects; otherwise use ListObjectsV2?

This issue/PR has been automatically marked as stale because it has not had activity (commit/comment/label) for 60 days. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

Is there any solution for this issue? I get the same error when trying to restore from a GCP bucket.

Thank you in advance.

This should be fairly simple to implement based on the condition given above where we use a different function if endpoint == storage.googleapis.com.
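
The condition proposed above could look roughly like the following. This is only a minimal sketch in Python of the decision logic, not the rke-tools code (rke-tools is a Go project); the names `V1_ONLY_HOSTS`, `endpoint_host`, and `list_api_for` are hypothetical, and the only host assumed to need the older call is the one named in this thread.

```python
from urllib.parse import urlsplit

# Assumption: endpoints in this set do not implement ListObjectsV2.
# storage.googleapis.com is the only host reported in this thread.
V1_ONLY_HOSTS = {"storage.googleapis.com"}


def endpoint_host(endpoint: str) -> str:
    """Return the lowercase host of an endpoint given with or without a scheme."""
    # urlsplit only populates hostname when the string has a "//" authority
    # part, so add one for bare hosts such as "storage.googleapis.com".
    if "//" not in endpoint:
        endpoint = "//" + endpoint
    return (urlsplit(endpoint).hostname or "").lower()


def list_api_for(endpoint: str) -> str:
    """Name the S3 listing call to use for the given endpoint."""
    if endpoint_host(endpoint) in V1_ONLY_HOSTS:
        return "ListObjects"
    return "ListObjectsV2"
```

Comparing the parsed host rather than the raw string means the check still matches if the endpoint is supplied with a scheme or port. A real fix would apply the same branch around the actual listing calls of the S3 client in use.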

I faced the same issue on Rancher 2.5.3 when trying to restore a snapshot via the Rancher UI:

This cluster is currently Updating. Failed to download etcd snapshot from s3, exit code [1]: time="2020-12-03T19:07:45Z" level=fatal msg="A header or query you provided requested a function that is not implemented."
I am using S3 Region Endpoint: storage.googleapis.com

Taking a snapshot works, but restoring it fails with the error above.

Reproduced the issue on 2.5.2

  • Deploy a cluster and set the S3 endpoint to storage.googleapis.com.
  • Provide the access key, secret key, and bucket name.
  • After the cluster comes up Active, take a snapshot
  • Snapshot is saved successfully.
  • Restore from this snapshot
  • Error seen: Failed to download etcd snapshot from s3, exit code [1]: time="2020-12-09T21:04:32Z" level=fatal msg="A header or query you provided requested a function that is not implemented."

On master-head commit id: 278c1f988c and 2.5-head commit id: b2953cbc

  • Deploy a cluster and set the S3 endpoint to storage.googleapis.com.
  • Provide the access key, secret key, and bucket name.
  • After the cluster comes up Active, take a snapshot
  • Add a workload.
  • Take another snapshot.
  • Restore from the first snapshot.
  • Cluster is restored successfully and comes back up Active.
  • Workload will not be available in the cluster.