Terraform-aws-eks: Cannot find 'wget' and build then times out

Created on 12 Apr 2020 · 13 comments · Source: terraform-aws-modules/terraform-aws-eks

I have issues

Building the EKS cluster times out due to the script not finding 'wget'

I'm submitting a...

  • [x] bug report
  • [ ] feature request
  • [ ] support request - read the FAQ first!
  • [ ] kudos, thank you, warm fuzzy

What is the current behavior?

module.my-cluster.null_resource.wait_for_cluster[0]: Still creating... [4m50s elapsed]
module.my-cluster.null_resource.wait_for_cluster[0] (local-exec): /bin/sh: wget: command not found
module.my-cluster.null_resource.wait_for_cluster[0] (local-exec): /bin/sh: wget: command not found
module.my-cluster.null_resource.wait_for_cluster[0]: Still creating... [5m0s elapsed]
module.my-cluster.null_resource.wait_for_cluster[0] (local-exec): TIMEOUT


Error: Error running command 'for i in `seq 1 60`; do wget --no-check-certificate -O - -q $ENDPOINT/healthz >/dev/null && exit 0 || true; sleep 5; done; echo TIMEOUT && exit 1': exit status 1. Output: /bin/sh: wget: command not found
/bin/sh: wget: command not found

If this is a bug, how to reproduce? Please include a code sample if relevant.

Run this script

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "tennis-eks-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-2a", "us-east-2b", "us-east-2c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  enable_vpn_gateway = true

  tags = {
    Terraform = "true"
    Environment = "dev"
  }
}

data "aws_eks_cluster" "cluster" {
  name = module.my-cluster.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.my-cluster.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
  version                = "~> 1.9"
}

module "my-cluster" {
  source          = "terraform-aws-modules/eks/aws"
  cluster_name    = "tennis-eks-cluster"
  cluster_version = "1.15"
  subnets         = module.vpc.private_subnets
  vpc_id          = module.vpc.vpc_id

  worker_groups = [
    {
      instance_type = "m4.large"
      asg_max_size  = 5
    }
  ]
}

What's the expected behavior?

The cluster should be built and the process end gracefully.

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version:
  • OS: unknown
  • Terraform version:
Terraform v0.12.19
+ provider.aws v2.57.0
+ provider.kubernetes v1.11.1
+ provider.local v1.4.0
+ provider.null v2.1.2
+ provider.random v2.2.1
+ provider.template v2.1.2

Any other relevant info

All 13 comments

Not a bug.

Due to the way the EKS service works, we have to ping the new Kubernetes endpoint until it responds correctly. The Terraform kubernetes provider does not retry; it fails immediately instead. The module therefore has a null_resource block that, by default, runs wget in a loop via local-exec, i.e. on the machine running Terraform.

You have a few options here:

  • Turn off applying the aws-auth ConfigMap with the module (manage_aws_auth = false). This skips the endpoint liveness check, but you will then have to configure aws-auth yourself.
  • Install wget in your deployment environment.
  • Change the liveness ping command to something that works in your environment. See the wait_for_cluster_cmd and wait_for_cluster_interpreter variables.
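The liveness check described above is a retry loop around an HTTP GET of `$ENDPOINT/healthz`. The following is a minimal POSIX `sh` sketch, assuming you want one command that works whether `wget` or `curl` is installed; the function names `probe` and `wait_for_endpoint` are hypothetical and are not part of the module. One behavioral difference worth flagging: `curl` exits 0 on HTTP 4xx/5xx responses unless `--fail` is passed, whereas `wget` exits non-zero, so the sketch adds `--fail` to keep both branches equivalent.

```shell
#!/bin/sh
# Hypothetical replacement for the module's default liveness loop.
# Uses wget if present, otherwise curl; these names are NOT module APIs.

# probe URL: exit 0 only if URL answers with a successful HTTP status.
probe() {
  if command -v wget >/dev/null 2>&1; then
    wget --no-check-certificate -O - -q "$1" >/dev/null
  elif command -v curl >/dev/null 2>&1; then
    # --fail: return non-zero on HTTP 4xx/5xx, matching wget's behavior.
    curl -k -s --fail "$1" >/dev/null
  else
    echo "neither wget nor curl is installed" >&2
    return 127
  fi
}

# wait_for_endpoint URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL/healthz; defaults mirror the module's `seq 1 60` and `sleep 5`.
wait_for_endpoint() {
  url=$1
  attempts=${2:-60}
  delay=${3:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    probe "$url/healthz"
    rc=$?
    [ "$rc" -eq 0 ] && return 0
    # No HTTP client at all: retrying cannot help, so fail immediately.
    [ "$rc" -eq 127 ] && return 127
    sleep "$delay"
    i=$((i + 1))
  done
  echo TIMEOUT
  return 1
}
```

To use it with the module, you could save it as a script (for example a hypothetical `wait_for_endpoint.sh` ending in `wait_for_endpoint "$ENDPOINT"`) and point `wait_for_cluster_cmd` at it; the module's default command already reads the cluster endpoint from `$ENDPOINT`. Failing fast when no client exists avoids the silent multi-minute timeout reported in this issue.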

It still seems like a bug. If the module is going to use wget, it should ensure it's there (or do something else). Otherwise the result is the same: the module breaks.
$0.02

@gamename What do you suggest? The module uses wget by default; it's written in the docs and the variable specs: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/variables.tf#L201-L211

@barryib Good question. Try adding something to the assumptions section stating that wget must be supported in the EKS node. Because if it isn't, timing out is the expected behavior.

Getting hit with this as well. Would it make sense to add a local-exec that checks whether wget is installed? My stack is stuck creating and has just hit the 40-minute mark for this check.

I added wget shortly after I saw what was happening, but it's been 37 minutes since then.
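A fail-fast check along the lines suggested above could look like the following sketch. It is an assumption, not something the module provides: `require_tools` is a hypothetical helper you might run in your own `local-exec` or CI step before `terraform apply`, so that a missing `wget` surfaces immediately instead of after the retry loop times out.

```shell
#!/bin/sh
# require_tools CMD...: hypothetical fail-fast check, run before `terraform apply`.
# Prints the missing commands to stderr and returns 1 if any are absent.
require_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to something runnable.
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "missing required tool(s):$missing" >&2
    return 1
  fi
  return 0
}
```

For example, `require_tools wget || exit 1` at the top of a deployment script stops the run with a clear message. Note that this only checks the environment where the script itself runs, which must be the same environment in which Terraform executes its local-exec commands.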

Maybe. But how can we ensure this easily (without adding depends_on everywhere)? Would it be the first action Terraform runs?

I think documenting wget as an explicit requirement, or asking users to change wait_for_cluster_cmd to fit their needs, is enough.

But anyway, I'll be happy to review any PR that fixes this.

using curl instead of wget:

  wait_for_cluster_cmd   = "for i in `seq 1 60`; do curl -k -s $ENDPOINT/healthz >/dev/null && exit 0 || true; sleep 5; done; echo TIMEOUT && exit 1"

Running into the same issue.

Please use curl, or whatever you prefer, to call an HTTP endpoint. There is a variable that lets you set the command you want. See https://github.com/terraform-aws-modules/terraform-aws-eks/issues/829#issuecomment-626132066.

Keeping this issue open until we improve the docs to highlight this "requirement".

Same error here; thanks to @twzhangyang it works. Edit your eks-cluster.tf and add the overridden wait_for_cluster_cmd variable. Here is mine:

module "eks" {
  source       = "terraform-aws-modules/eks/aws"
  cluster_name = local.cluster_name
  subnets      = module.vpc.private_subnets

  tags = {
    Environment = "training"
    GithubRepo  = "terraform-aws-eks"
    GithubOrg   = "terraform-aws-modules"
  }

  vpc_id = module.vpc.vpc_id

  worker_groups = [
    {
      name                          = "worker-group-1"
      instance_type                 = "t2.small"
      additional_userdata           = "echo foo bar"
      asg_desired_capacity          = 2
      additional_security_group_ids = [aws_security_group.worker_group_mgmt_one.id]
    },
    {
      name                          = "worker-group-2"
      instance_type                 = "t2.medium"
      additional_userdata           = "echo foo bar"
      additional_security_group_ids = [aws_security_group.worker_group_mgmt_two.id]
      asg_desired_capacity          = 1
    },
  ]

  wait_for_cluster_cmd = "for i in `seq 1 60`; do curl -k -s $ENDPOINT/healthz >/dev/null && exit 0 || true; sleep 5; done; echo TIMEOUT && exit 1"
  cluster_version = "1.17"
}

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

Thanks @PierrickMartos and @twzhangyang.

I tried to create the cluster as you described, but it fails with the error below.

I added the following line to my EKS cluster definition:
wait_for_cluster_cmd = "for i in seq 1 60; do curl -k -s $ENDPOINT/healthz >/dev/null && exit 0 || true; sleep 5; done; echo TIMEOUT && exit 1"

Error details:
Error: Error running command 'for i in seq 1 60; do curl -k -s $ENDPOINT/healthz >/dev/null && exit 0 || true; sleep 5; done; echo TIMEOUT && exit 1': exec: "/bin/sh": file does not exist. Output:

Version details: I use Windows 10 as my environment and connect to AWS for EKS provisioning.
Terraform v0.12.26
provider.aws v2.70.0
provider.kubernetes v1.12.0
provider.local v1.4.0
provider.null v2.1.2
provider.random v2.3.0
provider.template v2.1.2

@BalajiSivarajRajan set wait_for_cluster_interpreter to match your interpreter https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/variables.tf#L206.

Windows does not have /bin/sh by default. See the FAQ.
