Terraform-aws-eks: Cluster never comes available after moving to 9.0.0

Created on 29 Feb 2020 · 9 Comments · Source: terraform-aws-modules/terraform-aws-eks

After moving to 9.0.0, the cluster availability check in null_resource.wait_for_cluster fails to detect when the cluster becomes available. The result is that it spins forever waiting. This appears to be related to 750.

The workaround is to override wait_for_cluster_cmd with the default value used prior to 750, e.g.
wait_for_cluster_cmd = "until curl -k -s $ENDPOINT/healthz >/dev/null; do sleep 4; done"
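To check the endpoint by hand without spinning forever, the same curl probe can be wrapped in a bounded retry loop. This is only a sketch: `retry_until` is a hypothetical helper, not part of the module, and the 75 tries with a 4 second delay (roughly 5 minutes) are arbitrary values chosen for illustration. `ENDPOINT` is assumed to hold the cluster endpoint URL, as in the command above.

```shell
#!/bin/sh
# retry_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds and gives up
# (returning 1) once MAX_TRIES attempts have failed.
retry_until() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Only probe when ENDPOINT is set in the environment.
if [ -n "${ENDPOINT:-}" ]; then
    # Same curl flags as the workaround; the limits are illustrative.
    if retry_until 75 4 curl -k -s -o /dev/null "$ENDPOINT/healthz"; then
        echo "cluster endpoint is answering"
    else
        echo "gave up waiting for $ENDPOINT/healthz" >&2
    fi
fi
```

If curl gets an answer here while the module's default check keeps waiting, the problem is most likely in the local client rather than in the cluster.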

I'm submitting a...

  • [ x] bug report
  • [ ] feature request
  • [ ] support request - read the FAQ first!
  • [ ] kudos, thank you, warm fuzzy

What is the current behavior?

module.eks.module.cluster.null_resource.wait_for_cluster[0]: Still creating... [20s elapsed]
module.eks.module.cluster.null_resource.wait_for_cluster[0]: Still creating... [30s elapsed]
...
module.eks.module.cluster.null_resource.wait_for_cluster[0]: Still creating... [1h0m42s elapsed]
module.eks.module.cluster.null_resource.wait_for_cluster[0]: Still creating... [1h0m52s elapsed]
eventually times out

If this is a bug, how to reproduce? Please include a code sample if relevant.

module "eks" {
  source                        = "terraform-aws-modules/eks/aws"
  version                       = "9.0.0"
  cluster_name                  = local.eks_cluster_name
  cluster_version               = var.eks_k8s_version
  subnets                       = var.private_subnet_ids
  vpc_id                        = var.vpc_id
  enable_irsa                   = true
  tags                          = merge(var.eks_tags, local.env_tags)
  cluster_enabled_log_types     = var.cluster_enabled_log_types
  cluster_log_retention_in_days = var.cluster_log_retention_in_days
  workers_additional_policies   = concat(["${aws_iam_policy.alb_ingress_node_policy.id}", "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy", "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"], var.workers_additional_policies)
  workers_group_defaults        = var.workers_group_defaults
  worker_groups_launch_template = var.worker_groups_launch_template
  node_groups_defaults          = var.node_groups_defaults
  node_groups                   = var.node_groups

  manage_aws_auth = true
  map_roles = [
    {
      rolearn  = data.aws_iam_role.sso_admin.arn
      username = "sso-admin"
      groups   = ["system:masters"]
    },
    {
      rolearn  = data.aws_iam_role.sso_pu.arn
      username = "sso-pu"
      groups   = ["system:masters"]
    },
    {
      rolearn  = data.aws_iam_role.sso_ro.arn
      username = "sso-ro"
      groups   = ["system:authenticated"]
    },
  ]
}

What's the expected behavior?

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version: 9.0.0
  • OS: os-x
  • Terraform version: 0.12.21

Any other relevant info

All 9 comments

wget --no-check-certificate -O - $ENDPOINT/healthz
--2020-02-29 11:20:14-- https://bf589asd1d6166cb4d8cc7243625e4e.gr7.us-east-1.eks.amazonaws.com/healthz
Resolving bf589asd1d6166cb4d8cc7243625e4e.gr7.us-east-1.eks.amazonaws.com (bf589dec1d6166cb4d8cc770df625e4e.gr7.us-east-1.eks.amazonaws.com)... 3.229.39.55, 52.3.37.102
Connecting to bf589asd1d6166cb4d8cc7243625e4e.gr7.us-east-1.eks.amazonaws.com (bf589dec1d6166cb4d8cc770df625e4e.gr7.us-east-1.eks.amazonaws.com)|3.229.39.55|:443... connected.
OpenSSL: error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Unable to establish SSL connection.

curl -k -s $ENDPOINT/healthz
ok%

What is your wget version? Can you please share your wget output in debug mode (--debug)?

wget --debug --no-check-certificate -O - https://2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com/healthz
Setting --check-certificate (checkcertificate) to 0
Setting --output-document (outputdocument) to -
DEBUG output created by Wget 1.17.1 on darwin14.5.0.

Reading HSTS entries from /Users/geiner/.wget-hsts
URI encoding = ‘UTF-8’
--2020-02-29 17:12:50-- https://2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com/healthz
Resolving 2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com (2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com)... 3.225.XXX.XXX, 52.23.XXX.XXX
Caching 2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com => 3.225.XXX.XXX 52.23.XXX.XXX
Connecting to 2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com (2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com)|3.225.XXX.XXX|:443... connected.
Created socket 6.
Releasing 0x00007fccd7001940 (new refcount 1).
Initiating SSL handshake.
SSL handshake failed.
OpenSSL: error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Closed fd 6
Unable to establish SSL connection.
Saving HSTS entries to /Users/geiner/.wget-hsts

It looks like the AWS API endpoint accepts only TLS 1.2, which is the problem here.

Your Wget version appears to be 1.17.1, which is from 2015, so quite old. I have checked GNU Wget 1.18 and newer, and they work as expected.

So you should update wget, or you can return to the previous method by overriding the default value of wait_for_cluster_cmd.
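A quick local check along the lines of the comment above might look like the following sketch. The 1.18 threshold comes only from that comment's observation, not from wget's release notes, and `version_ge` is a hypothetical helper written for this example. The version is parsed from the first line of `wget --version`, which for GNU Wget starts with `GNU Wget <version>`.

```shell
#!/bin/sh
# version_ge A B: succeeds when dotted numeric version A is >= B.
# Hypothetical helper; missing fields count as 0 (so 1.18 == 1.18.0).
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading field of each version string.
        af=${a%%.*}
        bf=${b%%.*}
        [ -n "$af" ] || af=0
        [ -n "$bf" ] || bf=0
        [ "$af" -gt "$bf" ] && return 0
        [ "$af" -lt "$bf" ] && return 1
        # Drop the consumed field (and its dot) before the next round.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Threshold taken from the comment in this thread (an assumption, not a spec).
MIN_WGET=1.18
if command -v wget >/dev/null 2>&1; then
    installed=$(wget --version 2>/dev/null | sed -n '1s/^GNU Wget \([0-9][0-9.]*\).*/\1/p')
    if [ -z "$installed" ]; then
        echo "could not parse the wget version; consider the curl workaround"
    elif version_ge "$installed" "$MIN_WGET"; then
        echo "wget $installed should be new enough"
    else
        echo "wget $installed is older than $MIN_WGET; upgrade it or use the curl workaround"
    fi
else
    echo "wget not found; use the curl workaround"
fi
```

The version number is only a proxy. What actually matters is whether the TLS library wget was built against can negotiate TLS 1.2, so a failing handshake on a newer wget would still point to the curl workaround.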

Yes, as @daroga0002 mentioned, you're trying to use TLS 1.0 instead of TLS 1.2. Please upgrade your wget or use curl.

@robgeiner Closing this. Feel free to reopen this issue if https://github.com/terraform-aws-modules/terraform-aws-eks/issues/757#issuecomment-593883841 doesn't help.

@robgeiner I am a Terraform noob here. I am following this guide:
https://learn.hashicorp.com/terraform/kubernetes/provision-eks-cluster
and hitting the wget issue. Where should I add this

wait_for_cluster_cmd = "until curl -k -s $ENDPOINT/healthz >/dev/null; do sleep 4; done"

to make it work without failing on wget? Thanks.

Yep, what @daroga0002 said. For example,
```
module "eks" {
  source               = "terraform-aws-modules/eks/aws"
  version              = "11.0.0"
  cluster_name         = local.eks_cluster_name
  wait_for_cluster_cmd = "until curl -k -s $ENDPOINT/healthz >/dev/null; do sleep 4; done"
  ...
}
```

Thanks, it worked.
