Every time I create a cluster, I get this error message:
* module.eks.null_resource.update_config_map_aws_auth: Error running command
'kubectl apply -f /k8s/config-map-aws-auth_cluster.yaml --kubeconfig /k8s/kubeconfig_cluster': exit status 1.
Output: Unable to connect to the server: dial tcp xxx.20.110.151:443: i/o timeout
If I run terraform apply again, it works without any error.
To reproduce: create a cluster using the module.
Expected: no error message while creating the cluster.
I do not know what causes the issue.
Terraform v0.11.10
+ provider.aws v1.48.0
+ provider.local v1.1.0
+ provider.null v1.0.0
+ provider.template v1.0.0
OK, it looks like a problem that https://github.com/terraform-aws-modules/terraform-aws-eks/pull/187 will solve.
I upgraded to 1.8.0, but then it just silently fails:
module.eks.null_resource.update_config_map_aws_auth: Provisioning with 'local-exec'...
module.eks.null_resource.update_config_map_aws_auth (local-exec): Executing: ["/bin/sh" "-c" "for i in {1..5}; do kubectl apply -f /k8s/config-map-aws-auth_cluster.yaml --kubeconfig /k8s/kubeconfig_cluster && break || sleep 10; done"]
module.eks.aws_iam_role_policy_attachment.workers_autoscaling: Creation complete after 1s (ID: cluster20181207234911216400000003-2018120723491235770000000a)
The effect is that my worker nodes do not join the cluster:
$ kubectl get nodes
No resources found.
In version 1.7.0, it throws an error:
Releasing state lock. This may take a few moments...
Error: Error applying plan:
1 error(s) occurred:
* module.eks.null_resource.update_config_map_aws_auth: Error running command 'kubectl apply -f /k8s/config-map-aws-auth_cluster.yaml --kubeconfig /k8s/kubeconfig_cluster': exit status 1. Output: Unable to connect to the server: dial tcp 23.20.165.97:443: i/o timeout
I run terraform apply again and it works. In the end, all nodes join the cluster.
I still don't understand how so many people have this error 😆 Are you on some unreliable wifi or something?
No, I am connected via ethernet and fiber.
Maybe using the Kubernetes Provider instead of the null resource can help with this issue.
@max-rocket-internet Another thing I realized is that the issues started the moment I upgraded from Terraform 0.11.8 to 0.11.10. This could be a coincidence, because I changed other things as well.
Regarding using the Kubernetes Provider instead of the null resource: does it have some retry or timeout logic? Unfortunately, I don't think kubectl itself has any logic like this that we could use.
Regarding the issues starting with the upgrade from Terraform 0.11.8 to 0.11.10: could be, but I think the i/o timeout error message comes directly from kubectl?
I'm open to new solutions. Basically, we need the loop to exit with a non-zero status if it reaches the end without a successful kubectl apply.
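For reference, the loop in the 1.8.0 provisioner (for i in {1..5}; do kubectl apply ... && break || sleep 10; done) ends with the exit status of whatever ran last in its body. When every kubectl apply fails, that is the final sleep 10, which returns 0, so Terraform records success; this fits the "silently fails" behaviour reported above. A minimal sketch of a POSIX sh retry helper that instead returns the last failing status is below. The function name retry and its argument order are hypothetical, not something the module provides.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to MAX_ATTEMPTS times, sleeping
# DELAY_SECONDS between failed attempts. Returns 0 on the first success,
# or the last failing command's exit status once all attempts are used up.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # On success, stop immediately with status 0.
    "$@" && return 0
    # The && list failed, so $? is the command's own exit status.
    status=$?
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return "$status"
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

Used in place of the loop, it would look like: retry 5 10 kubectl apply -f /k8s/config-map-aws-auth_cluster.yaml --kubeconfig /k8s/kubeconfig_cluster. Because it counts with a while loop and arithmetic expansion rather than {1..5}, it should behave the same under bash and under a strict POSIX /bin/sh.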
I did not have the issue with module version 2.0.
Still an issue @Jeeppler ?
@max-rocket-internet it does not seem to be an issue any more. You can close this issue.
OK great!
@max-rocket-internet thanks for checking back.
Hi, I'm facing the same issue. Slightly different timeout message, but essentially the same issue.
Here's the exact error message:
module.eks.null_resource.update_config_map_aws_auth (local-exec): error: unable to recognize "./config-map-aws-<redacted>.yaml": Get https://<redacted>.yl4.eu-west-1.eks.amazonaws.com/api?timeout=32s: net/http: TLS handshake timeout
It's definitely not related to a network issue (wifi or otherwise) as I've experienced it when connected to different networks.
To be honest, I haven't tried this on any machine other than my laptop.
I execute terraform within a container.
OS: Mac OS X 10.13.6
Terraform: 0.11.11
Terraform AWS provider plugin: 1.54.0
Docker Desktop version: Community 2.0.0.0-mac81 (29211)
I am currently testing whether increasing the number of retries (e.g. from 5 to 50) in the aforementioned loop helps, i.e. for i in {1..5}; do kubectl apply [...]
Further information:
If I taint the resource (terraform taint -module=eks null_resource.update_config_map_aws_auth) and run terraform apply again, the operation succeeds almost immediately.
Update: setting the retries to 50 didn't help; the timeout happens exactly as before.
Interesting, so even 50 retries didn't help. What could possibly be the problem then?
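One thing that may be worth ruling out here, as an assumption rather than a confirmed cause: {1..5} is brace expansion, which bash and zsh support but POSIX sh does not. The 1.8.0 local-exec log above shows the loop running under /bin/sh -c, and on images where /bin/sh is dash (the default on Debian and Ubuntu, including Ubuntu 18.04), {1..50} is not expanded, so the loop body runs exactly once regardless of the number written. The sketch below counts iterations of a brace-range loop and of a portable counter loop under whatever shell runs it.

```shell
#!/bin/sh
# Count how often a brace-range loop body runs under the current shell.
# bash and zsh expand {1..50} to 50 words; dash leaves it as the single
# literal word "{1..50}", so the body runs once there.
brace_count=0
for i in {1..50}; do
  brace_count=$((brace_count + 1))
done
echo "brace-range loop iterations: $brace_count"

# A counted loop that runs 50 times in any POSIX shell, because it uses
# only test and arithmetic expansion instead of brace expansion.
portable_count=0
i=1
while [ "$i" -le 50 ]; do
  portable_count=$((portable_count + 1))
  i=$((i + 1))
done
echo "portable loop iterations: $portable_count"
```

Running it with sh inside the Terraform container shows which case applies: if the first line prints 1, changing the retry count in the provisioner cannot have any effect.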
Just like @marcelloromani, I run Terraform in a container. However, I run it in Docker on Linux Mint and use Ubuntu 18.04 LTS (Bionic) as the container base image.
After building up and tearing down EKS clusters yesterday, I can confirm @marcelloromani's issue. What I observed is that it sometimes works out of the box and sometimes fails. It is like flipping a coin.
This is what I receive if it does not work:
$ kubectl get nodes
No resources found.
This is what I can see in the Terraform output:
module.eks.null_resource.update_config_map_aws_auth: Still creating... (30s elapsed)
module.eks.null_resource.update_config_map_aws_auth (local-exec): Unable to connect to the server: dial tcp 18.215.4.135:443: i/o timeout
...
module.eks.null_resource.update_config_map_aws_auth: Creation complete after 41s (ID: 6879543812975939879)
Furthermore, it seems like module.eks.null_resource.update_config_map_aws_auth is running before the workers are available:
module.eks.null_resource.update_config_map_aws_auth: Still creating... (10s elapsed)
module.eks.aws_launch_configuration.workers: Still creating... (10s elapsed)
module.module.eks.aws_launch_configuration.workers: Creation complete after 11s (ID: mycluster-worker_group_mycluster2019010816241860230000000d)
module.module.eks.aws_autoscaling_group.workers: Creating...
arn: "" => "<computed>"
...
module.eks.null_resource.update_config_map_aws_auth: Still creating... (20s elapsed)
module.eks.aws_autoscaling_group.workers: Still creating... (10s elapsed)
module.eks.null_resource.update_config_map_aws_auth: Still creating... (30s elapsed)
module.module.eks.null_resource.update_config_map_aws_auth (local-exec): Unable to connect to the server: dial tcp 18.215.4.135:443: i/o timeout
module.eks.aws_autoscaling_group.workers: Still creating... (20s elapsed)
module.eks.null_resource.update_config_map_aws_auth: Still creating... (40s elapsed)
module.module.eks.null_resource.update_config_map_aws_auth: Creation complete after 41s (ID: 6879543812975939879)
module.eks.aws_autoscaling_group.workers: Still creating... (30s elapsed)
module.eks.aws_autoscaling_group.workers: Still creating... (40s elapsed)
module.module.eks.aws_autoscaling_group.workers: Creation complete after 40s (ID: mycluster-worker_group_mycluster2019010816242854200000000e)
I am not sure if this is supposed to be like that. My mental model is that the auto scaling group and workers have to be up and running before we can run the config map null resource.
In addition, is there any reason this module uses a null provider and a local exec resource instead of the kubernetes provider resource for the config map?
@max-rocket-internet please reopen this issue. I thought it was fixed, but more testing revealed it is not.
Yes sir!
@Jeeppler Are you using zsh by any chance?
Have you had a look at my PR https://github.com/terraform-aws-modules/terraform-aws-eks/pull/245
@Jeeppler I had the same question about why this module uses a null provider and a local-exec resource instead of the Kubernetes provider resource for the config map.
@marcelloromani I do not use zsh; I use Bash. However, what we both have in common is that we run Terraform in Docker containers. Maybe there is some networking issue.
@Jeeppler In order to try and debug the cause of the nodes not joining the cluster, I read the boot log of the (2 in my case) EC2 instances, and found a lot of "Unauthorized" entries.
In my particular case, the issue was that the aws auth config map wasn't applied to the cluster, which doesn't seem to be the case for you.
Thought I'd mention it anyway in the hope that it helps.
Closing after no update in a long time. Feel free to reopen 🙂
@max-rocket-internet I think I am running into a similar issue
Using:
Terraform version 0.11.13
terraform-aws-modules EKS version 4.0.2
terraform-aws-modules VPC version 1.64.0
When I run terraform apply, I see this output (Terraform reports success, by the way, with no error code):
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (30s elapsed)
module.eks.module.eks.null_resource.update_config_map_aws_auth (local-exec): Unable to connect to the server: dial tcp XXX.XXX.238.240:443: i/o timeout
module.eks.eks.aws_autoscaling_group.workers_launch_template: Still creating... (30s elapsed)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (40s elapsed)
module.eks.eks.aws_autoscaling_group.workers_launch_template: Still creating... (40s elapsed)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (50s elapsed)
module.eks.module.eks.null_resource.update_config_map_aws_auth (local-exec): error: You must be logged in to the server (the server has asked for the client to provide credentials)
module.eks.eks.aws_autoscaling_group.workers_launch_template: Still creating... (50s elapsed)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (1m0s elapsed)
module.eks.module.eks.aws_autoscaling_group.workers_launch_template: Creation complete after 56s (ID: xxx-dev-eks-02019050919521734110000000f)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (1m10s elapsed)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (1m20s elapsed)
module.eks.module.eks.null_resource.update_config_map_aws_auth (local-exec): error: You must be logged in to the server (the server has asked for the client to provide credentials)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (1m30s elapsed)
module.eks.module.eks.null_resource.update_config_map_aws_auth (local-exec): error: You must be logged in to the server (the server has asked for the client to provide credentials)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (1m40s elapsed)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (1m50s elapsed)
module.eks.module.eks.null_resource.update_config_map_aws_auth (local-exec): error: You must be logged in to the server (the server has asked for the client to provide credentials)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (2m0s elapsed)
module.eks.module.eks.null_resource.update_config_map_aws_auth (local-exec): error: You must be logged in to the server (the server has asked for the client to provide credentials)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (2m10s elapsed)
module.eks.module.eks.null_resource.update_config_map_aws_auth (local-exec): error: You must be logged in to the server (the server has asked for the client to provide credentials)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (2m20s elapsed)
module.eks.module.eks.null_resource.update_config_map_aws_auth (local-exec): error: You must be logged in to the server (the server has asked for the client to provide credentials)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (2m30s elapsed)
module.eks.module.eks.null_resource.update_config_map_aws_auth (local-exec): error: You must be logged in to the server (the server has asked for the client to provide credentials)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (2m40s elapsed)
module.eks.module.eks.null_resource.update_config_map_aws_auth (local-exec): error: You must be logged in to the server (the server has asked for the client to provide credentials)
module.eks.eks.null_resource.update_config_map_aws_auth: Still creating... (2m50s elapsed)
module.eks.module.eks.null_resource.update_config_map_aws_auth: Creation complete after 2m58s (ID: 4109496625117494020)
Running kubectl get nodes returns "No resources found."
Not sure how to solve this, but I think it's a related issue?
@bishtawi Are you sure you are providing the correct KUBECONFIG?
Pretty sure?
Right after I run terraform apply, I can run
aws --profile xxx-dev eks update-kubeconfig --name xxx-dev-eks
kubectl apply -f config-map-aws-auth_xxx-dev-eks.yaml
with no errors. And that gets the worker nodes to connect to the cluster.
And Terraform is using the same AWS profile.
EDIT: OK, no. Running kubectl get nodes --kubeconfig xxx-dev-eks gets me the "You must be logged in to the server (Unauthorized)" error.
Comparing the kubeconfig generated by aws eks update-kubeconfig with Terraform's kubeconfig, it looks like I am missing:
env:
  - name: AWS_PROFILE
    value: xxx-dev
And of course there's a Terraform option for that: kubeconfig_aws_authenticator_env_variables.
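To catch this before kubectl fails, a rough check of a generated kubeconfig is sketched below. kubeconfig_sets_aws_profile is a hypothetical helper: it only looks for a "- name: AWS_PROFILE" list item followed by a matching "value:" line, so it is a sanity check that assumes the layout shown above, not a YAML parser. As an alternative to editing the kubeconfig, exec credential plugins such as aws-iam-authenticator are started with kubectl's own environment plus any listed env entries, so running AWS_PROFILE=xxx-dev kubectl get nodes --kubeconfig xxx-dev-eks should have a similar effect.

```shell
#!/bin/sh
# kubeconfig_sets_aws_profile FILE [PROFILE]
# Hypothetical helper: succeeds if FILE contains a "- name: AWS_PROFILE"
# env entry. If PROFILE is given, the next line must also be
# "value: PROFILE" (optionally double-quoted).
kubeconfig_sets_aws_profile() {
  awk -v want="${2:-}" '
    # A list item of the form "- name: AWS_PROFILE".
    $1 == "-" && $2 == "name:" && $3 == "AWS_PROFILE" {
      if (want == "") ok = 1
      found = 1
      next
    }
    # The line right after it should carry the value.
    found {
      if ($1 == "value:" && ($2 == want || $2 == "\"" want "\"")) ok = 1
      found = 0
    }
    END { if (ok) exit 0; exit 1 }
  ' "$1"
}
```

For example: kubeconfig_sets_aws_profile xxx-dev-eks xxx-dev || echo "kubeconfig does not pass AWS_PROFILE=xxx-dev to the authenticator"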
@Jeeppler I saw this issue as well when I ran Terraform from within a container using https://github.com/toolbox-cli/toolbox, even though it had been working for weeks. I switched to a local install of Terraform v0.11.13 until I debug the toolbox container configuration.
The only recent change is that we nested our Terraform code one module level deeper, if that makes sense. Perhaps that, in combination, is the issue. As I mentioned above, this setup was working for me until we added more nested modules. Perhaps it's a shell issue within my container.
This issue seems related to https://github.com/terraform-aws-modules/terraform-aws-eks/issues/341, though