Terraform-aws-eks: Fails to delete EKS cluster before attempting to recreate it

Created on 11 Feb 2019 · 12 Comments · Source: terraform-aws-modules/terraform-aws-eks

I have issues

This module fails to recreate the EKS cluster on a configuration change.

I'm submitting a...

  • [X] bug report
  • [ ] feature request
  • [ ] support request
  • [ ] kudos, thank you, warm fuzzy

What is the current behavior?

I have previously launched my EKS cluster into public subnets:

module "foo-cluster" {
  // ...
  subnets = ["${module.vpc.public_subnets}"]
}

Now I want to switch to private subnets:

module "foo-cluster" {
  // ...
  subnets = ["${module.vpc.private_subnets}"]
}

terraform plan (excerpt):

-/+ module.foo.module.foo-cluster.aws_eks_cluster.this (new resource required)
      id:                                         "foo-cluster" => <computed> (forces new resource)
      vpc_config.0.subnet_ids.1090847432:         "" => "subnet-..." (forces new resource)
      vpc_config.0.subnet_ids.1158933479:         "subnet-..." => "" (forces new resource)
      vpc_config.0.subnet_ids.368999755:          "subnet-..." => "" (forces new resource)
      vpc_config.0.subnet_ids.3812061593:         "" => "subnet-..." (forces new resource)
      vpc_config.0.subnet_ids.4007587422:         "" => "subnet-..." (forces new resource)
      vpc_config.0.subnet_ids.818659184:          "subnet-..." => "" (forces new resource)

terraform apply throws this error:

* aws_eks_cluster.this: error creating EKS Cluster (foo-cluster): ResourceInUseException: Cluster already exists with name: foo-cluster

In the AWS console I can see that it did not attempt to delete the cluster before attempting to recreate it. The error appeared almost instantly.

Environment details

Mac OS 10.13

Terraform and module versions:

* provider.aws: version = "~> 1.57"
* provider.external: version = "~> 1.0"
* provider.kubernetes: version = "~> 1.5"
* provider.local: version = "~> 1.1"
* provider.null: version = "~> 2.0"
* provider.random: version = "~> 2.0"
* provider.template: version = "~> 2.0"

All 12 comments

Funny, just today I wanted to change subnets and hit the same problem.
For me, -/+ (destroy and recreate) is not acceptable for changing a cluster in production, but it seems this is not a terraform-aws-eks module problem.
It looks like you can't change the subnets of an EKS cluster (please correct me if I'm wrong).

As an idea, could you try changing the name of the cluster as well, so that a new cluster is definitely created?

I'm not sure I follow: if you change the cluster name, it will be -/+ together with the workers/IAM/scaling groups...

According to the boto3 docs, you can't add or remove subnets, so I doubt that Terraform has a good solution to this.

Yeah, looking at the provider implementation, an update to anything but the cluster version is going to force recreation. @confiq, you went to the same place I did. In the boto3 SDK docs, you see that same operation (updating the cluster version) supported, but no other update operations. That's no coincidence: if the resource had a graceful way to migrate between subnets or perform other changes in place, we'd see SDKs exposing methods to do that easily.

This module can only do as well as the upstream provider, and unfortunately in this case, the best we can offer is a migration path. You could spin up a new cluster in the desired subnet, migrate all workloads, and cut traffic over to the new set of load balancers. Good luck!
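The migration path described above can be sketched as two module instances living side by side. This is a hypothetical sketch in Terraform 0.12 syntax (the original report uses 0.11-style interpolation); the registry source `terraform-aws-modules/eks/aws`, the second module name `foo-cluster-private`, and the `vpc` module outputs are assumptions, and `// ...` stands for the remaining inputs (worker groups, etc.) as in the snippets above.

```hcl
# Hypothetical sketch (Terraform 0.12 syntax): keep the existing cluster
# running in the public subnets while a second cluster is built.
module "foo-cluster" {
  source       = "terraform-aws-modules/eks/aws"
  cluster_name = "foo-cluster"
  vpc_id       = module.vpc.vpc_id
  subnets      = module.vpc.public_subnets
  // ...
}

# New cluster in the private subnets. It gets a distinct name, so creating
# it does not collide with the existing "foo-cluster".
module "foo-cluster-private" {
  source       = "terraform-aws-modules/eks/aws"
  cluster_name = "foo-cluster-private"
  vpc_id       = module.vpc.vpc_id
  subnets      = module.vpc.private_subnets
  // ...
}
```

Once workloads and traffic have moved to the new cluster, deleting the old `module "foo-cluster"` block lets Terraform destroy the original cluster. Because the two clusters never share a name, the create/destroy ordering problem in this issue does not come into play.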

@brandoconnor I'm not quite following: the issue I reported is that this module does not actually delete the cluster as the plan says it will. I'm not asking why it needs to delete the cluster and recreate it. The issue is that it doesn't delete it before attempting to recreate it.

I should have been more specific: I was addressing @confiq's issue.

Looking closer, Anton's response is probably what you're after: you can't have two clusters with the same name in the same region. For whatever reason, the provider seems to attempt creation before destroying the original cluster, hence your problem. If you look at the AWS API logs, you'll probably find a failed cluster creation rather than a failed delete.

I don't think there's anything to fix in this module to resolve your problem, but on your end you can add some randomization to your cluster names. When you move subnets, the new cluster comes up with a unique name, and everything proceeds as expected.

@brandoconnor That is exactly the solution I was thinking of. I tried moving the worker nodes between subnets and got the same result. I'm thinking of adding a 4-character random suffix to the cluster name.
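The cluster-name randomization suggested above can be done with the `random` provider already listed in the environment details. A minimal sketch, assuming Terraform 0.12 syntax and the module's `cluster_name` and `subnets` inputs; the resource name `cluster_suffix` and the choice of the subnet list as the keeper are hypothetical. The `keepers` map makes the suffix, and therefore the cluster name, change only when the subnets change, so routine applies keep the existing name.

```hcl
# Hypothetical sketch (Terraform 0.12 syntax, random provider ~> 2.0).
# Generates a 4-character lowercase alphanumeric suffix for the cluster name.
resource "random_string" "cluster_suffix" {
  length  = 4
  upper   = false
  special = false

  # Changing any keeper value forces a new suffix, which in turn gives the
  # replacement cluster a new name that cannot clash with the old one.
  keepers = {
    subnets = join(",", module.vpc.private_subnets)
  }
}

module "foo-cluster" {
  // ...
  cluster_name = "foo-cluster-${random_string.cluster_suffix.result}"
  subnets      = module.vpc.private_subnets
}
```

The trade-off, raised in the next comment, is that anything outside Terraform that hard-codes the cluster name must read the generated name instead.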

Why wouldn't this be fixed on the TF provider side? If the API behaves improperly (tries to create first), why not adjust the provider to explicitly delete first?

We have other components that rely on the cluster name, and adding random strings to solve this problem cascades into our tooling.

Same stuff. This is hella frustrating. The maintainers should have provided a CloudFormation template for VPC creation with proper subnet policies, then. Like end-to-end instructions on what to do with this repo to just deploy EKS with N nodes. F#!$

I tainted the cluster to recreate it:

terraform taint module.my-cluster.aws_eks_cluster.this

The plan output looks correct:

terraform plan

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
  ~ update in-place
+/- create replacement and then destroy
 <= read (data resources)

Terraform will perform the following actions:

  # module.my-cluster.data.aws_iam_policy_document.worker_autoscaling will be read during apply
  # (config refers to values not yet known)
 <= data "aws_iam_policy_document" "worker_autoscaling"  {
      ...

  # module.my-cluster.data.template_file.kubeconfig will be read during apply
  # (config refers to values not yet known)
 <= data "template_file" "kubeconfig"  {
      ...

  # module.my-cluster.data.template_file.userdata[0] will be read during apply
  # (config refers to values not yet known)
 <= data "template_file" "userdata"  {
      ...

  # module.my-cluster.aws_autoscaling_group.workers[0] will be updated in-place
  ~ resource "aws_autoscaling_group" "workers" {
      ...

  # module.my-cluster.aws_eks_cluster.this is tainted, so must be replaced
+/- resource "aws_eks_cluster" "this" {
      ...

  # module.my-cluster.aws_iam_policy.worker_autoscaling[0] will be updated in-place
  ~ resource "aws_iam_policy" "worker_autoscaling" {
      ...

  # module.my-cluster.aws_launch_configuration.workers[0] must be replaced
+/- resource "aws_launch_configuration" "workers" {
      ...

  # module.my-cluster.null_resource.update_config_map_aws_auth[0] will be created
  + resource "null_resource" "update_config_map_aws_auth" {
      ...

  # module.my-cluster.random_pet.workers[0] must be replaced
+/- resource "random_pet" "workers" {
      ...

Plan: 4 to add, 2 to change, 3 to destroy.

Then, when I apply it, Terraform doesn't even try deleting it first:

terraform apply "tfplan"

module.my-cluster.aws_eks_cluster.this: Creating...

Error: error creating EKS Cluster (my): ResourceInUseException: Cluster already exists with name: my
{
  ClusterName: "my",
  Message_: "Cluster already exists with name: my"
}

  on .terraform/modules/my-cluster/terraform-aws-modules-terraform-aws-eks-b69c8fb/cluster.tf line 9, in resource "aws_eks_cluster" "this":
   9: resource "aws_eks_cluster" "this" {

At this point I don't know what to do.

Hey @brandonjbjelland, could we reopen this issue? As the previous post from @tonglil shows, the provider tries to create the cluster before destroying it. The order is wrong and should be fixed.

I'm using Terraform 0.12.29 and AWS provider 2.70.
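The legend in @tonglil's plan marks `aws_eks_cluster.this` with `+/- create replacement and then destroy`, so Terraform itself planned to create the new cluster first. One plausible explanation, not confirmed in this thread: Terraform propagates `create_before_destroy` to the dependencies of a resource that sets it, so if something that depends on the cluster (for example a worker launch configuration whose user data references the cluster endpoint) sets `create_before_destroy = true`, the cluster is also replaced create-first. The minimal config below illustrates that pattern under this assumption; all resource and variable names are hypothetical and only loosely modeled on the module.

```hcl
# Hypothetical minimal config (Terraform 0.12 syntax) illustrating how
# create_before_destroy on a dependent resource can change the replacement
# order of the resource it depends on.
variable "cluster_role_arn" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "worker_ami_id" {
  type = string
}

resource "aws_eks_cluster" "this" {
  name     = "my"
  role_arn = var.cluster_role_arn

  vpc_config {
    subnet_ids = var.subnet_ids
  }
  # No lifecycle block: create_before_destroy is not set on the cluster itself.
}

resource "aws_launch_configuration" "workers" {
  name_prefix   = "my-workers-"
  image_id      = var.worker_ami_id
  instance_type = "t3.medium"

  # Referencing the cluster's name and endpoint makes this resource depend on
  # the cluster; a replaced cluster gets a new endpoint, so this is replaced too.
  user_data = "#!/bin/bash\n/etc/eks/bootstrap.sh ${aws_eks_cluster.this.name} --apiserver-endpoint ${aws_eks_cluster.this.endpoint}"

  # Assumption: because this resource is create-before-destroy, Terraform also
  # replaces its dependency (the cluster) create-first, so it tries to create a
  # second cluster named "my" while the old one still exists.
  lifecycle {
    create_before_destroy = true
  }
}
```

If this is the cause, the name collision is triggered by Terraform's replacement ordering rather than by the EKS API, which would also explain why the error appears before any delete is attempted.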

I don't think this repo should consider an upstream bug an issue that we also independently need to track. If you have ideas as to how to resolve the problem within this repo, maintainers are listening. Otherwise, keep the comments in the issue on the provider.
