Terraform-provider-kubernetes: 'Already exist' after delete on GKE.

Created on 21 Nov 2019  ·  23 Comments  ·  Source: hashicorp/terraform-provider-kubernetes

_This issue was originally opened by @AK-GD as hashicorp/terraform#23446. It was migrated here as a result of the provider split. The original body of the issue is below._


Terraform Version

Terraform v0.12.16

  • provider.google v3.0.0-beta.1
  • provider.kubernetes v1.10.0

Issue:

Not sure if this is a bug, but the behaviour is strange. I was trying to re-run apply after it initially failed on a timeout. GKE has some delay when removing resources. Possibly the API sent a 'complete' reply while the resource was physically still present in a destroying state, and it looks like Terraform doesn't handle that. After the failure, 'kubectl get pod' was still showing some pods in 'terminating' state.

google_container_cluster.default: Modifications complete after 5m0s [id=projects/experiments/locations/us-west1-b/clusters/carts-001]
kubernetes_deployment.carts1: Destroying... [id=default/carts-deployment-1]
kubernetes_deployment.carts2: Destroying... [id=default/carts-deployment-2]
kubernetes_deployment.carts1: Destruction complete after 2s
kubernetes_deployment.carts2: Destruction complete after 2s
kubernetes_deployment.carts1: Creating...
kubernetes_deployment.carts2: Creating...

Error: Failed to create deployment: object is being deleted: deployments.apps "carts-deployment-1" already exists

on deployments.tf line 27, in resource "kubernetes_deployment" "carts1":
27: resource "kubernetes_deployment" "carts1" {

bug

All 23 comments

We have this on EKS as well with Kubernetes 1.13

Experiencing the same issue when tainting and then running apply on deployments.

Terraform v0.12.2

  • provider.google v2.20.0
  • provider.kubernetes v1.10.0
    K8s v1.14 (1.14.8-gke.17)

Didn't have this with kubernetes-provider v1.9.0

This also happens under the following setup:
Terraform v0.12.18

  • provider.google v3.3.0
  • provider.kubernetes v1.10.0
    K8s v1.14 (1.14.8-gke.17)

I am experiencing the same issue as well. It is killing the deployment but doesn't appear to be waiting long enough for it to complete before installing the new one. Rerunning the terraform script typically works the second time.

  • Terraform v0.11.14
  • provider.google v2.20.0
  • provider.kubernetes v1.10.0

We are facing the same issue, too. It is actually a critical issue, because it deletes the deployment and leaves no working pods.

Terraform: latest (v0.12.18)
Provider: AWS EKS
Kubernetes: Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.9-eks-c0eccc", GitCommit:"c0eccca51d7500bb03b2f163dd8d534ffeb2f7a2", GitTreeState:"clean", BuildDate:"2019-12-22T23:14:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

I want to provide logs to help.

module.microservice.data.aws_eks_cluster.primary: Refreshing state...
module.microservice.data.aws_eks_cluster_auth.primary: Refreshing state...
module.microservice.kubernetes_deployment.main: Refreshing state... [id=default/*******-ms-****-v1]
module.microservice.kubernetes_horizontal_pod_autoscaler.main: Refreshing state... [id=default/*******-ms-****-v1]
module.microservice.kubernetes_service.main: Refreshing state... [id=default/ms-****-v1]
module.microservice.kubernetes_deployment.main: Destroying... [id=default/*******-ms-****-v1]
module.microservice.kubernetes_deployment.main: Destruction complete after 0s
module.microservice.kubernetes_deployment.main: Creating...

Error: Failed to create deployment: object is being deleted: deployments.apps "*******-ms-****-v1" already exists

  on .terraform/modules/microservice/deployment.tf line 1, in resource "kubernetes_deployment" "main":
   1: resource "kubernetes_deployment" "main" {


Exited with code exit status 1

Actually, the deployment is deleted successfully from Kubernetes, but the create then fails. Rerunning helps, but it causes service unavailability.

Same issue. Easy to reproduce by deploying a resource, tainting, and trying to re-deploy the same resource.

Terraform v0.12.18

provider.google v3.3.0
provider.kubernetes v1.10.0
K8s v1.14 (1.14.8-gke.17)

I had a similar issue, being bitten by delayed delete, when I was trying to update a K8s resource.

However, I wondered why Terraform wanted to delete my resource instead of patching it with just the handful of updates I had made. Examining Terraform's plan, it turned out that I had modified a field that is not supposed to be updated on an existing resource, hence the delete and re-create. (I vaguely remember that I was updating the selector of a replicaset by including the ever-changing commit ID in the labels.)

After fixing my templates, Terraform no longer wanted to delete my resource but patched it politely, so it was a happy workaround that made me write better templates.

Does anyone know whether the response returned from the Kubernetes API means:

"Confirmed. I am going to delete your object"

vs

"Confirmed. I deleted your object"

As far as I know, all of Kubernetes is based on controllers and queues, and it takes time for the controllers to change the state of all objects. The behaviour is "I have received your request, I can do it and will make it happen".

@curtbushko I have not looked into it in detail but the behavior of kubectl delete deployment and terraform seem to differ.

Kubectl is deleting the deployment instantly. The pods that were part of the deployment linger for several seconds but they are eventually deleted as well (due to --cascade=true being the default). This seems to suggest that the API is returning "Confirmed. I deleted your object" but I'm not positive.

If I watch the state of the kubernetes deployment from both the command line and the GKE dashboard when I do terraform apply it looks like terraform is cleaning up the pods first (and possibly other stuff) before removing the deployment. I see the pod ready count go to 0, GKE displays the message "Does not have minimum availability" for some time, eventually the deployment is removed.

So it looks to me like terraform is performing several steps as part of the delete process but returning before waiting for the steps to complete.

@jasonmcboyd After a minimal amount of looking at the code, delete is doing:

var (
    cascadeDeletePolicy = metav1.DeletePropagationForeground
    deleteOptions       = metav1.DeleteOptions{
        PropagationPolicy: &cascadeDeletePolicy,
    }
)

The only possible options from the Kubernetes API are "Orphan", "Background" and "Foreground". The comment for Foreground says:

    // The object exists in the key-value store until the garbage collector
    // deletes all the dependents whose ownerReference.blockOwnerDeletion=true
    // from the key-value store.  API sever will put the "foregroundDeletion"
    // finalizer on the object, and sets its deletionTimestamp.  This policy is
    // cascading, i.e., the dependents will be deleted with Foreground.

Which sounds like the correct thing.

But maybe the problem is that 'ownerReference.blockOwnerDeletion=true' needs to be set for all deployment children on creation...

I think you're right, curtbushko, and we can add this. From https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#-strong-write-operations-deployment-v1-apps-strong-

blockOwnerDeletion | boolean | If true, AND if the owner has the "foregroundDeletion" finalizer, then the owner cannot be deleted from the key-value store until this reference is removed. Defaults to false. To set this field, a user needs "delete" permission of the owner, otherwise 422 (Unprocessable Entity) will be returned.

We don't have to set it on creation only though. We can add it just before deletion, since we basically always want to wait until it is completely deleted, especially in cases where a destroy is walking up the dependency tree.

I have this issue when using Terraform with a DigitalOcean Kubernetes cluster as well. I am running Terraform on Terraform Cloud.

Whenever a kubernetes_deployment change happens, I will get an error like:

Error: Failed to create deployment: object is being deleted: deployments.apps "nginx-ingress-controller" already exists

But when I check the kubernetes cluster, the deployment is deleted. I would then need to trigger another terraform run, and the next run applies the deployment correctly.

The trouble with this bug is it makes changes non-atomic.

Although deleting a deployment or other high-level resource in order to recreate it when a non-modifiable field changes is not ideal, at least it is predictable if it runs smoothly.

In the current situation, you need to apply the Terraform configuration, only for it to break, and then reapply it, leading to quite a long period where no deployments are available with the new changes. It's a bit of a pain.

In our case this happens after changing deployment state with kubectl. For example, if we scale a deployment up or down manually with kubectl, the next deployment with the Terraform scripts gives an error.

I think I'm observing the same behaviour as @amirashad. TBH I see nothing wrong with scaling up and down with kubectl: Terraform should either modify the deployment to match its state or ignore it, if lifecycle ignore_changes is set accordingly. Stopping with an error in this case is not the desired behavior. 😉

... Terraform should either modify the deployment to match its state or ignore it, if lifecycle ignore_changes is set accordingly. Stopping with an error in this case is not the desired behavior. 😉

Has anyone successfully worked around this with lifecycle ignore_changes? In our case, we see this error even without auto- or manually-scaling (AWS EKS).

Tracing the Terraform calls, it asks for a foreground delete:

```
2020-06-25T13:16:05.134+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: -----------------------------------------------------
2020-06-25T13:16:05.209+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: 2020/06/25 13:16:05 [INFO] Deleting deployment: "iam-login-dpl"
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: 2020/06/25 13:16:05 [DEBUG] Kubernetes API Request Details:
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: ---[ REQUEST ]---------------------------------------
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: DELETE /apis/apps/v1/iam-login-dpl HTTP/1.1

2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4:
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: {
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: "kind": "DeleteOptions",
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: "apiVersion": "apps/v1",
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: "propagationPolicy": "Foreground"
2020-06-25T13:16:05.212+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: }
2020-06-25T13:16:05.212+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4:
```

But maybe things are getting sidetracked here. How child objects of the deployment are deleted is not the issue.

The deployment is deleted according to Terraform, and recreation fails because it still exists.
The provider has to handle this correctly.

A lot of people are facing this issue.

@aareet thank you for adding a "bug" label to this. I am running into this and am going to see if I can introduce a sleep somewhere as a quick workaround.

Still happening in AWS EKS 1.17 with latest kubernetes provider

Opened #937 to address this

@aareet
looks promising, can it be released in 1.12.1?

This is in the changelog for v1.13.0

