Terraform-provider-kubernetes: 'Already exist' after delete on GKE.

Created on 21 Nov 2019 · 23 Comments · Source: hashicorp/terraform-provider-kubernetes

_This issue was originally opened by @AK-GD as hashicorp/terraform#23446. It was migrated here as a result of the provider split. The original body of the issue is below._

Terraform Version

Terraform v0.12.16

provider.google v3.0.0-beta.1
provider.kubernetes v1.10.0

Issue:

Not sure if this is a bug, but the behaviour is strange. I was re-running apply after it initially failed on a timeout. GKE has some delay removing resources. Possibly the API sent a 'complete' reply while the resource was physically still present in a destroying state, and it looks like Terraform doesn't handle that. After the failure, 'kubectl get pod' was still showing some pods in the 'Terminating' state.

google_container_cluster.default: Modifications complete after 5m0s [id=projects/experiments/locations/us-west1-b/clusters/carts-001]
kubernetes_deployment.carts1: Destroying... [id=default/carts-deployment-1]
kubernetes_deployment.carts2: Destroying... [id=default/carts-deployment-2]
kubernetes_deployment.carts1: Destruction complete after 2s
kubernetes_deployment.carts2: Destruction complete after 2s
kubernetes_deployment.carts1: Creating...
kubernetes_deployment.carts2: Creating...

Error: Failed to create deployment: object is being deleted: deployments.apps "carts-deployment-1" already exists

on deployments.tf line 27, in resource "kubernetes_deployment" "carts1":
27: resource "kubernetes_deployment" "carts1" {

bug

hashibot[bot]

👍40 👀3

All 23 comments

We have this on EKS as well with Kubernetes 1.13

SpamapS on 10 Dec 2019

👍7

Experiencing the same issue when tainting and then running apply on deployments.

Terraform v0.12.2

provider.google v2.20.0
provider.kubernetes v1.10.0
K8s v1.14 (1.14.8-gke.17)

Didn't have this with kubernetes-provider v1.9.0

This also happens under the following setup:
Terraform v0.12.18

provider.google v3.3.0
provider.kubernetes v1.10.0
K8s v1.14 (1.14.8-gke.17)

eddy-curv on 11 Dec 2019

👍2

I am experiencing the same issue as well. It is killing the deployment but doesn't appear to be waiting long enough for it to complete before installing the new one. Rerunning the terraform script typically works the second time.

Terraform: v0.11.14
provider.google v2.20.0
provider.kubernetes v1.10.0

mornindew on 12 Dec 2019

👍5

We are facing the same issue too. It is actually a critical issue, because it deletes the deployment and leaves no working pods.

Terraform: latest (v0.12.18)
Provider: AWS EKS
Kubernetes: Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.9-eks-c0eccc", GitCommit:"c0eccca51d7500bb03b2f163dd8d534ffeb2f7a2", GitTreeState:"clean", BuildDate:"2019-12-22T23:14:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

I want to provide logs to help.

module.microservice.data.aws_eks_cluster.primary: Refreshing state...
module.microservice.data.aws_eks_cluster_auth.primary: Refreshing state...
module.microservice.kubernetes_deployment.main: Refreshing state... [id=default/*******-ms-****-v1]
module.microservice.kubernetes_horizontal_pod_autoscaler.main: Refreshing state... [id=default/*******-ms-****-v1]
module.microservice.kubernetes_service.main: Refreshing state... [id=default/ms-****-v1]
module.microservice.kubernetes_deployment.main: Destroying... [id=default/*******-ms-****-v1]
module.microservice.kubernetes_deployment.main: Destruction complete after 0s
module.microservice.kubernetes_deployment.main: Creating...

Error: Failed to create deployment: object is being deleted: deployments.apps "*******-ms-****-v1" already exists

  on .terraform/modules/microservice/deployment.tf line 1, in resource "kubernetes_deployment" "main":
   1: resource "kubernetes_deployment" "main" {


Exited with code exit status 1

The deployment is actually deleted successfully from Kubernetes, but the create step fails. Rerunning helps, but the service is unavailable in the meantime.

amirashad on 8 Jan 2020

👍4

Same issue. Easy to reproduce by deploying a resource, tainting, and trying to re-deploy the same resource.

Terraform v0.12.18

provider.google v3.3.0
provider.kubernetes v1.10.0
K8s v1.14 (1.14.8-gke.17)

tjhiggins on 9 Jan 2020

👍6

I had a similar issue, bitten by the delayed delete, when I was trying to update a K8s resource.

However, I wondered why Terraform wanted to delete my resource instead of patching it with just the handful of updates I had made. Examining Terraform's plan showed that I had modified a field that is not supposed to be updated on an existing resource, hence the delete and re-create. (I vaguely remember that I was updating the selector of a replicaset by including the ever-updating commit-id in the labels.)

After fixing my templates, Terraform no longer wanted to delete my resource but patched it politely, so it was a happy workaround that made me write better templates.

DenesPal on 13 Jan 2020

👍1

Does anyone know whether the response returned from the Kubernetes API means:

"Confirmed. I am going to delete your object"

or:

"Confirmed. I deleted your object"

As far as I know, all of Kubernetes is based on controllers/queues, and it takes time for the controllers to change the state of all objects. The behaviour is "I have received your request, I can do it and will make it happen".

curtbushko on 23 Jan 2020

@curtbushko I have not looked into it in detail but the behavior of kubectl delete deployment and terraform seem to differ.

Kubectl is deleting the deployment instantly. The pods that were part of the deployment linger for several seconds but they are eventually deleted as well (due to --cascade=true being the default). This seems to suggest that the API is returning "Confirmed. I deleted your object" but I'm not positive.

If I watch the state of the kubernetes deployment from both the command line and the GKE dashboard when I do terraform apply it looks like terraform is cleaning up the pods first (and possibly other stuff) before removing the deployment. I see the pod ready count go to 0, GKE displays the message "Does not have minimum availability" for some time, eventually the deployment is removed.

So it looks to me like terraform is performing several steps as part of the delete process but returning before waiting for the steps to complete.

jasonmcboyd on 23 Jan 2020

@jasonmcboyd After a minimal amount of looking at the code, delete is doing:

var (
    cascadeDeletePolicy = metav1.DeletePropagationForeground
    deleteOptions       = metav1.DeleteOptions{
        PropagationPolicy: &cascadeDeletePolicy,
    }
)

The only possible options from the Kubernetes API are "Orphan", "Background" and "Foreground". The comment for Foreground says:

    // The object exists in the key-value store until the garbage collector
    // deletes all the dependents whose ownerReference.blockOwnerDeletion=true
    // from the key-value store.  API sever will put the "foregroundDeletion"
    // finalizer on the object, and sets its deletionTimestamp.  This policy is
    // cascading, i.e., the dependents will be deleted with Foreground.

Which sounds like the correct thing.

But maybe the problem is that 'ownerReference.blockOwnerDeletion=true' needs to be set for all deployment children on creation...

curtbushko on 23 Jan 2020
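
The foreground policy quoted above would explain the error in this issue: the delete request is accepted, but the object stays in the store, marked as deleting, until its dependents are gone, so an immediate create with the same name collides with it. The following is a toy Go model of that behaviour, not the provider's or the API server's code; `Store`, `DeleteForeground`, `CollectOne`, and `ErrBeingDeleted` are hypothetical names invented for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBeingDeleted echoes the wording of the error in this issue. The variable
// itself is hypothetical and not part of client-go.
var ErrBeingDeleted = errors.New("object is being deleted: already exists")

// object is a toy stand-in for a Deployment held by the API server.
type object struct {
	deleting   bool // true once a deletionTimestamp would have been set
	dependents int  // children the garbage collector still has to remove
}

// Store is a toy in-memory model of the key-value store.
type Store struct {
	objects map[string]*object
}

// NewStore returns an empty store.
func NewStore() *Store { return &Store{objects: map[string]*object{}} }

// Create fails while an object with the same name is present, even if that
// object is only waiting for its dependents to be removed.
func (s *Store) Create(name string, dependents int) error {
	if o, ok := s.objects[name]; ok {
		if o.deleting {
			return ErrBeingDeleted
		}
		return errors.New("already exists")
	}
	s.objects[name] = &object{dependents: dependents}
	return nil
}

// DeleteForeground marks the object as deleting and returns immediately.
// The object is only dropped right away if it has no dependents.
func (s *Store) DeleteForeground(name string) {
	o, ok := s.objects[name]
	if !ok {
		return
	}
	o.deleting = true
	if o.dependents == 0 {
		delete(s.objects, name)
	}
}

// CollectOne models the garbage collector removing one dependent; the owner
// disappears once no dependents remain.
func (s *Store) CollectOne(name string) {
	o, ok := s.objects[name]
	if !ok || !o.deleting {
		return
	}
	if o.dependents > 0 {
		o.dependents--
	}
	if o.dependents == 0 {
		delete(s.objects, name)
	}
}

func main() {
	s := NewStore()
	_ = s.Create("carts-deployment-1", 2)
	s.DeleteForeground("carts-deployment-1") // returns at once
	fmt.Println("create right after delete:", s.Create("carts-deployment-1", 2))
	s.CollectOne("carts-deployment-1")
	s.CollectOne("carts-deployment-1")
	fmt.Println("create after cleanup:", s.Create("carts-deployment-1", 2))
}
```

In this model the first re-create fails and the second succeeds, which matches the reports that a second `terraform apply` usually works.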

I think you're right, @curtbushko, and we can add this. From https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#-strong-write-operations-deployment-v1-apps-strong-

blockOwnerDeletion boolean | If true, AND if the owner has the "foregroundDeletion" finalizer, then the owner cannot be deleted from the key-value store until this reference is removed. Defaults to false. To set this field, a user needs "delete" permission of the owner, otherwise 422 (Unprocessable Entity) will be returned.

We don't have to set it on creation only though. We can add it just before deletion, since we basically always want to wait until it is completely deleted, especially in cases where a destroy is walking up the dependency tree.

SpamapS on 31 Jan 2020
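
One way to get the "wait until it is completely deleted" behaviour suggested above is to poll after the delete call until the object can no longer be found. Below is a minimal sketch in Go, assuming a hypothetical `exists` callback (in a real provider it would wrap a GET against the API and treat a not-found response as gone). `WaitUntilGone`, `ErrTimeout`, and the default durations are illustrative names, not the provider's API, and this is not necessarily how the eventual fix works.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Illustrative defaults; the values are arbitrary and chosen for the demo.
const (
	defaultTimeout  = time.Second
	defaultInterval = 10 * time.Millisecond
)

// ErrTimeout is returned when the object is still present at the deadline.
var ErrTimeout = errors.New("timed out waiting for deletion")

// WaitUntilGone polls exists until it reports false, returns an error, or the
// timeout elapses. exists is a hypothetical callback supplied by the caller.
func WaitUntilGone(exists func() (bool, error), timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		found, err := exists()
		if err != nil {
			return err // lookup failed for some other reason
		}
		if !found {
			return nil // object is gone, safe to re-create
		}
		if time.Now().After(deadline) {
			return ErrTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulate an object that is reported gone on the third poll.
	remaining := 3
	exists := func() (bool, error) {
		remaining--
		return remaining > 0, nil
	}
	err := WaitUntilGone(exists, defaultTimeout, defaultInterval)
	fmt.Println("wait finished, err =", err)
}
```

The delete step would only report completion after `WaitUntilGone` returns nil, so a following create no longer races with the pending deletion.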

I have this issue when using Terraform with a DigitalOcean Kubernetes cluster as well. I am running Terraform on Terraform Cloud.

Whenever a kubernetes_deployment change happens, I will get an error like:

Error: Failed to create deployment: object is being deleted: deployments.apps "nginx-ingress-controller" already exists

But when I check the kubernetes cluster, the deployment is deleted. I would then need to trigger another terraform run, and the next run applies the deployment correctly.

tnguyen14 on 17 Apr 2020

👍1

The trouble with this bug is that it makes changes non-atomic.

Deleting a deployment or other high-level resource in order to recreate it, when a change cannot be applied in place, is not ideal, but at least it is predictable if it runs smoothly.

In the current situation, you need to apply the Terraform configuration, only for it to break, and then reapply it, leading to quite a long period where no deployments with the new changes are available. It's a bit of a pain.

essjayhch on 21 Apr 2020

👍3

In our case this happens after changing deployment state with kubectl. For example if we are scaling up or down some deployment with kubectl manually, new deployment with terraform scripts will give an error.

amirashad on 12 May 2020

👍2

In our case this happens after changing deployment state with kubectl. For example if we are scaling up or down some deployment with kubectl manually, new deployment with terraform scripts will give an error.

I think I'm observing the same behaviour as @amirashad. TBH I see nothing wrong with scaling up and down with kubectl: Terraform should either modify the deployment to match its state or ignore it, if lifecycle ignore_changes is set accordingly. Stopping with an error in this case is not the desired behaviour. 😉

ivanilves on 8 Jun 2020

... Terraform should either modify the deployment to match its state or ignore it, if lifecycle ignore_changes is set accordingly. Stopping with an error in this case is not the desired behaviour. 😉

Has anyone successfully worked around this with lifecycle ignore_changes? In our case, we see this error even without auto- or manually-scaling (AWS EKS).

JeffBor on 24 Jun 2020

Tracing the Terraform calls shows that it asks for a foreground delete:

```
2020-06-25T13:16:05.134+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: -----------------------------------------------------
2020-06-25T13:16:05.209+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: 2020/06/25 13:16:05 [INFO] Deleting deployment: "iam-login-dpl"
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: 2020/06/25 13:16:05 [DEBUG] Kubernetes API Request Details:
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: ---[ REQUEST ]---------------------------------------
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: DELETE /apis/apps/v1/iam-login-dpl HTTP/1.1
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4:
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: {
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: "kind": "DeleteOptions",
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: "apiVersion": "apps/v1",
2020-06-25T13:16:05.211+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: "propagationPolicy": "Foreground"
2020-06-25T13:16:05.212+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4: }
2020-06-25T13:16:05.212+0200 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.3_x4:
```

mark-00 on 25 Jun 2020

But maybe things are getting sidetracked here. How child objects of the deployment are deleted is not the issue.

The deployment is deleted according to Terraform, and recreation fails because it still exists.
The provider has to handle this correctly.

A lot of people are facing this issue.

mark-00 on 25 Jun 2020

👍7

@aareet thank you for adding a "bug" label to this. I am running into this too, and am going to see if I can introduce a sleep somewhere as a quick workaround.

hcharley on 15 Jul 2020
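
A sleep-based workaround along these lines could take the form of a bounded retry around the create call that only retries while the error says the old object is still being deleted. The sketch below is a hedged illustration in Go: `CreateWithRetry`, `beingDeleted`, `errBeingDeleted`, and the `create` callback are hypothetical, and matching on the error text from this issue is an assumption; real code would inspect a typed API status error instead.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// retryDelay is an arbitrary pause between attempts, chosen for the demo.
const retryDelay = 5 * time.Millisecond

// errBeingDeleted imitates the error text reported in this issue.
var errBeingDeleted = errors.New(`object is being deleted: deployments.apps "demo" already exists`)

// beingDeleted reports whether err carries the "object is being deleted"
// text. Matching on message text is an assumption made for illustration.
func beingDeleted(err error) bool {
	return err != nil && strings.Contains(err.Error(), "object is being deleted")
}

// CreateWithRetry calls create up to attempts times, sleeping delay between
// tries. Only the "being deleted" error is retried; success or any other
// error is returned immediately.
func CreateWithRetry(create func() error, attempts int, delay time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = create()
		if !beingDeleted(err) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return err // still being deleted after the last attempt
}

func main() {
	// Simulate a create that collides twice with the pending deletion.
	calls := 0
	create := func() error {
		calls++
		if calls < 3 {
			return errBeingDeleted
		}
		return nil
	}
	err := CreateWithRetry(create, 5, retryDelay)
	fmt.Println("attempts:", calls, "err:", err)
}
```

A retry like this only hides the race from the caller; waiting for the deletion to finish before reporting the destroy as complete addresses it at the source.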

👍2

Still happening in AWS EKS 1.17 with latest kubernetes provider

JnMik on 5 Aug 2020

Opened #937 to address this

DrFaust92 on 5 Aug 2020

👍4

@aareet
This looks promising. Can it be released in 1.12.1?

eddy-curv on 17 Aug 2020

👀2

This is in the changelog for v1.13.0

hcharley on 2 Sep 2020

👍1

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

hashibot[bot] on 10 Oct 2020
