Terraform-provider-kubernetes: Terraform deletes the kubernetes_deployment resource successfully, but the resource is still alive in the Kubernetes cluster

Created on 5 Aug 2020  ·  6 Comments  ·  Source: hashicorp/terraform-provider-kubernetes

Hello guys.
I updated to EKS 1.17, so I think it's probably related to that, but it seems Terraform is not able to delete my deployments anymore. When I run terraform destroy on a kubernetes_deployment resource, it returns success, but the resources are still alive in the cluster.

Kubernetes platform

AWS EKS 1.17

Terraform Version and Provider Version

Terraform v0.12.25

  • provider.aws v3.0.0
  • provider.kubernetes v1.12.0

Affected Resource(s)

kubernetes_deployment
Child replicaset and pods

Debug Output

module.deployment.kubernetes_service.service: Destroying... [id=message-bus/event-store-stream]
module.deployment.kubernetes_horizontal_pod_autoscaler.hpa: Destroying... [id=message-bus/event-store-stream]
module.deployment.kubernetes_deployment.deployment: Destroying... [id=message-bus/event-store-stream]
module.deployment.kubernetes_horizontal_pod_autoscaler.hpa: Destruction complete after 0s
module.deployment.kubernetes_service.service: Destruction complete after 0s
module.deployment.kubernetes_deployment.deployment: Destruction complete after 0s
Destroy complete! Resources: 3 destroyed.

Expected Behavior

I expect the deployment and its associated replicaset and pods to be deleted.

Actual Behavior

Only the service and the HPA get deleted.
Running kubectl get still shows the resources that should have been deleted.

Any hints?

bug

All 6 comments

I had a similar issue in 1.15 and 1.16.

Opened #937 in hopes of solving it.

@DrFaust92 in my case it seems it will never be deleted.

But I guess your PR would still bring something relevant for my use case. Maybe the apply would eventually have timed out, since the deployment is not getting deleted.
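
To illustrate the "eventually time out" idea: a minimal Python sketch of polling until a resource is gone, with a deadline. This is not the code from #937, and the function name `wait_for_deletion` and its parameters are made up for the example. The `exists` callable stands in for whatever check you use (for instance a GET against the API server).

```python
import time


def wait_for_deletion(exists, timeout_s=60.0, interval_s=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll `exists()` until it returns False, or raise TimeoutError.

    `exists` is a hypothetical callable that returns True while the
    resource is still present in the cluster.
    """
    deadline = clock() + timeout_s
    while exists():
        if clock() >= deadline:
            raise TimeoutError("resource still exists after %.0fs" % timeout_s)
        sleep(interval_s)


# Usage with a fake check: the resource is "seen" twice, then disappears.
remaining = iter([True, True, False])
wait_for_deletion(lambda: next(remaining), sleep=lambda s: None)
print("deleted")
```

With a check that never returns False (as in this issue), a wait like this would fail with a timeout instead of reporting success.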

I ran terraform destroy with TF_LOG="debug", and I can see Kubernetes returns 200 even though no deletion occurs.
I'm trying to delete a deployment named zipkin in the monitoring namespace.
Here's the Kubernetes reply:

{
  "kind": "Deployment",
  "apiVersion": "apps/v1",
  "metadata": {
    "name": "zipkin",
    "namespace": "monitoring",
    "selfLink": "/apis/apps/v1/namespaces/monitoring/deployments/zipkin",
    "uid": "0897d35b-d26b-481d-8b93-acc6694d4dcd",
    "resourceVersion": "26976668",
    "generation": 2,
    "creationTimestamp": "2020-08-05T18:20:50Z",
    "deletionTimestamp": "2020-08-05T18:21:28Z",
    "deletionGracePeriodSeconds": 0,
    "labels": {
      "name": "zipkin"
    },
    "annotations": {
      "deployment.kubernetes.io/revision": "1"
    },
    "finalizers": [
      "foregroundDeletion"
    ]
  },
  "spec": {
    "replicas": 1,
    "selector": {
      "matchLabels": {
        "name": "zipkin"
      }
    },
    "template": {
      "metadata": {
        "creationTimestamp": null,
        "labels": {
          "name": "zipkin"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "zipkin",
            "image": "openzipkin/zipkin",
            "resources": {

            },
            "terminationMessagePath": "/dev/termination-log",
            "terminationMessagePolicy": "File",
            "imagePullPolicy": "Always"
          }
        ],
        "restartPolicy": "Always",
        "terminationGracePeriodSeconds": 10,
        "dnsPolicy": "ClusterFirst",
        "nodeSelector": {
          "nodegroup": "eks-monitoring"
        },
        "automountServiceAccountToken": false,
        "shareProcessNamespace": false,
        "securityContext": {

        },
        "schedulerName": "default-scheduler"
      }
    },
    "strategy": {
      "type": "RollingUpdate",
      "rollingUpdate": {
        "maxUnavailable": 1,
        "maxSurge": 2
      }
    },
    "revisionHistoryLimit": 10,
    "progressDeadlineSeconds": 600
  },
  "status": {
    "observedGeneration": 2,
    "replicas": 1,
    "updatedReplicas": 1,
    "readyReplicas": 1,
    "availableReplicas": 1,
    "conditions": [
      {
        "type": "Available",
        "status": "True",
        "lastUpdateTime": "2020-08-05T18:20:50Z",
        "lastTransitionTime": "2020-08-05T18:20:50Z",
        "reason": "MinimumReplicasAvailable",
        "message": "Deployment has minimum availability."
      },
      {
        "type": "Progressing",
        "status": "True",
        "lastUpdateTime": "2020-08-05T18:20:52Z",
        "lastTransitionTime": "2020-08-05T18:20:50Z",
        "reason": "NewReplicaSetAvailable",
        "message": "ReplicaSet \"zipkin-5d6d665688\" has successfully progressed."
      }
    ]
  }
}

What is this about "has successfully progressed" and "Deployment has minimum availability", I wonder?
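
A note on reading that reply: the two condition messages are just the deployment's last recorded status and have nothing to do with the delete. The relevant fields are `deletionTimestamp` and the `foregroundDeletion` finalizer. Together they mean the API server accepted the delete (hence the 200) and marked the object for foreground cascading deletion, so it stays visible until the garbage collector has removed the dependents and then the finalizer. A minimal Python sketch of that check follows; the helper name `deletion_state` is made up for the example, and the JSON is a trimmed copy of the reply above.

```python
import json


def deletion_state(obj: dict) -> str:
    """Classify a Kubernetes object by its deletion metadata."""
    meta = obj.get("metadata", {})
    finalizers = meta.get("finalizers") or []
    if not meta.get("deletionTimestamp"):
        return "live"
    if "foregroundDeletion" in finalizers:
        # Delete was accepted; the garbage collector still has to act.
        return "terminating: waiting for garbage collector (foregroundDeletion)"
    if finalizers:
        return "terminating: waiting on finalizers " + ", ".join(finalizers)
    return "terminating"


# Trimmed copy of the reply above (only the fields the check reads).
reply = json.loads("""
{
  "kind": "Deployment",
  "metadata": {
    "name": "zipkin",
    "namespace": "monitoring",
    "deletionTimestamp": "2020-08-05T18:21:28Z",
    "finalizers": ["foregroundDeletion"]
  }
}
""")

print(deletion_state(reply))
```

If an object stays in that state indefinitely, the garbage collector is the place to look.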

I activated control plane logging in CloudWatch and found out I had tons of cert-manager errors.
Deep in the noise I found this garbage collector error:

unable to sync caches for garbage collector
timed out waiting for dependency graph builder sync during GC sync

I deleted every cert-manager resource, thinking they might be overloading the garbage collector, and boom, the deletion of resources started working again.

Crazy stuff.
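
For anyone digging through the same noise: a minimal Python sketch that filters exported control plane log lines for the two garbage collector messages quoted above. The function name `find_gc_sync_errors` is made up, and the sample lines are invented placeholders; only the two GC messages come from the actual logs.

```python
# The two messages quoted in the comment above.
GC_ERROR_MARKERS = (
    "unable to sync caches for garbage collector",
    "timed out waiting for dependency graph builder sync during GC sync",
)


def find_gc_sync_errors(lines):
    """Return the log lines containing one of the GC sync error messages."""
    return [line for line in lines
            if any(marker in line for marker in GC_ERROR_MARKERS)]


# Invented sample input: two unrelated lines around the two GC messages.
sample = [
    "placeholder: some cert-manager error",
    "E0805 unable to sync caches for garbage collector",
    "placeholder: another unrelated line",
    "E0805 timed out waiting for dependency graph builder sync during GC sync",
]

for line in find_gc_sync_errors(sample):
    print(line)
```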
