Terraform-provider-azurerm: AKS load balancer public IP dependency problem

Created on 19 Nov 2019 · 10 Comments · Source: terraform-providers/terraform-provider-azurerm

When an Azure public IP is removed from a Kubernetes LoadBalancer service and deleted in the same apply, Terraform tries to delete the IP first. This fails because Azure does not allow a public IP to be deleted while it is still attached to the load balancer behind the service.

I can't think of any workaround for this.

Terraform (and AzureRM Provider) Version

Terraform v0.12.15
+ provider.azuread v0.6.0
+ provider.azurerm v1.35.0
+ provider.kubernetes v1.10.0
+ provider.local v1.4.0

Affected Resource(s)

variable "aks_configure_static_ip" {
    default = false
}

data "azurerm_kubernetes_cluster" "myfm-cluster" {
    name = "chaaksmyfm01"
    resource_group_name = data.azurerm_resource_group.project-rg.name
}

data "azurerm_resource_group" "aks-node-rg" {
  name = data.azurerm_kubernetes_cluster.myfm-cluster.node_resource_group
}

resource "azurerm_public_ip" "aks-public-ip" {

  count = var.aks_configure_static_ip ? 1 : 0
  #count = 0

  name = "chaakspubip"
  resource_group_name = data.azurerm_resource_group.aks-node-rg.name
  location = data.azurerm_resource_group.aks-node-rg.location

  allocation_method = "Static"

}

provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.myfm-cluster.kube_config.0.host
  username               = data.azurerm_kubernetes_cluster.myfm-cluster.kube_config.0.username
  password               = data.azurerm_kubernetes_cluster.myfm-cluster.kube_config.0.password
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.myfm-cluster.kube_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.myfm-cluster.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.myfm-cluster.kube_config.0.cluster_ca_certificate)
}

resource "kubernetes_service" "nginx_ingress_service" {

    metadata {
        name = "ingress-nginx"
        namespace = kubernetes_namespace.ingress_nginx.metadata.0.name

        labels = {
            "app.kubernetes.io/name": "ingress-nginx"
            "app.kubernetes.io/part-of": "ingress-nginx"
        }
    }

    spec {
        type = "LoadBalancer"
        external_traffic_policy = "Local"

        selector = {
            "app.kubernetes.io/name": "ingress-nginx"
            "app.kubernetes.io/part-of": "ingress-nginx"
        }

        port {
            name = "http"
            port = 80
            target_port = "http"
        }
        port {
            name = "https"
            port = 443
            target_port = "https"
        }

        load_balancer_ip = length(azurerm_public_ip.aks-public-ip) == 1 ? azurerm_public_ip.aks-public-ip[0].ip_address : null

    }

}

Debug Output

https://gist.github.com/erlacherl-city/d577efcc6736eb2533e3b3d43455c561

Panic Output

Expected Behavior

Terraform should redeploy the service first without the public IP, and then delete the IP.

Actual Behavior

Terraform tries to destroy the public IP first and fails.

Steps to Reproduce

1. Set the aks_configure_static_ip variable to true
2. terraform apply creates the public IP and attaches it to the Kubernetes LoadBalancer service
3. Set the aks_configure_static_ip variable to false
4. terraform apply tries to delete the public IP and fails because it is still attached
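
The reproduction can be sketched as two applies that differ only in the variable value. This is a minimal sketch: the variable name comes from the configuration above, while the `run` and `reproduce` helpers are hypothetical names introduced here. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reproduce() {
  # First apply: creates the static public IP and attaches it to the service.
  run terraform apply -var aks_configure_static_ip=true
  # Second apply: plans to delete the IP, which is the step that fails.
  run terraform apply -var aks_configure_static_ip=false
}
```

Calling `reproduce` prints the two commands for review; `DRY_RUN=0 reproduce` would run them against the current workspace.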

bug service/kubernetes-cluster upstream-microsoft

Source

erlacherl-city

All 10 comments

So I've had a pretty bad time actually getting this public IP destroyed. I'll have to do some more testing but it seems AKS (at least with basic networking as I am running right now) doesn't really have all its dependencies properly set up and doesn't do the proper clean-up itself.

erlacherl-city on 19 Nov 2019

You first have to remove the services of type LoadBalancer from the cluster using kubectl. This detaches the IP from the load balancer, and then Terraform can do its thing.

giggio on 5 Dec 2019

Yes, we have this same issue. As above, you have to run a targeted destroy of the NGINX Ingress first, which releases control over the public IP.

PirateBread on 6 Dec 2019
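
The targeted destroy described above might look like the following sketch. It assumes the resource address `kubernetes_service.nginx_ingress_service` and the variable `aks_configure_static_ip` from the configuration in the issue; the `run` and `release_public_ip` helpers are hypothetical names. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

release_public_ip() {
  # Destroy only the LoadBalancer service so the IP is no longer in use.
  run terraform destroy -target=kubernetes_service.nginx_ingress_service
  # Then apply the full configuration: this removes the public IP and
  # recreates the service without a static IP.
  run terraform apply -var aks_configure_static_ip=false
}
```

Azure may need a moment to detach the IP after the service is gone, so the second step may have to be retried.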

Just ran into this today. FWIW, unlike @erlacherl-city we are using Azure CNI rather than basic networking.

gholmes on 15 Jan 2020

We are struggling with this gem too. We are not using the Kubernetes Terraform provider to manage our cluster, but we are using Terraform to manage the infrastructure. I would expect the Kubernetes provider here to make the changes suggested, removing the ingress controller before deleting the IP. In our case it becomes a process issue, as we must manually run kubectl scripts to remove the ingress controller before removing the public IP. Issues like this have caused us to steer away from using the Kubernetes and Helm providers with Terraform.

kaylacrowder on 21 Jan 2020

You don't need to remove the ingress controller, only the services of type LoadBalancer (or, to simplify, all services). It is a one-liner.

That said, I agree that the AKS provider should be able to do this by itself. It is a really simple thing to do and they already have management pods running, anyway, so add this functionality to them.

giggio on 22 Jan 2020
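
The comment does not spell out the one-liner, so the following is only a sketch of one way to do it. `list_lb_services` and `to_delete_commands` are hypothetical helper names, and the jsonpath filter is worth checking against your kubectl version before relying on it.

```shell
#!/usr/bin/env bash
# List every Service of type LoadBalancer as "namespace name" pairs.
list_lb_services() {
  kubectl get services --all-namespaces \
    -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}'
}

# Turn "namespace name" pairs on stdin into kubectl delete commands.
# The commands are printed, not executed, so they can be reviewed first.
to_delete_commands() {
  while read -r ns name; do
    if [ -n "$ns" ] && [ -n "$name" ]; then
      printf 'kubectl delete service --namespace %s %s\n' "$ns" "$name"
    fi
  done
}
```

Running `list_lb_services | to_delete_commands` prints the commands for review, and piping that output to `sh` executes them. The blunter variant mentioned in the comment, `kubectl delete services --all --all-namespaces`, also removes ClusterIP and NodePort services, so it only suits a cluster that is being torn down anyway.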

@jluk I'd agree with @giggio here: this probably makes sense for the AKS ARM RP to do, since the Kubernetes control plane isn't necessarily available (network rules/disabled). Is there an upstream task tracking this?

tombuildsstuff on 22 Jan 2020

I just opened an issue at the AKS repo: https://github.com/Azure/AKS/issues/1405

giggio on 22 Jan 2020

Since this is a bug in AKS (rather than something we can fix directly in Terraform) I'm going to close this in favour of the upstream issue: https://github.com/Azure/AKS/issues/1405

tombuildsstuff on 14 Apr 2020
