Terraform-provider-kubernetes: Allow kubernetes provider to take dependencies on other providers (e.g. Google for container engine)

Created on 16 Mar 2018 · 3 Comments · Source: hashicorp/terraform-provider-kubernetes

Using Terraform 0.11.4 with the following configuration:

provider "kubernetes" {
  host = "${google_container_cluster.primary.endpoint}"

  client_certificate = "${base64decode(google_container_cluster.primary.master_auth.0.client_certificate)}"
  client_key = "${base64decode(google_container_cluster.primary.master_auth.0.client_key)}"
  cluster_ca_certificate = "${base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)}"
}

When running terraform plan / apply after the cluster was terminated outside of Terraform, the Kubernetes resources are refreshed and hit a timeout. Instead of realising that the cluster needs to be recreated, Terraform fails with the following error:

Error: Error refreshing state: 3 error(s) occurred:

* kubernetes_secret.gitlab-container-registry: kubernetes_secret.gitlab-container-registry: Get https://192.168.199.100:8443/api/v1/namespaces/default/secrets/regcred: dial tcp 192.168.199.100:8443: i/o timeout
* kubernetes_namespace.test: 1 error(s) occurred:

* kubernetes_namespace.test: kubernetes_namespace.test: Get https://192.168.199.100:8443/api/v1/namespaces/testz9g4m: dial tcp 192.168.199.100:8443: i/o timeout
* kubernetes_pod.web: 1 error(s) occurred:

* kubernetes_pod.web: kubernetes_pod.web: Get https://192.168.199.100:8443/api/v1/namespaces/default/pods/web: dial tcp 192.168.199.100:8443: i/o timeout

Adding depends_on to all individual resources doesn't help either.

It would be great if the k8s provider could figure out that the cluster does not exist and needs creating (when it is handled by another module / provider) instead of timing out like that and leaving the state stuck in limbo.

Thanks!

bug

synhershko

👍1

All 3 comments

This is currently an upstream Terraform bug. Configuring a provider from another resource's attributes can work in some cases, but unfortunately not in every case.

See:
https://stackoverflow.com/questions/50088355/terraform-how-to-create-a-kubernetes-cluster-on-google-cloud-gke-with-namespa
https://github.com/hashicorp/terraform/issues/12393
https://github.com/hashicorp/terraform/issues/4149
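
A commonly suggested way around this kind of provider-on-resource dependency is to apply in two phases: first target only the cluster, then apply everything else once its endpoint exists. A minimal sketch, assuming the cluster resource is google_container_cluster.primary as in the issue. The two_phase_apply helper and the TF_BIN override are hypothetical names used for illustration, and this is not confirmed to cover every case:

```bash
# TF_BIN is a hypothetical override so the sequence can be dry-run (e.g. TF_BIN=echo).
TF_BIN="${TF_BIN:-terraform}"

# Apply only the cluster first, then the full configuration once the
# cluster (and therefore the provider's endpoint) exists.
two_phase_apply() {
  local cluster_address="$1"
  "${TF_BIN}" apply -target="${cluster_address}" || return 1
  "${TF_BIN}" apply
}

# Usage (not run automatically):
#   two_phase_apply google_container_cluster.primary
```

Setting TF_BIN=echo prints the two commands instead of running them, which is a cheap way to check the sequence before touching real infrastructure.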

paultyng on 24 May 2018

😕2

My current workaround for this is to check whether all the Kubernetes nodes are in the Ready state with a simple script that polls the master, as shown below:

````bash
#!/usr/bin/env bash

set -e

# Print a message to stderr and abort.
function error_exit() {
  echo "$1" >&2
  exit 1
}

# Make sure kubectl is available before polling.
function check_deps() {
  command -v kubectl >/dev/null 2>&1 || error_exit "kubectl command not detected in path, please install it"
}

# define arguments
for i in "$@"
do
  case ${i} in
    -t=*|--timeout=*)
      TIMEOUT="${i#*=}"
      ;;
    -i=*|--interval=*)
      INTERVAL="${i#*=}"
      ;;
    -k=*|--kubeconfig_path=*)
      KUBECONFIG_PATH="${i#*=}"
      ;;
    -n=*|--min_nodes=*)
      MIN_NODES="${i#*=}"
      ;;
    *)
      # unknown option
      ;;
  esac
done

check_deps

START_TIME=${SECONDS}
END_TIME=$((SECONDS + TIMEOUT))
echo "The script ends at ${END_TIME}"
echo "Timeout is ${TIMEOUT}"
echo "Interval: ${INTERVAL}"

while ((SECONDS < END_TIME))
do
  # Count nodes whose STATUS column is exactly "Ready"
  # (a plain match on "Ready" would also count "NotReady").
  healthy_cnt=$(kubectl get nodes --no-headers --kubeconfig="${KUBECONFIG_PATH}" | awk '$2 == "Ready" {n++} END {print n+0}')

  if [[ ${healthy_cnt} -ge ${MIN_NODES} ]]
  then
    echo "Cluster is ready."
    exit 0
  fi

  elapsed=$((SECONDS - START_TIME))
  echo "Still waiting for the Cluster to be in Ready state... Elapsed ${elapsed}sec"
  sleep "${INTERVAL}"
done

echo "Timeout exceeded (${TIMEOUT}sec): Cluster is not running"
exit 1
````

Then I run it with the following *null_resource*:

````terraform
resource "null_resource" "poll" {
  count = "${var.wait_for_ready_state ? 1 : 0}"

  # The script relies on bash features, so it is run with bash rather than sh.
  provisioner "local-exec" {
    command = "bash ${var.scripts_dir}/poll_cluster.sh -k=${var.kubeconfig_filename} -t=${var.timeout} -i=${var.interval} -n=${var.nodes}"
  }
}
````

Finally, I make all Kubernetes resources depend on that null_resource:

````terraform
resource "kubernetes_service" "api" {
  metadata {
    name = "backend"

    labels {
      app   = "backend"
      suite = "api"
    }
  }

  spec {
    port {
      name        = "http-port"
      port        = "${var.egress_port}"
      target_port = "8080"
    }

    selector {
      app = "backend"
    }

    type = "${var.ingress_type}"
  }

  depends_on = ["null_resource.poll"]
}
````
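
One detail worth checking in a polling script like the one above: a plain substring match on Ready also matches NotReady, so unhealthy nodes can be counted as healthy. A minimal sketch of a stricter count, assuming the default `kubectl get nodes --no-headers` column layout (NAME, STATUS, ...). The count_ready helper and the sample node names are hypothetical:

```bash
# Count lines whose second column (STATUS) is exactly "Ready".
# Reads `kubectl get nodes --no-headers` style output on stdin.
count_ready() {
  awk '$2 == "Ready" {n++} END {print n+0}'
}

# Hypothetical sample output: two healthy nodes and one unhealthy node.
sample_nodes='node-a   Ready      <none>   5d   v1.10.0
node-b   NotReady   <none>   5d   v1.10.0
node-c   Ready      <none>   5d   v1.10.0'

printf '%s\n' "${sample_nodes}" | grep -c "Ready"   # prints 3: NotReady is counted too
printf '%s\n' "${sample_nodes}" | count_ready       # prints 2: only exact Ready matches
```

Comparing the STATUS column exactly also leaves out cordoned nodes (reported as Ready,SchedulingDisabled); whether those should count towards the minimum depends on the use case.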

DanielMorales9 on 15 May 2019

👍1

I have the same problem here. Any news on this dependency issue? I create a cluster, and when the resource is recreated with another node pool, the Kubernetes provider still uses the first endpoint, causing a timeout.