Using Terraform 0.11.4 with the following configuration:
provider "kubernetes" {
  host                   = "${google_container_cluster.primary.endpoint}"
  client_certificate     = "${base64decode(google_container_cluster.primary.master_auth.0.client_certificate)}"
  client_key             = "${base64decode(google_container_cluster.primary.master_auth.0.client_key)}"
  cluster_ca_certificate = "${base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)}"
}
When running terraform plan / apply, the Kubernetes resources are refreshed and hit a timeout if the cluster was terminated outside of Terraform. Instead of realising that the cluster needs to be recreated, Terraform fails with the following error:
Error: Error refreshing state: 3 error(s) occurred:
* kubernetes_secret.gitlab-container-registry: kubernetes_secret.gitlab-container-registry: Get https://192.168.199.100:8443/api/v1/namespaces/default/secrets/regcred: dial tcp 192.168.199.100:8443: i/o timeout
* kubernetes_namespace.test: 1 error(s) occurred:
* kubernetes_namespace.test: kubernetes_namespace.test: Get https://192.168.199.100:8443/api/v1/namespaces/testz9g4m: dial tcp 192.168.199.100:8443: i/o timeout
* kubernetes_pod.web: 1 error(s) occurred:
* kubernetes_pod.web: kubernetes_pod.web: Get https://192.168.199.100:8443/api/v1/namespaces/default/pods/web: dial tcp 192.168.199.100:8443: i/o timeout
Adding depends_on to all individual resources doesn't help either.
It would be great if the Kubernetes provider could figure out that the cluster does not exist and needs creating (when it is handled by another module / provider), instead of timing out like that and leaving the state stuck in limbo.
Thanks!
This is currently an upstream Terraform bug. Configuring a provider from the attributes of another resource can work in some cases, but unfortunately not in every case.
See:
https://stackoverflow.com/questions/50088355/terraform-how-to-create-a-kubernetes-cluster-on-google-cloud-gke-with-namespa
https://github.com/hashicorp/terraform/issues/12393
https://github.com/hashicorp/terraform/issues/4149
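One way people commonly sidestep this kind of provider-ordering problem is to apply in two passes, so the cluster exists before anything that needs the kubernetes provider is touched. The sketch below assumes the cluster address from the configuration above (google_container_cluster.primary); the `TF_BIN` variable and the function name `two_pass_apply` are made up for illustration, and I have not verified that this covers every case described in the linked issues.

```bash
#!/usr/bin/env bash
# Sketch: apply the cluster first, then everything else.
# TF_BIN is a hypothetical override, handy for a dry run with TF_BIN=echo.
TF_BIN="${TF_BIN:-terraform}"

two_pass_apply() {
  # Resource address of the cluster; defaults to the one used in this thread.
  local cluster_addr="${1:-google_container_cluster.primary}"

  # Pass 1: limit the run to the cluster (and whatever it depends on).
  "${TF_BIN}" apply -target="${cluster_addr}" || return 1

  # Pass 2: full run, now that the endpoint and certificates are known.
  "${TF_BIN}" apply
}
```

Calling `two_pass_apply` with no argument uses the default address; setting `TF_BIN=echo` first only prints the two commands instead of running them.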
My current workaround for this is to check whether the Kubernetes nodes are all in the Ready state, using a simple script that polls the master, as shown below:
````bash
#!/usr/bin/env bash
set -e

# Print a message to stderr and abort.
function error_exit() {
  echo "$1" >&2
  exit 1
}

# Make sure kubectl is available before polling.
function check_deps() {
  command -v kubectl >/dev/null 2>&1 || error_exit "kubectl command not detected in path, please install it"
}

# Parse -x=value / --long=value style arguments.
for i in "$@"
do
  case ${i} in
    -t=*|--timeout=*)
      TIMEOUT="${i#*=}"
      ;;
    -i=*|--interval=*)
      INTERVAL="${i#*=}"
      ;;
    -k=*|--kubeconfig_path=*)
      KUBECONFIG_PATH="${i#*=}"
      ;;
    -n=*|--min_nodes=*)
      MIN_NODES="${i#*=}"
      ;;
    *)
      # unknown option
      ;;
  esac
done

check_deps

END_TIME=$((SECONDS + TIMEOUT))
START_TIME=${SECONDS}
echo "The script ends at ${END_TIME}"
echo "Timeout is ${TIMEOUT}"
echo "Interval: ${INTERVAL}"

while ((SECONDS < END_TIME))
do
  # Count nodes reported as Ready; -w keeps "NotReady" from matching.
  healthy_cnt=$(kubectl get nodes --kubeconfig="${KUBECONFIG_PATH}" | grep -w "Ready" | wc -l)
  if [[ ${healthy_cnt} -ge ${MIN_NODES} ]]
  then
    echo "Cluster is ready."
    exit 0
  fi
  elapsed=$((SECONDS - START_TIME))
  echo "Still waiting for the Cluster to be in Ready state... Elapsed ${elapsed}sec"
  sleep "${INTERVAL}"
done

echo "Timeout Exceed (${TIMEOUT}sec): Cluster is not running"
exit 1
````
with the following *null_resource*:
````terraform
resource "null_resource" "poll" {
  count = "${var.wait_for_ready_state ? 1 : 0}"

  # The script relies on bash features ([[ ]], (( ))), so run it with bash rather than sh.
  provisioner "local-exec" {
    command = "bash ${var.scripts_dir}/poll_cluster.sh -k=${var.kubeconfig_filename} -t=${var.timeout} -i=${var.interval} -n=${var.nodes}"
  }
}
````
Finally, I make all Kubernetes resources depend on that null_resource:
````terraform
resource "kubernetes_service" "api" {
  metadata {
    name = "backend"

    labels {
      app   = "backend"
      suite = "api"
    }
  }

  spec {
    port {
      name        = "http-port"
      port        = "${var.egress_port}"
      target_port = "8080"
    }

    selector {
      app = "backend"
    }

    type = "${var.ingress_type}"
  }

  depends_on = ["null_resource.poll"]
}
````
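A note on counting Ready nodes: a plain substring match on "Ready" would also count nodes reported as NotReady. Here is a small sketch of a stricter count that can be tried against captured output without a live cluster. The function name `count_ready_nodes` is made up; it assumes the usual `kubectl get nodes --no-headers` layout, where the second column is the status, and it treats cordoned nodes (Ready,SchedulingDisabled) as ready, which may or may not be what you want.

```bash
#!/usr/bin/env bash
# Sketch: read `kubectl get nodes --no-headers` output on stdin and print
# the number of nodes whose STATUS column is Ready (optionally with extra
# comma-separated flags such as SchedulingDisabled). NotReady is not counted.
count_ready_nodes() {
  awk '$2 == "Ready" || $2 ~ /^Ready,/ { n++ } END { print n + 0 }'
}

# Example use against a real cluster (kubeconfig path is up to you):
# kubectl get nodes --no-headers --kubeconfig="${KUBECONFIG_PATH}" | count_ready_nodes
```

Keeping the count in its own function makes it easy to check against a saved copy of the kubectl output before wiring it into the polling loop.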
I have the same problem here. Any news on this dependency issue? I create a cluster, and when the resource is recreated with another node pool, the Kubernetes provider still uses the first endpoint, causing a timeout.
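As a stopgap, a quick probe can tell you up front whether the endpoint recorded in the state still answers, so you at least know the refresh is going to run into the i/o timeout. This is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils `timeout` command are available; the function name `api_reachable` is made up, and the address in the example is the one from the error output earlier in this thread.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if a TCP connection to host:port opens within
# wait_secs seconds (default 5). Uses bash's /dev/tcp and coreutils timeout.
api_reachable() {
  local host="$1" port="$2" wait_secs="${3:-5}"
  timeout "${wait_secs}" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example use before running plan:
# if ! api_reachable 192.168.199.100 8443; then
#   echo "API server unreachable; kubernetes_* resources will time out on refresh" >&2
# fi
```

If the old cluster is really gone, removing the stale kubernetes_* entries with terraform state rm may let plan proceed, but that is a manual step and only makes sense when nothing of the old cluster is left.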