Terraform v0.12.24
provider.google v3.13.0
provider.kubernetes v1.11.1
kubernetes_namespacegoogle_container_cluster resourceThe GKE i created using the following
# the GKE cluster
resource "google_container_cluster" "k8s_cluster" {
# ...
private_cluster_config {
enable_private_nodes = true
enable_private_endpoint = false
master_ipv4_cidr_block = "172.16.0.0/28"
}
ip_allocation_policy { } # enables VPC-native
master_authorized_networks_config {
cidr_blocks {
{
cidr_block = "0.0.0.0/0"
display_name = "World"
}
}
}
# ...
}
# and then the GKE cluster node pool
resource "google_container_node_pool" "node_pool" {
# ...
cluster = google_container_cluster.k8s_cluster.name
# ...
}
Then i am calling the k8s namespace by the following
data "google_client_config" "google_client" {}
# following this example https://www.terraform.io/docs/providers/google/d/datasource_client_config.html#example-usage-configure-kubernetes-provider-with-oauth2-access-token
provider "kubernetes" {
version = "1.11.1"
load_config_file = false
host = google_container_cluster.k8s_cluster.endpoint
token = data.google_client_config.google_client.access_token
cluster_ca_certificate = base64decode(
google_container_cluster.k8s_cluster.master_auth.0.cluster_ca_certificate
)
}
resource "kubernetes_namespace" "namespaces" {
depends_on = [google_container_node_pool.node_pool]
for_each = ["my-ns"]
metadata {
name = each.value
}
}
And then i am running terraform apply SEVERAL times.
For the first run of terraform apply it should create everything normally with:
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
For every subsequent run of terraform apply it should refresh the states of all the resources like usual and then say:
No changes. Infrastructure is up-to-date.
The refresh of kubernetes_namespace namespace is failing intermittently - sometime it works sometimes it doesn't.
For the first run of terraform apply i got this all good: โ
โ
โ
kubernetes_namespace.namespaces["my-ns"]: Creating...
kubernetes_namespace.namespaces["my-ns"]: Creation complete after 1s [id=my-ns]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
And then for the subsequent runs of terraform apply, it gets upto the refresh stage all fine: โ
โ
โ
data.google_container_cluster.k8s_cluster: Refreshing state...
kubernetes_namespace.namespaces["my-ns"]: Refreshing state... [id=my-ns]
but then, either it passes or it fails _intermittently_ with the following error: โ โ โ
Error: Get http://localhost/api/v1/namespaces/my-ns: dial tcp 127.0.0.1:80: connect: connection refused
Note that, this failuer is happenning _intermittently_. So it tells me that the public-private connectivity is fine. But then sometimes it's getting the connection sometimes it's not.
Where should i look into for this?
terraform apply
My TFState file is in the GCS bucket. Not that it matters in this issue. But just FYI.
It's facing the following errors "intermittently"
# when creating
Error: Post http://localhost/api/v1/namespaces: dial tcp 127.0.0.1:80: connect: connection refused.
# when refreshing
Error: Get http://localhost/api/v1/namespaces/my-ns: dial tcp 127.0.0.1:80: connect: connection refused
# when deleting
Error: Delete http://localhost/api/v1/namespaces/my-ns: dial tcp 127.0.0.1:80: connect: connection refused
hello.... just checking in.
Anyone has any idea about this?
hello... i have solved my problem adding load_config_file = false to my provider configuration
i have also used load_config_file = false in my provider config. I stated that in my post also.
Were you facing intermittent failures as i wrote about?
hey @syedrakib,
it seems you're querying private endpoint instead of the public one.
when you use enable_private_endpoint = false in private cluster config it means that it will create an additional public endpoint and you can get it from private_cluster_config[0].public_endpoint
try replacing host with google_container_cluster.k8s_cluster.private_cluster_config[0].public_endpoint so your config looks like the following snippet
provider "kubernetes" {
version = "1.11.1"
load_config_file = false
host = google_container_cluster.k8s_cluster.private_cluster_config[0].public_endpoint
token = data.google_client_config.google_client.access_token
cluster_ca_certificate = base64decode(
google_container_cluster.k8s_cluster.master_auth.0.cluster_ca_certificate
)
Hi @vova-tarasov . I just check the TFState file and noticed that the value of google_container_cluster.k8s_cluster.private_cluster_config[0].public_endpoint and google_container_cluster.k8s_cluster.endpoint are exactly the same.
Hence, it's not querying the private_endpoint.

However, just for the sake of it, I changed the host value to google_container_cluster.k8s_cluster.private_cluster_config[0].public_endpoint and still got the following errors "intermittently":
# when creating
Error: Post http://localhost/api/v1/namespaces: dial tcp 127.0.0.1:80: connect: connection refused.
# when refreshing
Error: Get http://localhost/api/v1/namespaces/my-ns: dial tcp 127.0.0.1:80: connect: connection refused
# when deleting
Error: Delete http://localhost/api/v1/namespaces/my-ns: dial tcp 127.0.0.1:80: connect: connection refused
Sometimes it's hitting these errors sometimes it's not - "intermittently"
@syedrakib can you reproduce this issue for us with TF_LOG set to TRACE (https://www.terraform.io/docs/internals/debugging.html) and share the logs with us so we can try to investigate this issue further?
Dear @aareet . Here is the trace log with TF_LOG=TRACE in a updated link: https://www.dropbox.com/sh/e8i9hn7k8osar3k/AABENCs3MbcfTlCMjbcHVyJ8a
Sorry for the delay in response with this trace.
I am just running terraform apply a few times.
Sometimes it's trying to reach: 35.198.227.163 and says Post https://35.198.227.163/api/v1/namespaces: dial tcp 35.198.227.163:443: i/o timeout
Sometimes it's trying to reach: localhost and says Post http://localhost/api/v1/namespaces: dial tcp 127.0.0.1:80: connect: connection refused
--- just by running terraform apply subsequently without changing anything in the modules or in the TF files.
I don't understand why is it behaving differently upon every terraform apply?
On the first run of terraform apply, it successfully created the k8s namespace - with Host: 35.198.227.163
For several re-run of terraform apply it was fine and refreshed the namespace all well
Then on one of the runs of terraform apply (without modifying any TF file), it failed to refresh the k8s namespace - with host = localhost
Why the change from 35.198.227.163 to localhost is the problem.
Terraform is indeed getting "publicEndpoint": "35.198.227.163" from the k8s_cluster module (state id=projects/my_gcp_project_id/locations/asia-southeast1-a/clusters/gke-f2u5). But still the kubernetes provider is intermittently using localhost. Why so?
Updated link to log files: https://www.dropbox.com/sh/e8i9hn7k8osar3k/AABENCs3MbcfTlCMjbcHVyJ8a
The kubernetes provider i used is
provider "kubernetes" {
version = "1.11.1"
load_config_file = false
host = google_container_cluster.k8s_cluster.endpoint
token = data.google_client_config.google_client.access_token
cluster_ca_certificate = base64decode(google_container_cluster.k8s_cluster.master_auth.0.cluster_ca_certificate)
}
I seem to have a similar issue after upgrading from 1.9.0 to the latest 1.11.2. I am using EKS. One difference is that in my case the message:
Get "http://localhost/api/v1/namespaces/XXXX": dial tcp 127.0.0.1:80: connect: connection refused
occurs every time I run the plan.
I'm also experiencing this issue, using 1.11.3. It's very consistent for me now, though was previously intermittent for a few patch levels now.
Edit: I found my problem was actually related to some refactoring into modules I was doing. I created a module that was using the provider, but I forgot to override the default provider configuration when I called that module. So, it was trying to use the default provider (which didn't actually exist anymore, due to refactoring to make specific providers that authenticated using different cluster's configuration details), and so there wasn't any configuration to say load_config_file = false, and it was picking up a local kubeconfig file in my path, and trying to connect to another cluster that no longer existed. Once I properly passed in configuration to point the module to use a Kubernetes provider that was properly configured to authenticate against the appropriate cluster, everything was cool again.
I'm going to lock this issue because it has been closed for _30 days_ โณ. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error ๐ค ๐ , please reach out to my human friends ๐ [email protected]. Thanks!
Most helpful comment
I'm also experiencing this issue, using 1.11.3. It's very consistent for me now, though was previously intermittent for a few patch levels now.Edit: I found my problem was actually related to some refactoring into modules I was doing. I created a module that was using the provider, but I forgot to override the default provider configuration when I called that module. So, it was trying to use the default provider (which didn't actually exist anymore, due to refactoring to make specific providers that authenticated using different cluster's configuration details), and so there wasn't any configuration to say
load_config_file = false, and it was picking up a local kubeconfig file in my path, and trying to connect to another cluster that no longer existed. Once I properly passed in configuration to point the module to use a Kubernetes provider that was properly configured to authenticate against the appropriate cluster, everything was cool again.