Terraform-provider-kubernetes: k8s Provider stuck: dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"

Created on 30 Oct 2020  路  11Comments  路  Source: hashicorp/terraform-provider-kubernetes

Terraform Version and Provider Version

Terraform version: 0.12.24
Go runtime version: go1.14.3
terragrunt version v0.23.10
kubernetes provider api version  "client.authentication.k8s.io/v1alpha1"
terraform-provider-kubernetes_v1.13.3_x4
eks  - v 1.16

Affected Resource(s)

  • kubernetes_config_map
  • potentially something deeper
  • "provider.kubernetes (close)"

Terraform Configuration Files

tf_configs.zip

Debug Output

2020/10/30 14:26:38 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "kubernetes_config_map.dashboard_appsflyer_producer[0]"
2020/10/30 14:26:38 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_config_map.dashboard_toolbox[0]"
2020/10/30 14:26:38 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/10/30 14:26:38 [TRACE] dag/walk: vertex "kubernetes_config_map.dashboard_kafka_saved_rate_consumer[0]" is waiting for "kubernetes_config_map.dashboard_kafka_saved_rate_consumer (prepare state)"
2020/10/30 14:26:43 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/10/30 14:26:43 [TRACE] dag/walk: vertex "kubernetes_config_map.dashboard_kafka_saved_rate_consumer[0]" is waiting for "kubernetes_config_map.dashboard_kafka_saved_rate_consumer (prepare state)"
2020/10/30 14:26:43 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_config_map.dashboard_toolbox[0]"
2020/10/30 14:26:43 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "kubernetes_config_map.dashboard_appsflyer_producer[0]"
kubernetes_config_map.dashboard_recommender[0]: Still creating... [1h57m43s elapsed]
kubernetes_config_map.dashboard_toolbox[0]: Still creating... [1h57m43s elapsed]
kubernetes_config_map.dashboard_error_function_timings[0]: Still creating... [1h57m43s elapsed]
kubernetes_config_map.dashboard_service_message[0]: Still creating... [1h57m43s elapsed]
kubernetes_config_map.dashboard_kafka_bans[0]: Still creating... [1h57m43s elapsed]
kubernetes_config_map.dashboard_instagram[0]: Still creating... [1h57m43s elapsed]
kubernetes_config_map.dashboard_kafka_skiplist_consumer[0]: Still creating... [1h57m43s elapsed]
kubernetes_config_map.dashboard_service_threshold[0]: Still creating... [1h57m43s elapsed]
kubernetes_config_map.dashboard_gateway_stats[0]: Still creating... [1h57m44s elapsed]
kubernetes_config_map.dashboard_install_cache[0]: Still creating... [1h57m44s elapsed]
2020/10/30 14:26:48 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "kubernetes_config_map.dashboard_appsflyer_producer[0]"
2020/10/30 14:26:48 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/10/30 14:26:48 [TRACE] dag/walk: vertex "kubernetes_config_map.dashboard_kafka_saved_rate_consumer[0]" is waiting for "kubernetes_config_map.dashboard_kafka_saved_rate_consumer (prepare state)"
2020/10/30 14:26:48 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_config_map.dashboard_toolbox[0]"
2020/10/30 14:26:53 [TRACE] dag/walk: vertex "kubernetes_config_map.dashboard_kafka_saved_rate_consumer[0]" is waiting for "kubernetes_config_map.dashboard_kafka_saved_rate_consumer (prepare state)"
2020/10/30 14:26:53 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/10/30 14:26:53 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "kubernetes_config_map.dashboard_appsflyer_producer[0]"
2020/10/30 14:26:53 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_config_map.dashboard_toolbox[0]"
kubernetes_config_map.dashboard_instagram[0]: Still creating... [1h57m53s elapsed]
kubernetes_config_map.dashboard_service_message[0]: Still creating... [1h57m53s elapsed]
kubernetes_config_map.dashboard_kafka_bans[0]: Still creating... [1h57m53s elapsed]
kubernetes_config_map.dashboard_service_threshold[0]: Still creating... [1h57m53s elapsed]
kubernetes_config_map.dashboard_toolbox[0]: Still creating... [1h57m53s elapsed]
kubernetes_config_map.dashboard_error_function_timings[0]: Still creating... [1h57m53s elapsed]
kubernetes_config_map.dashboard_recommender[0]: Still creating... [1h57m53s elapsed]
kubernetes_config_map.dashboard_kafka_skiplist_consumer[0]: Still creating... [1h57m53s elapsed]
kubernetes_config_map.dashboard_gateway_stats[0]: Still creating... [1h57m54s elapsed]
kubernetes_config_map.dashboard_install_cache[0]: Still creating... [1h57m54s elapsed]
2020/10/30 14:26:58 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "kubernetes_config_map.dashboard_appsflyer_producer[0]"
2020/10/30 14:26:58 [TRACE] dag/walk: vertex "kubernetes_config_map.dashboard_kafka_saved_rate_consumer[0]" is waiting for "kubernetes_config_map.dashboard_kafka_saved_rate_consumer (prepare state)"
2020/10/30 14:26:58 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_config_map.dashboard_toolbox[0]"
2020/10/30 14:26:58 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/10/30 14:27:03 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/10/30 14:27:03 [TRACE] dag/walk: vertex "kubernetes_config_map.dashboard_kafka_saved_rate_consumer[0]" is waiting for "kubernetes_config_map.dashboard_kafka_saved_rate_consumer (prepare state)"
2020/10/30 14:27:03 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_config_map.dashboard_toolbox[0]"
2020/10/30 14:27:03 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "kubernetes_config_map.dashboard_appsflyer_producer[0]"
kubernetes_config_map.dashboard_instagram[0]: Still creating... [1h58m3s elapsed]
kubernetes_config_map.dashboard_toolbox[0]: Still creating... [1h58m3s elapsed]
kubernetes_config_map.dashboard_service_threshold[0]: Still creating... [1h58m3s elapsed]
kubernetes_config_map.dashboard_kafka_skiplist_consumer[0]: Still creating... [1h58m3s elapsed]
kubernetes_config_map.dashboard_service_message[0]: Still creating... [1h58m3s elapsed]
kubernetes_config_map.dashboard_error_function_timings[0]: Still creating... [1h58m3s elapsed]
kubernetes_config_map.dashboard_recommender[0]: Still creating... [1h58m3s elapsed]
kubernetes_config_map.dashboard_kafka_bans[0]: Still creating... [1h58m3s elapsed]
kubernetes_config_map.dashboard_install_cache[0]: Still creating... [1h58m4s elapsed]
kubernetes_config_map.dashboard_gateway_stats[0]: Still creating... [1h58m4s elapsed]
2020/10/30 14:27:08 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/10/30 14:27:08 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_config_map.dashboard_toolbox[0]"
2020/10/30 14:27:08 [TRACE] dag/walk: vertex "kubernetes_config_map.dashboard_kafka_saved_rate_consumer[0]" is waiting for "kubernetes_config_map.dashboard_kafka_saved_rate_consumer (prepare state)"
2020/10/30 14:27:08 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "kubernetes_config_map.dashboard_appsflyer_producer[0]"
2020/10/30 14:27:13 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/10/30 14:27:13 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "kubernetes_config_map.dashboard_appsflyer_producer[0]"
2020/10/30 14:27:13 [TRACE] dag/walk: vertex "kubernetes_config_map.dashboard_kafka_saved_rate_consumer[0]" is waiting for "kubernetes_config_map.dashboard_kafka_saved_rate_consumer (prepare state)"
2020/10/30 14:27:13 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_config_map.dashboard_toolbox[0]"
kubernetes_config_map.dashboard_service_message[0]: Still creating... [1h58m13s elapsed]
kubernetes_config_map.dashboard_toolbox[0]: Still creating... [1h58m13s elapsed]
kubernetes_config_map.dashboard_service_threshold[0]: Still creating... [1h58m13s elapsed]
kubernetes_config_map.dashboard_recommender[0]: Still creating... [1h58m13s elapsed]
kubernetes_config_map.dashboard_error_function_timings[0]: Still creating... [1h58m13s elapsed]
kubernetes_config_map.dashboard_instagram[0]: Still creating... [1h58m13s elapsed]
kubernetes_config_map.dashboard_kafka_skiplist_consumer[0]: Still creating... [1h58m13s elapsed]
kubernetes_config_map.dashboard_kafka_bans[0]: Still creating... [1h58m13s elapsed]
kubernetes_config_map.dashboard_install_cache[0]: Still creating... [1h58m14s elapsed]

Panic Output

Expected Behavior

What should have happened?

When applying the module, the 47 configmaps, which do not depend on each other should be created.

Actual Behavior

What actually happened?

After creating 8 or so configmaps, the remaing maps are stuck waiting for a k8s connection to close.

Steps to Reproduce


Apply the attached module with these variables set to test values

  datasource_names_be 
  datasource_name_graphite 
  datasource_name_prometheus_with_kafka
  notifier_uuids_pd  
  notifier_uuids_eng_alerts        
  notifier_uuids_eng_de_alerts        
  notifier_uuids_pd_eng_alerts       
  notifier_uuids_all               
  has_graphite_datasource       
  dashboards_path          

cluster_context should reference a context in your kube config
namespace should be a valid namespace within that cluster

Important Factoids

When I developed this module I created each configmap one at a time and applied it. It worked fine. When I destroyed the module and then recreated it, that is when this issue occurred.

Community Note

  • Please vote on this issue by adding a 馃憤 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
bug needs investigation

Most helpful comment

for what it wort: I had the same errors on refreshing kubernetes secrets. I'm on EKS 1.17. As soon as I pinned the terraform kubernetes provider version from 1.13.3 back to 1.11.1. My stack completed without any errors.

So this solved it for me:

provider "kubernetes" {
  # version = "1.13.3"
  version = "1.11.1"
}

All 11 comments

@owenmonagan do you think you're hitting some sort of rate limit with EKS? We weren't able to reproduce this with kind. We'll need to test further, but can you give us a debug log with TF_LOG=TRACE ahead of your terraform apply so we can see what's happening?

@aareet here are my logs. from start to finish (well hitting the stuck state).
dashboard_debug.txt

@owenmonagan I've check the logs and I'm seeing this kind of API responses on some of the ConfigMaps:

2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: ---[ RESPONSE ]--------------------------------------
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: HTTP/2.0 409 Conflict
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Content-Length: 274
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Audit-Id: a336c68c-7c3c-49a3-9ab3-43ebf1cc5b2c
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Cache-Control: no-cache, private
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Content-Type: application/json
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Date: Thu, 05 Nov 2020 17:55:17 GMT
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: 
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: {
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4:  "kind": "Status",
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4:  "apiVersion": "v1",
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4:  "metadata": {},
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4:  "status": "Failure",
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4:  "message": "configmaps \"dashboard-dev-ue1-eks-01-service-threshold\" already exists",
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4:  "reason": "AlreadyExists",
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4:  "details": {
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4:   "name": "dashboard-dev-ue1-eks-01-service-threshold",
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4:   "kind": "configmaps"
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4:  },
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4:  "code": 409
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: }
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: 
2020-11-05T12:55:17.972-0500 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: -----------------------------------------------------

Can you check your configuration and make sure the resource naming scheme doesn't produce duplicates somehow?

logs.txt

I had not cleaned up since the last run. This one shows no such error.

I also created a new module which runs into the same issue:
logs_new_service.txt

Again, this works fine when applying one configmap, but breaks when applying multiple.
module.zip

My suspicion is that the issue may be related to the size of the configmaps.

They are very large. Potentially each configmap is being applied in batches?

Could it be that this large file application has a different close signal or behavior?

This is a guess.

@aareet @alexsomesan

I have a similar problem but my is related to k8s deployment:

2020-11-10T07:38:34.508Z [TRACE] plugin.terraform-provider-kubernetes_v1.13.3_x4: plugin received interrupt signal, ignoring: count=1 timestamp=2020-11-10T07:38:34.508Z
2020/11/10 07:38:34 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/11/10 07:38:35 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_deployment.metrics-scraper-deployment"
2020/11/10 07:38:39 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/11/10 07:38:40 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_deployment.metrics-scraper-deployment"
2020/11/10 07:38:44 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/11/10 07:38:45 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_deployment.metrics-scraper-deployment"
2020/11/10 07:38:49 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/11/10 07:38:50 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_deployment.metrics-scraper-deployment"
2020/11/10 07:38:54 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/11/10 07:38:55 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_deployment.metrics-scraper-deployment"
2020/11/10 07:38:59 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/11/10 07:39:00 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_deployment.metrics-scraper-deployment"
2020/11/10 07:39:04 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/11/10 07:39:05 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_deployment.metrics-scraper-deployment"
2020/11/10 07:39:09 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/11/10 07:39:10 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_deployment.metrics-scraper-deployment"
2020/11/10 07:39:14 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/11/10 07:39:15 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_deployment.metrics-scraper-deployment"
Two interrupts received. Exiting immediately. Note that data
2020-11-10T07:39:17.496Z [TRACE] plugin.terraform-provider-kubernetes_v1.13.3_x4: plugin received interrupt signal, ignoring: count=2 timestamp=2020-11-10T07:39:17.495Z
loss may have occurred.
2020/11/10 07:39:19 [TRACE] dag/walk: vertex "root" is waiting for "provider.kubernetes (close)"
2020/11/10 07:39:20 [TRACE] dag/walk: vertex "provider.kubernetes (close)" is waiting for "kubernetes_deployment.metrics-scraper-deployment"
Error: operation canceled

2020/11/10 07:39:22 [ERROR] <root>: eval: *terraform.EvalRefresh, err: rpc error: code = Unavailable desc = transport is closing
2020/11/10 07:39:22 [ERROR] <root>: eval: *terraform.EvalSequence, err: rpc error: code = Unavailable desc = transport is closing
2020/11/10 07:39:22 [TRACE] [walkRefresh] Exiting eval tree: kubernetes_deployment.metrics-scraper-deployment
2020/11/10 07:39:22 [TRACE] vertex "kubernetes_deployment.metrics-scraper-deployment": visit complete
2020/11/10 07:39:22 [TRACE] vertex "kubernetes_deployment.metrics-scraper-deployment": dynamic subgraph encountered errors
2020/11/10 07:39:22 [TRACE] vertex "kubernetes_deployment.metrics-scraper-deployment": visit complete
2020/11/10 07:39:22 [TRACE] dag/walk: upstream of "provider.kubernetes (close)" errored, so skipping
2020/11/10 07:39:22 [TRACE] dag/walk: upstream of "root" errored, so skipping
Error: rpc error: code = Unavailable desc = transport is closing

What is strange when I created defined resources for the first time it worked fine but during second and next planning stage it hangs up.

for what it wort: I had the same errors on refreshing kubernetes secrets. I'm on EKS 1.17. As soon as I pinned the terraform kubernetes provider version from 1.13.3 back to 1.11.1. My stack completed without any errors.

So this solved it for me:

provider "kubernetes" {
  # version = "1.13.3"
  version = "1.11.1"
}

That worked for me @walsw. Thank you!

same (or similar) issue was resolved be downgrading to 1.13.2
breaks in 1.13.3

This seems to work on the latest version of the Kubernetes and Helm providers (2.0.2). Maybe the bug has been fixed in one of the dependency updates since this issue was filed. I applied the reproducer config, then did a terraform destroy, then applied it again. Logs are here. I was using minikube and a local state instead of EKS and S3. Can you confirm if the issue is resolved for you too?

If not, I suspect there may be a problem with one of the dependency libraries opening too many connections to EKS, perhaps related to the exec block in the provider config. But that theory would need testing.

Was this page helpful?
0 / 5 - 0 ratings