Terraform-provider-google: Support setting target_pools for underlying group at google_container_node_pool

Created on 31 Oct 2018 · 23 comments · Source: hashicorp/terraform-provider-google

Description

We need a way to set a target pool for the instance group that is created by google_container_node_pool.
A target pool is used for an external TCP load balancer. Right now there is no way to create a load balancer for a regional container node pool.

Affected Resource(s)

  • google_container_node_pool

Expected Behavior

A way to set a target pool for the instance group that has been created by google_container_node_pool.
This target pool can then be used for an external TCP load balancer.

Actual Behavior

There is no way to set a target pool for the underlying instance group.

enhancement new-resource size/m

Most helpful comment

I'm running into this limitation myself.

The issue is that the google_compute_target_pool resource doesn't have a way of linking an instance group to the target pool. In my situation I don't want to use the backend service approach. I want a simple TCP load balancer (no proxy), meaning all I need to set up is a google_compute_forwarding_rule and a google_compute_target_pool. But since I can't automate linking the google_compute_target_pool with the google_compute_instance_group that gets created when a google_container_node_pool is provisioned, I'm stuck.

I've attempted to work around this by using the TCP and HTTP proxy style of load balancers, but that's not what I'm trying to do.

And to clarify, I am able to terraform up everything if I leave the instances attribute empty on the google_compute_target_pool and use the Cloud Console to manually add the instance groups instead of "raw instances", and it all works as expected...

All 23 comments

Hey @EssentialMix, I'm not completely sure what you're trying to do, but here are a few things that might help you out:

I'm not a k8s expert either, but I want to point out the GKE documentation about load balancing, since I know there are ways to configure it within k8s itself: https://cloud.google.com/kubernetes-engine/docs/tutorials/http-balancer

Sorry for that, let me clarify. Basically, if you want to create an external TCP load balancer with Terraform, you do something like this:

resource "google_compute_forwarding_rule" "public_lb" {
  name                  = "test-lb-public"
  load_balancing_scheme = "EXTERNAL"
  target                = "${google_compute_target_pool.public_target_pool.self_link}"
  ip_address            = "${google_compute_address.public_lb_front_ip.address}"
  ip_protocol           = "TCP"
  port_range            = "80"
}

resource "google_compute_target_pool" "public_target_pool" {
  name = "test-public-target-pool"

  health_checks = [
    "${google_compute_http_health_check.test_hc.name}",
  ]
}

resource "google_compute_http_health_check" "test_hc" {
  name         = "test-hc"
  request_path = "/"
  port         = "80"
}

resource "google_compute_address" "public_lb_front_ip" {
  name  = "test-public-ip"
}

This load balancer cannot have an instanceGroupManager as a target, only a targetPool. The UI does let you choose an instanceGroupManager when creating a load balancer, but behind the scenes it updates the instanceGroupManager with the setTargetPools method, so every instance that is part of the group also joins the target pool.
We can set a target pool in Terraform on a google_compute_instance_group_manager resource using the target_pools parameter:

resource "google_compute_instance_group_manager" "appserver" {
  name = "appserver-igm"
  base_instance_name = "app"
  instance_template  = "${google_compute_instance_template.appserver.self_link}"
  update_strategy    = "NONE"
  zone               = "us-central1-a"

  target_pools = ["${google_compute_target_pool.appserver.self_link}"]
  target_size  = 1

}

The problem is that we aren't actually managing the google_compute_instance_group_manager with Terraform, because the instanceGroupManager was created automatically by google_container_node_pool. So there is no way to set a targetPool for these groups and use them with a load balancer. It seems like we need the same target_pools parameter on google_container_node_pool, which would be passed down to the underlying instanceGroupManager.

Right now in Terraform we try to stay as close as makes sense to the APIs given to us by GCP. Our philosophy is that if a workflow were recommended, there would be an API for it. Given that the GKE API does not give an option to set a target pool on a node pool, we'd prefer to trust their judgment and not implement something that has to go an additional level deep in the Terraform resource itself.

That being said, you can set the instances on the target pool resource: https://www.terraform.io/docs/providers/google/r/compute_target_pool.html#instances and use the datasource I posted earlier to get the list of instances out of the instance group.
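
As a rough sketch of that suggestion (the resource and data source names below are placeholders, and it assumes the node pool's instance group URL resolves with the google_compute_instance_group data source — later comments note the URL may need the instanceGroupManagers → instanceGroups substitution):

data "google_compute_instance_group" "node_group" {
  # URL of the instance group behind the node pool; "np" is a placeholder name.
  self_link = "${google_container_node_pool.np.instance_group_urls[0]}"
}

resource "google_compute_target_pool" "node_target_pool" {
  name = "node-target-pool"

  # Instance self links pulled from the instance group at plan time, so this
  # only reflects the instances that exist when Terraform runs.
  instances = ["${data.google_compute_instance_group.node_group.instances}"]
}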

Thanks for the update, I see what you're saying. Unfortunately this approach will not support an autoscaling node pool, where the number of nodes can change depending on load and nobody will re-run Terraform to update the target pool. It's just confusing that the UI supports this case with a separate API call that updates the instanceGroupManager, yet with Terraform we really cannot implement it.

@EssentialMix To confirm, you should be able to look up the underlying instance group associated with a node pool and target that.

You should be able to make that group the target for your load balancer. This will work fine with autoscaling.

@morgante that's exactly my problem here: how do I associate the underlying instance group (created by the GKE node pool) with the targetPool that the load balancer will target?

I think I'm having the same problem creating an external UDP load balancer with a GKE backend. It is possible within the web console but I don't think it is possible to do from Terraform.

@danawillow The google_container_node_pool.instance_group_urls attribute is returning a list of InstanceGroupManager urls, not InstanceGroup urls, which is preventing me from using these URLs as a target for a backend service. Is this the expected behavior? If not, where should I be reporting this bug?

This is a quick workaround:
backend { group = "${replace(var.internal_groups_urls[0], "instanceGroupManagers", "instanceGroups")}" }
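
To put that one-liner in context, a minimal sketch of a backend service using the substituted URL might look like the following; the internal_groups_urls variable and the health check reference are assumptions carried over from the snippets above:

resource "google_compute_backend_service" "internal" {
  name     = "internal-backend"
  protocol = "HTTP"

  # Backend services want instanceGroups URLs, hence the substitution from the
  # instanceGroupManagers URL returned by the node pool.
  backend {
    group = "${replace(var.internal_groups_urls[0], "instanceGroupManagers", "instanceGroups")}"
  }

  health_checks = ["${google_compute_http_health_check.test_hc.self_link}"]
}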

Thanks :)

However, would still like to know if the behavior I described is a bug and where to report it if so.

I'm running into this limitation myself.

The issue is that the google_compute_target_pool resource doesn't have a way of linking an instance group to the target pool. In my situation I don't want to use the backend service approach. I want a simple TCP load balancer (no proxy), meaning all I need to set up is a google_compute_forwarding_rule and a google_compute_target_pool. But since I can't automate linking the google_compute_target_pool with the google_compute_instance_group that gets created when a google_container_node_pool is provisioned, I'm stuck.

I've attempted to work around this by using the TCP and HTTP proxy style of load balancers, but that's not what I'm trying to do.

And to clarify, I am able to terraform up everything if I leave the instances attribute empty on the google_compute_target_pool and use the Cloud Console to manually add the instance groups instead of "raw instances", and it all works as expected...

Same issue here. I'm having to work around it by outputting the instance group as well as the target pool, and then running the gcloud CLI command via a wrapper: gcloud -q compute instance-groups managed set-target-pools {instanceGroup} --target-pools {targetPool}.
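
A rough sketch of wiring that wrapper into Terraform itself with a null_resource and a local-exec provisioner; the node pool and target pool references and the zone are placeholders, and it assumes gcloud is installed and authenticated on the machine running Terraform:

resource "null_resource" "attach_target_pool" {
  # Re-run the command whenever the instance group or target pool changes.
  triggers = {
    instance_group = google_container_node_pool.np.instance_group_urls[0]
    target_pool    = google_compute_target_pool.public_target_pool.name
  }

  provisioner "local-exec" {
    # basename() extracts the managed instance group name from its URL.
    command = "gcloud -q compute instance-groups managed set-target-pools ${basename(google_container_node_pool.np.instance_group_urls[0])} --target-pools ${google_compute_target_pool.public_target_pool.name} --zone us-central1-a"
  }
}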

So roughly: GKE clusters implicitly create a google_compute_instance_group_manager inside GCP for each node pool configured. There isn't any way in Terraform to take that auto-created resource and set its target_pools property, so that as the instance group manager changes the instance group, it auto-updates the target pool.

If there were a way to specify the target pool on a google_container_node_pool, which would look through the GKE cluster, find the instance group managers, and update/sync them with that setting, that would be quite nice.

I'm curious why you're interested in interacting directly with GKE Node VMs instead of K8S Pods / Services. Even if we exposed these convenience attributes in Terraform, GKE / GKE node pools are the control plane for the child VMs and Terraform has a limited ability to manage them.

See https://github.com/terraform-providers/terraform-provider-google/issues/1480#issuecomment-503718319 for example.

What extra utility is provided by GKE Nodes vs accessing K8S Pods / Services?

I am not sure I'm getting your question. Basically, the use case is to create a GKE cluster with a custom node pool and a TCP load balancer for this node pool.

Short answer: the k8s Ingress resource is an HTTP/HTTPS load balancer, and we need a TCP LB.

Right, Ingress is an HTTP(S) load balancer. K8S also has Service resources which can have a type of LoadBalancer, as defined in this concepts article. That ends up mapping to a GCP Network Load Balancer that Kubernetes will automatically manage, instead of needing to do this by hand.

Have you tried that K8S resource? And if so, what about it didn't work for you?

In my case there's a mix of things in how the existing infrastructure works. I probably could try and migrate it to a k8s Service object with type: LoadBalancer, but that isn't how the infrastructure works in AWS, and it's not how it was working before I ported it to being launched by Terraform. A bunch of the tooling for launching things on the k8s cluster wants to know the external IP of the cluster load balancer _before_ you get to launching any k8s resources. I could in theory make that launch the LoadBalancer object, then wait for it to get an external IP, then do the follow-up, but that's less than ideal / a relatively long refactoring project.

Currently I also give users instructions on how to do the DNS setup after they finish running Terraform (which means they need to know the Google NLB IP). If they needed to deploy almost everything to the cluster before they could do that, it would require more modification to the instructions, plus some pieces of the initial deployment and monitoring for when things come up might not work as expected.

How does the infrastructure differ in AWS? If you're running on AWS, you'd also create a type: LoadBalancer to expose a service using a TCP load balancer.

What I would recommend is that you reserve the external IP ahead of time (this can be done via Terraform) and do any pointing you need there, then configure the k8s LoadBalancer service to use the external IP. This article outlines that process.
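
A minimal sketch of that recommendation, assuming the hashicorp/kubernetes provider is configured against the cluster; the resource names, region, selector, and ports are placeholders:

resource "google_compute_address" "svc_ip" {
  # Reserve the regional external IP up front so DNS can point at it early.
  name   = "reserved-svc-ip"
  region = "us-central1"
}

resource "kubernetes_service" "tcp_lb" {
  metadata {
    name = "tcp-lb"
  }

  spec {
    selector = {
      app = "my-app"
    }

    port {
      port        = 80
      target_port = 8080
    }

    # type LoadBalancer maps to a GCP Network (TCP) Load Balancer; pinning it
    # to the reserved address keeps the external IP stable.
    type             = "LoadBalancer"
    load_balancer_ip = google_compute_address.svc_ip.address
  }
}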

In AWS the network load balancer and several other pieces are also launched via Terraform, then the reserved external IP is used in various configurations for k8s services. Agreed it's in theory possible to move to a type: LoadBalancer service, but that takes significantly more time / effort / testing and is non-trivial to switch over to in the current usage.

Using type: LoadBalancer requires obtaining credentials for the cluster in order to create that object - it would be simpler to automate that with Terraform.

I'm curious why you're interested in interacting directly with GKE Node VMs instead of K8S Pods / Services. Even if we exposed these convenience attributes in Terraform, GKE / GKE node pools are the control plane for the child VMs and Terraform has a limited ability to manage them.

See #1480 (comment) for example.

What extra utility is provided by GKE Nodes vs accessing K8S Pods / Services?

One example use case is to find the instance tags in order to add firewall rules for the node pools.

I have a similar use case where I want to dynamically get data about the instances (in my case, network tags). I use the network tag data on my firewall rule resource.

These two data sources may help out others looking to get instance data from the node pool resource.

data "google_compute_instance_group" "sample_instance_group" {
  self_link = replace(google_container_node_pool.node_pool_platform.instance_group_urls[0], "instanceGroupManagers", "instanceGroups")
}

data "google_compute_instance" "sample_instance" {
  self_link = tolist(data.google_compute_instance_group.sample_instance_group.instances)[0]
}
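
And one way to consume those tags in a firewall rule, as a sketch; the rule name, network, and ports are placeholders, and it assumes the data sources above resolve to a node in the pool:

resource "google_compute_firewall" "node_pool_rule" {
  name    = "node-pool-rule"
  network = "default"

  allow {
    protocol = "tcp"
    ports    = ["443"]
  }

  # Network tags read off a node instance, so the rule follows whatever tags
  # GKE applied to the node pool's VMs.
  target_tags = sort(data.google_compute_instance.sample_instance.tags)
}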

I'm triaging this as a new resource, adding resources like google_compute_instance_group_manager_target_pool and google_compute_region_instance_group_manager_target_pool that can set target pools on the underlying instance group managers managed by the GKE node pools. It'll fall under our team's regular triage process, so 👍 reactions to the parent post will help get it picked up sooner.

We don't actually expose the IGM from GKE, we've flattened it to the instance group. We probably want to also expose the raw IGM reference for node pools prior to implementing this.

@danawillow I think we should close this issue out for several reasons:

  1. @sjmiller609 You can specify custom tags on node pool creation and reference them when you are creating the firewall rule:
resource "google_container_node_pool" "np" {
  name       = "my-node-pool"
  location   = "us-central1-a"
  cluster    = google_container_cluster.primary.name
  node_count = 3

  node_config {
    tags = ["foo", "bar"]
  }
}

resource "google_compute_firewall" "default" {
  name    = "test-firewall"
  network = "foo"

  allow {
    protocol = "icmp"
  }

  allow {
    protocol = "tcp"
    ports    = ["80", "8080", "1000-2000"]
  }

  source_tags = google_container_node_pool.np.node_config.0.tags
}

  2. If you want a regional TCP LB, you can create a k8s Service with Terraform. Have a look at https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/service and https://www.terraform.io/docs/providers/google/guides/using_gke_with_terraform.html .
  3. If you want a global TCP LB, you'll need to use a NEG. The NEG is a k8s service with special annotations (see the sketch after this list). See https://cloud.google.com/kubernetes-engine/docs/how-to/standalone-neg#attaching_a_load_balancer_to_your_standalone_negs and the links in point 2. You can talk to Google about introducing the backend service value in an annotation or status field, but K8s is eventually consistent, so it won't be easy to deploy via Terraform.
  4. If we introduced a field on the node pool that calculated and returned an IGM/IG URL, you would still have issues whenever you create, upgrade, autoscale, downscale, or delete nodes, which is why NEGs exist and reconcile automatically when a GKE node changes. Google is also releasing features such as release channels, cluster autoscaling, auto-repair, auto-upgrade, and surge upgrades, which all create new VMs more frequently and would break this feature very quickly.
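
For point 3, a sketch of what the standalone-NEG annotation could look like on a kubernetes_service resource; the service name, selector, and ports are placeholders:

resource "kubernetes_service" "neg_example" {
  metadata {
    name = "neg-example"

    annotations = {
      # Asks GKE to create a standalone NEG for port 80, which a
      # Terraform-managed backend service can then reference.
      "cloud.google.com/neg" = jsonencode({ exposed_ports = { "80" = {} } })
    }
  }

  spec {
    selector = {
      app = "my-app"
    }

    port {
      port        = 80
      target_port = 8080
    }

    type = "ClusterIP"
  }
}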