Terraform-provider-google: Unclear when to use google_container_cluster or google_container_node_pool

Created on 27 Sep 2017  ·  8 Comments  ·  Source: hashicorp/terraform-provider-google

The google_container_cluster resource has a node_pool field that can be used to define the cluster's node pools. But there is also a google_container_node_pool resource that can define node pools in a cluster, and there's no guidance on when/how to use these, whether they should be used together, or why they're separated in the first place.

documentation


All 8 comments

I think we can probably resolve this by updating the documentation pages for both resources to explain that google_container_cluster should manage the node pools when you have a single, authoritative list of node pools; this should generally be the common case. However, google_container_node_pool should be used when you want to distribute authority over node pool configuration in a cluster, e.g. when an infrastructure team manages the cluster and each developer team manages its own node pools, sometimes with different requirements. We should probably also note that google_container_node_pool won't remove node pools that are added outside of Terraform. And we should show how to use lifecycle.ignore_changes to make google_container_cluster work with google_container_node_pool.
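
For the first case, a minimal sketch of the inline, authoritative style might look like the following (the pool names, counts, and machine type are illustrative assumptions, not taken from this thread):

resource "google_container_cluster" "primary" {
  name = "primary"
  zone = "europe-west1-b"

  # Authoritative list: Terraform manages exactly these pools and will
  # remove any pool that was added outside of this configuration.
  node_pool {
    name       = "default-pool"
    node_count = 1
  }

  node_pool {
    name       = "batch-pool"
    node_count = 3

    node_config {
      machine_type = "n1-standard-1"
    }
  }
}

The distributed-authority case, with separate google_container_node_pool resources, is the one the rest of this thread works through.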

@paddycarver Just to confirm my understanding: in order to manage node pools using google_container_node_pool, the following steps have to be taken:

  1. Create a google_container_cluster which will create a default pool, and use lifecycle.ignore_changes to ignore the pool.
  2. Create a null_resource that will delete that pool
  3. Create any number of pools using google_container_node_pool

My reasoning is that by using lifecycle.ignore_changes, no changes can ever be made to that node pool through Terraform, so it should simply be removed and replaced with a google_container_node_pool.

Are there other ways to manage updateable node pools that can be managed externally?

I think I'm finally successful with:

resource "google_container_cluster" "stateful" {
  lifecycle {
    ignore_changes = ["node_pool"]
  }
  node_pool = {}
}

This still creates an extra node pool (I don't understand how a null_resource could be used to delete it; that sounds awful), but otherwise it works as expected. If this is the correct way to go, this is the example that should be in the docs.

**EDIT:** I'm not so sure anymore. I'm giving up on separate google_container_node_pools and just inlining them in my google_container_cluster (it's a massive list). I don't understand how this got so complex; there's clearly something wrong with this design or its docs.

**EDIT 2:** well, that prevents me from removing a node pool in the future without recreating the cluster.

I may be in the minority on this, but I think that in production you should almost always manage your cluster and your node pools separately, primarily because of @matti's second edit: any change to an inline node pool requires the entire cluster to go down and come back up, so no zero-downtime deploys are possible. That does mean you're left with that pesky default node pool, though. Terraform is in a tough spot here; I think the fault really lies with GCP's inability to launch a cluster without any node pool (even though you can delete all of the node pools afterwards).

Anyways, I posted it here too, but here's an example of how to use a null_resource to delete the default node pool after the cluster is created.

resource "google_container_cluster" "cluster" {
  name = "my-cluster"
  zone = "us-west1-a"
  initial_node_count = 1
}

resource "google_container_node_pool" "pool" {
  name = "my-cluster-nodes"
  node_count = "3"
  zone = "us-west1-a"
  cluster = "${google_container_cluster.cluster.name}"
  node_config {
    machine_type = "n1-standard-1"
  }
  # Delete the default node pool before spinning this one up
  depends_on = ["null_resource.default_cluster_deleter"]
}

resource "null_resource" "default_cluster_deleter" {
  provisioner "local-exec" {
    command = <<EOF
      gcloud container node-pools \
    --project my-project \
    --quiet \
    delete default-pool \
    --cluster ${google_container_cluster.cluster.name}
EOF
  }
}

For anyone else who finds this issue it looks like there is now the remove_default_node_pool parameter (#1245).

The following config will create a cluster (cluster0) with two attached node pools (nodepool{0,1}) and no default node pool:

"resource" "google_container_cluster" "cluster0" {
  "name" = "cluster0"
  "zone" = "europe-west1-b"
  "remove_default_node_pool" = true
  "additional_zones" = ["europe-west1-c", "europe-west1-d"]
  "node_pool" = {
    "name" = "default-pool"
  }
  "lifecycle" = {
    "ignore_changes" = ["node_pool"]
  }
}

"resource" "google_container_node_pool" "nodepool0" {
  "name" = "nodepool0"
  "cluster" = "cluster0"
  "node_count" = 1
  "zone" = "europe-west1-b"
  "depends_on" = ["google_container_cluster.cluster0"]
  "node_config" = {
    "machine_type" = "f1-micro"
  }
}

"resource" "google_container_node_pool" "nodepool1" {
  "name" = "nodepool1"
  "cluster" = "cluster0"
  "node_count" = 3
  "zone" = "europe-west1-d"
  "depends_on" = ["google_container_cluster.cluster0"]
}

Updating node pool properties and adding or removing node pools in the cluster both seem to behave as expected.

I think this issue is probably still valid as it's not really clear from the docs whether this is the preferred method for managing node pools or not.
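
One possible refinement of the above (a sketch, not the commenter's exact config): interpolating the cluster name gives Terraform an implicit dependency, so the explicit depends_on and the hard-coded cluster name shouldn't be needed, e.g.:

resource "google_container_node_pool" "nodepool0" {
  name       = "nodepool0"
  # The interpolation creates an implicit dependency on the cluster,
  # replacing the explicit depends_on above.
  cluster    = "${google_container_cluster.cluster0.name}"
  node_count = 1
  zone       = "europe-west1-b"

  node_config {
    machine_type = "f1-micro"
  }
}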

According to the docs, GKE chooses the master VM’s size based on the initial number of nodes, so if you’re going to have a large cluster, you may want that initial number to be bigger than 1, even though you’re going to delete it!
https://kubernetes.io/docs/admin/cluster-large/#size-of-master-and-master-components
If anyone knows this to be outdated, I’d love to hear it :)
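
If that sizing behaviour still applies, a sketch of how it might be combined with remove_default_node_pool (the count of 10 is just an illustrative assumption):

resource "google_container_cluster" "cluster0" {
  name = "cluster0"
  zone = "europe-west1-b"

  # Start with a node count closer to the cluster's eventual size so the
  # master is provisioned accordingly; the default pool (and these nodes)
  # is removed again after creation.
  initial_node_count       = 10
  remove_default_node_pool = true

  lifecycle {
    ignore_changes = ["node_pool"]
  }
}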

@michaelbannister This only seems to apply when using the kube-up.sh script to manage the masters yourself on GCE. With GKE however, the masters are managed by Google, in which case it becomes their responsibility to deal with scaling to support your nodes.

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!
