Terraform-provider-google: nvidia taint along custom taints in google_container_node_pool

Created on 3 Dec 2020 · 15 comments · Source: hashicorp/terraform-provider-google


Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave _+1_ or _me too_ comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

### Terraform Version

```
terraform -v

Terraform v0.13.5
+ provider registry.terraform.io/hashicorp/google v3.49.0
+ provider registry.terraform.io/hashicorp/google-beta v3.49.0
```

### Affected Resource(s)

`google_container_node_pool`

### Terraform Configuration Files



```tf
resource "google_container_node_pool" "gpu_pool_test" {
  ...

    taint = [
      {
        effect = "NO_SCHEDULE"
        key    = "nvidia.com/gpu"
        value  = "present"
      },
      {
        key    = "another_taint"
        value  = "true"
        effect = "NO_SCHEDULE"
      },
    ]

....
}
```

### Debug Output

Right now we have a lot of pools, and in our gpu pools we use our own taints, but we need to comment out this taint for the first deploy:

```tf
{
  effect = "NO_SCHEDULE"
  key    = "nvidia.com/gpu"
  value  = "present"
}
```

Otherwise, terraform will output the error:

Error: error creating NodePool: googleapi: Error 400: Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest

After the first deploy, we need to uncomment it for the subsequent deploys (terraform apply), or terraform will replace the node_pool every time we run the apply command.

```
          ~ taint             = [ # forces replacement
                {
                    effect = "NO_SCHEDULE"
                    key    = "another_taint"
                    value  = "true"
                },
              - {
                  - effect = "NO_SCHEDULE"
                  - key    = "nvidia.com/gpu"
                  - value  = "present"
                },
            ]
```
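One possible mitigation for this kind of drift, not used in this thread and offered only as a sketch: Terraform's `lifecycle` meta-argument can suppress diffs on the taint attribute, at the cost of also hiding intentional taint changes from Terraform (the `ignore_changes` path below assumes the standard `node_config` block layout):

```tf
resource "google_container_node_pool" "gpu_pool_test" {
  # ... pool configuration as above ...

  lifecycle {
    # Ignore drift on taints so the GKE-injected nvidia.com/gpu taint
    # does not force replacement on every apply. Note: this also hides
    # deliberate taint edits from Terraform.
    ignore_changes = [node_config[0].taint]
  }
}
```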

### Important Factoids

Authenticating as a service account instead of a user.

persistent-bug

All 15 comments

@andre-lx help me understand how it should work after you uncomment the block?

Hi @edwardmedia. I'm not sure I understand your question correctly.

After uncommenting the nvidia taint, everything works correctly on updates.

The problem is with the first deploy using terraform apply, if the gpu pool has more than one taint.

I will provide a more extensive example:

First terraform apply:

gke-cluster.tf

```tf
resource "google_container_cluster" "gke_cluster" {
....
}

resource "google_container_node_pool" "gpu_pool" {
  name     = "gpu-pool"
  project  = project.id
  location = zone

  ...

  cluster            = google_container_cluster.gke_cluster.name

  ...

  node_config {
    machine_type = machine_type

    taint = [
      {
        key    = "my_own_taint"
        value  = "true"
        effect = "NO_SCHEDULE"
      },
    ]
  }

  ...

}
```

This configuration works, and the pool is created correctly.
If I want to use my own taint in a gpu pool, I need to create the pool without the gpu taint, or terraform will output the error:

Error: error creating NodePool: googleapi: Error 400: Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest

The next terraform apply:

gke-cluster.tf

```tf
resource "google_container_cluster" "gke_cluster" {
....
}

resource "google_container_node_pool" "gpu_pool" {
  name     = "gpu-pool"
  project  = project.id
  location = zone

  ...

  cluster            = google_container_cluster.gke_cluster.name

  ...

  node_config {
    machine_type = machine_type

    taint = [
      {
        key    = "my_own_taint"
        value  = "true"
        effect = "NO_SCHEDULE"
      },
      {
       effect = "NO_SCHEDULE"
       key    = "nvidia.com/gpu"
       value  = "present"
      },
    ]
  }

  ...

}
```

If I don't include the gpu taint together with our own taints, as in the previous file, terraform will "force replace" my pools every time, since the taint is not present in the configuration file.

```
  # google_container_node_pool.gpu_pool must be replaced
-/+ resource "google_container_node_pool" "gpu_pool" {

        ......

          ~ taint             = [ # forces replacement
                {
                    effect = "NO_SCHEDULE"
                    key    = "another_taint"
                    value  = "true"
                },
              - {
                  - effect = "NO_SCHEDULE"
                  - key    = "nvidia.com/gpu"
                  - value  = "present"
                },
            ]

           .....
```

That's why I need to comment it out in the first deploy, and uncomment it in the subsequent deploys.

An image with the terraform plan output (with the taint commented out):

(Screenshot 2020-12-04 at 18:02:58)

@andre-lx I have tested providing either one of the taints below or both together. All tests pass for me on the first tf apply; I can't hit your error. Changing any taint afterward does show a forced replacement on the following tf apply, which is expected. I noticed the error more than one taint with key nvidia.com/gpu. Are you aware the key is already in place? Do you provide any other settings in the config that might affect this?

```
Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest
```

```tf
resource "google_container_node_pool" "gpu_pool_test" {
  ...

    taint = [
      {
        effect = "NO_SCHEDULE"
        key    = "nvidia.com/gpu"
        value  = "present"
      },
      {
        key    = "another_taint"
        value  = "true"
        effect = "NO_SCHEDULE"
      },
    ]

....
}
```

Hi @edwardmedia.

Thanks for the quick response.

Since the nvidia taint is the default for gpu node pools created by GKE itself (even if you create the node pools manually), the only configuration missing from my examples that could actually affect this is guest_accelerator, as in the following example:

```tf
  node_config {
    machine_type = ....

    taint = [
      {
        key    = "another_taint"
        value  = "true"
        effect = "NO_SCHEDULE"
      },
      {
        effect = "NO_SCHEDULE"
        key    = "nvidia.com/gpu"
        value  = "present"
      },
    ]

    guest_accelerator = [
      {
        count = 1
        type  = "nvidia-tesla-k80"
      },
    ]
  }
```

Thanks!

@andre-lx below is the state from my first run. Did I miss anything? There are many incompatible configs, but that seems beyond what the Terraform provider can control. If you see other cases, can you share your FULL terraform code so I can repro the issue? Another thing you may want to try is to see if you can create the pools using the gcloud container ... command.

```tf
resource "google_container_node_pool" "primary_preemptible_nodes" {
    cluster             = "issue7928-gke-cluster"
    id                  = "projects/myproject/locations/asia-east1-a/clusters/issue7928-gke-cluster/nodePools/issue7928-node-pool"
    initial_node_count  = 1
    instance_group_urls = [
        "https://www.googleapis.com/compute/v1/projects/myproject/zones/asia-east1-a/instanceGroupManagers/gke-issue7928-gke-cl-issue7928-node-p-8fea93f4-grp",
    ]
    location            = "asia-east1-a"
    name                = "issue7928-node-pool"
    node_count          = 1
    node_locations      = [
        "asia-east1-a",
    ]
    project             = "sunedward-1-autotest"
    version             = "1.16.15-gke.4300"
    management {
        auto_repair  = true
        auto_upgrade = true
    }
    node_config {
        disk_size_gb      = 100
        disk_type         = "pd-standard"
        guest_accelerator = [
            {
                count = 1
                type  = "nvidia-tesla-t4"
            },
        ]
        image_type        = "COS"
        labels            = {}
        local_ssd_count   = 0
        machine_type      = "n1-standard-1"
        metadata          = {
            "disable-legacy-endpoints" = "true"
        }
        oauth_scopes      = [
            "https://www.googleapis.com/auth/logging.write",
            "https://www.googleapis.com/auth/monitoring",
        ]
        preemptible       = true
        service_account   = "default"
        taint             = [
            {
                effect = "NO_SCHEDULE"
                key    = "nvidia.com/gpu"
                value  = "present"
            },
        ]
        shielded_instance_config {
            enable_integrity_monitoring = true
            enable_secure_boot          = false
        }
    }
    upgrade_settings {
        max_surge       = 1
        max_unavailable = 0
    }
}
```

Hi @edwardmedia.

You didn't miss anything. Below is my full config:

```tf
resource "google_container_cluster" "gke_cluster" {
  provider = google-beta
  name     = "my-cluster"
  project  = "my-project"
  location = "europe-west1-b"

  min_master_version = "1.16.15-gke.4300"
  network            = google_compute_network.vpc_gke_cluster.name
  subnetwork         = google_compute_subnetwork.subnet_gke_cluster.name
  networking_mode    = "VPC_NATIVE"

  remove_default_node_pool = true
  initial_node_count       = 1

  logging_service    = "logging.googleapis.com/kubernetes"
  monitoring_service = "monitoring.googleapis.com/kubernetes"

  ip_allocation_policy {
    cluster_ipv4_cidr_block  = "/20"
    services_ipv4_cidr_block = "/20"
  }

  resource_labels = {
    "application" = "my_platform"
  }

  master_auth {

    username = ""
    password = ""

    client_certificate_config {
      issue_client_certificate = false
    }
  }
}

resource "google_container_node_pool" "primary_preemptible_nodes" {
    cluster             = google_container_cluster.gke_cluster.name
    initial_node_count  = 1

    location            = "europe-west1-b"
    name                = "issue7928-node-pool"

    project             = "my-project"
    version             = "1.16.15-gke.4300"
    management {
        auto_repair  = true
        auto_upgrade = true
    }
    node_config {
        disk_size_gb      = 100
        disk_type         = "pd-standard"
        guest_accelerator = [
            {
                count = 1
                type  = "nvidia-tesla-k80"
            },
        ]
        image_type        = "COS"
        labels            = {}
        local_ssd_count   = 0
        machine_type      = "n1-standard-1"
        metadata          = {
            "disable-legacy-endpoints" = "true"
        }
        oauth_scopes      = [
            "https://www.googleapis.com/auth/logging.write",
            "https://www.googleapis.com/auth/monitoring",
        ]
        preemptible       = true
        service_account   = "default"
        taint             = [
            {
                effect = "NO_SCHEDULE"
                key    = "nvidia.com/gpu"
                value  = "present"
            },
        ]
        shielded_instance_config {
            enable_integrity_monitoring = true
            enable_secure_boot          = false
        }
    }
    upgrade_settings {
        max_surge       = 1
        max_unavailable = 0
    }
}
```

I just copied and pasted your google_container_node_pool into my files and ran tf apply. The following error occurred:

Error: error creating NodePool: googleapi: Error 400: Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest

The full tf apply output:

```
Terraform will perform the following actions:

  # google_container_node_pool.primary_preemptible_nodes will be created
  + resource "google_container_node_pool" "primary_preemptible_nodes" {
      + cluster             = "my-cluster"
      + id                  = (known after apply)
      + initial_node_count  = 1
      + instance_group_urls = (known after apply)
      + location            = "europe-west1-b"
      + max_pods_per_node   = (known after apply)
      + name                = "issue7928-node-pool"
      + name_prefix         = (known after apply)
      + node_count          = (known after apply)
      + node_locations      = (known after apply)
      + project             = "my-project"
      + version             = "1.16.15-gke.4300"

      + management {
          + auto_repair  = true
          + auto_upgrade = true
        }

      + node_config {
          + disk_size_gb      = 100
          + disk_type         = "pd-standard"
          + guest_accelerator = [
              + {
                  + count = 1
                  + type  = "nvidia-tesla-k80"
                },
            ]
          + image_type        = "COS"
          + labels            = (known after apply)
          + local_ssd_count   = 0
          + machine_type      = "n1-standard-1"
          + metadata          = {
              + "disable-legacy-endpoints" = "true"
            }
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/logging.write",
              + "https://www.googleapis.com/auth/monitoring",
            ]
          + preemptible       = true
          + service_account   = "default"
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "nvidia.com/gpu"
                  + value  = "present"
                },
            ]

          + shielded_instance_config {
              + enable_integrity_monitoring = true
              + enable_secure_boot          = false
            }

          + workload_metadata_config {
              + node_metadata = (known after apply)
            }
        }

      + upgrade_settings {
          + max_surge       = 1
          + max_unavailable = 0
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions in workspace "my-workspace"?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

google_container_node_pool.primary_preemptible_nodes: Creating...

Error: error creating NodePool: googleapi: Error 400: Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest
```

Creating the pool using the gcloud container command, with the same service account as terraform (also tested with my admin account using email):

```
gcloud container node-pools create issue7928-node-pool --accelerator type=nvidia-tesla-t4,count=1 --cluster my-cluster --machine-type n1-standard-1 --zone europe-west1-b --node-taints nvidia.com/gpu=present:NoSchedule
```

Output:

```
ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE.
```

This makes sense, since the nvidia taint is already added by default to gpu node pools by GKE itself.

On the terraform side, if you don't add this taint, the gpu pool is created successfully. The problem, as I already described, is with updates, since terraform always shows the "forces replacement".

It's important to note that, if you don't need custom taints (that is, without specifying the taint block in the config file), creation and updates work fine at the moment, and the nvidia taint is added by terraform to the state file, as shown below.

First tf apply:

```
Terraform will perform the following actions:

  # google_container_node_pool.primary_preemptible_nodes will be created
  + resource "google_container_node_pool" "primary_preemptible_nodes" {
      + cluster             = "my-cluster"
      + id                  = (known after apply)
      + initial_node_count  = 1
      + instance_group_urls = (known after apply)
      + location            = "europe-west1-b"
      + max_pods_per_node   = (known after apply)
      + name                = "issue7928-node-pool"
      + name_prefix         = (known after apply)
      + node_count          = (known after apply)
      + node_locations      = (known after apply)
      + project             = "my-project"
      + version             = "1.16.15-gke.4300"

      + management {
          + auto_repair  = true
          + auto_upgrade = true
        }

      + node_config {
          + disk_size_gb      = 100
          + disk_type         = "pd-standard"
          + guest_accelerator = [
              + {
                  + count = 1
                  + type  = "nvidia-tesla-k80"
                },
            ]
          + image_type        = "COS"
          + labels            = (known after apply)
          + local_ssd_count   = 0
          + machine_type      = "n1-standard-1"
          + metadata          = {
              + "disable-legacy-endpoints" = "true"
            }
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/logging.write",
              + "https://www.googleapis.com/auth/monitoring",
            ]
          + preemptible       = true
          + service_account   = "default"
          + taint             = (known after apply)

          + shielded_instance_config {
              + enable_integrity_monitoring = true
              + enable_secure_boot          = false
            }

          + workload_metadata_config {
              + node_metadata = (known after apply)
            }
        }

      + upgrade_settings {
          + max_surge       = 1
          + max_unavailable = 0
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions in workspace "my-workspace"?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

google_container_node_pool.primary_preemptible_nodes: Creating...
....
google_container_node_pool.primary_preemptible_nodes: Still creating... [1m20s elapsed]
google_container_node_pool.primary_preemptible_nodes: Creation complete after 1m24s [id=projects/my-project/locations/europe-west1-b/clusters/my-cluster/nodePools/issue7928-node-pool]
```

Subsequent tf apply (with the taint block commented or uncommented):

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Running terraform state show google_container_node_pool.primary_preemptible_nodes for a pool without the taint block, you can see that the nvidia taint was added to the state file:

```tf
        taint             = [
            {
                effect = "NO_SCHEDULE"
                key    = "nvidia.com/gpu"
                value  = "present"
            },
        ]
```

On the next tf apply, terraform sees that the resource in GKE is equal to the state file, and no replacement is needed.

...

Running terraform state show google_container_node_pool.primary_preemptible_nodes3 for a pool with the taint block, but with only the custom taint, you can also see the nvidia taint being added to the state file along with the custom one:

```tf
        taint             = [
            {
                effect = "NO_SCHEDULE"
                key    = "another_taint"
                value  = "true"
            },
            {
                effect = "NO_SCHEDULE"
                key    = "nvidia.com/gpu"
                value  = "present"
            },
        ]
```

So it's really strange that terraform thinks the gpu pool needs a replacement:

```
          ~ taint             = [ # forces replacement
                {
                    effect = "NO_SCHEDULE"
                    key    = "another_taint"
                    value  = "true"
                },
              - {
                  - effect = "NO_SCHEDULE"
                  - key    = "nvidia.com/gpu"
                  - value  = "present"
                },
            ]
```

The question is: why does terraform force replacement of an array that is equal to the same resource in the state file when custom taints are used? With only the nvidia taint, the taint array is successfully added to the state file, and on the subsequent tf apply they match perfectly, so no replacement is needed.

Thanks!

@andre-lx forceReplacement on taint is by design. Can you explain why it should not trigger node pool recreation?
Do you still have questions regarding Found more than one taint with key nvidia.com/gpu...? I think running the gcloud command ... has explained why.

Hi @edwardmedia.

In short: since I can't create the node pool with the nvidia taint (it's a GKE default), how can I prevent the pool from being recreated each time I run tf apply? How can I set custom taints alongside the nvidia taint? Right now, as I said, I need to comment out the nvidia taint on pool creation and uncomment it on the subsequent apply to ensure the pool is not recreated. After these two steps, I can run tf apply forever and the pool is never recreated.

Why is the pool recreated if the nvidia taint is the default in GKE?

And why is the pool not recreated if no custom taints are used (or rather, if only the nvidia taint exists)?

@andre-lx I am not sure I understand what you said correctly. In my tests, I tried putting 1) both the nvidia and a custom taint together, and 2) either one of the taints on its own, in new node pools. All three cases were fine. No exceptions were received. I don't understand what you meant below.

Since I can't create the node pool with the nvidia taint, ...

Where do you see that the nvidia taint is the default in GKE? Can you share a document?

From the provider's perspective, any change to taints will trigger pool recreation, because I don't see that the GCP API provides a way to update taints directly. You can instead update the taints by running kubectl, but that is not something Terraform can manage. Does this make sense to you?

@edwardmedia the nvidia taint is created by default on gpu node pools, as you can see here:
https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#create

That's why (I think) I can't add the taint to node pools at creation time, as I explained in the other comments, and that's why terraform and gcloud give me the error:

Error: error creating NodePool: googleapi: Error 400: Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest

Because of this, I don't understand how you managed to create the gpu pool with the nvidia taint specified.

I understand that if you change the taints either in the google platform or via terraform, terraform will recreate the pool; that makes a lot of sense and I was not expecting otherwise (since the state file differs from the resource itself). The problem here is that, using custom taints, I can't create the pool with the nvidia taint, and I can't tf apply an unchanged pool without specifying the nvidia taint after creation.

And that's why I need to comment out the nvidia taint on creation (since it is added by GKE itself), and uncomment it in the subsequent tf apply.

I will break this into examples; maybe that makes it easier:

1 - No taints in the config file:
1.1 - I create the pool with no taints (taint = [])
1.2 - The pool is created successfully, and the nvidia taint is added to the state file (again, since this is created automatically by GKE)
1.3 - All future tf apply runs work perfectly, since the taint is in the state file as well as in GKE.

2 - With both the nvidia and a custom taint:
2.1 - I try to create the pool, but the pool can't be created because of the error:

Error: error creating NodePool: googleapi: Error 400: Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest

2.2 - Solution: create the pool as in example 3 below.

3 - With only one custom taint in the config:
3.1 - I create it with a taint like this:

```tf
        taint             = [
            {
                effect = "NO_SCHEDULE"
                key    = "another_taint"
                value  = "true"
            },
       ]
```

3.2 - The pool is created successfully, and the custom taint as well as the nvidia taint is added to the state file (again, since the latter is created automatically by GKE)
3.3 - All future tf apply runs ask for pool replacement. Why? That is the part that doesn't make sense: the state file includes the nvidia taint as well as the custom taint created in step 3.1.
3.4 - Solution: add the nvidia taint to the taint block:

```tf
        taint             = [
            {
                effect = "NO_SCHEDULE"
                key    = "another_taint"
                value  = "true"
            },
            {
                effect = "NO_SCHEDULE"
                key    = "nvidia.com/gpu"
                value  = "present"
            },
        ]
```

3.5 - All future tf apply runs work perfectly.

@andre-lx I see. Thanks for the link. In my tests, all node pools were added to a new cluster, which is different from adding pools to an existing cluster. That explains why it works for me and not for you:

When you add a GPU node pool to an existing cluster that already runs a non-GPU node pool, GKE automatically taints

All the behaviors you have experienced appear to be controlled by gke/kubernetes. I don't think the provider has much room to act here. I am glad you have found a workaround.

@andre-lx closing this issue then. Feel free to reopen if you see something the provider can help with. Thank you

I don't think we should close this, as it's still a fundamental issue with how the provider interacts with the GKE API. There is no way to get it working without manual workarounds (commenting/uncommenting) today.

The provider should be smart enough to ignore the GPU taint, either on first creation/apply or when diffing backwards later.

/cc @rileykarson

I couldn't agree more with @morgante.

I know this is a limitation from gcloud, but this can be changed on the provider side.

A simple and effective solution could be (based on my experience):

If it is a gpu pool (has guest_accelerator defined) in google_container_node_pool, there are 3 types of config:

  • the nvidia taint is specified in the taint block along with another taint created by the developer -> the provider should ignore the nvidia taint on creation, and leave it in place in the next deploys

  • the nvidia taint is not specified in the taint block, but another taint created by the developer exists -> since the nvidia taint is created by GKE and already added automatically to the state file, the provider should add the taint under the hood in the following tf apply

  • no taints are specified -> this works fine right now! And this is something that doesn't make a lot of sense to me, since the state file includes the nvidia taint but the configuration file does not, yet this does not force a replacement in the following tf apply
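The first two bullets boil down to filtering the GKE-injected taint out of the diff before comparing config against the API response. A minimal sketch of that idea in Go (the `Taint` type and function names here are hypothetical, not the provider's actual code):

```go
package main

import "fmt"

// Taint mirrors the shape of a GKE node taint (hypothetical type,
// for illustration only).
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// filterSystemTaints drops taints that GKE injects automatically
// (here, only the nvidia.com/gpu taint) so that comparing the user's
// config against the API response would not force replacement.
func filterSystemTaints(taints []Taint) []Taint {
	out := make([]Taint, 0, len(taints))
	for _, t := range taints {
		if t.Key == "nvidia.com/gpu" && t.Effect == "NO_SCHEDULE" {
			continue // server-managed taint: exclude from the diff
		}
		out = append(out, t)
	}
	return out
}

func main() {
	// Taints as returned by the API for a GPU pool with one custom taint.
	state := []Taint{
		{Key: "another_taint", Value: "true", Effect: "NO_SCHEDULE"},
		{Key: "nvidia.com/gpu", Value: "present", Effect: "NO_SCHEDULE"},
	}
	// Only the custom taint survives, matching the user's config.
	fmt.Println(filterSystemTaints(state))
}
```

With this kind of filtering on both sides of the comparison, case 3.3 above would no longer show a diff, because the server-injected taint never enters the comparison.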

Ah- the behaviour on creation time is definitely something we need to fix.

For general handling of GPU taints, we can unify the behaviour with https://github.com/GoogleCloudPlatform/magic-modules/pull/3749. We'd previously removed special handling for system taints but apparently it crept back in.
