Terraform-provider-kubernetes: Feature Request: support container extended resources like nvidia/gpu

Created on 18 Mar 2018 · 12 Comments · Source: hashicorp/terraform-provider-kubernetes

Terraform Version
v0.11.4

Affected Resource(s)

kubernetes_pod

Expected Behavior
Kubernetes version 1.8 introduced Extended Resources, which are required, for example, to request allocation of Nvidia GPUs to containers. This option is currently missing from the Terraform Kubernetes provider.

resource "kubernetes_pod" "test" {
  ...
  spec {
    container {
      ...
      resources {
        requests {
          "nvidia/gpu" = "1"
        }
        limits {
          "nvidia/gpu" = "1"
        }
      }
    }
  }
}

References

Labels: acknowledged, breaking-change, enhancement, needs investigation, size/M, theme/coverage

Source

allxone

👍22 ❤2

Most helpful comment

@jrhouston any updates on this? :)

tarrencev on 8 Dec 2020

👍8

All 12 comments

Just checking whether there is a reason this is not yet supported (other than perhaps not being a high priority)? Is anyone aware of a workaround to request GPUs for a container?
The only thing I'm aware of is using the third-party provider that has support for native YAML-based Kubernetes configurations (https://github.com/ericchiang/terraform-provider-k8s), though I much prefer the official provider.

dpad on 15 Aug 2019

👍2

I added a quick and dirty way to do this in pull request #591. It seems to work for me, so feel free to use it and merge/decline as desired.

dpad on 15 Aug 2019

Any updates on adding this feature? Kubernetes has had GPU scheduling for a while now, but the Terraform provider does not support it.

https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/

tashby84 on 4 Dec 2019

Does anyone have a good workaround for this? I have to manually remove the flag to be able to update my deployment. Otherwise I get:

Error: Invalid address to set: []string{"spec", "0", "template", "0", "spec", "0", "container", "0", "resources", "0", "limits", "0", "nvidia.com/gpu"}

Is there a way to ignore those flags, or something like that?

landorg on 9 Mar 2020

👍1

Please merge this.

conet on 17 Apr 2020

👍5 👀1

At least add a fix which will allow the provider to run when the extra resource request is set on a resource managed by terraform instead of failing with the error described by https://github.com/terraform-providers/terraform-provider-kubernetes/issues/149#issuecomment-596788250.

conet on 20 Apr 2020

I can't believe that an issue with an open PR sits idle like this; it seems like nobody uses GPUs in a Terraform-managed workload. At least please fix the error in https://github.com/terraform-providers/terraform-provider-kubernetes/issues/149#issuecomment-596788250 so that regular operations don't fail if the nvidia.com/gpu resource request is added outside of Terraform.

conet on 11 May 2020

We need this as well. It causes us a lot of pain. Thanks for looking into it; we hope it can be merged soon.

phlegx on 19 May 2020

We would also need support for other resource requests, like local ephemeral storage (ephemeral-storage).

Error: Invalid address to set: []string{"spec", "0", "template", "0", "spec", "0", "container", "0", "resources", "0", "limits", "0", "ephemeral-storage"}

Even if you don't add support, at least remove these errors so that resource requests can be added outside of Terraform and keep working with Terraform; otherwise this becomes a real pain. @alexsomesan could you please take a look at this?

conet on 21 May 2020

Sorry for the wait on this one folks, this just landed at the top of our backlog. You can read about our new process here if you are interested in how we triage these things.

I had a look at the code for this. It seems like the schema for Container was implemented without realizing that the limits and requests fields of the Kubernetes API are actually just maps that accept key value pairs that are either:

- In the list of standard resources, or
- An extended resource in the format foo.bar/baz

So this attribute should actually be a Map with a custom validation function that verifies the above constraints. However, this is going to be a breaking change to the schema for Container and making this change will create diffs for everyone using any resource with a container in it - so we have to hold off on making this change for a major version release of this provider. We are currently collecting breaking changes for v2 of the provider, so this should happen reasonably soon although I can't specify a date.
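As a rough illustration of the validation described above (not the provider's actual code), the check could look something like the Go sketch below. The function name validateResourceKeys, the short list of standard resources, and the regular expression are all assumptions made for this example; the real Kubernetes naming rules are stricter than this approximation.

```go
package main

import (
	"fmt"
	"regexp"
)

// standardResources is an illustrative subset chosen for this sketch,
// not the provider's or Kubernetes' authoritative list.
var standardResources = map[string]bool{
	"cpu":               true,
	"memory":            true,
	"ephemeral-storage": true,
}

// extendedResourcePattern approximates the "foo.bar/baz" shape:
// a lowercase DNS-like prefix, a slash, then a name.
// This is an assumption, looser than the real Kubernetes validation.
var extendedResourcePattern = regexp.MustCompile(
	`^[a-z0-9]([a-z0-9.-]*[a-z0-9])?/[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$`)

// validateResourceKeys (hypothetical helper) returns one error for every
// key that is neither a standard resource nor shaped like an extended one.
func validateResourceKeys(resources map[string]string) []error {
	var errs []error
	for key := range resources {
		if standardResources[key] || extendedResourcePattern.MatchString(key) {
			continue
		}
		errs = append(errs, fmt.Errorf(
			"%q is neither a standard resource nor an extended resource of the form domain/name", key))
	}
	return errs
}

func main() {
	limits := map[string]string{
		"cpu":            "500m",
		"nvidia.com/gpu": "1",
		"gpu":            "1", // no domain prefix, so it is rejected
	}
	for _, err := range validateResourceKeys(limits) {
		fmt.Println(err)
	}
}
```

With the sample input, only the bare key gpu is reported; cpu and nvidia.com/gpu pass.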

~In the meantime - given that the reported use cases for this feature are few enough - I am open to hard-coding these additional attributes into the container schema. I would prefer to do this rather than add an additional attribute that we will deprecate in the next major version. The Kubernetes API does this all inside one field, so the provider should have parity with that.~

~This will mean adding the following attributes to the schema for requests & limits in the resources attribute:~

~nvidia.com/gpu~
~amd.com/gpu~
~ephemeral-storage~

edit: I just realized that we won't even be able to hard-code these resources as arguments to the schema because they contain punctuation, so we will have to add an additional field that is of type Map to support the extended resources, and then flatten it into a single Map in v2.
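One way to picture the interim approach: the values from the existing fixed attributes and the values from the additional Map field get merged into the single key-value map that the Kubernetes API expects for limits or requests. The Go sketch below is only an assumption about how that merge might look; flattenResources and its parameter names are hypothetical and do not come from the provider.

```go
package main

import (
	"fmt"
	"sort"
)

// flattenResources (hypothetical helper) merges the fixed attributes and
// the extra map into one map, refusing keys that appear in both.
func flattenResources(fixed, extended map[string]string) (map[string]string, error) {
	merged := make(map[string]string, len(fixed)+len(extended))
	for key, value := range fixed {
		merged[key] = value
	}
	for key, value := range extended {
		if _, exists := merged[key]; exists {
			return nil, fmt.Errorf("resource %q is set in both places", key)
		}
		merged[key] = value
	}
	return merged, nil
}

func main() {
	limits, err := flattenResources(
		map[string]string{"cpu": "500m", "memory": "512Mi"},
		map[string]string{"nvidia.com/gpu": "1"},
	)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Sort the keys so the printed order is stable.
	keys := make([]string, 0, len(limits))
	for key := range limits {
		keys = append(keys, key)
	}
	sort.Strings(keys)
	for _, key := range keys {
		fmt.Printf("%s = %s\n", key, limits[key])
	}
}
```

Rejecting duplicate keys is a design choice made for this sketch; letting one source silently override the other would hide configuration mistakes.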