Serving: Run Knative service on specific machine type

Created on 11 Sep 2018 · 3 Comments · Source: knative/serving

Copied from: https://issuetracker.google.com/issues/114402172

Question:

I'm evaluating https://cloud.google.com/knative/ for deploying a (non-tensorflow) machine learning inference script as a serverless function. Ideally it would be deployed on a GPU node, though a node with many CPU cores could work too. I understand it is currently impossible to use node pools or node selectors with Knative. I would like to request that feature.

(Originally asked on Stack Overflow: https://stackoverflow.com/questions/52142219/knative-run-service-on-specific-machine-type)

+clarification
I have a processing-intensive operation that is run a few times per day, on demand, exposed as an HTTP endpoint. To prevent wasting money, I'd like to turn this machine on only when it's called, and turn it off again if it hasn't been called for a few minutes. That sounds like it could be solved quite elegantly using Knative Serving's scale-from/to-zero feature.

The problem is, my processing-intensive operation needs a lot of CPU cores, or even better, a GPU. If I understand correctly, Knative currently expects a homogeneous cluster, where the Knative controller, autoscaler, etc. need to run on the same node type as the actual workload. To scale a GPU cluster from zero, the controller would also need to run on a GPU machine, which would nullify any cost savings. Is that correct?

Labels: area/API, area/autoscale, kind/question

All 3 comments

If you are using GPUs, you should be able to use resource requests to indicate that your Knative workers need a GPU, i.e. fill out "resources" in your container spec:

kind: Configuration
# ... (other fields elided)
spec:
  revisionTemplate:
    spec:
      container:
        resources:
          limits:
            # ask the scheduler for a node that exposes one NVIDIA GPU
            nvidia.com/gpu: 1

https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#v1-8-onwards

This would work for the GPU case, but would not work for e.g. cpu=ARM, which I don't think is exposed as a resource type.

One caution (based on your comments) -- Knative assumes that all Pods backing a Revision can do an equal amount of work (currently scaled off CPU usage), so if your nodes differ substantially (some with GPUs and some without) or your workload has substantial non-CPU resource usage (e.g. 100% GPU but 10% CPU), the autoscaling probably won't work properly.
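To make the caution concrete, here is a minimal sketch of a generic proportional scaling rule. It is an assumption for illustration only, not Knative's actual autoscaler code; the function name `desired_pods` and the 70% target are hypothetical. It shows how a Pod that is saturated on GPU but idle on CPU would look underloaded to a CPU-based signal.

```python
import math


def desired_pods(current_pods: int, observed_utilization: float,
                 target_utilization: float) -> int:
    """Generic proportional rule (an assumption, not Knative's algorithm):
    scale the Pod count by the ratio of observed to target utilization."""
    if current_pods < 1:
        raise ValueError("current_pods must be at least 1")
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    scaled = current_pods * observed_utilization / target_utilization
    return max(1, math.ceil(scaled))


# Hypothetical workload: 2 Pods, each at 100% GPU but only 10% CPU.
by_cpu = desired_pods(2, observed_utilization=0.10, target_utilization=0.70)
by_gpu = desired_pods(2, observed_utilization=1.00, target_utilization=0.70)

print("Pods wanted when scaling on CPU:", by_cpu)
print("Pods wanted when scaling on GPU:", by_gpu)
```

With these made-up numbers the CPU signal asks for fewer Pods while the GPU signal asks for more, which is the mismatch described above.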

Note that Knative will only scale Pods within a cluster, and will not automatically add or remove Nodes in your cluster -- you'll need to use a separate autoscaler for that. Depending on the duration of your requests, you may also find that some requests time out before Node autoscaling completes if you send more traffic than the cluster can currently handle. We're exploring options to improve the speed of Node provisioning, but I expect the initial version may not work with custom resources like GPUs.
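The timeout risk comes down to simple arithmetic, sketched below. The function name and all durations are hypothetical placeholders, not measured values or Knative defaults: a request that arrives when the cluster has no spare capacity has to wait for both a new Node and a new Pod.

```python
def request_times_out(request_timeout_s: float, node_provision_s: float,
                      pod_startup_s: float) -> bool:
    """Return True if a request arriving with no spare capacity would expire
    before a newly provisioned Node has a ready Pod to serve it."""
    time_until_ready_s = node_provision_s + pod_startup_s
    return time_until_ready_s > request_timeout_s


# Hypothetical numbers: 60 s request timeout, 20 s Pod startup.
# A Node has to be provisioned first (assumed 120 s here).
print(request_times_out(60, node_provision_s=120, pod_startup_s=20))
# A Node with free capacity already exists (no provisioning wait).
print(request_times_out(60, node_provision_s=0, pod_startup_s=20))
```

Under these assumptions, only the case that needs a fresh Node exceeds the timeout, so keeping some Node capacity warm or raising the request timeout are the two levers.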

This is a great use-case for why we should allow resources in the ContainerSpec of Revisions.

We actually have an issue (and a PR) that tracks (and adds) support for the resources block.

So I'm going to close this in favor of that tracking issue: https://github.com/knative/serving/issues/2099

Feel free to reopen if you disagree.
