I'd like to share the single GPU of a p3.2xlarge instance among multiple containers in the same task.
In the ECS task definition there is no way to indicate that a single GPU can be shared between containers (or to distribute the GPU resource over multiple containers, as is possible with CPU units).
I have multiple containers that require a GPU, but not at the same time. Is there a way to run them in a single task on the same instance?
I've tried leaving the GPU resource field blank, but then the GPU device is not visible to the container.
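For reference, this is how a GPU is requested per container in the task definition today; the `value` only accepts whole GPUs, which is why it can't be split across containers (container name and image here are illustrative):

```json
{
  "containerDefinitions": [
    {
      "name": "inference",
      "image": "my-gpu-image",
      "resourceRequirements": [
        { "type": "GPU", "value": "1" }
      ]
    }
  ]
}
```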
Hey, we don't have support for sharing a single GPU among multiple containers right now. We've marked it as a feature request.
For future reference, my current workaround to have multiple containers share a single GPU:
On a fork of the standard ECS GPU-optimized AMI, set the default runtime of `dockerd` by adding `--default-runtime nvidia` to the `OPTIONS` variable in `/etc/sysconfig/docker`. Since the default runtime is now nvidia, all containers can access the GPU. You can leave the GPU field empty in the task definition wizard (or set it to 1 for only 1 container to make sure the task is placed on a GPU instance).
Major drawback of this workaround is of course forking the standard AMI.
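The `OPTIONS` edit above can be sketched as a one-line `sed` during the AMI build. This is a demonstration against a sample copy of the file; the path, the existing `OPTIONS` contents, and the restart step are assumptions about the Amazon Linux based ECS GPU AMI:

```shell
# Sample stand-in for /etc/sysconfig/docker (contents are illustrative).
DOCKER_SYSCONFIG="${DOCKER_SYSCONFIG:-./docker.sysconfig}"
printf 'DAEMON_MAXFILES=1048576\nOPTIONS="--default-ulimit nofile=32768:65536"\n' > "$DOCKER_SYSCONFIG"

# Append the flag to the existing OPTIONS line so nvidia becomes the default runtime.
sed -i 's/^OPTIONS="\(.*\)"$/OPTIONS="\1 --default-runtime nvidia"/' "$DOCKER_SYSCONFIG"

cat "$DOCKER_SYSCONFIG"
# On the real instance, follow with a daemon restart, e.g.: systemctl restart docker
```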
@robvanderleek: thanks for outlining this workaround for now =]
@robvanderleek We have a solution for EKS now. Please let us know if you are interested in it.
Hi @Jeffwan
Thanks for the notification but we are happy with what ECS offers in general. Our inference cluster is running fine on ECS, although we have a custom AMI with the nvidia-docker hack.
Do you expect this solution to also become available for ECS?
@robvanderleek This is implemented as a device plugin in Kubernetes, so I doubt it can be used in ECS directly. But the underlying GPU-sharing approach is similar, and I think ECS could adopt a similar solution.