Containers-roadmap: [ECS] How to share a single GPU with multiple containers

Created on 7 Mar 2019 · 6 comments · Source: aws/containers-roadmap

Summary

I'd like to share the single GPU of a p3.2xlarge instance with multiple containers in the same task.

Description

In the ECS task definition it's not possible to indicate that a single GPU can be shared between containers (or to distribute the GPU resource across multiple containers, as is possible with CPU units).

I have multiple containers that require a GPU, but not at the same time. Is there a way to run them in a single task on the same instance?
I've tried leaving the GPU resource blank, but then the GPU device is not visible to the container.

Labels: ECS, Proposed

All 6 comments

Hey, we don't have support for sharing a single GPU with multiple containers right now. We have marked it as a feature request.

For future reference, my current workaround to have multiple containers share a single GPU:

  1. On a running ECS GPU-optimized instance, make the nvidia runtime the default for dockerd by adding --default-runtime nvidia to the OPTIONS variable in /etc/sysconfig/docker (a sketch follows this list)
  2. Save the instance to a new AMI
  3. In CloudFormation, go to the stack created by the ECS cluster wizard and update the EcsAmiId field in the initial template
  4. Restart your services
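
A minimal sketch of step 1, assuming the stock OPTIONS line in /etc/sysconfig/docker on the ECS GPU-optimized AMI (inspect the file on your AMI release before editing):

```
# Step 1: prepend --default-runtime nvidia to the Docker daemon options.
# The exact OPTIONS contents vary between AMI releases, so check the file first.
sudo sed -i 's/^OPTIONS="/OPTIONS="--default-runtime nvidia /' /etc/sysconfig/docker
sudo systemctl restart docker

# Verify the change before saving the instance as a new AMI (step 2).
docker info | grep -i 'default runtime'
```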

Since the default runtime is now nvidia, all containers can access the GPU. You can leave the GPU field empty in the task definition wizard (or set it to 1 for just one container to make sure the task is placed on a GPU instance).
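
To illustrate that task definition shape, here is a hypothetical container-definitions fragment (names, images, and memory values are placeholders): the first container reserves the GPU so the task is placed on a GPU instance, while the second declares no GPU but can still see the device because nvidia is the default runtime on the custom AMI.

```
# Hypothetical task definition with two containers sharing the instance's GPU.
cat > container-definitions.json <<'EOF'
[
  {
    "name": "inference-a",
    "image": "my-registry/inference-a:latest",
    "memory": 2048,
    "essential": true,
    "resourceRequirements": [
      { "type": "GPU", "value": "1" }
    ]
  },
  {
    "name": "inference-b",
    "image": "my-registry/inference-b:latest",
    "memory": 2048,
    "essential": true
  }
]
EOF

# Register it; GPU reservations require the EC2 launch type.
aws ecs register-task-definition \
  --family gpu-shared-task \
  --requires-compatibilities EC2 \
  --container-definitions file://container-definitions.json
```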

The major drawback of this workaround is, of course, forking the standard AMI.

@robvanderleek: thanks for outlining this workaround for now =]

@robvanderleek We have a solution for EKS now. Please let us know if you are interested in it.

Hi @Jeffwan

Thanks for the notification, but we are happy with what ECS offers in general. Our inference cluster is running fine on ECS, although we have a custom AMI with the nvidia-docker hack.

Do you expect this solution to also become available for ECS?

@robvanderleek This is implemented as a device plugin in Kubernetes, so I doubt it can be used in ECS directly. But the overall GPU-sharing approach is similar, and I think ECS could adopt a similar solution.
