Ml-agents: Question: TensorFlow computations on GPU in Docker

Created on 19 Oct 2018  ·  14 Comments  ·  Source: Unity-Technologies/ml-agents

Hi,

Does training ML-Agents on a Linux machine with Docker speed up training? If the machine is CUDA-capable, are the TensorFlow calculations done on the GPU?

discussion

All 14 comments

Hi @icaro56,

You will have to make sure to install tensorflow-gpu instead of tensorflow, which is installed by default because of the requirements of the mlagents package. Currently, the implementation of PPO we use does not take great advantage of a GPU. Typically, you would only see an advantage from using a GPU if you were using large visual observations.
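One way to check whether the TensorFlow build inside the container can actually see a GPU is to list the local devices. This is a minimal sketch, assuming a TensorFlow release that ships `tensorflow.python.client.device_lib`; the helper name `gpu_device_names` is made up for illustration.

```python
def gpu_device_names():
    """Return the names of the GPU devices TensorFlow can see (empty list if none)."""
    try:
        # Imported lazily so the check also runs where TensorFlow is not installed.
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


if __name__ == "__main__":
    names = gpu_device_names()
    print("GPUs visible to TensorFlow:", names if names else "none")
```

An empty result can mean that the CPU-only tensorflow package is installed, or that the container was started without access to the NVIDIA driver; the sketch does not distinguish between the two.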

@awjuliani ,

I created an image on Docker Hub for tensorflow-gpu: https://hub.docker.com/r/icaro56/ml-agents_images/tags/

But this image does not work. I changed tensorflow to tensorflow-gpu.

This error occurs when I try to run the training:

root@1cdd415e4b12:/workspace/unity-volume# mlagents-learn ./trainer_config.yaml --env=Bomberman --run-id=bomberman_test --train
Traceback (most recent call last):
  File "/usr/local/bin/mlagents-learn", line 7, in <module>
    from mlagents.trainers.learn import main
  File "/usr/local/lib/python3.6/site-packages/mlagents/trainers/__init__.py", line 4, in <module>
    from .models import *
  File "/usr/local/lib/python3.6/site-packages/mlagents/trainers/models.py", line 5, in <module>
    import tensorflow.contrib.layers as c_layers
ModuleNotFoundError: No module named 'tensorflow.contrib.layers'
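The traceback only says that the installed `tensorflow` package has no `contrib.layers` submodule; it does not say why. A quick probe like the one below can narrow down which level of the package is missing. It is a sketch built on the standard library's `importlib.util.find_spec`, and the helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located.

    Parent packages are imported as part of the lookup; the module itself is not.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `tensorflow`) is missing or broken.
        return False


if __name__ == "__main__":
    for name in ("tensorflow", "tensorflow.contrib", "tensorflow.contrib.layers"):
        print(name, "->", "found" if module_available(name) else "MISSING")
```

If `tensorflow` is found but `tensorflow.contrib` is not, the installed package is incomplete or is a version without `contrib`, and reinstalling a single, matching TensorFlow package in a clean image is a reasonable next step.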

If you have an NVIDIA GPU, you can give this Dockerfile a try and see if it works for you. It is derived from the Unity docker image but uses nvidia/cudagl/cudnn and nvidia-docker2. It will let you train using the GPU (in headless mode if desired).

https://github.com/mneilly/linux-unity-ml-agents-nvidia-docker

Thanks @mneilly . I will try to use this.

I am running into a timeout problem. Look:

gpg: keyring `/tmp/tmp.M92x8F85ox/secring.gpg' created
gpg: keyring `/tmp/tmp.M92x8F85ox/pubring.gpg' created
gpg: requesting key AA65421D from hkp server keyserver.ubuntu.com
gpg: keyserver timed out
gpg: keyserver receive failed: keyserver error
gpg: requesting key AA65421D from hkp server ha.pool.sks-keyservers.net
gpg: keyserver timed out
gpg: keyserver receive failed: keyserver error
gpg: requesting key AA65421D from hkp server pgp.mit.edu
gpg: keyserver timed out
gpg: keyserver receive failed: keyserver error
gpg: requesting key AA65421D from hkp server keyserver.pgp.com
gpg: keyserver timed out
gpg: keyserver receive failed: keyserver error
The command '/bin/sh -c export GNUPGHOME="$(mktemp -d)" && ( gpg --keyserver keyserver.ubuntu.com --recv-keys "$GPG_KEY" || gpg --keyserver ha.pool.sks-keyservers.net --recv-keys "$GPG_KEY" || gpg --keyserver pgp.mit.edu --recv-keys "$GPG_KEY" || gpg --keyserver keyserver.pgp.com --recv-keys "$GPG_KEY" ) && gpg --batch --verify python.tar.xz.asc python.tar.xz && rm -rf "$GNUPGHOME" python.tar.xz.asc && mkdir -p /usr/src/python && tar -xJC /usr/src/python --strip-components=1 -f python.tar.xz && rm python.tar.xz && cd /usr/src/python && gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)" && ./configure --build="$gnuArch" --enable-loadable-sqlite-extensions --enable-shared --with-system-expat --with-system-ffi --without-ensurepip && make -j "$(nproc)" && make install && ldconfig && apt-get purge -y --auto-remove $buildDeps && find /usr/local -depth \( \( -type d -a \( -name test -o -name tests \) \) -o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \) -exec rm -rf '{}' + && rm -rf /usr/src/python' returned a non-zero code: 2

Well... if none of the key servers are responding then it looks like you are having an issue with the network and will need to try again later...

I changed the keyserver addresses to use port 80 and it works.

gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys "$GPG_KEY" \
|| gpg --keyserver hkp://ha.pool.sks-keyservers.net:80 --recv-keys "$GPG_KEY" \
|| gpg --keyserver hkp://pgp.mit.edu:80 --recv-keys "$GPG_KEY" \
|| gpg --keyserver hkp://keyserver.pgp.com:80 --recv-keys "$GPG_KEY" \
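The fallback chain above can also be written as a small loop, which makes the logic easier to see: try each keyserver over port 80 and stop at the first success. HKP normally uses port 11371, which firewalls often block, so pinning port 80 is a common workaround; whether that was the cause here is an assumption. The helper names `keyserver_url` and `recv_key` and the injectable `run` parameter are made up for this sketch.

```python
import subprocess

# Same servers, in the same order, as the shell snippet above.
KEYSERVERS = [
    "keyserver.ubuntu.com",
    "ha.pool.sks-keyservers.net",
    "pgp.mit.edu",
    "keyserver.pgp.com",
]


def keyserver_url(host, port=80):
    """Build an hkp:// URL that pins the port explicitly."""
    return "hkp://{}:{}".format(host, port)


def recv_key(key_id, hosts=KEYSERVERS, run=subprocess.call):
    """Try each keyserver in order; return the first host that succeeds, else None."""
    for host in hosts:
        cmd = ["gpg", "--keyserver", keyserver_url(host), "--recv-keys", key_id]
        if run(cmd) == 0:  # gpg exits with 0 when the key was received
            return host
    return None
```

Inside a Dockerfile the shell form is simpler; the Python version only illustrates the retry order and the URL format.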

I hope the image works now! :)

The image is working now. But as @awjuliani said, "currently the implementation of PPO we use does not take great advantage of a GPU".

Hi.
Try using the gpu or cpu tags:
icaro56/ml-agents_images:cpu
icaro56/ml-agents_images:gpu

I am using these in my research.

@icaro56 Thanks but it does not work. It gave me:

Docker image path: index.docker.io/icaro56/ml-agents_images:latest
ERROR MANIFEST_UNKNOWN: manifest unknown

Thanks for your quick reply @icaro56. I figured it out and deleted my message in a hurry. I'm sorry for the naive question. BTW, I'm currently working on building a Docker image that includes tensorflow-gpu, the NVIDIA driver, and an X server in order to train with visual observations on a server machine. Have you ever done it before? I'm running into some issues building the X server; I'm following what is described in https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-on-Amazon-Web-Service.md. Here is the Dockerfile I have so far. I still can't figure it out. Can anyone please help?

Have you ever done it before?

No, I have not.

I use only vector observations. And the GPU version of the Docker image runs at practically the same speed as the CPU version. :(

Maybe @mneilly can help you.

Ok @icaro56 Thanks.

@maystroh, I ran new training sessions with ml-agents using both tensorflow-gpu and the CPU version, and from what I'm seeing, the CPU version trains faster than the GPU one.

The machine I use has 8 GPU cards in parallel.
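A comparison like this can be repeated on a single tensorflow-gpu install by hiding the GPUs from one of the runs. The sketch below relies on the `CUDA_VISIBLE_DEVICES` environment variable, which CUDA reads at startup; the helper names `cpu_only_env` and `timed_run` and the run id in the comment are hypothetical.

```python
import os
import subprocess
import time


def cpu_only_env(base_env=None):
    """Copy an environment and hide all CUDA devices from child processes."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""  # empty value: no GPU is exposed to CUDA
    return env


def timed_run(cmd, env=None):
    """Run `cmd` to completion and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    code = subprocess.call(cmd, env=env)
    return code, time.perf_counter() - start


# Example (not executed here): time the same training command on CPU only.
# cmd = ["mlagents-learn", "./trainer_config.yaml",
#        "--env=Bomberman", "--run-id=cpu_run", "--train"]
# print(timed_run(cmd, env=cpu_only_env()))
```

For a fair comparison, both runs should train for the same number of steps with the same trainer configuration. Setting the variable to a single index such as `"0"` would instead restrict the run to one of the 8 cards.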

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
