Ml-agents: mlagents + xvfb

Created on 4 Mar 2019 · 22 Comments · Source: Unity-Technologies/ml-agents

Hey everyone,

I'm trying to get Unity / mlagents to work with xvfb, but I'm failing to do so. I start the environment with docker_training=True and get the "Unity environment took too long to respond" error. I first tried this inside a Docker container, to set up a training environment, and it failed with that exception. I then tried the same outside Docker, directly on the Ubuntu 18 host, and it failed with the same exception. Outside Docker, everything works fine with docker_training=False. I've spent a day on this and don't know how to debug it any further, and none of the solutions I found online work for me.

Do you have any idea what I am doing wrong? How can I set up xvfb to work on the host Ubuntu, outside Docker, on a machine that has actual displays (using xvfb rather than the physical screens as the display)?

help-wanted

Most helpful comment

I have been able to get xvfb working with Ubuntu 18.04 with both CPU and GPU.

Problem

apt installs xvfb version 1.19.6, which is incompatible with running the Unity build on Ubuntu 18.04.

Solution

Install an older version of xvfb (1.18.4), which works with Ubuntu 18.04. Unfortunately, the apt repository does not contain this version, so it has to be installed in a roundabout way using gdebi. libxfont is also required to install xvfb 1.18.4. Here are the details:

CPU instructions

  1. Install gdebi and wget using the command apt install gdebi-core wget

  2. Get libxfont and xvfb packages
    wget http://security.ubuntu.com/ubuntu/pool/main/libx/libxfont/libxfont1_1.5.1-1ubuntu0.16.04.4_amd64.deb
    wget http://security.ubuntu.com/ubuntu/pool/universe/x/xorg-server/xvfb_1.18.4-0ubuntu0.7_amd64.deb

  3. Install libxfont and xvfb packages
    gdebi libxfont1_1.5.1-1ubuntu0.16.04.4_amd64.deb
    gdebi xvfb_1.18.4-0ubuntu0.7_amd64.deb
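
After step 3 it can be worth confirming which xvfb build is actually installed. The following is a minimal sketch and not part of the original instructions: the helper names upstream_version and xvfb_is_pinned are hypothetical, and it assumes the installed package's version string may carry an "epoch:" prefix and a "-revision" suffix around the upstream version.

```shell
# Hypothetical helpers (not from the original comment).

# Print the upstream part of a Debian package version string,
# e.g. "2:1.18.4-0ubuntu0.7" -> "1.18.4".
upstream_version() {
    v="${1#*:}"               # drop an optional "epoch:" prefix
    printf '%s\n' "${v%-*}"   # drop the trailing "-revision" part
}

# Succeed only when the given version string is the 1.18.4 build
# recommended above.
xvfb_is_pinned() {
    [ "$(upstream_version "$1")" = "1.18.4" ]
}
```

A typical call would be xvfb_is_pinned "$(dpkg-query -W -f='${Version}' xvfb)" && echo "xvfb 1.18.4 is installed".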

GPU instructions (for Nvidia GPUs only)

  1. Follow the steps in CPU instructions

  2. Make sure CUDA + Nvidia + CUDNN are set up correctly. These instructions work on Nvidia 410.79, Cuda 10.0, CUDNN 7.

  3. Install the following libraries using apt: xorg, xorg-dev, mesa-utils, libglvnd-dev, libgl1-mesa-dev, libegl1-mesa-dev, libgles2-mesa-dev
    apt install xorg xorg-dev mesa-utils libglvnd-dev libgl1-mesa-dev libegl1-mesa-dev libgles2-mesa-dev

  4. Update the LD_LIBRARY_PATH variable. The easiest way to do that is to add this line to your .bashrc
    export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH

  5. Ensure the Nvidia binaries are also available in the LD_LIBRARY_PATH. They are usually installed in /usr/local/nvidia/bin.
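
Steps 4 and 5 can be checked with a small helper. This is a sketch under assumptions: path_list_contains is a hypothetical name, and the directories passed to it are simply the ones mentioned in the steps above.

```shell
# Hypothetical helper: succeed when directory $2 is an entry of the
# colon-separated list $1 (for example, the value of LD_LIBRARY_PATH).
path_list_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example, path_list_contains "$LD_LIBRARY_PATH" /usr/lib/x86_64-linux-gnu || echo "not on LD_LIBRARY_PATH" reports a missing entry. Wrapping both sides in colons means only whole entries match, so a longer directory name that merely starts with the same text is not counted.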

All 22 comments

Hi @bjg2, I think I'd need some more detail to help you. Have you been trying this with one of our sample environments, e.g. GridWorld, on a docker image where GridWorld is built so that it is not headless?

Hey, thanks for your reply!

So I did the best I could to make this reproducible. I built GridWorld for training, and I'm not sure what you mean by not headless; there's no headless option. Attached is what I see in the build menu:
[image: no-headless]

I created a repro repo, mlagents-xvfb-fail. There you can hopefully repeat the steps and get the same behavior. Note that my machine has screens, and I can't get mlagents to work with xvfb either outside Docker or inside it. The desired behavior is to be able to run with xvfb both inside and outside Docker.

Ping, can you take a look at this if possible? What's wrong with our setup? This is blocking us at the moment...

Hey, so I had a bit of progress regarding this one (since we're blocked on it, I have to try whatever I can to get us running).

From the repo root I run command:

exec xvfb-run --auto-servernum --server-args='-screen 0 640x480x24' ./gridworld/gridworld.x86_64 --port 20742 > out.txt

It fails instantly, but out.txt had a clue what happened:

Set current directory to /media/bjg/storage/code/mlagents-xvfb-fail
Found path: /media/bjg/storage/code/mlagents-xvfb-fail/gridworld/gridworld.x86_64
Mono path[0] = '/media/bjg/storage/code/mlagents-xvfb-fail/gridworld/gridworld_Data/Managed'
Mono config path = '/media/bjg/storage/code/mlagents-xvfb-fail/gridworld/gridworld_Data/MonoBleedingEdge/etc'
Preloaded 'ScreenSelector.so'
Preloaded 'libgrpc_csharp_ext.x64.so'
Display 0 'screen': 640x480 (primary device).
Logging to /home/bjg/.config/unity3d/Unity Technologies/Unity Environment/Player.log

And Player.log says:

Desktop is 640 x 480 @ 0 Hz
Unable to find a supported OpenGL core profile
Failed to create valid graphics context: please ensure you meet the minimum requirements
E.g. OpenGL core profile 3.2 or later for OpenGL Core renderer
Vulkan detection: 0
No supported renderers found, exiting

(Filename:  Line: 634)
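
When repeating this experiment, the check for the failure in Player.log can be scripted. The following is a minimal sketch, assuming the two message strings quoted above are what marks the failure; the function name player_log_has_renderer_failure is hypothetical.

```shell
# Hypothetical helper: succeed when the Unity Player.log at path $1
# contains one of the renderer failure messages quoted above.
player_log_has_renderer_failure() {
    grep -q \
        -e 'Unable to find a supported OpenGL core profile' \
        -e 'No supported renderers found' \
        "$1"
}
```

With the log location printed in out.txt, a call could look like player_log_has_renderer_failure "$HOME/.config/unity3d/Unity Technologies/Unity Environment/Player.log" && echo "renderer failure".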

With this additional info, I searched again and found some similar issues, like ml-agents + Xvfb on cloud problem and Running on docker image time out exception. These suggested reinstalling nvidia drivers without opengl as they might have been corrupting the opengl, or something like that. Tried that, and again I got the same result.

Then, to make the repro as clean as possible, I did a fresh install of Ubuntu 18.04 and took the following steps.

Install nvidia drivers (as per https://davidsanwald.github.io/2016/11/13/building-tensorflow-with-gpu-support.html, mentioned in issue solution):

# Update / upgrade
sudo apt-get update
sudo apt-get -y dist-upgrade
sudo apt-get install -y gcc g++ build-essential
# Copy nvidia and cuda files to desktop
cp /media/bjg/storage/NVIDIA-Linux-x86_64-418.43.run ~/Desktop
cp /media/bjg/storage/cuda_10.0.130_410.48_linux.run ~/Desktop
# NVIDIA will clash with the nouveau driver so insert following lines (not commented) to blacklist it
# blacklist nouveau
# blacklist lbm-nouveau
# options nouveau modeset=0
# alias nouveau off
# alias lbm-nouveau off
sudo nano /etc/modprobe.d/blacklist-nouveau.conf
# Update driver thing and reboot (warning: after reboot you'll need to enter some tty)
sudo update-initramfs -u
sudo reboot
# Install nvidia drivers without opengl
sudo ~/Desktop/NVIDIA-Linux-x86_64-418.43.run --no-opengl-files
sudo reboot
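
The blacklist step above is done by hand in nano. A non-interactive variant could look like the sketch below; write_nouveau_blacklist is a hypothetical helper name, and the five entries are exactly the ones listed in the comments above.

```shell
# Hypothetical helper: write the nouveau blacklist entries listed in
# the comments above to the file given as $1.
write_nouveau_blacklist() {
    cat > "$1" <<'EOF'
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
EOF
}
```

Because /etc/modprobe.d is only writable by root, one option is to write to a temporary file first and copy it into place: write_nouveau_blacklist /tmp/blacklist-nouveau.conf && sudo cp /tmp/blacklist-nouveau.conf /etc/modprobe.d/blacklist-nouveau.conf.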

At this point I ran the command again and got the same error as before. Then I experimented with the OpenGL version:

# Install mesa-utils so I could see opengl version
sudo apt install mesa-utils
glxinfo | grep "OpenGL version"
# Version was: OpenGL version string: 3.1 Mesa 18.3.3
# I wanted to update as unity stated it required 3.2
sudo add-apt-repository ppa:ubuntu-x-swat/updates
sudo apt-get dist-upgrade
# also added MESA_GL_VERSION_OVERRIDE=4.0 to /etc/environment
sudo reboot
glxinfo | grep "OpenGL version"
# Now, version is: OpenGL version string: 4.0 (Compatibility Profile) Mesa 18.3.3

At this point I ran the command again: same issue.

I can't seem to get mlagents + xvfb going. Can you guys please help me? I spent days trying to set it up and don't know what else to try.

@bjg2 Are you able to run glxgears? If not, I would focus on that first.

$ xvfb-run -s "-screen 0 1024x768x24" glxgears
4917 frames in 5.0 seconds = 983.252 FPS
5053 frames in 5.0 seconds = 1010.401 FPS
5105 frames in 5.0 seconds = 1020.972 FPS
^C

In the meantime, I installed some libraries and now my OpenGL is tied to NVIDIA again:
[image]
But yes, gridworld fails as described with that one as well, and glxgears works:
[image]

@bjg2 are you working on your local machine or a remote machine (a compute cluster, for instance)?

My local machine, for now. The plan is to set up AWS, but I want to get this working locally first.

I found some additional info: the Unity build works on Ubuntu 16.04 but does not work on Ubuntu 18.04. I updated the mlagents-xvfb-fail repo with even simpler examples; it all boils down to a different base docker image.

Differences

Differences are described with:

cat /etc/*release | grep PRETTY_NAME
xvfb-run -s "-screen 0 640x480x24" glxinfo | grep version

Machine (Ubuntu 18.04.2 LTS) and docker (Ubuntu 18.04.1 LTS)

server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4
    Max core profile version: 3.3
    Max compat profile version: 3.1
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.0
OpenGL core profile version string: 3.3 (Core Profile) Mesa 18.2.2
OpenGL core profile shading language version string: 3.30
OpenGL version string: 3.1 Mesa 18.2.2
OpenGL shading language version string: 1.40
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 18.2.2
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00

Docker (Ubuntu 16.04.5 LTS)

server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4
    Max core profile version: 3.3
    Max compat profile version: 3.0
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.0
OpenGL core profile version string: 3.3 (Core Profile) Mesa 18.0.5
OpenGL core profile shading language version string: 3.30
OpenGL version string: 3.0 Mesa 18.0.5
OpenGL shading language version string: 1.30
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 18.0.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
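
The two outputs above can also be compared mechanically against the 3.2 minimum that Unity's error message names. This is a minimal sketch: the function names core_profile_version and version_at_least are hypothetical, and it assumes glxinfo prints the core profile line in the format shown above.

```shell
# Hypothetical helpers for reading glxinfo output.

# Read glxinfo output on stdin and print the OpenGL core profile
# version as "major.minor" (empty output if the line is absent).
core_profile_version() {
    sed -n 's/^OpenGL core profile version string: \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' |
        head -n 1
}

# Succeed when version $1 ("major.minor") is at least version $2.
version_at_least() {
    have_major=${1%%.*}; have_minor=${1#*.}
    need_major=${2%%.*}; need_minor=${2#*.}
    [ "$have_major" -gt "$need_major" ] ||
        { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}
```

Usage could look like v=$(xvfb-run -s "-screen 0 640x480x24" glxinfo | core_profile_version) followed by version_at_least "$v" 3.2 && echo "core profile $v meets the minimum". Both environments above report 3.3, so this check passes in each of them; it cannot explain the difference on its own.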

Conclusion?

I tried to install older versions of OpenGL / Mesa but couldn't figure out how to do so on Ubuntu 18. In both cases the OpenGL core profile version is 3.3, yet on Ubuntu 18 the error implies that the problem might be an OpenGL core profile that is too old (or "Vulkan detection: 0", what's up with that line?):

Desktop is 640 x 480 @ 0 Hz
Unable to find a supported OpenGL core profile
Failed to create valid graphics context: please ensure you meet the minimum requirements
E.g. OpenGL core profile 3.2 or later for OpenGL Core renderer
Vulkan detection: 0
No supported renderers found, exiting

Note that in both of these, glxgears works in xvfb. So, as far as I understand, the Unity build does not work in xvfb on Ubuntu 18? Or have I messed up the build somehow (gridworld was built with Unity 2018.3.1; below is a screenshot of the build settings, and the build itself is in the repo)? Please help, or point me to where to ask for help, as this is dragging down our development...

[image: no-headless]

Another ping. Do you consider this an mlagents issue? Or do you think this is a core Unity issue? Or someone else's? Who should I ping for a resolution?

Hi @bjg2
I am sorry for the delay in responding. A few notes:

  • The version of docker we have tried and tested Unity+ML-Agents+xvfb on is based on Ubuntu 16.04 LTS. I have tried ml-agents on 18.04 LTS previously but in headless mode. I think the reason you are not seeing the headless option is because you are building using the linux editor, is that correct?

E.g. see image.

I will also reach out to someone else internally regarding this, to see if there is someone more appropriate who can help you.

Yes, I am building in the Linux editor. Would it work if I were to build on Windows in headless mode? I also read somewhere that building in headless mode is not a good idea for agents that have visual observations (as mine do), so that would not solve the issue for me either?

Yeah, if you are using visual observations (which you might arguably not need to), you are forced to run in non-headless mode. I think you should be fine building on Windows for Linux in headless mode; the screenshot is from macOS.

Our current projects and models are revolving around visual inputs, and that's by design. If you could point me to someone to help me on setting up unity binaries working with xvfb in ubuntu 18 that would be great.

@bjg2, everything I did was with Ubuntu 16.04, so I don't really have experience with Ubuntu 18.

We are closing this issue due to inactivity. Feel free to reopen it if you'd like to continue the discussion.

@bjg2, in this link you can find a singularity image that works with xvfb

Thanks! :)

Root of problem seems to be that Unity Editor runs just fine on about anything, while Unity Player has dependencies that don't match -- libgtk2 and OpenGL 3.3 are required for the player, nonsensically. Why?

I have a similar issue, though I don't use ML-Agents; I just built a simple Unity standalone application for Linux. When I run it with xvfb I get the same log, even though glxinfo shows that my OpenGL core profile is version 3.3. Unity still says it cannot find a supported OpenGL core profile or renderer.

Desktop is 640 x 480 @ 0 Hz
Unable to find a supported OpenGL core profile
Failed to create valid graphics context: please ensure you meet the minimum requirements
E.g. OpenGL core profile 3.2 or later for OpenGL Core renderer
Vulkan detection: 0
No supported renderers found, exiting

My glxinfo is

OpenGL vendor string: VMware, Inc.
OpenGL renderer string: llvmpipe (LLVM 8.0, 256 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 19.0.8
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 19.0.8
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 19.0.8
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:
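
One detail in this output is the renderer string: llvmpipe is one of Mesa's software rasterizers, so the context reported here is rendered on the CPU rather than by a GPU driver. Whether that contributes to the failure is not established in this thread. A minimal sketch for spotting it, with the hypothetical helper name uses_software_renderer:

```shell
# Hypothetical helper: read glxinfo output on stdin and succeed when the
# renderer string names one of Mesa's software rasterizers.
uses_software_renderer() {
    grep '^OpenGL renderer string:' |
        grep -q -i -e 'llvmpipe' -e 'softpipe'
}
```

For example, xvfb-run -s "-screen 0 640x480x24" glxinfo | uses_software_renderer && echo "software rendering".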

@bjg2, in this link you can find a singularity image that works with xvfb

@maystroh link is broken, could you update it?
