Azure-kinect-sensor-sdk: [Remote] depth_engine_start_helper reports error 204 ... GPU API failure ... at start of streaming

Created on 1 Oct 2019 · 27 Comments · Source: microsoft/Azure-Kinect-Sensor-SDK

I have 2 synchronized Azure Kinects and a fully working test setup. When I attempt to integrate the Kinects (on the same hardware system where my test works) within a larger real-time application, the cameras consistently fail at the moment when k4a_device_start_cameras() is called. I get the error message below on stdout:

[2019-10-01 11:49:11.734] [error] [t=4190] /home/vsts/work/1/s/extern/Azure-Kinect-Sensor-SDK/src/dewrapper/dewrapper.c (154): depth_engine_start_helper(). Depth engine create and initialize failed with error code: 204.

The wrapper layer that handles all calls to/from the k4a library routines is the same as the one I use in my test program .... and the test program works perfectly. It seems to me that this issue must be resource-related or possibly a bug.

My host system is an Intel Core i7 with 16 GB of RAM running Ubuntu 18.04 LTS
The SDK is version 1.2
Kinect device firmware is as below:
== Azure Kinect DK Firmware Tool ==
Device Serial Number: 000261592512
Current Firmware Versions:
RGB camera firmware: 1.6.102
Depth camera firmware: 1.6.75
Depth config file: 6109.7
Audio firmware: 1.6.14
Build Config: Production
Certificate Type: Microsoft

Device Serial Number: 000441592912
Current Firmware Versions:
RGB camera firmware: 1.6.102
Depth camera firmware: 1.6.75
Depth config file: 6109.7
Audio firmware: 1.6.14
Build Config: Production
Certificate Type: Microsoft

I don't know if this is useful, but I tried running the program "deversion" from the SDK and it produced the following report:
~/Azure-Kinect-Sensor-SDK/src$ ../build/bin/deversion
[2019-10-01 12:59:13.968] [error] [t=5362] ../src/dynlib/dynlib_linux.c (86): dynlib_create(). Failed to load shared object libdepthengine.so.0.0 with error: libdepthengine.so.0.0: cannot open shared object file: No such file or directory
[2019-10-01 12:59:13.968] [error] [t=5362] ../src/dynlib/dynlib_linux.c (86): dynlib_create(). Failed to load shared object libdepthengine.so.1.0 with error: libdepthengine.so.1.0: cannot open shared object file: No such file or directory
Found Depth Engine Pluging Version: 2.0.0 (current)

Any suggestions for getting past this issue would be greatly appreciated.

   Steve B

Bug Linux Triage Approved

StevenButner

👀1

Most helpful comment

This is a late comment, but it looks like EGL provides a way to get an OpenGL context in headless environments w/o Xorg running:

https://devblogs.nvidia.com/linking-opengl-server-side-rendering/
https://devblogs.nvidia.com/egl-eye-opengl-visualization-without-x-server/
https://stackoverflow.com/a/3331769/7553580

I work in robotics and am integrating the K4A into a headless rover system (Jetson Xavier / ROS). The workaround of allowing the window manager / X to run in the background and just setting the DISPLAY environment variable is working for me, but ideally we would run completely headless. The linked articles make it seem like switching to EGL to grab OpenGL contexts is fairly straightforward, but since libdepthengine is closed source, that is hard to know.

haljarrett on 30 Mar 2020

👍3

All 27 comments

It seems like you are missing libdepthengine.so.1.0, see https://github.com/microsoft/Azure-Kinect-Sensor-SDK/blob/develop/docs/depthengine.md.

wes-b on 1 Oct 2019

In which environment are you starting the application?
A logged-in session with a running X server, or a headless/ssh session?
If it's the latter, you need to export a DISPLAY variable before starting the application, like this:

export DISPLAY:=1

RoseFlunder on 1 Oct 2019

👍1

My application doesn't have any screens. It's a robotic application that uses the depth and intensity cameras for guidance. No screens means no X server, hence no DISPLAY variable. I guess it may fit your category "headless" but it is not in any ssh session.

StevenButner on 1 Oct 2019

Does it work when you export a display variable with the aforementioned command?
Without a DISPLAY variable set "k4a_device_start_cameras()" will fail on Ubuntu.
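
As a minimal sketch of a pre-flight check, an application could inspect DISPLAY itself before calling into the SDK and print a clearer message than error 204. The function name `display_problem` and the accepted pattern are assumptions made for illustration, not part of the k4a SDK, and the check only looks at the variable; it does not prove that an X server is actually reachable.

```python
import os
import re

# Assumption: a usable X display name looks like ":0", ":1" or "host:10.0".
_DISPLAY_RE = re.compile(r"^[\w.\-]*:\d+(\.\d+)?$")


def display_problem(env):
    """Return a human-readable problem string, or None if DISPLAY looks usable."""
    value = env.get("DISPLAY")
    if not value:
        return "DISPLAY is not set"
    if not _DISPLAY_RE.match(value):
        return f"DISPLAY has an unexpected form: {value!r}"
    return None


if __name__ == "__main__":
    # Check the environment of the current process.
    print(display_problem(os.environ) or "DISPLAY looks usable")
```

Running this inside the same environment as the camera process (for example, from the same systemd unit) shows what that process actually sees, which may differ from an interactive shell.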

RoseFlunder on 1 Oct 2019

I will have to do some reading to learn how to set an environment variable. The process that owns the cameras is a systemd daemon, so I guess I will have to add some commands in its services file or such. I'll try to figure it out and, if it works, I'll close this ticket. If not, I'll let you know.
Thnx

StevenButner on 1 Oct 2019

This could help you I think:
https://github.com/microsoft/Azure-Kinect-Sensor-SDK/issues/562#issuecomment-536323723

EDIT:
@wes-b
I saw this issue popping up already 2 or 3 times now and I had it myself (in my case with a ssh-session).
Would it either be possible to have a proper error message for Ubuntu systems where the necessary environment variables are not set or at least provide a hint in the documentation?
Maybe here: https://github.com/microsoft/Azure-Kinect-Sensor-SDK/blob/develop/docs/usage.md#linux-device-setup

In robotics applications it will be common not to have an X server running.

RoseFlunder on 1 Oct 2019

Agree with the @wes-b suggestion, which would make it easier for a user to understand what the issue is. But I am not (yet) able to solve the problem. I need to set an environment variable within my systemd daemon. I've read through #562 and several related threads. As I understand it, systemd runs its programs, by default, as root. Mine also needs to run as root so, until now, all has worked.

The requirement for an environment variable seems to require a user and a working directory. Once you put in a user, it shouldn't be root according to some of the contributors on the threads I've read .... though apparently it can be root. The additional requirement, however, that we set the particular environment variable XDG_RUNTIME_DIR=/run/user/1000 <> makes this a bit more challenging. On my system, even though there of course is a "root" user, and even though the root user has the id of 0, there is no directory "/run/user/0" so I cannot do what is being suggested ... sigh.

Additionally, I don't know whether DISPLAY=:0 or DISPLAY=:1 is required. One person told me DISPLAY:=1 and I think that may have been a typo. Another says to use DISPLAY=:0. I've tried all of these but I don't think they are the reason for my continued issue. Systemd doesn't seem to like me setting User=root and, even when that goes through, I do not see any directory /run/user/0.

Any further suggestions for this situation?

StevenButner on 1 Oct 2019

Would it either be possible to have a proper error message for Ubuntu systems where the necessary environment variables are not set or at least provide a hint in the documentation?
Maybe here: https://github.com/microsoft/Azure-Kinect-Sensor-SDK/blob/develop/docs/usage.md#linux-device-setup

We are not planning to fix the root cause as it is a limitation in OpenGL. We are also looking at how we can update the documentation to recommend better ways to get past this more quickly.

We will use this issue to at a minimum make the error message clearer. Is there an API that can be called to query for the fact there is no display session?

@rabbitdaxi FYI

wes-b on 2 Oct 2019

@StevenButner
Unfortunately I don't have a device myself to test it right now.
But tomorrow I may test to create a systemd service, like in the comment I linked, and report if it works.
I will be using an Intel NUC PC with:

Ubuntu 18.04
No monitor connected
Single azure kinect connected
Startup an application which uses the sensor SDK via systemd

EDIT: I couldn't really get it working flawlessly. It might also have something to do with my using an external GPU.

RoseFlunder on 2 Oct 2019

And, what would be most useful (at least for me) would be if you could have the systemd program you write run as root.

StevenButner on 2 Oct 2019

I'm still struggling with this GPU API error 204 thing. I've had to rearrange things a bit so that my application can run without being root (a good thing in the long term, no doubt). And I've followed very closely the sample given in #562. I can launch my systemd service and it starts running. It opens the depth cameras and checks their serial numbers, etc. as part of its initialization. I can see a record of that in my log. But the instant it starts either camera streaming, I get the same 204 error code and message.

My systemd unit file is here.
`[Unit]
Description=Navigation Manager (navmgr)
After=network.target

[Service]
Environment=DISPLAY=:0
Environment=XDG_RUNTIME_DIR=/run/user/990
User=rp9
Group=rp9
Type=simple
ExecStart=/opt/bin/navmgr
Restart=on-failure
WorkingDirectory=/tmp

[Install]
WantedBy=multi-user.target
`
The user that I created is "rp9" and it has id 990. I logged into it via ssh from another computer so that the /run/user/990 path got created. Note that, I do not yet understand how to do this without an external login (i.e. under the hood via systemd). Perhaps my problem is that the external login owns the DISPLAY :0 ?? I did also try DISPLAY=:1 however, with the same result.
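
One way to narrow this down is to verify, from inside the service, that the XDG_RUNTIME_DIR path really exists and belongs to the service user. The sketch below is only an illustration: the helper names `runtime_dir_for` and `runtime_dir_problem` are hypothetical, and the `/run/user/<uid>` layout is the convention described in this thread, not something the k4a SDK defines.

```python
import os


def runtime_dir_for(uid):
    """Conventional per-user runtime directory, e.g. /run/user/990."""
    return f"/run/user/{uid}"


def runtime_dir_problem(path, uid):
    """Return a problem string, or None if path exists and is owned by uid."""
    if not os.path.isdir(path):
        return f"{path} does not exist (no login session for this user?)"
    owner = os.stat(path).st_uid
    if owner != uid:
        return f"{path} is owned by uid {owner}, expected {uid}"
    return None


if __name__ == "__main__":
    # Check the runtime directory of whichever user runs this script.
    uid = os.getuid()
    print(runtime_dir_problem(runtime_dir_for(uid), uid) or "runtime dir looks fine")
```

If the directory only appears while an ssh login is active, the service will lose it when that login ends, which matches the behaviour described above.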

Any suggestions of what to try next would be greatly appreciated.

StevenButner on 8 Oct 2019

We are not planning to fix the root cause as it is a limitation in OpenGL.

Hi @wes-b!

how important is the dependency on OpenGL? Why is it there in the first place? Do you rely on OpenGL to process depth or it's just for some non-core functionality (like visualization)?

Would it be possible to eliminate this dependency?

gkrasin on 18 Oct 2019

I am thinking that auto-starting as a systemd service might not be possible, because systemd services are meant to run before a user logs in to a graphical session, or am I wrong?
I am no Linux expert when it comes to this.

RoseFlunder on 18 Oct 2019

how important is the dependency on OpenGL? Why is it there in the first place? Do you rely on OpenGL to process depth or it's just for some non-core functionality (like visualization)?

@gkrasin OpenGL is essential to the process of converting the raw depth data into images, which we do on the GPU

wes-b on 24 Oct 2019

@wes-b got it, it makes sense now. Thank you for your insightful reply!

gkrasin on 24 Oct 2019

#796

wes-b on 29 Jan 2020

I agree with haljarrett and the many others who have posted in issues #796, #681, #562, #503 and more. We could really benefit from a truly headless system. Our robots shouldn't need to run a window manager. All of these issues come back to OpenGL while trying to run some form of a headless system.

z33154 on 31 Mar 2020

Hi!,

I have a tool that extracts color & depth images from a Kinect recording, and it works fine on Windows. I just stumbled on this issue while trying to port it to a Docker container (Ubuntu 18.04 LTS) to run in a Kubernetes cluster.

I tried just setting DISPLAY=:0 or DISPLAY=:1 but had no luck.

Is there a way to run k4a in a headless environment / headless docker container? If so, what would be the workaround?

Thanks!

emepetres on 7 Aug 2020

Is there a way to run k4a in a headless environment / headless docker container? If so, what would be the workaround?

For Docker: https://github.com/microsoft/Azure-Kinect-Sensor-SDK/issues/1258#issuecomment-648943546

wes-b on 10 Aug 2020

Thanks @wes-b for answering.

I can indeed run k4a in a docker container with GPU by mounting the X11 socket. However, this solution is not feasible for a cloud/headless environment, as a display needs to be running on the host.

I've done some in-depth research over the past few days to overcome the issue. I am posting it here for reference and to see if anyone can shed some light on it:

These are all the options I've explored so far, with little to no success:

xvfb: OpenGL runs on CPU and its version is 3.3, not enough for k4a. It is also deprecated in favor of Xdummy.
Xdummy: OpenGL runs on CPU and its version is 3.1, not enough for k4a. Tried to add VirtualGL to use the GPU and latest OpenGL, SegFault.
Xdummy + nvidia driver: Not supported by nvidia containers
x11vnc, x11docker: Both need a display connected as well
Windows Containers with DirectX GPU support: Cannot install k4a or other dependencies, as the installers need a GUI. I am not sure whether it would work without a graphical interface on Windows even with everything installed, nor whether libdepthengine uses OpenGL on Windows as well.
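
To compare these options more quickly, the OpenGL version reported by a given setup can be parsed from `glxinfo` output and checked against a minimum. This is a rough sketch: the required version of 4.4 is an assumption (the list above only establishes that 3.3 and 3.1 are not enough), and the helper names are hypothetical.

```python
import re

# Assumption: the depth engine needs at least an OpenGL 4.4 context.
REQUIRED_GL = (4, 4)


def parse_gl_version(glxinfo_text):
    """Extract (major, minor) from a line like 'OpenGL version string: 3.3 ...'."""
    match = re.search(
        r"^OpenGL version string:\s*(\d+)\.(\d+)", glxinfo_text, re.MULTILINE
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def gl_version_ok(glxinfo_text, required=REQUIRED_GL):
    """True only if a version was found and it is at least `required`."""
    version = parse_gl_version(glxinfo_text)
    return version is not None and version >= required
```

The text passed in would be whatever `glxinfo` prints inside the container or virtual display under test; a software renderer typically shows up here with a lower version than the GPU driver would report.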

Currently I have these other two options that may work:

Use aks-engine to install custom images with nvidia/cuda + Xorg on the Kubernetes working nodes.
Use Azure Batch instead of Kubernetes, using a Windows VM with graphical interface.

In the following days I'll try the azure batch approach, as it seems to be easier than using aks-engine.

However, I understand the issue could be easily solved if libdepthengine created an EGL headless context, as others have posted before. @wes-b, are there any plans to work in that direction?

emepetres on 12 Aug 2020

👍2

@emepetres .... Your discussion of a docker-enabled GPU supporting the K4A needs is very interesting. Thanks for sharing that. I want to investigate it and try to get it running on my own system. I may have some questions for you in that regard which I will likely send via email.

The description of your various attempts to get the K4As to work in a headless environment is very familiar to me. I can report that, at this point, I have a working Ubuntu 18.04-based headless robotic system that uses 2 K4A cameras to sense depth in an autonomous navigation application. Many of the items that you list were also tried during development of my system .... with similar frustration and resulting failures.

I believe that the most important factor for getting a headless system up is figuring out how to boot up your system with a logged-in user. That's the area where I struggled .... because the investigation led to a number of complex issues bridging such topics as authentication (i.e. the PAM subsystem), session and 'seat' management, all manner of permission-related issues, and lots of systemd experimentation. In the end, I discovered that all I really needed was to install the "nodm" package. This is a full-featured display manager for headless systems (i.e. for systems with no display!). It has the all-important feature that most other display managers have .... configurable autologin. Nodm has a configuration file in /etc/default/nodm. In that file, you can configure the autologin feature. The comments in the /etc/default/nodm file are clear and helpful. There you can specify which userid to autologin and how to fire up the X11 session, including the X11 server options.

With nodm installed and enabled, I can boot up my system without any terminal attached. Everything boots up and a single special userid is automatically logged in and attached to an X11 server. I created some systemd _user_-mode units to attach to the X11 server for the special userid that I have configured for autologin. These systemd user-mode units get launched by systemd once that userid is logged in ... and that happens due to nodm (at boot time). Coupled with this, I have some regular systemd units for other daemons used in my system. These daemons are in the same group and I use group permission settings to maintain appropriate access and protection of other sensor subsystems in my application.

Note that I can log into my system over a wireless ssh session at any time. I routinely do my development over such an arrangement. If I give the command "loginctl list-sessions", I can see my own login session and one other one .... the userid that got autologged in. It always shows up with seat0 and has TTY "???".

I believe that the session setup --- in particular, the environment in /run/user/ ---- is the missing link that is needed for running any application that needs to stream the K4A depth camera. Once I managed to have a logged-in userid in place, together with its runtime environment, then all of the K4A library functionality worked for me. It all gets hooked up in the systemd unit that launches the application. This unit has to define at least the items shown below:

`[Service]
# This is what I use, but it can be different
Environment=DISPLAY=:10
# Numeric userid of your autologged-in user goes at the end of this path
Environment=XDG_RUNTIME_DIR=/run/user/
# This is needed on my system, which has an Intel GPU
Environment=MESA_LOADER_DRIVER_OVERRIDE=iris

# Put the name of your autologged-in userid here
User=
# Put your own defined group number here
Group=
Type=simple
Restart=on-failure
# You can change this as needed
WorkingDirectory=/tmp
# Path to your executable daemon code goes here
ExecStart=
`
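
For anyone who deploys several machines, the section above can be filled in programmatically. The following is a sketch only: `render_service_section` is a hypothetical helper, and the example values (`:0`, uid 990, user `rp9`, `/opt/bin/navmgr`) are borrowed from the earlier unit file in this thread purely as placeholders.

```python
def render_service_section(display, uid, user, group, exec_start, extra_env=None):
    """Build a [Service] section like the one above from explicit values."""
    env = {"DISPLAY": display, "XDG_RUNTIME_DIR": f"/run/user/{uid}"}
    env.update(extra_env or {})  # e.g. MESA_LOADER_DRIVER_OVERRIDE on Intel GPUs
    lines = ["[Service]"]
    lines += [f"Environment={key}={value}" for key, value in env.items()]
    lines += [
        f"User={user}",
        f"Group={group}",
        "Type=simple",
        "Restart=on-failure",
        "WorkingDirectory=/tmp",
        f"ExecStart={exec_start}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; substitute those of your own autologged-in user.
    print(render_service_section(":0", 990, "rp9", "rp9", "/opt/bin/navmgr"))
```

Keeping the uid as an explicit argument makes it harder to leave the XDG_RUNTIME_DIR path incomplete by accident.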
I hope this helps. It works well for me.

StevenButner on 13 Aug 2020

👍1

Thank you @StevenButner for sharing that, the nodm setup is very interesting.

However, if I understand correctly, it is still not possible (nowadays) to use it in an nvidia/cuda docker container on a truly headless host, because nodm still needs an X server running in the background. Currently, nvidia/cuda containers do not work with an X server running inside because the nvidia driver is not installed in the container (the host driver is used instead).

FYI looking at nodm I found that the author has deprecated it in favor of a lightdm autologin plugin

emepetres on 14 Aug 2020

Thanks for the response, @emepetres. I was aware that nodm was deprecated and, in fact, lightdm would almost certainly work in its place. I like the simplicity and smaller size of nodm. As for the docker container not supporting an X11 server inside, that may not be a show-stopper, since you have already demonstrated that the OpenGL environment is apparently available within a docker container. You have shown a run with the k4aviewer application, which supports depth streaming.

I haven't tried this, but I think it is likely that the nodm auto-login could be configured to work without the X11 server launch (in a manner more like a tty). We really don't need the full set of features that X11 supports anyway. Apparently, the nvidia/cuda docker technology has enough built-in capability that the K4A functions can run. In this area, all of the details count (and I don't know enough of them at this time to make any speculations).

Can you describe more about how the docker container offers OpenGL without launching an X11 server? And, for those of us who do not have nvidia GPU hardware but still have a headless system and a desire to run it all in a docker container, do you know whether the nvidia/cuda docker setup can be used with other GPUs? The systems I work with have the integrated Intel graphics controllers, e.g. Intel HD Graphics 530.

Added later: After doing some reading about nodm and about the nvidia/cuda docker setup, I now see that one thing I said above is not correct. Nodm does need to be set up with an X11 server. The attachment is quite flexible, however, due to the flexibility built into the launch methods for X11. But it does need to have X11 services. I do not know whether or not this could be configured so as to attach the auto-login session to an X11 server running outside the container. That determination will require more research.

StevenButner on 14 Aug 2020

Some GPU cards need a connected display monitor to enable OpenGL. You cannot run any OpenGL program if the GPU card is not outputting a display signal. This is a common issue on headless computer systems.

To solve this problem, you can purchase a dummy display plug to simulate a display monitor connection. No additional software needs to be installed.

You can buy something like this: dummy display plug

lesterlo on 25 Aug 2020

We got it working with this workaround:

Enable automatic login for the user you are going to use. After that, we could shut down the PC, disconnect the monitor, and power it on again. With automatic login, the X server session is created at boot.
After connecting via ssh, set the DISPLAY environment variable with: export DISPLAY=:0
Start the software that uses the k4a SDK as usual.
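
The last two steps can also be done from a launcher script instead of an interactive shell. This is a minimal sketch under the assumption that the autologin session owns display `:0`; `run_with_display` is a hypothetical helper, and the command in the demo is a stand-in that only echoes the variable.

```python
import os
import subprocess
import sys


def run_with_display(command, display=":0"):
    """Run command with DISPLAY set, leaving the rest of the environment untouched."""
    env = dict(os.environ, DISPLAY=display)
    return subprocess.run(command, env=env, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Stand-in command that just prints DISPLAY; replace with your k4a-based program.
    result = run_with_display(
        [sys.executable, "-c", "import os; print(os.environ['DISPLAY'])"]
    )
    print(result.stdout.strip())
```

Setting the variable only for the child process avoids changing the environment of the ssh session itself.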

RoseFlunder on 25 Aug 2020

@RoseFlunder thanks for this workaround. One of our team also suggested using the command line to enable automatic login. You can find instructions here: https://vitux.com/how-to-enable-disable-automatic-login-in-ubuntu-18-04-lts/