Dali: Question on the state of the python binding and compatibility for aarch64 (Jetson Xavier)

Created on 7 Apr 2020 · 24 comments · Source: NVIDIA/DALI

Hello,

I'm working on a project where we currently use DALI as the data loader on x86 to train a NN with PyTorch. We would like to transition that code to the Jetson Xavier platform (we know it's mainly geared towards inference).

I see aarch64 support is limited. What would be the required work to get the Python binding operational on aarch64 (with limited operator support; we only use a few)?

I guess "operational" is very optimistic, so rather: what roadblocks are currently known, and what additional problems could be anticipated when trying to get it to work on the Xavier platform?

Cheers

enhancement question


IceTDrinker


Hi,

> I see aarch64 support is limited, what would be the required work to get the python binding operational (with limited ops support we only use a few) on aarch64?

What it would require is a cross-compilation of the pybind11 Python bindings. The main challenge is that we are using the CMake file that pybind11 provides, and it supports (AFAIK) only native compilation. One of the things it does is call Python to figure out compilation options. I see a thread about this: https://github.com/pybind/pybind11/issues/1330. The short-term solution would be to use a manual build (as long as you make a cross-compiled Python available in your build environment so that pybind11 can link against it).
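To illustrate why "call Python to figure out compilation options" ties the build to the host, here is a minimal sketch that queries the running interpreter for the kind of values an extension build needs (header directory, extension suffix, platform). It is not pybind11's actual CMake logic; it only shows that these values describe whichever interpreter runs the query, so on an x86 host they come out as x86 values.

```python
import sysconfig


def python_build_config():
    """Collect interpreter-specific values an extension build typically needs.

    Everything here describes the interpreter *running this code*, which is
    why a build that shells out to the host Python gets host (e.g. x86) values
    instead of the target's (aarch64) ones.
    """
    paths = sysconfig.get_paths()
    return {
        "include_dir": paths["include"],                       # where Python.h lives
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),  # ABI/arch-tagged .so suffix
        "multiarch": sysconfig.get_config_var("MULTIARCH"),    # Debian/Ubuntu triplet, may be None
        "platform": sysconfig.get_platform(),                  # e.g. linux-aarch64 or linux-x86_64
    }


if __name__ == "__main__":
    for key, value in python_build_config().items():
        print("{}: {}".format(key, value))
```

Running it with the target interpreter (natively or under emulation) gives the values a cross build would have to supply by hand.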

To create a Python wheel you need to manually invoke this and provide the right WHL_PLATFORM_NAME. bundle-wheel.sh itself uses the host patchelf, so you would need to adjust this as well.
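As a hedged sketch of where a platform name could come from: wheel platform tags are derived from the distutils/sysconfig platform string by replacing "-" and "." with "_" (PEP 425), so linux-aarch64 becomes linux_aarch64. Whether WHL_PLATFORM_NAME expects exactly this plain tag or a manylinux tag is an assumption not confirmed in this thread.

```python
import sysconfig


def wheel_platform_tag(platform=None):
    """Turn a sysconfig platform string into a wheel platform tag (PEP 425 rule).

    Defaults to the running interpreter's platform; pass a string explicitly
    to compute the tag for another target.
    """
    if platform is None:
        platform = sysconfig.get_platform()
    return platform.replace("-", "_").replace(".", "_")


if __name__ == "__main__":
    print(wheel_platform_tag("linux-aarch64"))  # linux_aarch64
    print(wheel_platform_tag())                 # tag for this interpreter
```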

If you get that working, feel free to post a PR with the changes. We would be more than happy to include it in DALI.

JanuszL on 7 Apr 2020

I see. So compiling directly on a Xavier would yield the wheel as one would expect, then?

We can afford to do this as a first step (the first limitation I found while checking this was that the available CMake version is just below the requirement in CMakeLists.txt). Cross-compilation will become important, but it is not a deal breaker as of today.

IceTDrinker on 7 Apr 2020

> I see, a compilation directly on a Xavier would yield the wheel as one would expect then?

It may work but we haven't tested that.

> first limit I found while checking this was the available cmake is just below the requirement in CMakeLists.txt

Why not build CMake from sources?
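Before and after building CMake from source, a quick check against the project's minimum can save a failed configure. This is a minimal sketch that parses `cmake --version` output and compares version tuples; the minimum passed in is a hypothetical placeholder and should be copied from cmake_minimum_required(VERSION ...) in DALI's CMakeLists.txt.

```python
import re
import subprocess


def parse_cmake_version(text):
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no CMake version found in: %r" % text)
    # A missing patch component (e.g. "3.17") counts as 0.
    return tuple(int(part or 0) for part in match.groups())


def cmake_is_new_enough(minimum, version_text=None):
    """Compare the installed CMake against a minimum (major, minor, patch) tuple."""
    if version_text is None:
        version_text = subprocess.run(
            ["cmake", "--version"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        ).stdout
    return parse_cmake_version(version_text) >= minimum


if __name__ == "__main__":
    HYPOTHETICAL_MINIMUM = (3, 13, 0)  # placeholder, read the real one from CMakeLists.txt
    print(cmake_is_new_enough(HYPOTHETICAL_MINIMUM))
```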

JanuszL on 7 Apr 2020

I will build CMake from source to try it out; it's just the first hurdle to pass ;)

Thanks for the info.

IceTDrinker on 8 Apr 2020

Small update: after passing several hurdles, it seems our train.py script now loads after a bare-metal compilation on the Xavier board. I will check tomorrow whether it works properly.

IceTDrinker on 8 Apr 2020

Sounds promising. I'm looking forward to hearing more about your results and experience.

JanuszL on 9 Apr 2020

New update: the network trains properly! Great success :)

For now I create the wheel in a very basic manner with:

python3 setup.py bdist_wheel

I have a lead on dockerizing this for x86 based on this : https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-Container-Runtime-on-Jetson#enabling-jetson-containers-on-an-x86-workstation-using-qemu
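When building through QEMU emulation on an x86 workstation, it helps to confirm the build really runs as aarch64. A minimal sketch, assuming qemu-user emulation: inside the emulated container platform.machine() reports the guest architecture, and on the host a registered handler usually appears under /proc/sys/fs/binfmt_misc. The entry name qemu-aarch64 is the common qemu-user-static convention and may differ on other setups.

```python
import os
import platform


def running_on_aarch64(machine=None):
    """True when the (possibly emulated) CPU reports itself as aarch64."""
    return (machine or platform.machine()) in ("aarch64", "arm64")


def qemu_aarch64_registered(binfmt_dir="/proc/sys/fs/binfmt_misc"):
    """Check on the host whether a qemu-aarch64 binfmt_misc handler is registered.

    The entry name is an assumption based on what qemu-user-static usually
    registers; adjust it if your setup uses a different name.
    """
    return os.path.exists(os.path.join(binfmt_dir, "qemu-aarch64"))


if __name__ == "__main__":
    print("aarch64 userland:", running_on_aarch64())
    print("qemu-aarch64 handler registered:", qemu_aarch64_registered())
```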

What course of action do you recommend so that I can create a viable PR that is useful beyond our use case?

Cheers

IceTDrinker on 9 Apr 2020

> For now I create the wheel in a very basic manner with :
>
> python3 setup.py bdist_wheel

I would recommend pip install dali/python. The wheel you have created will only work as long as you keep the local build directory, because the libraries point to build artifacts stored there. That is why we have the bundle-wheel.sh script, which does the necessary steps to add the missing libraries DALI depends on to the wheel and to patch the .so library paths.
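The "patch the .so library paths" step usually means giving each shared object an RPATH relative to its own location ($ORIGIN), so bundled libraries are found wherever the wheel is installed. A minimal sketch of computing such an RPATH and building (not running) a patchelf --set-rpath call; the package layout is hypothetical and this is not bundle-wheel.sh's actual logic.

```python
import os


def origin_rpath(so_path, libs_dir):
    """Build an $ORIGIN-relative RPATH from so_path's directory to libs_dir.

    The loader expands $ORIGIN to the directory containing the .so at load
    time, so the path stays valid after installation into any site-packages.
    """
    rel = os.path.relpath(libs_dir, os.path.dirname(so_path))
    return "$ORIGIN" if rel == "." else "$ORIGIN/" + rel


def patchelf_command(so_path, libs_dir):
    """Return (without executing) a patchelf invocation that sets that RPATH."""
    return ["patchelf", "--set-rpath", origin_rpath(so_path, libs_dir), so_path]


if __name__ == "__main__":
    # Hypothetical layout: package sources in pkg/, bundled libraries in pkg/.libs
    print(patchelf_command("pkg/plugin/libext.so", "pkg/.libs"))
```

The command list can be handed to subprocess.run once the layout matches your wheel.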

> What course of action do you recommend so that I can create a viable PR beyond our use?

If you can make it portable and self-contained, feel free to file a PR with any changes you have made, and then we can think about how to embed this into the current build system.

JanuszL on 9 Apr 2020

Yep, got it. I used bundle-wheel.sh, but it generated two wheels (one for DALI and one for future). I was not quite sure why at the time, so I went with a more familiar path. I'll check, but it should be fine with bundle-wheel.sh.
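When a build directory ends up with more than one wheel, the PEP 427 filename fields are enough to tell them apart. A minimal sketch that splits wheel filenames and keeps the DALI one; the nvidia_dali prefix and the example filenames are assumptions, not taken from this thread.

```python
import os


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 fields.

    Format: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    stem = os.path.basename(filename)
    if not stem.endswith(".whl"):
        raise ValueError("not a wheel: %s" % filename)
    parts = stem[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected wheel filename: %s" % filename)
    return {"name": name, "version": version, "build": build,
            "python": py, "abi": abi, "platform": plat}


def pick_wheels(filenames, prefix="nvidia_dali"):
    """Keep only wheels whose distribution name starts with prefix (assumed name)."""
    return [f for f in filenames
            if parse_wheel_filename(f)["name"].lower().startswith(prefix)]


if __name__ == "__main__":
    found = ["nvidia_dali-0.18.0-cp36-cp36m-linux_aarch64.whl",
             "future-0.18.2-py3-none-any.whl"]  # hypothetical build output
    print(pick_wheels(found))
```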

Also, no modifications to the code were required: we are using the 0.18.0 release commit and it just worked. The most notable difference for us is that ImageDecoder does not have the mixed mode, because libnvjpeg is currently missing on the Jetson. Let's hope it gets included in the next JetPack release with CUDA 10.1!
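To keep one training script working on both x86 and the Jetson, the decoder device can be chosen at startup. A minimal sketch: "mixed" decoding relies on nvJPEG, so fall back to "cpu" when it is unavailable. Detecting nvJPEG by looking for a shared libnvjpeg is only a heuristic (a static link or a build with nvJPEG disabled is not detected correctly), so passing the flag explicitly is safer.

```python
import ctypes.util


def nvjpeg_visible():
    """Heuristic: is a shared libnvjpeg visible to the dynamic loader?

    Treat the result as a hint only; it says nothing about how DALI itself
    was built.
    """
    return ctypes.util.find_library("nvjpeg") is not None


def decoder_device(nvjpeg_available=None):
    """Pick the ImageDecoder device: 'mixed' needs nvJPEG, otherwise use 'cpu'."""
    if nvjpeg_available is None:
        nvjpeg_available = nvjpeg_visible()
    return "mixed" if nvjpeg_available else "cpu"


if __name__ == "__main__":
    print(decoder_device(False))  # cpu, e.g. a Jetson build without nvJPEG
    print(decoder_device())       # based on the heuristic above
```

Assuming the 0.18-era nvidia.dali.ops API, the result would feed something like ops.ImageDecoder(device=decoder_device(False), output_type=types.RGB) inside the pipeline definition.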

IceTDrinker on 10 Apr 2020

Just some feedback on .so patching for my quick-and-dirty wheel build: I checked the DALI .so files installed in my venv with ldd, and they all point to .so files that are not in the DALI repository. I guess this might be because there are very few dependencies with everything that's turned off during compilation.
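The ldd check can be automated across all installed .so files. A minimal sketch that parses ldd output into a name-to-path map and flags dependencies that resolve inside a given directory (for example a build tree, which would make the wheel non-portable) or are not found at all. The sample paths are hypothetical.

```python
import re

# Matches "name => /resolved/path (0xaddr)" and "name => not found".
# Lines without "=>" (linux-vdso, the dynamic loader itself) are skipped.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)")


def parse_ldd(output):
    """Map each dependency name to the path ldd resolved it to (None if not found)."""
    deps = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            name, target = match.groups()
            deps[name] = None if target == "not found" else target
    return deps


def deps_under(deps, directory):
    """Names of dependencies that resolve inside `directory`."""
    prefix = directory.rstrip("/") + "/"
    return sorted(n for n, p in deps.items() if p and p.startswith(prefix))


if __name__ == "__main__":
    sample = (
        "\tlibjpeg.so.8 => /usr/lib/aarch64-linux-gnu/libjpeg.so.8 (0x0000ffff00000000)\n"
        "\tlibdali.so => /home/user/dali/build/libdali.so (0x0000ffff10000000)\n"
    )
    print(deps_under(parse_ldd(sample), "/home/user/dali/build"))
```

In practice the output would come from subprocess.run(["ldd", path], ...) for each .so in the venv.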

IceTDrinker on 10 Apr 2020

> from dali that are installed in my venv, they all point to .so that are not in the DALI repository

Depending on your configuration, it should point to at least the libjpeg-turbo library. Anyway, I'm glad it worked so smoothly in your case.

JanuszL on 10 Apr 2020

libjpeg.so.8 => /usr/lib/aarch64-linux-gnu/libjpeg.so.8

I'm not familiar with the shared objects libjpeg-turbo installs, but it seems to be part of the package, and the build switch was ON (I reused the CMake flags from the available Dockerfile; I just turned off libsnd, which we have no use for).

manylinux1 does not support aarch64. I think I will try to provide a minimal viable Dockerfile to compile on x86 via QEMU rather than on the Jetson (it is pretty darn slow), and try to reuse some of the Dockerfiles that are already available in the repo.

IceTDrinker on 10 Apr 2020

> libjpeg.so.8 => /usr/lib/aarch64-linux-gnu/libjpeg.so.8

I think you are using the plain JPEG lib, not libjpeg-turbo. libjpeg-turbo is a drop-in replacement, as far as I remember. To use libjpeg-turbo you may need to compile it from source.

JanuszL on 10 Apr 2020

apt-file list libjpeg-turbo8
libjpeg-turbo8: /usr/lib/aarch64-linux-gnu/libjpeg.so.8
libjpeg-turbo8: /usr/lib/aarch64-linux-gnu/libjpeg.so.8.1.2
libjpeg-turbo8: /usr/share/doc/libjpeg-turbo8/changelog.Debian.gz
libjpeg-turbo8: /usr/share/doc/libjpeg-turbo8/copyright
libjpeg-turbo8: /usr/share/lintian/overrides/libjpeg-turbo8
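The same check can be scripted by matching the path ldd resolved against an `apt-file list` listing like the one above. A minimal sketch of that lookup; on an installed system, dpkg -S with the file path answers the same question directly.

```python
def owning_packages(apt_file_output, path):
    """Return the packages whose `apt-file list` lines ("pkg: /path") include `path`."""
    owners = []
    for line in apt_file_output.splitlines():
        package, sep, listed = line.partition(": ")
        if sep and listed.strip() == path:
            owners.append(package.strip())
    return owners


if __name__ == "__main__":
    listing = (
        "libjpeg-turbo8: /usr/lib/aarch64-linux-gnu/libjpeg.so.8\n"
        "libjpeg-turbo8: /usr/lib/aarch64-linux-gnu/libjpeg.so.8.1.2\n"
    )
    # Path taken from the ldd output earlier in the thread
    print(owning_packages(listing, "/usr/lib/aarch64-linux-gnu/libjpeg.so.8"))
```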

IceTDrinker on 10 Apr 2020

So you are good then.

JanuszL on 10 Apr 2020

Yep :) I had my doubts too.
I'll keep you posted

IceTDrinker on 10 Apr 2020


OK, I finally took the time to read bundle-wheel.sh: you carry all the dependencies with the wheel in the .lib directory. I understand your concern now!
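A quick way to confirm what a bundled wheel actually carries is to list its contents, since a wheel is a zip archive. A minimal sketch using zipfile that lists shared objects stored under a directory with a given name; pass the directory name your wheel really uses (the thread mentions .lib, which may not match the exact name).

```python
import zipfile


def bundled_shared_objects(wheel_path, libs_dirname):
    """List .so files stored under a directory named libs_dirname inside a wheel."""
    with zipfile.ZipFile(wheel_path) as whl:
        names = whl.namelist()
    return sorted(
        name for name in names
        # the directory must appear in the path, and the basename must look like
        # a shared object (covers versioned names such as libjpeg.so.8)
        if libs_dirname in name.split("/")[:-1] and ".so" in name.rsplit("/", 1)[-1]
    )


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        for so in bundled_shared_objects(sys.argv[1], sys.argv[2]):
            print(so)
```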

IceTDrinker on 10 Apr 2020

At last I have a Dockerfile to build on x86 for Jetson!

I'll propose a PR with the core content. The current major problem is that the l4t-base image from NVIDIA lacks CUDA, so I had to mount a copy I made from the Jetson I have on hand. Also, patchelf 0.9 corrupts libdali_operators.so, so I had to compile 0.10 in the Docker image for that.

Cheers

IceTDrinker on 15 Apr 2020

Hi,
Have you checked docker/Dockerfile.build.aarch64-linux? We install the CUDA toolkit for Jetson there.

> patchelf 0.9 corrupts the libdali_operators.so had to compile 0.10 in the docker for that.

We have encountered that too on Ubuntu; the manylinux image we use for Python wheels builds it from source and doesn't have that problem. I think your solution is fine.

In your PR do you want to replace docker/Dockerfile.build.aarch64-linux or make an alternative solution to that?

JanuszL on 15 Apr 2020

Hi,

The idea is that the solution is an alternative: the Docker image with QEMU aarch64 basically means you emulate the Jetson on your x86 machine to build as if everything happened on the Jetson. Currently the problem I have is that NVIDIA does not provide (as far as I could tell) a Jetson OS image (L4T) with CUDA bundled with it. I guess there might be a possibility with the SDK Manager, but I have not had time to check that yet.

IceTDrinker on 22 Apr 2020

> I guess there might be a possibility with the SDK manager but I did not have time to check that for the moment.

Sure. If you find the time, just let us know the result.

JanuszL on 22 Apr 2020

Hello again ! 😄

SDK Manager allows downloading the CUDA toolkit for Jetson, and the new version can even be called from the command line. I will tie this all together, write a README, and submit a PR once the process is streamlined.

Also, the new JetPack 4.4 was released as a Developer Preview, with no sign of the CUDA libnvjpeg, so we will probably have another PR at some point to use the one provided with the Jetson. Because of hardware and software differences, it looks like it will never ship with CUDA.

Cheers

IceTDrinker on 23 Apr 2020


I don't have formal authorization to share the code I wrote... Sorry about that. I'll keep you posted.

IceTDrinker on 25 May 2020

@IceTDrinker - no problem. Thanks for your effort.

JanuszL on 25 May 2020
