Dali: Question on the state of the python binding and compatibility for aarch64 (Jetson Xavier)

Created on 7 Apr 2020 · 24 Comments · Source: NVIDIA/DALI

Hello,

I'm working on a project where we currently use DALI as a data loader on x86 to train a NN with PyTorch. We would like to move that code to the Jetson Xavier platform (we know it's mainly geared towards inference).

I see aarch64 support is limited. What work would be required to get the Python binding operational on aarch64 (with limited ops support, since we only use a few)?

I guess "operational" is overly optimistic, so rather: what roadblocks are currently known, and what additional problems should we anticipate in trying to get it to work on the Xavier platform?

Cheers

enhancement question

All 24 comments

Hi,

I see aarch64 support is limited. What work would be required to get the Python binding operational on aarch64 (with limited ops support, since we only use a few)?

What it would require is cross-compiling the pybind11 Python bindings. The main challenge is that we use the CMake file that pybind11 provides, and it supports (AFAIK) only native compilation. One of the things it does is call Python to figure out the compilation options. I see a thread about this: https://github.com/pybind/pybind11/issues/1330. The short-term solution would be to just use a manual build (as long as you make a cross-compiled Python available in your build env so pybind11 can link with it).
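To see why this kind of detection assumes a native build, here is a minimal Python sketch of the sort of interpreter query involved. It uses only the standard sysconfig module and is not pybind11's actual CMake logic; the helper name python_build_info is made up for illustration.

```python
import sysconfig


def python_build_info():
    """Collect interpreter details that a native extension build depends on."""
    return {
        # Directory containing Python.h for the interpreter running this script.
        "include_dir": sysconfig.get_paths()["include"],
        # Filename suffix expected for compiled extension modules.
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
        # Platform string of the running interpreter, not of the build target.
        "platform": sysconfig.get_platform(),
    }


if __name__ == "__main__":
    for key, value in python_build_info().items():
        print(f"{key}: {value}")
```

Run on an x86 host, every value describes the x86 interpreter, which is the wrong answer for an aarch64 target. That is why a cross-compiled Python has to be made available to the build.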

To create a Python wheel you need to manually invoke this and provide the right WHL_PLATFORM_NAME. bundle-wheel.sh itself uses the host patchelf, so you would need to adjust this as well.
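As a rough illustration of what a platform name for a wheel looks like, the sketch below normalizes a platform string into the form used in wheel filenames. Whether WHL_PLATFORM_NAME expects exactly this form is an assumption, and wheel_platform_tag is a hypothetical helper, not part of DALI's build scripts.

```python
import sysconfig


def wheel_platform_tag(platform=None):
    """Normalize a platform string into a wheel filename platform tag."""
    if platform is None:
        # Defaults to the running interpreter, which is wrong when cross-compiling.
        platform = sysconfig.get_platform()
    # Wheel filenames use underscores in place of dashes and dots.
    return platform.replace("-", "_").replace(".", "_")


if __name__ == "__main__":
    print(wheel_platform_tag("linux-aarch64"))  # prints linux_aarch64
```

When cross-compiling, the target platform has to be passed explicitly, because the default reflects the build host.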

If you get that working, feel free to post a PR with the changes. We would be more than happy to include it in DALI.

I see. So compiling directly on a Xavier should yield the wheel as one would expect?

We can afford to do this as a first step (the first limit I found while checking was that the available CMake is just below the version required in CMakeLists.txt). Cross-compilation will become important, but it is not a deal breaker as of today.

I see. So compiling directly on a Xavier should yield the wheel as one would expect?

It may work but we haven't tested that.

the first limit I found while checking was that the available CMake is just below the version required in CMakeLists.txt

Why not build CMake from sources?

I will build CMake from sources to try it out, it's just the first hurdle to pass ;)

Thanks for the info.

Small update: after clearing several hurdles, it seems our train.py script now loads after a bare-metal compilation on the Xavier board. I will check tomorrow whether it works properly.

Sounds promising. I'm looking forward to hearing more about your results and experience.

New update: the network trains properly! Great success :)

For now I create the wheel in a very basic manner with :

python3 setup.py bdist_wheel

I have a lead on dockerizing this for x86 based on this : https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-Container-Runtime-on-Jetson#enabling-jetson-containers-on-an-x86-workstation-using-qemu

What course of action do you recommend so that I can create a viable PR beyond our use ?

Cheers

For now I create the wheel in a very basic manner with :

python3 setup.py bdist_wheel

I would recommend pip install dali/python. The wheel you have created will work only as long as you keep the local build dir, because the libraries point to build artifacts stored there. That is why we have the bundle-wheel.sh script, which does the necessary steps to add the missing libraries DALI depends on to the wheel and patch the .so library paths.

What course of action do you recommend so that I can create a viable PR beyond our use ?

If you can make it portable and self-contained, feel free to file a PR with any changes you have made, and then we can think about how to embed this into the current build system.

Yep, got it. I used bundle-wheel.sh, but it generated two wheels (one for DALI and one for future). I was not quite sure why at the time, so I went with a more familiar path. I'll check, but it should be fine with bundle-wheel.sh.

Also, no modifications to the code were required: we are using the 0.18.0 release commit and it just worked. The most notable difference for us is that the ImageDecoder does not have the mixed mode, because libnvjpeg is currently missing on the Jetson. Let's hope it gets included in the next release of JetPack with CUDA 10.1!

Some feedback on .so patching for my dummy wheel build: I checked with ldd the .so files from DALI that are installed in my venv, and they all point to .so files that are not in the DALI repository. I guess this might be because there are very few dependencies, given everything that's turned off during compilation.
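A check like this can be scripted. The sketch below parses ldd output and lists the libraries that resolve under a given directory, such as a local build dir. The function names and any paths passed to them are hypothetical, and it assumes the usual "name => path (address)" layout of ldd output on Linux.

```python
import subprocess


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path (None if unresolved)."""
    resolved = {}
    for line in output.splitlines():
        name, sep, rest = line.partition("=>")
        if not sep:
            continue  # lines without "=>" (vdso, dynamic loader) carry no lookup result
        target = rest.strip()
        if not target or target.startswith("not found"):
            resolved[name.strip()] = None
        else:
            # Drop the trailing load address, e.g. "(0x0000007f...)".
            resolved[name.strip()] = target.split()[0]
    return resolved


def libs_resolved_under(resolved, prefix):
    """Return the names of libraries whose resolved path starts with prefix."""
    return sorted(
        name for name, path in resolved.items() if path and path.startswith(prefix)
    )


def run_ldd(shared_object):
    """Run ldd on a shared object and return its raw text output."""
    result = subprocess.run(
        ["ldd", shared_object], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling libs_resolved_under(parse_ldd(run_ldd(path_to_so)), build_dir) for each installed .so then shows which dependencies still point into the build tree. An empty list means nothing depends on build artifacts.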

from dali that are installed in my venv, they all point to .so that are not in the DALI repository

Depending on your configuration it should point to at least libjpeg-turbo library. Anyway, I'm glad it worked so smoothly in your case.

libjpeg.so.8 => /usr/lib/aarch64-linux-gnu/libjpeg.so.8

I'm not familiar with libjpeg-turbo installed shared objects but it seems to be part of the package and the build switch was ON (I re-used the cmake flags from the available Dockerfile, I just turned off libsnd which we have no use for)

manylinux1 does not support aarch64. I think I will try to provide a minimal viable Dockerfile to compile on x86 via QEMU rather than on the Jetson (it is pretty darn slow), and try to reuse some of the Dockerfiles that are already available in the repo.

libjpeg.so.8 => /usr/lib/aarch64-linux-gnu/libjpeg.so.8

I think you are using the plain JPEG lib, not libjpeg-turbo. libjpeg-turbo is a drop-in replacement, as far as I remember. To use libjpeg-turbo you may need to compile it from sources.

apt-file list libjpeg-turbo8
libjpeg-turbo8: /usr/lib/aarch64-linux-gnu/libjpeg.so.8
libjpeg-turbo8: /usr/lib/aarch64-linux-gnu/libjpeg.so.8.1.2
libjpeg-turbo8: /usr/share/doc/libjpeg-turbo8/changelog.Debian.gz
libjpeg-turbo8: /usr/share/doc/libjpeg-turbo8/copyright
libjpeg-turbo8: /usr/share/lintian/overrides/libjpeg-turbo8

So you are good then.

yep :) I doubted too
I'll keep you posted

OK, I finally took the time to read bundle-wheel.sh: you carry all the dependencies with the wheel in the .lib directory. I understand your concern now!

At last I have a Dockerfile to build on x86 for jetson !

I'll propose a PR with the core content. The current major problem is that the l4t-base image from NVIDIA lacks CUDA, so I had to mount a copy I made from the Jetson I have on hand. Also, patchelf 0.9 corrupts libdali_operators.so; I had to compile 0.10 in the Docker image for that.
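A build script could guard against the older patchelf with a version check along these lines. This is a minimal sketch: the 0.10 threshold comes only from the report above, the helper names are hypothetical, and it assumes the tool reports its version as text such as "patchelf 0.9".

```python
import re

# Threshold taken from the report above; an assumption, not an official minimum.
MIN_PATCHELF = (0, 10)


def parse_version(text):
    """Extract the first dotted version number from text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def patchelf_is_new_enough(version_output, minimum=MIN_PATCHELF):
    """Return True if the reported version is at least the given minimum."""
    return parse_version(version_output) >= minimum
```

Comparing integer tuples matters here: as plain strings, "0.9" sorts after "0.10", so a string comparison would wrongly accept the older release.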

Cheers

Hi,
Have you checked docker/Dockerfile.build.aarch64-linux? We install CUDA toolkit for Jetson there.

patchelf 0.9 corrupts the libdali_operators.so had to compile 0.10 in the docker for that.

We have encountered that on Ubuntu too. The manylinux image we use for Python wheels builds patchelf from source and doesn't have that problem. I think your solution is fine.

In your PR do you want to replace docker/Dockerfile.build.aarch64-linux or make an alternative solution to that?

Hi,

The idea is that this solution is an alternative: Docker with QEMU aarch64 basically means you emulate the Jetson on your x86 machine and build as if everything happened on the Jetson. The problem I currently have is that NVIDIA does not provide (as far as I can tell) a Jetson OS image (L4T) with CUDA bundled. I guess there might be a possibility with the SDK Manager, but I have not had time to check that yet.

I guess there might be a possibility with the SDK manager but I did not have time to check that for the moment.

Sure. If you have any time just let us know the result.

Hello again! 😄

SDK Manager allows downloading the CUDA toolkit for Jetson, and the new version can even be called from the command line. I will tie this all together, write a README, and submit a PR once the process is streamlined.

Also, the new JetPack 4.4 was released as a Developer Preview. There is no sign of the CUDA libnvjpeg, so we will probably need another PR at some point to use the one provided with the Jetson; because of hardware and software differences, it looks like it will never ship with CUDA.

Cheers

I don't have a formal authorization to share the code I wrote... Sorry about that. I'll keep you posted

@IceTDrinker - no problem. Thanks for your effort.
