Hi, I built DALI from source on an IBM POWER machine and got the following error:
$ python main.py -a resnet18 ImageNet
=> creating model 'resnet18'
Traceback (most recent call last):
  File "main.py", line 501, in <module>
    main()
  File "main.py", line 254, in main
    pipe = HybridTrainPipe(batch_size=args.batch_size, num_threads=args.workers, device_id=args.local_rank, data_dir=traindir, crop=crop_size, dali_cpu=args.dali_cpu)
  File "main.py", line 95, in __init__
    self.decode = ops.nvJPEGDecoderRandomCrop(device="mixed", output_type=types.RGB, device_memory_padding=211025920, host_memory_padding=140544512,
AttributeError: module 'nvidia.dali.ops' has no attribute 'nvJPEGDecoderRandomCrop'
I inspected the attributes of ops and saw only nvJPEGDecoder, not the other decoder operations. Please advise. Thank you!
Hi,
nvJPEGDecoderRandomCrop uses an API from the next nvJPEG version, which is not public yet. We build the DALI whl against a prerelease version, which is why the operator is available in the prebuilt binaries. Please wait a few more days, watch for the nvJPEG update to go online, and then try building again.
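In the meantime, here is a minimal sketch for checking which decoder operators a given DALI build exposes before constructing a pipeline. The helper name `available_ops` is hypothetical (it is not part of DALI); it only uses `importlib` and `hasattr`, and returns an empty list when the module cannot be imported.

```python
import importlib


def available_ops(module_name, names):
    """Return the subset of `names` that exist as attributes of the module.

    `available_ops` is an illustrative helper, not a DALI API.
    Returns an empty list if the module cannot be imported at all.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return []
    return [name for name in names if hasattr(module, name)]


if __name__ == "__main__":
    wanted = ["nvJPEGDecoder", "nvJPEGDecoderRandomCrop"]
    # On a source build made without the newer nvJPEG, only the first
    # name would be expected to show up.
    print(available_ops("nvidia.dali.ops", wanted))
```

If `nvJPEGDecoderRandomCrop` is missing from the result, the build was most likely made against an nvJPEG version that lacks the required API.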
A new nvJPEG release is available: https://developer.nvidia.com/nvjpeg-release-download
The given link has patches for CUDA 10.0 and 9.0. Will a patch be available for 10.1?
Thanks,
@marsaev ?
It should be available in "CUDA Toolkit 10.1 Update 1" here:
https://developer.nvidia.com/cuda-downloads
See release notes https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#nvjpeg-u1-new-features
In case it helps, a pre-built DALI package for Power systems is included in the Watson Machine Learning Community Edition (previously 'PowerAI') conda channel.
After adding the conda channel (see: https://www.ibm.com/support/knowledgecenter/SS5SF7_1.6.1/navigation/wmlce_install.html) you should be able to just:
$ conda create -y -n my-dali-env python=3.6 dali powerai-release=1.6.1
$ conda activate my-dali-env
(my-dali-env) $ conda list dali
...
dali 0.9 py36_666ce55_1094.g70c071f
(my-dali-env) $ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:34:02)
...
>>> import nvidia.dali
>>> 'nvJPEGDecoderRandomCrop' in dir(nvidia.dali.ops)
True
>>>
In our 1.6.1 release the DALI package is built against CUDA 10.1 (also included in the channel).
@hartb - WOW, I didn't know that there is any community build for Power.
Would you like to make a PR and update our installation guide by providing this info?
@JanuszL Thank you for the invitation; I'll put up a PR.
@hartb - if any of your work enabling conda builds can be made public feel free to share it as well. DALI is open source after all and we would be more than happy to see external contributions.
@JanuszL We're in favor of that, though our current recipe includes some stuff that's specific to our build setup and wouldn't work for others. Maybe we can clean it up and contribute here or to conda forge. FYI @jayfurmanek
Thanks @hartb It works like a charm.
It'd be great if there were an official conda recipe for DALI. Conda has its own toolchain and glibc, so you don't have to jump through hoops pulling in docker, etc. to get a compliant build like you have to do with manylinux.
We should be able to clean up the recipe used in WML CE to make it more appropriate to contribute. Where should we move this conversation since this particular issue is closed?
@jayfurmanek - I have created https://github.com/NVIDIA/DALI/issues/1125 so we can track this request.