Hi, using a fresh conda env on xenial with CUDA 10.1, I'm having trouble loading UCF-101 video using DALI. The data is in the officially released format, and I'm following the simple video loader tutorial, with this script:
from nvidia.dali.pipeline import Pipeline
from nvidia.dali.plugin.pytorch import DALIGenericIterator
import nvidia.dali.ops as ops
from glob import glob
import timeit
import os
from pathlib import Path


class VideoPipe(Pipeline):
    def __init__(self, data, batch_size, num_threads, device_id, sequence_length, step, stride):
        super(VideoPipe, self).__init__(batch_size, num_threads, device_id, seed=16)
        # GPU video reader that yields sequences of `sequence_length` frames.
        self.input = ops.VideoReader(device='gpu', filenames=data, sequence_length=sequence_length,
                                     shard_id=0, num_shards=1, step=step, stride=stride)
        self.crop = ops.Crop(device='gpu', crop_h=128, crop_w=128)

    def define_graph(self):
        sequence = self.input(name='Reader')
        cropped_output = self.crop(sequence)
        return cropped_output


# PATH TO UCF ROOT
video_path = '<PATH_TO_UCF_DIR>'
# UCF-101 layout: <root>/<class name>/<video file>
vids = glob(os.path.join(video_path, '*', '*'))
pipe = VideoPipe(batch_size=100, num_threads=12, device_id=0, data=vids, sequence_length=5, step=2, stride=1)
pipe.build()
dali_iter = DALIGenericIterator([pipe], ['video'], pipe.epoch_size("Reader"), fill_last_batch=True, last_batch_padded=True)
i = 0
prev_timer = timeit.default_timer()
for video_data in dali_iter:
    for d in video_data:
        frame_tensors = d['video'].cpu().numpy()
        for batch in frame_tensors:
            for frame_tensor in batch:
                if i % 1000 == 0:
                    print(f"Processed {i} frames")
                i += 1  # count every frame so the progress message advances
        # Frames in this batch: number of sequences * frames per sequence.
        num_frames = frame_tensors.shape[0] * frame_tensors.shape[1]
        timer = timeit.default_timer()
        elapsed = timer - prev_timer
        prev_timer = timer
        print(f"fps: {num_frames/elapsed}")
        print(f'Total elapsed time: {elapsed}')
This script runs on some of our envs, but not others (seems to work on LSB 18.04+, but not xenial). This is the type of error I get when I try to run it on xenial:
[avi @ 0x562fd1660300] Could not find codec parameters for stream 1 (Audio: mp3 (U[0][0][0] / 0x0055), 44100 Hz, 2 channels, 125 kb/s): unspecified frame size
Consider increasing the value for the 'analyzeduration' and 'probesize' options
Invalid return value 0 for stream protocol
Invalid return value 0 for stream protocol
and so on. I tried installing from source, but ran into version issues with some of the deps. I currently have no system or conda-specific versions of ffmpeg installed. Docker of course works, but since my workflow depends on another image, it might be more convenient to sort out this error with the pip version.
Hi @rdevon ,
DALI bundles its own version of FFmpeg, so it should not rely on anything that you have installed already.
Docker of course works, but since my workflow depends on another image, it might be more convenient to sort out this error with the pip version.
You mean that installing DALI inside a clean Docker image works, but in your bare metal setup it doesn't?
What version of DALI do you have? You mentioned conda - have you built the DALI conda package, or are you using the official whl?
@a-sansanwal - any idea what could go wrong?
Yes, a docker works, e.g.
docker pull nvcr.io/nvidia/pytorch:20.02-py3
I'm using the official whl installed via:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali
and getting this error.
I attempted to install from git source, but ran into make errors related to libjpeg-turbo (I'm not entirely sure if 1.5 is available on xenial)
pytorch:20.02-py3 has DALI 0.18, while https://developer.download.nvidia.com/compute/redist/cuda/10.0 already has 0.19. Could you downgrade your bare metal installation to 0.18 and recheck, as well as upgrade to 0.19 inside docker and check again as well?
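Since the comparison above depends on which DALI version each environment actually has, here is a minimal sketch for checking it from Python. It assumes Python 3.8+ (for `importlib.metadata`) and that the wheel is registered under the name `nvidia-dali`, as in the pip command earlier in this thread; `installed_version` is a hypothetical helper name, not part of DALI.

```python
from importlib import metadata


def installed_version(package="nvidia-dali"):
    """Return the installed version string of `package`, or None if it is absent.

    The default name is an assumption taken from the pip command in this thread.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"nvidia-dali: {version if version else 'not installed'}")
```

Running this both on bare metal and inside the container should make it clear whether the two setups are on the same release before comparing their output.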
That error is from ffmpeg. DALI uses ffmpeg for demuxing.
Anyway, it's an error in the audio stream, and it should be possible to remove the audio streams from the dataset as a workaround.
@a-sansanwal how does one load while ignoring audio streams? This is a fine solution for me if it results in no errors.
@JanuszL
First attempt: installing 0.18 via pip after uninstalling 0.19. The result was just as many audio stream errors, followed by:
[/opt/dali/dali/operators/reader/video_reader_op.h:67] [/opt/dali/dali/operators/reader/loader/video_loader.cc:223] Could not open file /data/Datasets/UCF-101/WalkingWithDog/v_WalkingWithDog_g04_c04.avi because of Too many open files
Stacktrace (27 entries):
[frame 0]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x2b56fe) [0x7fd6f4ded6fe]
[frame 1]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x7fc5ab) [0x7fd6f53345ab]
[frame 2]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x804c92) [0x7fd6f533cc92]
[frame 3]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x82b88b) [0x7fd6f536388b]
[frame 4]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x82cbe2) [0x7fd6f5364be2]
[frame 5]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(std::_Function_handler<std::unique_ptr<dali::OperatorBase, std::default_delete<dali::OperatorBase> > (dali::OpSpec const&), std::unique_ptr<dali::OperatorBase, std::default_delete<dali::OperatorBase> > (*)(dali::OpSpec const&)>::_M_invoke(std::_Any_data const&, dali::OpSpec const&)+0xc) [0x7fd6f4de7f7c]
[frame 6]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x130bf4) [0x7fd6f3838bf4]
[frame 7]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali.so(dali::InstantiateOperator(dali::OpSpec const&)+0x34e) [0x7fd6f383813e]
[frame 8]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali.so(dali::OpGraph::InstantiateOperators()+0xa7) [0x7fd6f37ef707]
[frame 9]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali.so(dali::Pipeline::Build(std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > >)+0xa48) [0x7fd6f3856988]
[frame 10]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/backend_impl.cpython-37m-x86_64-linux-gnu.so(+0x500df) [0x7fd6ffe1b0df]
[frame 11]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/backend_impl.cpython-37m-x86_64-linux-gnu.so(+0x20593) [0x7fd6ffdeb593]
[frame 12]: python(_PyMethodDef_RawFastCallKeywords+0x264) [0x559e4b04c114]
[frame 13]: python(_PyCFunction_FastCallKeywords+0x21) [0x559e4b04c231]
[frame 14]: python(_PyEval_EvalFrameDefault+0x52cf) [0x559e4b0b0e8f]
[frame 15]: python(_PyFunction_FastCallKeywords+0xfb) [0x559e4b04b68b]
[frame 16]: python(_PyEval_EvalFrameDefault+0x6a0) [0x559e4b0ac260]
[frame 17]: python(_PyEval_EvalCodeWithName+0x2f9) [0x559e4b0056f9]
[frame 18]: python(PyEval_EvalCodeEx+0x44) [0x559e4b0065f4]
[frame 19]: python(PyEval_EvalCode+0x1c) [0x559e4b00661c]
[frame 20]: python(+0x21c974) [0x559e4b107974]
[frame 21]: python(PyRun_FileExFlags+0xa1) [0x559e4b111cf1]
[frame 22]: python(PyRun_SimpleFileExFlags+0x1c3) [0x559e4b111ee3]
[frame 23]: python(+0x227f95) [0x559e4b112f95]
[frame 24]: python(_Py_UnixMain+0x3c) [0x559e4b1130bc]
[frame 25]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fd700cd5830]
[frame 26]: python(+0x1d0990) [0x559e4b0bb990]
Traceback (most recent call last):
File "test_dali.py", line 26, in <module>
pipe.build()
File "/home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 316, in build
self._pipe.Build(self._names_and_devices)
RuntimeError: [/opt/dali/dali/operators/reader/loader/video_loader.cc:223] Could not open file /data/Datasets/UCF-101/WalkingWithDog/v_WalkingWithDog_g04_c04.avi because of Too many open files
Stacktrace (27 entries):
[frame 0]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x2b56fe) [0x7fd6f4ded6fe]
[frame 1]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x7fc5ab) [0x7fd6f53345ab]
[frame 2]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x804c92) [0x7fd6f533cc92]
[frame 3]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x82b88b) [0x7fd6f536388b]
[frame 4]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x82cbe2) [0x7fd6f5364be2]
[frame 5]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(std::_Function_handler<std::unique_ptr<dali::OperatorBase, std::default_delete<dali::OperatorBase> > (dali::OpSpec const&), std::unique_ptr<dali::OperatorBase, std::default_delete<dali::OperatorBase> > (*)(dali::OpSpec const&)>::_M_invoke(std::_Any_data const&, dali::OpSpec const&)+0xc) [0x7fd6f4de7f7c]
[frame 6]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x130bf4) [0x7fd6f3838bf4]
[frame 7]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali.so(dali::InstantiateOperator(dali::OpSpec const&)+0x34e) [0x7fd6f383813e]
[frame 8]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali.so(dali::OpGraph::InstantiateOperators()+0xa7) [0x7fd6f37ef707]
[frame 9]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/libdali.so(dali::Pipeline::Build(std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > >)+0xa48) [0x7fd6f3856988]
[frame 10]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/backend_impl.cpython-37m-x86_64-linux-gnu.so(+0x500df) [0x7fd6ffe1b0df]
[frame 11]: /home/devonh/anaconda3/envs/dali_no_tv/lib/python3.7/site-packages/nvidia/dali/backend_impl.cpython-37m-x86_64-linux-gnu.so(+0x20593) [0x7fd6ffdeb593]
[frame 12]: python(_PyMethodDef_RawFastCallKeywords+0x264) [0x559e4b04c114]
[frame 13]: python(_PyCFunction_FastCallKeywords+0x21) [0x559e4b04c231]
[frame 14]: python(_PyEval_EvalFrameDefault+0x52cf) [0x559e4b0b0e8f]
[frame 15]: python(_PyFunction_FastCallKeywords+0xfb) [0x559e4b04b68b]
[frame 16]: python(_PyEval_EvalFrameDefault+0x6a0) [0x559e4b0ac260]
[frame 17]: python(_PyEval_EvalCodeWithName+0x2f9) [0x559e4b0056f9]
[frame 18]: python(PyEval_EvalCodeEx+0x44) [0x559e4b0065f4]
[frame 19]: python(PyEval_EvalCode+0x1c) [0x559e4b00661c]
[frame 20]: python(+0x21c974) [0x559e4b107974]
[frame 21]: python(PyRun_FileExFlags+0xa1) [0x559e4b111cf1]
[frame 22]: python(PyRun_SimpleFileExFlags+0x1c3) [0x559e4b111ee3]
[frame 23]: python(+0x227f95) [0x559e4b112f95]
[frame 24]: python(_Py_UnixMain+0x3c) [0x559e4b1130bc]
[frame 25]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fd700cd5830]
[frame 26]: python(+0x1d0990) [0x559e4b0bb990]
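For context on the "Too many open files" message: on Linux it usually means the process reached its open file descriptor limit. Whether raising that limit is the right fix here is an assumption, not something confirmed in this thread. A minimal sketch for inspecting and raising the soft limit with the standard `resource` module (Unix only); `raise_open_file_limit` and the target value are hypothetical:

```python
import resource


def raise_open_file_limit(target=4096):
    """Raise the soft RLIMIT_NOFILE toward `target`, never above the hard limit.

    Returns the (soft, hard) limits in effect afterwards. The limit is never lowered.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # Already unlimited; nothing to raise.
        return soft, hard
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("open file limits (soft, hard):", raise_open_file_limit())
```

This would have to run in the same process before `pipe.build()`; the shell equivalent is `ulimit -n`.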
So, as I understand it, you are not using the version shipped in nvcr.io/nvidia/pytorch:20.02-py3, but upgraded it to 0.19 there as well? Because, as I said, pytorch:20.02-py3 ships with 0.18, and you should see the same error in Docker as you have on bare metal after downgrading to 0.18.
A quick Google search gives this command:
ffmpeg -i input -map 0 -map -0:a -c copy output
I would suggest writing a script to clean the dataset by running this command on all files.
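A minimal sketch of such a cleaning script, assuming `ffmpeg` is on the `PATH` and the dataset follows the `<root>/<class>/<file>.avi` layout used in the script earlier in this thread. The function names and the choice to write into a separate output tree (so the originals stay untouched) are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def strip_audio_command(src, dst):
    """Build the ffmpeg command quoted above for a single file."""
    # -map 0 keeps all streams, -map -0:a then drops the audio ones,
    # and -c copy avoids re-encoding the remaining streams.
    return ["ffmpeg", "-i", str(src), "-map", "0", "-map", "-0:a", "-c", "copy", str(dst)]


def strip_audio_tree(src_root, dst_root, pattern="*/*.avi", dry_run=False):
    """Run the command on every matching file, mirroring the directory layout.

    Returns the list of commands; with dry_run=True nothing is executed.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    commands = []
    for src in sorted(src_root.glob(pattern)):
        dst = dst_root / src.relative_to(src_root)
        cmd = strip_audio_command(src, dst)
        commands.append(cmd)
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands
```

Calling `strip_audio_tree('<PATH_TO_UCF_DIR>', '<OUTPUT_DIR>', dry_run=True)` first lists the commands without running them, which is a cheap way to check the paths before processing the whole dataset.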
Upgrading to 0.19 within the Docker does indeed give the errors.
I will try to repro that and get back to you soon.
I upgraded to bionic and I still get the error, installing dali from pip. Perhaps this is related to the pip installation? So far I've observed:
conditioned on various envs | install with pip -> errors
I managed to reproduce that. I guess it may be related to https://github.com/NVIDIA/DALI/pull/1659. Need some time to narrow this down.
I see where the problem is. When our custom read function encounters EOF, it should return AVERROR_EOF instead of 0. The message you see is just a warning about the wrong return value, but FFmpeg can still handle this properly.
We will remove that warning, but until then I don't think you should experience any problem because of it (other than more messages in the console).
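To illustrate the return-value convention described above, here is a small Python sketch of a read callback. It is not DALI's actual code (which is C++); `read_packet` and its signature are hypothetical and only mirror the shape of an FFmpeg custom I/O read function. The constant is derived the same way FFmpeg derives it, as a negated four-character tag.

```python
import io


def fferrtag(a, b, c, d):
    """Negated little-endian four-character tag, as FFmpeg's FFERRTAG macro builds it."""
    return -(ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24))


AVERROR_EOF = fferrtag("E", "O", "F", " ")


def read_packet(stream, buf, buf_size):
    """Copy up to buf_size bytes from stream into buf and return the byte count.

    At end of stream, return AVERROR_EOF rather than 0; returning 0 is what
    triggers the "Invalid return value 0 for stream protocol" warning.
    """
    data = stream.read(buf_size)
    if not data:
        return AVERROR_EOF
    buf[:len(data)] = data
    return len(data)


if __name__ == "__main__":
    stream = io.BytesIO(b"abc")
    buf = bytearray(8)
    print(read_packet(stream, buf, 8))  # number of bytes read
    print(read_packet(stream, buf, 8))  # AVERROR_EOF once the stream is exhausted
```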
I'm referring to this message, which https://github.com/NVIDIA/DALI/pull/1814 should eliminate:
Invalid return value 0 for stream protocol
This one:
[avi @ 0x562fd1660300] Could not find codec parameters for stream 1 (Audio: mp3 (U[0][0][0] / 0x0055), 44100 Hz, 2 channels, 125 kb/s): unspecified frame size
Consider increasing the value for the 'analyzeduration' and 'probesize' options
was already commented on by a-sansanwal; it is a warning, not an error that prevents DALI from decoding video.
Ah so everything is warnings then.
Ah so everything is warnings then.
It seems so.
OK, so the latest PR fixes the return issue (but not yet on pip?) and the other audio warning is ffmpeg and can be fixed by removing the audio (which I don't need for now). I can attempt to install from git, and will see if this resolves at least the former.
The warnings are only visible during data preparation, not during the actual execution. So unless you have nothing more important to do, I wouldn't bother with stripping the audio.
OK, the warnings go away with the nightly build.