Hello,
I've been trying to load the Kinetics-400 dataset using the following code (the videos are in mp4 format):
```python
from torchvision.datasets.video_utils import VideoClips
from torchvision.datasets.utils import list_dir
from torchvision.datasets.folder import make_dataset

frames_per_clip = 16
step_between_clips = 16
extensions = ('avi', 'mp4')
root = r'/kinetics2/kinetics2/train'

# One sub-directory per action class; map class names to integer labels
classes = sorted(list_dir(root))
class_to_idx = {cls: i for i, cls in enumerate(classes)}

# Collect (video_path, class_index) pairs for every file with a matching extension
samples = make_dataset(root, class_to_idx, extensions, is_valid_file=None)
video_list = [path for path, _ in samples]

# VideoClips opens every file in its constructor to compute frame timestamps
video_clips = VideoClips(video_list, frames_per_clip, step_between_clips)
```
It prints "moov atom not found" and stops. I assume one of the files is corrupted, but I can't download it again. Is there a way to skip corrupted files?
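One possible workaround, sketched under assumptions: "moov atom not found" is the message FFmpeg prints when an MP4 file has no `moov` box (the box holding the file's index), which commonly happens with truncated downloads. The helpers below scan the top-level MP4 boxes using only the standard library and drop files without a complete `moov` box before they reach `VideoClips`. `has_moov_atom` and `filter_readable` are hypothetical helpers, not part of torchvision. The check is a heuristic: it only inspects `.mp4`/`.mov`/`.m4v` files, passes other containers such as AVI through unchecked, and will not catch every kind of corruption.

```python
import struct
import warnings
from typing import Iterable, List


def has_moov_atom(path: str) -> bool:
    """Return True if the file contains a complete top-level 'moov' box."""
    with open(path, "rb") as f:
        f.seek(0, 2)
        file_size = f.tell()
        offset = 0
        # Walk the top-level boxes: each starts with a 4-byte big-endian size and a 4-byte type
        while offset + 8 <= file_size:
            f.seek(offset)
            size, box_type = struct.unpack(">I4s", f.read(8))
            header_len = 8
            if size == 1:
                # A 64-bit "largesize" follows the type field
                ext = f.read(8)
                if len(ext) < 8:
                    return False
                size = struct.unpack(">Q", ext)[0]
                header_len = 16
            elif size == 0:
                # Size 0 means the box extends to the end of the file
                size = file_size - offset
            if box_type == b"moov":
                # Only accept the box if it is not cut off by the end of the file
                return offset + size <= file_size
            if size < header_len:
                # Malformed size: stop instead of looping forever
                return False
            offset += size
    return False


def filter_readable(paths: Iterable[str]) -> List[str]:
    """Keep non-MP4 paths as-is and MP4-family paths only if they have a moov box."""
    kept = []
    for p in paths:
        if not p.lower().endswith((".mp4", ".mov", ".m4v")):
            kept.append(p)  # no check for other containers such as AVI
            continue
        try:
            ok = has_moov_atom(p)
        except OSError:
            ok = False
        if ok:
            kept.append(p)
        else:
            warnings.warn(f"skipping {p}: no complete top-level 'moov' box")
    return kept
```

With this, the last line of the script would become `VideoClips(filter_readable(video_list), frames_per_clip, step_between_clips)`. If the labels from `samples` are used later, filter `samples` by the same paths so that videos and labels stay aligned.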
@ekosman can you try isolating the problematic file and sharing it with us?
We do handle corrupted files in torchvision (see https://github.com/pytorch/vision/blob/93bceaf250fe659f4baa9a1ecf79ece70d14b356/torchvision/io/video.py#L112-L115 for an example), so it would be great to understand which corner case you are hitting.
@fmassa This file raises the exception:
https://drive.google.com/file/d/1oRi88WRDnmgCblIQrPCTU9ZOGCBfZNIn/view?usp=sharing
```
moov atom not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ekosman/anaconda3/envs/torch/lib/python3.7/site-packages/torchvision/datasets/video_utils.py", line 55, in __init__
    self._compute_frame_pts()
  File "/home/ekosman/anaconda3/envs/torch/lib/python3.7/site-packages/torchvision/datasets/video_utils.py", line 84, in _compute_frame_pts
    for batch in dl:
  File "/home/ekosman/anaconda3/envs/torch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/home/ekosman/anaconda3/envs/torch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/ekosman/anaconda3/envs/torch/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
  File "av/utils.pyx", line 27, in av.utils.AVError.__init__
TypeError: __init__() takes at least 3 positional arguments (2 given)
```
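The last frames of the traceback appear to explain why the original decoding error is hidden: the exception happened inside a DataLoader worker, and the main process tries to rebuild it with `raise self.exc_type(msg)`. If the exception class (here `av.AVError`) cannot be constructed from a single message argument, the rebuild itself fails with a `TypeError`, which masks the real error. Below is a minimal pure-Python reproduction of that mechanism. `StrictError`, `reraise_with_message` and `which_error_surfaces` are hypothetical stand-ins, not PyAV's or PyTorch's actual code.

```python
class StrictError(Exception):
    """Hypothetical exception whose __init__ requires three arguments."""

    def __init__(self, code, message, filename):
        super().__init__(code, message, filename)
        self.code = code
        self.message = message
        self.filename = filename


def reraise_with_message(exc_type, msg):
    # Mimics the `raise self.exc_type(msg)` pattern from the traceback:
    # the exception is rebuilt from its type and one formatted message string.
    raise exc_type(msg)


def which_error_surfaces(exc_type, msg):
    """Return the name of the exception the caller actually sees."""
    try:
        reraise_with_message(exc_type, msg)
    except Exception as e:
        return type(e).__name__
    return None


print(which_error_surfaces(ValueError, "moov atom not found"))
print(which_error_surfaces(StrictError, "moov atom not found"))
```

With a one-argument exception such as `ValueError`, the original type survives. With `StrictError`, the caller sees a `TypeError` about missing constructor arguments instead of the decoding failure.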
Thanks for the video example @ekosman !
I'll try to debug the issue and fix it
Thanks @fmassa ! :)
Thanks for the report!