Vision: Support for large video datasets that contain a few corrupted files

Created on 29 Aug 2019 · 5 Comments · Source: pytorch/vision

Hello,
I've been trying to load the Kinetics-400 dataset using the following code (the videos are in mp4 format):

from torchvision.datasets.video_utils import VideoClips
from torchvision.datasets.utils import list_dir
from torchvision.datasets.folder import make_dataset

frames_per_clip = 16
step_between_clips = 16
extensions = ('avi', 'mp4')
root = r'/kinetics2/kinetics2/train'

# Each sub-directory of root is one class
classes = sorted(list_dir(root))
class_to_idx = {cls: i for i, cls in enumerate(classes)}

# Collect (path, class_index) pairs and keep only the paths
samples = make_dataset(root, class_to_idx, extensions, is_valid_file=None)
video_list = [x[0] for x in samples]

# Indexing the videos is where the failure happens
video_clips = VideoClips(video_list, frames_per_clip, step_between_clips)

It prints "moov atom not found" and stops. I assume one of the files is corrupted, but I can't download it again. Is there a way to skip these corrupted files?
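
One possible workaround, sketched here under assumptions rather than taken from torchvision: probe every file before building VideoClips and drop the ones that cannot be opened. The names split_readable and probe_with_pyav are hypothetical helpers, and the sketch assumes that opening a broken file with PyAV's av.open raises an exception.

```python
from typing import Callable, Iterable, List, Tuple


def probe_with_pyav(path: str) -> None:
    """Try to open a video with PyAV; assumed to raise for unreadable files."""
    import av  # imported lazily so split_readable can be used with another probe

    container = av.open(path)
    container.close()


def split_readable(
    paths: Iterable[str],
    probe: Callable[[str], None] = probe_with_pyav,
) -> Tuple[List[str], List[str]]:
    """Return (readable, skipped) by calling probe on each path."""
    readable: List[str] = []
    skipped: List[str] = []
    for path in paths:
        try:
            probe(path)
        except ImportError:
            # A missing decoder library is a setup problem, not a corrupted file
            raise
        except Exception:
            # Broad on purpose: any decoding failure marks the file as skipped
            skipped.append(path)
        else:
            readable.append(path)
    return readable, skipped


# Hypothetical usage with the snippet above:
# video_list, skipped = split_readable(video_list)
# video_clips = VideoClips(video_list, frames_per_clip, step_between_clips)
```

Probing opens every file once, so it adds a pass over the dataset; saving the list of skipped paths avoids repeating it on later runs.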

Labels: bug, help wanted, io, needs reproduction, video

All 5 comments

@ekosman can you try isolating the problematic file and sharing it with us?

We do handle corrupted files in torchvision, see https://github.com/pytorch/vision/blob/93bceaf250fe659f4baa9a1ecf79ece70d14b356/torchvision/io/video.py#L112-L115 for an example. So it would be great to understand what corner case you might be facing.

@fmassa This file raises the exception:
https://drive.google.com/file/d/1oRi88WRDnmgCblIQrPCTU9ZOGCBfZNIn/view?usp=sharing

moov atom not found

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ekosman/anaconda3/envs/torch/lib/python3.7/site-packages/torchvision/datasets/video_utils.py", line 55, in __init__
    self._compute_frame_pts()
  File "/home/ekosman/anaconda3/envs/torch/lib/python3.7/site-packages/torchvision/datasets/video_utils.py", line 84, in _compute_frame_pts
    for batch in dl:
  File "/home/ekosman/anaconda3/envs/torch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/home/ekosman/anaconda3/envs/torch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/ekosman/anaconda3/envs/torch/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
  File "av/utils.pyx", line 27, in av.utils.AVError.__init__
TypeError: __init__() takes at least 3 positional arguments (2 given)
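
The last frames of the traceback suggest why a TypeError surfaces instead of the original decoding error: the worker's exception is rebuilt as exc_type(msg), and that fails when the exception class needs more than one constructor argument. The sketch below imitates this with a hypothetical TwoArgError class and a simplified reraise_like_worker function; it is not the torch or PyAV code.

```python
class TwoArgError(Exception):
    """Hypothetical exception whose constructor requires two arguments."""

    def __init__(self, code, message):
        super().__init__(code, message)
        self.code = code
        self.message = message


def reraise_like_worker(exc):
    """Rebuild an exception from its type and a single message string."""
    msg = "Caught {} in worker: {}".format(type(exc).__name__, exc)
    # Constructing the exception with one argument fails for TwoArgError,
    # so a TypeError is raised here and the original error is hidden
    raise type(exc)(msg)


def outcome(exc):
    """Return the name of the exception type that actually propagates."""
    try:
        reraise_like_worker(exc)
    except Exception as err:
        return type(err).__name__


single_arg = outcome(ValueError("bad frame"))             # type is preserved
multi_arg = outcome(TwoArgError(1, "moov atom not found"))  # replaced by TypeError
```

Under this reading, the TypeError comes from the failed re-raise, while the underlying problem is still the "moov atom not found" error from the corrupted file.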

Thanks for the video example @ekosman !

I'll try to debug the issue and fix it.

Thanks @fmassa ! :)

Thanks for the report!
