VideoReader segmentation fault on long videos when using video_reader backend. This issue is a continuation of #2259.
Torchvision segfaults when reading an entire test video.
I used to believe this was an issue with long videos only, but it happens on the test videos we provide as well, suggesting it might be related to the FFmpeg version installed on the system (the fact that the tests don't catch it points the same way).
Steps to reproduce the behavior:
vframes, _, _ = torchvision.io.read_video(path, pts_unit="sec")
where path = $TVDIR/test/assets/videos/TrumanShow_wave_f_nm_np1_fr_med_26.avi

The backtrace suggests it's an issue in libswscale:
#0 0x00007fff88224cf2 in ?? () from /home/bjuncek/miniconda3/envs/vb/lib/libswscale.so.5
#1 0x00007fff88223bb4 in ?? () from /home/bjuncek/miniconda3/envs/vb/lib/libswscale.so.5
#2 0x00007fff881f9af4 in sws_scale () from /home/bjuncek/miniconda3/envs/vb/lib/libswscale.so.5
I've previously found this can be caused by conflicting inputs (note: this _might_ be due to a new/different FFmpeg version?).
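For completeness, a self-contained reproduction sketch (it assumes the torchvision checkout lives at `$TVDIR`, as above):

```python
import os

import torchvision

# Test asset inside the torchvision checkout; $TVDIR is assumed to be set.
path = os.path.join(
    os.environ["TVDIR"], "test", "assets", "videos",
    "TrumanShow_wave_f_nm_np1_fr_med_26.avi",
)

# Select the C++ video_reader backend, where the segfault occurs.
torchvision.set_video_backend("video_reader")

# Crashes inside libswscale with an affected FFmpeg build.
vframes, _, _ = torchvision.io.read_video(path, pts_unit="sec")
print(vframes.shape)
```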
Expected behavior: the video is read without segfaulting.
Collecting environment information...
PyTorch version: 1.6.0
Is debug build: False
CUDA used to build PyTorch: 10.2
OS: Ubuntu 18.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Quadro RTX 8000
GPU 1: Quadro RTX 8000
Nvidia driver version: 440.33.01
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7
/usr/local/cuda-10.2.89/targets/x86_64-linux/lib/libcudnn.so.7
Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.6.0
[pip3] torchvision==0.7.0a0+78ed10c
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py38he904b0f_0
[conda] mkl_fft 1.1.0 py38h23d657b_0
[conda] mkl_random 1.1.1 py38hcb8c335_0 conda-forge
[conda] numpy 1.19.1 py38hbc911f0_0
[conda] numpy-base 1.19.1 py38hfa32c7d_0
[conda] pytorch 1.6.0 py3.8_cuda10.2.89_cudnn7.6.5_0 pytorch
[conda] torchvision 0.7.0a0+78ed10c pypi_0 pypi
Removing the hidden inputs (specifically size/aspect ratio/crop) from the _read_video op could in principle fix this, but it _might_ be backwards-compatibility (BC) breaking if users are setting these manually in their code.
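For context, a sketch of where the size-related hidden inputs surface, assuming the parameter names of the internal `_read_video_from_file` helper in torchvision 0.7's `torchvision.io._video_opt`:

```python
from torchvision.io import _video_opt

# Internal helper (torchvision 0.7); the arguments below are the size/aspect
# ratio inputs in question. 0 means "keep the source dimension"; non-zero
# values make the decoder rescale frames through libswscale.
vframes, aframes, info = _video_opt._read_video_from_file(
    "TrumanShow_wave_f_nm_np1_fr_med_26.avi",
    read_video_stream=True,
    video_width=0,           # target output width
    video_height=0,          # target output height
    video_min_dimension=0,   # aspect-ratio-preserving resize
)
```

Note that, per the backtrace above, sws_scale is also invoked for the YUV420→RGB24 pixel-format conversion itself, not only for resizing.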
cc @bjuncek
More information:
Program received signal SIGSEGV, Segmentation fault.
0x00007f4d206b7fc2 in ff_yuv_420_rgb24_ssse3.loop0 ()
at libswscale/x86/yuv_2_rgb.asm:376
376 libswscale/x86/yuv_2_rgb.asm: No such file or directory.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64
(gdb) bt
#0 0x00007f4d206b7fc2 in ff_yuv_420_rgb24_ssse3.loop0 ()
at libswscale/x86/yuv_2_rgb.asm:376
#1 0x00007f4d206b6e84 in yuv420_rgb24_ssse3 (c=0x564bca2e59c0, src=0x7ffd52b1c9e0,
srcStride=0x7ffd52b1c9c0, srcSliceY=0, srcSliceH=256, dst=0x7ffd52b1ca00,
dstStride=0x7ffd52b1c9d0) at libswscale/x86/yuv2rgb_template.c:177
#2 0x00007f4d2068eb45 in sws_scale (c=<optimized out>, srcSlice=<optimized out>,
srcStride=<optimized out>, srcSliceY=<optimized out>, srcSliceH=256,
dst=<optimized out>, dstStride=0x7ffd52b1cd10) at libswscale/swscale.c:969
#3 0x00007f4d21f62d2b in ffmpeg::(anonymous namespace)::transformImage (
context=0x564bca2e59c0, srcSlice=0x564bca35f100, srcStride=0x564bca35f140,
inFormat=..., outFormat=...,
out=0x564bca4c5b60 "\025\016\030\025\016\030\025\016\030"..., planes=0x7ffd52b1cd20, lines=0x7ffd52b1cd10)
at /root/vision/torchvision/csrc/cpu/decoder/video_sampler.cpp:46
#4 0x00007f4d21f639a8 in ffmpeg::VideoSampler::sample (this=0x564bca5a4220,
srcSlice=0x564bca35f100, srcStride=0x564bca35f140, out=0x564bca4c3490)
at /root/vision/torchvision/csrc/cpu/decoder/video_sampler.cpp:182
#5 0x00007f4d21f63c1e in ffmpeg::VideoSampler::sample (this=0x564bca5a4220,
frame=0x564bca35f100, out=0x564bca4c3490)
The segfaults only occur when MMX/SSE/AVX optimizations are enabled in FFmpeg.
I believe this issue might be a bug in FFmpeg introduced in https://github.com/FFmpeg/FFmpeg/commit/fc6a5883d6af8cae0e96af84dda0ad74b360a084 and since fixed in https://github.com/FFmpeg/FFmpeg/commit/ba3e771a42c29ee02c34e7769cfc1b2dbc5c760a.
The upstream bug report for this issue is https://trac.ffmpeg.org/ticket/8747.
If that's the case, then recompiling FFmpeg (or upgrading to a build that includes the fix) would solve the issue.
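As a quick diagnostic, one can check which FFmpeg build an environment actually resolves (a plain Python sketch, nothing torchvision-specific):

```python
import ctypes.util
import subprocess

# Print the FFmpeg banner; 4.3 builds predating commit ba3e771 are affected,
# while 4.2.x is known good.
banner = subprocess.run(
    ["ffmpeg", "-version"], capture_output=True, text=True
).stdout.splitlines()[0]
print(banner)

# Show which libswscale would be picked up from the current environment.
print("libswscale:", ctypes.util.find_library("swscale"))
```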
Effectively, this issue is directly related to the regression introduced in FFmpeg 4.3 and fixed in https://github.com/FFmpeg/FFmpeg/commit/ba3e771a42c29ee02c34e7769cfc1b2dbc5c760a. On FFmpeg 4.2, the video reader tests pass.
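In the meantime, a possible workaround sketch is to switch to the pyav backend (requires `pip install av`; PyAV's binary wheels bundle their own FFmpeg build, so they may not be affected by the system libswscale), or to pin FFmpeg below 4.3 in the environment:

```python
import torchvision

# Workaround sketch: decode through PyAV instead of the torchvision C++
# decoder that links against the broken system libswscale.
torchvision.set_video_backend("pyav")
vframes, _, _ = torchvision.io.read_video("video.avi", pts_unit="sec")
```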
Given that this was a known FFmpeg issue and is fixed by using a different FFmpeg version, I'm closing this issue.