Mpv: Seeking is slow or doesn't work in AV1 videos made with SVT-AV1 in open GOP mode

Created on 4 Jun 2020 · 5 Comments · Source: mpv-player/mpv

Important Information

mpv version: 0.32.0
Platform and Version: Arch Linux
Source of the mpv binary: Arch Linux repositories

Reproduction steps

Encode a video with SVT-AV1 using the --irefresh-type 1 option (open GOP, which is the default), or download the example file below. Play the video with mpv using either of the AV1 decoders, --vd=libaom-av1 or --vd=libdav1d.

Attempt to seek forward in the video by clicking on the OSD. Also try seeking forward with the right arrow key.

Expected behavior

Seeking should be fast, no matter how far into the video you try to seek. Seeking should work, no matter what method is used.

Actual behavior

When clicking on the OSD, seeking is slow, and the further into the video the seek target is, the longer it takes. It seems as if mpv re-reads the file from the beginning on every seek.

Seeking forward with the right arrow key causes mpv to exit immediately, even if there is plenty of the video left.

Log file

av1_open_gop_example_arrow_key_seek.log
av1_open_gop_example_osd_seek.log

Sample files

An example file (18s, 50MB): https://0x0.st/iOEm.mkv

Source

reedrs

Most helpful comment

Maybe this open GOP mode has some sort of recovery mode (e.g. seek anywhere, decode at least N frames -> all frames after it can be decoded without errors)

I think that's usually called periodic intra refresh (or something similar) in the context of encoders like x264: after N pictures from anywhere in the stream, you should have all the data needed to start decoding fully. And yes, containers and multimedia frameworks almost never carry the information required to signal this, unfortunately. It's generally used in streaming, so most clients just keep playing from A to B until they are properly initialized.
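
As a rough illustration of the recovery idea, here is a toy model with made-up parameters; it is not how x264 or SVT-AV1 actually implement intra refresh. It assumes each picture intra-codes one of `columns` vertical stripes in turn, and the hypothetical helper `frames_until_clean` counts how many pictures must be decoded from an arbitrary starting point before every stripe has been refreshed:

```python
# Toy model of periodic intra refresh (hypothetical parameters, not taken
# from x264 or SVT-AV1): the picture is split into `columns` vertical stripes
# and each picture intra-codes exactly one stripe, cycling left to right.
# Refreshed stripes are assumed to predict only from other refreshed stripes,
# so once every stripe has been refreshed the picture is fully clean.


def frames_until_clean(start_frame: int, columns: int) -> int:
    """Return how many pictures must be decoded, starting at start_frame,
    before every stripe has been intra-refreshed at least once."""
    refreshed = set()
    decoded = 0
    frame = start_frame
    while len(refreshed) < columns:
        refreshed.add(frame % columns)  # stripe refreshed by this picture
        decoded += 1
        frame += 1
    return decoded


if __name__ == "__main__":
    # Wherever decoding starts, one full refresh cycle is needed.
    for start in (0, 7, 123):
        print(start, frames_until_clean(start, columns=8))
```

Under this model the answer is always one full refresh cycle, regardless of where decoding starts. That is the "decode at least N frames" property, which a container usually has no field to express.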

Open GOP, on the other hand, usually just means that there may be pictures which come after a random access point in coding order but before it in presentation order, and which refer to coded pictures from before the random access point. This is possibly useful for things like forcing a specific GOP layout (e.g. a random access point every 25 pictures) while still optimizing the encoding a bit more.
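
The coding-order versus presentation-order distinction can be sketched in a few lines of Python. The picture names, display indices and reference lists below are a hypothetical open-GOP layout, not taken from the sample file. The hypothetical helper `decodable_from` starts decoding at a given random access point and reports which pictures can and cannot be reconstructed:

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Picture:
    name: str
    display: int                 # presentation (display) order index
    refs: Tuple[str, ...] = ()   # names of the pictures it predicts from


# Hypothetical open-GOP layout, listed in *coding* order. I8 is the random
# access point; B6 and B7 follow it in coding order but are displayed before
# it, and B6 also predicts from P4, which precedes the random access point.
CODING_ORDER = [
    Picture("I0", 0),
    Picture("P4", 4, ("I0",)),
    Picture("B2", 2, ("I0", "P4")),
    Picture("I8", 8),
    Picture("B6", 6, ("P4", "I8")),
    Picture("B7", 7, ("B6", "I8")),
    Picture("P12", 12, ("I8",)),
]


def decodable_from(rap: str, pictures: List[Picture]) -> Tuple[Set[str], List[str]]:
    """Start decoding at picture `rap`; return (decoded names, skipped names)."""
    start = next(i for i, p in enumerate(pictures) if p.name == rap)
    decoded: Set[str] = set()
    skipped: List[str] = []
    for pic in pictures[start:]:
        # Decodable only if every reference was itself decoded after the seek.
        if all(ref in decoded for ref in pic.refs):
            decoded.add(pic.name)
        else:
            skipped.append(pic.name)
    return decoded, skipped
```

Here `decodable_from("I8", CODING_ORDER)` decodes I8 and P12 and skips B6 and B7. Both skipped pictures are displayed before I8, so in this model a seek that lands on I8 still shows every picture from I8 onward correctly, provided I8 is marked as a random access point.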

This should not be any different from any other random access point, to be honest. So it just seems like either:

whatever muxed this file did not interpret the open GOP style of random access point correctly, and thus didn't mark those pictures as random access points in the container, or
SVT-AV1 failed to mark those pictures correctly.

Both are possible. I will take a look at the sample file and see whether, by poking some AV1 folks, I can figure out which case this is.

jeeb on 5 Jun 2020

All 5 comments

There's only 1 key frame in that file, so there are two choices:

precise seeking, which decodes and discards all data up to the seek point (slow)
key frame seeking, which can only land at the start of the file or past its end (fast)
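
A simplified model of the two strategies shows why a file with a single key frame behaves as reported. It uses frame indices instead of timestamps and is not mpv's actual seeking code. `keyframes` is a sorted list of key frame indices; both helper names are hypothetical:

```python
import bisect
from typing import List


def precise_seek_cost(keyframes: List[int], target: int) -> int:
    """Frames decoded for a precise seek: decoding starts at the last key
    frame at or before `target`, and every frame before `target` is discarded.
    Assumes keyframes[0] <= target (the stream starts with a key frame)."""
    i = bisect.bisect_right(keyframes, target) - 1
    return target - keyframes[i] + 1


def keyframe_seek_forward(keyframes: List[int], current: int, total_frames: int) -> int:
    """Forward key frame seek: jump to the first key frame after `current`.
    If there is none, the seek lands at `total_frames`, i.e. end of file."""
    i = bisect.bisect_right(keyframes, current)
    return keyframes[i] if i < len(keyframes) else total_frames
```

With `keyframes = [0]`, `precise_seek_cost([0], 900)` is 901, so the cost grows with the seek distance, and `keyframe_seek_forward([0], 100, 1000)` returns 1000 (end of file). In this model that mirrors the two symptoms in the report. With a key frame every 31 frames (`list(range(0, 1000, 31))`), the same precise seek decodes only 2 frames and the forward seek lands on frame 124.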

Maybe this open GOP mode has some sort of recovery mode (e.g. seek anywhere, decode at least N frames -> all frames after it can be decoded without errors), but we wouldn't know about this. I suggest not using a clown world codec.

ghost on 5 Jun 2020

Oh, interesting, I hadn't thought to check that. I re-encoded the file, explicitly passing a keyframe interval to SVT-AV1 with --keyint 31, but according to ffprobe the file still has only one key frame. I suppose ffmpeg (or dav1d/libaom-av1) doesn't support whatever SVT-AV1 is doing in open GOP mode.
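
One way to count key frames with ffprobe is to print each video packet's flags and count those containing `K`. This is an assumption; the comment does not say which invocation was used. The sketch below wraps that in Python. The function names are hypothetical, `probe_packet_flags` needs ffprobe on `PATH`, and `count_keyframe_packets` only parses the text:

```python
import subprocess


def probe_packet_flags(path: str) -> str:
    """Run ffprobe and return one line of packet flags per video packet."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "packet=flags", "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def count_keyframe_packets(flags_csv: str) -> int:
    """Count packets whose flags field contains 'K' (key frame)."""
    return sum(1 for line in flags_csv.splitlines() if "K" in line.strip())
```

For example, `count_keyframe_packets(probe_packet_flags("iOEm.mkv"))` would report the number of key frame packets in the sample file.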

reedrs on 5 Jun 2020

If anything, you should probably ask lavf (libavformat) to handle open GOP consistently across codecs, and ask Matroska to specify it.

ghost on 5 Jun 2020

@reedrs I finally got around to checking this sample and reading the AV1 spec.

First I checked section 7.6.2 of the spec, which defines the different types of random access points. Effectively there are three, and all of them depend on a coded picture with a frame_type of KEY_FRAME. If you then look at section 6.8.2 (Uncompressed header semantics) of the spec, KEY_FRAME is the value 0 (zero).
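
In the AV1 uncompressed header, frame_type is a 2-bit field, so each of its four values is coded as two bits and KEY_FRAME appears as `00`. A small Python enum mirroring those values makes the mapping explicit. The helper `frame_type_bits` is hypothetical:

```python
from enum import IntEnum


class FrameType(IntEnum):
    # frame_type values from the AV1 uncompressed header semantics
    KEY_FRAME = 0
    INTER_FRAME = 1
    INTRA_ONLY_FRAME = 2
    SWITCH_FRAME = 3


def frame_type_bits(ft: FrameType) -> str:
    """frame_type is coded as a 2-bit field; return its bit string."""
    return format(int(ft), "02b")
```

`frame_type_bits(FrameType.KEY_FRAME)` gives `"00"`. Only KEY_FRAME can start a random access point; an INTRA_ONLY_FRAME is also intra-coded but does not qualify.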

With a recent enough FFmpeg, you can use the trace_headers bitstream filter to dump the header details of the stream. For example:

ffmpeg -i VIDEO_FILE -map 0:v -c copy -bsf:v trace_headers -f null - 2> VIDEO_FILE.trace

Then use your favourite tool that supports regular expressions to count how many matches of frame_type\s*00 there are in the output. A quick check gives a count of exactly one for this sample.
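
The counting step can be done with a short Python script, sketched below. It assumes that trace_headers prints each field name followed by whitespace and the field's coded bits on the same line, which is what the `frame_type\s*00` pattern relies on. The script name `count_key_frames.py` is hypothetical:

```python
import re
import sys

# frame_type followed by its two coded bits "00", i.e. KEY_FRAME.
KEY_FRAME_PATTERN = re.compile(r"frame_type\s*00")


def count_key_frames(trace_text: str) -> int:
    """Count KEY_FRAME frame headers in trace_headers output."""
    return len(KEY_FRAME_PATTERN.findall(trace_text))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python count_key_frames.py VIDEO_FILE.trace
    with open(sys.argv[1], encoding="utf-8", errors="replace") as trace:
        print(count_key_frames(trace.read()))
```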

So, as far as I can see, FFmpeg correctly muxed your AV1 stream into Matroska, and mpv really can't seek any faster than it does: there is just one random access point.

jeeb on 8 Jun 2020