Encode a video with SVT-AV1 using the --irefresh-type 1 option (which is the default). Or download the example file below. Play the video with mpv, using either of the AV1 decoders --vd=libaom-av1 or --vd=libdav1d.
Attempt to seek forward in the video by clicking on the OSD. Also try seeking forward with the right arrow key.
Seeking should be fast, no matter how far into the video you try to seek. Seeking should work, no matter what method is used.
When clicking on the OSD, seeking is slow, and it takes longer the further into the video the seek target is. It seems like mpv is re-reading the entire file on every seek.
Seeking forward with the right arrow key causes mpv to exit immediately, even if there is plenty of the video left.
av1_open_gop_example_arrow_key_seek.log
av1_open_gop_example_osd_seek.log
An example file (18s, 50MB): https://0x0.st/iOEm.mkv
There's only 1 key frame in that file, so there are two choices:
Maybe this open GOP mode has some sort of recovery mode (e.g. seek anywhere, decode at least N frames -> all frames after it can be decoded without errors), but we wouldn't know about this. I suggest not using a clown world codec.
Oh interesting, I hadn't thought to check that. I reencoded the file, explicitly passing a keyframe interval to SVT-AV1 with --keyint 31, but according to ffprobe, the file still only has 1 key frame. I suppose ffmpeg (or dav1d/libaom-av1) doesn't support whatever SVT-AV1 is doing when in open GOP mode.
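For anyone wanting to repeat the ffprobe check, here is a rough sketch of one way to count packets flagged as key frames. It assumes ffprobe is on PATH and that its per-packet flags field contains a K for key frames; the function names count_key_packets and probe_key_packets are made up for illustration, and this is not necessarily how the count above was obtained.

```python
import subprocess


def count_key_packets(flags_output: str) -> int:
    """Count lines of ffprobe 'packet=flags' output that carry the K flag.

    Assumption: each non-empty line is one packet's flags string,
    e.g. "K_" / "__" (older builds) or "K__" / "___" (newer builds).
    """
    return sum(1 for line in flags_output.splitlines() if "K" in line)


def probe_key_packets(path: str) -> int:
    """Run ffprobe on the first video stream and count key-frame packets."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "packet=flags",
            "-of", "csv=p=0",
            path,
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_key_packets(result.stdout)
```

Calling probe_key_packets on the file should then return the number of packets the demuxer treats as seekable key frames.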
If anything, you should probably ask lavf to handle open GOP consistently across codecs, and ask Matroska to spec it.
Maybe this open GOP mode has some sort of recovery mode (e.g. seek anywhere, decode at least N frames -> all frames after it can be decoded without errors)
I think that's usually called periodic intra refresh or so in the general context of encoders like x264 etc (after N pictures anywhere in the stream you should have the full amount of data to start fully decoding), and yes - containers and multimedia frameworks almost never have the required info to note this, unfortunately. It's generally utilized in streaming so most clients just keep on playing A->B until they are properly initialized.
Open GOP on the other hand usually just means that there might be pictures which in coding order are after a random access point, but which in presentation order are before the random access point and refer to coded pictures from before the random access point. Possibly useful for stuff like forcing a specific GOP layout (random access point at every 25 pictures) but still trying to optimize the encoding a bit more.
This should not be any different from any other random access point, to be honest. So it just seems like either:
Both are possible. I will take a look at the sample file and see if by poking some AV1-related folk I can figure out which case this is.
@reedrs finally got to checking this sample and reading the AV1 spec.
First I checked section 7.6.2 of the spec, which defines the different types of random access points. Effectively, there are three, and all of them depend on a coded picture with a frame_type of KEY_FRAME. If you then look at section 6.8.2 (Uncompressed header semantics) of the spec, KEY_FRAME is the value 0 (zero).
With a new enough FFmpeg, you can utilize the trace_headers bit stream filter to dump the details of the stream. For example:
ffmpeg -i VIDEO_FILE -map 0:v -c copy -bsf:v trace_headers -f null - 2> VIDEO_FILE.trace
You can then use your favourite tool capable of regular expressions to count how many occurrences of frame_type\s*00 there are in the output. A quick check puts this count at one for this sample.
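As a minimal sketch of that counting step, the snippet below scans a trace dump for frame_type lines whose bit string is 00. It assumes each syntax element is logged on its own line as the element name, then whitespace, then the raw bits; the name count_key_frames is hypothetical, and the pattern is slightly stricter than the one quoted above (it requires whitespace and word boundaries).

```python
import re
import sys

# frame_type is a 2-bit field; the bit string 00 corresponds to KEY_FRAME (value 0).
KEY_FRAME_PATTERN = re.compile(r"\bframe_type\s+00\b")


def count_key_frames(trace_text: str) -> int:
    """Count trace lines that report frame_type with the bit string 00."""
    return sum(
        1 for line in trace_text.splitlines() if KEY_FRAME_PATTERN.search(line)
    )


if __name__ == "__main__":
    # Usage: python count_key_frames.py VIDEO_FILE.trace
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            print(count_key_frames(fh.read()))
```

Run against the VIDEO_FILE.trace written by the ffmpeg command, it prints the number of coded key frames found in the stream.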
Thus, as far as I can see, FFmpeg properly muxed your AV1 stream into Matroska, and mpv really can't do anything faster than it does: there is just one random access point.