Exoplayer: HLS with discontinuities - decoding errors (blocking)

Created on 3 Feb 2017 · 29 Comments · Source: google/ExoPlayer

We have a set of streams where I work (Belgian broadcaster) where we dynamically insert advertising server-side. We use HLS streams, where the content is encrypted with AES, the ads are not encrypted. Between every content/ad/other ad switch there is a discontinuity.

Sample stream: https://medialaan.ambroos.io/hls/2/discontinuity-decoding-error.m3u8

Playing this stream in ExoPlayer (we've seen it on 1.5.9, 2.2.0, dev-v2) results in the following:

  • Content works great (0:00 - 0:43)
  • Ads get visual decoding errors (0:43 - 1:23)
  • Content after the ads works fine (1:23 - end)

Screenshot of the decoding issue: https://i.imgur.com/1D5INJR.png

We have observed the issue on the following devices:

  • Sony Xperia Z5 (E6653, Android 7.0, MSM8994)
  • Huawei Y6 (SCL-L21, Android 4.3, MSM8909)
  • Nexus 5X (Android 7.1.1, MSM8992)

We could not reproduce it on the following devices:

  • HTC Nexus 9 (Android 6.0 most likely, Tegra K1)
  • Huawei Mediapad M2 (Android 5.1, M2-A01L, Hisilicon Kirin 930)
  • Samsung Galaxy S6 (SM-G920F, Android 6.0.1, Exynos 7420 Octa)
  • Samsung Galaxy Note 10.1 2014 (SM-P600, Android 5.1.1, Snapdragon 800)

I suspect it's only going to be seen on devices with Qualcomm SoCs.

A sample ExoPlayer demo build (r2.2.0) with the failing stream inside: https://medialaan.ambroos.io/hls/2/exoplayer-bad-decode.apk

device specific

All 29 comments

Just a quick note: the ad segments by themselves work correctly, or if you start the stream in an ad position.

I suspect there's something in the decoder that doesn't get reset on discontinuity and then fails to decode the possibly slightly different ad segments correctly.

(We did not observe this issue on any other platform so far.)

I think I have root-caused the issue. The solution might have to wait a bit, though.

Note to self: Codec reconfiguration without reinstantiation is attempted, but it is not working well. When it is not attempted, playback works well.

@AquilesCanta Thanks! Do you know if there's any way we can work around this issue temporarily (by changing the encoding configuration of either the content or ad segments, for example)?

What I could observe is that the SPS and PPS differ between content and ads. I imagine that using the same parameters would fix the issue, but I am not 100% sure and need to research a bit more. I think this should definitely be fixed in ExoPlayer.

A side note: if you change any value (like width, for instance) so that MediaCodecVideoRenderer#canReconfigureCodec() returns false, things work, but this is just anecdotal.

Hm. Considering our content comes from a hardware encoder (limited configurability), and the advertisements are processed by a third party using ffmpeg, I don't think we're ever going to get the PPS and SPS to match exactly (but my knowledge of H264 is very limited).

I'll keep my test stream available. Thanks for looking at our issue!

It does not reproduce on Sony Bravia 4k, I wonder if we could apply a codecNeedsAdaptationWorkaround here, @andrewlewis? Either that or codec reconfiguration is not working properly and the TV's decoder is somehow managing to work around it.

@AquilesCanta We've only managed to reproduce it on some of our test devices. I have only seen it on non-Samsung devices with a Qualcomm GPU. I have never seen it happen on any device with another GPU vendor. Since Bravias have MediaTek GPUs, I guess that makes a weird kind of sense.

It's a very, very strange thing.

Another interesting thing: we experimented with the resolution switch (to make canReconfigureCodec return false), but this did not help. Decoding errors are still visible.

Sample stream: https://medialaan.ambroos.io/hls/diff-res/manifest.m3u8 (ads start at 0:40).

Is there any way we could sponsor development of a fix for this? MEDIALAAN would be more than happy to do something like that. Feel free to contact me at ambroos.[email protected].

That is indeed strange! We have already contacted the media codec team to see what their view on this is. We will post updates here.

There is still no news about this, guys, but it's still being tracked.

@AquilesCanta Hi Santiago, sorry for bothering you again. Any news?

@Ambroos I'm afraid we haven't heard anything back from Qualcomm yet but will ask them for an update.

Knowing that this issue occurs after some adaptations is not enough to implement a targeted workaround -- we really need to find out what characteristics of this adaptation are causing the problem. One option is to try making incremental modifications to the parameter sets in the content/ads and seeing whether the issue is still reproducible with each modification. On a different chipset we saw issues with adaptations where the maximum number of reference frames changed but not the resolution, so you could try changing the number of reference frames used in the content/ads and see if this affects whether the issue is reproducible or not.

@andrewlewis Thanks, I'll try to find some time to see if I can trigger the issue with a specific parameter change!

What we have noticed is that the NexPlayer demo does not have issues with the test stream on the same devices. I'm not sure if it's helpful since it's a closed-source thing, but at least we know that it is possible to play the streams without issues on Qualcomm Android devices.

Would you mind providing logcat output captured while using NexPlayer? That will allow us to determine whether a decoder is being instantiated.

This is the unfiltered logcat output during playback of the faulty m3u8 example with NexPlayer:
https://gist.github.com/Ambroos/7ad735790461b41c4a29fd11a431b745

Playback starts around 14:57:50, the ad break starts around 14:58:30 and ends around 14:59:10.

We have a new set of devices with issues on these kinds of streams that don't use Qualcomm SoCs:

Huawei P8 Lite (ALE-L21)
When reaching the discontinuity, the video freezes on the first frame of the discontinuity while audio keeps playing. When the 'main' stream resumes after a discontinuity, video starts playing again.

Huawei Ascend P7
When reaching the discontinuity, the video freezes (not sure on which frame). After this, no more video streams can be played correctly until the device is rebooted.

Both devices use HiSilicon Kirin SoCs with a Mali-450MP4 GPU.

Any news @ojw28 / @AquilesCanta / @andrewlewis ? I see the bug label was removed. This issue affects a ton of devices (we suspect well over 30% of all devices in our target market Belgium). We're a bit lost and would like to avoid having to switch to a completely new proprietary video playback SDK.

Our offer to sponsor development to fix this issue still stands, if it is a possibility.

Sorry for the delay on updating this, but we haven't had any response either. I'll see what I can do to provide a solution from our side.

I'll have a reply next week, probably on Tuesday/Wednesday.

Hello, @Ambroos. We still haven't decided what to do about this as we don't know the root cause of the reconfiguration error (making it hard to target a workaround). I don't have much time right now to allocate to the required research, so it will have to wait unfortunately.

Since this is urgent for you, I can suggest disabling codec reconfiguration on devices that reproduce the problem. To do this, you can return false from MediaCodecVideoRenderer#canReconfigureCodec()[1] for them. This is not ideal, since adaptation works well in most other cases, but I think this is a sufficient patch until a proper solution is built in.

I know it's inconvenient to add changes to the library itself, but being a small change, I don't think there are great upgrading costs to it. Let me know if this solves your issue for now. I'll try to come back to this as soon as I have some more time.

[1] MediaCodecVideoTrackRenderer#canReconfigureCodec() on v1.

Hey @AquilesCanta, thanks for the information! I've tried disabling codec reconfiguration as suggested and it seems to work well. Is there a downside to doing this on all devices? We're still not 100% confident in our detection of the problem and would prefer to play on the safe side.

Other than a short pause (say, 100 ms), there shouldn't be any problems. Of course, in comparison, there is simply not much choice here. Do keep me updated if you run into problems.

Something that @ojw28 brought up: It would probably be a good idea only to disable reconfiguration if the resolution is unchanged.
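The suggestion above (force a fresh codec only on affected devices, and only when the resolution is unchanged) could be sketched roughly as follows. This is a stand-alone model of the decision and not ExoPlayer code: `ReconfigurationPolicy`, `VideoFormat`, and the `deviceIsAffected` flag are hypothetical names invented for illustration, and how the result is wired into a `canReconfigureCodec()` override depends on the ExoPlayer version in use.

```java
/** Hypothetical stand-alone model of the reconfiguration decision; not ExoPlayer code. */
final class ReconfigurationPolicy {

  /** Minimal stand-in for the format fields that matter for this decision. */
  static final class VideoFormat {
    final String mimeType;
    final int width;
    final int height;

    VideoFormat(String mimeType, int width, int height) {
      this.mimeType = mimeType;
      this.width = width;
      this.height = height;
    }
  }

  // Assumption: the app decides this itself, for example from a list of known-bad models.
  private final boolean deviceIsAffected;

  ReconfigurationPolicy(boolean deviceIsAffected) {
    this.deviceIsAffected = deviceIsAffected;
  }

  /**
   * Returns true if the existing codec may be reconfigured in place.
   * Returning false means the codec should be released and re-instantiated.
   */
  boolean canReconfigure(VideoFormat oldFormat, VideoFormat newFormat, boolean codecIsAdaptive) {
    if (!oldFormat.mimeType.equals(newFormat.mimeType)) {
      return false; // A different codec type always needs a new decoder.
    }
    boolean sameResolution =
        oldFormat.width == newFormat.width && oldFormat.height == newFormat.height;
    if (deviceIsAffected && sameResolution) {
      return false; // The case discussed in this issue: same size, different parameter sets.
    }
    // Otherwise keep the usual behaviour: adaptive codecs may change resolution in place.
    return codecIsAdaptive || sameResolution;
  }
}
```

The cost of returning false is the short pause mentioned above while the new decoder starts, which is why the sketch limits it to the same-resolution case on affected devices.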

@AquilesCanta @andrewlewis - Did we ever figure out a sensible way to target a workaround for this in ExoPlayer code? Is there an internal thread or bug about this with QC?

I don't think we found a viable workaround. No response yet so I've pinged the internal bug 35233870.

Hi,

I was looking at this issue and I think we identified the underlying adaptation change that causes the issue.

It seems to be indeed around the reference frames, more specifically around B-Frames. When the decoder is initialised with an SPS indicating maxNumRefFrames = 1 and the new format has maxNumRefFrames > 1, the adaptation fails and reconfiguring seems not to be enough. However, if the decoder is initialised with maxNumRefFrames > 1, adaptation back and forth works fine.

This can be verified with the test content posted on this issue if you start playback at the Ad. In that case the decoder will be initialised with B-Frame support and everything works as expected.

We validated that it's the B-frames by transcoding the Ad segments once with and once without B-frames, keeping all other settings the same.

Using this for a targeted workaround, I am not sure if it is enough to check if the decoder was ever initialised with maxNumRefFrames > 1 or if an increase of the number of reference frames should also fully reset the decoder and disable re-configuration. Considering that this affects Qualcomm chipsets, it would be very good to know if the first case is enough.
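To make the "first case" concrete, here is a minimal sketch of the bookkeeping it implies: remember the maxNumRefFrames the current decoder instance was initialised with, and ask for a full reset only when it was initialised with 1 and the new format needs more. `RefFrameResetTracker` and its method names are hypothetical, and whether this rule is actually sufficient on the affected chipsets is exactly the open question above.

```java
/** Hypothetical tracker for the "initialised with maxNumRefFrames = 1" rule; an untested assumption. */
final class RefFrameResetTracker {

  // maxNumRefFrames of the format the current codec instance was created with; -1 if none yet.
  private int codecInitRefFrames = -1;

  /** Call whenever a codec instance is (re)created for a format. */
  void onCodecInitialized(int maxNumRefFrames) {
    codecInitRefFrames = maxNumRefFrames;
  }

  /**
   * Returns true if switching to a format with the given maxNumRefFrames should
   * release and re-create the codec instead of reconfiguring it in place.
   */
  boolean needsFullReset(int newMaxNumRefFrames) {
    return codecInitRefFrames == 1 && newMaxNumRefFrames > 1;
  }
}
```

After a reset the caller would report the new value through `onCodecInitialized`, so later switches back and forth no longer trigger a reset. The stricter alternative (reset on every increase) would instead compare against the largest value seen since the codec was created.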

@andrewlewis you mentioned in an earlier comment that you observed an issue around max reference frames before on other chipsets, and it seems to me that this is the same or a very similar underlying cause. Was that issue resolved and do you maybe have more info?

Right now I am looking at a patch where I parse the SPS from the init data on a format change, extract the max number of ref frames from there and then use it as an indicator to disable reconfiguration. I would still like to do the full reset only once when I observe maxNumRefFrames > 1 for the first time (and it was 1 before), but I don't know if that is a good idea since the decoder might need to allocate memory according to the number of reference frames, and in that case it would probably need a full reset every time the number of reference frames increases. Any hints here would be much appreciated.
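For readers who want to try the same thing, the SPS parsing step might look roughly like the sketch below. It is not the patch described above and does not use any ExoPlayer class: `SpsRefFrames` is a hypothetical helper that follows the field order of the H.264 sequence parameter set syntax, and it assumes the input is the SPS NAL unit payload with the one-byte NAL header already stripped. It does no validation of malformed input.

```java
import java.util.Arrays;

/** Hypothetical helper that reads max_num_ref_frames from an H.264 SPS payload. */
final class SpsRefFrames {
  private final byte[] data;
  private int bitPos;

  private SpsRefFrames(byte[] rbsp) {
    this.data = rbsp;
  }

  /** spsPayload: SPS NAL unit bytes after the one-byte NAL header. */
  static int maxNumRefFrames(byte[] spsPayload) {
    SpsRefFrames r = new SpsRefFrames(unescape(spsPayload));
    int profileIdc = r.bits(8);
    r.bits(16); // constraint flags and level_idc
    r.ue(); // seq_parameter_set_id
    if (isHighProfile(profileIdc)) {
      int chromaFormatIdc = r.ue();
      if (chromaFormatIdc == 3) r.bits(1); // separate_colour_plane_flag
      r.ue(); // bit_depth_luma_minus8
      r.ue(); // bit_depth_chroma_minus8
      r.bits(1); // qpprime_y_zero_transform_bypass_flag
      if (r.bits(1) == 1) { // seq_scaling_matrix_present_flag
        int lists = chromaFormatIdc != 3 ? 8 : 12;
        for (int i = 0; i < lists; i++) {
          if (r.bits(1) == 1) r.skipScalingList(i < 6 ? 16 : 64);
        }
      }
    }
    r.ue(); // log2_max_frame_num_minus4
    int pocType = r.ue();
    if (pocType == 0) {
      r.ue(); // log2_max_pic_order_cnt_lsb_minus4
    } else if (pocType == 1) {
      r.bits(1); // delta_pic_order_always_zero_flag
      r.se(); // offset_for_non_ref_pic
      r.se(); // offset_for_top_to_bottom_field
      int cycle = r.ue(); // num_ref_frames_in_pic_order_cnt_cycle
      for (int i = 0; i < cycle; i++) r.se();
    }
    return r.ue(); // max_num_ref_frames
  }

  private static boolean isHighProfile(int p) {
    int[] high = {100, 110, 122, 244, 44, 83, 86, 118, 128, 138, 139, 134, 135};
    for (int h : high) if (h == p) return true;
    return false;
  }

  /** Removes emulation prevention bytes (the 0x03 in 0x00 0x00 0x03). */
  private static byte[] unescape(byte[] in) {
    byte[] out = new byte[in.length];
    int n = 0, zeros = 0;
    for (byte b : in) {
      if (zeros >= 2 && b == 3) { zeros = 0; continue; }
      zeros = (b == 0) ? zeros + 1 : 0;
      out[n++] = b;
    }
    return Arrays.copyOf(out, n);
  }

  /** Reads n bits, most significant bit first. */
  private int bits(int n) {
    int v = 0;
    for (int i = 0; i < n; i++, bitPos++) {
      v = (v << 1) | ((data[bitPos >> 3] >> (7 - (bitPos & 7))) & 1);
    }
    return v;
  }

  /** Unsigned Exp-Golomb code. */
  private int ue() {
    int zeros = 0;
    while (bits(1) == 0) zeros++;
    return (1 << zeros) - 1 + bits(zeros);
  }

  /** Signed Exp-Golomb code. */
  private int se() {
    int k = ue();
    return (k & 1) == 1 ? (k + 1) / 2 : -(k / 2);
  }

  private void skipScalingList(int size) {
    int last = 8, next = 8;
    for (int j = 0; j < size; j++) {
      if (next != 0) next = (last + se() + 256) % 256;
      if (next != 0) last = next;
    }
  }
}
```

The returned value could then feed a decision like the one discussed above, for example comparing the value from the content SPS against the value from the ad SPS at a discontinuity.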

@thasso Thanks for the information. That issue affected the H.264/AVC decoder in Nexus 9 devices. The cause was similar to what you've described here: a reconfiguration to the same resolution but with more reference frames was not being handled properly. We resolved it with the workaround in fdf26d6a1f10cb07a2e6301b7e4225e44b4b8825, which forces buffers to be reallocated.

If you make MediaCodecRenderer.codecNeedsAdaptationWorkaround return true does it fix this issue? This workaround may be more efficient than releasing the codec and initializing a new one.

It's been a while, but there's an updated firmware that addresses this issue in review at the moment [Internal bug b/35233870].

QC submitted a firmware fix for this bug in their video driver, so it should be fixed in any future system images.
