ExoPlayer: Occasional ANRs on Huawei devices

Created on 13 Aug 2019 · 22 Comments · Source: google/ExoPlayer

Issue description

We occasionally see ANRs on various Huawei devices when clearing the video surface on a SimpleExoPlayer instance. My colleague previously reported something similar in #3724.

Reproduction steps

Although we have similar or identical devices to test on, we cannot reproduce this issue in our environment.

A full bug report captured from the device

Unfortunately we only have the obfuscated ANR log available, but we were able to deobfuscate the following ANR report from the Play Console:

"main" prio=5 tid=1 Waiting
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x751288f0 self=0x7a54615c00
  | sysTid=12663 nice=-10 cgrp=default sched=0/0 handle=0x7adaeea548
  | state=S schedstat=( 32648761976 3586515706 36908 ) utm=2876 stm=388 core=5 HZ=100
  | stack=0x7ff5661000-0x7ff5663000 stackSize=8MB
  | held mutexes=
  at java.lang.Object.wait (Native method)
- waiting on <0x0f3f840c> (a com.google.android.exoplayer2.O)
  at com.google.android.exoplayer2.PlayerMessage.boolean blockUntilDelivered() (SourceFile:8)
- locked <0x0f3f840c> (a com.google.android.exoplayer2.O)
  at com.google.android.exoplayer2.SimpleExoPlayer.void setVideoSurfaceInternal(android.view.Surface,boolean) (SourceFile:107)
  at com.google.android.exoplayer2.SimpleExoPlayer.void setVideoSurface(android.view.Surface) (SourceFile:11)
  at com.google.android.exoplayer2.SimpleExoPlayer.void clearVideoSurface(android.view.Surface) (SourceFile:6)
...

Version of ExoPlayer being used

We are using 2.10.3 in our current release.

Device(s) and version(s) of Android being used

Happens on Android 9 on the following devices:

Mate 10 Pro (HWBLA)             103   27.3 %
P20 Pro (HWCLT)                  69   18.3 %
P20 (HWEML)                      66   17.5 %
Mate 20 lite (HWSNE)             43   11.4 %
HUAWEI P30 lite (HWMAR)          20    5.3 %
P30 Pro (HWVOG)                  18    4.8 %
HUAWEI P30 (HWELE)               13    3.4 %
Mate 20 Pro (HWLYA)              10    2.7 %
Mate 20 (HWHMA)                   9    2.4 %
HUAWEI P smart 2019 (HWPOT-H)     5    1.3 %
Honor 10 (HWCOL)                  5    1.3 %
Honor 8X (HWJSN-H)                5    1.3 %

All 22 comments

This issue does not seem to follow the issue template. Make sure you provide all the required information.

Could you try the same workaround as used in #3724, that is, setting codecNeedsSetOutputSurfaceWorkaround to true?

Unfortunately this would mean enabling the workaround in our production environment, since we cannot reproduce this issue on our local devices, and I think we should avoid that :see_no_evil:.
But we could try patching the Renderer to add the mentioned devices and see whether this reduces the failure rate.

To reproduce locally, can you try switching surfaces:

  1. from one surface to another (while playback is already running),
  2. from a surface to null (i.e. no surface),
  3. from null to an actual surface.

If that workaround helps, at least one of these cases should fail consistently on these devices.

Great! Thanks for the fast reply! I will try to reproduce this on one of these devices :+1:

I built a demo application to try these switches on a Pixel 3a and an Honor 10. It turns out that the app does not crash on the Honor 10, but ExoPlayer cycles buffers from a previously set surface into the video rendering. With codecNeedsSetOutputSurfaceWorkaround set to true, this issue is gone.

On the Pixel 3a, sm-t813 and lg-h850 the app behaves as expected.
The P20 had the same issue as the Honor 10.

I tested the surface switches on a Mate 10 Pro (because that seems to have the most errors in your table above), but couldn't reproduce the problem or any cycling buffers. In theory there also shouldn't be any problems, because the Android device specification now ensures the correct behavior on SDK versions 27+.

The cycling buffer issue you mentioned above is probably different from the setVideoSurface ANR one. So maybe file a new issue and provide more detailed reproduction steps so that we can have a closer look.

For the ANR issue, this might also just be a case of the Android platform MediaCodec code being too slow to respond to the setSurface command. See #5887 and #5078 for other examples of this. Unfortunately, we need to block on these methods to ensure we don't accidentally leak decoder instances for other apps to use. And we are also trying to get stricter guarantees for MediaCodec calls in future Android releases to prevent this from happening.

Besides that, it would still be good to see whether setting the workaround flag helps to eliminate the issues you are having with these devices, because we could then add them to the workaround list.

Thanks for your efforts and feedback!
I will upload my demonstration code and file another issue about the buffer cycling behaviour.
I will add the mentioned devices to the workaround list in our build and follow up with results on whether this resolves our issue.

Can we duplicate this onto https://github.com/google/ExoPlayer/issues/6331? The symptoms may be different but it seems like the solution is going to be the same.

Maybe let us wait until we have verified the ANR statistics with our upcoming release, to confirm that this helps with the ANR issues (about 1.5 weeks).


Hey @stetro. We need more information to resolve this issue but there hasn't been an update in 14 days. I'm marking the issue as stale and if there are no new updates in the next 7 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

Hi, our release has now been distributed to clients, and it turns out that these ANRs still happen during app start on the device list shown above, at a similar frequency. From this I conclude that the workaround does not have any effect. :disappointed:

We are now investigating whether this ANR is caused by ExoPlayer in combination with an unusual view lifecycle on devices running EMUI 9. But this will again take a couple of days, until we see evidence after our sprint release.

Thanks for the update!

Hello,
I have a similar issue on ALL devices (not just Huawei).
Some of them report an ANR at blockUntilDelivered. Here is a typical ANR report:

"main" prio=5 tid=1 Waiting
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x75794770 self=0x3073021c00
  | sysTid=23917 nice=0 cgrp=default sched=0/0 handle=0x2fed5cd548
  | state=S schedstat=( 10073527995 6894125827 42689 ) utm=789 stm=217 core=2 HZ=100
  | stack=0x7fd1b1d000-0x7fd1b1f000 stackSize=8MB
  | held mutexes=
  at java.lang.Object.wait (Native method)
- waiting on <0x0c5fefc8> (a com.google.android.exoplayer2.PlayerMessage)
  at com.google.android.exoplayer2.PlayerMessage.c (PlayerMessage.java:283)
- locked <0x0c5fefc8> (a com.google.android.exoplayer2.PlayerMessage)
  at com.google.android.exoplayer2.SimpleExoPlayer.a (SimpleExoPlayer.java:1471)
  at com.google.android.exoplayer2.SimpleExoPlayer.a (SimpleExoPlayer.java:71)
  at com.google.android.exoplayer2.SimpleExoPlayer$ComponentListener.surfaceDestroyed (SimpleExoPlayer.java:1719)
  at android.view.SurfaceView.updateSurface (SurfaceView.java:641)
  at android.view.SurfaceView.onWindowVisibilityChanged (SurfaceView.java:252)
  at android.view.View.dispatchWindowVisibilityChanged (View.java:12868)
  at android.view.ViewGroup.dispatchWindowVisibilityChanged (ViewGroup.java:1553)
  at android.view.ViewGroup.dispatchWindowVisibilityChanged (ViewGroup.java:1553)
  at android.view.ViewGroup.dispatchWindowVisibilityChanged (ViewGroup.java:1553)
  at android.view.ViewRootImpl.performTraversals (ViewRootImpl.java:1854)
  at android.view.ViewRootImpl.doTraversal (ViewRootImpl.java:1536)
  at android.view.ViewRootImpl$TraversalRunnable.run (ViewRootImpl.java:7502)
  at android.view.Choreographer$CallbackRecord.run (Choreographer.java:949)
  at android.view.Choreographer.doCallbacks (Choreographer.java:761)
  at android.view.Choreographer.doFrame (Choreographer.java:696)
  at android.view.Choreographer$FrameDisplayEventReceiver.run (Choreographer.java:935)
  at android.os.Handler.handleCallback (Handler.java:873)
  at android.os.Handler.dispatchMessage (Handler.java:99)
  at android.os.Looper.loop (Looper.java:193)
  at android.app.ActivityThread.main (ActivityThread.java:6720)
  at java.lang.reflect.Method.invoke (Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run (RuntimeInit.java:493)
  at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:858)

I investigated a little bit and I think that the cause is a possible deadlock in the following scenario:
I have 3 threads:

  1. The ExoPlayer thread (the one that I used to create ExoPlayer and send control messages)
  2. Other Async thread for doing some application-specific logic
  3. The UI thread

In my application I need to perform some logic on the ExoPlayer thread, since this is the only way I can access various states of the player. While doing so, I may request a lock on a shared resource in my application.
Now, let's assume that this lock is being held by my other async thread and, at the same time, I perform the following command on the UI thread:
setVisibility(INVISIBLE), to make the video container (my custom VideoView) disappear.
This causes the surfaceDestroyed callback to fire on the UI thread, and the following code in SimpleExoPlayer.java then blocks the UI thread:

 private void setVideoSurfaceInternal(@Nullable Surface surface, boolean ownsSurface) {
    // Note: We don't turn this method into a no-op if the surface is being replaced with itself
    // so as to ensure onRenderedFirstFrame callbacks are still called in this case.
    List<PlayerMessage> messages = new ArrayList<>();
    for (Renderer renderer : renderers) {
      if (renderer.getTrackType() == C.TRACK_TYPE_VIDEO) {
        messages.add(
            player.createMessage(renderer).setType(C.MSG_SET_SURFACE).setPayload(surface).send());
      }
    }
    if (this.surface != null && this.surface != surface) {
      // We're replacing a surface. Block to ensure that it's not accessed after the method returns.
      try {
        for (PlayerMessage message : messages) {
          message.blockUntilDelivered();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      // ... (the quoted excerpt ends here)

I assume that the deadlock occurs because the ExoPlayer thread needs to be running in order to deliver the message and release the block, but it is itself blocked.
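The blocking chain described above can be modelled in plain Java. This is only a sketch under assumptions: it uses no ExoPlayer or Android classes, a single-thread executor stands in for the ExoPlayer thread, `appLock` is a hypothetical shared application lock, and the caller plays the role of the UI thread. A timeout is used so that the demo terminates; the real blockUntilDelivered() call in the excerpt waits without one.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Plain-Java model of the three-thread scenario; all names are hypothetical. */
final class PlaybackThreadBlockSketch {

  /**
   * Returns true if the queued "clear surface" message was delivered within timeoutMs
   * while the async thread was still holding the shared lock.
   */
  static boolean deliveredWhileLockHeld(long timeoutMs) throws InterruptedException {
    ReentrantLock appLock = new ReentrantLock(); // shared application resource
    ExecutorService playbackThread = Executors.newSingleThreadExecutor();
    CountDownLatch lockHeld = new CountDownLatch(1);
    CountDownLatch releaseLock = new CountDownLatch(1);
    CountDownLatch delivered = new CountDownLatch(1);

    // Thread 2: the async application thread takes the lock and keeps it.
    Thread asyncThread = new Thread(() -> {
      appLock.lock();
      try {
        lockHeld.countDown();
        releaseLock.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        appLock.unlock();
      }
    });
    asyncThread.start();
    lockHeld.await();

    // Thread 1: application logic on the playback thread waits for the same lock.
    playbackThread.execute(() -> {
      appLock.lock();
      appLock.unlock();
    });
    // The "set surface to null" message is queued behind the blocked task.
    playbackThread.execute(delivered::countDown);

    // Thread 3: the caller stands in for the UI thread waiting for delivery.
    boolean deliveredInTime = delivered.await(timeoutMs, TimeUnit.MILLISECONDS);

    // Clean up so that the demo terminates.
    releaseLock.countDown();
    asyncThread.join();
    playbackThread.shutdown();
    playbackThread.awaitTermination(5, TimeUnit.SECONDS);
    return deliveredInTime;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("delivered while lock held: " + deliveredWhileLockHeld(200));
  }
}
```

In this model the message is never delivered while the lock is held, so a UI thread that waits without a timeout stays blocked for as long as the async thread keeps the lock. It would become a true deadlock if the async thread in turn waited for the UI thread.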

I would like to suggest adding a special case for when the surface parameter is null (which is the case in my scenario), so that no messages are sent or waited on. You don't need to send any messages if the surface is being destroyed, right?
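To make the proposal concrete, here is a minimal sketch that compares the blocking condition from the excerpt above with the suggested variant. It is not ExoPlayer code: `Object` stands in for android.view.Surface, both method names are hypothetical, and only the decision whether to wait is modelled, not the sending of messages.

```java
/** Compares when the caller would block; all names are hypothetical. */
final class SurfaceBlockingPolicySketch {

  /** Condition from the excerpt: block whenever an existing surface is replaced. */
  public static boolean currentPolicyBlocks(Object oldSurface, Object newSurface) {
    return oldSurface != null && oldSurface != newSurface;
  }

  /** Suggested variant: additionally skip the wait when the new surface is null. */
  public static boolean proposedPolicyBlocks(Object oldSurface, Object newSurface) {
    return newSurface != null && currentPolicyBlocks(oldSurface, newSurface);
  }

  public static void main(String[] args) {
    Object a = new Object();
    Object b = new Object();
    System.out.println("a -> b:    current=" + currentPolicyBlocks(a, b)
        + " proposed=" + proposedPolicyBlocks(a, b));
    System.out.println("a -> null: current=" + currentPolicyBlocks(a, null)
        + " proposed=" + proposedPolicyBlocks(a, null));
  }
}
```

The only case that changes is the switch from a surface to null. Note that the comment in the excerpt says the wait exists to ensure the old surface is not accessed after the method returns, so skipping it would give up that guarantee in exactly the case where the surface is being destroyed.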

Thanks

@stetro. Do you have any updates for the investigations mentioned above?

Hi @tonihei Thanks for the reminder: Unfortunately we could not find any evidence with different view behavior on EMUI devices.

Thanks! In this case, I'll close this issue because there is nothing we can do about it. If anything new comes up, please feel free to reopen (or file a new issue).

But maybe for completeness: Since we keep updating to the latest ExoPlayer release, we have seen a decline in ANRs in our releases over the last couple of months. Unfortunately I cannot pinpoint a specific release that might have fixed the issue.

Did a new release solve the issue for you @stetro? We happen to see a similar ANR, only on the HUAWEI P smart+ 2019 (HWPOT-H), using version 2.10.3. I'd like to know if a newer release solved it for you.

We are running 2.11.5 and have seen only a couple of ANRs on those devices over the last three months.
Before, it was around 100 cases per day per device type.

Looks promising! Will try updating it as well and report back how things go. Thanks!
