Status-react: Fix 64bit apk freezes on arm64 hardware

Created on 29 Jul 2019  路  52Comments  路  Source: status-im/status-react

Problem

When building the app for 64bit arch as required by Google Play Store, we get freezes on arm64 hardware.
The freezes are random. There is no obvious indicator of a freeze in the logs besides the fact that there is no more js logging statements appearing, or no js logs at all in logcat if the freeze occurs during splash screen.

Implementation

We have tried:

Dismissed cause:

  • status-go is compiled for all arch already with gomobile so we consider it is not the issue (will try with debug level logs though to make sure it isn't an arch issue on go side)

We need to try:

  • considering that it seems like there is no logs for status-go whatsoever in 64bits arch, we need to investigate the reason behind that
  • upgrading react-native and using hermes js engine (which is also said to be much faster)

Acceptance Criteria

No freezes of the 64bits apk on 64bits devices

release

Most helpful comment

All 52 comments

here are the logs when it gets stuck on logo screen https://gist.github.com/flexsurfer/94d5d67eb3880d2c2a62708b50c660b7
these logs when it doesn't get stuck
https://gist.github.com/flexsurfer/b5824a0a1828ad28fc2d7aefa33e6074

logs when it gets stuck after interacting with the UI https://gist.github.com/flexsurfer/b5be0623e19c19e401fc16f63be62f71

Did anyone else with React Native or X other technology see similar issues?

Has someone tried to setup a minimally failing test case? What's the current best hypothesis?

Also cc @PombeirP

Could this be related https://github.com/facebook/react-native/issues/24260? Can we add version numbers, platform and environment details to original issue?

i couldn't find any reports with similar issues, but here https://github.com/facebook/react-native/issues/25494#issuecomment-514996960 , ppl still experience problems and author of 59.10 fix suggests to try 60.4 and new engine, we're trying to check this currently with @yenda but nix makes things slower, maybe @PombeirP could help with that

Tested as well without realm and with latest jsc https://github.com/status-im/status-react/pull/8528/commits/a1834903b0d6c6f030a60f7763f97fd40c0ec167

Still freezes

Next steps:

  • we can try upgrading as many libs as possible, possibly focusing on those who are using native code, though I don't have much hope for that because there is 0 errors in the logs when the freezes occur
  • upgrading react-native and using hermes https://github.com/status-im/status-react/issues/8672

Things that we could explore:

  • trying to get more logs

here are the logs when it gets stuck on logo screen https://gist.github.com/flexsurfer/94d5d67eb3880d2c2a62708b50c660b7
these logs when it doesn't get stuck
https://gist.github.com/flexsurfer/b5824a0a1828ad28fc2d7aefa33e6074

Comparing these logs, what stands out is that there are no entries for ReactNativeJS/StatusModule/RNKeychainManager on the 1st case.

logs when it gets stuck after interacting with the UI https://gist.github.com/flexsurfer/b5be0623e19c19e401fc16f63be62f71

In this one, there are entries for ReactNativeJS, but not for StatusModule/RNKeychainManager.

Comparing lib/arm64-v8a/libstatus-logs.so and lib/armeabi-v7a/libstatus-logs.so, the 64-bit version of the lib suspiciously takes <6KB while the 32-bit takes almost 14KB. I'll be investigating that, maybe the 64-bit version is currently just a skeleton.

we can try upgrading as many libs as possible, possibly focusing on those who are using native code, though I don't have much hope for that because there is 0 errors in the logs when the freezes occur

Honestly I think this is a suboptimal approach to debugging the problem. The problem with it is it that it is a positive one, where you add stuff. The only time this would make sense imo is if there's a clear hypothesis, for example based on similar bug reports, that this would resolve the issue.

A better approach to isolate the error instead is the negative one, where we try to find a minimally failing test case. This is also what would be the most useful once we find out what the upstream (?) bug is.

I'm comparing the readelf output from both the 32-bit lib and the 64-bit lib to check the import/export tables, and you can definitely see that quite a bit is missing (although the main export JNI_OnLoad is indeed there):

image

In addition to comparing the full objdump logs, I'll try adding code that crashes the app in JNI_OnLoad to confirm that it is indeed getting called for arm64-v8a.

Even more telling is dumping the contents of the .rodata ELF section (readelf -p.rodata):

image

I'll try to find out where the missing strings are coming from and that should inform us of what input is missing in the arm64 library.

I created a local PR build and from the logs can see that status-go is working (search for GoLog): https://gist.githubusercontent.com/PombeirP/380cf5da9e7e2d4eb4da79e3ca9620fd/raw/87b96bb4a389bf7e9633c1209b608628a15d3707/Status.log

OK, some more data. In this run, I was able to create an account and see the chat screens. I then went to add a public chat, and it hung on the public chat list. Even though the UI was hung, I could still see signals from status-go being processed by the status-react Java code, e.g.:

07-31 11:54:18.813 11840 12008 D StatusModule: Signal event: {"type":"wallet","event":{"type":"newblock","blockNumber":8257691,"accounts":[]}}

From subsequent runs, I gathered some additional observations:

  • hangs can occur as soon as the splash screen is shown, or as late as adding a public chat;
  • since native/status-go code is still running, the obvious culprit would seem to be the JS thread being locked up for some reason;
  • status-logs isn't at the root of this issue, I've made a build that didn't load the lib, but it still hung on the splash screen (and other places in the app, depending on the run);
  • I could eventually post a message to #status-core, without the app locking up (by chance);
  • native animations and text inputs, of course, run even when the JS UI is hung.

So as we are suspecting the JSC, we are going to switch it with Hermes to see if the problem persists

Currently trying to build any non-trivial RN sample app which is built on 0.59.x, such as from this list, to see if they suffer from the same issue or if we can modify them to reproduce the issue, but it's proving challenging to find one that actually builds and runs...

maybe it would be better to remove all from status-react, create a few simple screens and try

I think @flexsurfer has the right idea. Building the app using our current stack but with almost everything removed to see if it fails the same way, and if it doesn't keep re-adding screens/whatever else to see what triggers the freeze.

keep re-adding screens/whatever else to see what triggers the freeze.

this may be challenging and time-consuming

One more quick idea is to build release v13 for 64, because in current develop we have all performance optimizations, js modules etc

Error libgnustl_shared.so is 32 bit instead of 64-bit (android) https://github.com/status-im/status-react/issues/4478
fix libgnustl_shared.so is 32 bit instead of 64-bit https://github.com/status-im/status-react/commit/b50cbd036b2dbc004bd72ff05d951d60cb3cd739
so before this commit we had 64-bit working version?
no: https://github.com/status-im/status-react/commit/4b7360db17e68cf865958316cc59131bdeeec378#diff-dc46f9f5dfe204e394fb41395ccd03cf

@flexsurfer don't waste your time going back, previous versions of react-native didn't even support 64bits

first version of react-native to support 64bits was 0.59.1
we made a big jump for 56 to 59 in this commit https://github.com/status-im/status-react/commit/24a978d44219314f550b5b02323c09c9cfcf3548#diff-66a880bc35034360e66422980df146b0 which was part of 0.12 https://github.com/status-im/status-react/blob/0.12.0-mobile/mobile_files/package.json.orig

The v13 build seems to fail for a different reason:
09:25:17 From https://github.com/status-im/react-native-status-keycard 09:25:17 * [new branch] ios -> origin/ios 09:25:17 ! [rejected] v2.5.5 -> v2.5.5 (would clobber existing tag) 09:25:17 info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.

If I build v13 locally I get a different error:

> Task :app:processReleaseManifest FAILED
/home/pedro/src/github.com/status-im/status-react/android/app/src/release/AndroidManifest.xml:22:18-91 Error:
        Attribute application@appComponentFactory value=(android.support.v4.app.CoreComponentFactory) from [com.android.support:support-compat:28.0.0] AndroidManifest.xml:22:18-91
        is also present at [androidx.core:core:1.0.0] AndroidManifest.xml:22:18-86 value=(androidx.core.app.CoreComponentFactory).
        Suggestion: add 'tools:replace="android:appComponentFactory"' to <application> element at AndroidManifest.xml:26:5-102:19 to override.

@PombeirP could you take a look?

@flexsurfer It's hard to say whether it fixes the issue or not, since with this change most of the screens were removed, so there is not much that can be done inside the app to exercise the functionality. Perhaps the only indicator is whether it stops locking up in the splash screen, but I didn't get that often in my P20 Pro. Normally it hung when creating an account or joining a public chat. I'll try to see whether I'm unable to repro the hang at splash screen.

@PombeirP we tested the same version but with status native module and it froze

It doesn't prove beyond a reasonable doubt that the problem is inside react-native-status (it could be due to some weird interaction), but it does give us some clues to investigate. The question is what part of react-native-status might be causing this. If we disable all calls to status-go instead of removing react-native-status altogether, we'll still have a very crippled app but at least we're able to check if the app still hangs at startup or not.

I'm looking at gomobile to see if there's any weird arm64 stuff going on, like this, and whether our Reproducible Android gomobile build differs from the status-go Jenkins build.

Did another experiment, to rule out whether the issue was with the Nix status-go build: I modified the Nix recipe to build status-go (with gomobile) so that it just takes the latest status-go manual build from Jenkins. I was able to progress all the way to add a public chat in one go, but it still hung at that point.

Also, fwiw @flexsurfer I did the following experiments, all of which still resulted in hangs:

  • leaving setMobileSignalHandler in and commenting out the contents of StatusModule.handleSignal;
  • commenting out the contents of SetMobileSignalHandler on the Go side.

It's as if as soon as we call into Statusgo, we're prone to this hang. (will test the hypothesis by calling Statusgo.appStateChange("wifi") instead of Statusgo.setMobileSignalHandler(this)).

it's not only setMobileSignalHandler, any call to status go may lead to hang
this is latest develop without go calls, and it works
https://status-im-prs.ams3.digitaloceanspaces.com/StatusIm-190801-160041-8ebd2a-pr.apk
this is the change https://github.com/status-im/status-react/pull/8326/commits/8ebd2a1e2c17753162fef49641ab5d57cf908374

Yup, calling Statusgo.appStateChange("wifi") instead of Statusgo.setMobileSignalHandler(this) still caused the app to hang (only on arm64), so it sounds like as soon as we call into Statusgo, we're prone to these hangs. Now the question is: why is that the case?

I'll try upgrading to latest gomobile. It seems to have some changes relevant to arm64.

UPDATE: Upgrading gomobile didn't seem to fix the hang, unfortunately.

Maybe we should build a proof-of-concept RN app which calls a simple method in Go through gomobile? I'll be on holiday tomorrow, but I can work on that on Monday.

I was able to create a minimal RN sample that reproduces the hang. It still uses status-go as the import, so now the next step is to use a minimal gomobile project.

Built the simplest Go project with a single exported method DoSomething, and that still causes the RN sample app to hang at startup. Once I clean it up I'll have what I need to create a bug report in https://github.com/golang/go/issues.

Repro app is available here. @flexsurfer I'd appreciate it if you could follow the build instructions to ensure that others can also reproduce it before I create an issue in the golang repository.

For clarity, what about:

  1. Simplest Android app (no RN) and Go?
  2. Simplest Android app and no Go?

For clarity, what about:

1. Simplest Android app (no RN) and Go?

2. Simplest Android app and no Go?

That'd be useful, but since the way that this bug manifests itself is through a hang of the JavaScript UI thread, I'm not sure how we'd reproduce it in a completely native app. We could get lucky and witness a hang of the native main thread, but might be a long shot. Still worth trying though.

I'd like to also generate a native arm64-v8a through some method other than Gomobile (maybe xgo?), in order to prove that the issue really comes from the way they generate the library.

I've created a sample app derived from an Android sample (BasicTransition) which calls the same gomobile library, but it doesn't seem to suffer any problems after invoking DoSomething() on that library.

I've also tried using xgo to build an aar which contains an arm64 library, but for some reason xgo only seems to want to build 32-bit targets in the aar. Maybe @adambabik knows more about this.

Sounds good.

Built the simplest Go project with a single exported method DoSomething, and that still causes the RN sample app to hang at startup. Once I clean it up I'll have what I need to create a bug report in https://github.com/golang/go/issues.

Seems useful, perhaps people upstream have further thoughts. I notice there are a bunch of issues related to arm64 and mobile https://github.com/golang/go/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+arm64+mobile+ but don't know if any of them are related. Maybe gdb and backtrace would help? On the Go side obviously.

Perhaps it'd make sense to open an upstream issue in both Go and RN repo?

I've built the sample using RN 0.60.4 and these are the findings when running the debug app:

  • Hermes disabled:

    • arm64-v8a: it still hung when asked to reload JS;

    • armeabi-v7a: did not hang when asked to reload JS;

  • Hermes enabled:

    • arm64-v8a: did not hang when asked to reload JS;

    • armeabi-v7a: did not hang when asked to reload JS;

So while this does not inform us regarding the root cause of the interaction (i.e. is this an issue with RN or gomobile?), it gives us a path forward for the future releases.

Hermes is still not stable and there are bugs like this https://github.com/facebook/react-native/issues/25864, maybe we could try to upgrade app to 0.60.4, enable Hermes and test it on different Android devices? @annadanchenko @PombeirP wdyt?

@flexsurfer that was the plan https://github.com/status-im/status-react/issues/8672 the update isn't trivial though

I'm taking a stab at upgrading to Hermes.

@annadanchenko looking at progress on this issue, I'm guessing we won't be ready to release v0.14 with these items in time for our iOS cut-off. wdyt? Should we prepare comms with @j-zerah to explain the need to re-release v0.13 for iOS?

I would suggest we call it v0.13.1 (assuming 0.13 was latest).

i'll try to build simple clojure RN app with hermes

Was this page helpful?
0 / 5 - 0 ratings

Related issues

andytudhope picture andytudhope  路  4Comments

yevh-berdnyk picture yevh-berdnyk  路  4Comments

lukaszfryc picture lukaszfryc  路  3Comments

rachelhamlin picture rachelhamlin  路  3Comments

denis-sharypin picture denis-sharypin  路  4Comments