Status-react: App freezes while offline inbox messages arrive after login / opening the app from the background

Created on 10 May 2018 · 39 comments · Source: status-im/status-react

_Type_: Bug

_Summary_: When a user logs in to an account or opens the app from the background, they may receive messages from the offline inbox. The app UI freezes while offline inbox messages arrive in a 1-1, group, or public chat. The impact depends on the device: receiving 100 messages has almost no impact for iPhone X users, while a Galaxy Note 4 becomes unresponsive to navigation through app views for ~5 seconds with the same number of messages.

Receiving 500 offline inbox messages freezes the app on the Note 4 for ~20 seconds.

Expected behavior

No time lags while navigating through different app screens right after the user logs in

Actual behavior

Time lags while navigating through different app screens right after the user logs in

Reproduction 1 (For public chat and contact logs in)

Precondition:
A contact exists and has joined a public chat.
The contact is logged out.
100+ messages have been sent in the public chat

  • Open Status
  • Log in to account
  • Navigate through different screens (Profile -> Chats -> Wallet -> Chats -> 1-1 chat -> Back to Chats -> etc) and pay attention to time between appropriate view loads

Reproduction 2 (For 1-1 chat and contact went back from the background)

Precondition:
A contact exists and has joined a public chat.
The contact is logged out.
100+ messages have been sent in the public chat

  • Open Status
  • Device A: Create 1-1 chat with another user and navigate back to Home screen
  • Device A: Put the app into background
  • Device B: Send ~100 messages to Device A
  • Device A: Click on the last message notification and wait for app to open
  • Navigate between different app screens (E.g. Wallet - Profile - Home tab)

Additional Information (10th of May 2018 develop build stats)

  • Status version: Develop 942c703b / Release 0.9.18
  • Operating System: Android and iOS

TF logs and video for Scenario 1 above (user on a Galaxy Note 4 receives 500 messages in a public chat after login) - the app was unresponsive for ~30 seconds (00:20 - 01:00) -> https://app.testfairy.com/projects/4803622-status/builds/8262301/sessions/41/?accessToken=IrdTLg4OeDBoTd0J0SugtsBoBug

TF logs and video for Scenario 2 above (user on a Galaxy Note 4 opens the app from the background and receives 200 messages - the app freezes for ~10 seconds), 10:00 - end: https://app.testfairy.com/projects/4803622-status/builds/8262301/sessions/44/?accessToken=Kv38otfwWzdvWlCAwHkkFlYdSWk

Additional Information 2 (Updated. 10th of June 2018 develop build stats)

  • Status version: Release 0.9.19 / 7th of June Nightly builds (434 Jenkins build)
  • 100 messages from the offline inbox in a 1-1 chat (logging in to the account or opening the app from the background) - freezes the app for ~5 seconds (average of 5 runs)
  • 500 messages from the offline inbox in a public channel - the app freezes for 17-20 seconds (average of 5 tests; not very precise numbers because it's not obvious when the app starts freezing, so a ±10% margin)
_Labels_: bug, chat, mobile-app, performance

All 39 comments

@Serhy can you please re-test this one?

@mandrigin, @janherich, @rasom
In the month since this issue was initially reported, the app has received some performance improvements.
However, the issue has not gone away - the time lags / app freezes have only shortened by ~30% compared to the stats from a month ago.

Latest nightly (434th Jenkins build)
When a user on Galaxy Note 4 receives:
100 messages from the offline inbox in a 1-1 chat (logging in to the account or opening the app from the background) - the app freezes for ~5 seconds (average of 5 runs)

500 messages from the offline inbox in a public channel - the app freezes for 17-20 seconds (average of 5 tests; not very precise numbers because it's not obvious when the app starts freezing, so a ±10% margin)

@mandrigin @Serhy I discovered one very inefficient thing we were doing related to saving messages - I'm improving on that in the #4664 PR, so I expect that this issue could be re-evaluated in scope of that PR testing.

In its current state, is this a strict beta blocker? Or is it more something we'd really want to (keep) improving?

If it's the latter, how about we remove it from beta milestone but keep working on it (barring any other beta blockers)?

I don't think it is a strict beta blocker, tbh.
@Serhy what do you think?

@Serhy as #4644 was recently merged with potential to improve the situation, it would be good to re-test and confirm improvements (if any)

@mandrigin , @janherich With the latest develop build including #4644 improvements:

When a contact who uses a Galaxy Note 4 joins the public chat and receives 500 messages from the offline inbox, the app freezes for up to ~15 seconds.
It indeed looks better than before (without #4644) for me.

I don't think we should consider it a Beta blocker. Yes, the app lags when I receive offline messages, but it's only noticeable on less powerful devices, and mostly when a contact joins a public chat like #status with a large history that is fetched for the last 7 days.

  • we inform the user via the 'Fetching messages...' label that something is going on during the message fetch, so the contact does not experience "unknown" lags.

No, not a Beta blocker but something we still need to address and optimise.

13th of June nightly build quick stats:

Samsung Galaxy Note 4:

  • user receives 500 messages in public chat -> app freezes for ~15 seconds
  • user receives ~8000 messages in 14 separate 1-1 chats -> app crashes in the 4th-5th minute. Re-opening and logging in to the same account shows about 3500 messages received from 4 users. The rest of the chats and messages haven't arrived.

Samsung Galaxy J7:

  • user receives 500 messages in public chat -> app freezes for ~10-11 seconds
  • user receives ~8000 messages in 14 different 1-1 chats -> app freezes for ~10 minutes

iPhoneX:

  • user receives 500 messages in public chat -> app freezes for ~3 seconds
  • user receives ~8000 messages in 14 different 1-1 chats -> app freezes for ~2.5 minutes

@Serhy @janherich @mandrigin
we had a look at some logs and quite a few mixpanel events are sent (the 2-3 minutes of freezing might be due to that, but that's a very rough guess - it might not be the case). It's worth testing this with the Send/Receive ratio disabled (enabled by default in nightlies, I believe) to understand whether it has a performance impact.

Worth considering batching every x seconds in general, as you would do for stats.

yeah, let's build a release version of the app. We can probably use https://jenkins.status.im/job/status-react/job/parametrized_build/

Is this still a problem @lukaszfryc, or can we close this ?

@Serhy @lukaszfryc Still an issue or can we close?

@janherich @chadyj let me check it tomorrow

Re-testing it now in scope of PR #6372

@Serhy it would be nice to re-test it again :)

@mandrigin I've tested with a Galaxy Note 4 on the 0.9.33 Develop nightly (24th of January), receiving 500 messages and 100 messages in a public chat, and it looks better now than before.

Samsung Galaxy Note 4:

  • New user with no chats joins a public chat and receives 500 messages -> app freezes for ~13-15 seconds
  • New user with no chats joins a public chat and receives 100 messages -> app freezes for ~3 seconds

A user with an iPhone X joins a public chat and receives 500 messages -> app freezes for ~2-3 seconds.

Galaxy Note 4 with Android 6.0.1 was the weakest device we agreed to support in the past (agreed about 1 year ago).

So at this moment it's still an issue, but performance with 0.9.33 Develop (log level setting = Disabled) is 25-30% better than with Release 0.9.19.

@Serhy thanks a lot! so, let's keep it open then

➤ Hester Bruikman commented:

No parsing of messages in the UI. Protocol move to Go. Request only what's seen on screen. Adam is working on this atm. RPC API.
No fetching messages through Web3 library.
Messages stored in Status Go.

No ETA > Igor Mandrigin can you give a worst case indication?
No Blockers

Next steps

  • Structure API for limited activity by Clojure
  • Push message events
  • New methods > mapped to existing objects in Clojure (leaves ca. 2 weeks for updates on Cloj side).

Everything from the above comment is done, but we still have the UI freeze; we need to investigate whether status-go might be the reason for it.

@flexsurfer just thinking aloud, but it seems like the problem is more likely in status-react or between status-react and status-go.

Because of how Whisper works, you will receive messages from any 1-to-1 sent by anyone on Status, so theoretically, if the problem were solely in status-go, we should be able to replicate it on account C by sending 200 messages between A & B (C will receive them and try to decrypt them, but will fail to do so - so it's a bit less work, but I would expect decrypting messages to be the heaviest workload).

Regardless, what we could do is just disable the signal handling here https://github.com/status-im/status-react/blob/114b4da2d9437785f3a3dd4993633e19ef80c22b/src/status_im/signals/core.cljs#L58
and try again. If the problem persists, it's most likely in status-go; if performance is OK, then the bottleneck is in status-react.
What do you think?

@flexsurfer @cammellos I've commented out this line but still see a delay. Tried in #hugechat111 - 200 messages.

@siphiuel you don't see any messages coming through once you comment out that line, correct? (i.e. the chat is empty)

@cammellos yes it is empty (screenshot: https://monosnap.com/file/jgWijRZ2PgeZLiyXuprXU4D73zYNsO). I'll also try on a device once #9220 build is ready

I've checked the freezing time of the latest nightly develop build and of a build with the messages.new signal commented out (build from #9220) on a Galaxy Note 4.
A new user with no chats joins a public chat and receives 100 messages:

| Build | Freeze when receiving 100 messages in a public chat |
| --- | --- |
| 0.9.33 Develop (24th of January) (from above comments) | ~3 sec |
| 0.14.0 Develop (16th of October) | ~3 sec (+/- 0.2 sec) |
| no messages.new signal (#9220 PR build) | ~3 sec (+/- 0.2 sec) |

Ok, so looks like the issue is most likely in status-go, I will have a look at it, thanks!

@cammellos before looking into status-go, which does not make sense to me because the UI is freezing, I think you should check this:

    // Called for every signal from status-go; each call crosses the
    // native -> JS bridge and wakes the JS thread.
    public void handleSignal(String jsonEvent) {
        Log.d(TAG, "Signal event: " + jsonEvent);
        WritableMap params = Arguments.createMap();
        params.putString("jsonEvent", jsonEvent);
        // Emit a "gethEvent" to React Native; the JSON payload is parsed
        // (and converted) on the JS side for every event.
        this.getReactApplicationContext().getJSModule(DeviceEventManagerModule.RCTDeviceEventEmitter.class).emit("gethEvent", params);
    }

this involves:

  • an event dispatch [:signals/signal-received (.-jsonEvent %)] on every signal
  • a json to clj conversion (types/json->clj event-str) even when you comment out processing the messages

so I wouldn't be surprised if the UI is still choking on a big amount of received messages because of the conversion. I might be wrong here, but I think it is worth checking
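As a hypothetical illustration of the point (this is a plain-JavaScript sketch, not the actual status-react code - the function name and the signal shape are assumptions): a cheap string test on the raw JSON can skip the full parse/convert step for signals that would be dropped anyway.

```javascript
// Hypothetical sketch: avoid the expensive JSON.parse (and any further
// conversion) for signal types we don't handle, by peeking at the raw string.
function handleRawSignal(jsonEvent, handledTypes) {
  // Cheap substring check before any parsing.
  const interesting = handledTypes.some((t) => jsonEvent.includes(`"type":"${t}"`));
  if (!interesting) return null;
  // Full parse only for signals we actually process.
  return JSON.parse(jsonEvent);
}

handleRawSignal('{"type":"messages.new","event":{"count":2}}', ["messages.new"]); // parsed object
handleRawSignal('{"type":"wallet.update","event":{}}', ["messages.new"]); // null
```

This only helps for signals that are dropped entirely; handled signals still pay the full conversion cost, which is what the investigation here is about.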

makes sense, good point

@Serhy would you mind trying the same with https://github.com/status-im/status-react/pull/9230 (still building for now, it should be build #3).
You will not see any message (same as above), we just need to make sure that we repopulate the chat.

Thanks!

@cammellos with the Android #4 build from #9230 (https://status-im-prs.ams3.digitaloceanspaces.com/StatusIm-191017-142532-fa00f5-pr-universal.apk)
I don't receive any messages at all - neither after going back online from an offline state, nor when both devices are online and exchanging messages in a public chat (however, a user with the nightly build receives messages from the user with the #9230 PR build).
The good news: there are no lags from the original issue :) - the behavior is the same as when all chats are empty

@Serhy that's expected, I disabled receiving messages. Did you join a chat with lots of messages posted today? (A chat that gives you the same problem )

Also https://github.com/status-im/status-react/pull/9220 should also not receive messages, is that correct?
I might dm you as I have a few questions.
Thanks!

Yes, I joined a chat with 100 messages posted today (though not exactly the same chat that was tested before).
And yes, no messages arrived in #9220, but there were app freezes.

Ok, so it looks like the bottleneck is actually the parsing of the JSON messages. I will have a look at that and will push another build in the meantime to pinpoint the issue better

@cammellos it might be worth trying the transit reader - I've experienced faster parsing than with js/JSON. However, that may require more work afterwards to make it work properly, because chances are it will parse all the way down, including the content itself.

@yenda thanks, we will have a proper discussion about changing the format (discuss post?) so we can consider all the options. I won't make any decision now, but I will first verify whether it's actually faster using a different format

@cammellos I'm not talking about changing the format but just using transit decoder on the json itself, because transit is json.

oh I see, you are basically saying to just use a different JSON decoder - sure, we can try that

PR is here:
https://github.com/status-im/status-react/pull/9263#pullrequestreview-305789984 - this one was measured to have a 40-45% improvement over current develop.

I copy the findings below:

Findings

Looking at the performance of the app when receiving messages through the profiler, we can see that the bottleneck is (as we already know) in the JS thread, which is hogged for the whole time preventing any input from the user to be handled, therefore "freezing" the app.

Looking a bit deeper into it, we can split the workload in two sections (events):

1. The actual receiving of the raw messages from status-go (the messages.new event): https://github.com/status-im/status-react/blob/792df3276d806eb92089d8c5e7504072c146f0a6/src/status_im/signals/core.cljs#L61
2. The processing of the actual text messages: https://github.com/status-im/status-react/blob/474ff00c7f0ebab86bda26be6b77faee26aa8f73/src/status_im/events.cljs#L638

Both events should be handled in under 16ms (one frame at 60fps) to keep the UI responsive.
Investigation showed that neither of them is close to that.

With regard to the first event, the bottlenecks identified were:

1) Conversion from js->clj: this was by far the most expensive operation, and it has been addressed by keeping the payload as a #js object.
2) Calling hexToUtf8: this was not necessary, as we can just get the string version from status-go at no cost.
3) Transit parsing, to a minor extent (not changed).
4) Spec validation is expensive (not changed).
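For reference, the hexToUtf8 step in point 2 amounts to decoding a hex-encoded payload into a UTF-8 string, conceptually along these lines (a minimal sketch of the concept, not the actual web3 implementation):

```javascript
// Minimal sketch of what hexToUtf8 does: "0x..." hex string -> UTF-8 text.
// Doing this per message on the JS thread adds up; getting the already
// decoded string from status-go avoids the work entirely.
function hexToUtf8(hex) {
  const clean = hex.startsWith("0x") ? hex.slice(2) : hex;
  const bytes = new Uint8Array(clean.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(clean.slice(2 * i, 2 * i + 2), 16);
  }
  return new TextDecoder().decode(bytes);
}

hexToUtf8("0x68656c6c6f"); // "hello"
```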

In the second part of the event (currently heavier than the first part), the most time is spent running the following code (roughly in order of cost):

https://github.com/status-im/status-react/blob/474ff00c7f0ebab86bda26be6b77faee26aa8f73/src/status_im/chat/models/message.cljs#L276

https://github.com/status-im/status-react/blob/474ff00c7f0ebab86bda26be6b77faee26aa8f73/src/status_im/chat/models/message.cljs#L277

https://github.com/status-im/status-react/blob/474ff00c7f0ebab86bda26be6b77faee26aa8f73/src/status_im/chat/models/message.cljs#L264

https://github.com/status-im/status-react/blob/474ff00c7f0ebab86bda26be6b77faee26aa8f73/src/status_im/chat/models/message.cljs#L282

How to address

First, as already discussed, we should move most of the code to status-go - that would be the natural first step. Before we do that, though, it would be useful to simplify the message-handling code and verify that, if we ditch some performance optimizations (say we no longer use message groups, but instead get an already-sorted list from status-go), we still have an overall positive effect.

Another issue is that we currently naively save messages to the db (including recalculating groups, etc.) even if the user is not in that chat (or the message is not in the current pagination window), which is wasteful.
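A hypothetical sketch of the gating this suggests (the `chatId`/`clockValue` fields and the `view` shape are illustrative, not the actual app-db schema):

```javascript
// Hypothetical sketch: only do the heavy per-message work (saving with
// group recalculation, etc.) when the message would actually be visible,
// i.e. the user is in that chat and the message is in the pagination window.
function shouldDoHeavyWork(message, view) {
  const inCurrentChat = message.chatId === view.currentChatId;
  const inWindow =
    message.clockValue >= view.windowStart &&
    message.clockValue <= view.windowEnd;
  return inCurrentChat && inWindow;
}

const view = { currentChatId: "status", windowStart: 100, windowEnd: 200 };
shouldDoHeavyWork({ chatId: "status", clockValue: 150 }, view); // true
shouldDoHeavyWork({ chatId: "other", clockValue: 150 }, view); // false
```

Messages failing the check could still be stored cheaply, with group recalculation deferred until the chat is actually opened.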

As discussed with @yenda, there's also a fair amount of regex parsing for each message received (this gets more problematic the longer the message is). My suggestion would be to parse messages on the sender side and send metadata with each message. This might incur a higher bandwidth cost, but has some benefits:

1) Better performance when receiving a message.
2) It separates the way a user inputs a message from how it is rendered, allowing us to add styling/features more easily.

For example:
A message that is composed as:

Hey @nobody **this** is a www.link.com or check this #channel

Would be sent across the wire as something like the following (copying the Twitter API, which does something similar; it's verbose here, but for some fields the bandwidth cost is the same - e.g. bold is two integers instead of the asterisks):


{:text "Hey @nobody this is a www.link.com or check this #channel"
 :entities {:mentions [[x,y,pk]], :links [[x,y]], :channels [[x,y,channel]], :bold [[x,y]]}}

Changing the input from markdown to html, for example, will not change the associated metadata, and there's a bit of a better upgrade path (as markings are stripped: if a client, for example, can't render bold, it won't see the **, the text will just be shown as plain).
But I guess that's something for later.
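To make the offset-based rendering concrete: a receiver could turn [start, end] ranges like the :bold entities above into styled segments with no regex work at all. A hypothetical sketch (the function and the segment shape are made up for illustration):

```javascript
// Hypothetical sketch: split message text into styled segments using
// precomputed [start, end) entity offsets - no parsing on the receiver.
function toSegments(text, boldRanges) {
  const segments = [];
  let pos = 0;
  for (const [start, end] of boldRanges) {
    if (start > pos) segments.push({ text: text.slice(pos, start), bold: false });
    segments.push({ text: text.slice(start, end), bold: true });
    pos = end;
  }
  if (pos < text.length) segments.push({ text: text.slice(pos), bold: false });
  return segments;
}

// "nobody" (offsets 4-10) is rendered bold; everything else stays plain.
toSegments("Hey nobody this is plain", [[4, 10]]);
```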

Another thing we should be doing is splitting the workload into manageable chunks.
Currently we process however many messages are received in one go (i.e. 200, 300, 500, etc.).
This gives us the shortest total time to process the messages, but it's detrimental to the user experience, as it hogs the JS thread and results in the app freezing.
We should probably split the messages into batches that can each be processed in less than 16ms, dispatching after each chunk, as described in https://github.com/day8/re-frame/blob/69a3fcd411369fff65404e7d6a924ce1968e2dba/docs/Solve-the-CPU-hog-problem.md
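The linked re-frame pattern boils down to handling one small batch per event-loop turn and re-dispatching the remainder, so user input can run between batches. A minimal JavaScript sketch of the same idea (names are illustrative):

```javascript
// Illustrative sketch of the "CPU hog" fix: process messages in small
// batches and yield to the event loop between them, instead of handling
// the whole offline inbox in one long-running event.
function processInBatches(messages, batchSize, processOne, onDone) {
  messages.slice(0, batchSize).forEach(processOne);
  const rest = messages.slice(batchSize);
  if (rest.length === 0) {
    onDone();
  } else {
    // Yield: UI events queued during this batch get handled before the next.
    setTimeout(() => processInBatches(rest, batchSize, processOne, onDone), 0);
  }
}

const processed = [];
processInBatches([1, 2, 3, 4, 5], 2, (m) => processed.push(m), () => {
  // processed now holds all five messages, in order.
});
```

The trade-off is the one described here: total processing time grows, but the JS thread is never held long enough to freeze input.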
I have done that, but the results were mixed. On a more powerful device I saw a slowdown (but no real freeze) after pulling 700 messages in one go, which was promising compared to 10 seconds of freezing. It took much longer to process the messages overall, though (roughly a minute or so), but there's room for improvement (see the points above). From inspection I noticed that the initial event was mostly under 16ms, while the second event would often go above that (~100ms). Different batch sizes were tried; a batch size of 1 is the one that gave the smoothest experience.

On a slower device though, the UI completely choked as events took much longer than 16ms, so this is something that we should revisit.

TLDR

We should move stuff to status-go, and at the same time simplify and remove computations that can be avoided.
On my side, I will address some of these issues as I work on related code for tasks in scope for v1.

The PR with the 40-45% improvement has now been merged. I will probably not actively work on this for now, as there are diminishing returns, but I will keep moving stuff to status-go in the scope of other tasks, which should help with this - unless there are objections.

Great stuff @cammellos!
