_Type_: Bug
_Summary_: When a user logs in to their account or opens the app from the background, they may receive messages from the offline inbox. The app UI freezes while offline inbox messages arrive in 1-1, group, or public chats. The impact depends on the device: receiving 100 messages has almost no impact for iPhoneX users, while a Galaxy Note 4 becomes unresponsive to navigation through app views for ~5 seconds for the same number of messages.
Receiving 500 offline inbox messages freezes the app on the Note 4 for ~20 seconds.
Expected result: No time lags while navigating through different app screens right after the user has logged in.
Actual result: Time lags while navigating through different app screens right after the user has logged in.
Precondition (Scenario 1):
Contact exists and has joined the public chat.
Contact is in a logged-out state.
100+ messages have been sent in the public chat.
Precondition (Scenario 2):
Contact exists and has joined the public chat.
Contact is in a logged-out state.
100+ messages have been sent in the public chat.
TF logs and video for Scenario 1 above (user on a Galaxy Note 4 receives 500 messages in a public chat after logging in) - the app was unresponsive for ~30 seconds: 00:20 - 01:00 -> https://app.testfairy.com/projects/4803622-status/builds/8262301/sessions/41/?accessToken=IrdTLg4OeDBoTd0J0SugtsBoBug
TF logs and video for Scenario 2 above (user on a Galaxy Note 4 opens the app from the background and receives 200 messages; the app freezes for ~10 seconds): 10:00 - end https://app.testfairy.com/projects/4803622-status/builds/8262301/sessions/44/?accessToken=Kv38otfwWzdvWlCAwHkkFlYdSWk
@Serhy can you please re-test this one?
@mandrigin, @janherich, @rasom
In the month since this issue was initially reported, the app has received some performance improvements.
However, the issue has not gone away - the time lag / app freezing durations have only shortened by ~30% compared to the stats from one month ago.
Latest nightly (434th Jenkins build)
When a user on Galaxy Note 4 receives:
100 messages from the offline inbox in a 1-1 chat (logging in to the account or opening the app from the background) - the app freezes for ~5 seconds (average of 5 runs)
500 messages from the offline inbox in a public channel - the app freezes for 17-20 seconds (average of 5 tests; not very precise numbers because it's not obvious when the app starts freezing, so ±10% margin)
@mandrigin @Serhy I discovered one very inefficient thing we were doing related to saving messages - I'm improving on that in the #4664 PR, so I expect that this issue could be re-evaluated in the scope of testing that PR.
In its current state, is this a strict beta blocker? Or is it more something we'd really want to (keep) improving?
If it's the latter, how about we remove it from beta milestone but keep working on it (barring any other beta blockers)?
I don't think it is a strict beta blocker, tbh.
@Serhy what do you think?
@Serhy as #4644 was recently merged with potential to improve the situation, it would be good to re-test and confirm improvements (if any)
@mandrigin , @janherich With the latest develop build including #4644 improvements:
When a contact who uses a Galaxy Note 4 joins the public chat and receives 500 messages from the offline inbox, the app freezes for up to ~15 seconds.
It indeed looks better than before (without #4644) for me.
I don't think we should consider it a Beta blocker. Yes, the app lags when receiving offline messages, but it's noticeable only on less powerful devices, and mostly when a contact joins a public chat like #status with a large history that is fetched for the last 7 days.
No, not a Beta blocker but something we still need to address and optimise.
13th of June nightly build quick stats:
Samsung Galaxy Note 4:
Samsung Galaxy J7:
iPhoneX:
@Serhy @janherich @mandrigin
we had a look at some logs and there are quite a few mixpanel events sent (2-3 minutes of freezing might be due to that, but that's a very rough guess and might not be the case). It's worth testing this with Send/Receive ratio disabled (enabled by default in nightlies, I believe) to understand whether it has a performance impact.
Worth considering batching every x seconds in general, as you would do for stats.
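For illustration, a minimal ClojureScript sketch of time-based batching (hypothetical names, not the app's actual mixpanel integration, and assuming some `send-batch!` function that actually posts the events): events are buffered in an atom and flushed as a single call every x seconds.

```clojure
(ns example.analytics-batch)

;; Hypothetical sketch: buffer analytics events and flush them in one
;; batch every interval-ms instead of sending each event immediately.
(defonce pending-events (atom []))

(defn track! [event]
  (swap! pending-events conj event))

(defn- flush! [send-batch!]
  ;; reset-vals! atomically returns [old-value new-value], so we take the
  ;; buffered events and clear the buffer in one step
  (let [[events _] (reset-vals! pending-events [])]
    (when (seq events)
      (send-batch! events))))

(defn start-flusher! [send-batch! interval-ms]
  (js/setInterval #(flush! send-batch!) interval-ms))
```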
yeah, let's build a release version of the app. We can probably use https://jenkins.status.im/job/status-react/job/parametrized_build/
Is this still a problem @lukaszfryc, or can we close this ?
@Serhy @lukaszfryc Still an issue or can we close?
@janherich @chadyj let me check it tomorrow
Re-testing it now in scope of PR #6372
@Serhy it would be nice to re-test it again :)
@mandrigin I've tested with a Galaxy Note 4 on the 0.9.33 Develop nightly (24th of January), receiving 500 messages and 100 messages in a public chat, and it looks better now than before.
Samsung Galaxy Note 4:
User with iPhoneX joins a public chat and receives 500 messages -> the app freezes for ~2-3 seconds.
The Galaxy Note 4 with Android 6.0.1 was the weakest device we agreed to support (agreed about 1 year ago).
So at this point it is still an issue, but performance with 0.9.33 Develop (loglevel setting = Disabled) is 25-30% better than with Release 0.9.19.
@Serhy thanks a lot! so, let's keep it open then
➤ Hester Bruikman commented:
No parsing of messages in the UI. Protocol move to Go. Request only what's seen on screen. Adam is working on this atm. RPC API.
No fetching messages through Web3 library.
Messages stored in Status Go.
No ETA -> Igor Mandrigin, can you give a worst-case indication?
No Blockers
Next steps
Everything from the above comment is done, but we still have the UI freeze; we need to investigate whether status-go might be the reason for it.
@flexsurfer just thinking aloud, but it seems like the problem is more likely in status-react or between status-react and status-go.
Because of how Whisper works, you will receive messages from any 1-to-1 chat sent by anyone on Status, so theoretically, if the problem were solely in status-go, we should be able to replicate it on account C by sending 200 messages between A & B (C will receive them and try to decrypt them, but will fail to do so, so it's a bit less work, but I would expect decrypting messages to be the heaviest workload).
Regardless, what we could do is just disable the signal handling here: https://github.com/status-im/status-react/blob/114b4da2d9437785f3a3dd4993633e19ef80c22b/src/status_im/signals/core.cljs#L58
and try again. If the problem persists, it is most likely in status-go; if performance is OK, then status-react is where the bottleneck is.
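For illustration, a rough self-contained sketch (hypothetical names, not the exact code at the linked line) of what disabling that handling could look like - the messages.new payload is simply dropped instead of processed:

```clojure
(ns example.disable-messages-signal)

;; Rough sketch with hypothetical names, not the actual status-react code:
;; parse the signal coming from status-go and drop the messages.new payload
;; instead of processing it, to check whether the freeze comes from
;; status-go itself or from the status-react handling.
(defn process-signal [handle-other-signal! event-str]
  (let [event (js->clj (js/JSON.parse event-str) :keywordize-keys true)]
    (case (:type event)
      "messages.new" nil                 ;; disabled: incoming messages are dropped
      (handle-other-signal! event))))    ;; every other signal type handled as before
```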
What do you think?
@flexsurfer @cammellos I've commented out this line but still see a delay. Tried on #hugechat111 - 200 messages.
@siphiuel you don't see any messages coming through once you comment out that line, correct? (i.e. the chat is empty)
@cammellos yes it is empty (screenshot: https://monosnap.com/file/jgWijRZ2PgeZLiyXuprXU4D73zYNsO). I'll also try on a device once #9220 build is ready
I've checked the freezing time of the latest nightly develop and of a build with the message.new signal commented out (build from #9220) on a Galaxy Note 4.
A new user with no chats joins a public chat and receives 100 messages.
| Build | App freeze while receiving 100 messages in a public chat (up to) |
| --- | --- |
| 0.9.33 Develop (24th of January) (from above comments) | ~3 sec |
| 0.14.0 Develop (16th of October) | ~3 sec (+/- 0.2 sec) |
| no message.new signal (#9220 PR build) | ~3 sec (+/- 0.2 sec) |
Ok, so looks like the issue is most likely in status-go, I will have a look at it, thanks!
@cammellos before looking into status-go, which does not make sense to me because the UI is freezing, I think you should check this:
```java
// Native Android module method: forwards every status-go signal to the
// JS side as a "gethEvent" device event carrying the raw JSON payload.
public void handleSignal(String jsonEvent) {
    Log.d(TAG, "Signal event: " + jsonEvent);
    WritableMap params = Arguments.createMap();
    params.putString("jsonEvent", jsonEvent);
    this.getReactApplicationContext().getJSModule(DeviceEventManagerModule.RCTDeviceEventEmitter.class).emit("gethEvent", params);
}
```
this involves:
dispatching `[:signals/signal-received (.-jsonEvent %)]` on every signal and calling `(types/json->clj event-str)` even when you comment out processing of the messages, so I wouldn't be surprised if the UI is still choking on a big amount of received messages because of that conversion. I might be wrong here, but I think it is worth checking.
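To illustrate the concern, a minimal sketch (hypothetical names, not the app's actual signal code): converting the whole parsed payload into ClojureScript data with js->clj is far more expensive than keeping it as a raw #js object and reading only the fields that are needed.

```clojure
(ns example.signal-parsing
  (:require [goog.object :as gobj]))

;; Expensive path: the entire payload is converted into persistent
;; ClojureScript data structures, whether or not it is all used later.
(defn parse-signal-clj [event-str]
  (js->clj (js/JSON.parse event-str) :keywordize-keys true))

;; Cheaper path: keep the payload as #js and read only the fields we
;; actually need up front (here just the signal type), deferring the rest.
(defn parse-signal-js [event-str]
  (let [event (js/JSON.parse event-str)]
    {:type  (gobj/get event "type")
     :event event}))
```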
makes sense, good point
@Serhy would you mind trying the same with https://github.com/status-im/status-react/pull/9230 (still building for now, it should be build #3).
You will not see any message (same as above), we just need to make sure that we repopulate the chat.
Thanks!
@cammellos with Android build #4 from #9230 (https://status-im-prs.ams3.digitaloceanspaces.com/StatusIm-191017-142532-fa00f5-pr-universal.apk)
I don't receive any messages at all, neither after going back online from the offline state nor when both devices are online and exchanging messages in a public chat (however, the user with the nightly build receives messages from the user with the #9230 PR build).
On the good-news side - there are no lags for the original issue :) - the behavior is the same as when all chats are empty.
@Serhy that's expected, I disabled receiving messages. Did you join a chat with lots of messages posted today? (a chat that gives you the same problem)
Also, https://github.com/status-im/status-react/pull/9220 should not receive messages either, is that correct?
I might dm you as I have a few questions.
Thanks!
Yes, I joined a chat with 100 messages posted today (not exactly the same chat that was tested before, though).
And yes, no messages arrived in #9220, but there were still app freezes.
Ok, so it looks like the bottleneck is actually the parsing of the JSON messages. I will have a look at that; I will push another build in the meantime to better pinpoint the issue.
@cammellos it might be worth trying the transit reader, I've experienced faster parsing than with js/JSON. However, that may require more work afterwards to make it work properly, because chances are it will parse all the way down, including the content itself.
@yenda thanks, we will have a proper discussion about changing the format (discuss post?) so that we consider all the options. I won't make any decision now, but I will first verify whether it's actually faster using a different format.
@cammellos I'm not talking about changing the format, just about using the transit decoder on the JSON itself, because transit is JSON.
oh I see, you are basically saying to use a different JSON decoder, sure we can try that
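For reference, a minimal sketch of what reading the same JSON through the transit reader could look like (using cognitect.transit; the payload below is just an example string):

```clojure
(ns example.transit-parse
  (:require [cognitect.transit :as transit]))

;; A single reader instance can be reused across messages.
(def json-reader (transit/reader :json))

;; Parses a plain JSON string into ClojureScript data via transit, as an
;; alternative to (js->clj (js/JSON.parse s) :keywordize-keys true).
(defn parse-json [s]
  (transit/read json-reader s))

(comment
  ;; => {"type" "messages.new", "event" {}}
  (parse-json "{\"type\":\"messages.new\",\"event\":{}}"))
```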
PR is here:
https://github.com/status-im/status-react/pull/9263#pullrequestreview-305789984 - this one was measured to have a 40-45% improvement over the current develop.
I copy the findings below:
Looking at the performance of the app when receiving messages through the profiler, we can see that the bottleneck is (as we already know) in the JS thread, which is hogged the whole time, preventing any input from the user from being handled and therefore "freezing" the app.
Looking a bit deeper into it, we can split the workload into two sections (events):
1) The actual receiving of the raw messages from status-go (the messages.new event): https://github.com/status-im/status-react/blob/792df3276d806eb92089d8c5e7504072c146f0a6/src/status_im/signals/core.cljs#L61
2) The processing of the actual text messages: https://github.com/status-im/status-react/blob/474ff00c7f0ebab86bda26be6b77faee26aa8f73/src/status_im/events.cljs#L638
Both events should be handled in 16ms to keep the UI responsive.
Investigation showed that neither of them is close to that.
With regard to the first event, the bottlenecks identified were:
1) Conversion from js->clj - this was by far the most expensive operation, and it has been addressed by keeping the payload in #js.
2) Calling hexToUtf8 - this was not necessary, as we can just provide the string version from status-go at no cost.
3) Transit parsing, to a minor extent (not changed)
4) Spec validation is expensive (not changed)
In the second part of the event (currently heavier than the first part), most of the time is spent running the following code (roughly in order of cost):
https://github.com/status-im/status-react/blob/474ff00c7f0ebab86bda26be6b77faee26aa8f73/src/status_im/chat/models/message.cljs#L276
https://github.com/status-im/status-react/blob/474ff00c7f0ebab86bda26be6b77faee26aa8f73/src/status_im/chat/models/message.cljs#L277
https://github.com/status-im/status-react/blob/474ff00c7f0ebab86bda26be6b77faee26aa8f73/src/status_im/chat/models/message.cljs#L264
https://github.com/status-im/status-react/blob/474ff00c7f0ebab86bda26be6b77faee26aa8f73/src/status_im/chat/models/message.cljs#L282
How to address
First, as already discussed, we should move most of the code to status-go - that would be the natural first step. Before we do that, though, it would be useful to simplify the message-handling code and verify that, if we ditch some performance improvements (say we don't use message groups anymore, but get an already-sorted list from status-go), we still have an overall positive effect.
Another thing is that we currently naively save messages in the db (including recalculating groups, etc.), even if the user is not in that chat (or the message is not in the current pagination window), which is wasteful.
As discussed with @yenda, there's also a fair amount of regex parsing for each message received (this is more of a problem the longer the message is). My suggestion would be to parse messages on the sender side and send metadata along with each message. This might incur a higher bandwidth cost, but it has some benefits:
1) Better perf. on receiving a message
2) Separate the way a user inputs a message from how it is rendered, allowing us to add styling/features more easily
For example:
A message that is composed as:
Hey @nobody **this** is a www.link.com or check this #channel
Would be sent across the wire as something like (copying the Twitter API, which does something similar; also making it verbose here, but for some fields the bandwidth cost can be the same, e.g. bold is just two integers for the asterisks):

```clojure
{:text "Hey @nobody this is a www.link.com or check this #channel"
 :entities {:mentions [[x,y,pk]], :links [[x,y]], :channels [[x,y,channel]], :bold [[x,y]]}}
```
Changing the input from markdown to HTML, for example, will not change the associated metadata, and there's a bit of a better upgrade path (as markings are stripped: if a client, for example, can't render bold, it won't see the **, the text will just be shown as plain text).
But I guess that's something for later.
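As a rough illustration of such sender-side parsing (a sketch with hypothetical names and simplistic regexes, not a format proposal), each entity could be recorded as a [start end] span over the plain text:

```clojure
(ns example.message-entities)

;; Hypothetical sketch: naive regexes, no stripping of ** markers and no
;; handling of overlapping matches - just enough to show the span idea.
(defn find-spans
  "Returns [[start end] ...] for every match of pattern (a regex string) in text."
  [pattern text]
  (let [re (js/RegExp. pattern "g")]
    (loop [acc []]
      (if-let [m (.exec re text)]
        (recur (conj acc [(.-index m) (+ (.-index m) (.-length (aget m 0)))]))
        acc))))

(defn message->entities [text]
  {:text     text
   :entities {:mentions (find-spans "@[A-Za-z0-9_-]+" text)
              :links    (find-spans "(https?://|www\\.)\\S+" text)
              :channels (find-spans "#[A-Za-z0-9_-]+" text)}})

(comment
  (message->entities "Hey @nobody this is a www.link.com or check this #channel"))
```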
Another thing that we should be doing is to split the workload into manageable chunks.
Currently we just process whatever amount of messages is received in one go, i.e. 200, 300, 500, etc.
This gives us the shortest time to process messages, but it's detrimental to the user experience as it's definitely hogging the JS thread and results in the app freezing.
We should probably split the messages into batches that can be processed in less than 16ms and dispatch after each chunk, as described in https://github.com/day8/re-frame/blob/69a3fcd411369fff65404e7d6a924ce1968e2dba/docs/Solve-the-CPU-hog-problem.md
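A minimal sketch of that pattern, assuming hypothetical event names and db keys (illustrative only, not the app's actual events): each batch is handled in its own event, and the next batch is scheduled with :dispatch so other queued events (e.g. touch handling) can run in between.

```clojure
(ns example.chunked-processing
  (:require [re-frame.core :as re-frame]))

(def batch-size 10)

;; Stash the full list of received messages and kick off batch processing.
(re-frame/reg-event-fx
 ::messages-received
 (fn [{:keys [db]} [_ messages]]
   {:db       (assoc db ::pending messages)
    :dispatch [::process-next-batch]}))

;; Process one small batch, then re-dispatch if anything is left, so the
;; JS thread is yielded between batches instead of being hogged.
(re-frame/reg-event-fx
 ::process-next-batch
 (fn [{:keys [db]} _]
   (let [[batch remaining] (split-at batch-size (::pending db))]
     (cond-> {:db (-> db
                      (update ::processed (fnil into []) batch)
                      (assoc ::pending remaining))}
       (seq remaining) (assoc :dispatch [::process-next-batch])))))
```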
I have done that but the results were mixed. On a more powerful device, I saw a slow down (but no real freeze) after pulling 700 messages in one go, which was promising, compared to 10 seconds of freezing. It took much longer though to process the messages overall (roughly a minute or so), but there's room for improvement (see points above). From inspection I noticed that the initial event was mostly under 16ms, while the second event would often go above (~100ms). Different batch sizes were tried, 1 is the one that gave the smoothest experience.
On a slower device though, the UI completely choked as events took much longer than 16ms, so this is something that we should revisit.
TLDR
We should move stuff to status-go and, at the same time, simplify and remove computations that can be avoided.
On my side, I will address some of these issues as I work on related code for tasks in scope for v1.
The PR with the 40-45% improvement has now been merged. I will probably not actively work on this for now, as there are diminishing returns, but I will keep moving stuff to status-go in the scope of other tasks, which should help with this, unless there are objections.
Great stuff @cammellos!