Cht-core: Loading the Contact list is slower in 3.5.0

Created on 26 Jun 2019 · 18Comments · Source: medic/cht-core

Describe the bug
Performance timings (stopwatch) are slower now when loading the contacts tab on all devices (chrome desktop, tecno wx3, and J5 in the mobile app)

To Reproduce
Steps to reproduce the behavior:

Install 3.5.0.beta.2
Seed with data
Log in as restricted user from seed data
Click on contacts tab. Start Timer here.

Expected behavior
When the loading finishes on RHS of desktop this is the end. While on mobile app the contact list displays.
Time has increased ~5-6 seconds up from ~1-2 seconds

Environment

Instance: gamma.dev.medicmobile.org
Browser: Chrome, Mobile App
Client platform: Linux, Android
App: webapp
Version: 3.5.0.beta.2

Details on times here.
https://docs.google.com/spreadsheets/d/13GW_gpZElcmW9AOs5r1EXwKligCDECdHVP_3aNcwqhw/edit#gid=0

2 - Medium Performance

Source

newtewt

All 18 comments

The tests are being rerun so I've blocked this awaiting confirmation of the issue.

garethbowen on 27 Jun 2019

Looking forward to on-device numbers from @newtewt. But here are some numbers from my desktop (all measures in milliseconds).

Loading the List of Contacts (LHS)

Version | 1 | 2 | 3 | 4 | 5 | Mean
-- | -- | -- | -- | -- | -- | --
3.4.x | 21.2 | 21.9 | 21.8 | 24.9 | 28.3 | 23.62
3.5.x | 26.1 | 23 | 24.7 | 26.5 | 25.6 | 25.18

Does appear slightly slower, but not a statistically significant performance difference.

Loading a Single Contact (RHS)

This is the time from contact selection, to loading and rendering the last set of data for the selected contact (the contact summary). I do see a statistically significant slow-down here, pretty consistent across all sampled clinics (big and small).

On 3.4.x:

Sample | Clinic_1 | Clinic_2 | Clinic_3 | Clinic_4 | Clinic_5 | Clinic_6
-- | -- | -- | -- | -- | -- | --
1 | 110 | 206 | 336 | 941 | 1854 | 2240
2 | 123 | 219 | 369 | 844 | 1767 | 2634
3 | 102 | 110 | 408 | 847 | 1852 | 2301
Mean | 111.7 | 178.3 | 371.0 | 877.3 | 1824.3 | 2391.7

On 3.5.x:

Sample | Clinic_1 | Clinic_2 | Clinic_3 | Clinic_4 | Clinic_5 | Clinic_6
-- | -- | -- | -- | -- | -- | --
1 | 189 | 227 | 512 | 1102 | 2297 | 2507
2 | 208 | 259 | 461 | 996 | 2113 | 2637
3 | 205 | 251 | 515 | 989 | 2049 | 2462
Mean | 200.7 | 245.7 | 496.0 | 1029.0 | 2153.0 | 2535.3

What is being measured

I'm doing a clean dev build of the branches 3.4.x and 3.5.x. I'm using the seed data used by the performance test suite logged in as ac1.

LHS is the time to resolve the promise in the _query function of the contacts controller.
RHS is the time to resolve the promise in the setSelected function of the contacts controller.

kennsippell on 27 Jun 2019

@kenn Do you feel this is worth blocking 3.5.0? It seems kinda rough to me.

newtewt on 27 Jun 2019

Please mention the name correctly ;)

kenn on 27 Jun 2019

@kenn sweet name

kennsippell on 27 Jun 2019

Ha! Whoops sorry!

newtewt on 27 Jun 2019

😄1

I believe this issue was caused here where we introduce a delayed call to ContactViewModelGenerator.loadChildren.

Based on offline discussions, this will not block 3.5. But it is a fairly serious regression on lower-end devices.

kennsippell on 27 Jun 2019

Here are some updated measures for the left-side, which are the numbers reported from release testing. I see no statistically significant regression.

Desktop | 3.4 | 3.5
-- | -- | --
1 | 18 | 19
2 | 24 | 20
3 | 29 | 17
4 | 18 | 19
5 | 29 | 20
MEAN | 23.6 | 19

Techno F1 | 3.4 | 3.5
-- | -- | --
1 | 174 | 169
2 | 195 | 172
3 | 174 | 172
4 | 148 | 216
5 | 152 | 169
MEAN | 168.6 | 179.6

@newtewt I tested this on a dev build, but maybe this only occurs within the container or on production builds? Would you confirm the timings you're seeing one more time?

Right Side

Most of my investigation is focused on selecting a single contact (right-side), as I believe a regression exists there. I've been struggling a bit with establishing data-parity between the two builds I'm measuring.

In my first round of measurement (above), it turns out my 3.5 build had the standard config deployed and my 3.4 build did not (user error during deployment).
In my second round of measurement, it turns out the ac1 user created by the seed-data had a properly set contact_id attribute in the 3.5 dataset, but did not have it set properly on the 3.4 dataset. I'm unsure if this is a user-error or tooling issue. Investigating ...

These parity differences in the data resulted in some of the performance differences reported above. Here are my latest numbers for Tecno J1. Data is in milliseconds, a mean of three measures.

| 3.4 | 3.5 | Delta | Delta %
-- | -- | -- | -- | --
Clinic_1 | 1,122 | 1,168 | 46 | 4%
Clinic_2 | 1,235 | 1,365 | 129 | 10%
Clinic_3 | 4,115 | 4,465 | 350 | 8%
Clinic_4 | 9,988 | 10,443 | 455 | 5%
Clinic_5 | 22,108 | 24,427 | 2,319 | 10%
Clinic_6 | 28,265 | 29,417 | 1,152 | 4%

kennsippell on 28 Jun 2019

The PR above is ready for AT: 5755-contacts-perf

@newtewt I'll need your help setting up a repro of the magnitude of load times reported above. I've failed to reproduce times >10% on gamma.dev and gamma-b.dev with fresh seed data.

kennsippell on 3 Jul 2019

Went through this with @newtewt today and he helped clarify his measurement process. For each measurement, he clears the cache, resyncs data, and then measures the time-to-first-load for the tab. In contrast, I have been navigating to the tab to ensure everything is warmed and only after warming do I compare the tab's load times.

kennsippell on 3 Jul 2019

So this regression is only in the "first-load time" for the tab, and does not impact subsequent loads. It is caused by a need to warm the contacts_by_type view in 3.5, but in 3.4 this view is already warmed during bootstrap. This change is due to this #4423 as the PlaceHierarchy service warmed these views indirectly via the Contacts service.

Fixing This - I'm unsure of what to actually do here, as it is unclear what view warming should happen at startup vs what view warming should happen during the user's first experience. Should we explicitly warm all views on bootstrap - we could allow the user to click a button to "use the app before it is fully optimised"?

Alternatively, there seems to be an assumption that the RulesEngine service is responsible for view warming here. Though that isn't at all clear from reading its code that this is part of its duties. We could continue this sort of "balancing act" by updating it to warm the contacts_by_type view instead of the contacts_by_type_index_name view.

Quality Assurance -- While it is not without value to measure the first-load time, I think it is more important to monitor for regressions in not-first-load times. So I recommend that we update or replace that measurement in our release testing process. @newtewt @ngaruko

kennsippell on 4 Jul 2019

Ready for AT in 5755-dbwarm and 5755-contacts-perf.

On the 5755-dbwarm branch, would you please test for regressions in export?

kennsippell on 8 Jul 2019

@kennsippell These two branch seem slower than master on contact page, at least on first-load. The not-first-load seems a lot better but still, for the same user, it looks smoother than first-load, but not as smooth as on master. Tested on local. see screen recordings .
master.mov.zip
5755-dbwarm2.mov.zip

ngaruko on 9 Jul 2019

Thanks for testing this @ngaruko! I think the measures are a little unclear because you loaded the reports tab before loading the contacts tab when testing master. This actually ends up warming up parts of the contacts tab and speeds up the tab's first-load time. Pretty non-obvious, but that makes the comparison a bit unfair :)

If you try this on a mobile device, and navigate to the contacts tab as your first action for both control and experiment - I think you'll see the difference clearly. These were the times I measured (time of first-load for left-side panel).

Desktop | master | proposed
-- | -- | --
Sample 1 | 2379 | 44.3
Sample 2 | 2364 | 10.2
Sample 3 | 2278 | 37.2
MEAN | 2340 | 31

Tecno F1 | master | proposed
-- | -- | --
Sample 1 | 10576 | 350
Sample 2 | 10377 | 482
Sample 3 | 13062 | 339
MEAN | 11338 | 390

The branch 5755-contacts-perf should only improve second-load time and should have no impact on the first-load time.

kennsippell on 9 Jul 2019

Thanks @kenn for the clarification - I will close this and continue to monitor performance on large instances with feature phones! Will also pay some more attention during our release testing on a gamma.dev.

ngaruko on 9 Jul 2019

@kenn Can you please have a look at the conflicts on the branch before we close this ?

ngaruko on 9 Jul 2019

Please, don't mention my name. :)

kenn on 9 Jul 2019

👀1 ❤1

@ngaruko Merged conflicts on the 5755-contacts-perf branch.

kennsippell on 9 Jul 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Page Title vs. Back Title

diannakane · 6Comments

Improve application performance for high-utilization CHPs

ghost · 4Comments

Optimize the `fix-user-db-security` migration for databases with many users

SCdF · 3Comments

Telemetry could be aggregated at a configurable interval, or at configurable times

kennsippell · 3Comments

Checkboxes don't render correctly in the admin app

SCdF · 5Comments