Describe the bug
Performance timings (stopwatch) are slower now when loading the contacts tab on all devices (chrome desktop, tecno wx3, and J5 in the mobile app)
To Reproduce
Steps to reproduce the behavior:
Expected behavior
When the loading finishes on RHS of desktop this is the end. While on mobile app the contact list displays.
Time has increased ~5-6 seconds up from ~1-2 seconds
Environment
Details on times here.
https://docs.google.com/spreadsheets/d/13GW_gpZElcmW9AOs5r1EXwKligCDECdHVP_3aNcwqhw/edit#gid=0
The tests are being rerun so I've blocked this awaiting confirmation of the issue.
Looking forward to on-device numbers from @newtewt. But here are some numbers from my desktop (all measures in milliseconds).
Version | 1 | 2 | 3 | 4 | 5 | Mean
-- | -- | -- | -- | -- | -- | --
3.4.x | 21.2 | 21.9 | 21.8 | 24.9 | 28.3 | 23.62
3.5.x | 26.1 | 23 | 24.7 | 26.5 | 25.6 | 25.18
Does appear slightly slower, but not a statistically significant performance difference.
This is the time from contact selection, to loading and rendering the last set of data for the selected contact (the contact summary). I do see a statistically significant slow-down here, pretty consistent across all sampled clinics (big and small).
On 3.4.x:
Sample | Clinic_1 | Clinic_2 | Clinic_3 | Clinic_4 | Clinic_5 | Clinic_6
-- | -- | -- | -- | -- | -- | --
1 | 110 | 206 | 336 | 941 | 1854 | 2240
2 | 123 | 219 | 369 | 844 | 1767 | 2634
3 | 102 | 110 | 408 | 847 | 1852 | 2301
Mean | 111.7 | 178.3 | 371.0 | 877.3 | 1824.3 | 2391.7
On 3.5.x:
Sample | Clinic_1 | Clinic_2 | Clinic_3 | Clinic_4 | Clinic_5 | Clinic_6
-- | -- | -- | -- | -- | -- | --
1 | 189 | 227 | 512 | 1102 | 2297 | 2507
2 | 208 | 259 | 461 | 996 | 2113 | 2637
3 | 205 | 251 | 515 | 989 | 2049 | 2462
Mean | 200.7 | 245.7 | 496.0 | 1029.0 | 2153.0 | 2535.3
I'm doing a clean dev build of the branches 3.4.x and 3.5.x. I'm using the seed data used by the performance test suite logged in as ac1.
LHS is the time to resolve the promise in the _query function of the contacts controller.
RHS is the time to resolve the promise in the setSelected function of the contacts controller.
@kenn Do you feel this is worth blocking 3.5.0? It seems kinda rough to me.
Please mention the name correctly ;)
@kenn sweet name
Ha! Whoops sorry!
I believe this issue was caused here where we introduce a delayed call to ContactViewModelGenerator.loadChildren.
Based on offline discussions, this will not block 3.5. But it is a fairly serious regression on lower-end devices.
Here are some updated measures for the left-side, which are the numbers reported from release testing. I see no statistically significant regression.
Desktop | 3.4 | 3.5
-- | -- | --
1 | 18 | 19
2 | 24 | 20
3 | 29 | 17
4 | 18 | 19
5 | 29 | 20
MEAN | 23.6 | 19
Techno F1 | 3.4 | 3.5
-- | -- | --
1 | 174 | 169
2 | 195 | 172
3 | 174 | 172
4 | 148 | 216
5 | 152 | 169
MEAN | 168.6 | 179.6
@newtewt I tested this on a dev build, but maybe this only occurs within the container or on production builds? Would you confirm the timings you're seeing one more time?
Most of my investigation is focused on selecting a single contact (right-side), as I believe a regression exists there. I've been struggling a bit with establishing data-parity between the two builds I'm measuring.
ac1 user created by the seed-data had a properly set contact_id attribute in the 3.5 dataset, but did not have it set properly on the 3.4 dataset. I'm unsure if this is a user-error or tooling issue. Investigating ... These parity differences in the data resulted in some of the performance differences reported above. Here are my latest numbers for Tecno J1. Data is in milliseconds, a mean of three measures.
聽 | 3.4 | 3.5 | Delta | Delta %
-- | -- | -- | -- | --
Clinic_1 | 1,122 | 1,168 | 46 | 4%
Clinic_2 | 1,235 | 1,365 | 129 | 10%
Clinic_3 | 4,115 | 4,465 | 350 | 8%
Clinic_4 | 9,988 | 10,443 | 455 | 5%
Clinic_5 | 22,108 | 24,427 | 2,319 | 10%
Clinic_6 | 28,265 | 29,417 | 1,152 | 4%
The PR above is ready for AT: 5755-contacts-perf
@newtewt I'll need your help setting up a repro of the magnitude of load times reported above. I've failed to reproduce times >10% on gamma.dev and gamma-b.dev with fresh seed data.
Went through this with @newtewt today and he helped clarify his measurement process. For each measurement, he clears the cache, resyncs data, and then measures the time-to-first-load for the tab. In contrast, I have been navigating to the tab to ensure everything is warmed and only after warming do I compare the tab's load times.
So this regression is only in the "first-load time" for the tab, and does not impact subsequent loads. It is caused by a need to warm the contacts_by_type view in 3.5, but in 3.4 this view is already warmed during bootstrap. This change is due to this #4423 as the PlaceHierarchy service warmed these views indirectly via the Contacts service.
Fixing This - I'm unsure of what to actually do here, as it is unclear what view warming should happen at startup vs what view warming should happen during the user's first experience. Should we explicitly warm all views on bootstrap - we could allow the user to click a button to "use the app before it is fully optimised"?
Alternatively, there seems to be an assumption that the RulesEngine service is responsible for view warming here. Though that isn't at all clear from reading its code that this is part of its duties. We could continue this sort of "balancing act" by updating it to warm the contacts_by_type view instead of the contacts_by_type_index_name view.
Quality Assurance -- While it is not without value to measure the first-load time, I think it is more important to monitor for regressions in not-first-load times. So I recommend that we update or replace that measurement in our release testing process. @newtewt @ngaruko
Ready for AT in 5755-dbwarm and 5755-contacts-perf.
On the 5755-dbwarm branch, would you please test for regressions in export?
@kennsippell These two branch seem slower than master on contact page, at least on first-load. The not-first-load seems a lot better but still, for the same user, it looks smoother than first-load, but not as smooth as on master. Tested on local. see screen recordings .
master.mov.zip
5755-dbwarm2.mov.zip
Thanks for testing this @ngaruko! I think the measures are a little unclear because you loaded the reports tab before loading the contacts tab when testing master. This actually ends up warming up parts of the contacts tab and speeds up the tab's first-load time. Pretty non-obvious, but that makes the comparison a bit unfair :)
If you try this on a mobile device, and navigate to the contacts tab as your first action for both control and experiment - I think you'll see the difference clearly. These were the times I measured (time of first-load for left-side panel).
Desktop | master | proposed
-- | -- | --
Sample 1 | 2379 | 44.3
Sample 2 | 2364 | 10.2
Sample 3 | 2278 | 37.2
MEAN | 2340 | 31
Tecno F1 | master | proposed
-- | -- | --
Sample 1 | 10576 | 350
Sample 2 | 10377 | 482
Sample 3 | 13062 | 339
MEAN | 11338 | 390
The branch 5755-contacts-perf should only improve second-load time and should have no impact on the first-load time.
Thanks @kenn for the clarification - I will close this and continue to monitor performance on large instances with feature phones! Will also pay some more attention during our release testing on a gamma.dev.
@kenn Can you please have a look at the conflicts on the branch before we close this ?
Please, don't mention my name. :)
@ngaruko Merged conflicts on the 5755-contacts-perf branch.