Let's investigate in what situations a task might not get marked as done even if the task was completed (i.e. a feature/toggle was turned on).
Could there be a race condition somewhere that manifests only occasionally, on specific browsers or network conditions? Maybe only when the app state is fresh?
https://github.com/Automattic/wp-calypso/pull/33193 was an attempted fix and it might've reduced these problems but we've since observed this still happening.
This can be little tricky to replicate but here goes:
Could be happening with other tasks too, not just with downtime monitor.
State get updated right away.
The task was unfinished.
Context paObgF-eU-p2#comment-546
Team @Automattic/zelda have you observed this bug with .com onboarding checklist?
I think there are a few things. Problematic but may not be the cause is that the shape of the data is inconsistent. In the lifetime of the app I observe all of the following and I recall seeing null as well:
{ completed: false }
{ completed: true }
{ completed: "" }
{ completed: "1" }
I'm not able to reproduce this consistently, but I believe the root of the problem is a data race. There is Calypso state, synced data, and remote data.
The settings module uses a rest-api endpoint which queries the remote site directly. Module settings are very consistent. When we actually do a task like enabling monitor, we're changing options on a remote site. The settings page stays up to date because it queries the remote site.
I believe the checklist data uses synced data, which may not be up to date. This leads to stale data being presented in the Checklist when we return from the settings page until the changes are synced.
State exists locally in Calypso, which we update properly but may be overwritten with stale data by querying the checklist endpoint.
I believe the race looks something like this:
Module updates
Trigger correct state update
|
|
โฝ
Calypso checklist state
โโโโ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
๐โ
โ
โ
๐ต๐ต๐ต๐ต๐ต๐ต๐ต๐ต
โณ โณ
| |
| |
Checklist loads Return to checklist
queries synced data Query stale synced data
โณ โณ
| |
| |
Synced data
โโ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
๐ต๐ต๐ต๐ต๐ต๐ตโ
โ
โ
โ
โ
โณ โณ
| |
| |
Arbitrary sync Arbitrary sync
| |
| |
โณ โณ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
๐โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
โ
Remote site data
| โณ
| |
โฝ |
Settings loads Module updates
queries synced data Pushes remote data
I've created a PR that replaces the Jetpack Monitor task's completion state with the remote site module activation data: https://github.com/Automattic/wp-calypso/pull/34049
I believe that will make it impossible to hit the data race described above for Jetpack monitor. It's meant to validate this theory and not as a proposed fix.
Just observed it with the video hosting task:
JETPACK_MODULE_ACTIVATE_SUCCESS, which set completed to true for jetpack_video_hosting :+1: SITE_CHECKLIST_REQUEST. A little later, SITE_CHECKLIST_RECEIVE reset completion status for jetpack_video_hosting to false :-1: One note, SITE_CHECKLIST_RECEIVE has some dataRequests meta info, i.e. SITE_CHECKLIST_REQUEST is going through the data-layer:

So _maybe_ it's the data layer's cache that's getting in our way? :thinking:
So _maybe_ it's the data layer's cache that's getting in our way? ๐ค
I don't think there's any data layer caching involved here. If you navigate between /plans/my-plan/SITE_SLUG you'll see the network traffic for https://public-api.wordpress.com/rest/v1/sites/{ SITE_ID }/checklist
If my data race theory is correct, I can think of a few ways to address it:
So _maybe_ it's the data layer's cache that's getting in our way? thinking
I don't think there's any data layer caching involved here. If you navigate between
/plans/my-plan/SITE_SLUGyou'll see the network traffic forhttps://public-api.wordpress.com/rest/v1/sites/{ SITE_ID }/checklist
Point taken :+1: (Furthermore, plain data-layer doesn't seem to support caching based on freshness (http-data possibly does))
If my data race theory is correct, I can think of a few ways to address it:
- Stop using the checklist endpoint entirely, replace checklist state with module activation state in the checklist (the same data the settings use). This would essentially extend #34049 to all of the checklist tasks. This is probably viable for many of the tasks which are tied to module activation, but not all of them.
Yeah, plus:
- Replace logic in the checklist endpoint so that current remote data is fetched instead of using synced data.
Seems like the better choice then.
(Architecturally: Too bad this means working around yet another concept/tool that we have -- it means that that tool has deficiencies. Does it specifically mean that Sync is broken? Is it not syncing often enough for our purposes? Can we force-sync after module activation?)
I agree with the assessment @ockham.
This seems like another motivation for moving to a Jetpack compatible endpoint that handles task dismissal data on the site.
Moved to Jetpack Improved Onboarding for triaging.