Wp-calypso: Jetpack checklist: ensure state is updated after enabling a feature

Created on 17 Jun 2019  ยท  6Comments  ยท  Source: Automattic/wp-calypso

Let's investigate in what situations a task might not get marked as done even if the task was completed (i.e. a feature/toggle was turned on).

Could there be a race condition somewhere that manifests only occasionally, on specific browsers or network conditions? Maybe only when the app state is fresh?

https://github.com/Automattic/wp-calypso/pull/33193 was an attempted fix and it might've reduced these problems but we've since observed this still happening.

This can be little tricky to replicate but here goes:

Steps to reproduce

  1. Have a Jetpack site โ€” connect it, you can use free plan
  2. You'll end up in "Jetpack checklist"
  3. Choose "downtime monitoring" task
  4. Toggle monitoring on, and return to the checklist
  5. Observe how the task is still unfinished
  6. Refresh and observe how the task is now marked as done

Could be happening with other tasks too, not just with downtime monitor.

What I expected

State get updated right away.

What happened instead

The task was unfinished.

Context paObgF-eU-p2#comment-546

Team @Automattic/zelda have you observed this bug with .com onboarding checklist?

Checklist Jetpack [Pri] Normal [Type] Bug

All 6 comments

I think there are a few things. Problematic but may not be the cause is that the shape of the data is inconsistent. In the lifetime of the app I observe all of the following and I recall seeing null as well:

{ completed: false }
{ completed: true }
{ completed: "" }
{ completed: "1" }

I'm not able to reproduce this consistently, but I believe the root of the problem is a data race. There is Calypso state, synced data, and remote data.

The settings module uses a rest-api endpoint which queries the remote site directly. Module settings are very consistent. When we actually do a task like enabling monitor, we're changing options on a remote site. The settings page stays up to date because it queries the remote site.

I believe the checklist data uses synced data, which may not be up to date. This leads to stale data being presented in the Checklist when we return from the settings page until the changes are synced.

State exists locally in Calypso, which we update properly but may be overwritten with stale data by querying the checklist endpoint.

I believe the race looks something like this:

                              Module updates
                              Trigger correct state update
                                 |
                                 |
                                 โ–ฝ

Calypso checklist state
โ”โ”โ”โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…๐Ÿ“Œโœ…โœ…โœ…๐Ÿ˜ต๐Ÿ˜ต๐Ÿ˜ต๐Ÿ˜ต๐Ÿ˜ต๐Ÿ˜ต๐Ÿ˜ต๐Ÿ˜ต

       โ–ณ                                  โ–ณ
       |                                  |
       |                                  |
Checklist loads                      Return to checklist
queries synced data                  Query stale synced data
       โ–ณ                                  โ–ณ
       |                                  |
       |                                  |

Synced data
โ”โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…๐Ÿ˜ต๐Ÿ˜ต๐Ÿ˜ต๐Ÿ˜ต๐Ÿ˜ต๐Ÿ˜ตโœ…โœ…โœ…โœ…โœ…

   โ–ณ                                             โ–ณ
   |                                             |
   |                                             |
Arbitrary sync                                 Arbitrary sync
   |                                             |
   |                                             |
   โ–ณ                                             โ–ณ

โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…๐Ÿ“Œโœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…
Remote site data

               |                  โ–ณ
               |                  |
               โ–ฝ                  |
       Settings loads           Module updates
       queries synced data      Pushes remote data

I've created a PR that replaces the Jetpack Monitor task's completion state with the remote site module activation data: https://github.com/Automattic/wp-calypso/pull/34049

I believe that will make it impossible to hit the data race described above for Jetpack monitor. It's meant to validate this theory and not as a proposed fix.

Just observed it with the video hosting task:

  1. Enabling the video hosting toggle triggered JETPACK_MODULE_ACTIVATE_SUCCESS, which set completed to true for jetpack_video_hosting :+1:
  2. Going back to the checklist triggered SITE_CHECKLIST_REQUEST. A little later, SITE_CHECKLIST_RECEIVE reset completion status for jetpack_video_hosting to false :-1:

One note, SITE_CHECKLIST_RECEIVE has some dataRequests meta info, i.e. SITE_CHECKLIST_REQUEST is going through the data-layer:

image

So _maybe_ it's the data layer's cache that's getting in our way? :thinking:

So _maybe_ it's the data layer's cache that's getting in our way? ๐Ÿค”

I don't think there's any data layer caching involved here. If you navigate between /plans/my-plan/SITE_SLUG you'll see the network traffic for https://public-api.wordpress.com/rest/v1/sites/{ SITE_ID }/checklist


If my data race theory is correct, I can think of a few ways to address it:

  • Stop using the checklist endpoint entirely, replace checklist state with module activation state in the checklist (the same data the settings use). This would essentially extend #34049 to all of the checklist tasks. This is probably viable for many of the tasks which are tied to module activation, but not all of them.
  • Replace logic in the checklist endpoint so that current remote data is fetched instead of using synced data.

So _maybe_ it's the data layer's cache that's getting in our way? thinking

I don't think there's any data layer caching involved here. If you navigate between /plans/my-plan/SITE_SLUG you'll see the network traffic for https://public-api.wordpress.com/rest/v1/sites/{ SITE_ID }/checklist

Point taken :+1: (Furthermore, plain data-layer doesn't seem to support caching based on freshness (http-data possibly does))

If my data race theory is correct, I can think of a few ways to address it:

  • Stop using the checklist endpoint entirely, replace checklist state with module activation state in the checklist (the same data the settings use). This would essentially extend #34049 to all of the checklist tasks. This is probably viable for many of the tasks which are tied to module activation, but not all of them.

Yeah, plus:

  • it'll lead to further inconsistencies with the WP.com checklist, which is embracing the checklist endpoint as their source of truth for tasks
  • if we ever want manually dismissible tasks for the Jetpack checklist, we need to store their completion status remotely, much like the WP.com checklist does, i.e. we'd need a dedicated endpoint
  • Replace logic in the checklist endpoint so that current remote data is fetched instead of using synced data.

Seems like the better choice then.

(Architecturally: Too bad this means working around yet another concept/tool that we have -- it means that that tool has deficiencies. Does it specifically mean that Sync is broken? Is it not syncing often enough for our purposes? Can we force-sync after module activation?)

I agree with the assessment @ockham.

This seems like another motivation for moving to a Jetpack compatible endpoint that handles task dismissal data on the site.

Moved to Jetpack Improved Onboarding for triaging.

Was this page helpful?
0 / 5 - 0 ratings