According to internal logs, some calypso_page_view
paths are reported without replacing IDs with :variable
s. Examples include:
(List updated on March 20, 2018 by @vindl - p7jreA-1xN-p2)
/checklist/:site/[site]
/checkout/[site_id]/personal
/checkout/[site]/theme:[theme_name]
/checkout/domain_map:[domain]/renew/[product_id]
/checkout/offsite_redirect/renew/[product_id]
/checkout/thank-you/[site]/[product_id]
/checkout/value_bundle/renew/[product_id]/[site]
Note that in the above routes, I'm not sure if the reported id
is actually product_id
or something else.
/comment/[comment_id]
/comments/all/[comment_id]
/customize/undefined
I also noticed a couple of strange cases like: /customize/WP-ADMIN
, /customize/WP-admin
, /customize/wd-admin
, /customize/wp-Admin
, and /customize/admin
. Not sure where these are coming from or why they are recorded.
/domains/add/[site]/google-apps
/domains/manage/[site]/transfer/precheck
/domains/manage/edit/[site]
/domains/manage/edit/[random_ip_address]
/domains/manage/[site]/email-forwarding
/domains/manage/transfer/precheck/[site]
/me/purchases/[purchase_id]
/me/purchases/[site]/[purchase_id]
/me/get-apps?retry=1
/me?4mstfrm=[id]
/me?q=[string]
/me?sid=[id]
/me?updated=password
/media/[filename]/:site
/page/[page_id]
/pages/[page_name?]
/people/invites/[invite_hash]
/plugins/akismet/localhost::danish
/post/[post_id]
/post/[site]/:site
/posts/my/[site]/:site
/stats/day/[site]/:site
/stats/post/[post_id]
There are a lot of /start/
page views recorded because the language is appended to the path. Should these be replaced with a :locale
variable or similar? For example:
/start/about/el-po
/themes/blog/filter/accessibility-ready+author-bio+blog-excerpts+fixed-layout+fixed-layout+fluid-layout+grid-layout+responsive-layout
These consist the majority of entries, because of all the different filter combinations that can be applied and recorded.
/themes/blog/free/filter/one-column/type/:tier/:site_id
Here it seems like the tier
as recorded both explicitly and with :tier
variable.
/themes/filter/subject:[subject_name]/:site_id
@yanirs can you provide more details on these calypso_page_view
events with errant path
?
Even better would be instructions on how to fetch them myself (I am a fairly new A12s) so I can trace where in Calypso code errant paths are being generated. Thx.
@yanirs I am unable to reproduce this. And, not sure if my Hive query is correct or not but it looks like there was only 1 instance of calypso_page_view
event with path
containing asterisk (*
) which appears to be the pattern above.
@donpark Sorry for the confusion. The paths don't contain asterisks, they contain actual IDs. I'll PM you a Hive query.
@donpark Thanks for working on this. Quick question: does #17938 also address the last two reported cases: /stats/post/*000000*
and plugins/browse/*example.wordpress.com*/:site
?
In any case, I'll rerun the queries in a few days to verify that the issues have been resolved.
@yanirs Unfortunately no.
Background: Calypso page view reporting are done by each route specific handlers using either analytics.pageView.record
or PageViewTracker
. In both case, the path to be reported is constructed programmatically which makes locating the source of erroneous reports tedious.
/stats/post/*000000*
is reported using PageViewTracker
which I haven't rigged yet to use the new sectionifyWithRoutes
function.
I still need to locate where plugins/browse/*example.wordpress.com*/:site
is coming from. It may have been removed in recent plugin store refactor.
Anyway, please do keep updating this issue with routes that needs fixing.
Oh, I just found that /checkout/thank-you/:receipt
could be reported with either receipt ID or domain name. Oy.
Thanks for the update, @donpark. Here are a few more routes:
/comments/all/example.wordpress.com
/comments/approved/example.wordpress.com
/comments/pending/example.wordpress.com
/comments/spam/example.wordpress.com
/comments/trash/example.wordpress.com
/domains/manage/edit/example.wordpress.com
/media/abc123
/page/123
/post/123
/posts/my/12345/:site
If every case requires specific handling, it may make sense to bring in the people who committed the original code. For example, I believe comment management is new in Calypso, so maybe @gwwar or someone else from her team can look into those paths.
If every case requires specific handling, it may make sense to bring in the people who committed the original code. For example, I believe comment management is new in Calypso, so maybe @gwwar or someone else from her team can look into those paths.
@yanirs I think we need to open up discussion on refactoring how we handle routing, maybe forking or replacing page
module. New routing features we need for better page view reporting are:
make routing path
pattern available to route handlers - this would remove the need to clumsily recreate the path
using sectionify
or sectionifyWithRoutes
.
support pre- and post- routing handlers - this can be used to automate page view reporting.
support path fragment patterns (so domain_map:foo.blog
can be reported as domain_map:domain
).
@donpark Yeah, it sounds like some refactoring is required to stop new paths with IDs from popping up. I'm unfamiliar with the codebase so I don't know what the best approach is -- probably best to mention people who can help and ask for their opinion.
cc @Automattic/lannister to update comment routes
I'm happy to help with the /comments
routes 馃憤
@yanirs @donpark If I understand the above correctly, I'm looking for instances of analytics.pageView.record
or PageViewTracker
that handle the routes mentioned above, and then adapting them to use the new sectionifyWithRoutes
? (I'd expect any refactoring of routing and/or replacing page.js
would be a long discussion and process, so happy to do this in the meantime.)
If I understand the above correctly, I'm looking for instances of analytics.pageView.record or PageViewTracker that handle the routes mentioned above, and then adapting them to use the new sectionifyWithRoutes?
@kwight yes. sectionify
can't handle complex paths so I added sectionifyWithRoutes
which uses an array of Route
instances to figure out the path
to report. It's a hack but, short of refactoring mentioned above, we need it when complex paths are used. I don't know if there any handlers for complex routes using PageViewTracker
but we may likely need to use sectionifyWithRoutes
in PageViewTracker
. Some of the misreported page views involving simple path patterns are I think just bugs.
Any help you can provide with page view reporting cleanup is highly appreciated. Thx.
Some of the misreported page views involving simple path patterns are I think just bugs.
I think this is our case for comments, looking at other uses of PageViewTracker
in the codebase. @yanirs I'm not familiar with how the variables work; does #18414 look like it will give what you need?
we need to open up discussion on refactoring how we handle routing, maybe forking or replacing page module. New routing features we need for better page view reporting are:
@donpark if you want to help with routing we've had it on our todo list for a long time - make a reliable Redux-based routing middleware. eliminating page.js
and moving to a declarative introspectable Redux-based approach could help us out in so many ways.
of course, this is a massive project, but one which can be done incrementally (start with a routing action that includes no page.js
middleware, start with one route at a time).
@dmsnell I'll add Redux-based routing refactor to my TODO list. Just started a bonfire going on the AT side so it may take a while for me to get to it. Until then, sniping misreported page view paths as they show up in Tracks will have to do.
Comments paths fixed yesterday in #18414 .
@yanirs do you mind updating the summary to include what is currently known as broken? I feel like we can clear the known ones out, close this, then open new issues as needed for any others that appear.
@gwwar Sorry for the delayed reply – I was on leave.
There are quite a few different paths. I'm not sure it makes sense to address them separately, especially as new ones pop up and discovering them requires manually scrutinising query results. It'd be better to come up with a general solution, as discussed above. However, I can send you some pointers to queries if you want to follow the manual approach.
@yanirs I'll be more than happy to fix any known issues, but if you feel that doing some manual work now may not be worth the effort, I'd suggest we mark this as Blocked
(maybe #16900 too) and create a new issue to work on a _framework_ solution for this.
@rodrigoi I don't see the need to create a new issue, which would essentially be a duplicate of this one. Fixing some paths while new ones break is not a sustainable solution, especially since this issue has been reported in August.
I'm not familiar with Calypso internals enough to say exactly what needs to be fixed, but as shown by @donpark's initial attempts and @dmsnell's comment above, it isn't a trivial fix. I think that the next step should be for someone who's familiar with Calypso analytics to dig into this issue and into #16900 (@markryall also mentioned he may have some capacity to work on this). I don't mind if the result of this investigation is a list of new issues, but let's keep this one open until we know what needs to be fixed. :slightly_smiling_face:
@rodrigoi @yanirs I would recommend the following:
cc @Automattic/team-calypso for 馃on how to make it easier to get this right
I audited the recorded page view events in order to determine for which ones the above issue is still present. Essentially I used the queries mentioned above and manual inspection to obtain the results (p7jreA-1xN-p2). I'm quite confident that this represents the majority (if not all) of the misreported paths in Calypso.
In the following list I'll use [variable]
as a placeholder for exact values that were not replaced with corresponding :variable
from path pattern in order to avoid exposing exact data. For example if /stats/example.wordpress.com
appears in reported page views, it will be presented here as /stats/[site]
(the exact variable names used are not important).
/checklist/:site/[site]
/checkout/[site_id]/personal
/checkout/[site]/theme:[theme_name]
/checkout/domain_map:[domain]/renew/[product_id]
/checkout/offsite_redirect/renew/[product_id]
/checkout/thank-you/[site]/[product_id]
/checkout/value_bundle/renew/[product_id]/[site]
Note that in the above routes, I'm not sure if the reported id
is actually product_id
or something else.
/comment/[comment_id]
/comments/all/[comment_id]
/customize/undefined
I also noticed a couple of strange cases like: /customize/WP-ADMIN
, /customize/WP-admin
, /customize/wd-admin
, /customize/wp-Admin
, and /customize/admin
. Not sure where these are coming from or why they are recorded.
/domains/add/[site]/google-apps
/domains/manage/[site]/transfer/precheck
/domains/manage/edit/[site]
/domains/manage/edit/[random_ip_address]
/domains/manage/[site]/email-forwarding
/domains/manage/transfer/precheck/[site]
/me/purchases/[purchase_id]
/me/purchases/[site]/[purchase_id]
/me/get-apps?retry=1
/me?4mstfrm=[id]
/me?q=[string]
/me?sid=[id]
/me?updated=password
/media/[filename]/:site
/page/[page_id]
/pages/[page_name?]
/people/invites/[invite_hash]
/plugins/akismet/localhost::danish
/post/[post_id]
/post/[site]/:site
/posts/my/[site]/:site
/stats/day/[site]/:site
/stats/post/[post_id]
There are a lot of /start/
page views recorded because the language is appended to the path. Should these be replaced with a :locale
variable or similar? For example:
/start/about/el-po
/themes/blog/filter/accessibility-ready+author-bio+blog-excerpts+fixed-layout+fixed-layout+fluid-layout+grid-layout+responsive-layout
These consist the majority of entries, because of all the different filter combinations that can be applied and recorded.
/themes/blog/free/filter/one-column/type/:tier/:site_id
Here it seems like the tier
as recorded both explicitly and with :tier
variable.
/themes/filter/subject:[subject_name]/:site_id
There are quite a few of them, but it looks like this could still be be tackled on case by cases basis, since it's not clear when will the Calypso routing be refactored in a way that could hopefully solve this problem in a more generic way. If you all agree I could create separate issues for each of the sections reported above so we could split work between devs and involve people that know more about each route.
If you all agree I could create separate issues for each of the sections reported above so we could split work between devs and involve people that know more about each route.
Let's do it! Feel free to write one per section to save a bit of time (with checkboxes if there are a number of paths to fix). Let me know if you need a hand with that.
I logged all of the issues separated by section in the following milestone https://github.com/Automattic/wp-calypso/milestone/238, and also added a framework issue to add eslint rule or reporting in order to prevent new errors from popping up. Closing this out for now because the new issues cover everything that was reported here. After all of the fixes are deployed, we will continue monitoring the logs to make sure that no new issues with page view reporting appear.
This is awesome, thanks @vindl!
Most helpful comment
I logged all of the issues separated by section in the following milestone https://github.com/Automattic/wp-calypso/milestone/238, and also added a framework issue to add eslint rule or reporting in order to prevent new errors from popping up. Closing this out for now because the new issues cover everything that was reported here. After all of the fixes are deployed, we will continue monitoring the logs to make sure that no new issues with page view reporting appear.