Site-kit-wp: Recognize 404s in entity detection

Created on 31 Aug 2020  Â·  4Comments  Â·  Source: google/site-kit-wp

The current entity detection doesn't handle 404s in the same way that WordPress does, which can yield incorrect results. WordPress has WP::handle_404() to modify the main WP_Query instance flags accordingly after running it. We'll need to implement a method which takes care of that generically.


_Do not alter or remove anything below. The following sections will be managed by moderators only._

Acceptance criteria

  • Site Kit's entity detection should handle 404s correctly, in the same way that WordPress core does:
    *Queries without results should always end up as a 404 (i.e. WP_Query::$is_404 is true), except for:
    * non-paginated valid author archives
    * non-paginated valid taxonomy archives
    * non-paginated valid post type archives
    * non-paginated home archive or search results

    • Queries for a single paginated post should end up as a 404 if the pagination is out of bounds (e.g. if the post is not paginated at all or if the page query parameter is greater than the number of available pages in the post).

Implementation Brief

  • Introduce a Synthetic_WP_Query utility class which extends WP core's WP_Query.
  • Extend WP_Query::get_posts():

    • Call WP_Query::get_posts().

    • Implement logic inside that method that duplicates the logic in WP::handle_404(), detecting a 404 based on query vars and found posts in the same way that WordPress does and calling WP_Query::set_404() as necessary.

  • Extend WP_Query::parse_query() to use a flag for whether the method has already been run for the given $query arguments, and if so, not run again.

    • This is an easy side effect win we can get from extending the class - this way we don't unnecessarily parse query vars twice, which will also remove the need for one usage of switching to frontend context as part of #1978.

  • Merge #1981.

QA Brief

  1. Create some sample content in WordPress to test:

    • a category with one post assigned to it

    • another category with no posts assigned to it

    • a user (without any posts written)

  2. Go to the Site Kit single URL details view {siteURL}/wp-admin/admin.php?page=googlesitekit-dashboard&permaLink={url}, replacing url as explained below. For every URL, if indicated with "yes" below, ensure the white box at the top shows a corresponding title; if indicated with "no" below, ensure the white box at the top is empty (i.e. this is not considered a valid WordPress URL, i.e. if you visited it in the frontend you would get a 404):

    • an existing post's URL --> yes

    • a URL in similar format to the existing post's one above, but with a post slug that does not exist --> no

    • the URL of the category with one post assigned to it --> yes

    • the URL of the category with no posts assigned to it --> yes

    • the URL of the category with one post assigned to it, followed by page/2/ (this would be the second page if the category would have more than 10 posts, which is not the case here) --> no

    • the URL of an author archive that has written at least one post --> yes

    • the URL of the author archive for the user above that does not have any post --> yes

    • the URL in a similar format of the other author archive URL, but with a user name that does not exist --> no

    • the URL of the main blog --> yes

    • the URL of the main blog, followed by page/{index} (ensure index is a number greater than the number of posts that exist on your site divided by 10) -> no

Changelog entry

  • Improve entity detection so that single URL details view only works for URLs which do not result in a 404 per WordPress behavior.
P1 Bug

All 4 comments

The run_query method feels a bit out of place as a public method on the factory class. It seems like the intention is to lazy-load the execution of the query, which is understandable but it seems a bit off that we would need to go back to the factory to do that. Ideally, I think the factory would simply provide an object that works the way we want 🙂

What do you think about this instead?:

  • Create a new Synthetic_WP_Query class that extends WP_Query
  • Synthetic_WP_Query::get_posts would be extended with the logic of this method (calling parent::get_posts first), to preserve the desired on-demand nature ($query would become $this)
  • WP_Query_Factory would be updated to return instances of Synthetic_WP_Query
  • The use would then be unchanged:
    E.g. Entity_Factory::from_url would still call $query->get_posts() rather than WP_Query_Factory::run_query which would no longer be necessary

@aaemnnosttv Good idea! I've amended the IB. I've also added a point on overriding parse_query so that it doesn't unnecessarily run twice, which I think we can do as part of this issue.

IB ✅

Tested

Installed latest release candidate, activated SK and followed the above steps:

image

image

Comparing the above screenshots, scenario 'yes' and 'no'

Passed QA ✅

Was this page helpful?
0 / 5 - 0 ratings