Site-kit-wp: Caching plugins interfering with site verification process

Created on 11 Nov 2019  ·  15Comments  ·  Source: google/site-kit-wp

For users who are not yet verified site owners, Site Kit obtains and places a verification <meta> tag in the <head> of every frontend request. This is then checked by the Site Verification API to satisfy the requirement for site ownership.

Caching plugins interfere with this when they serve a stale version of the page when requested by the site verification API which does not yet contain the added meta tag which causes the verification step to fail.

Currently the only solution for this is for the site owner to manually clear their cache and try again.

Ideally, Site Kit should attempt to clear the cache (or otherwise signal plugins to do this) for the homepage when adding a new meta tag to be displayed so that all tags are properly rendered on the next request.


_Do not alter or remove anything below. The following sections will be managed by moderators only._

Acceptance criteria

  • Introduce file verification mechanism instead of the HTML Tag (which would be much cleaner too).
  • Keep HTML Tag verification as a fallback.

Implementation Brief

  • Add a new supports query parameter to the OAuth_Client::get_proxy_setup_url in all places where version is currently added

    • A comma-separated list of supported features, conditionally including file_verification



      • Support should be based on whether or not the home_url contains a path


      • Sub-directory installs or any other setup which uses a path in the home URL is not supported



  • Update handle_verification_token to handle different types of verification tokens based on a new googlesitekit_verification_token_type parameter which will support either META or FILE types.

    • META types will be handled by the current Verification_Meta class



      • Renamed from Verification_Tag



    • FILE types will be handled by a new Verification_File class

    • If not passed, it should default to META to preserve the current behavior

  • Rename from Verification_Tag and references to Verification_Meta
  • In handle_verification_token when redirecting back to the proxy with verify=true include an additional verification_method query parameter with the current verification type
  • Verification_File should expose the same public methods as Verification_Meta
  • Save the file verification in a new googlesitekit_site_verification_file user option
  • Move Authentication::handle_verification_token() and Authentication::print_site_verification_meta() to the Site_Verification module class
  • Add Verification_File::OPTION to Reset

Handling Verification File Requests

Changelog entry

  • Add support for site verification via file as primary method while keeping site verification via meta tag as fallback.
P0 Enhancement

All 15 comments

@aaemnnosttv A couple of notes that relate to the (upcoming) proxy implementation for this (consider this a more technical addition to the ACs):

  • The plugin should send a file_verification_support=true query parameter in all /site-management/setup/ requests (i.e. in the same place where we send the version parameter. (We've decided that sending more specific parameters for new features is more sustainable in the long run. Let's keep the version in place, but going forward a distinct parameter should be used to inform the proxy about compatibility with something.)
  • In the function where the plugin currently receives the googlesitekit_verification_token, we should also check for an additional parameter googlesitekit_verification_token_type, which can either be META or FILE. Based on that, we should store the received token in a different user option (for parity with the existing one, I suggest calling it googlesitekit_site_verification_file).
  • The retrying part for the fallback will happen on the proxy end. The plugin has to only ensure it supports both types and that the functionality for rendering meta tags / intercepting *.html file requests for a (virtual) token file works correctly in general.

A couple of other thoughts regarding implementation:

  • I'd suggest implementing a new class Verification_File that works similar to the Verification_Tag class (potentially we should rename this existing class to Verification_Meta for consistency with what the type is called by SV API?). The class should as well include the get_all(), relying on another transient (that should be cleared in the same way as the other one when a new token is received).
  • For the implementation of intercepting the /xyz.html file requests, I suggest reviewing closely how the Yoast SEO plugin solves doing a similar thing for its XML sitemaps (which by that should be proven to work widely, I guess). See https://github.com/Yoast/wordpress-seo/blob/b8e1f946916e218c9f6ac5f57b2c2587342ce9e6/inc/sitemaps/class-sitemaps-router.php#L11, plus some related tweaks to ensure compatibility in https://github.com/Yoast/wordpress-seo/blob/b8e1f946916e218c9f6ac5f57b2c2587342ce9e6/inc/sitemaps/class-sitemaps.php#L13 (e.g. particularly https://github.com/Yoast/wordpress-seo/blob/b8e1f946916e218c9f6ac5f57b2c2587342ce9e6/inc/sitemaps/class-sitemaps.php#L138-L150 and https://github.com/Yoast/wordpress-seo/blob/b8e1f946916e218c9f6ac5f57b2c2587342ce9e6/inc/sitemaps/class-sitemaps.php#L217-L220). Because in our case the filename for the HTML file can theoretically be anything, we might need to rely directly on the URL instead of a query parameter and rewrite rule to not conflict with other plugins trying to also intercept /xyz.html file requests for other purposes. That leads us to https://github.com/Yoast/wordpress-seo/blob/b8e1f946916e218c9f6ac5f57b2c2587342ce9e6/inc/sitemaps/class-sitemaps.php#L227-L270 being the most important inspiration to look at.
  • Let's outsource the existing Authentication::handle_verification_token() and Authentication::print_site_verification_meta() to a dedicated class that will be responsible for all the verification logic, since it doesn't really belong there and especially now would become too much. It would actually be perfect if this was part of the (always-active) Site_Verification module class, if possible. Let's keep the class instances for Verification, Verification_Meta and Verification_File on Authentication for now for common access and BC.

I'd suggest implementing a new class Verification_File that works similar to the Verification_Tag class (potentially we should rename this existing class to Verification_Meta for consistency with what the type is called by SV API?)

That sounds great. I agree that it would be a good idea to rename the existing Verification Tag class as it's a bit confusing as it is now.

The class should as well include the get_all(), relying on another transient (that should be cleared in the same way as the other one when a new token is received).

Why would the file class require a way to get them all? It makes sense for <meta> tags since they all need to be rendered at the same time as there's no way to tell which user the request is for, but for the file method we only really need to make sure that there is a user with this same file to verify them. Everything we need is in the request itself (we don't actually need anything saved to perform the actual verification), we only need to make sure that we have qualified the response (that we have a matching user who can/should be verified).

Because in our case the filename for the HTML file can theoretically be anything, we might need to rely directly on the URL instead of a query parameter and rewrite rule to not conflict with other plugins trying to also intercept /xyz.html file requests for other purposes.

I agree that we probably need to rely on the URL itself, particularly because this isn't something that can be placed anywhere except for at the root of the path. As for the path, I don't think we need to intercept _all_ *.html requests however, only /google[a-z0-9]+.html files. Even if there are other plugins that try to serve other html files, we should be able to remain compatible (especially if we don't need to add a competing rewrite rule).

It would actually be perfect if this was part of the (always-active) Site_Verification module class, if possible

This sounds like a good plan as well 👍

@aaemnnosttv

Why would the file class require a way to get them all?

I guess the get_all() would allow you to fully rely on the functionality offered by that class, by getting all tokens and check whether the currently requested one is among those. But I agree with your point, a singular custom user meta query for that value would be more efficient. We'd need to make sure that this query result is either cached in a transient or fast enough to be acceptable uncached.

As for the path, I don't think we need to intercept all *.html requests however, only /google[a-z0-9]+.html files.

I believe that is true, however let's really ensure that the file tokens always start with google - I'm not sure this is documented anywhere. The google prefix is included in the token returned from the API, so theoretically it could be something different.

Even if there are other plugins that try to serve other html files, we should be able to remain compatible (especially if we don't need to add a competing rewrite rule).

Agreed, let's not add a rewrite rule, and instead only rely on the URL.

IB looks great as it is, great addition determining the support based on whether the home URL includes a path!

@aaemnnosttv Can you include the following two extra points (that I forgot to mention yesterday in the ACs)?

  • When checking googlesitekit_verification_token_type, let's make sure the plugin still works when the parameter is not given, assuming META as default (for BC with current proxy behavior).
  • In handle_verification_token(), when we redirect back to the proxy with verify=true, include another parameter verification_method with the value that the token was just received for (either META or FILE).

With that, this should be good to go to the execution board.

Back to you @felixarntz

We'd need to make sure that this query result is either cached in a transient or fast enough to be acceptable uncached.

We should definitely evaluate how expensive is the query at scale. With that said, this query would only run for google[a-z0-9]+.html requests unlike the meta verification implementation.

IB ✅

We should definitely evaluate how expensive is the query at scale. With that said, this query would only run for google[a-z0-9]+.html requests unlike the meta verification implementation.

Indeed. I don't _think_ the query should be very expensive (it's a checking for a key=value match in user meta) but if needed, we could use a transient as a way to rate limit the query. Even a TTL of 30s or 1m would prevent abusing the endpoint. Not sure this is necessary though. Would be good to test on a site with a lot of users.

@felixarntz I ran into a few issues on this one that are making it take a little longer that I wanted to raise and get your input on.

Firstly, sites that don't use pretty permalinks (permalink_structure) don't use rewrites so we have to rely on the request URI specifically (even the WP robots.txt doesn't work without permalinks). This isn't too big of a deal and should actually simplify things a good deal. I assume we aren't requiring people to use permalinks to use Site Kit so this should be re-implemented without using rewrites, correct?

As for the supports parameter, providing this to the proxy setup URL is a bit tricky. The only options seem to be either adding a new parameter to get_proxy_setup_url to pass an array of features (explicit but inconsistent), or using a filter (similar to required scopes). I tried both and the filter is cleaner, but in this case I don't think it's actually necessary (at least at this point). Whether or not the site supports file verification due to the presence of a path can already be easily inferred from the url parameter which is already passed. For that matter, it isn't clear yet whether or not _meta_ verification will even work if the site has a path in the home URL either.

@aaemnnosttv

so we have to rely on the request URI specifically

I was under the understanding we were going to do that already, given the previous discussion. Yoast SEO relies on a rewrite rule in WP, but I think we should be able to just look at the URL, and if it is one for a SV API file, intercept the request early (e.g. after_setup_theme) and short-circuit it with the according response.

As for the supports parameter, providing this to the proxy setup URL is a bit tricky.

Let's not worry too much about providing a more flexible way. For now we can just set this value internally in get_proxy_setup_url(), based on the home URL. Once we have another feature that requires to be explicitly added support, we can think about a more expressive solution.

I was under the understanding we were going to do that already, given the previous discussion.

Sorry for the misunderstanding, it was in the IB!
"Model core rewrite rules used for robots.txt to match requests for /google[a-z0-9]+.html"

I've gone ahead and made the necessary updates for the rest. Should be ready for initial review shortly 😄

See https://github.com/google/site-kit-wp/pull/882#pullrequestreview-319581735, I just wanted to reference that here: The proxy should only send the token itself, not the redundant google prefix and .html suffix.

Regarding QA: Note that the authentication service doesn't yet have the related functionality implemented. Please make sure that verifying a new site still works as it did before now, and then manually trigger requests to the side with the plugin activated, in order to verify it will act as expected in the future (once the authentication service has been updated):

  1. Send a request to the admin_url() with googlesitekit_verification_token, googlesitekit_verification_token_type, googlesitekit_verification_nonce and googlesitekit_code parameters. The nonce has to be an actually valid nonce, type has to be one of META / FILE (test both), the others can be pretty much anything. Ensure that the passed token is saved in the expected user option and the plugin redirects to the proxy with the same code that was passed, verify=true and the correct verification_method (this should obviously result in a failure on the proxy side then because the token was just something arbitrary).
  2. After doing the above with FILE, access /google<token-from-before>.html and ensure the expected response is sent.
  3. After doing the above with META, ensure that the home page includes the meta tag.

QA ✅

Meta tag redirects and works accordingly; the meta tag is on the homepage after sending the request.

File redirects properly and the file exists in the expected location after sending the request.

It validates the nonces and fails if they're off, etc. Looks good.

LGTM

Was this page helpful?
0 / 5 - 0 ratings