Site-kit-wp: Analytics: gtag misconfigured for AMP causing _gl query parameter injection while navigating on origin

Created on 11 Sep 2020  ·  9 Comments  ·  Source: google/site-kit-wp

Bug Description

I originally opened this as an AMP issue: https://github.com/ampproject/amphtml/issues/30176. However, it appears the issue is with how Site Kit is configuring gtag.

On my WordPress site which is AMP-first, I have the Google Site Kit plugin running with Google Analytics configured like this (as provided by Site Kit):

<amp-analytics type="gtag" data-credentials="include">
    <script type="application/json">
        {"vars":{"gtag_id":"UA-123456-1","config":{"UA-123456-1":{"groups":"default","linker":{"domains":["weston.ruter.net"]}}}},"optoutElementId":"__gaOptOutExtension"}
    </script>
</amp-analytics>

From the homepage at https://weston.ruter.net/, if I click on the post “Integrating with AMP Dev Mode in WordPress”, which has the URL https://weston.ruter.net/2019/09/24/integrating-with-amp-dev-mode-in-wordpress/, I am actually taken to a URL like:

https://weston.ruter.net/2019/09/24/integrating-with-amp-dev-mode-in-wordpress/?_gl=1~abcde5~

This happens for each internal link I click on my site. When I navigate around my site, each request includes a _gl query parameter, and then it gets stripped out with history.replaceState() when the page initializes (as far as I can tell).

As I understand it, this is done in order to measure customer journeys across domains. However, I am not navigating across domains. I am navigating on my own domain.

This _gl query parameter should only be added to links that lead from pages served on the AMP Cache over to the origin domain. The injection of this query parameter when navigating around on the origin server is problematic for a few reasons:

  1. It is distracting for users who care about what is in the Location bar. I see I am navigated to some URL with a long random string, and then it disappears. That's somewhat disconcerting.
  2. It can interfere with full-page caching since no two page requests will have the same URL.
  3. It also breaks service worker caching of navigation requests, unless you figure out logic to explicitly strip out the _gl parameter when caching a response or looking up a previously-cached response.
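To illustrate the workaround mentioned in point 3, here is a minimal TypeScript sketch that normalizes a URL before it is used as a cache key. The helper name stripLinkerParam is hypothetical and not part of Site Kit, AMP, or any service worker library; it relies only on the standard URL API.

```typescript
// Hypothetical helper: returns the URL with the `_gl` linker parameter removed,
// so that decorated and undecorated navigations map to the same cache key.
function stripLinkerParam(rawUrl: string, param: string = "_gl"): string {
  const url = new URL(rawUrl);
  // Removes every occurrence of the parameter; other parameters are kept.
  url.searchParams.delete(param);
  return url.toString();
}
```

In a service worker, the normalized string could presumably be passed to caches.match() and cache.put() in place of the raw request URL, so that a response stored for one decorated navigation is found again for the next one.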

According to @zhouyx in https://github.com/ampproject/amphtml/issues/30176#issuecomment-690775882:

I can see two issues here.

  1. The destinationDomain is set to weston.ruter.net. By default, AMP won't decorate a URL with the exact same domain. However, the domains value overrides that. This instructs AMP to always decorate URLs to this domain.
  2. The proxyOnly value returned is false. As you mentioned, by default AMP doesn't decorate URLs if the page is served from the origin. However, the proxyOnly: false config overrides that behavior and instructs AMP to always decorate URLs no matter what.

The unexpected behavior should be fixed by changing the config. Thanks.
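To make the two points above concrete, here is a minimal TypeScript sketch of the decision as that comment describes it. It is a simplified model, not AMP's actual linker code: the names LinkerSettings, PageContext and shouldDecorate are hypothetical, and the default branch deliberately ignores which other hosts AMP would consider by default.

```typescript
// Simplified model of the linker decision described above. NOT AMP's real code;
// all type and function names here are hypothetical.
interface LinkerSettings {
  proxyOnly: boolean; // true: only decorate when served from an AMP cache
  destinationDomains?: string[]; // explicit list overrides the same-domain default
}

interface PageContext {
  hostname: string; // host the current page is served from
  servedFromCache: boolean; // whether the page is served by an AMP cache (proxy)
}

function shouldDecorate(
  targetHostname: string,
  page: PageContext,
  settings: LinkerSettings
): boolean {
  // proxyOnly: true restricts decoration to pages served from the cache.
  if (settings.proxyOnly && !page.servedFromCache) {
    return false;
  }
  // An explicit domain list wins, even if it names the current host.
  if (settings.destinationDomains !== undefined) {
    return settings.destinationDomains.includes(targetHostname);
  }
  // Default: never decorate a link to the exact same host. Which other hosts
  // qualify by default is outside this sketch, so they are simply allowed.
  return targetHostname !== page.hostname;
}
```

Under this model, a page served from the origin weston.ruter.net with proxyOnly: false and destinationDomains: ["weston.ruter.net"] decorates its own internal links, which matches the behavior reported here. Either setting proxyOnly to true or dropping the explicit domain list stops that.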

So for the first part, it appears that this configuration is not correct and should perhaps be removed:

https://github.com/google/site-kit-wp/blob/84c97270919ca63101b114c6a73a1c7ca63f4e49/includes/Modules/Analytics.php#L360-L362

Secondly, when amp-analytics requests the configRewriter.url, the response contains a configuration with linkers._gl.proxyOnly set to false. I'm not sure how that is specified, but it appears to be part of the problem.

Steps to reproduce

  1. Go to https://weston.ruter.net/
  2. Navigate around the site.
  3. Notice that the _gl query parameter is added to links when clicked and this is removed when the destination page is loaded.

Analytics P1 Bug

Most helpful comment

The linker.domains config was introduced in #1203 via c5a770544861c399117577b02615c6e7e12e74f5.

@felixarntz @aaemnnosttv In #1160 there is:

  • Unconditionally include AMP linker configuration in amp-analytics options for gtag via "linker": { "domains": ["wp-site-domain.com"] } (see https://developers.google.com/gtagjs/devguide/amp#link_domains).

The Link domains section of that guide says:

The domain linker enables two or more related sites on separate domains to be measured as one.

Since only one domain is being tracked, apparently domains should be omitted. The guide also says:

The capability to link to your canonical domain from the AMP cache is enabled by default.

So as long as the config isn't including "linker":"false", there doesn't seem to be any need to include anything.

All 9 comments

This filter seems to fix the problem at least regarding the domains point:

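add_filter(
    'googlesitekit_amp_gtag_opt',
    function ( $gtag_opt ) {
        // Nothing to do if no gtag config entries are present.
        if ( empty( $gtag_opt['vars']['config'] ) ) {
            return $gtag_opt;
        }
        // Drop linker.domains from every configured property.
        foreach ( $gtag_opt['vars']['config'] as &$config ) {
            unset( $config['linker']['domains'] );
        }
        unset( $config ); // Break the reference left over from the loop.
        return $gtag_opt;
    }
);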

The second point regarding proxyOnly:false I'm not sure about, as that seems to be coming from a response from https://www.googletagmanager.com/gtag/amp.
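For checking the emitted JSON (for example in a test against the rendered amp-analytics config), the same transformation can be sketched in TypeScript. The interfaces below are assumptions that only describe the fields visible in the configuration shown in this issue, and withoutLinkerDomains is a hypothetical name, not a Site Kit or AMP function.

```typescript
// Assumed shapes, covering only the fields that appear in this issue.
interface LinkerConfig {
  domains?: string[];
  [key: string]: unknown;
}

interface PropertyConfig {
  linker?: LinkerConfig;
  [key: string]: unknown;
}

interface GtagOptions {
  vars: { gtag_id: string; config: Record<string, PropertyConfig> };
  optoutElementId?: string;
}

// Returns a copy of the options with linker.domains removed from every
// property entry. The input object is left untouched.
function withoutLinkerDomains(options: GtagOptions): GtagOptions {
  const config: Record<string, PropertyConfig> = {};
  for (const [id, entry] of Object.entries(options.vars.config)) {
    if (entry.linker === undefined) {
      config[id] = entry;
      continue;
    }
    const linker: LinkerConfig = { ...entry.linker };
    delete linker.domains;
    config[id] = { ...entry, linker };
  }
  return { ...options, vars: { ...options.vars, config } };
}
```

Like the PHP filter, this keeps the linker key itself and only removes domains from it; whether an empty linker entry should also be dropped is a separate question that this sketch does not answer.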

@westonruter My understanding from the documentation when this was originally implemented is that linker.domains is necessary to track traffic correctly between an AMP and non-AMP version of the site too. It would be great to get clarity on what exactly the domains configuration parameter does, and on whether the following are handled automatically by Analytics or require some configuration:

  • Tracking between AMP and non-AMP versions of the same site (e.g. on same domain).
  • Tracking between AMP cache and origin.

For example, Site Kit configures Analytics linker with domains for both AMP and non-AMP, with that understanding in mind. If that is incorrect / not needed, it should be removed from both of these places I would assume.

Hi @felixarntz

domains, aka destinationDomains in the <amp-analytics> config, tells AMP which outgoing URLs should be decorated.

This section documents the destinationDomains matching behavior.

If all you need is to decorate the URLs from AMP cache to AMP origin on the same domain, I think the important part is to set the proxyOnly value to true, so that such decoration only happens from AMP cache to origin. You may then choose to leave destinationDomains as it is, or remove it and use the default value. Does this make sense? Thanks

@zhouyx Hello. I'm not sure I understand. The current amp-analytics being output by Site Kit looks like the following:

<amp-analytics type="gtag" data-credentials="include">
<script type="application/json">
{
    "vars": {
        "gtag_id": "UA-1234567-1",
        "config": {
            "UA-1234567-1": {
                "groups": "default",
                "linker": {
                    "domains": [
                        "example.com"
                    ]
                }
            }
        }
    },
    "optoutElementId": "__gaOptOutExtension"
}
</script>
</amp-analytics>

There's only one linker key here. Would it still apply? Should the domains key just be replaced with "proxyOnly": true? For a site that exists only on one domain (e.g. example.com), what makes the most sense as a default configuration?

I agree, we probably don't need to add the domain here at all. Was this required previously to get linking from the cache to origin to work correctly? To your question @felixarntz, I think we would need to test each scenario carefully to ensure removing the linker domain doesn't affect how the sessions are recorded.

That said, I think the bug reported here ("the _gl parameter including the origin domain when being served from origin") may indeed be a bug in amphtml. @westonruter if you turn off AMP on your site, I would expect the links to no longer include the _gl parameter. Can you verify that?

I was recently reviewing how the linker code works in AMP for the Newspack project (in relation to this ticket) and noticed a discrepancy between the way the gtag auto-linker code works vs. the autoLinker code in amphtml.

In the gtag.js case, when a linker domain matches browser.document.location.hostname it is explicitly excluded from being added to auto linking (for forms as well). In AMP that is not the case. Maybe we can fix this upstream? I will respond on the amphtml issue you opened.

I created a potential fix upstream that would prevent the AMP auto-linker from ever adding link decorations to the same domain: https://github.com/ampproject/amphtml/pull/32100.
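As a rough illustration of the exclusion described above, here is a minimal TypeScript sketch that drops the current hostname from a list of linker domains. The function name linkableDomains is hypothetical, and the exact, case-insensitive comparison is an assumption; the real matching rules in gtag.js or in the upstream pull request may differ.

```typescript
// Hypothetical sketch: keep only the linker domains that differ from the host
// the page is currently served from, so same-domain links are never decorated.
function linkableDomains(domains: string[], currentHostname: string): string[] {
  const current = currentHostname.toLowerCase();
  return domains.filter((domain) => domain.toLowerCase() !== current);
}
```

With this kind of exclusion in place, a configuration that lists only the site's own domain, as Site Kit's output does on a single-domain site, would leave nothing to decorate when the page is served from that domain.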

We can also test removing the linker section from Site Kit; maybe it was required previously and no longer is? I think we added this to ensure links from cache to origin were tracked as a single session. I'm guessing that was required previously and then became the default at some point.

if you turn off AMP on your site, I would expect the links to no longer include the _gl parameter, can you verify that?

Correct. If I turn off AMP then the _gl parameter is no longer added to same-origin navigations. Likewise, when I switch to Transitional mode and browse non-AMP pages I do not have the _gl parameter added; however, when I switch to browse the AMP versions, then the _gl parameter is added.

Reviewing this doc: https://support.google.com/analytics/answer/7486764?hl=en it clearly implies that you do need to add your domain to the linker->domains data to get cache to origin sessions tracked correctly.

This doc explains why we have the linker attribute in place, unless it is no longer true. Based on that we need to leave this in Site Kit and work on the upstream fix.
