Amphtml: Prevent self-referrals in AMP pages

Created on 24 Jan 2017 · 23 Comments · Source: ampproject/amphtml

Currently, when not overridden, the Viewer of an AMP page always reports window.document.referrer as the referrer.

As a side effect, AMP-ANALYTICS always reports the previously visited page as the referrer and does not keep the original referrer of the current session when, for example, a user visits a second page.

In the end, this means that a site like AMP Project, or a PWA that uses AMP Shadow, has almost all of its traffic reported as referral traffic.

I think AMP could have a notion of session, as it already has for the client in cid-impl.js, so that we can do things like preserve the referrer for the current session.
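
A minimal sketch of the session idea, not AMP code. The helper `getSessionReferrer`, the key name, and the in-memory storage are all hypothetical; the storage object only mimics the `getItem`/`setItem` shape of `window.sessionStorage` so the example runs outside a browser.

```javascript
// Hypothetical helper, not part of AMP: remembers the first referrer seen
// in a session and keeps returning it for later page views.
const SESSION_REFERRER_KEY = 'session-referrer'; // assumed key name

function getSessionReferrer(storage, currentReferrer) {
  const stored = storage.getItem(SESSION_REFERRER_KEY);
  if (stored !== null) {
    // A referrer was already recorded for this session; keep it.
    return stored;
  }
  storage.setItem(SESSION_REFERRER_KEY, currentReferrer);
  return currentReferrer;
}

// In-memory stand-in for window.sessionStorage so the sketch runs anywhere.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
  };
}

const storage = createMemoryStorage();
// First page view of the session: the user arrives from a search.
const firstReferrer = getSessionReferrer(storage, 'https://www.google.com/');
// Second page view: the browser now reports the site itself as the referrer.
const secondReferrer = getSessionReferrer(storage, 'https://www.ampproject.org/');
console.log(firstReferrer, secondReferrer);
```

Both calls return the search referrer, because the second page view finds the stored value. What counts as a session (timeouts, tabs, campaign changes) is left open here and would need a real definition.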

When Possible Feature Request analytics

All 23 comments

I'm not sure I follow. Going to www.ampproject.org and then clicking a link correctly updates the referrer to the last visited page.

  1. Visit https://www.ampproject.org
  2. Click any link
  3. In Inspector: window.services.viewer.obj.getReferrerUrl()
  4. Click another link
  5. In Inspector: window.services.viewer.obj.getReferrerUrl()

From the Viewer service's perspective this is correct; my concern is from an Analytics implementation perspective.

If I'm a user who arrives from an organic search, such as a Google search, AMP-ANALYTICS should report Google as the referrer. If the user then clicks a link within the site, the referrer is still Google, since it's still the same session and the source of that session is still Google.

My point is that perhaps AMP-ANALYTICS should not use the Viewer service to determine the referrer, but something more sophisticated that takes the current session into account.

The Google Analytics JavaScript implementation behaves like this.

Oohhh, I think I get it now:

  1. Search for [Amp By Example]
  2. Click top result
  3. Click any link (should cause a top page navigation, not iframe navigation)
  4. In Inspector: window.services.viewer.obj.getReferrerUrl()

In this case, I get https://ampbyexample-com.cdn.ampproject.org/v/s/ampbyexample.com/?amp_js_v=6 as my referrer.

More information from Google Analytics regarding this here

What I think is that perhaps we could have a Session service that keeps a session-based referrer, so url-replacement-impl.js could use it instead of the referrer from the Viewer.

/to @avimehta

Maybe also @mkhatib and @dvoytenko can take a quick look. We need to override the DOCUMENT_REFERRER to somehow either give the correct referrer in a SPA type app, or let the SPA/app shell override the header. Something simple like return ampDoc.getRoot().referrer || viewerForDoc(this.ampdoc).getReferrerUrl() may be all we need to allow the shell to take control.
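
A small sketch of the fallback suggested above, using plain stand-in objects rather than the real ampdoc and viewer services. The `referrer` property on the root is taken from the comment and is an assumption, and `resolveDocumentReferrer` is a hypothetical name.

```javascript
// Stand-ins for the AMP objects named in the comment above; the real
// ampdoc and viewer services have different, richer interfaces.
function resolveDocumentReferrer(root, viewer) {
  // Prefer a referrer supplied by the shell; otherwise ask the viewer.
  return root.referrer || viewer.getReferrerUrl();
}

const stubViewer = { getReferrerUrl: () => 'https://daily.spiegel.de/' };

// The shell sets an explicit referrer on the root it controls.
console.log(resolveDocumentReferrer({ referrer: 'https://google.com/' }, stubViewer));
// No override: the viewer's value is used.
console.log(resolveDocumentReferrer({}, stubViewer));
```

One consequence of using `||` is that an empty string counts as "not set", so a shell could not use this form to force an empty referrer; that would need an explicit check for `undefined`.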

I think the self-referrals occur because the referral exclusion has not been configured in the Analytics admin page.

However, the standard analytics.js does not send a referrer when the current domain and the referrer's domain are the same.
This is controlled by the "alwaysSendReferrer" option: https://developers.google.com/analytics/devguides/collection/analyticsjs/field-reference?hl=en#alwaysSendReferrer

To match the behaviour of analytics.js, the referrer should not be sent when the referrer's domain and the canonical URL's domain are the same.

/to @zhouyx could you please take a look?

Talked to @cramforce. One possible solution would be for us to provide a sessionId along with the referrer.
But I want to understand how Google Analytics handles issues like this. Does the logic live on the client side or the server side? @avimehta

Hello,

I'm trying to understand the issue and want to confirm that I understand it correctly. Here are the repro steps:

  1. Search for a news site on google to trigger the AMP carousel: like this
  2. Tap on first page.
  3. Swipe to the next page.
  4. Notice that both pages send google.com as the referrer.

From analytics perspective, this shows up as a user visiting a page on a site after clicking on a link from google, going back to google and then clicking on another page again. Am I understanding the issue correctly so far?

If the understanding is correct, possible solutions would be:

  • Do nothing. This is how the behaviour would have been measured and reported before the AMP carousel.
  • Somehow track that the referrer has been passed to a particular domain/page before and do not pass it again to the same page/domain while within the same carousel.
  • Modify Google Analytics and any other analytics processing to not restart a session but continue the session if a user visits from the same referrer back to back.

@avimehta The issue originally reported is not exactly that.

Imagine a site that has AMP as a canonical URL too, like Daily Spiegel, and you do the following steps from a Desktop:

  1. Perform this search in Google

  2. Follow the first search result to Daily Spiegel

  3. The first page view done by AMP-ANALYTICS will send the referrer google.com

  4. Inside Daily Spiegel, follow the link to the first article

  5. The second page view done by AMP-ANALYTICS will send the referrer daily.spiegel.de

In Google Analytics, this path will be counted as two visits instead of one (because the referrer changes): two visits of 1 PV / V each, the first coming from google.com and the second a self-referral from daily.spiegel.de

The expected behaviour, from an Analytics point of view, is that it should be 1 visit of 2 PV / V, and google.com as referral.

@avimehta I think the issue is different from what you describe. The issue happens for pages NOT in viewer.
amp-analytics will only report referral as window.document.referrer. When navigating inside the site later on, it will report itself as the referral instead of google.com

hmm. @oliverfernandez Does excluding the referrer using "Referral exclusions" setting https://support.google.com/analytics/answer/2795830?hl=en not work for this case?

You will need to exclude referrals from https://daily-spiegel-de.cdn.ampproject.org for it to work.

Note that even after you do this, the visits will currently be attributed as two visits because the client IDs in the cache and on your domain are different. We are working on a fix for that, though.

@avimehta As @zhouyx says, we are excluding here the AMP viewer, that's why I put as an example a site that has AMP as canonical URLs. No AMP cache involved (so no daily-spiegel-de.cdn.ampproject.org domain)

The path would be

https://google.com -> https://daily.spiegel.de -> https://daily.spiegel.de/news/...

I think I understand it now. I recorded a video of the walk through (apologies about the artifacts in the video). I think you are talking about hits 3 and 4. Is that correct?

I see a few ways of solving this:

  • You still use referral exclusion but specify https://daily.spiegel.de as the domain. This is not desirable because it will have to be done by all GA customers using AMP, but it fixes the issue today.
  • AMP supports a parameter that can be passed to DOCUMENT_REFERRER. If specified, the variable will be set to '' when the referrer and page domains are the same. GA can then use that param and skip sending the referrer for all of those hits.

I need to verify that GA will work correctly with the second solution but afaik, it should be fine with an empty &dr parameter.
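
A rough sketch of how the second option could behave, assuming "same domain" means an exact hostname match. The function name `documentReferrer` and the flag `suppressSameDomain` are hypothetical and are not AMP's actual parameter; the article path is made up for illustration.

```javascript
// Hypothetical resolver for a DOCUMENT_REFERRER-style variable. The flag
// name `suppressSameDomain` is an assumption, not AMP's actual parameter.
function documentReferrer(referrer, pageUrl, suppressSameDomain) {
  if (!suppressSameDomain || !referrer) {
    return referrer;
  }
  try {
    // Assumption: "same domain" is an exact hostname match.
    const sameHost = new URL(referrer).hostname === new URL(pageUrl).hostname;
    return sameHost ? '' : referrer;
  } catch (e) {
    // Unparseable URL: leave the referrer untouched.
    return referrer;
  }
}

// Arriving from search: the referrer is kept.
const fromSearch = documentReferrer(
  'https://google.com/', 'https://daily.spiegel.de/', true);
// Navigating inside the site: the referrer is blanked.
const internal = documentReferrer(
  'https://daily.spiegel.de/', 'https://daily.spiegel.de/news/article', true);

console.log('dr=' + encodeURIComponent(fromSearch));
console.log('dr=' + encodeURIComponent(internal));
```

With an exact hostname comparison, a move between subdomains of the same site would still count as a referral; whether that is desirable is a design decision this sketch does not settle.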

Ping - any update on this?

I like the second solution. @oliverfernandez Does that sound good to you?

I need to verify that GA will work correct with the second solution but afaik, it should be fine with an empty &dr parameter.

@avimehta can we confirm on GA's support?

@zhouyx Confirmed that GA works correctly.

I sent the following hits:

https://google-analytics.com/collect?v=1&t=pageview&tid=UA-XXX-Y&cid=1&dl=/

https://google-analytics.com/collect?v=1&t=pageview&tid=UA-XXX-Y&cid=1&dr=http://google.com&dl=/
https://google-analytics.com/collect?v=1&t=pageview&tid=UA-XXX-Y&cid=1&dr=&dl=/foo
https://google-analytics.com/collect?v=1&t=pageview&tid=UA-XXX-Y&cid=1&dr=&dl=/bar

As expected, the first hit created a direct session.
The second hit created a new session and attributed it to Google.
The third and fourth hits were added to the same session and the source was kept as Google.
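
The observed result can be reproduced with a toy model. This is not Google Analytics' real processing; it only encodes the behaviour seen in the four hits above, and `groupIntoSessions` is a hypothetical name.

```javascript
// Toy model only: this is NOT Google Analytics' real processing. It encodes
// just the behaviour observed in the four test hits above.
function groupIntoSessions(hits) {
  const sessions = [];
  for (const hit of hits) {
    const current = sessions[sessions.length - 1];
    if (hit.dr) {
      // A non-empty referrer starts a new session attributed to it.
      sessions.push({ source: new URL(hit.dr).hostname, pages: [hit.dl] });
    } else if (current) {
      // Missing or empty referrer: the hit joins the current session.
      current.pages.push(hit.dl);
    } else {
      // No session yet and no referrer: treat as direct traffic.
      sessions.push({ source: '(direct)', pages: [hit.dl] });
    }
  }
  return sessions;
}

// The four hits from the test, reduced to their dr and dl parameters.
const testHits = [
  { dl: '/' },
  { dr: 'http://google.com', dl: '/' },
  { dr: '', dl: '/foo' },
  { dr: '', dl: '/bar' },
];
const sessions = groupIntoSessions(testHits);
console.log(JSON.stringify(sessions));
```

In this model, a self-referral in the third hit would open a third session, which is the double counting the issue describes.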

I suggest we include the need to set up the referral exclusion in GA as part of the implementation guides
(https://www.ampproject.org/docs/reference/components/amp-analytics)
and AMP by Example.
I am happy to write the documentation if someone can let me know the best place to add it. Ideally, we can do this referral exclusion programmatically at some stage.

@zhouyx Yes, I think that not sending the referrer is the way to go, since that is what the GA JavaScript library does (it only sends the referrer in the first hit of the session).

@grantkemp Setting up referral exclusion in GA is the first approach below. I'd say let's first decide which approach to take. @rudygalfi

1. You still use referral exclusion but specify https://daily.spiegel.de as the domain. This is not desirable because it will have to be done by all GA customers using AMP, but it fixes the issue today.
2. AMP supports a parameter that can be passed to DOCUMENT_REFERRER. If specified, the variable will be set to '' when the referrer and page domains are the same. GA can then use that param and skip sending the referrer for all of those hits.

It sounds like #11027 fixes this according to solution 2 described previously.

@avimehta @lannka Can you confirm this will also cover the "https://google.com -> https://daily.spiegel.de -> https://daily.spiegel.de/news/..." case (thereby making the recommendation to add https://daily.spiegel.de to referral exclusion list obsolete)?

correct. This issue will get fixed by that PR.

