Session stitching from non-AMP to AMP pages breaks if the AMP page contains multiple <amp-analytics> tags and the first tag uses CLIENT_ID() to set a variable.
Create a non-AMP page that uses Google Analytics via analytics.js.
Create an AMP page with <amp-analytics> utilizing CLIENT_ID() to set a variable, e.g.
"vars": { "vid": "CLIENT_ID(AMP_ECID_GOOGLE)" }. Append <amp-analytics type="googleanalytics"> just _after_ the first <amp-analytics> tag.
https://www.google-analytics.com/r/collect beacon for the cid reported.https://www.google-analytics.com/r/collect beacon for the cid reported.Here is a minimal example the demonstrates the issue.
cmphys.comcid set in cookie _gl and sent in beacon www.google-analytics.com/r/collectcid sent in beacon www.google-analytics.com/r/collectThe order of the <amp-analytics> scripts changes the outcome. If you put <amp-analytics type="googleanalytics"> before the custom <amp-analytics> tag, the correct cid is used in the googleanalytics beacon. This can be tested in the sample website above by clicking "Visit the AMP page with <amp-analytics type="googleanalytics"> first" from the non-AMP page.
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36
1911062056110
/cc @ampproject/wg-analytics
Thanks for reporting. That definitely sounds like a bug to me. @micajuine-ho let's dig deeper into this after the CCPA work.
Does it still happen when using the amp-analytics variable instead of the platform variable, e.g.:
{ "vid": "${clientId(AMP_ECID_GOOGLE)}" }
(...assuming that I'm reading the docs correctly and this is valid syntax in the first place)
Sorry for the delay @mbrowne . I just confirmed that the issue still exists when using { "vid": "${clientId(AMP_ECID_GOOGLE)}" }. The cid is incorrect the first time the AMP page loads (a new cid like amp-abcabcabc is generated), and then is correct on subsequent visits to the AMP page (the cid generated by the non-AMP page is used).
I am investigating now. We think it might have something to do with the caching implementation of scope client id. Will update when I make progress
Hi @mattwomple, it seems that GA creates a _ga cookie, but then the custom amp-analytics tag creates a scope cookie through CLIENT_ID(AMP_ECID_GOOGLE). Then the googleanalytics analytics tag searches for the _ga cookie, but falls back to our amp provided scope cookie.
So both the custom analytics and googleanalytics tag return the amp provided scope cookie.
An easy way to fix this would be in the custom config change CLIENT_ID(AMP_ECID_GOOGLE) to CLIENT_ID(AMP_ECID_GOOGLE,,_ga) so that it will use the _ga cookie.
What do you think about this?
@micajuine-ho Thanks for the analysis. Based on your suggestion, I tested the following vid values in the custom amp-analytics config:
CLIENT_ID(AMP_ECID_GOOGLE) - cid is incorrect in Google beacon and matches vid in custom beaconCLIENT_ID(AMP_ECID_GOOGLE,,_ga) - cid is correct in Google beacon and matches vid in custom beaconCLIENT_ID(abc) - cid is correct in Google beacon and distinct from vid in custom beaconCLIENT_ID(abc,,_ga) - cid is correct in Google beacon and matches vid in custom beaconI'll start with: Thanks, I appreciate the workaround. I can share the _ga cookie in my application. I still consider this a _workaround_ though.
Then the googleanalytics analytics tag searches for the _ga cookie, but falls back to our amp provided scope cookie.
Isn't the behavior of CLIENT_ID(x,y,z): 1. search for cid scope definition x, and then if not available, 2. search for value in cookie z? In this case, I can see why Google Analytics might short-circuit the cid lookup from _ga since AMP_ECID_GOOGLE had already been generated by the custom amp config. But, this leaves me a little confused about the inconsistent results when visiting the AMP page on the first visit and on the second visit. Using the sample site in my original post:
https://android.cmphys.com/temp/analytics-stitching-canonical.html to set _gacid sent to google-analytics.com/r/collect is new (starts with amp-)cid sent to google-analytics.com/r/collect is the same as in #1 (does not start with amp-)These inconsistent results when visiting the AMP page suggest there is still something not behaving correctly. Do you agree?
@mattwomple
Isn't the behavior of CLIENT_ID(x,y,z): 1. search for cid scope definition x, and then if not available, 2. search for value in cookie z?
It actually finds the cookie value z first if present, and then returns the scope value if present. Just a nuance, so essentially the same as you outlined.
These inconsistent results when visiting the AMP page suggest there is still something not behaving correctly. Do you agree?
On the first visit to _Visit the AMP page with
amp-..., and stored in cache and memoryOn the second visit:
amp-... cookie stored in the browser from earlier and returns itamp-...., and instead uses the correct GA..... I'm not sure if this is the correct behavior but this it why it's happening. I've been told that the optional cookiename parameter was added for backwards compatibility sake, and that this final design is not the most optimal.
@micajuine-ho Thanks for taking the time to spell out the actions taken by amp-analytics on each page load. This is super helpful and will be a great reference for me to refer back to. This code block explains a lot:
cid.externalCidCache_[scope] is not defined if the cookie exists. So, as you mentioned, each call to CLIENT_ID() returns a distinct value _if distinct cookies exist_, even if their scope names match.
I can see a few options for improvement. Let me know if you think any of these are worth consideration:
cid.externalCidCache_[scope] could be set in this block. That would at least make the cid used in each analytics config consistent on each visit to a page, whether the cookie exists or not. This means that distinct cookies names would be ignored, though. The cookie name associated with the first use of a scope name would be used in all subsequent configs.cid.externalCidCache_[scope] could be set in this block _and_ the condition in this block could be changed fromif (cid.externalCidCache_[scope]) {if (cid.externalCidCache_[scope] && existingCookie) {AMP_ECID_GOOGLE becomes reserved and tied to cookie name _ga. Documentation is updated emphasize that a single scope name must only be used with one cookie name, or unexpected behavior could occur.@mattwomple I think that the third option is the way to go, as it involves no changes to the runtime, which could have unintended side effects to existing users.
Additionally, looking through the our list of vendor configs, there are only 4 analytics vendors who use the macro CLIENT_ID(AMP_ECID_GOOGLE,,_ga) (including googleanalytics and gtag). No other vendors use the the optional cookie name parameter. To me this indicates that it is the intended behavior for these vendors to use the _ga cookie created by googleanalytics.
@zhouyx What are your thoughts on how to proceed?
Read the thread. Thank you both for the detailed explanation.
Can I summarize the issue to how should CLIENT_ID(scope, , opt_cookieName) behave.
Current order:
cachedCid with same scope generated by other tag > cid from opt_cookieName > cid generated from scope
This caused inconsistency with different tag orders and even with multiple page load.
My understanding is that the cachedCid is a must because we need to keep the value same for all. And then we want to stick to the existing cookie if possible.
The order works correctly as long as all tags pair the AMP_ECID_GOOGLE along with _ga, which in my opinion should always be the case if one wants to use the GA cid.
@lannka Do you recall why we need a scope and an opt_cookieName
Also add @zikas for ideas.
CID API accepts two scopes. You can think of them as a required off-origin cookie scope (AMP cache), and an optional origin cookie scope.
We started with only one cookie scope, which is the AMP_ECID_GOOGLE here. The optional origin cookie scope was introduced because AMP wants to leverage the existing cookie on the origin (You example also showed this). When the optional origin cookie scope is not defined, the required scope will be used to read and store cookie on the origin instead.
In the GA case, they provide both the AMP_ECID_GOOGLE and _ga because of two reasons.
_ga is the cookie name for all non AMP pages, it would be impossible to change _ga to AMP_ECID_GOOGLE. Even we do so, we鈥檇 lost the benefit of getting existing client id on publishers origin.AMP_ECID_GOOGLE to _ga as you proposed is a good option. However the AMP team didn鈥檛 proceed with this because AMP_ECID_GOOGLE had been used for quite some time. Switching to a new scope name meant we would lose all the existing data, and would cause a one time reset of every publishers GA dashboard.Since we鈥檝e agreed that the provided optional cookie name should override the scope value. I think it makes sense to always return the value from opt_cookieName if it exists.
But still, in my understanding the scope and opt_cookieName come as a set, and should always be set together if one wants to leverage the id from a specific vendor.
PR #26130 is a reasonable approach to this problem. Cookie first, scope second, else generate new cookie and scope. From my earlier comment, the updated behavior would be:
CLIENT_ID(AMP_ECID_GOOGLE) - cid is correct in Google beacon and distinct from vid in custom beaconCLIENT_ID(AMP_ECID_GOOGLE,,_ga) - cid is correct in Google beacon and matches vid in custom beaconCLIENT_ID(abc) - cid is correct in Google beacon and distinct from vid in custom beaconCLIENT_ID(abc,,_ga) - cid is correct in Google beacon and matches vid in custom beaconImportantly, these results are consistent in both analytics-stitching-amp-googlelast.html and analytics-stitching-amp-googlefirst.html.
Most helpful comment
Thanks for reporting. That definitely sounds like a bug to me. @micajuine-ho let's dig deeper into this after the CCPA work.