Seeing a lot of broken CORS requests in some of our content due to the publisher subdomain being one-way hashed by the Google AMP CDN.
For example, this AMP doc:
According to the encoding documented in the Google AMP cache docs (https://developers.google.com/amp/cache/overview), the expected Google AMP cache subdomain for our example should be:
But this is being redirected to this one-way hashed subdomain:
According to the docs this is done in certain cases, but for unclear reasons:
Subdomains created by the Google AMP Cache will be human-readable when character limits and technical specs allow, and will closely resemble the publisher's own domain.
Where technical limitations prevent a human readable subdomain, a one-way hash will be used instead.
Not clear why this domain is subject to "technical limitations". Bug?
Anyway, in this example, the Origin is now presented as 7ai3pvhopvh4fo4mm3sukbcq2wp6ubqw5ndoini2u3d6epbxubaa.cdn.ampproject.org, which does not pass our CORS header validation as specified by the AMP docs.
So how should publishers be validating these one-way hashed publisher subdomains?
Unassigning so the AMP Cache team can take a look.
Hello!
The AMP Cache provides a URL API that can give you the AMP URL and AMP Cache URL for a specific URL. See this reference: https://developers.google.com/amp/cache/reference/acceleratedmobilepageurl/rest/
You should be able to use that API to determine the correct host to use in the CORS header that will pass validation.
Would this be suitable for your case?
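For illustration, a lookup against that API could be sketched roughly as below. This is a minimal sketch, not an official client: the endpoint, the `X-Goog-Api-Key` header, and the `ampUrls` / `originalUrl` / `cdnAmpUrl` response fields follow the linked reference as I understand it, while the helper names (`build_batch_get_request`, `cache_origins`, `fetch_cache_origins`) are hypothetical and the API key is something you would have to supply yourself.

```python
import json
import urllib.request
from urllib.parse import urlsplit

# Endpoint as described in the AMP URL API reference linked above.
API_ENDPOINT = "https://acceleratedmobilepageurl.googleapis.com/v1/ampUrls:batchGet"


def build_batch_get_request(urls, api_key):
    """Build (but do not send) a batchGet request for the given URLs."""
    body = json.dumps({"urls": list(urls)}).encode("utf-8")
    return urllib.request.Request(
        API_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json", "X-Goog-Api-Key": api_key},
        method="POST",
    )


def cache_origins(response):
    """Map each original URL to the origin (scheme + host) of its cache URL."""
    origins = {}
    for entry in response.get("ampUrls", []):
        parts = urlsplit(entry["cdnAmpUrl"])
        origins[entry["originalUrl"]] = f"{parts.scheme}://{parts.netloc}"
    return origins


def fetch_cache_origins(urls, api_key):
    """Call the API and return the cache origin for each URL it resolves."""
    with urllib.request.urlopen(build_batch_get_request(urls, api_key)) as resp:
        return cache_origins(json.load(resp))
```

Because this makes a network call, it is better suited to an offline check when a new domain is set up than to the request path.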
Apologies, I spoke too soon: I didn't realize the hashed URL was the result of a redirect, and the AMP URL API also returned the non-hashed version. Let me investigate further what's triggering the redirect.
The technical limitation that is causing the hashed URL is step #1: "Converting the AMP document domain from IDN (Punycode) to UTF-8." Since this fails, we redirect to the hashed subdomain. I believe this is working as intended.
The bug is that the AMP URL API currently does not reflect the hashed URL; it returns the non-hashed URL that you originally predicted. We are going to fix this so that the API also returns the hashed URL.
Once the AMP URL API is fixed, you should be able to use it to correctly predict what to use in the Origin header for CORS.
Tracked internally as 111651862
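For readers wondering what the hashed fallback looks like, here is a rough sketch. It assumes the fallback is the SHA-256 digest of the publisher domain, base32-encoded, lowercased, and stripped of padding. That assumption matches the 52-character shape of the hashed subdomain reported above, but it is not confirmed anywhere in this thread, so treat it as illustrative only. The function name `fallback_hash_subdomain` is hypothetical.

```python
import base64
import hashlib


def fallback_hash_subdomain(domain):
    """Return an assumed hashed cache subdomain for a publisher domain.

    Assumption (not confirmed in this thread): SHA-256 of the domain,
    base32-encoded, lowercased, with the '=' padding removed.
    """
    digest = hashlib.sha256(domain.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


# Hypothetical domain; the result is always 52 characters from [a-z2-7].
print(fallback_hash_subdomain("example.com"))
```

If the fallback really is a plain digest of the domain, the same domain would always produce the same subdomain, but only the cache team can confirm that.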
Thanks @jgluk! A few follow-up questions:
The technical limitation that is causing the hashed URL is step #1: "Converting the AMP document domain from IDN (Punycode) to UTF-8." Since this fails, we redirect to the hashed subdomain. I believe this is working as intended.
It's not clear to me why this conversion is failing in this example, since the conversion shouldn't result in a change as far as I can see (the domain isn't using any extended unicode chars.) Can you help me understand why this one is failing?
You should be able to use that API to determine the correct host to use in the CORS header that will pass validation.
Since this is a one-way hash, and hitting this API during a user request is not realistic, a publisher would need to check all of their domains against this API before publishing on them, and keep a whitelist of hashed domains to validate the Origin against. This requires that the algorithm (both the criteria for hashing and the hash output) remain stable. Is that a safe assumption?
This should all be very well documented in the AMP CORS docs. Also, could this hashing be somehow surfaced through the Google Search Console for AMP, to give publishers better insight into the fact their CORS request domains may be subject to hashing?
@jasti Can this be tracked as an AMP CORS docs issue in addition to the AMP cache bug referenced by @jgluk? I think AMP needs to provide more documentation re: AMP CORS validation in the case where Google CDN decides to hash the publisher subdomain, though it's unclear to me at this point what the correct recommendation should be.
@src-code I looked more into the specifics of that URL and why it's failing. In the AMP URL format, there are 4 steps to convert the domain into an AMP subdomain, part of which involves IDN/Punycode. IDN does not accept hyphens as the third and fourth characters, presumably because only the prefix "xn--" is allowed for internationalized domains. Therefore, es--us-vida--estilo-yahoo-com ends up being rejected due to the "es--" prefix.
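To make the failure concrete, here is a small sketch of the check described above. It is a simplification that assumes an ASCII-only domain, so the two IDN/Punycode conversion steps are skipped and only the documented character replacements are applied ("-" becomes "--", then "." becomes "-"). The function names and the domain `ab-cd.example.com` are hypothetical.

```python
def to_cache_subdomain(domain):
    """Apply the documented replacements to an ASCII-only domain.

    Each "-" becomes "--" and each "." becomes "-". The IDN/Punycode
    conversion steps are intentionally left out of this sketch.
    """
    return domain.replace("-", "--").replace(".", "-")


def has_reserved_hyphens(label):
    """True if characters 3 and 4 are hyphens and the label is not 'xn--...'."""
    return label[2:4] == "--" and not label.lower().startswith("xn--")


# The subdomain from this thread trips the check because of its "es--" prefix.
print(has_reserved_hyphens("es--us-vida--estilo-yahoo-com"))  # True

# A hypothetical domain with a hyphen as its third character does the same.
print(to_cache_subdomain("ab-cd.example.com"))  # ab--cd-example-com
print(has_reserved_hyphens(to_cache_subdomain("ab-cd.example.com")))  # True
```

In other words, any domain whose third character is a hyphen would end up with hyphens in positions three and four after encoding, and so would be rejected by this rule.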
@pbakaus what do you think about improving our AMP Cors docs mentioned above? https://github.com/ampproject/amphtml/issues/16779#issuecomment-412693578
@jasti I think that would be worthwhile, but I'm super confused about all of this. I think we need an engineer to write this up first, and then we can mold it into the docs.
Hi @pbakaus - In summary:
Right now, the AMP CORS docs tell publishers to validate the Origin header as https://<publisher's subdomain>.cdn.ampproject.org, but this is incomplete. Google CDN may, at its discretion, redirect users to a CDN URL where the publisher subdomain is hashed into a string that isn't human-readable, which will result in CORS requests being made with an Origin that doesn't pass the validation described above.
This hashing is not mentioned anywhere in the AMP docs, so it is easy for publishers to be unaware it is happening and breaking their CORS requests when their docs are served from the Google AMP cache.
My understanding is that publishers would therefore need to verify each of their domains on the Google CDN to see if they're being redirected to a hashed domain, and if they are, whitelist those hashed subdomains in their CORS header validation along with the standard publisher subdomain patterns. (However it's unclear to me if this hash will be stable enough to be whitelisted by the publisher like this.)
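As a rough illustration of that whitelist approach, the Origin check could look something like the sketch below. The function name, the `example-com` readable subdomain, and the idea of keeping two sets are assumptions for the example; the hashed subdomain is the one reported earlier in this issue.

```python
from urllib.parse import urlsplit

AMP_CACHE_SUFFIX = ".cdn.ampproject.org"


def is_allowed_amp_cache_origin(origin, readable_subdomains, hashed_subdomains):
    """Accept only https origins on the AMP cache with a known subdomain."""
    parts = urlsplit(origin)
    host = parts.hostname or ""
    # Reject non-https schemes and anything beyond a bare "https://host".
    if parts.scheme != "https" or origin.lower() != f"https://{host}":
        return False
    if not host.endswith(AMP_CACHE_SUFFIX):
        return False
    subdomain = host[: -len(AMP_CACHE_SUFFIX)]
    return subdomain in readable_subdomains or subdomain in hashed_subdomains


# Hypothetical publisher configuration.
READABLE = {"example-com"}
HASHED = {"7ai3pvhopvh4fo4mm3sukbcq2wp6ubqw5ndoini2u3d6epbxubaa"}

origin = "https://7ai3pvhopvh4fo4mm3sukbcq2wp6ubqw5ndoini2u3d6epbxubaa.cdn.ampproject.org"
print(is_allowed_amp_cache_origin(origin, READABLE, HASHED))  # True
```

This only works if the hashed subdomain for a given domain stays stable over time, which is the open question above.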
It's also a bit of a pain right now for publishers to test and/or discover that their domain is being hashed like this - it'd be nice if this could be surfaced more easily somehow.
Hope this helps!
thanks, this helps! assigning to Crystal to prioritize.
Closing this bug since it is specific to the AMP Cache. That project is tracking it internally.
Hi @Gregable - is the AMP CORS docs update being tracked internally as well? I was thinking this bug was now tracking the doc update task? Or do we need a new issue to track the doc updates?
I misunderstood what's left of this issue. Reopening and updating.
The AMP URL API should be working again as expected so it should help with the CORS issue in the meantime.
This is a high priority issue but it hasn't been updated in a while. @CrystalFaith Do you have any updates?
This is a high priority issue but it hasn't been updated in a while. @CrystalOnScript Do you have any updates?
@CrystalOnScript for updates on this P1 issue.
@jgluk - Does the update to the AMP URL API change this URL hashing issue in any way? Is a documentation update still needed?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.