Amphtml: Font file binary contents being modified by CDN

Created on 21 Nov 2019  ·  18 Comments  ·  Source: ampproject/amphtml

What's the issue?

An AMP page fails to load its fonts because they appear to be corrupted by the CDN. The original page loads a 13 kB font named icomoon.ttf, while the CDN page loads a 15 kB version of the same file that does not work.
Comparing the binary contents of both files, it looks like the contents are being encoded as UTF-8 at some point: the size difference comes from many bytes of the original file being replaced by the byte sequence for � (0xEFBFBD).
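
One way to check for this kind of damage is to count how often the bytes EF BF BD (the UTF-8 encoding of U+FFFD) occur in the served file. The following is a minimal sketch; the function names and the sample bytes are made up for illustration and are not taken from the actual font.

```python
# UTF-8 encoding of U+FFFD, the Unicode replacement character.
REPLACEMENT = b"\xef\xbf\xbd"


def count_replacement_sequences(data: bytes) -> int:
    """Count non-overlapping occurrences of EF BF BD in the payload."""
    return data.count(REPLACEMENT)


def summarize(original: bytes, served: bytes) -> dict:
    """Report sizes, growth and replacement-sequence count for two payloads."""
    return {
        "original_size": len(original),
        "served_size": len(served),
        "growth": len(served) - len(original),
        "replacement_sequences": count_replacement_sequences(served),
    }


# Hypothetical sample data: two bytes that are invalid UTF-8 (0x80, 0xFF)
# have each been replaced by the three-byte replacement sequence.
sample_original = bytes([0x00, 0x01, 0x00, 0x00, 0x80, 0xFF, 0x41])
sample_served = b"\x00\x01\x00\x00" + REPLACEMENT * 2 + b"A"
report = summarize(sample_original, sample_served)
```

A legitimate binary file can contain EF BF BD by coincidence, so a non-zero count is a hint rather than proof. A large count together with a larger file size is a much stronger signal.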

How do we reproduce the issue?

Compare the size and contents of the two files linked below.
The original one: https://www.collinsdictionary.com/external/fonts/icomoon.ttf?1pqdoj&version=4.0.17 and you will get these headers:

accept-ranges: bytes
access-control-allow-origin: https://www.collinsdictionary.com
age: 248062
cache-control: max-age=2592000
cf-cache-status: HIT
cf-ray: 539216e869a5eeee-CDG
content-length: 13292
content-type: application/octet-stream

The one served by the CDN: https://www-collinsdictionary-com.cdn.ampproject.org/r/s/www.collinsdictionary.com/external/fonts/icomoon.ttf?1pqdoj&version=4.0.17 and you will get these headers:

accept-ranges: bytes
access-control-allow-origin: *
alt-svc: quic=":443"; ma=2592000; v="46,43",h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000
cache-control: private, max-age=2591999
content-disposition: attachment
content-length: 15671
content-type: application/octet-stream

You can also compare the final page with the broken font, https://www.google.com/amp/s/www.collinsdictionary.com/amp/english/like_1, against the working one: https://www.collinsdictionary.com/amp/english/like_1
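
The comparison can also be scripted. This is a rough sketch using only the Python standard library; `fetch` and `compare_payloads` are hypothetical helper names, and the URLs above may no longer reproduce the problem.

```python
import hashlib
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download the body of a URL as raw bytes (no text decoding)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def compare_payloads(origin: bytes, cached: bytes) -> dict:
    """Compare two payloads by size, equality and SHA-256 digest."""
    return {
        "origin_size": len(origin),
        "cached_size": len(cached),
        "identical": origin == cached,
        "origin_sha256": hashlib.sha256(origin).hexdigest(),
        "cached_sha256": hashlib.sha256(cached).hexdigest(),
    }
```

To use it, pass the origin URL and the AMP Cache URL to `fetch` and hand both results to `compare_payloads`. For an intact font, `identical` should be true and the two sizes should match.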

What browsers are affected?

All browsers

Which AMP version is affected?

N/A

Soon Bug caching

All 18 comments

@jgluk

Is this something that broke recently, or has it always been broken?
If the former, when did you notice it was broken?

@jgluk We first noticed problems with the fonts around August 1st. It might also be relevant that this was around the time we activated AMP Real URL at Cloudflare.

I believe this can be worked around by changing the content-type header field value to font/ttf or application/font-ttf or application/x-font-ttf.

Internally this is http://b/139003573.

@ramongtx Thanks for the anecdata that it correlated with your AMP Real URL launch. On the Google AMP Cache side, I can't think of any changes we made in this area -- plus most of our changes were many months prior to August. So my first suspicion is that activating AMP Real URL caused your origin response headers to change? But I will also search our code history for relevant changes around this date.

@maciejmackowiak Did you fix it after posting this? When I change the cachebust param on the AMP Cache URL, it appears to have the correct size.

@twifkak thanks for the quick reply,

I'm aware that changing cachebust "solves" the issue but the fun part is we didn't change anything on our end to fix it. So that's the problem - why did it happen? And how to avoid it in the future?
We haven't touched the icon font in some time, so the issue looks random.

Also, changing the cachebust query param would not even be a good quick fix as it would also require invalidating the AMP Cache of every single AMP page for androidpolice.com.

The result is that their icon font is corrupted on the cache, and icons don't display.

I'm seeing the issue with AP fixed now in Incognito as well as in a desktop browser, but in my main Chrome for Android it's still not loading the font right. I bet it's the local browser cache.

So yeah, this has now happened a second time, and it's unclear why. It seems something on the CDN side corrupts the file.

@maciejmackowiak Did you happen to save the corrupted versions for examination here? Could you post them and the originals in some immutable place so they could be diffed later by any interested parties?

@morsssss @westonruter Do you have any insight into what may be happening here? I wouldn't want to create a cronjob to dump CDN caches for the fonts every 30 minutes, which I think would be the correct solution here instead of changing the cachebust param.

@twifkak We are also using CF Real URL, but I believe the first time it happened was before AMP launch and to our dev version of the site, which would have been before CF Real URL was even available to us.

Just looked into our ticket history. CF told us AMP Real Url was turned on for androidpolice.com on July 24.

Hi Artem,

We turned on AMP Real URL for your site androidpolice.com on July 24. Sorry this wasn't communicated to you earlier. We don't have a good way of emailing our Enterprise customers directly and I will try to fix this as soon as possible.

On July 26, I created the original ticket about dev:
(image: screenshot of the ticket, not preserved)

One thing I'm not clear about is if the AMP Real URL feature applies to just the main domain or subdomains too. If it's subdomains, the timing is indeed quite suspicious.

But at the same time, how would it explain what we're seeing?

@archon810 , I think we've got something for you here. Please stay tuned...

For further investigation, here are the corrupted and original files:
corrupted.fonts.zip

Maybe it will help somehow.

Hi all - thanks for the reports. I believe I've diagnosed the issue (has to do with SXG), and have an idea for a fix. I'll run some tests & have an update tomorrow.

I believe this is due to an interaction between the Google AMP Cache and the Cloudflare AMP Real URL worker.

When a request is made to an AMP Real URL-enabled site with SXG + crawler headers, then the response is sent through a UTF-8 round-trip. AMP documents are UTF-8 so this is fine, but for non-UTF-8 resources (like fonts), this results in corruption.
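
The effect of such a round-trip can be reproduced in a few lines. This is only an illustration using Python's own decoder with replacement of invalid bytes; how the worker actually performs the conversion is not stated in this thread, and the sample bytes are made up.

```python
def utf8_round_trip(data: bytes) -> bytes:
    """Decode bytes as UTF-8 (replacing invalid input) and encode them again."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


# Valid UTF-8 text survives the round-trip unchanged.
text = "AMP document: café".encode("utf-8")
text_after = utf8_round_trip(text)

# Arbitrary binary data does not: 0x80 and 0xFF are invalid in UTF-8,
# so each one becomes U+FFFD, which encodes to the bytes EF BF BD.
binary = bytes([0x00, 0x01, 0x00, 0x00, 0x80, 0xFF, 0x41])
binary_after = utf8_round_trip(binary)
```

Here `text_after` equals `text`, while `binary_after` grows from 7 to 11 bytes and can no longer be turned back into the original, because the replaced byte values are lost.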

Currently, the Google AMP Cache sends SXG request headers if either:

  • It's a cache miss and the request is for an AMP document
  • It's a cache hit (async update)

I'm changing the cache hit behavior to match the cache miss behavior -- only send for AMPHTML URLs. I think this will fix it, but there's a lot of moving parts so I'll reserve certainty. The release typically takes a couple of weeks; I'll update this bug once that's happened.
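
The described change can be summarized as a pair of predicates. This is a hypothetical model of the behavior as stated above, not code from the Google AMP Cache; `FetchContext` and both function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchContext:
    """Hypothetical description of one origin fetch made by the cache."""
    cache_hit: bool
    is_amp_document: bool


def sends_sxg_headers_before(ctx: FetchContext) -> bool:
    """Old behavior: cache miss for an AMP document, or any cache hit."""
    return (not ctx.cache_hit and ctx.is_amp_document) or ctx.cache_hit


def sends_sxg_headers_after(ctx: FetchContext) -> bool:
    """New behavior: only for AMP documents, regardless of hit or miss."""
    return ctx.is_amp_document


# A font that is already cached and is being refreshed asynchronously.
font_refresh = FetchContext(cache_hit=True, is_amp_document=False)
```

Under this model, `font_refresh` is the case that changes: it triggered SXG request headers before the fix and does not afterwards. That would also fit the observation that a fresh cachebust URL (a cache miss) returned an intact file.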

In the meantime, a workaround would be to put fonts on a separate domain that does not have AMP Real URL enabled. I know that's super inconvenient, but that should work immediately anyway.

Hi all, the fix is rolled out to prod and I believe it's working. Please reopen if you find a counterexample.
