While debugging a slow-running Gatsby build with a Netlify support person, I was told that cache-busting urls aren't required on Netlify, and in fact artificially slow down the builds.
Say you're building 1000s of static pages, each referencing the same CSS/JS assets. If the url of any of those assets changes, then even if there are no changes to the actual static page _content_, it invalidates the Netlify build cache for every page. Every one of those pages then has to go through Netlify's build processing, which can take 10s of minutes if there are a lot of pages. If instead the text of those pages remained stable between builds (assuming no actual content changes, of course), the build time would be dramatically reduced.
Some of the things the Netlify person said:
I understand that some content changes but I find it hard to believe that every page needs its content changed and suspect that the likely cause of that is an asset fingerprinting plugin from your build framework that changes a filename for a css or js file that is included in every file. Worth looking into if you want to make the build really fast :)
While we can do that, I think a BETTER approach would be to make sure you don't change EVERY page with EVERY build. As it stands you have updated thousands of pages with a build (you can see it in this example screenshot), and this is an antipattern - if the content doesn't change (by checksum), we don't re-upload or re-process, which would save you time regardless of processing settings.
Some Netlify articles discussing their caching strategy:
Prior discussion:
I'm no caching expert, just throwing this out there! But I'm assuming the cache invalidation strategy they discuss above validates the notion that cache busting urls aren't required.
If turning off cache busting urls on Netlify builds turns out to be a sensible notion, would a config option on gatsby-plugin-netlify be appropriate?
It sounds like the problem is Netlify's build processing (no idea what they mean by that). Gatsby is designed so you can build the page and immediately send it to the CDN. But in general, I'm not really sure what they mean.
It'd be quite rare that every page changes on every build.
Can you post the logs from a build? I'm curious what is taking time.
Gatsby's part of the build was taking a similar amount of time as it does on my local system: about 25s for ~2800 pages, which is fine. Here's a snippet of the logs from what happens afterwards:
3:12:08 PM: Build script success
3:12:08 PM: Starting to deploy site from 'public'
3:16:59 PM: Starting post processing
3:17:17 PM: Finished processing build request in 7m30.83631574s
3:23:13 PM: Post processing done
3:23:13 PM: Site is live
So that is about 5 mins to upload 150mb of files, and then a further ~7 mins for Netlify to do its "post-processing". I didn't have any actual post-processing options turned on in the Netlify UI, so I gather the post-processing above is part of Netlify's processing for caching and delivery to its CDN etc.
In the deploy summary:
All I'm going on here is that the Netlify person identified the sheer number of changed assets as the culprit, and if cache busting urls weren't used (which apparently aren't required) then this quantity would reduce dramatically for most builds (that didn't touch the content of the pages).
This was surprising and confusing to me too, which is why I thought I'd raise it. Quite willing to accept the problem might be with their build processing, and/or I've been given a bum steer.
Hi @jedrichards,
I work at Netlify, and I just wanted to add a bit more context here. In case anyone is unaware, we process every NEW file in your deploy. During this processing we check for insecure links in html files, we look for forms for netlify form handling, image and other assets that we may process, etc. Most of this can be disabled, which speeds up the processing. Some things, like form processing, can only be disabled if you contact support (we expect to make it possible to disable from the UI at some point in the future, but there's no timeline for this).
I mentioned we only do this for new files, meaning if a file is unchanged between deploys, we DON'T process it again, and we don't re-upload it either since we maintain blobs of uploaded files in our origin servers. This means if you have a deploy with 10,000 files, but only 2 of them are new pages and none of the others changed, we don't do anything with the others, we only process the 2 new pages and upload those 2 pages, and otherwise create references to the existing files so when we push the deploy to our CDN, the nodes know which files to serve.
The issue that Jed is describing is that if you are doing cache busting, and that cache busting is changing the contents of all of your assets, that means we will end up re-processing ALL of the files in every single deploy. This will slow down your deploy since it means you can't take advantage of the optimizations we have. If all you do is rename a file, this shouldn't break things since the SHA is only on file contents and not metadata. But if the cache busting makes even a minor change to the file contents then we will end up reprocessing it and re-uploading it.
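To make the above concrete, here is a minimal Python sketch of content-based deduplication. It is only an illustration of the idea, not Netlify's implementation: the `plan_deploy` helper, the choice of SHA-1, and the file names are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List


def content_sha(data: bytes) -> str:
    """Digest of the file contents only; the file name plays no part."""
    # Assumption: SHA-1 stands in for whatever digest the host really uses.
    return hashlib.sha1(data).hexdigest()


def plan_deploy(previous: Dict[str, bytes], current: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Split the paths of `current` into content that must be uploaded and
    processed, and content that already exists as a stored blob."""
    known = {content_sha(data) for data in previous.values()}
    plan: Dict[str, List[str]] = {"upload": [], "reuse": []}
    for path in sorted(current):
        digest = content_sha(current[path])
        if digest in known:
            plan["reuse"].append(path)
        else:
            plan["upload"].append(path)
            known.add(digest)
    return plan


# Hypothetical deploys: the script is only renamed, but the page that
# references it now contains a different URL, so the page content differs.
previous = {
    "app-abc.js": b"console.log(1)",
    "index.html": b'<script src="/app-abc.js"></script>',
}
current = {
    "app-def.js": b"console.log(1)",
    "index.html": b'<script src="/app-def.js"></script>',
}
print(plan_deploy(previous, current))
# {'upload': ['index.html'], 'reuse': ['app-def.js']}
```

The renamed script is free because its bytes are unchanged, while the page pays the cost because the new file name is written into its HTML. With thousands of pages referencing the same renamed asset, every one of them lands in the upload list.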
Thanks @futuregerald!
I think it would be interesting to see exactly _what_ is changing in your files between builds — perhaps there is something meaningless we're changing in files that could be avoided. If you want to look into that, please report back with your findings!
But as I don't see any immediate next actions for this, I'll close out this issue.
@KyleAMathews @jedrichards @futuregerald I am experiencing the very same thing when deploying to Netlify, and I have a working demo that proves that changing only one file changes ALL files between builds.
The root of the problem is webpack-runtime-[HASH].js, which changes in every build and is included in all pages.
Please repeat the following set of commands to reproduce the problem:
1) Build a gatsby site
wget https://github.com/asilgag/gatsby-benchmark/archive/master.zip
unzip master.zip
cd gatsby-benchmark-master/markdown-8000/
npm install
gatsby build
2) Save the generated "public" folder, make a change only in one file, and build it again:
mv public/ public.first
echo `date` >> src/pages/articles/2019/01/01/test-001.md
gatsby build
3) Compare previous and current public folder, and you will see that ALL pages have changes
diff --brief -Nr public.first/ public/
Output from diff:
Files public.first/404/index.html and public/404/index.html differ
Files public.first/404.html and public/404.html differ
Files public.first/articles/2019/01/01/test-001/index.html and public/articles/2019/01/01/test-001/index.html differ
Files public.first/articles/2019/01/01/test-002/index.html and public/articles/2019/01/01/test-002/index.html differ
Files public.first/articles/2019/01/01/test-003/index.html and public/articles/2019/01/01/test-003/index.html differ
Files public.first/articles/2019/01/01/test-004/index.html and public/articles/2019/01/01/test-004/index.html differ
Files public.first/articles/2019/01/01/test-005/index.html and public/articles/2019/01/01/test-005/index.html differ
Files public.first/articles/2019/01/01/test-006/index.html and public/articles/2019/01/01/test-006/index.html differ
Files public.first/articles/2019/01/01/test-007/index.html and public/articles/2019/01/01/test-007/index.html differ
Files public.first/articles/2019/01/01/test-008/index.html and public/articles/2019/01/01/test-008/index.html differ
Files public.first/articles/2019/01/01/test-009/index.html and public/articles/2019/01/01/test-009/index.html differ
Files public.first/articles/2019/01/01/test-010/index.html and public/articles/2019/01/01/test-010/index.html differ
Files public.first/articles/2019/01/01/test-011/index.html and public/articles/2019/01/01/test-011/index.html differ
Files public.first/articles/2019/01/01/test-012/index.html and public/articles/2019/01/01/test-012/index.html differ
Files public.first/articles/2019/01/01/test-013/index.html and public/articles/2019/01/01/test-013/index.html differ
Files public.first/articles/2019/01/01/test-014/index.html and public/articles/2019/01/01/test-014/index.html differ
Files public.first/articles/2019/01/01/test-015/index.html and public/articles/2019/01/01/test-015/index.html differ
Files public.first/articles/2019/01/01/test-016/index.html and public/articles/2019/01/01/test-016/index.html differ
Files public.first/articles/2019/01/01/test-017/index.html and public/articles/2019/01/01/test-017/index.html differ
Files public.first/articles/2019/01/01/test-018/index.html and public/articles/2019/01/01/test-018/index.html differ
Files public.first/articles/2019/01/01/test-019/index.html and public/articles/2019/01/01/test-019/index.html differ
Files public.first/articles/2019/01/01/test-020/index.html and public/articles/2019/01/01/test-020/index.html differ
Files public.first/articles/2019/01/01/test-021/index.html and public/articles/2019/01/01/test-021/index.html differ
Files public.first/articles/2019/01/01/test-022/index.html and public/articles/2019/01/01/test-022/index.html differ
Files public.first/articles/2019/01/01/test-023/index.html and public/articles/2019/01/01/test-023/index.html differ
Files public.first/articles/2019/01/01/test-024/index.html and public/articles/2019/01/01/test-024/index.html differ
...
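To see what actually differs inside one of those pages, a small Python sketch like the following can compare the asset references of the two versions. The `changed_refs` helper and the sample hashes are made up for illustration, and the regex only handles double-quoted `src`/`href` attributes.

```python
import re
from typing import List, Set, Tuple

# Matches the value of src="..." or href="..." (double-quoted only, for brevity).
ASSET_REF = re.compile(r'(?:src|href)="([^"]+)"')


def asset_refs(html: str) -> Set[str]:
    """Collect every URL referenced through a src or href attribute."""
    return set(ASSET_REF.findall(html))


def changed_refs(old_html: str, new_html: str) -> Tuple[List[str], List[str]]:
    """Return (references only in the old page, references only in the new page)."""
    old, new = asset_refs(old_html), asset_refs(new_html)
    return sorted(old - new), sorted(new - old)


# Hypothetical page contents; the hashes are invented for the example.
old_page = '<link href="/styles.css"><script src="/webpack-runtime-aaa111.js"></script>'
new_page = '<link href="/styles.css"><script src="/webpack-runtime-bbb222.js"></script>'
print(changed_refs(old_page, new_page))
# (['/webpack-runtime-aaa111.js'], ['/webpack-runtime-bbb222.js'])
```

Against the real build output, the two strings would come from reading the same page in both folders, for example `public.first/404.html` and `public/404.html`.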
@futuregerald Thanks for the info! Can you confirm whether Netlify's cache invalidation strategy means that cache busting urls are optional? E.g. when a site is redeployed and a user revisits the site, they'll see latest assets whether cache busting urls have been used or not?
@jedrichards, I also work at Netlify and, yes, this is exactly correct. Because of our cache invalidation, the cache busting URLs are not only optional - they are an anti-pattern when a site is hosted with Netlify.
Now, the hash in the filename is a great idea for many places where Gatsby sites are hosted, but at Netlify it is "reinventing the wheel" so to speak.
At Netlify, we recommend not including hashes in file/asset names because our service is doing a similar thing using an etag HTTP response header.
So there is still a hash, but it isn't part of a filename. It's in the response headers instead (and automatically managed by our CDN).
If the asset has changed, this etag header will also change. This forces a fresh download of the new version by the web browser anytime a file/asset changes.
This happens even when the filenames are unchanged but the file content is different.
This also has a nice side effect that, when files do not change between deploys, a new request for the asset will return a 304 (not modified).
There is a blog post with more details (written by the head of our support team) here:
https://www.netlify.com/blog/2017/02/23/better-living-through-caching/
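The etag revalidation described in this comment can be sketched roughly as below. This is a simplified Python illustration under stated assumptions, not Netlify's CDN logic: the `respond` helper is hypothetical, the etag is assumed to be a quoted SHA-256 of the body, and real `If-None-Match` handling (lists of validators, weak validators) is left out.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a validator from the response body.
    Assumption: a quoted SHA-256 hex digest; real servers may differ."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes, str]:
    """Return (status, payload, etag) for a GET carrying an optional
    If-None-Match request header."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The browser's cached copy is still current: no body is sent.
        return 304, b"", etag
    # First visit, or the content changed under the same file name.
    return 200, body, etag


status, payload, etag = respond(b"body { color: red }", None)       # first visit
status_again, _, _ = respond(b"body { color: red }", etag)          # unchanged file
status_changed, _, _ = respond(b"body { color: blue }", etag)       # new deploy
print(status, status_again, status_changed)
# 200 304 200
```

The file name never changes in this example; only the etag does, which is enough for the browser to pick up the new content.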
We too are experiencing long deploy times on Netlify due to this.
Our 5k pages site takes more than 15 minutes for processing and uploading.
We are following #12066 by @asilgag hoping for some news on this.
@federicobadini
We've created a plugin that removes the cache-busting JavaScript filenames:
https://www.npmjs.com/package/gatsby-plugin-remove-fingerprints
This has improved some of our builds by 10 minutes. For example, we applied it to a site that had about 4800 generated files uploaded per build that took around 14 minutes to upload. After adding this plugin deploys were reduced to about 6 minutes.
It appears that Netlify is smart about their caching and diffing and Gatsby's cache-busting filenames were a detriment to this process.
I would also recommend adding this to gatsby-plugin-netlify as an option.
Sounds fantastic. I'll check it out immediately.
@overlordofmu
when files do not change between deploys, a new request for the asset will return a 304 (not modified).
Right, but for most requests, that's the majority of the delay, so this method doesn't really gain you much in terms of performance.
I guess this is why Netlify's own sites, such as https://www.netlify.com/blog/, use unique URLs and long max-ages, against the advice you're giving here 😄.
Here's some general advice on caching https://jakearchibald.com/2016/caching-best-practices/
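For comparison, the two approaches being contrasted here can be sketched as a choice of `Cache-Control` header per URL. This Python snippet is only an illustration: the `cache_control_for` helper is hypothetical, and so is the rule that a fingerprint is a run of 8 or more hex characters before a `.js` or `.css` extension.

```python
import re

# Assumption: a fingerprinted name looks like app-3f2a9c1b7d.js, i.e. a run
# of 8+ hex characters right before the extension.
FINGERPRINT = re.compile(r"[-.][0-9a-f]{8,}\.(?:js|css)$")


def cache_control_for(path: str) -> str:
    """Pick a caching policy based on whether the URL is unique per content."""
    if FINGERPRINT.search(path):
        # The content behind this URL never changes, so the browser can
        # reuse it for a year without contacting the server at all.
        return "public, max-age=31536000, immutable"
    # The content behind this URL may change, so the browser must
    # revalidate on every use (a round trip, even when the answer is 304).
    return "no-cache"


for path in ("/app-3f2a9c1b7d.js", "/app.js", "/index.html"):
    print(path, "->", cache_control_for(path))
```

The tradeoff is the one raised in this thread: stable names keep page content identical between builds, while unique names with a long max-age save the revalidation round trip on repeat visits.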
Just to add to this, I came across this exact issue (Netlify / Gatsby), and some experimenting found that even with a brand new site from the starter template, running successive gatsby build commands generates commons.js files with different hashes but exactly the same content when doing a diff.
Other chunk files remain the same: same hash, same content. Just the common one changes.