Seems like it would be nice if Gatsby would clear out the public folder before generating new assets so that we are guaranteed a fresh and accurate copy of the site on each gatsby build
.
The specific case where I ran into this and wished it worked this way was when I looked and found an extraneous manifest/index.html
folder/file in the site, but couldn't figure out why. I realized I had added a manifest.json
file for Android icon support, so I then moved it out but the manifest folder remained until I blew it away manually.
Following the philosophy of React's model of rebuilding structures from scratch every time, I think this would lead to less surprises and old/orphaned pages.
Curious what other's thoughts are?
This would be a breaking change as some people might have adopted workflows depending on the public folder being there. Adding a "static" directory at the same which has files that get copied over verbatim might be a good plan https://github.com/gatsbyjs/gatsby/issues/450
FWIW, the suggestions in #450 are exactly what I followed, and where my manifest.json file, amongst others, currently lives.
@jlyman ah cool! Yeah, it seems like a really common need and a static
folder a really common convention so we should bless that in core.
I think that's especially important because some people (like me) may like .gitignore public.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Just to weigh in in case this is still under consideration...
I would tend to agree with the original poster - in addition the the benefits of of eliminating orphaned pages, a clean public folder also allows for a one-step build process, e.g. gatsby build && aws s3 cp public/ my-s3-bucket
, without pushing an ever-growing bundle to a remote server.
We ended up deleting only html files on build https://github.com/gatsbyjs/gatsby/issues/1811
@jmknoll perhaps use an s3 client which does a sync upload?
I just ran into this and think it is worth discussing again.
In my opinion the public folder should contain generated code and generated code only, so that it is safe to add it to .gitignore
. Changes to the public folder have to be overwritten during the next build.
As was already mentioned, the static
folder exists to copy static files (like maybe robots.txt
, or a favicon) to the final build without modification.
Because gatsby doesn’t clean up the entire public folder (only HTML & CSS files), all the generated JS files for components and pages and whatnot stay there and the folder grows and grows over time. Using an upload tool that syncs a local folders to a remote is therefore fairly useless as nothing ever gets deleted. The remote server grows and grows. Yes, technically it is in sync with the local folder, but the local folder contains stale files.
When running this on a CI system this is not an issue as you start with a clean build every time, but it is an issue when running builds locally and then deploying (syncing) from there.
I agree, that this feature would be nice to have, but it is also very simple for deverlopers to add:
// gatsby-node.js
const rimraf = require('rimraf');
const PUBLIC_FOLDER = `${__dirname}/public`;
exports.onPreBuild = () => {
// empty /public folder
rimraf.sync(PUBLIC_FOLDER + '/*');
};
I'm kind of thinking of it from the perspective of the principle of least surprise. There's already starting to be an established pattern for this (only HTML files are deleted), but it seems like, without consulting the docs, this might be a "surprising" outcome.
Seems like the principle of least surprise would just be that a static site generator would generate a fresh and whole copy of the static site on each run, without selectively removing some files and not others?
I agree with @baba43 that it's an easy feature to add, but it still was a surprise to learn it didn't operate that way to start. My two cents.
Keep in mind that deleting public
directory will cause regenerating responsive images when using gatsby-ransformer-sharp
with every build. Right now plugin can skip that if image is already there.
Totally shooting from the hip here, but maybe @pieh's case of cached resources would be a good situation where a separate .cache
folder might be appropriate, similar to Webpack's caching strategy? Downside would be increased storage requirements for the building computer, but you might be able to resist regenerating all images or other resource-intensive operations while still maintaining a prod-ready public
directory.
I agree with @jlyman. While it certainly is easy to implement, it’s not how code generation tools are generally expected to behave.
The case of image transformation brought up by @pieh is certainly interesting. Maybe gatsby could delete everything by default but expose an API for plugins to mark certain output folders as „sticky“ during the build? These plugins would then have to implement a better algorithm to remove files that are no longer needed though.
I‘m not necessarily arguing for a „dumb“ rm -rf
, but instead would mainly like builds to be more predictable. After a successful build the public folder should mirror the site exactly, not more and not less.
We could also add some logic of removing stale files (we would need to keep track what files are created during the build). But there are also some edge cases where having stale files is not a bad thing - consider situation:
User visits your page and is browsing through it, while he is on your page you push update with new files and deleting old ones - now user who keeps browsing will have broken site, because files referenced in previous build that he has loaded are not there.
Interesting, I hadn't thought about the publish change mid-browsing. I suppose it depends on if the hash of the file changes or not (haven't looked, don't know). If the page is gone because it was purposefully removed, we wouldn't want the user to be able to continue browsing to it. But if the page is legitimately still there but the user can't navigate to it anymore because the hash on the filename has changed that's less optimal.
But I guess so is keeping around what amounts to any version of the site you've ever built, in an ever expanding folder. Trade-offs both ways.
Hash changes when content changes - that's cache busting mechanism to avoid using cached stale data. Just so we are clear - I'm not against doing any work on this, but we have to be smart about it. There are for example ways to reload users browser when we push update (simplest would be periodically polling very small file with version of current site, and if it changes we can trigger browser reload).
I just ran into this and agree with all the reasons already stated why the public folder should be cleared.
From what I read, the reasons to keep previous files around are backward compatibility with workflows that depend on this behavior and the mid-browsing-deploy issue.
I would argue that a workflow that relies on local copies on the machine of a single developer is inherently error prone: switching machines, version control issues that require re-cloning or having colleagues will break this anyway. Migration is simple: don't delete the previous files on the server if that is important.
The current solution to the mid-browsing-deploy issue (keeping old files around) isn't used consistently in practice, as can be seen by starters deleting public
before each build. So I would see this as a separate issue that requires a more targeted solution, like the one @pieh suggested. If that's a problem worth solving, then that should be it's own issue.
@zimmi could you share your workflow and _why_ you need this folder cleared?
We recently introduced a gatsby clean
command that when invoked, will clear out the .cache
and public
folders.
Generally speaking, keeping the public folder around (for local builds) and adding to .gitignore
is a good practice. It will make your local builds faster, because images and other content is _cached_ there so that we don't regenerate files unnecessarily, which can be a heavy operation.
Upon deploying, it's reasonable that most solutions (e.g. a CI service, Netlify, etc.) will and should be given a clean slate environment (i.e. no cache, public, etc. folders) and so the issue is a moot point. Even working from a non-clean slate, I'm not entirely sure _why_ you need the public directory cleared? Are you running into issues?
From Gatsby's perspective, we don't really have any plans to clear out the public folder on build. Happy to hear your opinions as to why this practice should change, though.
Thanks for chiming in!
@DSchau thank you for the explanation, much appreciated. I also didn't know about gatsby clean
, nice!
To be honest, I didn't run into any actual problems with this yet, I was just surprised by the folder growing after deleting code and running build
again. For me it's just another thing to remember when building for production, as I have no CI system in place yet. So my workflow would be:
develop
and write code until site looks goodbuild
serve
public
and run build
again (possibly also serve
out of paranoia)_So I find it easier to just do a clean build
than trying to remember if I deleted public
or not.
Keeping the files around in public
for performance reasons makes sense when this is done often, but isn't that what develop
is for? I usually only run build
when I'm reasonably sure that the site is done, but maybe there are use cases I'm not aware of?
Like I see it, build
could also be called release
, then it would be obvious that the folder should always be clean.
I hope that clarifies my standpoint a little.
Edit: I just noticed that running develop
also creates the public
folder. I would have expected all development artifacts to be inside .cache
and only the actual final build output to be inside public
. Does that mean I have to delete public
for a clean release every time, or will those "development-files" be cleaned up before running build
?
@zimmi I think I'd recommend just shifting your perception on this a little bit, if you're able!
For me it's just another thing to remember when building for production, as I have no CI system in place yet
That's the thing--you don't have to remember to do this, and there's really no reason to--especially because you haven't/likely won't run into issues.
Static content is _cheap_ (you can host gigabytes of data for pennies), so a growing public folder (from old builds) is generally not a huge concern.
Additionally, to respond to this question:
Keeping the files around in
public
for performance reasons makes sense when this is done often, but isn't that whatdevelop
is for
Well - sorta. But you also get the benefits of caching if the public folder is already populated from develop
. In other words, if you've already been using gatsby develop
a build will be faster.
Like I see it,
build
could also be calledrelease
, then it would be obvious that the folder should always be clean.
Maybe! If this is a huge concern for you, you can do something like this, but I'll note that I wouldn't really recommend doing this
{
"scripts": {
"prerelease": "gatsby clean",
"release": "npm run build"
}
}
Hopefully this is helpful! Gatsby is flexible enough here to do whatever you want--but I'd encourage you to try and avoid removing the public folder and manually deleting things when you don't have to. Let us give you faster builds!
Well, I guess it's just a philosophical difference then. I would rather have reproducible builds that only depend on the inputs than save a couple seconds / minutes on build time. I still believe that's the better default. That can always be optimized when the page grows so large that it matters, Gatsby is really fast (for me at least).
One potential issue is deleting sensitive information that was accidentally published. After a rebuild it wouldn't be accessible anymore by browsing, but it would still be accessible online. Much the same situation as API-keys in repositories on GitHub: you need to purge the history.
I also vote for a principle of least surprise.
public should be always fresh generated
gatsby clean
not solve this because it clears also the .cache
I vote for an opt-out of the public clean if some like to preserve some files.
This caused me some issues, as we have several static site generators (moving to gatsby) and they all think they own public
folder. So gatsby was deleting files it didn't own. This is one workaround:
gatsby-node.js
const fs = require("fs-extra");
const tmpPublic = "public.prevent_gatsby_delete";
exports.onPreInit = () => {
if (fs.pathExistsSync("public")) {
fs.moveSync("public", tmpPublic, { overwrite: true });
}
};
exports.onPreBootstrap = () => {
if (fs.pathExistsSync(tmpPublic)) {
fs.moveSync(tmpPublic, "public", { overwrite: true });
}
};
maybe then it is better if every generator has it own public dir and in the deploy step you copy over all the files to an final public folder?
maybe then it is better if every generator has it own public dir and in the deploy step you copy over all the files to an final public folder?
Indeed. But there's no way to configure output-dir in Gatsby as far as I saw by reading the source. Could have the entire projects inside different folders, but that's not A+ either. It'd also be possible to run mv public public.gatsby
after each "step" so to say as well. That does feel nicer, but on the other hand it could lead to some gotchas if you don't use that npm script.
Anyway, we're aiming for this to be temporary.
I agree, that this feature would be nice to have, but it is also very simple for deverlopers to add:
// gatsby-node.js const rimraf = require('rimraf'); const PUBLIC_FOLDER = `${__dirname}/public`; exports.onPreBuild = () => { // empty /public folder rimraf.sync(PUBLIC_FOLDER + '/*'); };
For anyone reading this currently - please don't use this solution - it leads to problems like https://github.com/gatsbyjs/gatsby/issues/25289
If you absolutely must clear public
- you also need to clear .cache
directory as well (we have now gatsby clean
command for doing that) - and you need to do this before running gatsby build
, because onPreBuild
runs too late (some files are already outputted to public dir and those get deleted) - so command like gatsby clean && gatsby build
is best bet
Gatsby should ensure it's never in a situation where such errors can occur.
I just deleted the public folder because it was overflowing with junk, thinking it'd just get rebuilt. But nope, a completely unrelated error and hours of googling leads to this issue that says that .cache has to be deleted too if one is going to delete public folder.
If empty public folder + non-empty cache doesn't work in any situation, why can't gatsby detect that and rebuild cache as well? And why not just do some cleaning before/after every build so the user never has to bother in the first place?
The dependency between .cache/ and public/ is an incredibly annoying one. If one attempt to set up CI for this the choice seems to be between fast builds (keep .cache and public) or a public/ folder that does not increase into eternity.
I had to go for slow builds because the size requirement of the artifact is a stringent one.
And this unfortunately also leaves the clever EXPERIMENTAL_PAGE_BUILD thing out of reach because that obviously depends on caching.
My wish was that there was a kind of selective "post"-cleaning of public/ which only left the most recent build. Since public/ either has hashed js or replaced static filenames, that seems possible.
Literally all they need to do is store the JSON and such in .cache and copy things over to public as needed. It's crazy how long this issue has been open.
Most helpful comment
I'm kind of thinking of it from the perspective of the principle of least surprise. There's already starting to be an established pattern for this (only HTML files are deleted), but it seems like, without consulting the docs, this might be a "surprising" outcome.
Seems like the principle of least surprise would just be that a static site generator would generate a fresh and whole copy of the static site on each run, without selectively removing some files and not others?
I agree with @baba43 that it's an easy feature to add, but it still was a surprise to learn it didn't operate that way to start. My two cents.