Parcel: [RFC] File naming strategy

Created on 21 Feb 2018 · 24 Comments · Source: parcel-bundler/parcel

This is a meta issue to discuss a number of things that have come up about file naming in Parcel.

  1. Keep original filenames for HTML files: #433, #280, #557, #307
  2. Hashing assets based on file content for cache busting: #717, #188, #753, #756, #829
  3. Putting assets in a separate folder from HTML: #233

I think we can come up with a cohesive file naming strategy that meets all of these needs.

  • We hash all assets based on file contents to produce filenames like index.a8b29e.js, except in the following cases (taken from the rules outlined by @Munter here). In those cases, we use the original filenames.

    • Any graph entry point (usually html)

    • Any asset linked to with an <a href>

    • Any asset linked to with a <meta http-equiv="refresh">

    • Serviceworkers (must keep consistent file name across builds)

    • humans.txt, robots.txt, .htaccess, favicon.ico

    • Cache manifests, rss and atom feeds

  • Place hashed assets (things not matched by above rules) in dist/assets e.g. dist/assets/index.a8b29e.js. This would be flattened as it is currently, so src/some/path/something.js would be placed in e.g. dist/assets/something.fd5se2.js.
  • Place non-hashed assets (things matched by the above rules) in the root, and create directories as needed to match the original paths. For example, if an HTML file were linked to from <a href="/some/path/something.html"> the output file would be dist/some/path/something.html.
  • We also support the -o or --out-file CLI option, which would override the default name for the entry file. If not provided, and the entry file matches main in package.json, use the package name.
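The rules above can be sketched as a small path-mapping function. This is only an illustration under stated assumptions, not Parcel's implementation: the `Asset` shape, the `keepsOriginalName` flag, and the `outputPath` and `splitPath` helpers are hypothetical names, and the content hash is assumed to be computed elsewhere.

```typescript
// Hypothetical asset description; none of these names come from Parcel itself.
interface Asset {
  sourcePath: string;         // path relative to the source root, e.g. "some/path/something.js"
  contentHash: string;        // short hash of the file contents, assumed precomputed
  keepsOriginalName: boolean; // true for entry points, <a href> targets, service workers, robots.txt, ...
}

// Split "dir/name.ext" into its parts using plain string operations.
function splitPath(p: string): { dir: string; name: string; ext: string } {
  const slash = p.lastIndexOf("/");
  const dir = slash === -1 ? "" : p.slice(0, slash);
  const base = p.slice(slash + 1);
  const dot = base.lastIndexOf(".");
  // A leading dot (".htaccess") is part of the name, not an extension.
  if (dot <= 0) return { dir, name: base, ext: "" };
  return { dir, name: base.slice(0, dot), ext: base.slice(dot) };
}

function outputPath(asset: Asset, outDir = "dist", assetsDir = "assets"): string {
  if (asset.keepsOriginalName) {
    // Non-hashed assets keep their name and their original directory structure.
    return `${outDir}/${asset.sourcePath}`;
  }
  // Hashed assets are flattened into the assets folder.
  const { name, ext } = splitPath(asset.sourcePath);
  return `${outDir}/${assetsDir}/${name}.${asset.contentHash}${ext}`;
}
```

With these assumptions, `some/path/something.js` with hash `fd5se2` maps to `dist/assets/something.fd5se2.js`, while a linked `some/path/something.html` stays at `dist/some/path/something.html`.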

The only case where this breaks down is if the input path started with /assets - which is the folder we're already using for static things. Not sure what to do about that: I guess we could try to generate a unique name for the assets folder or something. Open to suggestions here!

Otherwise, I think this strategy solves most of the issues listed above. Please let me know your feedback and make any suggestions you think would improve this strategy!

cc. @zeakd @songlipeng2003 @ssuman @Munter @benhutton @shanebo @leeching @gamebox @npup

RFC

All 24 comments

I think it's a great strategy, though I'd like a CLI option for not putting the assets in a subfolder! Is index._hash_.js based on the root file name or on the bundle file name?

I recommend not making up a unique name for the assets folder, at least not per build. The point of correct hashing is to get content-addressable URLs, so the assets directory has to be predictable and identical across runs so that caching headers can be configured as immutable for the path.

I'd just prepend an underscore or two, just to lessen the likelihood of name clashes by departing from nice human names: __assets

I think the strategy looks good. As for the assets folder, I agree with @Munter, perhaps just double underscore, _assets_ or some variant thereof. I think having a unique name for the assets folder could cause issues; for example when invalidating files on AWS CloudFront / CDNs, etc.

Why not just let it default to assets and be overridden by a command line option (like we do with out-dir)? You could specify another name for the folder, or no folder at all.

+1

A sensible default and a sensible option sounds great to me too.

Looks good! I think it is perfect for HTML. But is there a reason to collect all hashed files into an assets folder? I don't know much about cache control, but IMO this Parcel-specific approach will need more and more options.

I mean, how about mapping src/some/path/something.js to dist/some/path/something.fd5se2.js, and writing src/assets/index.js if you need an assets folder?

I think that with Parcel-specific conventions, Parcel will need more and more options for customization.
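The alternative suggested in this comment (keep the source directory structure and only insert the hash into the file name) could look roughly like the sketch below. The function name and the `srcDir`/`outDir` defaults are hypothetical, and the hash is assumed to be computed elsewhere.

```typescript
// Hypothetical helper: keep the directory structure and insert the hash before the extension.
function hashedPathPreservingDirs(
  sourcePath: string,
  contentHash: string,
  srcDir = "src",
  outDir = "dist"
): string {
  // Drop the source root so the rest of the path can be re-rooted under outDir.
  const prefix = srcDir + "/";
  const relative = sourcePath.indexOf(prefix) === 0 ? sourcePath.slice(prefix.length) : sourcePath;
  const slash = relative.lastIndexOf("/");
  const dir = relative.slice(0, slash + 1); // "" or "some/path/"
  const base = relative.slice(slash + 1);
  const dot = base.lastIndexOf(".");
  // Files without an extension simply get the hash appended.
  const hashed = dot > 0
    ? `${base.slice(0, dot)}.${contentHash}${base.slice(dot)}`
    : `${base}.${contentHash}`;
  return `${outDir}/${dir}${hashed}`;
}
```

Under this scheme an assets folder exists in the output only if the source tree has one, e.g. `src/assets/index.js` becomes `dist/assets/index.<hash>.js`.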

That's certainly an option. We could just put all of the hashed assets in the root. It was requested in #233 and elsewhere to put static files in a separate folder though, so I was trying to accommodate that. I guess it might make it easier to separate things you upload to a CDN from things you need to put on your webserver? Why did you need that @npup?

This option would look like:

dist/
├── index.html
├── something
│   └── about.html
└── index.a8b29e.js

Alternatively, we could make two roots: one for static assets, and one for HTML. So the output would be:

dist/
├── html
│   ├── index.html
│   └── something
│       └── about.html
└── static
    └── index.a8b29e.js

Then you could easily upload the html directory to your webserver, and static to your cdn.

@devongovett Parcel is useful for building static-folder sites such as GitHub Pages. However, two roots as the default would not work for such sites, and it looks a little ugly with the --public-url option, e.g. PUBLICURL/../static/index.a6gy7d.js

I like leaving them in the root as the simple default.

Just provide a new option for the build directory for entry points (the non-asset files), defaulting to the same as --out-dir. Your second example, @devongovett, would be reproducible with --out-dir=dist/static --entry-point-dir=dist/html.

@zeakd --public-url is really only required for the hashed assets, since the others keep (and can point to) their relative location. So it doesn't need to look ugly with --public-url.

This feature is much needed. +1 for this.

I like the folder structure that @devongovett mentioned here:

dist/
├── html
│   ├── index.html
│   └── something
│       └── about.html
└── static
    └── index.a8b29e.js

Please do not move URL-addressable, browser-navigable files into an HTML folder. It is crucial that you retain the same folder structure from the web root as in your source directory for these files. If you fail to do this, it has implications for how a web server has to be set up, which hinders the ability to use standard static hosting and servers.

It's important to put hashed assets into a specific folder with a predictable name so it gets easy to configure cache headers for these immutable files without having to configure regex matches in your server
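As a rough illustration of why a predictable folder helps, a server-side rule can then be a plain prefix check instead of a regex over hashed file names. The function name, the `/assets/` prefix, and the choice of `no-cache` for everything else are assumptions made for this sketch; the `Cache-Control` directives themselves are standard.

```typescript
// Hypothetical policy: everything under the assets prefix is content-hashed,
// so it can be cached for a long time and marked immutable.
function cacheControlFor(urlPath: string, assetsPrefix = "/assets/"): string {
  if (urlPath.indexOf(assetsPrefix) === 0) {
    return "public, max-age=31536000, immutable";
  }
  // Entry points such as HTML keep their names, so they must be revalidated.
  return "no-cache";
}
```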

What I was after was a way to be able to identify all generated
files _in the general case_.
This is useful for postprocessing as well as the SPA case. Being able to
put the generated assets in a subdirectory with a name of choice solves
the general case (a bit better than just a nice default).

I don't have any special opinion on whether the HTML should go into a
directory of its own as well. I haven't had a need for that myself so far.

/p

I would suggest using base62 instead of hex as this leads to shorter hash IDs. This is especially powerful with a fast hash algorithm like xxhash. See also my little side-project: https://github.com/sebastian-software/asset-hash
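To make the length difference concrete, here is a minimal sketch that encodes the same digest bytes in hex and in base 62. It is not taken from the linked project; the function names and the alphabet order are assumptions, and the digest bytes are assumed to come from whatever hash function is in use.

```typescript
const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Encode a big-endian byte array in base 62 using schoolbook long division.
// Leading zero bytes are not preserved, which is acceptable for short IDs.
function toBase62(bytes: number[]): string {
  let digits = bytes.slice(); // work on a copy
  let out = "";
  while (digits.length > 0) {
    let remainder = 0;
    const quotient: number[] = [];
    for (const byte of digits) {
      const acc = remainder * 256 + byte;
      const q = Math.floor(acc / 62);
      remainder = acc % 62;
      // Skip leading zeros so the number shrinks and the loop terminates.
      if (quotient.length > 0 || q > 0) quotient.push(q);
    }
    out = BASE62.charAt(remainder) + out;
    digits = quotient;
  }
  return out === "" ? "0" : out;
}

// Plain two-characters-per-byte hex encoding, for comparison.
function toHex(bytes: number[]): string {
  return bytes.map((b) => (b < 16 ? "0" : "") + b.toString(16)).join("");
}
```

A 16-byte digest always needs 32 hex characters, but at most 22 characters in base 62.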

I think this goes along with multiple entry points (#189).

  • The only reason you'd care about the name of a file is if you need to be linking to it somehow (.html files become website URLs, .js files become importable modules)
  • If you need to link to it, that should be considered an "entry" point
  • All entry points should have unique names and should generate those names in the output
  • All other generated modules are implementation details, we can try to make the names nicer, but they are an implementation detail that can and will change and should not be relied upon.

In order for hashes of file contents to mean anything for cache busting and/or versioning, those files (and therefore hashes) need to be the same when you build with the same input code. This is not currently the case... I think something along the lines of #780 needs to be merged with or before this

Would it be sensible to make the naming configurable if needed (e.g. via a .rc file), and in that case just resort to old-school GET-parameter cache busting?
The .rc file could look like this (I'm thinking asset-type based):

{
"html":"dist/[name].html",
"css":"dist/css/[name].css",
"js":"dist/js/[name].[package.version].js"
}
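A sketch of how such templates might be expanded, together with the GET-parameter cache busting mentioned above. Nothing here is an existing Parcel feature: the `[placeholder]` syntax is taken from the example config, while `expandTemplate`, `withCacheBuster`, and the `v` parameter name are hypothetical.

```typescript
// Hypothetical expansion of "[placeholder]" templates as used in the example config.
function expandTemplate(template: string, values: { [key: string]: string }): string {
  return template.replace(/\[([^\]]+)\]/g, (match: string, key: string) => {
    const value = values[key];
    // Leave unknown placeholders untouched so typos are easy to spot.
    return value !== undefined ? value : match;
  });
}

// Old-school cache busting: append a version as a query parameter.
function withCacheBuster(url: string, version: string): string {
  const separator = url.indexOf("?") === -1 ? "?" : "&";
  return `${url}${separator}v=${encodeURIComponent(version)}`;
}
```

For example, `expandTemplate("dist/js/[name].[package.version].js", { name: "index", "package.version": "1.2.3" })` yields `dist/js/index.1.2.3.js`.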

Adding naming configuration should really not be needed. A good default is perfectly fine. Keep the name of the asset if there was one. If it's not a linkable entry point, inject a hash into the name to achieve content addressability.

The only possible use for naming configuration would be if you have your static assets served through an external proxy CDN, so you need to update the urls from the parent asset from /assets/foo-hash.png to https://mycdn.host.com/assets/foo-hash.png for example
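The URL rewrite described here amounts to prefixing root-relative asset URLs with the CDN origin. A minimal sketch, where `withPublicUrl` is a hypothetical helper and not Parcel's handling of `--public-url`:

```typescript
// Hypothetical helper: prefix root-relative asset URLs with a CDN origin.
function withPublicUrl(assetUrl: string, publicUrl: string): string {
  // Leave absolute and protocol-relative URLs alone.
  if (/^([a-z][a-z0-9+.-]*:)?\/\//i.test(assetUrl)) return assetUrl;
  // Normalise slashes so exactly one separates the origin and the path.
  const base = publicUrl.replace(/\/+$/, "");
  const path = assetUrl.replace(/^\/+/, "");
  return `${base}/${path}`;
}
```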

As @zeakd wrote:
"I mean, how about src/some/path/something.js just to dist/some/path/something.fd5se2.js? and write src/assets/index.js if you needs assets folder."

IMHO this rule is the best implicit "asset" folder configuration option (because it is somewhat explicit, but does not need any additional options).
Apart from being able to keep meaningful names for assets if they are used elsewhere (downloaded or might also be SEO relevant), it also helps a lot to backreference the original source.

@ioss not gathering the content addressable files in a common directory makes it harder to configure a server to send out an immutable cache header for them

@Munter I don't understand how the above "rule" from zeakd would prevent you from gathering files in a (or a handful of) common directory? Just have your assets in /src/assets/... and they would end up in /dist/assets/... and you would have the original proposal.
Except for the path-flattening part, which shouldn't be a problem concerning the server's "immutable cache header rule".

Conversely: should you (for whatever reason) not want to send out immutable cache headers for some of the files, you'd have a hard time excluding them, especially as their names would change every time their content changes.

Implemented this strategy in #1025. Please let me know what you think and help test it out!

Closing since #1025 is merged. Please help test using the master branch - a release will hopefully come next week!

Would it be sensible to make the naming configurable if needed (e.g. via a .rc file), and in that case just resort to old-school GET-parameter cache busting?
The .rc file could look like this (I'm thinking asset-type based):

{
"html":"dist/[name].html",
"css":"dist/css/[name].css",
"js":"dist/js/[name].[package.version].js"
}

I strongly support using an (optional) configuration file here.
Folder and file structure have a few implications on projects, especially at scale.
For example I may need to scan all image assets in my project to do some OCR (true use case from a past job) where sorting images to folders may have positive performance implications (I'm talking thousands of images).
Please consider this solution.

@yonimor scan your source folder, not your build artefacts
