Html-webpack-plugin: Revisit WSI<->HWP integration

Created on 6 Nov 2020  ·  26Comments  ·  Source: jantimon/html-webpack-plugin

Hi, as mentioned before, I'm thinking about drawing a line in the sand and dropping backward compatibility where necessary.

One thing I'm wondering about is whether we can find a way to handle this better:

https://github.com/waysact/webpack-subresource-integrity/blob/4c965fcbed636177da3c98173b294f38806cac92/index.js#L315

For a start, this doesn't compile with Typescript (without coercing pluginArgs to any).

How about something like this? It's just one of many ways to do it, let me know your thoughts.

type Attributes = { [attributeName: string]: string | boolean };

interface TemplateParameter {
  compilation: any;
  htmlWebpackPlugin: {
    tags: {
      headTags: HtmlTagObject[];
      bodyTags: HtmlTagObject[];
    };
    files: {
      publicPath: string;
      js: Array<string>;
      css: Array<string>;
      manifest?: string;
      favicon?: string;
      fileAttributes: { [fileName: string]: Attributes }; // <-- new attribute
    };
    options: Options;
  };
  webpackConfig: any;
}

Most helpful comment

Had a nice voice chat with @jantimon and @sokra about this, thanks guys. To recap:

  • Webpack 4 does have content-based hashing, but it's not "real" -- hashes are calculated not on the final-final output and therefore more likely to change even when the final content is the same
  • This is the reason why things work differently now in Webpack 5
  • HWP is running in OPTIMIZE_INLINE so that the files it (and any ServiceWorker plugins) outputs can receive a "real" content hash as well
  • RCHP can be told which hashes to look out for, so it's not limited to the hashes used for filenames -- it's straightforward to have it scan for all SRI hashes as well
  • Hooking into updateHash doesn't change the way filename hashes are calculated, but rather by doing so RCHP can be "taught" how to handle SRI hashes (perhaps the docs could be improved here).

I think I've got everything now to add back support for content hashes. I'll let you know once I've got a chance to implement it @jantimon , shouldn't be too long.

All 26 comments

Here you are extending files in an unexpected way - that's why it is showing an error..
The "correct" typed way would be to use a Map or WeakMap.
But I totally agree that we should rather extend the html-webpack-plugin here instead.

Adding an additional attribute could also address the issue that is hardly possible to find out the chunks which the file belongs to.

Just a note: html-webpack-plugin@next will probably need to change the point in time where it generates the html to be able to access the final optimized css/js assets.. I asked for support here: https://github.com/webpack/webpack/issues/11822

The "correct" typed way would be to use a Map or WeakMap.

I'm not entirely sure what you mean... if you mean a Map instead of an Object (as in Map<string, Attributes>) that sounds fine. I don't see how a WeakMap would work here, can you elaborate?

Just a note: html-webpack-plugin@next will probably need to change the point in time where it generates the html to be able to access the final optimized css/js assets.. I asked for support here: webpack/webpack#11822

Sounds good. Doesn't look like there's much movement on the issue... For now this is what I'm using for testing:

  • In my plugin (SRI): Compilation.PROCESS_ASSETS_STAGE_OPTIMIZE_SIZE + 200
  • In your plugin (HWP) (only in index.js, not in child-compiler.js): Compilation.PROCESS_ASSETS_STAGE_OPTIMIZE_TRANSFER - 200

Do you think that would work with the other plugins you were mentioning (Workbox etc.)?

Hey @jscheid

you are right we should probably use a self defined stage priority for now - this is quite dangerous but can be fixed once the webpack core team picks up on this issue

I would like to use the same level as the SRI e.g. (Compilation.Compilation.PROCESS_ASSETS_STAGE_OPTIMIZE_SIZE + 200) - if you hook into the html-webpack-plugin this should probably also just work or do you see any issues here?

Hmm, mine has to run before yours, right? So that when you generate the HTML, the integrity values are there. How do you want to guarantee that when we both use the same stage? Or am I misunderstanding?

I am not sure if every plugin should have its own stage..

If I understand you correctly you don't want to generate the SRI twice.
What about adding a cache?

That way it doesn't matter which plugin comes first and it would work with and without the html-webpack-plugin:

class SRI {
  apply(compiler) {
    compiler.hooks.compilation.tap("SRI", (compilation) => {
      let sriHashes;

      compilation.hooks.processAssets.tap(
        {
          name: "SRI",
          stage: Compilation.PROCESS_ASSETS_STAGE_OPTIMIZE_SIZE + 200,
        },
        (assets) => {
          sriHashes = sriHashes || generateSriHashes();

          // Do all non Html-Webpack-Plugin things
        }
      );

      HtmlWebpackPlugin.getHooks(compilation).beforeAssetTagGeneration.tapAsync(
        "SRI",
        (data, cb) => {
          sriHashes = sriHashes || generateSriHashes();

          // Do all the Html-Webpack-Plugin things

          cb(null, data);
        }
      );
    });
  }
}

function generateSriHashes() {
   // Do the real work here once per compilation
}

Right... yes, I suppose that would work.

Pushed it as [email protected] 🍀 with Compilation.PROCESS_ASSETS_STAGE_OPTIMIZE_SIZE + 200

Please let me know if there is anything I can do to support this issue

Thanks, seems to be working well. I'll probably push something for you to look at and play with in the next few days.

@jantimon actually, I think we might have to move it from OPTIMIZE_SIZE + 200 to OPTIMIZE_HASH + 200, since OPTIMIZE_HASH can still modify the assets.

I can't find OPTIMIZE HASH here: https://webpack.js.org/api/compilation-hooks/#processassets

PROCESS_ASSETS_STAGE_ADDITIONAL - Add additional assets to the compilation.
PROCESS_ASSETS_STAGE_PRE_PROCESS - Basic preprocessing of the assets.
PROCESS_ASSETS_STAGE_DERIVED - Derive new assets from the existing assets.
PROCESS_ASSETS_STAGE_ADDITIONS - Add additional sections to the existing assets e.g. banner or initialization code.
PROCESS_ASSETS_STAGE_OPTIMIZE - Optimize existing assets in a general way.
PROCESS_ASSETS_STAGE_OPTIMIZE_COUNT - Optimize the count of existing assets, e.g. by merging them.
PROCESS_ASSETS_STAGE_OPTIMIZE_COMPATIBILITY - Optimize the compatibility of existing assets, e.g. add polyfills or vendor prefixes.
PROCESS_ASSETS_STAGE_OPTIMIZE_SIZE - Optimize the size of existing assets, e.g. by minimizing or omitting whitespace.
PROCESS_ASSETS_STAGE_SUMMARIZE - Summarize the list of existing assets.
PROCESS_ASSETS_STAGE_DEV_TOOLING - Add development tooling to the assets, e.g. by extracting a source map.
PROCESS_ASSETS_STAGE_OPTIMIZE_TRANSFER - Optimize the transfer of existing assets, e.g. by preparing a compressed (gzip) file as separate asset.
PROCESS_ASSETS_STAGE_ANALYSE - Analyze the existing assets.
PROCESS_ASSETS_STAGE_REPORT - Creating assets for the reporting purposes.

They must have forgotten to update the documentation, but it's there and being used.

@jscheid I released 5.0.0-alpha.14 with PROCESS_ASSETS_STAGE_OPTIMIZE_HASH but as mentioned in https://github.com/jantimon/html-webpack-plugin/issues/1527#issuecomment-724543871 this will be only a intermediate solution

Perfect, thank you! All tests in my soon-to-be-released next branch are now passing against alpha.14. I'll push it soon so that you can test against it as well.

Nice 💯

@jscheid can we close this or is there anything left to do?

I released the html-webpack-plugin as beta and I assume that the phase will not change anymore for the html-webpack-plugin.

@jantimon Thanks for the heads up. I've just upgraded my WIP branch to latest webpack and HWP, but unfortunately tests now fail for no reason that's immediately obvious.

I haven't followed the conversation on the other tickets. I'm still running with PROCESS_ASSETS_STAGE_OPTIMIZE_HASH + 200, which used to work before this upgrade. Do you know if this is something I need to change now?

@jantimon I'm still looking into this and trying to work out why exactly the tests are failing now, but I don't think it can possibly work with the latest pipeline order:

  1. Computing SRI requires the chunk contents to be final, by definition it will break if a chunk is modified after SRI is computed.
  2. Because chunks can reference other chunks by name (for lazy-loading / code splitting), by extension this means that SRI requires chunk names to be final (except for the names of top-level chunks.)
  3. Generating index.html requires SRI so that it can be baked into the integrity attribute. It also requires the names of top-level chunks.
  4. Therefore, index.html generation has to happen after all chunk contents and all names are final.
  5. If I'm not mistaken, your plugin now runs in stage OPTIMIZE_INLINE.
  6. But couldn't assets change after this? For example, in "OPTIMIZE_HASH: Optimize the hashes of the assets, e.g. by generating real hashes of the asset content" which happens later (2500 > 700)?

I might be missing something here, apologies if I didn't follow the other threads in the last few weeks, I was distracted by other things.

@jantimon as expected, all tests pass again with this change:

diff --git a/index.js b/index.js
index d532fd2..f75b123 100644
--- a/index.js
+++ b/index.js
@@ -213,7 +213,7 @@ function hookIntoCompiler (compiler, options, plugin) {
           /**
            * Generate the html after minification and dev tooling is done
            */
-          webpack.Compilation.PROCESS_ASSETS_STAGE_OPTIMIZE_INLINE
+          webpack.Compilation.PROCESS_ASSETS_STAGE_OPTIMIZE_HASH + 200
         },
         /**
          * Hook into the process assets hook

(+ 1 works as well as + 200, but + 0 doesn't, it breaks my test webpack4-contenthash.)

I've been trying to follow the other threads but I still can't quite figure out what prompted the change to OPTIMIZE_INLINE. Can you shed some light?

I think the bottom line is this:

  1. As explained in my previous comment, there needs to be a cut-off point in the hooks at which modifications to assets (at least to those covered by SRI) are disallowed: both modifications to the asset _contents_ as well as their _names_.

    • Ideally this would be enforced by Webpack through freezing the corresponding data structures or such, but at the very least there should be an edict.

    • I believe that _de facto_ this point is currently PROCESS_ASSETS_STAGE_OPTIMIZE_HASH + 1. Of course, without a clear guideline, much less enforcement, we cannot rely on it, but it's the best we have for the time being.

  2. HWP needs to run at or after this point if it wants to support SRI.

Would you agree?

sorry for the delay I'll take a look next year :)

hey @jscheid

sorry once again for the delay - I talked to @sokra about this topic and he proposed a different solution.

lets say [contenthash].html includes an entry [contenthash].js which lazy import a split chunk [contenthash].js

as you can imagine if the split chunk file is minified it's hash & filename change - as this filename is also used inside the entry the content of that file and therefore also the hash & filename change - and in the end the html file which includes a script tag to the entry also needs to update and therefore also its hash & filename changes..

to solve that @sokra came up with the realContentHash optimisation which updates the hashes in the end:

  • PROCESS_ASSETS_STAGE_ADDITIONAL: additional images and files e.g.. copy-plugin
  • PROCESS_ASSETS_STAGE_OPTIMIZE_SIZE: JS/CSS/Img/HTML minimizing
  • PROCESS_ASSETS_STAGE_OPTIMIZE_INLINE: add html (with or without inlining) JS/CSS
  • PROCESS_ASSETS_STAGE_SUMMARIZE: Service Worker
  • PROCESS_ASSETS_STAGE_OPTIMIZE_HASH: if optimization.realContentHash recalculate the hash including SRI

his proposed code for SRI is this one:

https://github.com/webpack/webpack/blob/7a6e0429f983119af97401a75834cc8f43fc2d60/test/configCases/process-assets/html-plugin/webpack.config.js#L148-L163

class SriHashSupportPlugin {
    apply(compiler) {
        compiler.hooks.compilation.tap("sri-hash-support-plugin", compilation => {
            RealContentHashPlugin.getCompilationHooks(compilation).updateHash.tap(
                "sri-hash-support-plugin",
                (input, oldHash) => {
                    if (/^sha512-.{88}$/.test(oldHash) && input.length === 1) {
                        const hash = createHash("sha512");
                        hash.update(input[0]);
                        return `sha512-${hash.digest("base64")}`;
                    }
                }
            );
        });
    }
}

Please let me know if you believe this might work for you :)

Hi @jantimon , Happy New Year!

Hmm, reading this I think the code you posted is just changing the way the _filename_ hash is calculated. Not sure how that would help with the problem I described above.

Keep in mind that the SRI spec allows to use multiple hashes per file (for example both sha-384 and sha-512.)

What was the reason for the change to OPTIMIZE_INLINE?

Ok, reading the code I see that the idea is to scan all assets for something that looks like a hash, find the asset referenced by the hash, and replace the old by the new hash. I suppose this could work in theory even when multiple hashes are used per file.

It still smells like a hack though, so I'd like to understand:

  1. Why is this necessary now when everything was working fine in Webpack 4 (which also has content-based hashing)
  2. What if the user uses a different (custom) plugin in phase OPTIMIZE_HASH?
  3. It looks like RCHP only scans for hashes used in the webpack contenthash settings but we can't know that this covers all hash functions the user wants to use with SRI, can we?
  4. Wouldn't overriding the updateHash hook change the way filename hashes are calculated (as per the docs)? Doesn't this all feel a bit... messy? After all, SRI and file naming are related, but ultimately different concepts.

Hey @jscheid I tried to reach out to you but couldn't find you on gitter or twitter - can you please send me your email address to invite you to slack?

Had a nice voice chat with @jantimon and @sokra about this, thanks guys. To recap:

  • Webpack 4 does have content-based hashing, but it's not "real" -- hashes are calculated not on the final-final output and therefore more likely to change even when the final content is the same
  • This is the reason why things work differently now in Webpack 5
  • HWP is running in OPTIMIZE_INLINE so that the files it (and any ServiceWorker plugins) outputs can receive a "real" content hash as well
  • RCHP can be told which hashes to look out for, so it's not limited to the hashes used for filenames -- it's straightforward to have it scan for all SRI hashes as well
  • Hooking into updateHash doesn't change the way filename hashes are calculated, but rather by doing so RCHP can be "taught" how to handle SRI hashes (perhaps the docs could be improved here).

I think I've got everything now to add back support for content hashes. I'll let you know once I've got a chance to implement it @jantimon , shouldn't be too long.

Should be all sorted now ✨

Was this page helpful?
0 / 5 - 0 ratings

Related issues

GerkinDev picture GerkinDev  ·  3Comments

lonelydatum picture lonelydatum  ·  3Comments

azat-io picture azat-io  ·  4Comments

ghaiklor picture ghaiklor  ·  3Comments

var-bp picture var-bp  ·  3Comments