Pdf.js: Webpack should handle loading worker instead of setting workerSrc

Created on 20 May 2019 · 21 Comments · Source: mozilla/pdf.js

When following the Webpack example and importing pdfjs-dist with import * as pdfjsLib from 'pdfjs-dist'; Webpack will create a pdfjsWorker.js and also automatically load it in the browser. The file may be named differently (hashed names, prefixes, etc.).

Still, pdf.js requires an absolute path to be set: pdfjsLib.GlobalWorkerOptions.workerSrc = 'pdfjsWorker.js';

This will let Webpack load the worker, and then pdf.js will also load the worker itself. Why is that the case? Couldn't we instead just create a Worker and let Webpack do the loading?

I am also having a lot of trouble setting workerSrc: the filename might be pdfjsWorker.js in development, but in production it has hashes and differential-loading prefixes. I could use an external worker.js, but then the worker (1.5 MB) would be loaded twice, and it should not be included in Webpack bundling.
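
One possible way to cope with hashed file names is to look up the emitted worker file in a build manifest instead of hard-coding it. The sketch below is only an illustration: the manifest shape (a JSON object mapping logical asset names to emitted file names), the helper name resolveWorkerSrc, the hash and the public path are all assumptions, not part of pdf.js or of this issue.

```javascript
// Hypothetical helper: map a logical asset name to the hashed file name
// recorded in a build manifest, then prefix it with the public path.
function resolveWorkerSrc(manifest, logicalName, publicPath) {
  const emitted = manifest[logicalName];
  if (typeof emitted !== "string") {
    throw new Error(`No manifest entry for "${logicalName}"`);
  }
  // Ensure exactly one slash between the public path and the file name.
  return publicPath.replace(/\/+$/, "") + "/" + emitted.replace(/^\/+/, "");
}

// Example manifest as it might look in a production build (made-up hash).
const manifest = { "pdfjsWorker.js": "pdfjsWorker.3f9a1c.js" };
const workerSrc = resolveWorkerSrc(manifest, "pdfjsWorker.js", "/static/bundles/");
console.log(workerSrc); // "/static/bundles/pdfjsWorker.3f9a1c.js"

// In the application one would then assign:
// pdfjsLib.GlobalWorkerOptions.workerSrc = workerSrc;
```

This only addresses the naming problem. It does not by itself prevent the worker from being fetched a second time.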

1-other

All 21 comments

@MickL I'm facing the same. Did you find any workarounds?

No, I didn't find any. I used a second, precompiled pdfjsWorker. So as far as I can see, the 1.5 MB worker gets loaded twice (bundled and loaded by Webpack, then loaded again by pdf.js from a different source).

If I understand this correctly, when using Webpack, pdf.js should not require a workerSrc and should let Webpack handle the dependencies.

FWIW, you can try import pdfjsLib from 'pdfjs-dist/webpack'; which handles the URL assignment automatically. It does seem to come with a caveat though: if you're using (a newer version of) create-react-app, hot module replacement doesn't seem to be compatible right now. Here is an example project I found today: https://github.com/yurydelendik/pdfjs-react

Closing since we've changed the way we're working with Webpack to remove those dependencies from pdfjs-dist.

@timvandermeij would you mind clarifying how you are working with Webpack now? I looked around quite a bit but could not figure it out by myself.

I ran into the error described in #10997 and realized that my fully working app stopped working today because it was based on unstable sources. As advised there, I moved from links to an npm install of pdfjs-dist and tried to follow the instructions here to adapt it to my use of Webpack, to no avail. The example provided is based on React, which I neither know nor use.

My setup is:

  • ES6 modules bundled with Webpack/babel, and Django as the back end. Django is not involved in the use of pdfjs other than providing a base template and reference to the pdf file; also, I am using webpack without Django's html-webpack-plugin (even if installed) because I seem to have an easier life compiling assets directly where Django expects them;
  • package.json relevant section:
    "devDependencies": {
      "@babel/core": "^7.10.4",
      "@babel/preset-env": "^7.10.4",
      "babel-loader": "^8.1.0",
      "html-webpack-plugin": "^4.3.0",
      "webpack": "^4.43.0",
      "webpack-bundle-tracker": "^1.0.0-alpha.1",
      "webpack-cli": "^3.3.12",
      "webpack-dev-server": "^3.11.0",
      "worker-loader": "^3.0.2"
    },
    "dependencies": {
      "bootstrap-icons": "^1.0.0-alpha5",
      "npm": "^6.14.8",
      "pdfjs-dist": "^2.4.456"
    }

As far as I understand, I only need to figure out how to import pdfjs properly and set up the worker, i.e. a couple of lines of code (see the comments in the code):

import pdfjsLib from 'pdfjs-dist/webpack' //  <--- unsure about this  [line 1]
// import Worker from 'worker-loader!./Worker.js'; // this should not be necessary AFAIU

////////////////////////////////////////////
//// instantiate pdf
export const pdfView = () => {

  pdfjsLib.GlobalWorkerOptions.workerSrc = '../../node_modules/pdfjs-dist/build/pdf.worker.js';
  // ^^ [ line 2 ] this gets interpreted as a web address rather than an absolute path in my src/ folder

  // defined through Django template tag in select.html
  const loadingTask = pdfjsLib.getDocument(pdfData.myPdfDoc)

  pdfData.myPdf = loadingTask.promise.then(pdf => {
    pdfData.pdfTotalPageN = pdf.numPages;
    return pdf;
  })
}

Please let me know if you want me to open a new bug or if you can provide the required two lines of code, references or applicable examples in this thread.
Thanks in advance
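
For reference, the behaviour noted at [line 2] above can be reproduced outside the browser. A relative worker URL is resolved against the URL of the page, not against the project's source tree. The snippet below is a minimal sketch; the page URL is a made-up example of a Django-served page, not taken from the original setup.

```javascript
// Made-up page URL standing in for the page that loads the bundle.
const pageUrl = "http://localhost:8000/docs/select/";
const workerSrc = "../../node_modules/pdfjs-dist/build/pdf.worker.js";

// A relative worker URL is resolved against the page URL, the same way
// the WHATWG URL class resolves a relative reference against a base.
const resolved = new URL(workerSrc, pageUrl).href;
console.log(resolved);
// "http://localhost:8000/node_modules/pdfjs-dist/build/pdf.worker.js"
```

Unless the web server actually serves node_modules at that address, the request for the worker will fail.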

[...] would you mind clarifying how you are working with Webpack now?

We tried to isolate the Webpack logic into this example so that it's self-contained and no other parts of PDF.js require its dependencies, also because we try to focus on the library itself and not on integration with the various JS frameworks. We're not familiar with them and in general can't answer questions about them; the examples are merely provided as a starting point.

There is an additional example at https://github.com/yurydelendik/pdfjs-react/blob/4deabd1165395821acd4b6d3bc05dd6fef19b97f/src/App.js#L6 that seems to indicate that you're using it correctly. You should indeed also set the workerSrc option.

[...] to no avail

It's not clear _what_ is actually not working because no running example has been provided. This makes it not possible to know what's going on.

Thanks a lot for your prompt answer @timvandermeij

There is an additional example at https://github.com/yurydelendik/pdfjs-react/blob/4deabd1165395821acd4b6d3bc05dd6fef19b97f/src/App.js#L6 that seems to indicate that you're using it correctly. You should indeed also set the workerSrc option.

Also, the linked example is a React setup, and I am not sure if or how this influences the results.
What I noticed is that there does not seem to be a setting of the workerSrc option. I searched the term also in the rest of the repo and did not find any line of code instantiating it. That would be consistent with some instructions I remember reading along the way (alas, I could not find them again), which mentioned that there is no need to instantiate or configure the worker _as long as it is installed in the same bundle_ (pdfjs-dist).

It's not clear _what_ is actually not working because no running example has been provided. This makes it not possible to know what's going on.

Let me try adding a bit more information that might give some hints; maybe the interpretation is obvious to you:

  • I changed the import statement as in the example provided, like so:
import pdfjsLib from 'pdfjs-dist/webpack'

////////////////////////////////////////////
//// instantiate pdf
export const pdfView = () => {
  logDebug(module.id.split('/').slice(-1)[0], ['pdfView initialized']);
  // pdfjsLib.GlobalWorkerOptions.workerSrc = '../../node_modules/pdfjs-dist/build/pdf.worker.js';

  // defined through Django template tag in select.html
  const loadingTask = pdfjsLib.getDocument(pdfData.myPdfDoc)

  pdfData.myPdf = loadingTask.promise.then(pdf => {
    pdfData.pdfTotalPageN = pdf.numPages;
    return pdf;
  })
}
  • This is the feedback from Webpack attempting to compile the resources:
WARNING in ./node_modules/worker-loader/dist/index.js
Module not found: Error: Can't resolve 'webpack/lib/web/FetchCompileAsyncWasmPlugin' in '/home/giampaolo/dev/KJ_import/KJ-JS/node_modules/worker-loader/dist'
 @ ./node_modules/worker-loader/dist/index.js
 @ ./node_modules/worker-loader/dist/cjs.js
 @ ./node_modules/pdfjs-dist/webpack.js
 @ ./src/js/views/pdfViews.js
 @ ./src/js/index.js

WARNING in ./node_modules/worker-loader/dist/index.js
Module not found: Error: Can't resolve 'webpack/lib/web/FetchCompileWasmPlugin' in '/home/giampaolo/dev/KJ_import/KJ-JS/node_modules/worker-loader/dist'
 @ ./node_modules/worker-loader/dist/index.js
 @ ./node_modules/worker-loader/dist/cjs.js
 @ ./node_modules/pdfjs-dist/webpack.js
 @ ./src/js/views/pdfViews.js
 @ ./src/js/index.js

ERROR in (webpack)/lib/node/NodeTargetPlugin.js
Module not found: Error: Can't resolve 'module' in '/home/giampaolo/dev/KJ_import/KJ-JS/node_modules/webpack/lib/node'
 @ (webpack)/lib/node/NodeTargetPlugin.js 11:1-18
 @ ./node_modules/worker-loader/dist/index.js
 @ ./node_modules/worker-loader/dist/cjs.js
 @ ./node_modules/pdfjs-dist/webpack.js
 @ ./src/js/views/pdfViews.js
 @ ./src/js/index.js
Child HtmlWebpackCompiler:
     1 asset
    Entrypoint HtmlWebpackPlugin_0 = __child-HtmlWebpackPlugin_0
    [./node_modules/html-webpack-plugin/lib/loader.js!./src/src-select.html] 4.57 KiB {HtmlWebpackPlugin_0} [built]

I looked at my node_modules directories, and:

Warnings 1. and 2. -- 'FetchCompileAsyncWasmPlugin.js' is not in my node_modules/webpack/lib/web/ directory, although there's a 'FetchCompileWasmTemplatePlugin.js'
Error 3. -- No module named 'module' in node_modules/webpack/lib/web/ either.

One thing I asked myself is: is there a need to perform some post-npm-install actions (I remember seeing a gulp command or similar around, but could not find that instruction again either) that might generate the missing resources?

Thanks again
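
When comparing such output against the contents of node_modules, it can help to list exactly which request failed and from which directory. The following is a small sketch that extracts those pairs from Webpack's "Module not found" lines; the function name missingModules is invented for illustration and the sample log is shortened from the output above.

```javascript
// Extract every "Can't resolve 'X' in 'Y'" pair from Webpack output.
function missingModules(log) {
  const pattern = /Can't resolve '([^']+)' in '([^']+)'/g;
  const found = [];
  for (const match of log.matchAll(pattern)) {
    found.push({ request: match[1], context: match[2] });
  }
  return found;
}

// Shortened sample taken from the log above.
const sampleLog = [
  "WARNING in ./node_modules/worker-loader/dist/index.js",
  "Module not found: Error: Can't resolve 'webpack/lib/web/FetchCompileWasmPlugin' in '/home/giampaolo/dev/KJ_import/KJ-JS/node_modules/worker-loader/dist'",
  "ERROR in (webpack)/lib/node/NodeTargetPlugin.js",
  "Module not found: Error: Can't resolve 'module' in '/home/giampaolo/dev/KJ_import/KJ-JS/node_modules/webpack/lib/node'",
].join("\n");

const missing = missingModules(sampleLog);
for (const { request, context } of missing) {
  console.log(`${request}  <-  ${context}`);
}
```

Listed this way, the log shows that the failing 'module' request is issued from webpack/lib/node, while the two warnings are issued from worker-loader/dist.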

What I noticed is that there does not seem to be a setting of the workerSrc option. I searched the term also in the rest of the repo and did not find any line of code instantiating it.

The examples all set them, see https://github.com/mozilla/pdf.js/search?q=workerSrc&unscoped_q=workerSrc, even the Webpack example at https://github.com/mozilla/pdf.js/blob/50bc4a18e8c564753365d927d5ec6a6d2cce3072/examples/webpack/main.js, so I'm not sure why it was not found.

Moreover, if I look at the error log, all errors seem to originate from somewhere _inside_ Webpack and worker-loader, and seem completely unrelated to PDF.js. FetchCompileAsyncWasmPlugin is not something that is in the PDF.js codebase at all. I have the feeling that the root cause of the errors is not PDF.js, but something else in your project, but that's impossible to tell for us unfortunately.

I suspect you are right.

Let me try one more time to bother you, and in case it doesn't work I promise I'll stop.

Looking at the Webpack example you linked, I found they instantiate the worker like this:

var pdfPath = "../learning/helloworld.pdf";

// Setting worker path to worker bundle.
pdfjsLib.GlobalWorkerOptions.workerSrc =
  "../../build/webpack/pdf.worker.bundle.js";

I don't have the build/webpack dirs because I make Webpack compile directly in my Django directories (something that looks like KJ_import/static/docs/bundles/).
In there I see this as the output:

index.js
index.worker.js

and looking inside the compiled index.js resource I get a

"use strict";
eval("__webpack_require__.r(__webpack_exports__);\n/* harmony default export */ __webpack_exports__[\"default\"] = (function() {\n  return new Worker(__webpack_require__.p + \"index.worker.js\");\n});\n\n\n//# sourceURL=webpack:///./node_modules/pdfjs-dist/build/pdf.worker.js?./node_modules/worker-loader/dist/cjs.js");

section that calls the index.worker.js module.

Do you see a way I could amend the ../../build/webpack/pdf.worker.bundle.js path to make it usable in my case? Assuming the example was referencing directly the resource _after it had been built_ I tried pdfjsLib.GlobalWorkerOptions.workerSrc = 'index.worker.js', but the result does not seem to change much:

WARNING in ./node_modules/worker-loader/dist/index.js
Module not found: Error: Can't resolve 'webpack/lib/web/FetchCompileAsyncWasmPlugin' in '/home/giampaolo/dev/KJ_import/KJ-JS/node_modules/worker-loader/dist'
 @ ./node_modules/worker-loader/dist/index.js
 @ ./node_modules/worker-loader/dist/cjs.js
 @ ./node_modules/pdfjs-dist/webpack.js
 @ ./src/js/views/pdfViews.js
 @ ./src/js/index.js

WARNING in ./node_modules/worker-loader/dist/index.js
Module not found: Error: Can't resolve 'webpack/lib/web/FetchCompileWasmPlugin' in '/home/giampaolo/dev/KJ_import/KJ-JS/node_modules/worker-loader/dist'
 @ ./node_modules/worker-loader/dist/index.js
 @ ./node_modules/worker-loader/dist/cjs.js
 @ ./node_modules/pdfjs-dist/webpack.js
 @ ./src/js/views/pdfViews.js
 @ ./src/js/index.js

ERROR in (webpack)/lib/node/NodeTargetPlugin.js
Module not found: Error: Can't resolve 'module' in '/home/giampaolo/dev/KJ_import/KJ-JS/node_modules/webpack/lib/node'
 @ (webpack)/lib/node/NodeTargetPlugin.js 11:1-18
 @ ./node_modules/worker-loader/dist/index.js
 @ ./node_modules/worker-loader/dist/cjs.js
 @ ./node_modules/pdfjs-dist/webpack.js
 @ ./src/js/views/pdfViews.js
 @ ./src/js/index.js
Child worker-loader node_modules/pdfjs-dist/build/pdf.worker.js:
     1 asset
    Entrypoint pdf.worker = index.worker.js
       2 modules
ℹ 「wdm」: Failed to compile.

Anyway: thanks a million again for your answers and their speed, even on a Sunday.

Do you see a way I could amend the ../../build/webpack/pdf.worker.bundle.js path to make it usable in my case?

That path is indeed only valid for the example itself when the steps from the README at https://github.com/mozilla/pdf.js/blob/master/examples/webpack/README.md are followed. The gulp dist-install line makes that work.

If you use pdfjs-dist you don't need that since the required Webpack bits are distributed along with it as outlined in https://github.com/mozilla/pdf.js/blob/master/examples/webpack/README.md#worker-loading. Looking at that in more detail, you indeed shouldn't have to set the workerSrc at all because the zero-configuration Webpack file already does that for you; see https://github.com/mozilla/pdfjs-dist/blob/master/webpack.js#L27-L31 (this is distributed in pdfjs-dist).
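
As a rough sketch of the idea behind such a zero-configuration entry: the worker is created by the bundle and handed to pdf.js through GlobalWorkerOptions.workerPort, so no workerSrc URL is needed. The code below is not the content of pdfjs-dist/webpack.js. The helper name attachBundledWorker and the fake objects are assumptions made so the snippet can run outside a browser.

```javascript
// Hypothetical helper mirroring the idea: create the worker once and
// give it to pdf.js via workerPort instead of a workerSrc URL.
function attachBundledWorker(pdfjsLib, WorkerFactory, globalObject) {
  // Only attach when the environment actually supports Web Workers.
  if (globalObject && "Worker" in globalObject) {
    pdfjsLib.GlobalWorkerOptions.workerPort = new WorkerFactory();
    return true;
  }
  return false;
}

// Stand-ins so the sketch runs under Node; in a real bundle these would be
// the pdf.js module, the export produced by worker-loader, and `window`.
const fakePdfjs = { GlobalWorkerOptions: { workerPort: null, workerSrc: "" } };
class FakeWorker {}

const attached = attachBundledWorker(fakePdfjs, FakeWorker, { Worker: FakeWorker });
console.log(attached); // true
console.log(fakePdfjs.GlobalWorkerOptions.workerPort instanceof FakeWorker); // true
```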

Fantastic, thank you. You found the same resource I had been reading in my searches.
I still have to figure out what's not working, but you helped me rule out quite a few things.
Kindest regards,
Giampaolo

Hey Giampaolo,

I'm also running into the same issue. I haven't been able to completely resolve the issue, but I was able to see that the worker-loader is trying to require the FetchCompileWasmPlugin here:

https://github.com/webpack-contrib/worker-loader/blob/master/src/index.js#L26

Seems like there may be some inconsistencies between Webpack 4 and 5?

Oh, great catch @edcheung1. Do you think we should raise this with the Webpack team and open an issue there?
Their website suggests asking on SO first, which I did, without feedback so far, so it might be a good idea.

I'm running into the same problem here. Admittedly this is a super old project with a lot of out-of-date dependencies, so maybe I'm missing something, but I can't get a newer version of pdfjs-dist to work, whereas previously I had it working by importing 'pdfjs-dist/webpack'. Now, after updating pdfjs-dist, worker-loader and webpack, I am getting the following output:

WARNING in ./node_modules/worker-loader/dist/index.js
Module not found: Error: Can't resolve 'webpack/lib/web/FetchCompileAsyncWasmPlugin' in '/home/rett/projects/LSCPortalFE/node_modules/worker-loader/dist'
 @ ./node_modules/worker-loader/dist/index.js
 @ ./node_modules/worker-loader/dist/cjs.js
 @ ./node_modules/pdfjs-dist/webpack.js
 @ ./app/scripts/modules/PDFJSTools.ts
 @ ./app/scripts/UploadModalCtrl.ts
 @ ./app/scripts/angular-scripts.js
 @ multi (webpack)-dev-server/client?http://localhost:9000 @babel/polyfill ./app/scripts/deps.js ./app/scripts/angular-scripts.js ./app/scripts/stylesheet-bundle.js

WARNING in ./node_modules/worker-loader/dist/index.js
Module not found: Error: Can't resolve 'webpack/lib/web/FetchCompileWasmPlugin' in '/home/rett/projects/LSCPortalFE/node_modules/worker-loader/dist'
 @ ./node_modules/worker-loader/dist/index.js
 @ ./node_modules/worker-loader/dist/cjs.js
 @ ./node_modules/pdfjs-dist/webpack.js
 @ ./app/scripts/modules/PDFJSTools.ts
 @ ./app/scripts/UploadModalCtrl.ts
 @ ./app/scripts/angular-scripts.js
 @ multi (webpack)-dev-server/client?http://localhost:9000 @babel/polyfill ./app/scripts/deps.js ./app/scripts/angular-scripts.js ./app/scripts/stylesheet-bundle.js

ERROR in (webpack)/lib/node/NodeTargetPlugin.js
Module not found: Error: Can't resolve 'module' in '/home/rett/projects/LSCPortalFE/node_modules/webpack/lib/node'
 @ (webpack)/lib/node/NodeTargetPlugin.js 11:1-18
 @ ./node_modules/worker-loader/dist/index.js
 @ ./node_modules/worker-loader/dist/cjs.js
 @ ./node_modules/pdfjs-dist/webpack.js
 @ ./app/scripts/modules/PDFJSTools.ts
 @ ./app/scripts/UploadModalCtrl.ts
 @ ./app/scripts/angular-scripts.js
 @ multi (webpack)-dev-server/client?http://localhost:9000 @babel/polyfill ./app/scripts/deps.js ./app/scripts/angular-scripts.js ./app/scripts/stylesheet-bundle.js

+1 FWIW facing the exact same error msgs mentioned above with a clean setup of create-react-app and using the App component from https://github.com/yurydelendik/pdfjs-react.

We are also facing the exact same issue after upgrading the pdfjs library,
@timvandermeij can the issue be re-opened?

vue-cli 4, same error.

I was able to solve the problem with the responses on
https://stackoverflow.com/questions/63553008/looking-for-help-to-make-npm-pdfjs-dist-work-with-webpack-and-django

Nevertheless, as I was still facing other issues using pdfjs with Vue 3, I ended up using version 2.0.943 of the pdfjs-dist package to make it work. Not the best solution, but the only way I found to make it work after a week of trial and error.

+1 FWIW facing the exact same error msgs mentioned above with a clean setup of create-react-app and using the App component from https://github.com/yurydelendik/pdfjs-react.

Currently facing the exact same issue. Did you find a fix?

Same issue here. I followed all the steps described in the sparse docs/examples and in countless online blog posts for old versions, and there is still no clear way to integrate pdfjs into a project.

This post shows all the hoops one went through to make it work: https://stackoverflow.com/questions/63553008/looking-for-help-to-make-npm-pdfjs-dist-work-with-webpack-and-django

It would really be nice if the dev experience were nicer; in the meantime I will try to downgrade to 2.0.943 as the post above suggests.

Here's the fix copied from SO, posted by Siddhesh on 20th October:

This issue seems to arise due to the esModule option introduced in worker-loader v3.0.0.
The fix for this was merged in (pre-release) pdfjs-dist v2.6.347.
You can fix this by either upgrading pdfjs-dist to v2.6.347 OR downgrading worker-loader to v2.0.0.

It's easiest to downgrade worker-loader, as the pdfjs-dist containing the fix has not yet been released to npm.
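
To illustrate why an esModule option can matter: when a loader emits an ES module, a CommonJS require() of it typically yields a namespace object whose constructor sits under .default, so calling new on the object itself fails. The helper below is a generic interop sketch with made-up names. It is not the patch that was merged into pdfjs-dist.

```javascript
// Generic interop helper: accept either a bare export or an ES-module
// namespace object ({ __esModule: true, default: ... }).
function unwrapDefault(mod) {
  return mod && mod.__esModule && "default" in mod ? mod.default : mod;
}

// Stand-in for the worker constructor a loader would export.
class FakeWorker {}
const commonJsStyle = FakeWorker;
const esModuleStyle = { __esModule: true, default: FakeWorker };

console.log(unwrapDefault(commonJsStyle) === FakeWorker); // true
console.log(unwrapDefault(esModuleStyle) === FakeWorker); // true
```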

You can then import pdfjs-dist with:

let pdfjs = require("pdfjs-dist/webpack");
let loadingTask = pdfjs.getDocument(url);     

This works for me within a Vue.js component, in a project created by vue-cli. I'm using pdfjs-dist 2.5.207 and worker-loader 2.0.0.
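
The version combinations reported in this thread can be captured in a small check. This is a sketch only: it encodes the rule quoted above (pdfjs-dist at or above 2.6.347, or worker-loader below major version 3) and treats "any worker-loader below 3.0.0" as equivalent to the suggested 2.0.0, which is an assumption. The function names are invented.

```javascript
// Parse "major.minor.patch" into numbers; pre-release tags are ignored.
function parseVersion(version) {
  return version.split("-")[0].split(".").map(Number);
}

// True when `version` is greater than or equal to `minimum`.
function atLeast(version, minimum) {
  const a = parseVersion(version);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true;
}

// Encodes only the rule quoted above; it says nothing about other versions
// of Webpack or other causes of build failures.
function isReportedWorkingCombo(pdfjsDist, workerLoader) {
  return atLeast(pdfjsDist, "2.6.347") || !atLeast(workerLoader, "3.0.0");
}

console.log(isReportedWorkingCombo("2.5.207", "2.0.0")); // true
console.log(isReportedWorkingCombo("2.4.456", "3.0.2")); // false
console.log(isReportedWorkingCombo("2.6.347", "3.0.2")); // true
```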
