Pdf.js: Readme should have `npm install pdf.js` on how to use... And a better tutorial. (I've added a suggestion on how could be improved)

Created on 15 Sep 2020  路  9Comments  路  Source: mozilla/pdf.js

I think is a big oversight forgetting to add this, I had to open package.json to figure out if was even on npm... I haven't seen a readme without npm install ... for long time. and this seem like a big project.

1-other

Most helpful comment

Tutorial could be so much better. I'm talking about https://mozilla.github.io/pdf.js/examples/index.html#interactive-examples . When I first read I feel like I'm in 2010 using jquery...

How to actually use it with yarn/npm

Installation

Install: yarn add pdfjs-dist @types/pdfjs-dist (replace yarn add for npm install if you use npm. Also remove @types/pdfjs-dist if you don't use typescript. (anyone still uses npm or vanilla js??)

Setup

Webpack specific code (if you use yarn, you most likely use webpack or a bundler) so I think this example makes more sense than using the cdn

// We need to get the url of the worker (we use min for prod)
import workerSrc from '!!file-loader!pdfjs-dist/build/pdf.worker.min.js'

// Use require because import doesn't work for some obscure reason. Also use `webpackChunkName` so it will not bundle this huge lib in your main code
const pdfjsLib = require(/* webpackChunkName: "pdfjs-dist" */ `pdfjs-dist`)

// Now you assign the worker file path to the `pdfjsLib` (yes, it's that cumbersome)
pdfjsLib.GlobalWorkerOptions.workerSrc = workerSrc

Usage

Add a <canvas id="pdf-canvas"/> to your HTML.

Then in a js file put:

    // Loads the actual pdf
    const pdf = await pdfjsLib.getDocument(props.url).promise

    // Opens page 1?
    const page = await pdf.getPage(1)

    // Creates or gets a viewport?
    const viewport = page.getViewport({scale: 1})

    const canvas = document.getElementById('pdf-canvas')
    const context = canvas.getContext('2d')
    canvas.width = 600
    canvas.height = 300

    const renderContext = {
      canvasContext: context,
      viewport: viewport,
    }
    await page.render(renderContext).promise

This is what I'd expect from a tutorial TBH. Yes, I complain about it, but I also give you a way to improve.

Why I think this is better?

  • Concise
  • To the point
  • Clear code

Minor details in guides

All these small details gives me the impression of a poorly written library. (even if it might not be)

canvas.height = viewport.height;
canvas.width = viewport.width;

Why in the world would you switch width, height order to height, width? Just wasted time to fix bug where I expected width and height ordering.

Promises async await

Mentions await in many places, but tutorial is loadingTask.promise.then... let's write some nice code with async await

Still on es5?

vars, no arrow functions...

Naming

const loadingTask = pdfjsLib.getDocument(props.url), loadingTask is weird

the-canvas... come on..

All 9 comments

And this installs "pdf.js": "^0.1.0", wtf... OMG is actually installing another package https://github.com/stanfeldman/pdf.js, what a mess. Please edit package.json to be at least different, people like me would install the wrong package, which could even be malware...

But... how to use this with the regular npm install?

Ok, I saw now burried in a block of text.. is npm install pdfjs-dist... I think should be way way more obvious.

IMO people that use it is much higher than people that contribute, therefore I think you should have guide for users first and a guide for contribution later.

Tutorial could be so much better. I'm talking about https://mozilla.github.io/pdf.js/examples/index.html#interactive-examples . When I first read I feel like I'm in 2010 using jquery...

How to actually use it with yarn/npm

Installation

Install: yarn add pdfjs-dist @types/pdfjs-dist (replace yarn add for npm install if you use npm. Also remove @types/pdfjs-dist if you don't use typescript. (anyone still uses npm or vanilla js??)

Setup

Webpack specific code (if you use yarn, you most likely use webpack or a bundler) so I think this example makes more sense than using the cdn

// We need to get the url of the worker (we use min for prod)
import workerSrc from '!!file-loader!pdfjs-dist/build/pdf.worker.min.js'

// Use require because import doesn't work for some obscure reason. Also use `webpackChunkName` so it will not bundle this huge lib in your main code
const pdfjsLib = require(/* webpackChunkName: "pdfjs-dist" */ `pdfjs-dist`)

// Now you assign the worker file path to the `pdfjsLib` (yes, it's that cumbersome)
pdfjsLib.GlobalWorkerOptions.workerSrc = workerSrc

Usage

Add a <canvas id="pdf-canvas"/> to your HTML.

Then in a js file put:

    // Loads the actual pdf
    const pdf = await pdfjsLib.getDocument(props.url).promise

    // Opens page 1?
    const page = await pdf.getPage(1)

    // Creates or gets a viewport?
    const viewport = page.getViewport({scale: 1})

    const canvas = document.getElementById('pdf-canvas')
    const context = canvas.getContext('2d')
    canvas.width = 600
    canvas.height = 300

    const renderContext = {
      canvasContext: context,
      viewport: viewport,
    }
    await page.render(renderContext).promise

This is what I'd expect from a tutorial TBH. Yes, I complain about it, but I also give you a way to improve.

Why I think this is better?

  • Concise
  • To the point
  • Clear code

Minor details in guides

All these small details gives me the impression of a poorly written library. (even if it might not be)

canvas.height = viewport.height;
canvas.width = viewport.width;

Why in the world would you switch width, height order to height, width? Just wasted time to fix bug where I expected width and height ordering.

Promises async await

Mentions await in many places, but tutorial is loadingTask.promise.then... let's write some nice code with async await

Still on es5?

vars, no arrow functions...

Naming

const loadingTask = pdfjsLib.getDocument(props.url), loadingTask is weird

the-canvas... come on..

I'm talking about https://mozilla.github.io/pdf.js/examples/index.html#interactive-examples

Generally speaking: Most of the examples provided are purposely generic, and not framework/bundler specific, since it's very difficult to maintain and support a wide range of different of frameworks/bundlers for the regular PDF.js contributors.

Note that https://mozilla.github.io/pdf.js/examples/index.html#hello-world-walkthrough contain links to additional examples.

Webpack specific code [...]

For the reasons outlined above, we don't want to advertise e.g. Webpack specifically in basic examples; however the examples folder contains the following: https://github.com/mozilla/pdf.js/tree/master/examples/webpack

[...] so I think this example makes more sense than using the cdn

It may be more appropriate for your use-case, but everyone probably isn't using Webpack,

Also remove @types/pdfjs-dist if you don't use typescript.

No one should be using those, since they're not official and most likely neither up-to-date or correct.

However, official TypeScript definitions are being distributed here: https://github.com/mozilla/pdfjs-dist/tree/master/types

canvas.width = 600
canvas.height = 300

Note that simply hard-coding of values, without any regard for the viewport, may have generally unintended side-effects for many PDF documents.

Why in the world would you switch width, height order to height, width?

It's hopefully obvious that mistakes can/will happen, since everyone is human. (Also, this really seem like a minor issue for rather than a huge "bug"..)

Mentions await in many places, but tutorial is loadingTask.promise.then... let's write some nice code with async await

vars, no arrow functions...

The general PDF.js library is still usable in environments/browsers without native support for those features, hence we purposely don't use things like async/await in general examples; note e.g. the two different kind of builds mentioned in https://mozilla.github.io/pdf.js/getting_started/#download

const loadingTask = pdfjsLib.getDocument(props.url), loadingTask is weird

What's weird about it?
The getDocument call returns a PDFDocumentLoadingTask-instance, see
https://github.com/mozilla/pdf.js/blob/b0c7a74a0c7403aba86087f8180d75c16ec58602/src/display/api.js#L196-L208
and
https://github.com/mozilla/pdf.js/blob/b0c7a74a0c7403aba86087f8180d75c16ec58602/src/display/api.js#L438-L468

the-canvas... come on..

Huh?

I see what you mean, but still, is hard to get started... I didn't even find the webpack example. I'd say there is a discoverability issue on the readme. I'd suggest you to look at other libraries and take a look on how they architect their readme. I don't usually take my time to write so much, unless I think there is a big room for improvement.

I don't have time to read everything on the readme to find examples and then webpack. See what happened:

  • I searched for npm on readme: found a few results, somehow I missed the text usage with NPM and Bower under the pdfjs-dist name.. My patter recognition didn't notice a npm install blabla so I missed.
  • I searched for typescript: no results. then my default is to use @types/blabla. Again, apparently I got the wrong library
  • I searched for webpack: no results. I guess I should have found More examples can be found in the examples folder., press that link, then webpack... but didn't happen.

All these details makes it hard to get started. For example as a user of this library I don't even want to look at the source code, especially when I'm in rush. (I know is superficial, but sometimes you can't read source code of every lib you use, especially if I'm only trying out to do a proof of concept)

Also https://github.com/mozilla/pdf.js/blob/master/examples/webpack/main.js

pdfjsLib.GlobalWorkerOptions.workerSrc =
  "../../build/webpack/pdf.worker.bundle.js";

I'm not sure that's something to suggest, would be nice to give an example that can be used by a noobie. That import is a hack. Also still not async/await and vars. (but not a big deal)

I didn't even find the webpack example. I'd say there is a discoverability issue on the readme.

We don't want to advertise all individual examples separately in the README since it would make it much bigger than necessary and big README files may be even less inviting to actually read. In my experience it's pretty common to just Ctrl + F for "example" and find them.

I don't have time to read everything on the readme to find examples and then webpack.

To be fair, we can change the README all we want, but if people can't be bothered to actually read the README then you can't be surprised that you're missing information...

I searched for npm on readme: [..] somehow I missed the text usage with NPM and Bower under the pdfjs-dist name. My patter recognition didn't notice a npm install blabla so I missed.

I think NPM is listed pretty clear with a link to a page with more details, including the pdfjs-dist name. The README should be concise, so we can't put all details in there. The "somehow I missed " and "pattern recognition didn't notice" to me just seem to indicate that the search was incomplete.

I searched for typescript: no results.

According to https://www.typescriptlang.org/docs/handbook/declaration-files/publishing.html#including-declarations-in-your-npm-package the types property in the pdjs-dist package is the standard way of publishing those types, and we do this in https://github.com/mozilla/pdfjs-dist/blob/master/package.json#L5. We only ship TypeScript definitions in the final NPM package, so I don't know what there would be to mention about TypeScript here...

sometimes you can't read source code of every lib you use, especially if I'm only trying out to do a proof of concept

I agree that you shouldn't need to read the source code, but even for a POC you'll need to do _some_ reading of e.g., READMEs and example code.

All in all, this issue contains _a lot_ of different points about not only the README but also the examples and the source code syntax, so it's kind of all over the place. In order for this issue to be actionable, let's focus on the README and leave the rest as-is since, as indicated, there are good reasons for the code to be like this.

Let's discuss how the README could be improved to make getting started easier. Do you have examples of README files from other projects that you'd recommend we follow if they have a more clear structure? It could help to serve as inspiration to change our README file.

How do I type RenderTask? The PDFPageProxy.render type is RenderTask but I can't seem to be able to import the RenderTask def.

Please don't post unrelated questions here. This has _nothing_ to do with the README.

Given that https://github.com/mozilla/pdf.js/issues/12379#issuecomment-692955362 asked for more detailed information about what to improve in the README, but none has been provided, should we close this for now until more actionable information is available?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

BrennanDuffey picture BrennanDuffey  路  3Comments

sujit-baniya picture sujit-baniya  路  3Comments

AlexP3 picture AlexP3  路  3Comments

SehyunPark picture SehyunPark  路  3Comments

PeterNerlich picture PeterNerlich  路  3Comments