I think is a big oversight forgetting to add this, I had to open package.json to figure out if was even on npm... I haven't seen a readme without npm install ...
for long time. and this seem like a big project.
And this installs "pdf.js": "^0.1.0",
wtf... OMG is actually installing another package https://github.com/stanfeldman/pdf.js
, what a mess. Please edit package.json to be at least different, people like me would install the wrong package, which could even be malware...
But... how to use this with the regular npm install
?
Ok, I saw now burried in a block of text.. is npm install pdfjs-dist
... I think should be way way more obvious.
IMO people that use it is much higher than people that contribute, therefore I think you should have guide for users first and a guide for contribution later.
Tutorial could be so much better. I'm talking about https://mozilla.github.io/pdf.js/examples/index.html#interactive-examples . When I first read I feel like I'm in 2010 using jquery...
Install: yarn add pdfjs-dist @types/pdfjs-dist
(replace yarn add
for npm install
if you use npm. Also remove @types/pdfjs-dist
if you don't use typescript. (anyone still uses npm or vanilla js??)
Webpack specific code (if you use yarn, you most likely use webpack or a bundler) so I think this example makes more sense than using the cdn
// We need to get the url of the worker (we use min for prod)
import workerSrc from '!!file-loader!pdfjs-dist/build/pdf.worker.min.js'
// Use require because import doesn't work for some obscure reason. Also use `webpackChunkName` so it will not bundle this huge lib in your main code
const pdfjsLib = require(/* webpackChunkName: "pdfjs-dist" */ `pdfjs-dist`)
// Now you assign the worker file path to the `pdfjsLib` (yes, it's that cumbersome)
pdfjsLib.GlobalWorkerOptions.workerSrc = workerSrc
Add a <canvas id="pdf-canvas"/>
to your HTML.
Then in a js file put:
// Loads the actual pdf
const pdf = await pdfjsLib.getDocument(props.url).promise
// Opens page 1?
const page = await pdf.getPage(1)
// Creates or gets a viewport?
const viewport = page.getViewport({scale: 1})
const canvas = document.getElementById('pdf-canvas')
const context = canvas.getContext('2d')
canvas.width = 600
canvas.height = 300
const renderContext = {
canvasContext: context,
viewport: viewport,
}
await page.render(renderContext).promise
This is what I'd expect from a tutorial TBH. Yes, I complain about it, but I also give you a way to improve.
Why I think this is better?
All these small details gives me the impression of a poorly written library. (even if it might not be)
canvas.height = viewport.height;
canvas.width = viewport.width;
Why in the world would you switch width, height order to height, width? Just wasted time to fix bug where I expected width and height ordering.
Mentions await in many places, but tutorial is loadingTask.promise.then
... let's write some nice code with async
await
var
s, no arrow functions...
const loadingTask = pdfjsLib.getDocument(props.url)
, loadingTask is weird
the-canvas
... come on..
I'm talking about https://mozilla.github.io/pdf.js/examples/index.html#interactive-examples
Generally speaking: Most of the examples provided are purposely generic, and not framework/bundler specific, since it's very difficult to maintain and support a wide range of different of frameworks/bundlers for the regular PDF.js contributors.
Note that https://mozilla.github.io/pdf.js/examples/index.html#hello-world-walkthrough contain links to additional examples.
Webpack specific code [...]
For the reasons outlined above, we don't want to advertise e.g. Webpack specifically in basic examples; however the examples folder contains the following: https://github.com/mozilla/pdf.js/tree/master/examples/webpack
[...] so I think this example makes more sense than using the cdn
It may be more appropriate for your use-case, but everyone probably isn't using Webpack,
Also remove
@types/pdfjs-dist
if you don't use typescript.
No one should be using those, since they're not official and most likely neither up-to-date or correct.
However, official TypeScript definitions are being distributed here: https://github.com/mozilla/pdfjs-dist/tree/master/types
canvas.width = 600
canvas.height = 300
Note that simply hard-coding of values, without any regard for the viewport
, may have generally unintended side-effects for many PDF documents.
Why in the world would you switch width, height order to height, width?
It's hopefully obvious that mistakes can/will happen, since everyone is human. (Also, this really seem like a minor issue for rather than a huge "bug"..)
Mentions await in many places, but tutorial is
loadingTask.promise.then
... let's write some nice code withasync
await
var
s, no arrow functions...
The general PDF.js library is still usable in environments/browsers without native support for those features, hence we purposely don't use things like async
/await
in general examples; note e.g. the two different kind of builds mentioned in https://mozilla.github.io/pdf.js/getting_started/#download
const loadingTask = pdfjsLib.getDocument(props.url)
, loadingTask is weird
What's weird about it?
The getDocument
call returns a PDFDocumentLoadingTask
-instance, see
https://github.com/mozilla/pdf.js/blob/b0c7a74a0c7403aba86087f8180d75c16ec58602/src/display/api.js#L196-L208
and
https://github.com/mozilla/pdf.js/blob/b0c7a74a0c7403aba86087f8180d75c16ec58602/src/display/api.js#L438-L468
the-canvas
... come on..
Huh?
I see what you mean, but still, is hard to get started... I didn't even find the webpack example. I'd say there is a discoverability issue on the readme. I'd suggest you to look at other libraries and take a look on how they architect their readme. I don't usually take my time to write so much, unless I think there is a big room for improvement.
I don't have time to read everything on the readme to find examples and then webpack. See what happened:
npm
on readme: found a few results, somehow I missed the text usage with NPM and Bower under the pdfjs-dist name.
. My patter recognition didn't notice a npm install blabla
so I missed. typescript
: no results. then my default is to use @types/blabla
. Again, apparently I got the wrong librarywebpack
: no results. I guess I should have found More examples can be found in the examples folder.
, press that link, then webpack... but didn't happen.All these details makes it hard to get started. For example as a user of this library I don't even want to look at the source code, especially when I'm in rush. (I know is superficial, but sometimes you can't read source code of every lib you use, especially if I'm only trying out to do a proof of concept)
Also https://github.com/mozilla/pdf.js/blob/master/examples/webpack/main.js
pdfjsLib.GlobalWorkerOptions.workerSrc =
"../../build/webpack/pdf.worker.bundle.js";
I'm not sure that's something to suggest, would be nice to give an example that can be used by a noobie. That import is a hack. Also still not async/await and vars. (but not a big deal)
I didn't even find the webpack example. I'd say there is a discoverability issue on the readme.
We don't want to advertise all individual examples separately in the README since it would make it much bigger than necessary and big README files may be even less inviting to actually read. In my experience it's pretty common to just Ctrl + F for "example" and find them.
I don't have time to read everything on the readme to find examples and then webpack.
To be fair, we can change the README all we want, but if people can't be bothered to actually read the README then you can't be surprised that you're missing information...
I searched for npm on readme: [..] somehow I missed the text usage with NPM and Bower under the pdfjs-dist name. My patter recognition didn't notice a npm install blabla so I missed.
I think NPM is listed pretty clear with a link to a page with more details, including the pdfjs-dist
name. The README should be concise, so we can't put all details in there. The "somehow I missed " and "pattern recognition didn't notice" to me just seem to indicate that the search was incomplete.
I searched for typescript: no results.
According to https://www.typescriptlang.org/docs/handbook/declaration-files/publishing.html#including-declarations-in-your-npm-package the types
property in the pdjs-dist
package is the standard way of publishing those types, and we do this in https://github.com/mozilla/pdfjs-dist/blob/master/package.json#L5. We only ship TypeScript definitions in the final NPM package, so I don't know what there would be to mention about TypeScript here...
sometimes you can't read source code of every lib you use, especially if I'm only trying out to do a proof of concept
I agree that you shouldn't need to read the source code, but even for a POC you'll need to do _some_ reading of e.g., READMEs and example code.
All in all, this issue contains _a lot_ of different points about not only the README but also the examples and the source code syntax, so it's kind of all over the place. In order for this issue to be actionable, let's focus on the README and leave the rest as-is since, as indicated, there are good reasons for the code to be like this.
Let's discuss how the README could be improved to make getting started easier. Do you have examples of README files from other projects that you'd recommend we follow if they have a more clear structure? It could help to serve as inspiration to change our README file.
How do I type RenderTask? The PDFPageProxy.render type is RenderTask but I can't seem to be able to import the RenderTask def.
Please don't post unrelated questions here. This has _nothing_ to do with the README.
Given that https://github.com/mozilla/pdf.js/issues/12379#issuecomment-692955362 asked for more detailed information about what to improve in the README, but none has been provided, should we close this for now until more actionable information is available?
Most helpful comment
Tutorial could be so much better. I'm talking about https://mozilla.github.io/pdf.js/examples/index.html#interactive-examples . When I first read I feel like I'm in 2010 using jquery...
How to actually use it with yarn/npm
Installation
Install:
yarn add pdfjs-dist @types/pdfjs-dist
(replaceyarn add
fornpm install
if you use npm. Also remove@types/pdfjs-dist
if you don't use typescript. (anyone still uses npm or vanilla js??)Setup
Webpack specific code (if you use yarn, you most likely use webpack or a bundler) so I think this example makes more sense than using the cdn
Usage
Add a
<canvas id="pdf-canvas"/>
to your HTML.Then in a js file put:
This is what I'd expect from a tutorial TBH. Yes, I complain about it, but I also give you a way to improve.
Why I think this is better?
Minor details in guides
All these small details gives me the impression of a poorly written library. (even if it might not be)
Why in the world would you switch width, height order to height, width? Just wasted time to fix bug where I expected width and height ordering.
Promises async await
Mentions await in many places, but tutorial is
loadingTask.promise.then
... let's write some nice code withasync
await
Still on es5?
var
s, no arrow functions...Naming
const loadingTask = pdfjsLib.getDocument(props.url)
, loadingTask is weirdthe-canvas
... come on..