Pdf.js: total is undefined in LoadingTask.onProgress

Created on 4 Nov 2017 · 17 Comments · Source: mozilla/pdf.js

Attach (recommended) or Link to PDF file here:
https://jsfiddle.net/p3ybwp7d/1/

Configuration:

  • Web browser and its version: Google Chrome - Version 62.0.3202.75
  • Operating system and its version: Mac OS 10.13
  • PDF.js version: 1.10.97
  • Is a browser extension: No

Steps to reproduce the problem:

Create loading task and add onProgress callback:

let loadingTask: any = PDFJS.getDocument(this.src);

loadingTask.onProgress = (progressData) => {
  // progressData won't contain "total", only "loaded"
};

What is the expected behaviour? (add screenshot)
In previous versions, onProgress did return both total and loaded.

What went wrong? (add screenshot)
total field is undefined in loadingTask.onProgress callback.
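
Until the cause is resolved, a progress handler can be written so that it tolerates a missing total. The following is a minimal sketch, not part of PDF.js; the helper name describeProgress is hypothetical, and it only assumes the { loaded, total } shape shown in this report.

```javascript
// Hypothetical helper (not a PDF.js API): turns a progress object into a
// display string and tolerates a missing or non-positive `total`.
function describeProgress({ loaded, total }) {
  if (Number.isFinite(total) && total > 0) {
    // Clamp to 100 in case `loaded` overshoots the reported total.
    const percent = Math.min(100, Math.round((loaded / total) * 100));
    return `${percent}%`;
  }
  // Without a usable total, only the byte count can be reported.
  return `${loaded} bytes loaded`;
}
```

It could be wired up as loadingTask.onProgress = (data) => console.log(describeProgress(data)); so the UI falls back to a byte counter whenever total is undefined.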

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):
https://jsfiddle.net/p3ybwp7d/1/

1-core

All 17 comments

Is this Chrome-specific? In Firefox I get the following output, which looks good:

PDF.js ProgressData
{"loaded":14480,"total":1016315}
{"loaded":836400,"total":1016315}
{"loaded":1016315,"total":1016315}

It's related to fetch, since you can observe the same issue in Firefox too (with the dom.streams.enabled and javascript.options.streams prefs set in about:config).

@timvandermeij, can I try this issue?

Since this issue can be observed in two different browsers, are we sure that this isn't a problem with the Fetch standard[1] itself?
If it's a Fetch standard limitation, or perhaps a browser one, then I'm not sure if we'd be reasonably able to fix this in the PDF.js library.


[1] This looks like the relevant part of the specification: https://fetch.spec.whatwg.org/#terminology-headers

Basically while using the Firefox browser, total is defined in this manner
https://github.com/mozilla/pdf.js/blob/237bc2ef9df204069c4996e14433e0a35123444a/src/display/network.js#L427
and while using the Chrome browser, total is defined in this way, because it uses fetch_stream:
https://github.com/mozilla/pdf.js/blob/237bc2ef9df204069c4996e14433e0a35123444a/src/display/fetch_stream.js#L152
So basically this._contentLength is undefined in both cases but in the first case data.total contains the total count so it doesn't cause a problem. So we can try to do something similar in the second case too.


Shall I give this a try ?

So basically this._contentLength is undefined in both cases but in the first case data.total contains the total count so it doesn't cause a problem.

In fetch_stream we are setting this._contentLength = source.length here and both source.length and data.total are logically same things. So I don't think the approach you have in mind is going to solve the problem.

@mukulmishra18 , ok let me look into this.

@mukulmishra18, could you explain when the PDFFetchStream constructor is called? https://github.com/mozilla/pdf.js/blob/237bc2ef9df204069c4996e14433e0a35123444a/src/display/fetch_stream.js#L34 I just want to see where the source argument is passed from, since source.length turns out to be undefined.

First we set PDFNetworkStream, based on the environment and its support for streams, at https://github.com/mozilla/pdf.js/blob/237bc2ef9df204069c4996e14433e0a35123444a/src/pdf.js#L35-L45

So if streaming is supported by the browser, PDFNetworkStream is PDFFetchStream.

After this we are calling this constructor with all the provided params from: https://github.com/mozilla/pdf.js/blob/237bc2ef9df204069c4996e14433e0a35123444a/src/display/api.js#L255

If you want to see what these params are, you can read here: https://github.com/mozilla/pdf.js/blob/237bc2ef9df204069c4996e14433e0a35123444a/src/display/api.js#L98-L140
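
The selection described in this comment can be sketched roughly as follows. This is an illustration under assumptions, not the code at the linked lines: the function names and the env parameter are hypothetical, and the exact checks PDF.js performs may differ.

```javascript
// Hypothetical feature check (not the PDF.js source): streaming over fetch
// needs fetch(), a Response exposing a `body` stream, and ReadableStream.
function supportsFetchStreaming(env) {
  return typeof env.fetch === 'function' &&
    typeof env.Response === 'function' &&
    'body' in env.Response.prototype &&
    typeof env.ReadableStream === 'function';
}

// Returns a label for the transport that would be picked for `env`
// (for example `window` in a browser).
function chooseTransport(env) {
  return supportsFetchStreaming(env) ? 'fetch' : 'xhr';
}
```

Passing the environment in as an argument keeps the check testable outside a browser; in a page it would be called as chooseTransport(window).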

@mukulmishra18 , Thanks for all this explanation and now I can understand things better.


In fetch_stream we are setting this._contentLength = source.length here and both source.length and data.total are logically same things.

But when I debugged this I found that in both cases, whether fetchStream is used or not, source.length is always undefined. I also found that the source object has no length property.

https://github.com/mozilla/pdf.js/blob/fad2a3f427db76033873200b77ecb137420a7119/src/display/network.js#L433
In this line data.total contains the value of total size

https://github.com/mozilla/pdf.js/blob/fad2a3f427db76033873200b77ecb137420a7119/src/display/network.js#L142
In this line evt is actually the data object used above. So should I try doing something like this in the fetchStream case too.
Or
Can you suggest me something ? Because I can see that @Snuffleupagus was right on this as I think this is a fetch API limitation

Can you suggest me something ?

I will suggest you to create a simple PDF.js app and try to run in Chrome and check if you are getting right headers(especially Content-Length) somewhere: https://github.com/mozilla/pdf.js/blob/237bc2ef9df204069c4996e14433e0a35123444a/src/display/fetch_stream.js#L106

I also think it may be a problem of fetch standard or browser as mentioned in https://github.com/mozilla/pdf.js/issues/9103#issuecomment-357745218. If that is the case, we can't do a lot in PDF.js to fix this.

I will suggest you to create a simple PDF.js app

I have been using the app all the time, mentioned in https://github.com/mozilla/pdf.js/issues/9103#issue-271184807

check if you are getting right headers(especially Content-Length) somewhere:

That's what I'm trying to say: I can't find the content length, and it always comes out as undefined. So now I also think this is a shortcoming of the Fetch API.

FYI, _contentLength is set at:

https://github.com/mozilla/pdf.js/blob/6b7e2cbcd1fbfd68c17f92178ce47df7f6665c31/src/display/fetch_stream.js#L104-L115

and the value originates from https://github.com/mozilla/pdf.js/blob/6b7e2cbcd1fbfd68c17f92178ce47df7f6665c31/src/display/network_utils.js#L42-L47

But this value is guarded behind range requests. Before the above snippet, the function returns early if Range requests are disabled:
https://github.com/mozilla/pdf.js/blob/6b7e2cbcd1fbfd68c17f92178ce47df7f6665c31/src/display/network_utils.js#L23-L42

I don't see an immediate reason for blocking that, so perhaps it makes sense to unconditionally use the value of the Content-Length header (unless Transfer-Encoding is specified but not starting with identity).
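
The rule proposed above can be sketched as a small standalone function. This is a sketch under assumptions, not the code in network_utils.js: contentLengthFromHeaders and its getHeader parameter are hypothetical names, and the range-request handling from the linked snippet is deliberately left out.

```javascript
// Hypothetical helper (not the PDF.js implementation). `getHeader` is any
// function that returns a header value, or null, for a given header name.
function contentLengthFromHeaders(getHeader) {
  const encoding = (getHeader('Transfer-Encoding') || 'identity')
    .trim()
    .toLowerCase();
  if (!encoding.startsWith('identity')) {
    // A non-identity transfer coding means Content-Length cannot be
    // trusted as the size of the decoded body.
    return undefined;
  }
  const raw = getHeader('Content-Length');
  if (raw === null || raw === undefined || !/^\d+$/.test(String(raw).trim())) {
    // Header missing or not a plain non-negative integer.
    return undefined;
  }
  return parseInt(raw, 10);
}
```

With a fetch Response named response, it could be called as contentLengthFromHeaders((name) => response.headers.get(name)), and the result used as the total regardless of whether range requests are enabled.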

Hi! My first comment in GitHub, sorry if I make a mistake.
I kind of found a solution:
The suggestedLength of returnValues is never really updated with the length calculated.

So I did:

returnValues.suggestedLength = length;
before any "if".

Also, the HTTP header name must match, so pay attention to Content-Length versus Content-length (case sensitive).

We just updated to 1.10.97 which broke our loading task. As a (hopefully temporary) workaround, we do a header-only fetch for the content-length header beforehand:

const total = await fetch(new Request(documentUrl, { method: 'HEAD', credentials: 'include' }))
    .then(res => parseInt(res.headers.get('content-length'), 10));

const loadingTask = window.PDFJS.getDocument({url: documentUrl, withCredentials: true});
loadingTask.onProgress = ({ loaded }) => {
    // do stuff with `loaded` and `total`
};
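
The same workaround can be wrapped in a helper that resolves to undefined instead of NaN when the header is absent. This is a minimal sketch: fetchTotalBytes and its fetchImpl parameter are hypothetical names, and it assumes the server answers HEAD requests with a Content-Length header that the page is allowed to read.

```javascript
// Hypothetical helper around the HEAD-request workaround. `fetchImpl`
// defaults to the global fetch and can be replaced for testing.
async function fetchTotalBytes(url, fetchImpl = fetch) {
  const res = await fetchImpl(url, { method: 'HEAD', credentials: 'include' });
  const raw = res.headers.get('content-length');
  const total = raw === null ? NaN : parseInt(raw, 10);
  // Resolve to undefined rather than NaN when the header is missing
  // or does not start with a number.
  return Number.isFinite(total) ? total : undefined;
}
```

A caller can then check the result once, for example const total = await fetchTotalBytes(documentUrl);, and show a byte counter instead of a percentage when it is undefined.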

Adding good-beginner-bug label because I've already explained the issue and how it can be fixed in https://github.com/mozilla/pdf.js/issues/9103#issuecomment-363436612

Can anybody help me with the architecture of pdf.js? I'm working on the project.
