Pdf.js: How to check if range-requests are in use?

Created on 18 May 2017  ·  18 Comments  ·  Source: mozilla/pdf.js

We are having a hard time serving large PDFs to our customers with pdf.js. Some investigation taught us that "range requests" could fix this, so we tried to generate a FastWebView-enabled PDF with Ghostscript:

gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 -dFastWebView=true

The generated PDF is served by Apache/2.4.18, which (correct me if I'm wrong) supports range requests.

Now how can I test that pdf.js actually uses range requests?

All 18 comments

Now how can I test that pdf.js actually uses range requests?

PDF.js core does not provide any diagnostic information for this yet. However, the network monitor in the browser's developer tools should show 206 responses. If you don't see 206s for files larger than 128k, then there is a problem with the server: inspect the request and response HTTP headers of the initial XHR.
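When inspecting the headers of the initial response, it can help to write the checks down explicitly. The following is a rough sketch of the kind of decision a client such as PDF.js makes once those headers arrive. It is not copied from PDF.js: the function name and the 64 KiB chunk size are assumptions (the chunk size is chosen because two such chunks match the 128k figure above), and the exact rules may differ between versions.

```javascript
// Approximation of the checks a client makes on the headers of the first
// (non-range) response before it switches to range requests.
// ASSUMPTION: chunks are 64 KiB, so files up to 2 chunks (128 KiB) are
// simply downloaded in full.
const ASSUMED_CHUNK_SIZE = 65536;

function canUseRangeRequests(headers, chunkSize = ASSUMED_CHUNK_SIZE) {
  // Header names are case-insensitive, so normalise them first.
  const h = {};
  for (const [name, value] of Object.entries(headers)) {
    h[name.toLowerCase()] = String(value).trim();
  }

  const length = Number.parseInt(h["content-length"], 10);
  if (!Number.isInteger(length)) {
    return { ok: false, reason: "no usable Content-Length header" };
  }
  if (length <= 2 * chunkSize) {
    return { ok: false, reason: "file is too small to be worth range requests" };
  }
  if (h["accept-ranges"] !== "bytes") {
    return { ok: false, reason: "server did not send Accept-Ranges: bytes" };
  }
  // A compressed body makes byte offsets meaningless for the client.
  const encoding = h["content-encoding"] || "identity";
  if (encoding !== "identity") {
    return { ok: false, reason: `Content-Encoding is ${encoding}` };
  }
  return { ok: true, reason: "range requests look possible" };
}

// Example: headers copied by hand from the network monitor.
console.log(canUseRangeRequests({
  "Content-Length": "5242880",
  "Accept-Ranges": "bytes",
}));
```

If the PDF is served from a different origin than the viewer, the browser may also hide some of these headers from scripts unless the server exposes them via CORS, so that is worth checking as well.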

Please note that some WebKit-based browsers (e.g. Safari) still have a defect in caching such requests, so we disable range requests for them.

Closing as answered. Please provide more concrete information or an example if you need a better explanation. See also https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#range

Attaching screenshot for expected range request activity:

[Screenshot: 2017-05-18 at 8:13:23 AM]

I see 206 responses, but it looks like pdf.js still fetches the whole PDF and I'm using Chrome on MacOS. This is a supported combination, right?

I see 206 responses, but it looks like pdf.js still fetches the whole PDF and I'm using Chrome on MacOS. This is a supported combination, right?

Correct.

@rogierlommers PDF.js first attempts to load the entire PDF with the initial XHR and aborts that fetch once the headers arrive. On a local connection you might not see this, because the transfer is really fast. Try it against a remote server. Also pay attention to caching: it's okay for the content to be cached, but that means you might be receiving the entire PDF from the first XHR.

(Assuming you guys are working on the same problem) See also #8425

See that first 200 has only 4.0kb in length:

[Screenshot: 2017-05-19 at 8:34:41 AM]

Sorry for all my questions, but please have a look at attached screenshot. As you can see, I get 206s, indicating that range-requests are working fine. Right? But for some reason, Chrome is downloading the full PDF while I expect it to load only the first x bytes.

[Screenshot: 2017-05-22 at 07:44:15]

But for some reason, Chrome is downloading the full PDF while I expect it to load only the first x bytes.

I don't understand which file you mean and what you expect. Looking at the 200 response for 9789027673633.pdf, it downloaded 12.7kb, and the next 206 response asked for 64.3kb. That looks fine, unless your file is only 12.7kb, in which case the subsequent 206 requests/responses look fishy.

Then we have a different understanding of this feature. My assumption was that:

  • if a PDF is web-optimized
  • pdf.js only downloads the first x bytes
  • until the user selects other pages of the document
  • then the bytes corresponding to the other pages will be downloaded

Now my conclusion is that

  • pdf.js starts downloading a web-optimized pdf
  • if page 1 is successfully downloaded, it starts rendering this page client-side
  • and continues downloading the remaining bytes of the document (regardless of whether the user has selected/requested these pages)

PDF.js has two other options, disableAutoFetch and disableStream. The former stops further range requests once enough data has been fetched; the latter disables streaming of the initial response in browsers that support progressive download. See also #7937 and https://github.com/mozilla/pdf.js/wiki/Debugging-PDF.js#url-parameters
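As a minimal sketch of how these two options might be set from code: the helper name and the example URL below are hypothetical, and where the options live depends on the PDF.js version. In releases that accept them as getDocument parameters the object can be passed straight to getDocument; older releases expected them on the global PDFJS object instead.

```javascript
// Build loading options that ask PDF.js to fetch only what it needs.
// The option names come from the comment above; everything else here
// (function name, URL) is illustrative.
function lazyLoadingOptions(url) {
  return {
    url,
    // Do not keep fetching the rest of the file in the background.
    disableAutoFetch: true,
    // Do not stream the initial response progressively.
    disableStream: true,
  };
}

const options = lazyLoadingOptions("/docs/large.pdf");
console.log(options);

// Usage, assuming pdfjsLib is loaded and takes these as getDocument parameters:
//   pdfjsLib.getDocument(options).promise.then((pdf) => { /* render pages */ });
```

With both options set, the network monitor should show the aborted initial request followed by 206 responses only for the parts of the file needed for the pages actually viewed.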

Thanks; it all works fine now.

Hi @yurydelendik ,

I'm having a similar issue and I had a question about your comment:

See that first 200 has only 4.0kb in length:

So what should the response body be for the initial response with the 200 code, before pdf.js has made a range request? Can the body be empty, for example, as long as the response includes an Accept-Ranges: bytes header? Will that trigger pdf.js to make a range request?

Thanks

All HTTP responses need to be valid, so the first response must be sent in full until it is cancelled.
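To make "valid" more concrete, here is a simplified sketch of how a server could decide between a full 200 response and a partial 206 response. The function name is hypothetical, only single byte ranges are handled, and a real server (Apache, for instance) already does this for static files.

```javascript
// Decide status, headers and byte span for a request, given its Range
// header (or undefined) and the file size in bytes.
// Simplified: multi-range requests are treated like requests without a range.
function planResponse(rangeHeader, fileSize) {
  const base = { "Accept-Ranges": "bytes" };
  const full = {
    status: 200,
    start: 0,
    end: fileSize - 1,
    headers: { ...base, "Content-Length": String(fileSize) },
  };

  const match = /^bytes=(\d*)-(\d*)$/.exec(rangeHeader || "");
  if (!match || (match[1] === "" && match[2] === "")) {
    return full; // no usable Range header: send the complete body
  }

  let start;
  let end;
  if (match[1] === "") {
    // Suffix form "bytes=-N": the last N bytes of the file.
    start = Math.max(fileSize - Number(match[2]), 0);
    end = fileSize - 1;
  } else {
    start = Number(match[1]);
    if (match[2] !== "" && Number(match[2]) < start) {
      return full; // malformed range: ignore it and send everything
    }
    end = match[2] === "" ? fileSize - 1 : Math.min(Number(match[2]), fileSize - 1);
  }

  if (start >= fileSize) {
    // Nothing to send for this range.
    return { status: 416, headers: { ...base, "Content-Range": `bytes */${fileSize}` } };
  }
  return {
    status: 206,
    start,
    end,
    headers: {
      ...base,
      "Content-Range": `bytes ${start}-${end}/${fileSize}`,
      "Content-Length": String(end - start + 1),
    },
  };
}

console.log(planResponse(undefined, 1000));    // first request: full 200
console.log(planResponse("bytes=0-99", 1000)); // later request: partial 206
```

Note that the first request carries no Range header, so the server answers with a normal 200 whose Content-Length is the full file size and whose body is the complete file; the client is the one that cuts it short.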

@yurydelendik Thanks for your response.

I'm wondering whether performing range requests from the outset (maybe configurable via a param) was ever discussed?

I noticed the RFC says:

A client MAY generate range requests without having received this header field for the resource involved.

This would potentially help the server (depending on how it was implemented) so it wouldn't have to load the whole document for that initial request.

And then the client, in those cases, wouldn't need to cancel that initial response and then switch to range requests. Wouldn't that be simpler?

Thanks

@dlandis sorry, I don't follow your thoughts. There is an option to override the default behavior: you can implement PDFDataRangeTransport with only HTTP range requests. It's not possible in the general case, IMHO.

you can implement PDFDataRangeTransport with only HTTP range requests

@yurydelendik Thanks again, it sounds like that is what I need. I don't suppose there is an example?
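No example was posted in the thread, so the following is only a hedged sketch of what a range-only transport might look like. It assumes that PDFDataRangeTransport takes (length, initialData) in its constructor, that PDF.js calls requestDataRange(begin, end) with an exclusive end, and that data is handed back through onDataRange(begin, chunk). The helper names are hypothetical and the code has not been verified against a specific PDF.js version.

```javascript
// HTTP Range header (inclusive end) for the half-open interval [begin, end).
function rangeHeader(begin, end) {
  return `bytes=${begin}-${end - 1}`;
}

// BaseTransport is expected to be pdfjsLib.PDFDataRangeTransport.
// `length` is the total file size in bytes and must be known up front,
// for example from the Content-Length of a HEAD request.
// fetchImpl is injectable so the sketch can be tried without a network.
function createRangeOnlyTransport(BaseTransport, url, length, fetchImpl = fetch) {
  class RangeOnlyTransport extends BaseTransport {
    requestDataRange(begin, end) {
      fetchImpl(url, { headers: { Range: rangeHeader(begin, end) } })
        .then((res) => {
          if (res.status !== 206) {
            throw new Error(`Expected 206, got ${res.status}`);
          }
          return res.arrayBuffer();
        })
        // Hand the bytes for this range back to PDF.js.
        .then((buffer) => this.onDataRange(begin, new Uint8Array(buffer)))
        .catch((err) => console.error("Range request failed:", err));
    }
  }
  // No initial data: every byte is fetched through range requests.
  return new RangeOnlyTransport(length, null);
}

// Usage, assuming pdfjsLib is loaded and fileSize is already known:
//   const transport = createRangeOnlyTransport(
//     pdfjsLib.PDFDataRangeTransport, "/docs/large.pdf", fileSize);
//   pdfjsLib.getDocument({ range: transport });
```

The trade-off is that the file size has to be obtained separately, since there is no initial full request whose headers would provide it.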

@yurydelendik I have an issue downloading a PDF using range requests. I am using a Spring Boot application to provide the download service. viewer.html makes a first request, which is cancelled because the service supports range requests, and then initiates a partial request, which is as expected. But there are no further requests from the browser: there is just that one, whereas I expect it to keep requesting until the whole PDF is downloaded. Is there a special header that needs to be added to the response so that the browser sends all requests to the service?
