Pdf.js: How to check if range-requests are in use?

Created on 18 May 2017 · 18 Comments · Source: mozilla/pdf.js

We are having a hard time serving large PDFs to our customers with pdf.js. Some investigation taught us that the concept of "range requests" could fix this. Therefore we tried to generate a FastWebView-enabled PDF with Ghostscript:

gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 -dFastWebView=true

The generated PDF is being served by Apache/2.4.18 which (correct me if I'm wrong) supports range requests.

Now how can I test that pdf.js actually uses range requests?

rogierlommers

👍3

Most helpful comment

PDF.js has two other options, disableAutoFetch and disableStream. The former stops further range requests once enough data has been fetched; the latter disables streaming in browsers capable of progressive download. See also #7937 and https://github.com/mozilla/pdf.js/wiki/Debugging-PDF.js#url-parameters

yurydelendik on 22 May 2017

👍3

All 18 comments

Now how can I test that pdf.js actually uses range requests?

There is no diagnostic information coming from the PDF.js core yet. However, the browser's network monitor should show 206 responses. If you don't see 206s for files larger than 128 KB, there is a problem with the server; inspect the request and response HTTP headers of the initial XHR.
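
The same check can be made outside the browser by sending one request with a Range header and looking at the status code. The following is a minimal sketch, not part of PDF.js; the function names `classify_range_response` and `probe` are made up for illustration, and it assumes the file is reachable with a plain GET.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def classify_range_response(status, headers):
    """Interpret the response to a request that carried a Range header."""
    names = {name.lower() for name in headers}
    if status == 206 and "content-range" in names:
        return "range honoured"
    if status == 200:
        return "range ignored, full body sent"
    if status == 416:
        return "range not satisfiable"
    return "unexpected"


def probe(url, first=0, last=1023):
    """Send one ranged GET (hypothetical helper) and classify the answer."""
    request = Request(url, headers={"Range": f"bytes={first}-{last}"})
    try:
        with urlopen(request) as response:
            headers = dict(response.headers.items())
            return classify_range_response(response.status, headers)
    except HTTPError as error:
        # urlopen raises for 4xx/5xx, including 416.
        return classify_range_response(error.code, dict(error.headers.items()))
```

Calling `probe()` with the URL of the served PDF should return "range honoured" when the server supports range requests for that file.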

Please note that some WebKit-based browsers (e.g. Safari) still have a defect with caching such requests, so we disable range requests for them.

Closing as answered. Please provide more concrete information or an example if you need a more detailed explanation. See also https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#range

yurydelendik on 18 May 2017

Attaching screenshot for expected range request activity:

[Screenshot: screen shot 2017-05-18 at 8 13 23 am]

yurydelendik on 18 May 2017

I see 206 responses, but it looks like pdf.js still fetches the whole PDF. I'm using Chrome on macOS. This is a supported combination, right?

rogierlommers on 19 May 2017

I see 206 responses, but it looks like pdf.js still fetches the whole PDF. I'm using Chrome on macOS. This is a supported combination, right?

Correct.

@rogierlommers PDF.js will attempt to load the entire PDF with the first XHR and, once the headers arrive, it will abort that fetch. With local connections you might not see this, since everything is really fast. Try it against a remote server. Also pay attention to caching: it's okay for content to be cached, but that means you might be receiving the entire PDF from the first XHR.
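
The decision made when the headers of the first response arrive can be approximated as follows. This is a rough sketch and not the PDF.js source: the function name is made up, the 64 KiB chunk size is an assumption chosen so that twice the chunk size matches the 128k threshold mentioned earlier in this thread, and the Content-Encoding check is likewise an assumption.

```python
RANGE_CHUNK_SIZE = 65536  # assumed chunk size (64 KiB)


def can_switch_to_range_requests(headers, chunk_size=RANGE_CHUNK_SIZE):
    """Decide from the initial response headers whether ranged fetching makes sense."""
    names = {name.lower(): value.strip() for name, value in headers.items()}
    # The server has to advertise byte ranges.
    if names.get("accept-ranges", "").lower() != "bytes":
        return False
    # Assumption: a compressed body makes byte offsets meaningless.
    if names.get("content-encoding", "identity").lower() != "identity":
        return False
    # The total length must be known to plan the ranges.
    length = names.get("content-length", "")
    if not length.isdigit():
        return False
    # Small files are simply downloaded in one go.
    return int(length) > 2 * chunk_size
```

If this returns False, the first request just runs to completion and no 206 responses appear in the network monitor.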

yurydelendik on 19 May 2017

(Assuming you guys are working on the same problem) See also #8425

yurydelendik on 19 May 2017

Note that the first 200 response is only 4.0 kB long:

[Screenshot: screen shot 2017-05-19 at 8 34 41 am]

yurydelendik on 19 May 2017

Sorry for all my questions, but please have a look at the attached screenshot. As you can see, I get 206s, indicating that range requests are working fine. Right? But for some reason, Chrome is downloading the full PDF while I expect it to load only the first x bytes.

[Screenshot: screen shot 2017-05-22 at 07 44 15]

rogierlommers on 22 May 2017

But for some reason, Chrome is downloading the full PDF while I expect it to load only the first x bytes.

I don't understand which file you mean and what you expect. Looking at the 200 response for 9789027673633.pdf, it downloaded 12.7 kB, and the next 206 response fetched 64.3 kB. That would only look fishy if your file were just 12.7 kB.

yurydelendik on 22 May 2017

Then we have a different understanding of this feature. My assumption was that:

if a PDF is web-optimized
pdf.js only downloads the first x bytes
until the user selects other pages of the document
then the bytes corresponding to the other pages will be downloaded

Now my conclusion is that

pdf.js starts downloading a web-optimized pdf
if page 1 is successfully downloaded, it starts rendering this page client-side
and continues downloading the remaining bytes of the document (regardless of whether the user has selected/requested these pages)
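
The two models above differ only in which chunks end up being requested. The following sketch illustrates the difference; it is not PDF.js code, the function names are hypothetical, the 64 KiB chunk size is an assumption, and `auto_fetch` stands in for the behaviour that the disableAutoFetch option turns off.

```python
def chunk_ranges(length, chunk_size=65536):
    """Return the inclusive (first, last) byte ranges that cover a file."""
    return [
        (start, min(start + chunk_size, length) - 1)
        for start in range(0, length, chunk_size)
    ]


def ranges_to_fetch(length, needed_offsets, auto_fetch, chunk_size=65536):
    """Pick every chunk when auto_fetch is on, else only chunks holding needed offsets."""
    chunks = chunk_ranges(length, chunk_size)
    if auto_fetch:
        return chunks
    wanted = {offset // chunk_size for offset in needed_offsets}
    return [chunk for index, chunk in enumerate(chunks) if index in wanted]
```

For a 150000-byte file, `ranges_to_fetch(150000, [10, 140000], auto_fetch=False)` selects only the first and last chunk, while `auto_fetch=True` selects all three.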

rogierlommers on 22 May 2017

PDF.js has two other options, disableAutoFetch and disableStream. The former stops further range requests once enough data has been fetched; the latter disables streaming in browsers capable of progressive download. See also #7937 and https://github.com/mozilla/pdf.js/wiki/Debugging-PDF.js#url-parameters

yurydelendik on 22 May 2017

👍3

Thanks; it all works fine now.

rogierlommers on 23 May 2017

Hi @yurydelendik ,

I'm having a similar issue and I had a question about your comment:

Note that the first 200 response is only 4.0 kB long:

So what should the response body be for the initial response with the 200 code, before pdf.js has made a range request? Can the body be empty, for example, as long as there is an Accept-Ranges: bytes response header? Will that trigger pdf.js to make a range request?

Thanks

dlandis on 18 Oct 2017

All HTTP responses need to be valid, so the first response must be piped in full until it's cancelled.
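
On the server side this means answering a request without a Range header with the complete body, and a ranged request with a 206 and a Content-Range header. A minimal sketch under stated assumptions: `respond` is a hypothetical helper and not tied to any framework, and it only understands a single `bytes=first-last` or `bytes=first-` range.

```python
import re


def respond(body, range_header=None):
    """Return (status, headers, payload) for a GET on `body` (bytes)."""
    size = len(body)
    headers = {"Accept-Ranges": "bytes"}
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header or "")
    if match and int(match.group(1)) >= size:
        # The range starts beyond the end of the file.
        headers["Content-Range"] = f"bytes */{size}"
        return 416, headers, b""
    if match:
        first = int(match.group(1))
        last = min(int(match.group(2) or size - 1), size - 1)
        if first <= last:
            payload = body[first:last + 1]
            headers["Content-Range"] = f"bytes {first}-{last}/{size}"
            headers["Content-Length"] = str(len(payload))
            return 206, headers, payload
    # No usable Range header: a valid 200 carries the whole body.
    headers["Content-Length"] = str(size)
    return 200, headers, body
```

Suffix ranges and multi-range requests fall through to the full 200 response in this sketch, which a client has to tolerate anyway.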

yurydelendik on 18 Oct 2017

@yurydelendik Thanks for your response.

I'm wondering whether simply performing range requests from the outset (maybe configurable via a param) was ever discussed?

I noticed the RFC says:

A client MAY generate range requests without having received this header field for the resource involved.

This would potentially help the server (depending on how it was implemented) so it wouldn't have to load the whole document for that initial request.

And then the client, in those cases, wouldn't need to cancel that initial response and then switch to range requests. Wouldn't that be simpler?
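
A client that starts with a range request learns the total size from the Content-Range header of the first 206 response instead of from Content-Length. A small sketch of that idea, independent of pdf.js; both function names and the 64 KiB chunk size are assumptions for illustration.

```python
import re


def first_range_header(chunk_size=65536):
    """Headers for an opening request that asks for the first chunk only."""
    return {"Range": f"bytes=0-{chunk_size - 1}"}


def parse_content_range(value):
    """Parse 'bytes first-last/total'; total is None when the server sends '*'."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)
```

A server that ignores the Range header answers with a plain 200 and no Content-Range, so such a client still needs a fallback path for full downloads.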

Thanks

dlandis on 18 Oct 2017

@dlandis sorry, I don't follow your thoughts. There is an option to override the default behavior: you can implement PDFDataRangeTransport with only HTTP range requests. It's not possible in the general case IMHO.

yurydelendik on 18 Oct 2017

you can implement PDFDataRangeTransport with only HTTP range requests

@yurydelendik Thanks again, it sounds like that is what I need. I don't suppose there is an example?

dlandis on 18 Oct 2017

See e.g. tests at https://github.com/mozilla/pdf.js/blob/master/test/unit/api_spec.js#L1277

yurydelendik on 18 Oct 2017

@yurydelendik I have an issue downloading a PDF using range requests. I am using a Spring Boot application to provide the download service. viewer.html makes a first request, which is cancelled because the service supports range requests, and then initiates a partial request, as expected. But there are no further requests from the browser: there is just that one, whereas I expect it to keep requesting until the whole PDF is downloaded. Is there any special header that needs to be added to the response so that the browser sends all the requests to the service?