We are having a hard time serving large PDFs to our customers with pdf.js. Some investigation taught us that "range requests" could fix this, so we tried to generate a FastWebView-enabled PDF with Ghostscript:
gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 -dFastWebView=true
The generated PDF is being served by Apache/2.4.18 which (correct me if I'm wrong) supports range requests.
Now how can I test that pdf.js actually uses range requests?
There is no diagnostic information coming from the PDF.js core yet. However, the browser's network monitor should show 206 responses. If you don't see 206s for files larger than 128k, then there is a problem with the server: inspect the request and response HTTP headers of the initial XHR.
Please note that some WebKit-based browsers (e.g. Safari) still have a defect with caching such requests, so we disable it for them.
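The server side can also be checked outside the viewer by sending a single request with a `Range` header and looking at the status code. The following is a minimal sketch, not part of PDF.js: the function names `describeRangeResponse` and `probeRangeSupport` and the 1 KB probe range are assumptions made for illustration, and the network call assumes a runtime with a global `fetch` (a browser or Node 18+).

```javascript
// Pure helper: interpret the response to a request that carried a Range header.
// `headers` is anything with a get(name) method, e.g. a fetch Headers object.
function describeRangeResponse(status, headers) {
  const contentRange = headers.get('content-range') || null;
  return {
    // 206 Partial Content means the server honoured the Range header;
    // a plain 200 means it ignored the header and is sending the whole file.
    supported: status === 206,
    status,
    contentRange,
  };
}

// Network wrapper (hypothetical helper): ask for the first 1024 bytes only.
async function probeRangeSupport(url) {
  const response = await fetch(url, { headers: { Range: 'bytes=0-1023' } });
  // Only the status and headers matter here, so discard the body.
  if (response.body) {
    await response.body.cancel();
  }
  return describeRangeResponse(response.status, response.headers);
}
```

Calling `probeRangeSupport` with the URL of the served PDF and logging the result shows whether the server answered with 206 and which `Content-Range` it reported. In a browser, a cross-origin server may hide `Content-Range` unless it exposes that header via CORS.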
Closing as answered. Provide more concrete information/example for better explanation. See also https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#range
Attaching screenshot for expected range request activity:
I see 206 responses, but it looks like pdf.js still fetches the whole PDF and I'm using Chrome on MacOS. This is a supported combination, right?
Correct.
@rogierlommers PDF.js will attempt to load the entire PDF with the first XHR, and once the headers arrive it will abort that fetch. On local connections you might not see that, since it's really fast. Try it against a remote server. Also pay attention to caching: it's okay for content to be cached, but that means you might be receiving the entire PDF from the first XHR.
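The decision made when the headers of that first response arrive can be sketched roughly as below. This is an illustration of the idea, not PDF.js's actual code: the function name `canUseRangeRequests` is hypothetical, and the 64 KB default chunk size is an assumption chosen to be consistent with the 128k threshold mentioned earlier in the thread.

```javascript
// Decide from the first response's headers whether it is worth aborting the
// full download and switching to range requests.
// `headers` is anything with a get(name) method, e.g. a fetch Headers object.
function canUseRangeRequests(headers, rangeChunkSize = 65536) {
  // The server has to advertise byte-range support.
  if (headers.get('accept-ranges') !== 'bytes') {
    return false;
  }
  // Byte offsets are meaningless if the body is compressed in transit.
  const encoding = headers.get('content-encoding') || 'identity';
  if (encoding !== 'identity') {
    return false;
  }
  // The total length must be known up front.
  const length = Number.parseInt(headers.get('content-length'), 10);
  if (!Number.isInteger(length)) {
    return false;
  }
  // Small files are cheaper to fetch in one go (assumed threshold: two chunks).
  return length > 2 * rangeChunkSize;
}
```

Under these assumptions, a server that omits `Accept-Ranges` or `Content-Length`, or that gzips the PDF, would cause the first request to run to completion instead of being aborted.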
(Assuming you guys are working on the same problem) See also #8425
See that first 200 has only 4.0kb in length:
Sorry for all my questions, but please have a look at attached screenshot. As you can see, I get 206s, indicating that range-requests are working fine. Right? But for some reason, Chrome is downloading the full PDF while I expect it to load only the first x bytes.
I don't understand which file this is and what is expected. Looking at the 9789027673633.pdf 200 response, it downloaded 12.7kb, and the next 206 response asked for 64.3kb. Unless your file is only 12.7kb, then your next 206 requests/responses look fishy.
Then we have a different understanding of this feature. My assumption was that:
Now my conclusion is that
PDF.js has two other options, disableAutoFetch and disableStream. The former stops any further range requests once enough data has been fetched; the latter disables streaming in browsers capable of progressive download. See also #7937 and https://github.com/mozilla/pdf.js/wiki/Debugging-PDF.js#url-parameters
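When PDF.js is used through its API rather than the bundled viewer, these two options can be passed to `getDocument`. A minimal sketch follows; the helper name `buildLazyLoadingOptions` and the example path are made up for illustration, and `pdfjsLib` is assumed to be the loaded PDF.js library.

```javascript
// Build a getDocument() parameter object that asks PDF.js to fetch only the
// byte ranges it needs instead of pulling in the rest of the file.
function buildLazyLoadingOptions(url) {
  return {
    url,
    // Do not keep requesting further ranges once the needed data is there.
    disableAutoFetch: true,
    // Do not stream the whole file progressively in the background.
    disableStream: true,
  };
}

// Usage in a page that has loaded PDF.js as `pdfjsLib` (path is hypothetical):
//   const options = buildLazyLoadingOptions('/files/large.pdf');
//   const pdf = await pdfjsLib.getDocument(options).promise;
```

Both options are set together here because, as far as I know, disabling auto-fetch only has the intended effect when streaming is disabled as well.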
Thanks; it all works fine now.
Hi @yurydelendik ,
I'm having a similar issue and I had a question about your comment:
See that first 200 has only 4.0kb in length:
So what should the response body be for the initial response with the 200 code, before pdf.js has made a range request? Can the body be empty, for example, as long as there is an Accept-Ranges: bytes response header? Will that trigger pdf.js to make a range request?
Thanks
All HTTP responses need to be valid, so the first response must be piped in full until it's cancelled.
@yurydelendik Thanks for your response.
I'm wondering whether it was ever discussed to just perform range requests from the outset (maybe configurable via a param)?
I noticed the RFC says:
A client MAY generate range requests without having received this header field for the resource involved.
This would potentially help the server (depending on how it was implemented) so it wouldn't have to load the whole document for that initial request.
And then the client, in those cases, wouldn't need to cancel that initial response and then switch to range requests. Wouldn't that be simpler?
Thanks
@dlandis sorry, I don't follow your thoughts. There is an option to override the default behavior: you can implement PDFDataRangeTransport with only HTTP range requests. It's not possible in the general case IMHO.
you can implement PDFDataRangeTransport with only HTTP range requests
@yurydelendik Thanks again, it sounds like that is what I need. I don't suppose there is an example?
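No example was posted in the thread. A rough, untested sketch of what such a transport might look like is given below. The factory name `createRangeOnlyTransport` is hypothetical, the total `length` has to be obtained by the caller (for example from a HEAD request), and the sketch assumes that `requestDataRange(begin, end)` is called with an exclusive `end` and that fetched bytes are handed back through `onDataRange(begin, chunk)`.

```javascript
// Create a transport that serves every PDF.js data request with an HTTP
// range request, so no initial full-file request is made.
// `pdfjsLib` is the loaded PDF.js library; `fetchImpl` is injectable for tests.
function createRangeOnlyTransport(pdfjsLib, url, length, fetchImpl = fetch) {
  class HttpRangeTransport extends pdfjsLib.PDFDataRangeTransport {
    // Called by PDF.js whenever it needs bytes [begin, end).
    requestDataRange(begin, end) {
      // HTTP byte ranges are inclusive, hence end - 1.
      const headers = { Range: `bytes=${begin}-${end - 1}` };
      // The promise is returned for callers that want to handle errors.
      return fetchImpl(url, { headers })
        .then((response) => {
          if (response.status !== 206) {
            throw new Error(`expected 206, got ${response.status}`);
          }
          return response.arrayBuffer();
        })
        // Hand the bytes back to PDF.js.
        .then((buffer) => this.onDataRange(begin, new Uint8Array(buffer)));
    }
  }
  // No initial data: everything is fetched on demand.
  return new HttpRangeTransport(length, new Uint8Array(0));
}

// Usage (assumption: `totalLength` was read from a HEAD request's Content-Length):
//   const transport = createRangeOnlyTransport(pdfjsLib, '/files/large.pdf', totalLength);
//   const pdf = await pdfjsLib.getDocument({ range: transport }).promise;
```

A real implementation would also need to deal with failed requests and with aborting in-flight fetches, which this sketch leaves out.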
@yurydelendik I have an issue downloading the PDF using range requests. I am using a Spring Boot application to provide the download service. viewer.html makes a first request, which is cancelled since the service supports range requests, and then initiates a partial request, which is as expected. But there are no further requests from the browser; there is just that one, whereas I expect it to keep requesting until the whole PDF is downloaded. Is there any special header that needs to be added to the response so that the browser sends all requests to the service?