Pdf.js: How to cancel or destroy a getPage request with disableAutoFetch set

Created on 27 Dec 2019 · 8 comments · Source: mozilla/pdf.js

Dear pdf.js contributors,

With disableAutoFetch set, is there a way to cancel the fetching triggered by getPage()? Something similar to calling destroy() on the task returned by getDocument.

It looks like it might be possible, but I only found internal functions.

Best, A.


arelaxend

All 8 comments

With disableAutoFetch set, is there a way to cancel fetching on getPage() ?

Huh, calling getPage is what causes data to be requested (there's no cancelling involved); it's quite frankly difficult to understand what you're trying to ask here.

Snuffleupagus on 27 Dec 2019


it's quite frankly difficult to understand what you're trying to ask here.

Oops. With disableAutoFetch and disableStream both off, calling getDocument starts fetching the entire file. One can cancel that fetching by calling destroy() on the loading task returned by getDocument, which stops the GET request.

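let task = pdfjs.getDocument(url); // `url` (hypothetical): location of the PDF

async function cancelLoading() {
  if (task !== undefined) {
    await task.destroy(); // stops pending network requests and closes the document
    task = undefined;     // `delete task` doesn't work on variables; clear the reference instead
  }
}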

With disableAutoFetch set, fetching only happens once getPage is called, but there is no destroy() to cancel the resulting range request (answered with HTTP 206 Partial Content) if one wants to. For example, one may need to cancel pages that are still being fetched because the user has moved on to other pages before those finished loading.

Still, it looks like it is possible to cancel the _transport, but doing so would cancel all future requests as well.

https://github.com/mozilla/pdf.js/blob/c3a1c679500e1e4cfa13291f2bcd58ebbba2eb14/src/display/api.js#L423-L424

calling getPage is what causes data to be requested (there's no cancelling involved)

Absolutely, the best approach is not to call getPage unless it's needed. Still, that's not the point here 💯; by the same logic, it's also better not to call getDocument unless it's needed.

Is the following a possible workaround?

https://github.com/mozilla/pdf.js/blob/c3a1c679500e1e4cfa13291f2bcd58ebbba2eb14/src/display/transport_stream.js#L131-L140

arelaxend on 27 Dec 2019

[...] but there is no destroy() to cancel the GET 206 request in case one wants to.

There's no way of doing what you're asking, short of destroying the loadingTask itself (and thus closing the entire document).

For example, one requires to cancel some pages currently being fetched because the user moves to other pages before the previous pages were fetched.

First of all, note that there are a couple of different ways that data could be loaded (using Fetch, XMLHttpRequest, or a PDFDataRangeTransport implementation). Secondly, there's generally speaking nothing that says that different pages wouldn't need data from the same byte range (and aborting a request could thus break other getPage calls).

Hence what you're asking for isn't possible, and unfortunately it won't be supported either (for the reasons outlined above; the use case also seems fairly specialized).

Snuffleupagus on 27 Dec 2019
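To illustrate why aborting "the request for page N" is ill-defined, here is a toy model (hypothetical names, not PDF.js's actual chunk loading code): requested byte ranges are rounded out to fixed-size chunks, each chunk is fetched at most once, and several pages can end up waiting on the same chunk. The 64 KiB chunk size is an assumption chosen for the example.

```javascript
// Toy model (hypothetical, not PDF.js code): byte ranges are split into
// fixed-size chunks, and each chunk is fetched at most once for all pages.
const CHUNK_SIZE = 65536;

// Chunk indices covering the end-exclusive byte range [begin, end).
function chunksFor(begin, end) {
  const chunks = [];
  const last = Math.floor((end - 1) / CHUNK_SIZE);
  for (let c = Math.floor(begin / CHUNK_SIZE); c <= last; c++) {
    chunks.push(c);
  }
  return chunks;
}

class ToyChunkLoader {
  constructor() {
    this.waiters = new Map(); // chunk index -> Set of page numbers waiting on it
  }

  // Record that `pageNumber` needs the bytes [begin, end).
  request(pageNumber, begin, end) {
    for (const chunk of chunksFor(begin, end)) {
      if (!this.waiters.has(chunk)) {
        this.waiters.set(chunk, new Set());
      }
      this.waiters.get(chunk).add(pageNumber);
    }
  }

  // Pages that would be left without data if the request for `chunk` were aborted.
  affectedByAbort(chunk) {
    return [...(this.waiters.get(chunk) ?? [])];
  }
}
```

With page 3 needing bytes 100000 to 140000 and page 4 needing bytes 130000 to 200000, both pages wait on chunk 2, so aborting that request "for page 3" would also stall page 4.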

Secondly, there's generally speaking nothing that says that different pages wouldn't need data from the same byte range (and aborting a request could thus break other getPage calls).

OK. In my use case, whenever the user moves to a page, I fetch the pages [-1, current, 1], i.e. the previous, current, and next pages. If the user quickly moves on to another page, I call cancelAllRequests():

https://github.com/mozilla/pdf.js/blob/c3a1c679500e1e4cfa13291f2bcd58ebbba2eb14/src/display/transport_stream.js#L131-L139

I then wait until all the requests are cancelled and fetch the new [-1, current, 1] pages.
My question is: is cancelAllRequests() the best option for such a scenario?

First of all, note that there's a couple of different ways that data could be loaded (using Fetch, XMLHttpRequest, or a PDFDataRangeTransport implementation).

arelaxend on 27 Dec 2019
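The [-1, current, 1] window can be computed without cancelling anything. A minimal sketch with hypothetical helper names: it clamps the window to the document's page range and works out which pages of the new window were not already requested for the previous one. PDF.js appears to cache the promise returned by getPage() per page, so pages that stay in the window should not need to be fetched again.

```javascript
// Hypothetical helpers (not part of PDF.js). Page numbers are 1-based,
// matching PDFDocumentProxy.getPage().

// Pages in [current - radius, current + radius], clamped to [1, numPages].
function pageWindow(current, numPages, radius = 1) {
  const first = Math.max(1, current - radius);
  const last = Math.min(numPages, current + radius);
  const pages = [];
  for (let n = first; n <= last; n++) {
    pages.push(n);
  }
  return pages;
}

// Pages in the new window that were not part of the old one; only these
// would need a new getPage() call.
function newlyNeeded(oldPages, newPages) {
  const seen = new Set(oldPages);
  return newPages.filter((n) => !seen.has(n));
}
```

Moving from page 5 to page 6 of a 10-page document turns the window [4, 5, 6] into [5, 6, 7], so only page 7 is new.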

I am going to cancelAllRequests().

As explained in https://github.com/mozilla/pdf.js/issues/11453#issuecomment-569335735, that will easily lead to all kinds of breakage, and it isn't something that you should be calling manually (it's used by WorkerTransport.destroy).

I am currently using PDFDataRangeTransport implementation for range requests.

Please note that the default range request functionality in PDF.js isn't in any way connected with PDFDataRangeTransport, so unless you're using the API along the lines below, you're not actually using PDFDataRangeTransport.

const loadingTask = getDocument({
  range: rangeTransport, // `rangeTransport` (hypothetical name): your custom PDFDataRangeTransport instance
  // more parameters here
});

Snuffleupagus on 27 Dec 2019

OK. I am going to wrap the calls as setTimeout(() => getPage(), 200) and clearTimeout() any pending timeouts, following your first comment:

calling getPage is what causes data to be requested (there's no cancelling involved)

What is the purpose of PDFDataRangeTransport? Is it for extending the range-request capabilities? I couldn't find any examples or use cases out there.

Thank you for all your tips @Snuffleupagus 👍

arelaxend on 27 Dec 2019
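The setTimeout/clearTimeout approach described above is a debounce. A minimal sketch, assuming a hypothetical `loadPage` function such as `(n) => pdfDocument.getPage(n)` and a hypothetical `onPages` callback; both are injected so the sketch runs without PDF.js. Clearing the timeout is the only real cancellation here: once getPage() has been called its data request proceeds, so the sketch merely ignores results that arrive after a newer navigation.

```javascript
// Debounced page loading: getPage() is only called once navigation has been
// quiet for `delayMs`. All names here are hypothetical, not PDF.js API.
function createDebouncedPageLoader(loadPage, onPages, delayMs = 200, onError = console.error) {
  let timer = null;
  let generation = 0; // bumped on every navigation; used to drop stale results

  return function navigateTo(pageNumbers) {
    generation += 1;
    const myGeneration = generation;
    clearTimeout(timer); // nothing has been requested yet for the previous navigation
    timer = setTimeout(async () => {
      timer = null;
      try {
        const pages = await Promise.all(pageNumbers.map((n) => loadPage(n)));
        // The requests can't be cancelled once made, but outdated results are ignored.
        if (myGeneration === generation) {
          onPages(pages);
        }
      } catch (err) {
        if (myGeneration === generation) {
          onError(err);
        }
      }
    }, delayMs);
  };
}
```

Calling navigateTo([1, 2]), navigateTo([2, 3]), and navigateTo([3, 4]) in quick succession results in a single round of loadPage calls, for pages 3 and 4 only.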

What is the purpose of PDFDataRangeTransport ?

It allows completely custom data delivery, which you can thus implement in whatever way you want/need (it's used in the PDF viewer that's built into the Firefox browser).

While it does allow a great deal of flexibility, it's consequently a fair bit more complex than just providing a URL when calling getDocument :-)
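When implementing a custom PDFDataRangeTransport, the subclass overrides requestDataRange(begin, end), fetches those bytes however it likes, and hands them back via onDataRange(begin, chunk). One detail worth getting right: PDF.js's built-in loaders appear to treat `end` as exclusive (they send `end - 1` in the Range header), while the HTTP Range header is inclusive. A minimal sketch of the fetching part under that assumption; the helper names are hypothetical and `fetchImpl` is injectable so it can run without a server.

```javascript
// Convert an end-exclusive byte range [begin, end) into an HTTP Range header
// value, whose end offset is inclusive.
function toHttpRange(begin, end) {
  if (!(begin >= 0 && end > begin)) {
    throw new RangeError(`invalid byte range [${begin}, ${end})`);
  }
  return `bytes=${begin}-${end - 1}`;
}

// Fetch one byte range as a Uint8Array. `fetchImpl` defaults to the global fetch.
async function fetchRange(url, begin, end, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url, { headers: { Range: toHttpRange(begin, end) } });
  if (response.status !== 206) {
    // A 200 would mean the server ignored the Range header and sent the whole file.
    throw new Error(`expected 206 Partial Content, got ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

Inside such a subclass, requestDataRange(begin, end) could then call fetchRange(url, begin, end).then((chunk) => this.onDataRange(begin, chunk)), with error handling left to the application.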