Pdf.js: How to cancel or destroy a getPage request with disableAutoFetch set

Created on 27 Dec 2019 · 8 Comments · Source: mozilla/pdf.js

Dear pdf.js contributors,

With disableAutoFetch set, is there a way to cancel fetching on getPage()? The same way one can call destroy() on the loading task returned by getDocument()?

It looks like it is possible but I found only internal functions.

Best, A.

All 8 comments

With disableAutoFetch set, is there a way to cancel fetching on getPage() ?

Huh, calling getPage is what causes data to be requested (there's no cancelling involved); it's quite frankly difficult to understand what you're trying to ask here.

it's quite frankly difficult to understand what you're trying to ask here.

Oops. With disableAutoFetch off and disableStreaming off, calling getDocument starts fetching the entire file. One can cancel the fetching by calling destroy() on the loading task that getDocument returns, which stops the GET request.

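// getDocument returns a loading task; destroy() lives on the task, not on task.promise.
let task = pdfjs.getDocument(...);
...
if (task !== undefined) {
  await task.destroy();
  // `delete` cannot remove a declared variable, so drop the reference instead.
  task = undefined;
}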

With disableAutoFetch set, fetching starts only once getPage is called, but there is no destroy() to cancel the resulting range request (HTTP 206) if one wants to. For example, one may need to cancel pages that are still being fetched because the user has moved on to other pages before the previous ones finished loading.

Still, it looks like it is possible to cancel the _transport, but if one does that it is going to cancel all future requests.

https://github.com/mozilla/pdf.js/blob/c3a1c679500e1e4cfa13291f2bcd58ebbba2eb14/src/display/api.js#L423-L424

calling getPage is what causes data to be requested (there's no cancelling involved)

Absolutely, the best way is not to call getPage if one should not. Still, this is not the point here 💯, and it is also better not to call getDocument if one should not.

Is the following a workaround ?

https://github.com/mozilla/pdf.js/blob/c3a1c679500e1e4cfa13291f2bcd58ebbba2eb14/src/display/transport_stream.js#L131-L140

[...] but there is no destroy() to cancel the GET 206 request in case one wants to.

There's no way of doing what you're asking, short of destroying the loadingTask itself (and thus closing the entire document).
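To make the point above concrete, here is a minimal sketch of the one supported way to stop network activity: destroying the loading task, which also closes the whole document. The helper name `openDocument` and the `pdfjs` parameter (the imported PDF.js library object) are illustrative assumptions, not part of the PDF.js API.

```javascript
// Sketch only: `openDocument` is a hypothetical helper, `pdfjs` is the PDF.js library object.
function openDocument(pdfjs, url) {
  // With disableAutoFetch set, data is requested on demand (e.g. by getPage).
  const loadingTask = pdfjs.getDocument({ url, disableAutoFetch: true });
  return {
    // Resolves with the document proxy once the document can be used.
    ready: loadingTask.promise,
    // Aborts outstanding requests and closes the entire document.
    close: () => loadingTask.destroy(),
  };
}
```

After `close()` resolves, every page of that document is gone as well, so the document would have to be opened again with a fresh getDocument call.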

For example, one requires to cancel some pages currently being fetched because the user moves to other pages before the previous pages were fetched.

First of all, note that there's a couple of different ways that data could be loaded (using Fetch, XMLHttpRequest, or a PDFDataRangeTransport implementation). Secondly, there's generally speaking nothing that says that different pages wouldn't need data from the same byte range (and aborting a request could thus break other getPage calls).

Hence what you're asking for isn't possible, nor will it be supported either unfortunately (as outlined above, and the use-case seems fairly specialized anyway).

Secondly, there's generally speaking nothing that says that different pages wouldn't need data from the same byte range (and aborting a request could thus break other getPage calls).

OK. In my use case, I fetch, say, pages [current - 1, current, current + 1] whenever the user moves to a page. If the user quickly moves to another page, I am going to call cancelAllRequests().

https://github.com/mozilla/pdf.js/blob/c3a1c679500e1e4cfa13291f2bcd58ebbba2eb14/src/display/transport_stream.js#L131-L139

Then I wait until all the requests are cancelled and fetch the new [current - 1, current, current + 1] pages.
My question is: is cancelAllRequests() the best option for such a scenario?

First of all, note that there's a couple of different ways that data could be loaded (using Fetch, XMLHttpRequest, or a PDFDataRangeTransport implementation).

I am going to cancelAllRequests().

As explained in https://github.com/mozilla/pdf.js/issues/11453#issuecomment-569335735 that will easily lead to all kinds of breakage, and isn't something that you should be calling manually (it's being used from WorkerTransport.destroy).
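One alternative that leaves in-flight requests alone is to let them finish and simply ignore results that are no longer wanted. The sketch below is an assumption about how that could look in application code; `pagesAround` and `NeighborLoader` are hypothetical names, and `pdfDocument` stands for the document proxy that the loading task resolves with. Nothing is cancelled here, so data shared between pages stays intact.

```javascript
// Sketch only: hypothetical helpers, not part of PDF.js.
// Page numbers are 1-based, so clamp the window to [1, numPages].
function pagesAround(current, numPages) {
  const pages = [];
  for (let n = current - 1; n <= current + 1; n++) {
    if (n >= 1 && n <= numPages) {
      pages.push(n);
    }
  }
  return pages;
}

class NeighborLoader {
  constructor(pdfDocument) {
    this.pdfDocument = pdfDocument;
    this.generation = 0; // incremented on every navigation
  }

  // Resolves with the pages around `current`, or null if the user
  // navigated again before these pages arrived.
  async load(current) {
    const generation = ++this.generation;
    const wanted = pagesAround(current, this.pdfDocument.numPages);
    const pages = await Promise.all(
      wanted.map((n) => this.pdfDocument.getPage(n))
    );
    return generation === this.generation ? pages : null;
  }
}
```

The trade-off is bandwidth: stale requests still complete, but no other getPage call can be broken by an abort.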

I am currently using PDFDataRangeTransport implementation for range requests.

Please note that the default range request functionality in PDF.js isn't in any way connected with PDFDataRangeTransport, so unless you're using the API along the lines below then you're not actually using PDFDataRangeTransport.

const loadingTask = getDocument({
  range: /* custom PDFDataRangeTransport here */,
  //  more parameters here
});

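OK. Following your first comment, I am going to delay each call with setTimeout(() => getPage(), 200) and clearTimeout() any pending timeouts when the user moves on, so that getPage is never called for pages that were skipped: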
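The delayed-call idea can be sketched roughly as follows. This is only an illustration under assumptions: `DeferredPageLoader` is a hypothetical name, the 200 ms default comes from the comment above, and `pdfDocument` stands for the document proxy. The timer functions are injectable purely so the class can be exercised without real delays.

```javascript
// Sketch only: a hypothetical helper, not part of PDF.js.
class DeferredPageLoader {
  constructor(
    pdfDocument,
    delayMs = 200,
    timers = {
      set: (fn, ms) => setTimeout(fn, ms),
      clear: (id) => clearTimeout(id),
    }
  ) {
    this.pdfDocument = pdfDocument;
    this.delayMs = delayMs;
    this.timers = timers;
    this.pending = null; // handle of the timer that has not fired yet
  }

  // Schedule getPage(pageNumber); a later call replaces a pending one.
  request(pageNumber, onPage) {
    this.cancelPending();
    this.pending = this.timers.set(() => {
      this.pending = null;
      // From here on the request is in flight and cannot be cancelled.
      this.pdfDocument.getPage(pageNumber).then(onPage);
    }, this.delayMs);
  }

  // Only prevents a getPage call that has not started yet.
  cancelPending() {
    if (this.pending !== null) {
      this.timers.clear(this.pending);
      this.pending = null;
    }
  }
}
```

Calling `request(3, render)` and then `request(7, render)` within the delay results in a single getPage(7) call. Page 3 is never requested, so there is nothing to abort.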

calling getPage is what causes data to be requested (there's no cancelling involved)

What is the purpose of PDFDataRangeTransport ? Extending the range capabilities ? I found no examples or use cases out there

Thank you for all your tips @Snuffleupagus 👍

What is the purpose of PDFDataRangeTransport ?

It allows completely custom data delivery, which you can thus implement in whatever way you want/need in your case (it's being used in the PDF Viewer that's built in to the Firefox browser).

While it does allow a great deal of flexibility, it's consequently a fair bit more complex than just providing a URL when calling getDocument :-)
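As a rough illustration of that custom data delivery, the sketch below subclasses PDFDataRangeTransport and serves each requested range with an HTTP Range request. It rests on several assumptions that should be checked against the PDF.js version in use: that the constructor takes `(length, initialData)`, that PDF.js calls `requestDataRange(begin, end)` with an exclusive `end`, and that the data is handed back through `onDataRange(begin, chunk)`. The names `createFetchRangeTransport`, `url`, and `fetchImpl` are hypothetical, and the total file length must be known beforehand.

```javascript
// Sketch only: verify the assumed method names against your PDF.js version.
function createFetchRangeTransport(
  PDFDataRangeTransport, // the class exported by PDF.js
  url, // where the PDF bytes live (illustrative)
  length, // total file size in bytes
  fetchImpl = fetch // injectable for testing
) {
  class FetchRangeTransport extends PDFDataRangeTransport {
    constructor() {
      super(length, /* initialData = */ null);
    }

    // Assumed to be called by PDF.js when it needs bytes [begin, end).
    requestDataRange(begin, end) {
      // HTTP ranges are inclusive, hence `end - 1`.
      const headers = { Range: `bytes=${begin}-${end - 1}` };
      return fetchImpl(url, { headers })
        .then((response) => response.arrayBuffer())
        .then((buffer) => {
          // Hand the bytes back to PDF.js.
          this.onDataRange(begin, new Uint8Array(buffer));
        });
    }
  }
  return new FetchRangeTransport();
}
```

The resulting object would then be passed as the `range` parameter of getDocument, as in the snippet earlier in the thread. Because the application now owns every request, it can also decide for itself how to queue, prioritise, or cache them.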

I found no examples or use cases out there

There's the API unit-tests and also the default viewer usages here, here, here and finally here and here.

Closing as answered by the comments above.
