I have been attempting to implement a find bar in my PDF viewer, which uses pdf.js. I have noticed that the demo viewer uses pdf_find_bar.js for this functionality, but it is not clear exactly how it is wired in.
Are there any examples or documentation on how to implement this feature in my viewer? For a start, how would such a feature be added to examples/components/simpleviewer?
Just a day ago we merged a pull request, #10099, that dropped the dependency of the find bar on the find controller, which makes integration a bit easier.
The find bar is created and controlled in web/app.js, specifically in https://github.com/mozilla/pdf.js/blob/master/web/app.js#L355. It requires a few HTML elements to be available, which you can pass in via https://github.com/mozilla/pdf.js/blob/master/web/pdf_find_bar.js#L31-L40. The main listener is https://github.com/mozilla/pdf.js/blob/master/web/pdf_find_bar.js#L49-L51, which triggers when the user types in the find input field. It will put the find event on the event bus at https://github.com/mozilla/pdf.js/blob/master/web/pdf_find_bar.js#L94, which is caught at https://github.com/mozilla/pdf.js/blob/ec10cae5b653b5c11530eaabd79fe39b4bb2e91f/web/app.js#L1326 and handled by https://github.com/mozilla/pdf.js/blob/ec10cae5b653b5c11530eaabd79fe39b4bb2e91f/web/app.js#L1943, which sends the command to the find controller to execute.
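The flow described above can be sketched without pdf.js itself. The snippet below is a simplified stand-in, not the library's code: `TinyEventBus`, `fakeFindController`, and `onFindInput` are hypothetical names that only mimic the `on`/`dispatch` pattern and the `executeCommand` call, and the event payload is reduced to `type` and `query`.

```javascript
// Hypothetical stand-in for the pdf.js event bus; only `on` and `dispatch`.
class TinyEventBus {
  constructor() {
    this.listeners = new Map();
  }
  on(name, listener) {
    if (!this.listeners.has(name)) {
      this.listeners.set(name, []);
    }
    this.listeners.get(name).push(listener);
  }
  dispatch(name, data) {
    for (const listener of this.listeners.get(name) || []) {
      listener(data);
    }
  }
}

// Hypothetical find controller that only records the commands it receives.
const fakeFindController = {
  commands: [],
  executeCommand(cmd, state) {
    this.commands.push({ cmd, state });
  },
};

const findEventBus = new TinyEventBus();

// Application side (the role web/app.js plays): forward `find` events
// from the bus to the find controller.
findEventBus.on("find", (evt) => {
  fakeFindController.executeCommand("find" + evt.type, { query: evt.query });
});

// Find bar side: called when the user types in the find input field.
// `type` distinguishes kinds of find requests; "" is assumed for a new query.
function onFindInput(query, type = "") {
  findEventBus.dispatch("find", { type, query });
}

onFindInput("hello");
console.log(fakeFindController.commands);
```

The point of the indirection is that the find bar never holds a reference to the controller; both only know the shared event bus.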
The find bar in https://github.com/mozilla/pdf.js/blob/master/web/pdf_find_bar.js is a simple component (class) that you can import and use as long as you provide the required options. However, since the find bar class is not exposed in the web bundle by default (see https://github.com/mozilla/pdf.js/blob/66ffdc4c5b63a135f201508264761fce17c08059/web/pdf_viewer.component.js), you'll most likely need to add it to your project yourself.
I think we're open to ideas on how to improve this to make third-party usage of our components easier. Perhaps we should integrate the PDFFindBar class in the web bundle too?
Thank you for the thorough reply.
If I am understanding this correctly, are you saying that findBarConfig in this example is simply the html elements that are required for the find bar?
In other words, I could simply pass all of those elements into findbar as an object instead of using the logic at https://github.com/mozilla/pdf.js/blob/master/web/app.js#L353 that generates findBarConfig?
To add to this... Is it possible to use this find bar / find functionality if I am not using PDF.JS as my viewer? What I mean is that I currently need full control over the implementation of each PDF page in my viewer, so I am using PDF.JS to convert the PDF, page by page, into PNGs, which I then attach to my viewer's canvases.
I was able to implement the text layer with this method; however, I cannot determine how heavily this find feature depends on the document being set via the PDFJS.pdfViewer() object.
If I am understanding this correctly, are you saying that findBarConfig in this example is simply the html elements that are required for the find bar? In other words, I could simply pass all of those elements into findbar as an object instead of using the logic at https://github.com/mozilla/pdf.js/blob/master/web/app.js#L353 that generates findBarConfig?
Yes, in fact the appConfig.findBar is also an object as defined here: https://github.com/mozilla/pdf.js/blob/master/web/viewer.js#L131-L142
To add to this... Is it possible to use this find bar / find functionality if I am not using PDF.JS as my viewer?
Yes, that should be possible. If you have the text layer working, then the find bar and find controller should work just fine, especially since the pull request I mentioned above removes the need to pass a PDFViewer instance to the find controller. Instead, it's all event-based now, so if the text layer needs to be updated, an event is simply emitted so the text layer can handle it (and therefore also a custom text layer like yours). In short, the find bar emits an event when the user enters the search query, the find controller handles this event, prepares the matches, and dispatches an event on the event bus for the text layer to be updated.
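As a rough illustration of the last step (the event that tells the text layer to update), here is a self-contained sketch. `SketchEventBus` and `CustomTextLayer` are hypothetical stand-ins; the event name `updatetextlayermatches` is the one mentioned in this thread, and treating a `pageIndex` of -1 as "all pages" is an assumption that should be checked against text_layer_builder.js for the pdf.js version in use.

```javascript
// Hypothetical stand-in for the shared event bus.
class SketchEventBus {
  constructor() {
    this.handlers = {};
  }
  on(name, handler) {
    (this.handlers[name] = this.handlers[name] || []).push(handler);
  }
  dispatch(name, data) {
    (this.handlers[name] || []).forEach((handler) => handler(data));
  }
}

// Hypothetical custom text layer for a single page.
class CustomTextLayer {
  constructor(eventBus, pageIndex) {
    this.pageIndex = pageIndex;
    this.updateCount = 0;
    eventBus.on("updatetextlayermatches", (evt) => {
      // Only react to events for this page; -1 is assumed to mean "every page".
      if (evt.pageIndex === this.pageIndex || evt.pageIndex === -1) {
        this.updateMatches();
      }
    });
  }
  updateMatches() {
    // A real implementation would re-render the highlights here.
    this.updateCount += 1;
  }
}

const layerBus = new SketchEventBus();
const layerPage0 = new CustomTextLayer(layerBus, 0);
const layerPage1 = new CustomTextLayer(layerBus, 1);

// The role of the find controller: announce that page 1 has new matches.
layerBus.dispatch("updatetextlayermatches", { pageIndex: 1 });
console.log(layerPage0.updateCount, layerPage1.updateCount);
```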
Hopefully this answers the questions; if not, feel free to ask! We're working more towards decoupling our components, for both internal and third-party usage. The pull request above is an example of that to reduce the number of direct dependencies, but there is also another PR open at the moment that aims to simplify the find functionality a bit more. If you see more things that can be improved to make third-party usage easier, do not hesitate to open a ticket.
Edit: The commit https://github.com/mozilla/pdf.js/pull/10123/commits/3f3ddaf541020b4d5cae27796b98d2a9cf8feb4a was just merged, which makes the find bar slightly easier to use since you can provide the event bus directly and don't have to put it in the configuration object anymore.
I believe I am following along with what you are saying, however, I might still be slightly confused on the exact implementation.
From my understanding here are the steps needed to implement the find bar:
var findBar = new PDFFindBar(options)
Are these steps correct? Is anything else required to be done to the findBar object, or is it fully functional from that point?
Does PDFFindBar rely strictly on our text layer (if so, how does it locate it)? Does PDFFindBar assume that the entire text layer has already been rendered? For example, if part of the text layer has not been rendered because a page has not been accessed yet, will PDFFindBar still be able to find the text?
Are these steps correct? Is anything else required to be done to the findBar object, or is it fully functional from that point?
This sounds correct to me, but note that I did not test this myself.
Does PDFFindBar rely strictly on our text layer (if so, how does it locate it)?
The find bar dispatches an event that you can catch in your custom text layer to render the matches, so it doesn't need to locate it at all. However, the find controller does need access to the PDFDocument (or a mock thereof) in order to be able to search the document's text content. If you don't have this, you may need to alter it to search in your custom text layer instead. The _extractText method takes care of extracting the text content from the PDFDocument instance, at https://github.com/mozilla/pdf.js/blob/master/web/pdf_find_controller.js#L344, but you could replace this if you already have the text content in your custom text layer.
Does PDFFindBar assume that the entire text layer has already been rendered? For example, if part of the text layer has not been rendered because a page has not been accessed yet, will PDFFindBar still be able to find the text?
The _extractText method mentioned above takes care of getting the text content for all pages asynchronously, so you don't need to render all pages; they just need to be loaded to fetch the text content.
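To make the idea of searching text content without rendering more concrete, here is a minimal sketch. It is not the find controller's actual algorithm: `pageTextFromItems` and `findMatches` are hypothetical helpers, and the only pdf.js detail assumed is that text content items carry their text in a `str` field.

```javascript
// Joins the `str` fields of a page's text items into one string.
function pageTextFromItems(items) {
  return items.map((item) => item.str).join("");
}

// Returns, for each page, the start offsets of every (non-overlapping)
// occurrence of `query` in that page's text.
function findMatches(pageContents, query, caseSensitive = false) {
  if (!query) {
    return pageContents.map(() => []);
  }
  const needle = caseSensitive ? query : query.toLowerCase();
  return pageContents.map((content) => {
    const haystack = caseSensitive ? content : content.toLowerCase();
    const offsets = [];
    let index = haystack.indexOf(needle);
    while (index !== -1) {
      offsets.push(index);
      index = haystack.indexOf(needle, index + needle.length);
    }
    return offsets;
  });
}

// Plain objects standing in for the items returned by getTextContent().
const samplePages = [
  pageTextFromItems([{ str: "Find bar " }, { str: "and find controller" }]),
  pageTextFromItems([{ str: "No match here" }]),
];

// Page 0 has matches at offsets 0 and 13; page 1 has none.
console.log(findMatches(samplePages, "find"));
```

Because the search runs on plain strings, it works for pages whose text layer has never been rendered, as long as their text content has been fetched.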
Given the info above I was able to get the findBar implemented and catching events. The code below is functioning and the webViewerFind() method is triggering correctly. At this point, I am still unsure how to verify that the textLayer is catching this event. After this event fires, I do not see any visual updates on my page. Is there anything else that I could be missing?
import { PDFFindBar } from "./PDFFindBar/pdf_find_bar.js";
import { PDFFindController } from './PDFFindBar/pdf_find_controller.js';
import { getGlobalEventBus } from './PDFFindBar/dom_events.js';
const dispatchToDOM = {
/** @type {boolean} */
value: false,
kind: 'viewer',
};
const eventBus = getGlobalEventBus(dispatchToDOM);
const findController = new PDFFindController({
linkService: document.pdfLinkService,
eventBus,
});
findController.setDocument('../sample.pdf');
var options = {
bar: document.getElementById('findbar'),
toggleButton: document.getElementById('viewFind'),
findField: document.getElementById('findInput'),
highlightAllCheckbox: document.getElementById('findHighlightAll'),
caseSensitiveCheckbox: document.getElementById('findMatchCase'),
entireWordCheckbox: document.getElementById('findEntireWord'),
findMsg: document.getElementById('findMsg'),
findResultsCount: document.getElementById('findResultsCount'),
findPreviousButton: document.getElementById('findPrevious'),
findNextButton: document.getElementById('findNext'),
}
var pdfFindBar = new PDFFindBar(options, eventBus);
eventBus.on('find', webViewerFind);
function webViewerFind(evt) {
findController.executeCommand('find' + evt.type, {
query: evt.query,
phraseSearch: evt.phraseSearch,
caseSensitive: evt.caseSensitive,
entireWord: evt.entireWord,
highlightAll: evt.highlightAll,
findPrevious: evt.findPrevious,
});
}
Something I still am not certain about... It seems that the pdfLinkService object needs to be attached to both the viewer and controller to allow text highlighting (as removing it from the components/simpleviewer.js demo breaks the highlighting feature). If this is correct, and I am not using the PDFViewer, how would I use the pdfLinkService?
First of all, are you using the latest code from the master branch? We just recently merged a patch (https://github.com/mozilla/pdf.js/commit/2ed3591b2240466e194a5bec494800d511460e9d) that is most likely relevant here as it makes the find controller easier to use. Previously you needed to emit pagesinit on the event bus in order for searching to start, which was not really intuitive and made it harder for third-party deployments. With the most recent code, just calling setDocument is enough to allow searching to start. This may explain why you're not seeing anything coming from the find controller.
Aside from that, I think there are a few problems in this code:
The getGlobalEventBus function has dispatchToDOM as a boolean parameter (see https://github.com/mozilla/pdf.js/blob/master/web/dom_events.js#L132), so passing an object probably won't work well. Simply use getGlobalEventBus(): the default for dispatchToDOM is already false, so you can remove the entire dispatchToDOM object.
The setDocument method takes a PDFDocument object as input, so passing in a string will not work and therefore no search events will trigger from the controller. To get the PDFDocument object, you need to read the file like https://github.com/mozilla/pdf.js/blob/master/examples/components/pageviewer.js#L40-L44, which gives you the pdfDocument parameter that you can use. It looks like you already read the file somewhere because of your custom text layer, so most likely you can just insert the PDFDocument object you obtained for that.
Finally, to answer your question about the link service: PDFViewer takes care of initializing the link service normally, but you don't strictly need that if you can initialize it yourself. Commit https://github.com/mozilla/pdf.js/commit/e0c811f2ede8183b482b94023a7b895d04494892#diff-485f8990604b45b959a20e90265d8044 introduces the link service in the find controller to remove the direct dependency on PDFViewer, making internal and third-party usage easier, so the properties we use there must be set in the link service by you manually if you don't let PDFViewer take care of it.
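For the setDocument point above, a minimal sketch of handing the loaded document (rather than a path string) to the find controller could look like this. `attachDocument` is a hypothetical helper; the library object and the controller are passed in as parameters so the snippet does not assume how pdf.js is bundled in your project. It relies on `getDocument()` returning a loading task whose `promise` resolves to the document object.

```javascript
// Loads the PDF at `url` and hands the resulting document object to the
// find controller. `pdfjsLib` and `findController` are supplied by the caller.
function attachDocument(pdfjsLib, findController, url) {
  const loadingTask = pdfjsLib.getDocument(url);
  return loadingTask.promise.then((pdfDocument) => {
    // setDocument() expects the document object, not the file path.
    findController.setDocument(pdfDocument);
    return pdfDocument;
  });
}
```

In the code posted above, this would replace the `findController.setDocument('../sample.pdf')` call, for example `attachDocument(pdfjsLib, findController, '../sample.pdf')`, where `pdfjsLib` is whatever name the library is available under in your setup. If you already hold the document object for your text layer, passing that object to `setDocument` directly is simpler.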
As far as I can tell, it should work after you make these changes. You should start seeing the updatetextlayermatches event being fired on the event bus, which your custom text layer can listen for like we do at https://github.com/mozilla/pdf.js/blob/master/web/text_layer_builder.js#L360-L368 and call updateMatches to render the highlights.
I just want to conclude this post by saying that this is also a valuable thread for us because your feedback inspired some of the most recent commits for the find bar/find controller, since we saw that not everything is as intuitive as one would like for third-party usage. Therefore, thank you for the feedback!
I have made the updates mentioned to the getGlobalEventBus() call and the setDocument() call, but still seem to be having issues. Below is a rough outline as to how I am implementing the text layer. Could it be possible that since I am using "renderText" instead of creating a "custom text layer" that there is nothing listening for the updateTextLayerMatches event being fired? If this should be done differently, how?
I am also not understanding what must be done to create the link service in the case that I am not using the PDFViewer (like below), nor am I 100% understanding how it should be implemented.
In the case where we do use the PDFViewer (/components/simpleviewer.js), should I be using the same link service object for the creation of both the viewer/find controller?
pdfDocumentInstance.getPage(pageNum).then(function (page)
{
//Generate Canvas + viewport and stuff
// ... blah ...
//Generate renderContext
var renderContext = {
canvasContext: myContext,
viewport: viewport
};
page.render(renderContext).then(function()
{
var imgSrc = myCanvas.toDataURL("image/png");
//Send img elsewhere to get painted to canvas
imageCallback.onCompleted(null, imgSrc);
});
page.getTextContent().then(function(textContent)
{
var textLayerId = "textLayerP"+page.pageIndex;
var textLayer = document.createElement('div');
textLayer.className = "textLayer";
textLayer.id = textLayerId;
$doc.getElementById('MAINDISPLAY-'+page.pageIndex).appendChild(textLayer);
pdfJSGlobal.renderTextLayer({
textContent: textContent,
container : textLayer,
viewport : page.getViewport(scale),
enhanceTextSelection : false
});
});
});
I would also like to note that while implementing these changes, I have also been trying to apply them to /components/simpleviewer.js, to better understand how this could be done when using the PDFViewer. However, I have not had luck there either.
As always, thank you for your help!
Edit: After inspecting further, I am more certain that the issue is with how I am implementing my text layer. How could/should I be implementing the text_layer_builder.js mentioned above? The only examples that I was able to find made use of a PDF.JS viewer object.
In the meantime, I proceeded to work on the components/simpleviewer.js example. I determined that in this example, PDFViewer controlled the text layer. I created an eventBus that could be accessed globally, which I then attached to the PDFFindBar and PDFFindController. I then made this find controller accessible before attaching it to the PDFViewer object. With that done, I am now able to make calls to the PDFFindController on event triggers in the components/simpleviewer.js example!
Edit2: Are the pdf_find_controller.js and pdfjsViewer.PDFFindController different objects, or are they interchangeable? I think this could have led to some confusion with the simpleviewer.js implementation.
It's becoming hard to tell from just code snippets alone since it doesn't show all context and we can't run it. Would it be possible to set up a minimal project somewhere (JSFiddle, JSBin or your own server) with your find bar/controller and custom text layer so we can take a look (you can strip out any irrelevant information to make it a minimal example to reproduce the issue)?
You can test if updatetextlayermatches is being fired by adding a dummy listener on the same event bus that just prints something when it gets the event. At least then you'll know if the find controller is functioning, and if so, then it's a problem in your custom text layer.
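A dummy listener of that kind could be sketched as follows. `addDebugListener` and `demoBus` are hypothetical; the only thing assumed about the real event bus is that it offers `on(name, listener)`. The lowercase event name is the spelling used earlier in this thread.

```javascript
// Attaches a listener that only records and logs events.
// Returns the array of recorded events so it can be inspected later.
function addDebugListener(eventBus, eventName) {
  const seen = [];
  eventBus.on(eventName, (evt) => {
    seen.push(evt);
    console.log(eventName + " fired", evt);
  });
  return seen;
}

// Minimal stand-in bus so the sketch runs on its own. In the viewer, pass
// the same event bus instance that the find controller was created with.
const demoBus = {
  handlers: {},
  on(name, fn) {
    (this.handlers[name] = this.handlers[name] || []).push(fn);
  },
  dispatch(name, data) {
    (this.handlers[name] || []).forEach((fn) => fn(data));
  },
};

const seenEvents = addDebugListener(demoBus, "updatetextlayermatches");
demoBus.dispatch("updatetextlayermatches", { pageIndex: 0 });
```

If nothing is logged after a search, the problem is upstream of the text layer (the controller is not dispatching, or it is using a different event bus instance).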
With that done, I am now able to make calls to the PDFFindController on event triggers in the components/simpleviewer.js example!
It sounds like there is some progress there then, so I think you're indeed narrowing the problem down to updatetextlayermatches either not firing or not being handled by your custom text layer, which should listen to the same event bus.
Are the pdf_find_controller.js and pdfjsViewer.PDFFindController different objects, or are they interchangeable?
pdf_find_controller.js is the ES6 source and contains the PDFFindController class. pdfjsViewer.PDFFindController is a direct reference to that same class, but is probably transpiled by Babel for non-ES6 browsers, depending on whether you build with Babel enabled or not. In either case, functionality- and interface-wise they should be equal.
Instead of using the PDFJS.renderText method to generate my text layer, I figured out how to use the text_layer_builder.js class discussed earlier. Below is my implementation.
pdfDocumentInstance.getPage(pageNum).then(function (page)
{
//Generate Canvas + viewport and stuff
// ... blah ...
//Generate renderContext
var renderContext = {
canvasContext: myContext,
viewport: viewport
};
page.render(renderContext).then(function()
{
var imgSrc = myCanvas.toDataURL("image/png");
//Send img elsewhere to get painted to canvas
imageCallback.onCompleted(null, imgSrc);
});
page.getTextContent().then(function(textContent)
{
var textLayerId = "textLayerP"+page.pageIndex;
var textLayer = document.createElement('div');
textLayer.className = "textLayer";
textLayer.id = textLayerId;
$doc.getElementById('MAINDISPLAY-'+page.pageIndex).appendChild(textLayer);
var textLayerBuilder = new $doc.TextLayerBuilder({
textLayerDiv: textLayer,
eventBus: $doc.eventBus,
pageIndex: page.pageIndex,
viewport: page.getViewport(scale),
enhanceTextSelection: false,
});
textLayerBuilder.setTextContent(textContent);
textLayerBuilder.render();
});
});
Luckily, this seems to work quite nicely and generate the text layer in a similar manner as before, however, now we have access to all of the listener functions that come with text_layer_builder.js
I believe the only missing piece here is that I can not seem to determine the correct way of implementing the PDFLinkService object used when creating the PDFFindController, nor do I fully understand what PDFLinkService does. Without this linkService object, my PDFFindController does not seem to fire the correct events, as the text_layer_builder.js never catches any of them.
In the components/simpleviewer.js example, I was able to get the PDFLinkService object via pdfjsViewer.PDFLinkService();
This is no longer feasible for me, since I am not implementing the pdfjsViewer.
Would importing web/pdf_link_service.js and then constructing it via the following be sufficient? If not, how could I implement this class?
var pdfLinkService = new PDFLinkService(document.eventBus);
pdfLinkService.setDocument(pdfDocumentInstance);
It appears to me that without setting a pdfViewer object for PDFLinkService, PDFFindController will throw errors. This happens because when it attempts to find search results, it accesses this.linkService.page. This property, and others like it, are generated by PDFLinkService using this.pdfViewer.currentPageNumber. From what I can tell, this means that a direct implementation of PDFLinkService as I outlined above will most likely not work, as it depends on setting a PDFViewer object via pdfLinkService.setViewer(new pdfjsViewer.PDFViewer()).
Is the only solution here to modify pdf_link_service.js to access the respective values from my viewer instead of the PDFViewer object?
Luckily, this seems to work quite nicely and generate the text layer in a similar manner as before, however, now we have access to all of the listener functions that come with text_layer_builder.js
Yes, it does look like a much better solution to me!
Is the only solution here to modify pdf_link_service.js to access the respective values from my viewer instead of the PDFViewer object?
Indeed, I think there is not really a way around it if you're not using the PDFViewer object. Most components don't use PDFViewer anymore since we decoupled that, but the link service is now used instead as a means of communication and for performing navigation in the viewer. Hence, the assumption here is that PDFViewer is used by the third party.
However, implementing a custom link service is not hard because it's a simple class, and you can provide a custom one that implements the interface. Notice how we have https://github.com/mozilla/pdf.js/blob/master/web/pdf_link_service.js#L411 and https://github.com/mozilla/pdf.js/blob/master/test/unit/pdf_find_controller_spec.js#L22, for example, for exactly the same reason: sometimes it's not useful to have an actual PDFViewer instance around (for example in the unit tests, where a mock link service is good enough).
The link service is a simple layer between components and the underlying PDFViewer so they don't need access to it directly. This is done on purpose to decouple the components and to make use cases like yours easier to realize: you can use another link service that controls a custom viewer. Therefore, you can just make a custom version that gets and sets the values in your viewer instead and use that link service for, e.g., the find controller.
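A custom link service along those lines might look like the sketch below. Everything here is illustrative: `MyViewer` stands in for your own viewer, and `CustomLinkService` only implements `page` (the property mentioned in this thread) plus `pagesCount`. Which members the find controller actually reads is an assumption to verify against pdf_link_service.js and pdf_find_controller.js for the pdf.js version in use.

```javascript
// Hypothetical custom viewer that keeps track of the current page itself.
class MyViewer {
  constructor(pagesCount) {
    this.pagesCount = pagesCount;
    this.currentPage = 1;
  }
  scrollToPage(pageNumber) {
    // A real viewer would scroll the page into view here.
    this.currentPage = pageNumber;
  }
}

// Custom link service: plays the role of PDFLinkService, but reads and
// writes its values through MyViewer instead of a PDFViewer instance.
class CustomLinkService {
  constructor(viewer) {
    this.viewer = viewer;
  }
  get pagesCount() {
    return this.viewer.pagesCount;
  }
  get page() {
    return this.viewer.currentPage;
  }
  set page(value) {
    // Ignore out-of-range requests instead of scrolling.
    const inRange =
      Number.isInteger(value) && value >= 1 && value <= this.viewer.pagesCount;
    if (inRange) {
      this.viewer.scrollToPage(value);
    }
  }
}

const myViewer = new MyViewer(10);
const customLinkService = new CustomLinkService(myViewer);

// What a component such as the find controller would do to jump to a match.
customLinkService.page = 3;
console.log(customLinkService.page);
```

An instance like `customLinkService` would then be passed as the `linkService` option when constructing the find controller, in place of a `PDFLinkService` bound to a `PDFViewer`.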
Closing since this should be answered now.