Fzf: Finding pdf files efficiently with fzf

Created on 21 Dec 2017 · 1 Comment · Source: junegunn/fzf

Category
- [x] fzf binary
- [ ] fzf-tmux script
- [ ] Key bindings
- [ ] Completion
- [ ] Vim
- [ ] Neovim
- [ ] Etc.
OS
- [x] Linux
- [ ] Mac OS X
- [ ] Windows
- [ ] Windows Subsystem for Linux
- [ ] Etc.
Shell
- [x] bash
- [ ] zsh
- [ ] fish

Hi there,
I am trying to build a command to search through PDF files using the pdftotext command. It currently works, but I have some questions about how to improve it. It works like this (code provided below):

- It searches recursively for PDF files.
- Each line fed to fzf contains path/filename.pdf, followed by a tab and the text of the first page of the PDF with newlines replaced by "_". This is useful for academic publications: the first page usually contains the authors' names and the title of the paper, so one can use fzf to find PDF files even if the filenames are uninformative.
- The preview window shows the first page of the PDF as extracted by pdftotext.
- Upon selection with Enter, the command gio open launches evince with the PDF.

I am wondering whether the following are possible with fzf:

- Is it possible to highlight the current search query in the preview window? To do so, I need to access the search query in the preview command. Is the search query available in some variable? Is there a built-in function to highlight the preview window? Can we reuse the fzf highlighting code to achieve this?
- I currently use the flag -e to avoid fuzzy matching. Fuzzy matching finds the query as a subsequence in most entries, because each entry is long (one full page of an academic PDF). Is there a way to discard entries whose match contains too many long gaps? For instance, discard entries where the match is only obtained with several gaps of 20 characters or more. This is not so important because the scoring algorithm (https://github.com/junegunn/fzf/blob/0.15.0/src/algo/algo.go) puts the best matches first.
- My current code breaks in the final command (xargs gio open) when filenames contain spaces. Is there a standard way to avoid such issues?
- I would like to try replacing the preview window with multi-line entries. But from what I read in older issues, multi-line entries are not supported. Is that still the case?
Any suggestions for improving the code are very welcome; I am new to awk and fzf customization.

Here is the current code:

ag -g ".pdf$" \
    | awk 'BEGIN {FS="\t"; OFS="\t"}; {command="pdftotext -f 1 -l 1 \""$0"\" - 2>/dev/null | tr \"\n\" \"_\" "; command|getline d; close(command); print $0, d}' \
    | fzf -e \
    --preview-window up:50% \
    --preview "pdftotext -f 1 -l 1 '{1}' - " \
    | awk 'BEGIN {FS="\t"; OFS="\t"}; {print $1}' \
    | xargs gio open
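
On the question about filenames with spaces, here is a minimal sketch of one common approach, not a definitive fix. It assumes GNU xargs (for -d and -r) and relies on fzf quoting placeholder values itself, so {1} is written without extra quotes and --delimiter '\t' makes {1} the whole path rather than its first word. The names open_selected and pdf_search are hypothetical helpers introduced for illustration only.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads "path<TAB>text" lines on stdin, keeps the path,
# and passes each path as ONE argument to the command given in "$@".
open_selected() {
    # cut -f1 keeps everything before the first tab (the path).
    # GNU xargs: -d '\n' splits on newlines only, so spaces stay inside the
    # argument; -r skips the command when nothing was selected.
    cut -f1 | xargs -r -d '\n' "$@"
}

# Hypothetical wrapper around the pipeline from the post.
pdf_search() {
    ag -g '\.pdf$' \
        | while IFS= read -r f; do
              # One line per PDF: path, tab, first page with newlines as "_".
              printf '%s\t%s\n' "$f" \
                  "$(pdftotext -f 1 -l 1 "$f" - 2>/dev/null </dev/null | tr '\n' '_')"
          done \
        | fzf -e --delimiter '\t' --preview-window up:50% \
              --preview 'pdftotext -f 1 -l 1 {1} -' \
        | open_selected gio open
}
```

Calling pdf_search from the directory to search runs the whole pipeline. Replacing the awk call with a while read loop is a design choice of this sketch: it avoids building a shell command string around the filename, which is where quoting problems tend to start.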

(Btw, thanks for that addictive and terrific piece of software!)

Source

bellecp

Most helpful comment

I have since answered most of my questions myself. I am now using the following code, which highlights the preview window and caches the output of the pdftotext command. From other issues, it seems that multi-line entries are not supported in fzf.

I find it quite useful for searching academic PDFs: one can search by filename, keywords, author names, institutions, abstract, or anything else on the first page of the PDF. It is good enough for now! Thanks again for this great piece of software.
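
For readers who want to try the two ideas mentioned in this comment (caching the pdftotext output and highlighting the query in the preview), here is a minimal sketch. It is an illustration under stated assumptions, not the author's script. It uses fzf's {q} placeholder, which expands to the current query inside the preview command. It assumes bash, GNU coreutils (realpath, md5sum) and GNU xargs, and it assumes that fzf runs the preview through $SHELL, which is why the functions are exported and SHELL is set to bash. The names CACHE_DIR, first_page, highlight_query and pdf_search_cached are hypothetical. The highlighter treats the whole query as one literal, case-insensitive string, which is a simplification.

```shell
#!/usr/bin/env bash

# Hypothetical cache location; override by setting CACHE_DIR beforehand.
CACHE_DIR="${CACHE_DIR:-${XDG_CACHE_HOME:-$HOME/.cache}/pdf-first-page}"

# Print the first page of a PDF, reusing the cached text unless the cache
# entry is missing, empty, or older than the PDF.
first_page() {
    local pdf=$1 key cached
    key=$(printf '%s' "$(realpath -- "$pdf")" | md5sum | cut -d' ' -f1)
    cached="$CACHE_DIR/$key.txt"
    mkdir -p -- "$CACHE_DIR"
    if [ ! -s "$cached" ] || [ "$pdf" -nt "$cached" ]; then
        pdftotext -f 1 -l 1 "$pdf" - 2>/dev/null </dev/null > "$cached"
    fi
    cat -- "$cached"
}

# Copy stdin to stdout, wrapping each case-insensitive occurrence of the
# literal string $1 in ANSI bold red. An empty query leaves the text unchanged.
highlight_query() {
    Q=$1 awk '
    BEGIN { q = tolower(ENVIRON["Q"]); n = length(q) }
    {
        line = $0; out = ""
        if (n > 0) {
            low = tolower(line)
            while ((i = index(low, q)) > 0) {
                out = out substr(line, 1, i - 1) "\033[1;31m" substr(line, i, n) "\033[0m"
                line = substr(line, i + n)
                low = substr(low, i + n)
            }
        }
        print out line
    }'
}

# Hypothetical wrapper: build the list from the cache, preview with highlighting.
pdf_search_cached() {
    export -f first_page highlight_query
    export CACHE_DIR
    ag -g '\.pdf$' \
        | while IFS= read -r f; do
              printf '%s\t%s\n' "$f" "$(first_page "$f" | tr '\n' '_')"
          done \
        | SHELL=$(command -v bash) fzf -e --delimiter '\t' --preview-window up:50% \
              --preview 'first_page {1} | highlight_query {q}' \
        | cut -f1 \
        | xargs -r -d '\n' gio open
}
```

Keying the cache on a hash of the absolute path keeps the cache flat and avoids problems with slashes or spaces in filenames. The first run still pays the full pdftotext cost; later runs only read small text files.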
