JabRef: latex_to_unicode produces problematic filename

Created on 4 Apr 2018 · 16 comments · Source: JabRef/jabref

I encountered this when retrieving information for doi:10.1088/1752-7155/7/1/017106, which returns the authors: Patrik {\v{S}}pan{\v{e}}l and Kseniya Dryahina and David Smith.

Running cleanup/rename PDF produces the following filename: S}pane{l}EtAl/Spanel2013 - A quantitative study.pdf (the braces really come out like this). Not only is this wrong, it also triggers an error: Could not save file. Error in field 'file': Braces don't match.

I have the directory pattern set to [authEtAl:latex_to_unicode] and the file name format pattern set to [bibtexkey] - [shorttitle:latex_to_unicode]. The BibTeX key pattern is [auth:latex_to_unicode][year]. Also note that the BibTeX key doesn't contain any }.
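For readers unfamiliar with the pattern syntax, here is a minimal Java sketch of how a bracket pattern such as [shorttitle:latex_to_unicode] could be expanded: each [field] or [field:modifier] token is replaced by the field value, optionally passed through a named modifier. PatternSketch is a hypothetical class, not JabRef's actual pattern code; field values and modifiers are supplied by the caller.

```java
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not JabRef code): expands "[field]" and "[field:modifier]" tokens.
final class PatternSketch {
    // Group 1 = field name, group 2 = optional modifier name (null if absent).
    private static final Pattern TOKEN = Pattern.compile("\\[([^\\]:]+)(?::([^\\]]+))?\\]");

    static String expand(String pattern,
                         Map<String, String> fields,
                         Map<String, UnaryOperator<String>> modifiers) {
        Matcher m = TOKEN.matcher(pattern);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Missing fields expand to the empty string in this sketch.
            String value = fields.getOrDefault(m.group(1), "");
            if (m.group(2) != null) {
                // Unknown modifiers are treated as identity in this sketch.
                value = modifiers.getOrDefault(m.group(2), UnaryOperator.identity()).apply(value);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

In this sketch a modifier only ever sees the field value it is handed, so if that value already has unbalanced braces (which matches the diagnosis later in this thread that the author parser, not latex_to_unicode, drops a brace), the modifier cannot repair it.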

Labels: cleanup-ops, bug

Source

bdcaf

All 16 comments

I tried it locally and can confirm the behaviour. In fact it seems like the file directory pattern is not interpreted correctly.

        String fileNamePattern = "[bibtexkey] - [shorttitle:latex_to_unicode]";
        String directoryPattern = "[authEtAl:latex_to_unicode]";

Rename PDF is correct. Sample file: Toot.pdf -> Toot - A quantitative study.pdf
The Move Files cleanup then breaks it -> S}pan{e}lEtAl/Toot - A quantitative study.pdf

Siedlerchr on 6 Apr 2018

Okay, the problem is not latex2unicode itself, but our AuthorList parser.

Siedlerchr on 6 Apr 2018

@JabRef/developers Does any of you know why the first brace is removed?
That actually is the underlying root problem here:

https://github.com/JabRef/jabref/blob/0c34fa4ba99821570dbf04e22d6f749a9c1b2456/src/main/java/org/jabref/model/entry/Author.java#L370-L386

Siedlerchr on 6 Apr 2018

Probably to sort {{JabRef}} under J and not {.

tobiasdiez on 7 Apr 2018

This should not be done in the model, but only in the view model used by the UI, so we need to move this part of the code @Siedlerchr.
Or at least the file name generation should not depend on it.
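The suggestion to keep the name intact in the model and only ignore braces where sorting happens can be sketched as a comparator that strips braces from a temporary sort key. This is a hypothetical illustration, not JabRef code; BraceInsensitiveSort is a made-up name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: braces are ignored only when comparing,
// while the stored last names keep their original, balanced LaTeX.
final class BraceInsensitiveSort {
    // Sort key: the name with all curly braces removed.
    static String sortKey(String lastName) {
        return lastName.replace("{", "").replace("}", "");
    }

    // Compare by the brace-free key, ignoring case.
    static final Comparator<String> BY_SORT_KEY =
            Comparator.comparing(BraceInsensitiveSort::sortKey, String.CASE_INSENSITIVE_ORDER);

    static List<String> sorted(List<String> lastNames) {
        List<String> copy = new ArrayList<>(lastNames); // the input list is left untouched
        copy.sort(BY_SORT_KEY);
        return copy;
    }
}
```

With this approach {{JabRef}} sorts under J but is returned unchanged. Note that the key for {\v{S}}pan{\v{e}}l still starts with a backslash, so a complete solution would sort on a LaTeX-free form of the name instead.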

stefan-kolb on 16 Apr 2018

Tests:

    @Test
    public void testAuthEtAlBraces() {
        assertEquals("{\\v{S}}pan{\\v{e}}lEtAl",
                BibtexKeyGenerator.authEtal("Patrik {\\v{S}}pan{\\v{e}}l and Kseniya Dryahina and David Smith", "", "EtAl"));
        assertEquals("\\v{S}pan\\v{e}lEtAl",
                BibtexKeyGenerator.authEtal("Patrik \\v{S}pan\\v{e}l and Kseniya Dryahina and David Smith", "", "EtAl"));
    }

It is actually problematic to decide what to expect here.

Making the braces unbalanced leads to problems everywhere except sorting!
Keeping the braces produces problematic output for cases like key generation, IMHO (not sure whether key generation should only produce alphanumeric keys?!)
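To make the failure mode concrete, here is a small brace-balance check, similar in spirit to the check behind the "Braces don't match" error. It is a hypothetical helper, not JabRef's actual validator, and for simplicity it does not treat escaped braces such as \{ specially.

```java
// Hypothetical helper (not JabRef code): reports whether curly braces nest correctly.
final class BraceCheck {
    static boolean isBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth < 0) {
                    return false; // closing brace with no matching opener
                }
            }
        }
        return depth == 0; // every opener must have been closed
    }
}
```

With this check {\v{S}}pan{\v{e}}l is balanced, while the same string with only its first brace removed (\v{S}}pan{\v{e}}l) is not, which is why dropping a single brace breaks the file field.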

stefan-kolb on 25 May 2018

FWIW, as a user I would expect the first, or one using the correct Unicode symbols.

bdcaf on 25 May 2018

DevCall:

Author class: the strangest method of getting the names is used.
Remove non-ASCII characters to ensure compatibility with pdflatex
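One common way to implement "remove non-ASCII characters" is Unicode decomposition: normalize to NFD, drop the combining marks, then drop anything still outside ASCII. This is a sketch of that general technique using java.text.Normalizer, not a description of what JabRef does; AsciiFold is a made-up name, and characters without a decomposition (for example the German sharp s) are simply removed rather than transliterated.

```java
import java.text.Normalizer;

// Sketch of ASCII folding via Unicode decomposition (hypothetical helper, not JabRef code).
final class AsciiFold {
    static String toAscii(String s) {
        // NFD splits an accented letter into base letter + combining mark,
        // e.g. S with caron becomes "S" followed by U+030C.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Remove combining marks, then anything still outside 7-bit ASCII.
        return decomposed.replaceAll("\\p{M}", "").replaceAll("[^\\x00-\\x7F]", "");
    }
}
```

Applied to the author from this issue, this turns the Unicode form of Spanel (with carons) into plain Spanel, which matches the key Spanel2013 seen above.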

koppor on 1 Jun 2018

Still present in 5.0 dev

Siedlerchr on 20 Apr 2019

I can try to fix this one.

k3KAW8Pnf7mkmdSMPHz27 on 9 Jul 2020

This is a hard one, but just go ahead!

koppor on 10 Jul 2020

@koppor right now directory names are allowed to contain Unicode. Unless there have been complaints, shouldn't that remain the case?

I currently believe the solution depends on two questions:

Is Unicode allowed in the directory path?
Is it too much of a performance hit to call the LatexToUnicodeAdapter on all auth... patterns? (The "latex-free" string will not have been cached.)

I prefer to use the LatexToUnicodeAdapter because I think it is a better user experience if both Gödel and G{\"o}del generate the same directory name, and changing the directory structure is an infrequent event. I'd guess that it would be very hard to generate the same directory name for both Gödels without using LatexToUnicodeAdapter.
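To illustrate why a LaTeX-to-Unicode step lets Gödel and G{\"o}del land in the same directory, here is a deliberately tiny converter. It is not JabRef's LatexToUnicodeAdapter: TinyLatexToUnicode is a made-up name, and it handles only four accent commands (\v, \", \', \`), then removes grouping braces and applies NFC normalization so a base letter plus combining mark becomes a single character.

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch, NOT JabRef's LatexToUnicodeAdapter: handles only a few accents.
final class TinyLatexToUnicode {
    // LaTeX accent command -> Unicode combining character.
    private static final Map<String, String> ACCENTS = Map.of(
            "v", "\u030C",   // caron
            "\"", "\u0308",  // diaeresis
            "'", "\u0301",   // acute
            "`", "\u0300");  // grave

    // Group 1: the command. Group 2: a braced letter. Group 3: a bare letter,
    // allowed only after symbol commands (the lookbehind rejects bare letters after \v).
    private static final Pattern ACCENT =
            Pattern.compile("\\\\(v|[\"'`])(?:\\{(\\p{L})\\}|(?<=[\"'`])(\\p{L}))");

    static String convert(String latex) {
        Matcher m = ACCENT.matcher(latex);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String base = m.group(2) != null ? m.group(2) : m.group(3);
            m.appendReplacement(out, Matcher.quoteReplacement(base + ACCENTS.get(m.group(1))));
        }
        m.appendTail(out);
        // Drop grouping braces, then compose base letter + combining mark into one character.
        String noBraces = out.toString().replace("{", "").replace("}", "");
        return Normalizer.normalize(noBraces, Normalizer.Form.NFC);
    }
}
```

With this sketch G{\"o}del, G\"{o}del and an already-Unicode Gödel all come out identical, and {\v{S}}pan{\v{e}}l becomes Španěl, so all spellings of a name would map to one directory.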

k3KAW8Pnf7mkmdSMPHz27 on 10 Jul 2020

Unicode path names are perfectly valid; you can even use emoji on Windows 10.

https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file#file-and-directory-names

Regarding performance: I don't know, but I would also go for the latex2unicode adapter. It seems reasonable to me. Don't you already have the latex-free author in the AuthorList class?
The author patterns are just equivalent to methods for getting the authors.

Siedlerchr on 10 Jul 2020

@Siedlerchr, @koppor pointed out earlier:

Remove non-ASCII characters to ensure compatibility with pdflatex

but perhaps that is only relevant for the BibTeX key? I am not entirely sure about the use case (other than organizing PDFs). Do people use it to organize plots, etc. that they later import into a .tex file?

Don't you have the latex free author already in the author list class?

No, the latex-free methods cache full "patterns" (e.g., authorsLastOnly), unfortunately not individual authors or this particular pattern.

k3KAW8Pnf7mkmdSMPHz27 on 10 Jul 2020

Actually, you could use AuthorList#getAsLastNamesLatexFree and split that string. That would remove the performance bottleneck at the cost of a "hacky" solution.

k3KAW8Pnf7mkmdSMPHz27 on 10 Jul 2020

Never mind. Unless the user has the exact right preferences AuthorList#getAsLastNamesLatexFree would amount to the exact same solution, with an extra split operation at the end.

Author fields are currently not latex-free. I'd consider changing that, and caching latex-free Authors instead of AuthorLists.
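Caching per author rather than per AuthorList can be sketched as a memoizing map keyed by the raw name. This is a hypothetical illustration: LatexFreeNameCache is not an existing JabRef class, and the converter is passed in so that any LaTeX-to-Unicode function could be plugged in. Each distinct raw name is converted only once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical sketch: memoizes the LaTeX-free form of individual author names.
final class LatexFreeNameCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final UnaryOperator<String> converter;

    LatexFreeNameCache(UnaryOperator<String> converter) {
        this.converter = converter;
    }

    // The first call for a raw name runs the converter; later calls are map lookups.
    String latexFree(String rawName) {
        return cache.computeIfAbsent(rawName, converter);
    }

    int size() {
        return cache.size();
    }
}
```

With a cache like this, evaluating the auth... patterns repeatedly would cost roughly one conversion per distinct author, which is one possible answer to the performance question raised earlier in the thread.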

k3KAW8Pnf7mkmdSMPHz27 on 10 Jul 2020
