JabRef: latex_to_unicode produces problematic filename

Created on 4 Apr 2018 · 16 Comments · Source: JabRef/jabref

I encountered this when retrieving information for doi:10.1088/1752-7155/7/1/017106,
which retrieves the authors: Patrik {\v{S}}pan{\v{e}}l and Kseniya Dryahina and David Smith.

A cleanup/rename PDF operation gives the following filename: S}pane{l}EtAl/Spanel2013 - A quantitative study.pdf (note that the braces are produced exactly like this). This is not only wrong, but also triggers an error: Could not save file. Error in field 'file': Braces don't match.

I have the directory pattern set to [authEtAl:latex_to_unicode] and the file format pattern set to [bibtexkey] - [shorttitle:latex_to_unicode]. The BibTeX key pattern is [auth:latex_to_unicode][year]. Also note that the BibTeX key doesn't contain any }.

cleanup-ops bug 🐛

All 16 comments

I tried it locally and can confirm the behaviour. In fact it seems like the file directory pattern is not interpreted correctly.

        String fileNamePattern = "[bibtexkey] - [shorttitle:latex_to_unicode]";
        String directoryPattern = "[authEtAl:latex_to_unicode]";
  1. Rename PDF is correct. Sample file: Toot.pdf -> Toot - A quantitative study.pdf
  2. The Move Files cleanup then breaks it -> S}pan{e}lEtAl/Toot - A quantitative study.pdf

Okay, the problem is not latex2unicode itself, but our AuthorList parser.

@JabRef/developers Does any of you know why the first brace is removed?
That is actually the underlying root problem here:

https://github.com/JabRef/jabref/blob/0c34fa4ba99821570dbf04e22d6f749a9c1b2456/src/main/java/org/jabref/model/entry/Author.java#L370-L386

Probably to sort {{JabRef}} under J and not {.

This should not be done in the model, but only inside the model for the UI, so we need to move this part of the code @Siedlerchr
Or at least the filename generation should not depend on it.
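
To illustrate the point about sorting, here is a minimal sketch. It is not the code linked above, and the class and method names are hypothetical. It contrasts dropping a leading brace unconditionally with deriving a separate sort key that only removes a brace pair when it encloses the whole name. Escaped braces are ignored for simplicity.

```java
class SortKeySketch {
    // Naive variant: drops a leading '{' no matter where it closes.
    static String naiveStrip(String name) {
        return name.startsWith("{") ? name.substring(1) : name;
    }

    // Removes one outer brace pair, but only if the first '{' is closed by the last character.
    static String stripEnclosingBraces(String name) {
        int last = name.length() - 1;
        if (last < 1 || name.charAt(0) != '{' || name.charAt(last) != '}') {
            return name;
        }
        int depth = 0;
        for (int i = 0; i <= last; i++) {
            char c = name.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            }
            if (depth == 0) {
                // The opening brace closed here; strip only if this is the end of the name.
                return i == last ? name.substring(1, last) : name;
            }
        }
        return name; // unbalanced input is left untouched
    }

    // Sort key for display purposes only; the stored name stays unchanged.
    static String sortKey(String name) {
        String current = name;
        String stripped = stripEnclosingBraces(current);
        while (!stripped.equals(current)) {
            current = stripped;
            stripped = stripEnclosingBraces(current);
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(sortKey("{{JabRef}}"));
        System.out.println(naiveStrip("{\\v{S}}pan{\\v{e}}l"));
        System.out.println(sortKey("{\\v{S}}pan{\\v{e}}l"));
    }
}
```

With this approach the sort key for {{JabRef}} starts with J, while a name such as {\v{S}}pan{\v{e}}l keeps all of its braces because its first brace closes before the end of the name.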

Tests:

    @Test
    public void testAuthEtAlBraces() {
        // Name with braced LaTeX accents; "\v" is not a valid Java escape, so the backslash is doubled.
        assertEquals("{\\v{S}}pan{\\v{e}}l",
                BibtexKeyGenerator.authEtal("Patrik {\\v{S}}pan{\\v{e}}l and Kseniya Dryahina and David Smith", "", "EtAl"));
        // Same name without the outer braces.
        assertEquals("\\v{S}pan\\v{e}lEtAl",
                BibtexKeyGenerator.authEtal("Patrik \\v{S}pan\\v{e}l and Kseniya Dryahina and David Smith", "", "EtAl"));
    }

It is actually unclear what to expect here.

  • Making the braces unbalanced leads to problems in any code except sorting!
  • Keeping the braces produces problematic output for cases like key generation, IMHO (not sure whether key generation should only produce alphanumeric keys?!)
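
As an illustration of the first point, a brace-balance check along the following lines would accept the original author name but reject the generated directory name. This is a minimal sketch with hypothetical names, not JabRef's actual field check. It assumes that a backslash escapes the character that follows it.

```java
class BraceCheckSketch {
    // True if every '{' has a matching '}' and no '}' appears before its '{'.
    static boolean bracesBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                i++; // skip the character after a backslash (e.g. "\{" or the "v" in "\v")
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth < 0) {
                    return false; // closing brace without an opening one
                }
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        System.out.println(bracesBalanced("{\\v{S}}pan{\\v{e}}l"));
        System.out.println(bracesBalanced("S}pan{e}lEtAl"));
    }
}
```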

FWIW, as a user I would expect the first, or a variant using the correct Unicode symbols.

DevCall:

  • AuthorClass: Strangest method to get names is taken.
  • Remove non-ASCII characters to ensure compatibility with pdflatex
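
One possible way to get an ASCII-only name, sketched with the JDK's java.text.Normalizer: decompose accented characters, drop the combining marks, then drop anything else that is not ASCII. The class and method names are hypothetical and this is not what JabRef does internally.

```java
import java.text.Normalizer;

class AsciiFallbackSketch {
    static String toAscii(String s) {
        // NFD splits e.g. "S with caron" into the base letter 'S' plus a combining caron.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed
                .replaceAll("\\p{M}+", "")          // remove combining marks
                .replaceAll("[^\\p{ASCII}]", "");   // remove whatever non-ASCII is left
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        System.out.println(toAscii("\u0160pan\u011Bl"));
        System.out.println(toAscii("G\u00F6del"));
    }
}
```

Characters without a decomposition are simply dropped by this sketch, so a real implementation would likely need a transliteration table for them.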

Still present in 5.0 dev

I can try to fix this one.

This is a hard one, but just go ahead!

@koppor right now directory names are allowed to contain unicode. Unless there have been complaints, shouldn't that remain the case?

I currently believe there are two issues the solution depends on:

  1. Is unicode allowed in the directory path?
  2. Is it too much of a performance hit to call the LatexToUnicodeAdapter on all auth... patterns? (the "latex-free" string will not have been cached)

I prefer to use the LatexToUnicodeAdapter because I think it is a better user experience if both Gödel and G{\"o}del generate the same directory name, and changing the directory structure is an infrequent event. I'd guess that it would be very hard to generate the same directory name for both Gödels without using LatexToUnicodeAdapter.
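
To make the "same directory name" idea concrete, here is a toy stand-in for the conversion step. It is not the LatexToUnicodeAdapter and covers only four accent commands. All names are hypothetical. It rewrites an accent command as base letter plus combining mark, removes the remaining braces, and composes the result with NFC so that both spellings end up as the same string.

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LatexAccentSketch {
    // Deliberately tiny table: LaTeX accent command -> Unicode combining mark.
    private static final Map<String, String> COMBINING = Map.of(
            "\"", "\u0308",  // diaeresis
            "'", "\u0301",   // acute
            "`", "\u0300",   // grave
            "v", "\u030C");  // caron

    // Matches \cmd{X} for all four commands, or \"X / \'X / \`X without braces.
    private static final Pattern ACCENT =
            Pattern.compile("\\\\(?:([\"'`v])\\{([A-Za-z])\\}|([\"'`])([A-Za-z]))");

    static String toUnicode(String latex) {
        Matcher m = ACCENT.matcher(latex);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String accent = m.group(1) != null ? m.group(1) : m.group(3);
            String letter = m.group(2) != null ? m.group(2) : m.group(4);
            m.appendReplacement(sb, Matcher.quoteReplacement(letter + COMBINING.get(accent)));
        }
        m.appendTail(sb);
        // Remaining braces are only grouping; drop them, then compose to a single code point.
        String noBraces = sb.toString().replace("{", "").replace("}", "");
        return Normalizer.normalize(noBraces, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(toUnicode("G{\\\"o}del").equals(toUnicode("G\u00F6del")));
        System.out.println(toUnicode("{\\v{S}}pan{\\v{e}}l"));
    }
}
```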

Unicode path names are perfectly valid.
You can even use emoji on Windows 10.

https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file#file-and-directory-names

Regarding performance: I don't know, but I would also go for the latex2unicode adapter. That seems reasonable to me. Don't you have the latex-free author already in the author list class?
The author patterns are just equivalent to methods for getting the authors.

@Siedlerchr, @koppor pointed out earlier:

  • Remove non-ASCII characters to ensure compatibility with pdflatex

but perhaps that is only relevant for the BibTeX key? I am not entirely sure about the use case (other than organizing PDFs). Do people use it to organize plots, etc. that they later import into a .tex file?

Don't you have the latex free author already in the author list class?

No, the latex-free methods cache full "patterns" (e.g., authorsLastOnly), unfortunately not individual authors or this particular pattern.

Actually, you could use AuthorList#getAsLastNamesLatexFree and split that string. That would remove the performance bottleneck at the cost of having a "hacky" solution.

Never mind. Unless the user has the exact right preferences AuthorList#getAsLastNamesLatexFree would amount to the exact same solution, with an extra split operation at the end.

Author fields are currently not latex-free. I'd consider changing that an option: cache latex-free Authors instead of AuthorLists.
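
A rough sketch of what per-author caching could look like. The class is hypothetical and the converter is passed in as a plain function, so nothing here reflects JabRef's actual Author or AuthorList API. Each distinct last name is converted once and looked up afterwards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class LatexFreeCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final UnaryOperator<String> converter; // stands in for the LaTeX-to-Unicode step
    private int conversions = 0;

    LatexFreeCacheSketch(UnaryOperator<String> converter) {
        this.converter = converter;
    }

    // Converts a last name on first use and serves it from the cache afterwards.
    String latexFree(String lastName) {
        return cache.computeIfAbsent(lastName, name -> {
            conversions++;
            return converter.apply(name);
        });
    }

    int conversions() {
        return conversions;
    }

    public static void main(String[] args) {
        // Toy converter that only strips braces; a real one would also handle accents.
        LatexFreeCacheSketch cache =
                new LatexFreeCacheSketch(s -> s.replace("{", "").replace("}", ""));
        cache.latexFree("{Smith}");
        cache.latexFree("{Smith}");
        System.out.println(cache.conversions());
    }
}
```

Because the cache is keyed by individual name, any auth... pattern could reuse the same converted value instead of caching one string per pattern.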
