Jabref: convert latex encoding - code in entry that crashes jabref

Created on 3 May 2020 · 22 Comments · Source: JabRef/jabref

JabRef 5.1--2020-05-02--1d9957b
Linux 5.6.8-200.fc31.x86_64 amd64
Java 14.0.1

[x] I have tested the latest development version from http://builds.jabref.org/master/ and the problem persists

In a test database with more than 10500 entries, adding an entry (Ctrl+N, or with the button) crashes JabRef. No log message is written.

In a smaller database, no problem adding a new entry. I can copy and paste it into the larger database.

bug 🐛

ilippert

All 22 comments

JabRef 5.1--2020-05-04--b5599c9
Windows 10 10.0 amd64
Java 14.0.1

AND

JabRef 5.1--2020-05-04--b5599c9
Linux 5.3.0-51-generic amd64
Java 14.0.1

using a database with >19,000 entries

Cannot reproduce this issue. Might be related to specific database, preferences or hardware? @ilippert : Can you reliably reproduce this problem? Or does it appear only sometimes?

AEgit on 5 May 2020

JabRef 5.1--2020-05-04--7bb1e24
Linux 5.6.8-200.fc31.x86_64 amd64
Java 14.0.1

yes, I can still reproduce this reliably.

ilippert on 5 May 2020

Can you tell us, what the ram usage is of JabRef when this happens?

calixtus on 5 May 2020

before adding new entry:
memory 1,7gb
virtual memory 107,1gb
Resident memory 1,9 gb
Shared memory mb

when adding: shared memory goes up to 250mb; memory and resident memory each go up by +100mb.

ilippert on 5 May 2020

I don't really know much about Java memory usage, but a 250 MB rise in RAM usage when adding one entry does not seem normal...
@tobiasdiez @koppor @Siedlerchr ?

calixtus on 5 May 2020

Sorry, I just checked something else: I created an entirely new bib file and copied in the 10500 entries. Adding an entry then succeeded.

ilippert on 5 May 2020

OK, comparing the original database file and the newly created one, with the same entries, I see a difference in file size of 1.5 MB.
I tried to compare both files, but the entries are stored in an entirely different order.

I have now copied the group structure from the original bib file into the new bib file. The new bib file is still 1.5 MB smaller than the original file. I can still add entries to this new file.

Oops, the new JabRef file has, of course, changed all my timestamps. That's not good. I need the old timestamps...

ilippert on 5 May 2020

So, maybe we can close the issue at this point - as in, it was an "artefact" of that original database.

However, the original database is simply one that has grown throughout the years and versions of JabRef. Maybe other users also have such naturally growing biblatex databases.

ilippert on 5 May 2020

JabRef 5.1--2020-05-04--7bb1e24
Linux 5.6.8-200.fc31.x86_64 amd64
Java 14.0.1

Wait: with the timestamp update deactivated, creating a new file and pasting in the entries results (tested in two instances) in a new database of the same size as the original bib file.

And now adding a new entry results in crashing jabref.

ilippert on 5 May 2020

This bug is quite new; it started to emerge around last weekend. Before that, I was able to add entries to the original database file without a crash.

ilippert on 5 May 2020

I think we need your database to be able to reproduce the issue. Would it be possible for you to share it? Only the core developers will have access to the file; it won't be published.

koppor on 7 May 2020

Yes, I am happy to share. Please advise how you would like to receive the file.

ilippert on 7 May 2020

You'll see my email address at my GitHub profile. Could you try sending it there?

koppor on 7 May 2020

I have now noticed that on my system (Intel® Core™ i7-6700HQ CPU @ 2.60GHz × 8), JabRef regularly needs 40-60% of my CPU while the said file is open. If I close the file, JabRef needs only about 2%.

ilippert on 24 May 2020

I investigated the database and identified one entry that reliably breaks jabref. However, I cannot detect what is wrong with it.

@Article{Kolb2003,
  Title                    = {Protest, \"{O}ffentlichkeitsarbeit und {L}obbying schlie{\ss}en sich nicht aus. {D}ie {M}assen als {S}chl\"{u}ssel zur {M}acht. {F}elix {K}olb vergleicht die politischen {S}trategien von {U}mweltbewegung und {G}lobalisierungskritikern, extract from `\textit{politische \"{o}kologie}' (85) 2003},
  Author                   = {Felix Kolb},
  Year                     = {2003},
  Month                    = {14. Aug.},
  Number                   = {188},
  Pages                    = {7},

  Journal                  = {Frankfurter Rundschau}
}

Without this entry, jabref seems to run more smoothly...

ilippert on 25 May 2020

Just an idea and possible workaround (?). Have you tried making the following changes:

\"{O}ffentlichkeitsarbeit to {\"{O}}ffentlichkeitsarbeit
{S}chl\"{u}ssel to {S}chl{\"{u}}ssel
\"{o}kologie} to {\"{o}}kologie} (on a side note: {\"{O}}kologie} should be upper case)

Does that make any difference?

AEgit on 25 May 2020

Maybe a parsing error in the month field? Is there maybe a max length for the title?

calixtus on 25 May 2020

Actually, it might be best to write the umlauts differently (see https://tex.stackexchange.com/questions/366546/jabref-cant-read-bib-file-created-by-jabref-3-0/434268#434268

and

https://tex.stackexchange.com/questions/57743/how-to-write-%c3%a4-and-other-umlauts-and-accented-letters-in-bibliography):

So change \"{O} to {\"O}
and
\"{u} to {\"u}
and
\"{o} to {\"o} (or {\"O} if you are allowed to correct the capitalization)
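
The rewrite suggested above could be applied to a whole field with a small helper. This is only a minimal sketch: the class and method names (`UmlautBraces`, `rewrite`) are made up for illustration and are not part of JabRef. It assumes that only the `\"{X}` form with a single ASCII letter needs changing.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not JabRef code: rewrites \"{X} to {\"X}.
final class UmlautBraces {
    // Matches the accent macro \"{X}, where X is a single ASCII letter.
    private static final Pattern ACCENT = Pattern.compile("\\\\\"\\{([A-Za-z])\\}");

    /** Returns the field text with every \"{X} replaced by {\"X}. */
    static String rewrite(String field) {
        // In the replacement string, "\\\\" emits one literal backslash
        // and $1 is the captured letter.
        return ACCENT.matcher(field).replaceAll("{\\\\\"$1}");
    }

    public static void main(String[] args) {
        // Part of the title from the entry discussed in this thread.
        System.out.println(rewrite("{S}chl\\\"{u}ssel zur {M}acht"));
    }
}
```

Text that is already written as `{\"{O}}` would gain an extra pair of braces under this pattern, so it is probably best run only on entries that still use the bare `\"{X}` form.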

AEgit on 25 May 2020

The problem is in inserting \"{o} in `\textit{}': this breaks JabRef.
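
To screen a large database for other fields with the same combination, a check along these lines might help. It is a rough sketch with hypothetical names (`TextitAccentScan`, `hasAccentInTextit`), and it assumes the `\"{x}` macro sits directly inside `\textit{...}` with no other braces before it. It only flags candidates; it does not establish what actually causes the crash.

```java
import java.util.regex.Pattern;

// Hypothetical screening helper, not JabRef code.
final class TextitAccentScan {
    // \textit{ followed by brace-free text and then an accent macro \"{x}.
    private static final Pattern ACCENT_IN_TEXTIT =
            Pattern.compile("\\\\textit\\{[^{}]*\\\\\"\\{[A-Za-z]\\}");

    /** Returns true if the field contains \"{x} inside a \textit{...} group. */
    static boolean hasAccentInTextit(String field) {
        return ACCENT_IN_TEXTIT.matcher(field).find();
    }

    public static void main(String[] args) {
        // Tail of the title field from the entry discussed in this thread.
        String title = "extract from `\\textit{politische \\\"{o}kologie}' (85) 2003";
        System.out.println(hasAccentInTextit(title));
    }
}
```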

ilippert on 26 May 2020

Thank you for triangulating this.
JabRef internally uses an external library (latex2unicode) to convert the LaTeX encoding. Sadly, this library no longer seems to be in active development, so we have already started to think about a replacement. But this is going to be a larger project.
I don't know yet if there is a quick fix possible.

Refs #5547
Refs #6155

calixtus on 26 May 2020

Regarding the issue topic:
I am now on
JabRef 5.1--2020-05-25--6f34de3
Linux 5.6.13-300.fc32.x86_64 amd64
Java 14.0.1

I have moved all my old entries from my 15-year-old database to a new database (and in that process caught https://github.com/JabRef/jabref/issues/6399#issuecomment-633720078 with this bug https://github.com/JabRef/jabref/issues/6399#issuecomment-633743391).
Now I no longer have the original problem, the crash when adding a new entry to a database with 10500 entries. Therefore I am changing the title of this issue. Please feel free to alter it if it does not fit.

ilippert on 26 May 2020

I can't shed much light on the underlying issue, but I don't think the latex2unicode converter is at fault. Adding the following test case to LatexToUnicodeFormatterTest.java works for me:

@Test
void formatUmlautsInTextit() {
    assertEquals("\uD835\uDC5D\uD835\uDC5C\uD835\uDC59\uD835\uDC56\uD835\uDC61\uD835\uDC56\uD835\uDC60\uD835\uDC50ℎ\uD835\uDC52 \uD835\uDC5C̈\uD835\uDC58\uD835\uDC5C\uD835\uDC59\uD835\uDC5C\uD835\uDC54\uD835\uDC56\uD835\uDC52",
            formatter.format("\\textit{politische \\\"{o}kologie}"));
}

where the unicode on the left comes from yaytext.com.
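
For readers who want to check the expected string without an external site, the surrogate pairs can be reproduced from the Unicode Mathematical Italic range. The sketch below is an illustration with a hypothetical `MathItalic` class. It makes no claim about how latex2unicode or JabRef produce italics internally; it only maps ASCII letters to their italic code points.

```java
// Hypothetical illustration, not JabRef or latex2unicode code.
final class MathItalic {
    /** Maps ASCII letters to Unicode Mathematical Italic; other characters pass through. */
    static String toMathItalic(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == 'h') {
                // Italic small h is encoded separately as U+210E (Planck constant).
                out.append('\u210E');
            } else if (c >= 'a' && c <= 'z') {
                out.appendCodePoint(0x1D44E + (c - 'a')); // italic small a is U+1D44E
            } else if (c >= 'A' && c <= 'Z') {
                out.appendCodePoint(0x1D434 + (c - 'A')); // italic capital A is U+1D434
            } else {
                out.append(c); // spaces, combining marks, etc. stay unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "o" followed by U+0308 (combining diaeresis) stands in for the umlaut.
        System.out.println(toMathItalic("politische o\u0308kologie"));
    }
}
```

Under this mapping, "politische" comes out as the same sequence of surrogate pairs that opens the expected string in the test above.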