JabRef 3.8.2
windows 10 10.0 amd64
Java 1.8.0_121
RAM is used during preview generation but not fully released. With each additional preview, less is released, until runaway usage occurs. After ~4-5 previews (or ~500 MB of RAM, not sure which is more significant), no appreciable amount of RAM is released, and each subsequent preview adds 100-500 MB to total RAM usage. Behaviour changes after ~1400 MB of RAM use, with slower incremental growth and more frequent release of small amounts of RAM.
Tested with different preview styles (no change in memory behaviour); interestingly, CPU use during IEEE preview generation spikes to ~40%, whereas APA spikes to ~20%.
NOT tested with any release versions.
Steps to reproduce [RAM use in square brackets]:
This shows in the exceptions tab of the error console:
ANTLR Tool version 4.5.3 used for code generation does not match the current runtime version 4.6
ANTLR Runtime version 4.5.3 used for parser compilation does not match the current runtime version 4.6
(the two messages above are repeated four times in the log)
Similar to this:
https://github.com/JabRef/jabref/issues/2247
but for preview rather than entry editing
tobiasdiez commented on Nov 8 2016
I can confirm at least the large memory footprint. After opening a normal db (mine has 500 entries and a few groups), JabRef eats only 200 MB of RAM. Now open the entry editor and run through a few entries (using the arrow keys, so that a new entry editor is generated for each entry). Result: over 1 GB of RAM usage. Btw: the same db with 3.6 only needed < 100 MB of RAM.
Yes, we cache the preview to gain speed when showing the second time.
@bartsch-dev What is our policy for cache clearance?
The mention did not work, so here again @bartsch-dev What is our policy on cache clearance?
The cached citation gets deleted when a field of the BibEntry changes or the BibEntry itself is removed (or the selected citation style changes). https://github.com/JabRef/jabref/blob/master/src/main/java/org/jabref/logic/citationstyle/CitationStyleCache.java#L59
It gets generated again the next time it's selected by the user and the preview is active.
This issue will be solved if we remove the cache and finally integrate https://github.com/JabRef/jabref/pull/2250 in the master branch.
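To make the behaviour just described concrete: it amounts to a small, bounded map of rendered previews keyed by the entry, dropped when the entry changes, the entry is removed, or the style changes. The following is only an illustrative sketch with invented names, not the actual CitationStyleCache code:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a bounded preview cache; not JabRef's implementation.
public class PreviewCache {
    // Upper bound on cached previews (the thread later mentions a limit of 1024).
    private static final int MAX_CACHED_PREVIEWS = 1024;

    // An access-ordered LinkedHashMap acts as a simple LRU cache.
    private final Map<String, String> cache =
            new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > MAX_CACHED_PREVIEWS;
                }
            };

    public String getOrRender(String entryId, Function<String, String> renderer) {
        return cache.computeIfAbsent(entryId, renderer);
    }

    // Invoked when a field of the entry changes or the entry is removed.
    public void invalidate(String entryId) {
        cache.remove(entryId);
    }

    // Invoked when the selected citation style changes.
    public void invalidateAll() {
        cache.clear();
    }
}

The bound matters because an unbounded cache would otherwise grow with every entry the user previews.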
Sorry, can't confirm with current master 1f893cdf1076085fdb8fb1a2e8016cf6d4bd63f8 under linux with
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)
I opened a 6k-entry BibTeX database and performed the steps described above. RAM never went beyond 425 MB before garbage collection kicked in.
Thanks for testing @lynyus!
@kingwarrick Please give it a try with the latest version of JabRef from http://builds.jabref.org/master/. Your problem seems to be solved in the most recent dev builds. Please do this with a backup of your bib file, since there are a number of breaking changes regarding groups. Let us know about your experiences!
Hello,
The issue persists on my machine, both Linux (OpenJDK, OpenJFX) and
Windows 10
JabRef 4.0.0-dev--snapshot--2017-04-05--master--a1f4101df
Windows 10 10.0 amd64
Java 1.8.0_121
Let me know if I can help in troubleshooting further.
Warrick
Can reproduce the high RAM and CPU usage if one of the CSL styles is used for the preview. Performing a manual garbage collection using jvisualvm reduces the amount of used heap space, but this memory is not freed back to the OS.
I assume we cannot really do anything about this; potentially #2250 will solve it, but I'm not sure...
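(As a general JVM note, not specific to JabRef: HotSpot rarely returns already-committed heap to the operating system on its own, so the resident size seen in top or the Task Manager tends to stay at its high-water mark even after a manual GC. Capping the heap with -Xmx, or tuning -XX:MinHeapFreeRatio / -XX:MaxHeapFreeRatio, is usually what is needed if the process footprint itself should shrink.)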
Yes, it will be solved. However, currently no one has time to work on the build process. Hopefully after the alpha release. Let's see.
@lynyus has worked on the caching of citation styles and it should get better.
@kingwarrick Is there any upper bound on memory usage? There should be...
Actually, I have not. I implemented the caching in the FileAnnotationCache.
I have not experienced an upper bound, but have seen it climb to ~1900 MB before I stopped testing.
The increase is not linear, but seems to occur in bursts. Not sure if it is entry-specific; I will try to be more systematic to see if there are specific entry types that use more memory.
I saw some graphs in an issue thread that showed memory use over time (can't find it now, but it dealt with garbage collection). Can you or someone direct me on where to get and how to use that tracing/debugging program? This would be easier for me (and maybe more informative for you) than relying on Task Manager and top.
Edit:
This graph: https://github.com/JabRef/jabref/issues/2247#issuecomment-259396313
That's jvisualvm (just search in the start menu) which is shipped with the Oracle JDK.
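For anyone who wants to reproduce such measurements, the workflow is roughly: start jvisualvm from the JDK's bin directory, select the running JabRef process under the Local node, and watch the heap graph on the Monitor tab while stepping through previews; the Perform GC and Heap Dump buttons there cover everything needed for graphs like the one linked above. (This is the generic VisualVM workflow, not a JabRef-specific procedure.)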
Hopefully this will help:

The PR https://github.com/JabRef/jabref/pull/3089 should have introduced a limit on memory consumption. Could you please check again? I hope that not every preview adds 100 MB of RAM. If it does, we really have to limit the number of cached entry previews. Currently, up to 1024 entries are cached.
JabRef 4.0-dev--snapshot--2017-09-01--master--e46d5ee88
Windows 10 10.0 amd64
Java 1.8.0_144
First test (preview one entry at a time, advancing after display; no screenshot):
Memory usage increased significantly for the first 15 previews, up to ~1500 MB (same as before), briefly peaked around 1700 MB, and hovered around ~1550 MB after preview 100.
Second test (quickly advance through 3-10 entries, triggers preview for more than one entry; screenshot attached):
Memory use increases rapidly, peaking at ~2200 MB and hovering around 2000 MB.
The time-to-display of the preview is much quicker after about 20 previews.

I can confirm that the 4.0 release has severe memory issues under Linux. Just opening my database with JabRef 4.0 makes Java use 984 MB of RAM. The first search makes this go up to 1.2 GB. Stepping through the first 20 shown entries brings this to 2.1 GB. After adding 5 articles this morning and searching for some others, JabRef was at 3.2 GB. That's not really practicable. How can caching something like a 300-character string (the citation preview) cost several tens of MB of RAM?
JabRef 4.0
Linux 3.16.0-38-generic amd64
Java 1.8.0_144
To be fair to the 4.0 release, this problem was present well before that :) And of course caching a 300-character string does not take tens of MB of RAM; obviously JabRef stores more than that (and I am in no way saying that's good).
Given that the CSL cache seems to be totally non-functional, can we simply disable/remove it? Or limit the cache to, say, a single-digit number of entries? I know this will mean that displaying the previews takes longer, but the memory consumption described here is far beyond anything justifiable. I have no deeper knowledge of how the previews work internally, but maybe one of the developers who contributed to them can clarify?
@sbitzer the quickfix (mentioned above) seems to be to not use citation styles for the preview.
EDIT: I think we should do something here soon and added it to the 4.1 milestone.
@halirutan do you have time to look at this issue? Your analysis in other performance-related issues was superb. Of course, I can help with fixing the code.
@tobiasdiez I already profiled this. I have been working on my Mathematica plugin for the last few weeks, which explains my absence. I have two issues on my list that I want to look at: this one and the disastrous behavior of the group view for moderately nested group trees.
If my memory serves me right, I couldn't come to a definite conclusion on this issue. It is a combination of several string builders. I will try to repeat the profiling to find out whether there is a point where we can make a worthwhile impact on the memory consumption.
@tobiasdiez @sbitzer @lenhard @kingwarrick I cannot confirm the high RAM usage with the latest master c51ecbee9bf05802169784b99d19c624aac28d5b. I used a large database (several thousand entries) and scrolled through 500 of them, showing the preview each time in IEEE style. The preview is the biggest contributor, but it used less than 90 MB of RAM. This comes down to about 180 kB per preview. See the highlighted line:

However, one thing that I find highly suspicious is that the simple call that sets the text in the preview pane via javax.swing.JEditorPane#setText in these lines uses 52 MB of the overall 86 MB.
Can someone on a machine that shows the weird behavior test commenting out the actual setting of the preview string?
public void setPreviewLabel(String text) {
    if (SwingUtilities.isEventDispatchThread()) {
        final Document document = previewPane.getDocument();
        // previewPane.setText(text);
        // previewPane.revalidate();
    } else {
        SwingUtilities.invokeLater(() -> {
            // previewPane.setText(text);
            // previewPane.revalidate();
        });
    }
    this.scrollToTop();
}
This leaves the whole machinery of JabRef creating the preview string alive, but takes the JEditorPane out of the memory picture. Scroll slowly through the entries with the preview open and see what your memory does.
I'm on Ubuntu 16.04 with Oracle JRE.
@halirutan Thanks for the analysis. 180 kB per preview sounds fine. So with the current cache size of 1024 entries, there shouldn't be any problems, since the total memory consumption is then around 200 MB (1024 × ~180 kB ≈ 184 MB). So how do these astronomical reports of > 2 GB come about?
It is understandable that the JEditorPane has quite some overhead. In the end it is a full web browser, and the simple string has to be parsed into an HTML tree and then displayed. Maybe the migration to JavaFX https://github.com/JabRef/jabref/pull/3504 helps a bit in this regard.
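For a rough idea of what the JavaFX route looks like, here is a minimal sketch (illustrative only, not the code from #3504; the class and parameter names are invented): the already-rendered HTML string is handed to a WebView, whose embedded engine takes care of parsing and layout.

import javafx.application.Platform;
import javafx.scene.web.WebView;

public final class FxPreviewSketch {

    private FxPreviewSketch() {
    }

    // Display an already-rendered HTML preview string in a JavaFX WebView.
    // loadContent() delegates parsing and rendering to the embedded engine.
    public static void showPreview(WebView previewView, String html) {
        Platform.runLater(() -> previewView.getEngine().loadContent(html));
    }
}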
@halirutan @tobiasdiez If I am not mistaken, the memory issue happens only when using citationstyles, not the default preview or the IEEE preview.
@lenhard That's true.
Here comes the revised performance analysis. I am not certain, but if I understand this correctly, the cache is not the problem. The main drain of resources seems to be the CSL library. Let us first look at the memory. Note that the profiler does not track all objects, which is why there is a large difference between the size in line 1 and the size the CSL lib uses. What I tested is the creation of exactly one preview using one of the provided CSL styles, not the default one.
This is the memory footprint:

As you can see, 98% of the memory is eaten up by creating the styled bib entry through the CSL library. Notably, 63% of it comes from merely constructing the CSL instance inside makeAdhocBibliography. The documentation comment of this method states:
Creates an ad hoc bibliography from the given citation items. Calling this method is rather expensive as it initializes the CSL processor. If you need to create bibliographies multiple times in your application you should create the processor yourself and cache it if necessary.
This is by no means an understatement. I understand that makeAdhocBibliography is a convenient way of feeding in a single bib item and creating an HTML view of it, but it comes at a steep price. I have quickly looked through the API, and it seems to be optimized for handling and rendering whole bibliographies rather than fast previews of single items.
If at all possible, I would advise holding our own CSL instance and reusing it. If we look at the performance, we get the same picture. Please don't be confused by the overall 25%: I left the application running idle for a moment, so a lot of time was spent just waiting for me. The 25% is the time where the hard work was done.

What you should take from this is that of the 25% of the time needed to preview the item, 18% was spent just constructing the CSL instance. That is tremendous (50 out of 70 seconds under profiling!). The code of makeAdhocBibliography is short enough to share:
public static Bibliography makeAdhocBibliography(String style, String outputFormat,
        CSLItemData... items) throws IOException {
    ItemDataProvider provider = new ListItemDataProvider(items);
    CSL csl = new CSL(provider, style);
    csl.setOutputFormat(outputFormat);

    String[] ids = new String[items.length];
    for (int i = 0; i < items.length; ++i) {
        ids[i] = items[i].getId();
    }

    csl.registerCitationItems(ids);
    return csl.makeBibliography();
}
There are three parts in this method. First, the CSL instance is created. As you can see, a ListItemDataProvider is used to construct the CSL instance, and it already contains the one item we want to process. So, if possible, we should use our own data provider that is updated with the current item; however, this is the first time I have looked at this library and I'm not sure this works.
There are two other parts. The for loop just copies the ids, which should be fast. The last part consists of registerCitationItems and makeBibliography. Both methods also come with a notable performance and memory footprint, as you can see in the images, though not as large as the constructor's, and I'm not sure how much this can be improved.
My very naive view is: they need to initialize the whole machinery and parse the CSL style. I understand this is hard work. But then everything is set up, and we provide the CSLItemData, so they only need to render it according to the style. Why are both final methods so expensive?
Anyway, my suggestion is to leave the cache alone or turn it off. More importantly, we should initialize the CSL engine once and reuse it as much as possible.
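To make the suggestion concrete, here is a rough sketch of what reusing the engine could look like. This is only an illustration of the idea, not JabRef's actual code: the class and method names are invented, Bibliography.makeString() is used to turn the result into a string, and it assumes that registerCitationItems() replaces the previously registered items on each call.

import java.io.IOException;

import de.undercouch.citeproc.CSL;
import de.undercouch.citeproc.ItemDataProvider;
import de.undercouch.citeproc.csl.CSLItemData;

// Hypothetical sketch: pay the expensive CSL construction once per style and
// swap the current entry behind a mutable data provider for every preview.
public class ReusableCitationRenderer {

    // Provider whose single item can be replaced between render calls.
    private static class SingleItemProvider implements ItemDataProvider {
        private CSLItemData item;

        void setItem(CSLItemData item) {
            this.item = item;
        }

        @Override
        public CSLItemData retrieveItem(String id) {
            return (item != null && item.getId().equals(id)) ? item : null;
        }

        @Override
        public String[] getIds() {
            return (item == null) ? new String[0] : new String[] {item.getId()};
        }
    }

    private final SingleItemProvider provider = new SingleItemProvider();
    private final CSL csl; // expensive: initializes the whole CSL processor

    public ReusableCitationRenderer(String style) throws IOException {
        this.csl = new CSL(provider, style); // done once, reused afterwards
        this.csl.setOutputFormat("html");
    }

    public String render(CSLItemData item) {
        provider.setItem(item);
        // Assumption: this replaces the previously registered items with the new id.
        csl.registerCitationItems(item.getId());
        return csl.makeBibliography().makeString();
    }
}

Whether the processor tolerates swapping items behind the provider like this is exactly the open question from the comment above, so treat this as a starting point for experimentation rather than a drop-in fix.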
Another quick workaround is to switch to the V8 engine. See https://github.com/JabRef/jabref/pull/3180, especially the screen videos with and without V8.
This should be fixed in the latest development version. Could you please check the build from http://builds.jabref.org/master/. Thanks!
I just tried it out. Stepping through the first 50 entries of my database, producing APA-style previews for them, increased the memory usage of JabRef from 915 MB to 1.4 GB. That's certainly less than before, but it's still a lot. For comparison: doing the same with the standard entry preview raises memory usage only from 915 MB to 920 MB.
JabRef 4.1-dev--snapshot--2017-12-21--master--7819cb4aa
Linux 3.16.0-38-generic amd64
Java 1.8.0_151
How is the rendering speed? Maybe we can drop the caching of rendered entries completely?
The first entry preview takes a while to be generated. The others are almost instantaneous.
@sbitzer The first entry takes exactly as long as every entry took before the patch: it initializes the CSL engine before it can render a preview. After that, we reuse the engine, and until you change the style, every preview is fast. On a style change, the first item will trigger a re-initialization again.
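Illustratively, the behaviour described above can be pictured as a thin guard around the hypothetical renderer sketched earlier in this thread (again, invented names, not the actual patch):

import java.io.IOException;

import de.undercouch.citeproc.csl.CSLItemData;

// Hypothetical: keep one renderer and rebuild it only when the selected style
// changes, so only the first preview per style pays the initialization cost.
public class PreviewRendererHolder {
    private ReusableCitationRenderer renderer;
    private String currentStyle;

    public String renderPreview(String style, CSLItemData item) throws IOException {
        if (renderer == null || !style.equals(currentStyle)) {
            renderer = new ReusableCitationRenderer(style); // slow: engine initialization
            currentStyle = style;
        }
        return renderer.render(item); // fast: engine already initialized
    }
}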
@koppor I am certain the cache is not the problem; in fact, it saves us from using even more memory. Look here:

Simply running through 20 items and showing the preview uses 87% of the memory. 43% is spent in CSL.makeBibliography, 39% in registering the item in the CSL engine, and only 4% by JabRef creating the input format for CSL. I'm sorry to say this, but this is as far as I can optimize it.
The real pain under the hood is the NashornScriptEngine, which AFAIK is used to run the JavaScript code.