Bisq: Bisq using up too much memory on Linux even after closing

Created on 27 Jan 2020  路  10Comments  路  Source: bisq-network/bisq

Description

Bisq will take more and more memory until the system freezes when none remains.
Additionally if shutdown before no memory is left, not all of it is released. About 4-7GB are shown to be additionally occupied, but no process seems to hold it. Logging out and in again does not fix it. A restart does.

Version

1.2.5

Steps to reproduce

Unknown, just start the application for me.

Expected behaviour

I expect bisq to take maybe 3GB of ram and released it after closing the application.

Actual behaviour

More and more ram will be occupied until none is left and the system starts freezing. Sometimes bisq gets terminated by the system other times I just force reboot.

Device or machine

Ubuntu 18.04.3
Ryzen 5 2600X
G.Skill Aegis DIMM Kit 16GB

Additional info

The log contains a case where, after a reboot I started Bisq and got lucky. The application got terminated (I reacted too slow and could no longer move the mouse) after the ram was full. About 5.5GB of ram were shown to be still occupied after the termination.

bisq.log

bug critical priority Linux

Most helpful comment

What I think is happening on my Linux machine is the java vm thinks it has 128 GB of RAM to work with.

My OpenJDK 10 and 11 VMs have a very large default -XX:MaxRAM setting of 128 GB. It looks like "because" of this -- I need to be more careful with that word -- there are too many minor page faults, and libc malloc()s are creating a lot of extra memory segments via mmap() syscalls, but not using heap memory for alloctions via brk() calls. During tracing sessions, I saw mmap() calls, but not brk() calls.

The MaxRAM setting is described in the OpenJDK source.

OpenJDK 9:
    https://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b756e7a2ec33/src/share/vm/runtime/globals.hpp#l2030  
    product_pd(uint64_t, MaxRAM, "Real memory size (in bytes) used to set maximum heap size") 
    range(0, 0XFFFFFFFFFFFFFFFF)                    
OpenJDK 11:
    https://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/hotspot/share/runtime/globals.hpp
    define_pd_global(uint64_t,MaxRAM,                    1ULL*G);

You can check your own VM's default MaxRAM setting:

OpenJDK 11
$ java -XX:+PrintFlagsFinal -version | grep MaxRAM
     uint64_t MaxRAM = 137438953472

OpenJDK 10:
$ java -XX:+PrintFlagsFinal -version | grep MaxRAM
    uint64_t MaxRAM = 137438953472

137438953472 bytes is 128 gigabytes

I can set MaxRAM=2GiB in JAVA_OPTS

        export JAVA_OPTS="-XX:MaxRAM=2147483648"

and Bisq's RES (SEE htop) stays below ~1.2 GB, instead of growing to ~4.8 GB. (But I haven't let it run for days...)


After libc malloc()s expand the extra virtual memory via mmap() syscalls, the app causes MMU violations when it tries to use the allocation memory range(s) and finds no mapping for an address or addresses.

I saw MMU errors in GC logs last night, when using -XX:+ExtendedDTraceProbes + GC logging, but only today realized there may be a connection to my VM's detaults MaxRAM=128GB setting. I had assumed it was due to the extra overhead from using -XX:+DebugNonSafepoints -XX:+PreserveFramePointer, but especially due to -XX:+ExtendedDTraceProbes.

I believe these MMU violations in turn result in page faults, and causes consumption of extra pages of physical memory -- the RES, or resident set size you can see in htop.

Using $ pmap -x $(pgrep -f BisqAppMain) | more ...

I find a memory mapped file with an RSS ~3.8 GB

       0000000707200000 3994624 3984976 3984976 rw---   [ anon ]

I don't know what this anonymous mapped file is, but setting the VM's MaxRAM to 2GB in JAVA_OPTS reduces this mmapped file's size to ~0.5 GB.

Corresponding to this huge 3.8GB memory mapped file size is htop's RES value of ~ 4.8 GB, and VIRT ~10.4 GB (when running Bisq with no MaxRAM option).


Here are measurements after starting Bisq with an empty data.dir, using MaxRAM 2 GB, and letting it run for 30 minutes (not touching the GUI).

export JAVA_OPTS="-XX:MaxRAM=2147483648"
    (1.5 GB is not enough MaxRAM and the OOM Killer crashes Bisq)

RES 1118 MB,  VIRT 6986 MB      (SEE $ htop)

The java process' largest memory mapped file is ~0.5 GB  
    (Use $ pmap -x $(pgrep -f BisqAppMain) | more)  

00000000e0000000  524288  524288  524288 rw---   [ anon ]

About 7-8 hours after starting Bisq with a 3 GB MaxRAM setting, I looked at

$htop -> RES: ~2 GB, VIRT: ~4 GB

Major & Minor page fault counts:
$ ps -o pid,min_flt,maj_flt,cmd ->

      PID     MINFL  MAJFL CMD
     14615 777788    235     openjdk/jdk-11/bin/java -XX:MaxRAM=3221225472 -classpath ...

I unset JAVA_OPTS, restarted Bisq (with an empty data.dir), and checked minor/major page fault counts after uptime ~2.75 hrs

$htop -> RES: ~4.7 GB, VIRT: ~10.4 GB

Major & Minor page fault counts:
$ ps -o pid,min_flt,maj_flt,cmd ->

     PID      MINFL  MAJFL CMD
    26736 1312906    14      openjdk/jdk-11/bin/java -classpath ...

The bisq instance running with the default 128 GB MaxRAM setting is almost doubling the number of minor page faults in less than half the time as the instance running with a 3 GB MaxRAM setting.


Attached below are two page fault flame graphs showing very large differences in the number of faults while tracing page_fault_user events for 120s, begun 30s after bisq was started with an empty data.dir.

bisq-page-faults-2g-maxram.svg      shows 110,393 pages 
bisq-page-faults-128g-maxram.svg    shows 954,347 pages

I had to zip them up to upload them to here because you can't upload svg files to comments, and the flamegraph tool does not produce png files.

page-fault-flame-graphs.zip
page-fault-flame-graphs.zip.md5sum.txt

Open them in a browser, and hover your mouse over and drill down to levels representing "do_page_fault" event counts for libjvm.so and java method code paths. To return to the original view, click "Reset Zoom" at the top left of page.

All 10 comments

Related to #3128, #3657.

Related to: #3128, #3657, #3918, #3917, #3787, #3786, #3686, #3677, #3343

I think this is due to one or several JavaFx bugs related to bug JDK-8188094

During profiling, I noticed that the JavaFX Application Thread had been allocated 66.6 GiB
JavaFX-App-Thread-Alloc-66Gib-2020-02-19 17-45-46
This couldn't be right, as my vm is 8 GiB, and my bisq instance had -Xmx=4g. I thought it was just a JDK Mission Control bug, but I've been reproducing the problem with a tiny javafx demo app that constantly leaks from the JavaFx app thread (the same demo app found in the OpenJDK bug report). The OS does not get the RAM back until I quit the IDE too.

I have not found a solution. I'm running Ubuntu 18 & 19(vm), using an nvidia gpu.

What I think is happening on my Linux machine is the java vm thinks it has 128 GB of RAM to work with.

My OpenJDK 10 and 11 VMs have a very large default -XX:MaxRAM setting of 128 GB. It looks like "because" of this -- I need to be more careful with that word -- there are too many minor page faults, and libc malloc()s are creating a lot of extra memory segments via mmap() syscalls, but not using heap memory for alloctions via brk() calls. During tracing sessions, I saw mmap() calls, but not brk() calls.

The MaxRAM setting is described in the OpenJDK source.

OpenJDK 9:
    https://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b756e7a2ec33/src/share/vm/runtime/globals.hpp#l2030  
    product_pd(uint64_t, MaxRAM, "Real memory size (in bytes) used to set maximum heap size") 
    range(0, 0XFFFFFFFFFFFFFFFF)                    
OpenJDK 11:
    https://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/hotspot/share/runtime/globals.hpp
    define_pd_global(uint64_t,MaxRAM,                    1ULL*G);

You can check your own VM's default MaxRAM setting:

OpenJDK 11
$ java -XX:+PrintFlagsFinal -version | grep MaxRAM
     uint64_t MaxRAM = 137438953472

OpenJDK 10:
$ java -XX:+PrintFlagsFinal -version | grep MaxRAM
    uint64_t MaxRAM = 137438953472

137438953472 bytes is 128 gigabytes

I can set MaxRAM=2GiB in JAVA_OPTS

        export JAVA_OPTS="-XX:MaxRAM=2147483648"

and Bisq's RES (SEE htop) stays below ~1.2 GB, instead of growing to ~4.8 GB. (But I haven't let it run for days...)


After libc malloc()s expand the extra virtual memory via mmap() syscalls, the app causes MMU violations when it tries to use the allocation memory range(s) and finds no mapping for an address or addresses.

I saw MMU errors in GC logs last night, when using -XX:+ExtendedDTraceProbes + GC logging, but only today realized there may be a connection to my VM's detaults MaxRAM=128GB setting. I had assumed it was due to the extra overhead from using -XX:+DebugNonSafepoints -XX:+PreserveFramePointer, but especially due to -XX:+ExtendedDTraceProbes.

I believe these MMU violations in turn result in page faults, and causes consumption of extra pages of physical memory -- the RES, or resident set size you can see in htop.

Using $ pmap -x $(pgrep -f BisqAppMain) | more ...

I find a memory mapped file with an RSS ~3.8 GB

       0000000707200000 3994624 3984976 3984976 rw---   [ anon ]

I don't know what this anonymous mapped file is, but setting the VM's MaxRAM to 2GB in JAVA_OPTS reduces this mmapped file's size to ~0.5 GB.

Corresponding to this huge 3.8GB memory mapped file size is htop's RES value of ~ 4.8 GB, and VIRT ~10.4 GB (when running Bisq with no MaxRAM option).


Here are measurements after starting Bisq with an empty data.dir, using MaxRAM 2 GB, and letting it run for 30 minutes (not touching the GUI).

export JAVA_OPTS="-XX:MaxRAM=2147483648"
    (1.5 GB is not enough MaxRAM and the OOM Killer crashes Bisq)

RES 1118 MB,  VIRT 6986 MB      (SEE $ htop)

The java process' largest memory mapped file is ~0.5 GB  
    (Use $ pmap -x $(pgrep -f BisqAppMain) | more)  

00000000e0000000  524288  524288  524288 rw---   [ anon ]

About 7-8 hours after starting Bisq with a 3 GB MaxRAM setting, I looked at

$htop -> RES: ~2 GB, VIRT: ~4 GB

Major & Minor page fault counts:
$ ps -o pid,min_flt,maj_flt,cmd ->

      PID     MINFL  MAJFL CMD
     14615 777788    235     openjdk/jdk-11/bin/java -XX:MaxRAM=3221225472 -classpath ...

I unset JAVA_OPTS, restarted Bisq (with an empty data.dir), and checked minor/major page fault counts after uptime ~2.75 hrs

$htop -> RES: ~4.7 GB, VIRT: ~10.4 GB

Major & Minor page fault counts:
$ ps -o pid,min_flt,maj_flt,cmd ->

     PID      MINFL  MAJFL CMD
    26736 1312906    14      openjdk/jdk-11/bin/java -classpath ...

The bisq instance running with the default 128 GB MaxRAM setting is almost doubling the number of minor page faults in less than half the time as the instance running with a 3 GB MaxRAM setting.


Attached below are two page fault flame graphs showing very large differences in the number of faults while tracing page_fault_user events for 120s, begun 30s after bisq was started with an empty data.dir.

bisq-page-faults-2g-maxram.svg      shows 110,393 pages 
bisq-page-faults-128g-maxram.svg    shows 954,347 pages

I had to zip them up to upload them to here because you can't upload svg files to comments, and the flamegraph tool does not produce png files.

page-fault-flame-graphs.zip
page-fault-flame-graphs.zip.md5sum.txt

Open them in a browser, and hover your mouse over and drill down to levels representing "do_page_fault" event counts for libjvm.so and java method code paths. To return to the original view, click "Reset Zoom" at the top left of page.

There is a glibc setting on Linux that can be used to define a hard limit of the number of arenas -- thread safe memory pools used by malloc() -- that can be created. The trade off for using this is more thread contention among Bisq's allocating threads versus less memory usage.

export MALLOC_ARENA_MAX=4

Using this on my Ubuntu 18 laptop reduces the RES memory by ~300 mb, and VIRT by ~3.7 GB.

In addition, configuring the jvm's starting and max heap size reduces both RES and VIRT by another ~200 mb.

export JAVA_OPTS="-Xms820m -Xmx820m"

Used together

export MALLOC_ARENA_MAX=4       
export JAVA_OPTS="-Xms820m -Xmx820m"

RES memory is reduced by ~500 mb, and VIRT memory by ~3.9 GB.

I will test these on VMs next (Arch Linux, Debian, Fedora).

These settings, any of which disable dynamic adjustment of glibc's mmap threshold, save another 100 mb in RES and VIRT, but I haven't experimented enough to suggest anyone else try them yet.

export MALLOC_MMAP_THRESHOLD_=131072
export MALLOC_TRIM_THRESHOLD_=131072
export MALLOC_TOP_PAD_=131072
export MALLOC_MMAP_MAX_=65536

By the way, I am looking for leaks too...

On my Ubuntu 18 hosted VMs Arch Linux kernel 5.5.8, Fedora 31, and Debian 10.3, the settings

export MALLOC_ARENA_MAX=4
export JAVA_OPTS="-Xms820m -Xmx820m"

reduced resident memory use by ~400 MB, and virtual memory by ~1.9 GB.

I assume the overall reduced memory footprint on the VMs is because they don't have access to the GPU.

Tracing linux glibc memory_arena_new calls via

sudo bpftrace -e 'usdt:/proc/<PID>/root/lib/x86_64-linux-gnu/libc-2.27.so:libc:memory_arena_new { printf("PID %d: created new arena at %p, size %d\n", pid, arg0, arg1); }'

is showing Bisq malloc() calls creating 20 new arenas between startup (with emtpy data.dir) and the time the BlockParser is done.

When I started Bisq after setting

export MALLOC_ARENA_MAX=4
export MALLOC_MMAP_THRESHOLD_=131072

the same trace shows Bisq is does not create any new arenas during the same interval; it is using existing arenas.

so all in all, it seems that our memory issues on linux are mainly caused by the JVM, its implementations and (default) configuration?

what do you suggest as an action plan? get the malloc stuff into our code? do more tests?

To sum up, I agree the biggest problem on Linux is off-heap memory growth. Your adding the MaxRAM jvm option reduced resident memory > 50%

The next experiment should be adding

export MALLOC_ARENA_MAX=4
export MALLOC_MMAP_THRESHOLD_=131072

to the bisq-desktop and bisq-daemon startup scripts; _this requires no changes to the binaries_.
If testers don't see any performance drag from this after using this setting for X days, they might try to save another ~100 or so MB RAM by setting the jvm's start & max heap size:

-Xms820m  -Xmx820m

820 MB should be sufficient, and I think Linux bisq users would be happy to see a ~ 1.0 GB resident memory size after starting bisq with an empty data dir. (When we started investigating, bisq's resident memory size was growing to ~ 4.8 GB. So problem not solved, but this is a significant improvement.)

Regarding the Java Heap: I certain JavaFX 11 has leaks, and their dev team has been fixing them in a version we cannot use yet. Can't do much about that yet. There are probably other on-heap leaks, but I have not been able to identify with certainty.

Thanks a lot @ghubstan for your investigations!!!!

@ripcurlx Should we add those options to the build scripts?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

cbeams picture cbeams  路  4Comments

svpi11 picture svpi11  路  6Comments

21isenough picture 21isenough  路  5Comments

wiz picture wiz  路  3Comments

ShaunHoyes picture ShaunHoyes  路  6Comments