Bisq will take more and more memory until the system freezes when none remains.
Additionally if shutdown before no memory is left, not all of it is released. About 4-7GB are shown to be additionally occupied, but no process seems to hold it. Logging out and in again does not fix it. A restart does.
1.2.5
Unknown, just start the application for me.
I expect bisq to take maybe 3GB of ram and released it after closing the application.
More and more ram will be occupied until none is left and the system starts freezing. Sometimes bisq gets terminated by the system other times I just force reboot.
Ubuntu 18.04.3
Ryzen 5 2600X
G.Skill Aegis DIMM Kit 16GB
The log contains a case where, after a reboot I started Bisq and got lucky. The application got terminated (I reacted too slow and could no longer move the mouse) after the ram was full. About 5.5GB of ram were shown to be still occupied after the termination.
Related to #3128, #3657.
Related to: #3128, #3657, #3918, #3917, #3787, #3786, #3686, #3677, #3343
I think this is due to one or several JavaFx bugs related to bug JDK-8188094
During profiling, I noticed that the JavaFX Application Thread had been allocated 66.6 GiB

This couldn't be right, as my vm is 8 GiB, and my bisq instance had -Xmx=4g. I thought it was just a JDK Mission Control bug, but I've been reproducing the problem with a tiny javafx demo app that constantly leaks from the JavaFx app thread (the same demo app found in the OpenJDK bug report). The OS does not get the RAM back until I quit the IDE too.
I have not found a solution. I'm running Ubuntu 18 & 19(vm), using an nvidia gpu.
What I think is happening on my Linux machine is the java vm thinks it has 128 GB of RAM to work with.
My OpenJDK 10 and 11 VMs have a very large default -XX:MaxRAM setting of 128 GB. It looks like "because" of this -- I need to be more careful with that word -- there are too many minor page faults, and libc malloc()s are creating a lot of extra memory segments via mmap() syscalls, but not using heap memory for alloctions via brk() calls. During tracing sessions, I saw mmap() calls, but not brk() calls.
The MaxRAM setting is described in the OpenJDK source.
OpenJDK 9:
https://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b756e7a2ec33/src/share/vm/runtime/globals.hpp#l2030
product_pd(uint64_t, MaxRAM, "Real memory size (in bytes) used to set maximum heap size")
range(0, 0XFFFFFFFFFFFFFFFF)
OpenJDK 11:
https://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/hotspot/share/runtime/globals.hpp
define_pd_global(uint64_t,MaxRAM, 1ULL*G);
You can check your own VM's default MaxRAM setting:
OpenJDK 11
$ java -XX:+PrintFlagsFinal -version | grep MaxRAM
uint64_t MaxRAM = 137438953472
OpenJDK 10:
$ java -XX:+PrintFlagsFinal -version | grep MaxRAM
uint64_t MaxRAM = 137438953472
137438953472 bytes is 128 gigabytes
I can set MaxRAM=2GiB in JAVA_OPTS
export JAVA_OPTS="-XX:MaxRAM=2147483648"
and Bisq's RES (SEE htop) stays below ~1.2 GB, instead of growing to ~4.8 GB. (But I haven't let it run for days...)
After libc malloc()s expand the extra virtual memory via mmap() syscalls, the app causes MMU violations when it tries to use the allocation memory range(s) and finds no mapping for an address or addresses.
I saw MMU errors in GC logs last night, when using -XX:+ExtendedDTraceProbes + GC logging, but only today realized there may be a connection to my VM's detaults MaxRAM=128GB setting. I had assumed it was due to the extra overhead from using -XX:+DebugNonSafepoints -XX:+PreserveFramePointer, but especially due to -XX:+ExtendedDTraceProbes.
I believe these MMU violations in turn result in page faults, and causes consumption of extra pages of physical memory -- the RES, or resident set size you can see in htop.
Using $ pmap -x $(pgrep -f BisqAppMain) | more ...
I find a memory mapped file with an RSS ~3.8 GB
0000000707200000 3994624 3984976 3984976 rw--- [ anon ]
I don't know what this anonymous mapped file is, but setting the VM's MaxRAM to 2GB in JAVA_OPTS reduces this mmapped file's size to ~0.5 GB.
Corresponding to this huge 3.8GB memory mapped file size is htop's RES value of ~ 4.8 GB, and VIRT ~10.4 GB (when running Bisq with no MaxRAM option).
Here are measurements after starting Bisq with an empty data.dir, using MaxRAM 2 GB, and letting it run for 30 minutes (not touching the GUI).
export JAVA_OPTS="-XX:MaxRAM=2147483648"
(1.5 GB is not enough MaxRAM and the OOM Killer crashes Bisq)
RES 1118 MB, VIRT 6986 MB (SEE $ htop)
The java process' largest memory mapped file is ~0.5 GB
(Use $ pmap -x $(pgrep -f BisqAppMain) | more)
00000000e0000000 524288 524288 524288 rw--- [ anon ]
About 7-8 hours after starting Bisq with a 3 GB MaxRAM setting, I looked at
$htop -> RES: ~2 GB, VIRT: ~4 GB
Major & Minor page fault counts:
$ ps -o pid,min_flt,maj_flt,cmd ->
PID MINFL MAJFL CMD
14615 777788 235 openjdk/jdk-11/bin/java -XX:MaxRAM=3221225472 -classpath ...
I unset JAVA_OPTS, restarted Bisq (with an empty data.dir), and checked minor/major page fault counts after uptime ~2.75 hrs
$htop -> RES: ~4.7 GB, VIRT: ~10.4 GB
Major & Minor page fault counts:
$ ps -o pid,min_flt,maj_flt,cmd ->
PID MINFL MAJFL CMD
26736 1312906 14 openjdk/jdk-11/bin/java -classpath ...
The bisq instance running with the default 128 GB MaxRAM setting is almost doubling the number of minor page faults in less than half the time as the instance running with a 3 GB MaxRAM setting.
Attached below are two page fault flame graphs showing very large differences in the number of faults while tracing page_fault_user events for 120s, begun 30s after bisq was started with an empty data.dir.
bisq-page-faults-2g-maxram.svg shows 110,393 pages
bisq-page-faults-128g-maxram.svg shows 954,347 pages
I had to zip them up to upload them to here because you can't upload svg files to comments, and the flamegraph tool does not produce png files.
page-fault-flame-graphs.zip
page-fault-flame-graphs.zip.md5sum.txt
Open them in a browser, and hover your mouse over and drill down to levels representing "do_page_fault" event counts for libjvm.so and java method code paths. To return to the original view, click "Reset Zoom" at the top left of page.
There is a glibc setting on Linux that can be used to define a hard limit of the number of arenas -- thread safe memory pools used by malloc() -- that can be created. The trade off for using this is more thread contention among Bisq's allocating threads versus less memory usage.
export MALLOC_ARENA_MAX=4
Using this on my Ubuntu 18 laptop reduces the RES memory by ~300 mb, and VIRT by ~3.7 GB.
In addition, configuring the jvm's starting and max heap size reduces both RES and VIRT by another ~200 mb.
export JAVA_OPTS="-Xms820m -Xmx820m"
Used together
export MALLOC_ARENA_MAX=4
export JAVA_OPTS="-Xms820m -Xmx820m"
RES memory is reduced by ~500 mb, and VIRT memory by ~3.9 GB.
I will test these on VMs next (Arch Linux, Debian, Fedora).
These settings, any of which disable dynamic adjustment of glibc's mmap threshold, save another 100 mb in RES and VIRT, but I haven't experimented enough to suggest anyone else try them yet.
export MALLOC_MMAP_THRESHOLD_=131072
export MALLOC_TRIM_THRESHOLD_=131072
export MALLOC_TOP_PAD_=131072
export MALLOC_MMAP_MAX_=65536
By the way, I am looking for leaks too...
On my Ubuntu 18 hosted VMs Arch Linux kernel 5.5.8, Fedora 31, and Debian 10.3, the settings
export MALLOC_ARENA_MAX=4
export JAVA_OPTS="-Xms820m -Xmx820m"
reduced resident memory use by ~400 MB, and virtual memory by ~1.9 GB.
I assume the overall reduced memory footprint on the VMs is because they don't have access to the GPU.
Tracing linux glibc memory_arena_new calls via
sudo bpftrace -e 'usdt:/proc/<PID>/root/lib/x86_64-linux-gnu/libc-2.27.so:libc:memory_arena_new { printf("PID %d: created new arena at %p, size %d\n", pid, arg0, arg1); }'
is showing Bisq malloc() calls creating 20 new arenas between startup (with emtpy data.dir) and the time the BlockParser is done.
When I started Bisq after setting
export MALLOC_ARENA_MAX=4
export MALLOC_MMAP_THRESHOLD_=131072
the same trace shows Bisq is does not create any new arenas during the same interval; it is using existing arenas.
so all in all, it seems that our memory issues on linux are mainly caused by the JVM, its implementations and (default) configuration?
what do you suggest as an action plan? get the malloc stuff into our code? do more tests?
To sum up, I agree the biggest problem on Linux is off-heap memory growth. Your adding the MaxRAM jvm option reduced resident memory > 50%
The next experiment should be adding
export MALLOC_ARENA_MAX=4
export MALLOC_MMAP_THRESHOLD_=131072
to the bisq-desktop and bisq-daemon startup scripts; _this requires no changes to the binaries_.
If testers don't see any performance drag from this after using this setting for X days, they might try to save another ~100 or so MB RAM by setting the jvm's start & max heap size:
-Xms820m -Xmx820m
820 MB should be sufficient, and I think Linux bisq users would be happy to see a ~ 1.0 GB resident memory size after starting bisq with an empty data dir. (When we started investigating, bisq's resident memory size was growing to ~ 4.8 GB. So problem not solved, but this is a significant improvement.)
Regarding the Java Heap: I certain JavaFX 11 has leaks, and their dev team has been fixing them in a version we cannot use yet. Can't do much about that yet. There are probably other on-heap leaks, but I have not been able to identify with certainty.
Thanks a lot @ghubstan for your investigations!!!!
@ripcurlx Should we add those options to the build scripts?
Most helpful comment
What I think is happening on my Linux machine is the java vm thinks it has 128 GB of RAM to work with.
My OpenJDK 10 and 11 VMs have a very large default -XX:MaxRAM setting of 128 GB. It looks like "because" of this -- I need to be more careful with that word -- there are too many minor page faults, and libc malloc()s are creating a lot of extra memory segments via mmap() syscalls, but not using heap memory for alloctions via brk() calls. During tracing sessions, I saw mmap() calls, but not brk() calls.
The MaxRAM setting is described in the OpenJDK source.
You can check your own VM's default MaxRAM setting:
I can set MaxRAM=2GiB in JAVA_OPTS
and Bisq's RES (SEE htop) stays below ~1.2 GB, instead of growing to ~4.8 GB. (But I haven't let it run for days...)
After libc malloc()s expand the extra virtual memory via mmap() syscalls, the app causes MMU violations when it tries to use the allocation memory range(s) and finds no mapping for an address or addresses.
I saw MMU errors in GC logs last night, when using -XX:+ExtendedDTraceProbes + GC logging, but only today realized there may be a connection to my VM's detaults MaxRAM=128GB setting. I had assumed it was due to the extra overhead from using -XX:+DebugNonSafepoints -XX:+PreserveFramePointer, but especially due to -XX:+ExtendedDTraceProbes.
I believe these MMU violations in turn result in page faults, and causes consumption of extra pages of physical memory -- the RES, or resident set size you can see in htop.
Using $ pmap -x $(pgrep -f BisqAppMain) | more ...
I find a memory mapped file with an RSS ~3.8 GB
I don't know what this anonymous mapped file is, but setting the VM's MaxRAM to 2GB in JAVA_OPTS reduces this mmapped file's size to ~0.5 GB.
Corresponding to this huge 3.8GB memory mapped file size is htop's RES value of ~ 4.8 GB, and VIRT ~10.4 GB (when running Bisq with no MaxRAM option).
Here are measurements after starting Bisq with an empty data.dir, using MaxRAM 2 GB, and letting it run for 30 minutes (not touching the GUI).
About 7-8 hours after starting Bisq with a 3 GB MaxRAM setting, I looked at
I unset JAVA_OPTS, restarted Bisq (with an empty data.dir), and checked minor/major page fault counts after uptime ~2.75 hrs
The bisq instance running with the default 128 GB MaxRAM setting is almost doubling the number of minor page faults in less than half the time as the instance running with a 3 GB MaxRAM setting.
Attached below are two page fault flame graphs showing very large differences in the number of faults while tracing page_fault_user events for 120s, begun 30s after bisq was started with an empty data.dir.
I had to zip them up to upload them to here because you can't upload svg files to comments, and the flamegraph tool does not produce png files.
page-fault-flame-graphs.zip
page-fault-flame-graphs.zip.md5sum.txt
Open them in a browser, and hover your mouse over and drill down to levels representing "do_page_fault" event counts for libjvm.so and java method code paths. To return to the original view, click "Reset Zoom" at the top left of page.