Julia: Large memory leak when using threads

Created on 4 May 2019 · 27 comments · Source: JuliaLang/julia

With the following minimal example, I am seeing memory usage reach 10GB+ within seconds if I start Julia with JULIA_NUM_THREADS=4 (no other options).

If I use only a single thread, memory usage remains around 200MB.

I've tested on both the 1.1 release and yesterday's nightly on a Linux Skylake Xeon.

Threads.@threads for i in 1:100000
    sum(collect(1:10^6))
end
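One thing worth confirming when reproducing (an assumption about the setup, since only the environment variable is mentioned): that the thread count actually took effect.

```julia
# JULIA_NUM_THREADS is read at startup; verify it took effect.
Threads.nthreads()  # should return 4 for the four-thread runs described above
```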

Most helpful comment

The segfault is normal. You should ignore it.

All 27 comments

Did you check whether this is actually a leak by calling the GC? A memory leak would mean the memory doesn't get freed, not that the GC chooses to keep it around a bit longer for performance reasons. Since the GC is not multithreaded here, I would assume it tries to avoid running for as long as possible when threaded; that doesn't mean your memory will fill up and Julia will crash. Instead it will wait until it either has to collect, or can collect after the multithreaded section, which is the behavior you would want for performance.
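A minimal version of that check, as a sketch (assuming the unexported `Base.gc_live_bytes()` is available on this Julia version):

```julia
# Run the workload, force a full collection, then see how much is still live.
Threads.@threads for i in 1:10^4
    sum(collect(1:10^6))
end
GC.gc()  # force a full collection
println(Base.gc_live_bytes() / 2^20, " MiB live on the Julia heap")
```

If the live-bytes figure returns to baseline after `GC.gc()` while resident memory stays high, the memory is held by the allocator rather than leaked by Julia.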

I tried:

julia> versioninfo()
Julia Version 1.3.0-DEV.163
Commit c40d9b099c (2019-05-04 03:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen Threadripper 1950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.0 (ORCJIT, znver1)
Environment:
  JULIA_NUM_THREADS = 16

julia> function foo(N)
           Threads.@threads for i in 1:N
               sum(collect(1:10^6))
           end
       end
foo (generic function with 1 method)

julia> @time foo(10^5)
 46.325609 seconds (337.58 k allocations: 736.214 GiB, 15.02% gc time)

julia> @time foo(10^5)
 42.238979 seconds (199.09 k allocations: 738.308 GiB, 9.36% gc time)

julia> GC.gc()

julia> @time foo(10^5)
 42.090349 seconds (199.11 k allocations: 738.472 GiB, 8.85% gc time)

So there seems to be a lot of GC activity. After the first run it used about 25 GB of RAM; after the last run, it was using over 32 GB.

The issue is similar to the one in this SO question asked a while back. You do not need `sum` or anything other than allocations to trigger the leak.

function f(n)
    Threads.@threads for i = 1:n
        zeros(n)
    end
end

Try with large values like f(10^5) to see the issue. Forcing GC helps _maybe_ a little.

For me, adding a call to GC.gc() at the end of the @threads loop body seems to keep memory usage in check, although things slow to a crawl. This is on commit 4dc15938bbb1b5f9fda9def3a85e80e3357a8193.
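For concreteness, a sketch of that workaround (expect the slowdown mentioned above):

```julia
function f_gc(n)
    Threads.@threads for i = 1:n
        zeros(n)
        GC.gc()  # force a collection each iteration: keeps RSS low, but very slow
    end
end
```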

~~Maybe this is nothing, but when I run julia-debug under GDB, removing the GC.gc() call at the end causes sporadic segfaults on this line:~~ https://github.com/JuliaLang/julia/blob/9b5fea51ca639f62bc65e5d92aced79bdb28b651/base/locks-mt.jl#L44

~~I'm not on master, though, so I'll start a new build to see if this behaviour still occurs.~~

This is normal behavior (and also occurs on master).

The segfault is normal. You should ignore it.

I'm going to take a wild (uneducated) guess and say that somehow the arrays being allocated sometimes don't get rooted in the GC and so effectively become lost (which would explain why an explicit GC.gc() after the @threads loop finishes fails to collect them). To confirm this, I'm working on getting Valgrind to cooperate with me on this test case, but I'm having some slight technical difficulties...

Assuming the above theory is plausible, is there any documentation on how I could go about recording GC statistics over time, e.g. to confirm that every allocation is later deallocated?
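One way to record such statistics, sketched with the unexported counters that `@time` itself uses (`Base.gc_num` and `Base.GC_Diff`; being internals, treat them as an assumption of this snippet):

```julia
before = Base.gc_num()
Threads.@threads for i in 1:10^4
    zeros(10^4)
end
diff = Base.GC_Diff(Base.gc_num(), before)  # handles counter resets across collections
println("bytes allocated: ", diff.allocd)
println("GC time (ns):    ", diff.total_time)
println("GC pauses:       ", diff.pause)
```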

don't get rooted in the GC

No. Rooting does exactly the opposite: a rooted object is kept alive, while an unrooted one becomes eligible for collection, so missing roots would cause premature freeing rather than a leak.

Regarding whether the memory actually leaks: you need to test whether Julia's memory use grows without bound or reaches a steady state. With the default config, Julia/Linux/glibc is very bad at returning memory to the OS, probably due to heap fragmentation. However, the memory is still there and not leaked (glibc knows the memory is free and will hand it out on the next malloc).
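A crude steady-state test along those lines (a sketch; `Sys.maxrss()` reports peak resident size, so a plateau across repeats suggests reuse, while unbounded growth suggests a real problem):

```julia
function rss_trend(reps)
    for r in 1:reps
        Threads.@threads for i in 1:10^3
            zeros(10^5)
        end
        GC.gc()
        println("run $r: maxrss = ", round(Sys.maxrss() / 2^30, digits = 2), " GiB")
    end
end

rss_trend(10)
```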

Has anyone tried reproducing on Windows or macOS?

Instead it'll wait quite a bit until it either has to GC or it can GC after the multithreading

This is incorrect; the GC does not wait for the end of @threads to run. It can run during the loop and is fully multi-threaded. The only capability we're missing is running Julia code concurrently with the GC, which is a very tall order. We can run non-Julia (e.g. C) code concurrently with the GC, and any Julia code should at least handle threaded GC correctly.

I tried running https://github.com/JuliaLang/julia/issues/31923#issuecomment-489394236 on the 1.1 release binary and master and with 4 threads I see memory use holding steady at no more than about 3GB. On the second and third @time runs it goes up to 4-5GB but seems to eventually reach a steady state of about 4.3GB. Strange.

Is there any progress on this? I'm running into similar issues with a simulation of mine. @code_warntype doesn't show any type instabilities, but memory consumption fluctuates a lot after a few iterations, even on the current 1.1 release.

I didn't manage to reduce the code, but I think it is related to this issue. When I run the benchmark for YaoArrayRegister defined here (with PkgBenchmark.jl):

https://github.com/QuantumBFS/YaoArrayRegister.jl/blob/master/benchmark/benchmarks.jl

with the following configuration

BenchmarkConfig(
    id = "origin/multithreading",
    env = Dict("JULIA_NUM_THREADS" => 4),
    juliacmd = `julia -O3`,
)

I hit the memory leak (it was fine on macOS), but if I run this with a single thread ("JULIA_NUM_THREADS" => 1) it works fine on Linux. My versioninfo is

julia> versioninfo()
Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

I tried this on both Julia 1.1.1 and current master (e39f498aca58abb8aecac34d329d7de9a0cead02), and hit the issue on both.

You should be able to reproduce this by running the following:

] dev YaoBase YaoArrayRegister BitBasis
] add StaticArrays LuxurySparse PkgBenchmark BenchmarkTools

and

```julia
using PkgBenchmark

benchmarkpkg("YaoArrayRegister", BenchmarkConfig(
    id = "origin/multithreading",
    env = Dict("JULIA_NUM_THREADS" => 4),
    juliacmd = `julia -O3`,
))
```

I'll try to find a reduced version and post it here later.

I tried https://github.com/JuliaLang/julia/issues/31923#issuecomment-489394236 as well, and memory kept increasing on this machine too.

For those on Linux experiencing this, try `ccall`ing `malloc_trim` in the loop and see if it helps.

Doesn't seem to have an effect for me. Julia was using 26 GB of RAM after running:

julia> Threads.@threads for _ in 1:10^4
           collect(1:10^6)
           ccall(:malloc_trim, Cvoid, (Cint,), 0)
       end

How much RAM does your system have?

My system has 64GB of RAM.

Every time I run the loop (without restarting Julia), it will end up with a different (seemingly random) amount of RAM used, so it does appear to be freeing memory sometimes. With this example at least, Julia's resident memory never exceeds ~32GB, so my system only starts swapping if other processes are using at least ~32GB.

For me this seems to keep the memory leak in check (with some random RAM fluctuation still present). I notice, however, that robsmith11's example seems to run really slowly:

@btime Threads.@threads for _ in 1:10^4; collect(1:10^6); ccall(:malloc_trim, Cvoid, (Cint,), 0); end

yields

72.530 s (19982 allocations: 74.37 GiB)

whereas

@btime @sync @distributed for _ in 1:10^4; collect(1:10^6); end

yields

5.849 s (640 allocations: 25.53 KiB)

The tests were run with JULIA_NUM_THREADS=4 and nworkers() == 4 on a 4-core Linux machine.
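For completeness, the setup those timings assume, which the comment doesn't show (a sketch):

```julia
using BenchmarkTools  # provides @btime
using Distributed

addprocs(4)  # so that nworkers() == 4 for the @distributed run
```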

Good news! Please try #32217.

I've just tried your branch, but unfortunately I don't see a big difference. After a few seconds of running my example, I see spikes up to 46 GB of memory used.

EDIT: Sorry, I just realized I cloned the wrong branch. Let me retest.

EDIT 2: Awesome! Now that I'm actually using your branch, memory stays under 512 MB. Looks like that fixed it. :)

Also works for me, both the above example as well as in my actual code. Many thanks!

Great, that confirms this is a duplicate of #27173.

@JeffBezanson, did you mean to link to https://github.com/JuliaLang/julia/issues/27173? That issue is still open and is about a data race, not a memory leak.

It's not actually a memory leak; it's growing memory use due to incorrect updating of the GC counters. (The collector uses those allocation counters to decide when a collection is due, so under-counting makes it run far less often than it should.)

Perhaps I'm a bit slow on the uptake today, but how is this fixed but #27173 isn't?

It's not fixed yet; I closed this as a duplicate of it.
