Julia: Memory leak with function redefinition

Created on 8 Jan 2019  Β·  16 comments  Β·  Source: JuliaLang/julia

This code takes less than a minute to fill the 64GB of my Linux machine:

julia> versioninfo()
Julia Version 1.0.3
Commit 099e826241 (2018-12-18 01:34 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, haswell)

julia> for i in 1:100
           @eval function foo()
               [fill(1.0, 2_000_000) for _ in 1:100]
               nothing
           end
           @time @eval foo()
           # GC.gc()   # uncomment this line to fix the issue
       end

EDIT: simplified code

Label: GC

All 16 comments

Looks like julia does know about this memory because the full GC is able to reclaim it.

It seems that it's the function redefinition which somehow causes a... lost GC root? I'm out of my depth here, but if I take out the @evals, then there's no leak. I found this by repeatedly running similar code in an IJulia cell.

> Looks like julia does know about this memory because the full GC is able to reclaim it.

FWIW, if I run it for 30 iterations and then call GC.gc(), it seems to reclaim only the last foo()'s garbage.
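
For concreteness, that experiment looks roughly like this (a sketch reconstructing what the comment describes, reusing the original reproducer):

for i in 1:30
    @eval function foo()
        [fill(1.0, 2_000_000) for _ in 1:100]   # ~1.6 GB of temporaries per call
        nothing
    end
    @eval foo()
end
GC.gc()   # observed to reclaim only the garbage from the last foo()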

Here's code showing that a function hangs on to its temporaries' memory even after it has returned. I define 100 identical functions that each allocate a large temporary vector of vectors:

julia> memory_usage() = parse(Int, split(read(`ps -p $(getpid()) -o rss`, String))[2]) / 1024 # in MB
memory_usage (generic function with 1 method)

julia> funs = []
       for i in 1:100    
           f = Symbol(:foo, i)
           @eval function $f()
               [fill(1.0, 2_000_000) for _ in 1:100]
               nothing
           end
           @eval push!(funs, $f)
       end

Calling the first function 20 times does not increase memory usage (in MB):

julia> mem_used = [(funs[1](); memory_usage()) for j in 1:20];

julia> using UnicodePlots; lineplot(mem_used)
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” 
   4000 β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
        β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
        β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
        β”‚β €β €β €β €β €β’€β‘€β €β €β €β €β’€β‘ β €β €β €β£€β£€β‘ β €β‘„β €β €β’€β €β €β‘„β €β €β’€β €β €β‘„β €β €β’€β €β €β‘„β €β”‚ 
        β”‚β €β €β €β €β €β’Έβ‘‡β €β €β’ β ”β β €β €β €β €β €β €β €β €β ˆβ ’β ”β β €β €β ˆβ ’β ”β β €β €β ˆβ ’β ”β β €β €β ˆβ ’β”‚ 
        β”‚β €β €β €β €β €β‘Έβ’£β €β €β’Έβ €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
        │⠀⠀⠀⠀⠀⑇Ⓒ⠀⠀⑇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
        β”‚β €β €β €β €β’ β ƒβ ˆβ‘†β’ β ƒβ €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
        │⠀⠀⠀⠀Ⓒ⠀⠀⑇Ⓒ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
        β”‚β €β €β €β €β‘œβ €β €β’Έβ‘‡β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
        β”‚β €β €β €β €β‘‡β €β €β ˜β ƒβ €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
        β”‚β €β €β ’β ’β ƒβ €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
        β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
        β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
   1000 β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ 
        0                                       20

But calling a different function on each iteration (the jth function on the jth call) does:

julia> mem_used = [(funs[j](); memory_usage()) for j in 1:20];

julia> using UnicodePlots; lineplot(mem_used)
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” 
   20000 β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β£ β”‚ 
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β£€β£€β €β ”β ’β Šβ €β”‚ 
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β’€β ”β ‰β ‰β ‰β €β €β €β €β €β €β €β”‚ 
         │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀①⠀⠒⠉⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β£€β‘ β Žβ €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β‘ β ”β ‰β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β£€β‘ β Žβ €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
         β”‚β €β €β €β €β €β €β €β €β €β €β €β£€β ”β Šβ €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
         β”‚β €β €β €β €β €β‘ β €β ’β ’β ‰β ‰β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
         β”‚β €β €β €β’€β Žβ €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
         │⠀⠀⠔⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
       0 β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚ 
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ 
         0                                       20

This is just me thinking out loud here, but could it literally just be code size? You _are_ generating new functions every time, and those functions take a certain amount of memory, so this could just be due to the fact that you're generating new code in a loop.

That code creates 100 simple functions, calls 20 of them, then memory usage goes to 20 GB. That would be excessive for compiled code, no?

......ah. I did not realize those were the units on your graphs. ;) Yes, that does seem excessive.

I spent a couple of hours looking at this, and it turns out this isn't actually a memory leak on the Julia side. The arrays get allocated using malloc and we appropriately free them on the next GC. However, for some reason glibc refuses to release the memory to the operating system. This can be worked around manually by ccalling malloc_trim, which cuts the memory use right back down to acceptable levels. We may want to consider doing that automatically on full GC, though looking into exactly why glibc doesn't release any of the memory here might be interesting as well.

malloc_trim freed 450 MB out of 1020 MB in our application, after GC!

What I don't get is that these are large allocations, which should be done with mmap, according to the man page:

> M_MMAP_THRESHOLD
> For allocations greater than or equal to the limit specified (in bytes) by M_MMAP_THRESHOLD that can't be satisfied from the free list, the memory-allocation functions employ mmap(2) instead of increasing the program break using sbrk(2). ...
> Balancing these factors leads to a default setting of 128*1024 for the M_MMAP_THRESHOLD parameter.

And thus malloc_trim should be irrelevant. Is Julia overriding this parameter?

For anyone else like me who had no clue what brk is, this page was instructive. This thread suggests that the Linux man page for malloc_trim is inaccurate (it now scans and frees the whole heap, not just the top).

@Keno could you share the line for ccall of malloc_trim? I believe that will solve https://github.com/JuliaImages/Images.jl/issues/670 as well

ccall(:malloc_trim, Cint, (Csize_t,), 0)

Is this something we can call when users do gc()? Or perhaps periodically?

Is there any chance that the proposed fix could be tried? The work-around solves our immediate issue, but having to pepper the code with malloc_trim() is less than ideal.

Pinging @JeffBezanson and @vtjnash here.
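
In the meantime, a small wrapper keeps the workaround in one place. A minimal sketch, assuming Linux/glibc; the name gc_and_trim is ours, not a Julia API:

# Run a full GC, then ask glibc to return freed heap pages to the OS.
# malloc_trim is glibc-only; its C prototype is int malloc_trim(size_t pad).
function gc_and_trim()
    GC.gc()
    Sys.islinux() && ccall(:malloc_trim, Cint, (Csize_t,), 0)
    nothing
end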

Hi, I found this via Hacker News.

Over the last two years I've gained some experience with this topic from debugging the memory usage of C modules in my Haskell applications (and found a bug in realloc() along the way).

> What I don't get is that these are large allocations, which should be done with mmap, according to the man page:

@cstjean You skipped quoting the relevant part of the man page:

> Note: Nowadays, glibc uses a dynamic mmap threshold by default. The initial value of the threshold is 128*1024, but when blocks larger than the current threshold and less than or equal to DEFAULT_MMAP_THRESHOLD_MAX are freed, the threshold is adjusted upward to the size of the freed block. When dynamic mmap thresholding is in effect, the threshold for trimming the heap is also dynamically adjusted to be twice the dynamic mmap threshold. Dynamic adjustment of the mmap threshold is disabled if any of the M_TRIM_THRESHOLD, M_TOP_PAD, M_MMAP_THRESHOLD, or M_MMAP_MAX parameters is set.

Where DEFAULT_MMAP_THRESHOLD_MAX defaults to 32 MiB on 64-bit systems:

> The lower limit for this parameter is 0. The upper limit is DEFAULT_MMAP_THRESHOLD_MAX: 512*1024 on 32-bit systems or 4*1024*1024*sizeof(long) on 64-bit systems.

That means you can easily get into a situation where allocations < 32 MiB are not served with mmap.

See also the glibc docs on the topic.
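
To make the mechanism concrete, here is a toy model of the rules quoted above (illustrative only; real glibc tracks this state internally):

# Toy model of glibc's dynamic mmap threshold, following the man page above.
const DEFAULT_MMAP_THRESHOLD_MAX = 4 * 1024 * 1024 * sizeof(Clong)  # 32 MiB on 64-bit

mutable struct ToyAllocator
    mmap_threshold::Int
end
ToyAllocator() = ToyAllocator(128 * 1024)   # initial threshold: 128 KiB

# Freeing a block larger than the current threshold (and at most the max)
# raises the threshold to that block's size.
function toy_free!(a::ToyAllocator, size::Integer)
    if a.mmap_threshold < size <= DEFAULT_MMAP_THRESHOLD_MAX
        a.mmap_threshold = size
    end
end

a = ToyAllocator()
toy_free!(a, 2_000_000 * 8)    # free one 16 MB vector from the reproducer
a.mmap_threshold               # now 16_000_000: later 16 MB mallocs use sbrk, not mmap

So once the first 16 MB vector is freed, subsequent allocations of that size land on the sbrk heap, whose freed pages are only returned to the OS by trimming.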


@JeffBezanson From https://github.com/JuliaLang/julia/pull/32428#issue-292130996

> what seems to be a glibc bug

That statement seems unfounded; is there anything that hints at this being a bug?

It seems that Julia is simply discovering the effects of memory fragmentation and the corresponding glibc malloc tunables.

On my deployed application I correspondingly use:

# malloc()s larger than this many bytes are served by their own mmap() that
# can be free()d individually.
# This overrides the glibc default (a dynamic threshold growing up to 32 MiB)
# with a fixed value; leaving it unset keeps glibc's default.
# We found this value works best for our use case.
M_MMAP_THRESHOLD=65536

which, as stated, reduces memory fragmentation by 8x for my use case (many GB).
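
From Julia, the same override could in principle be applied at startup via mallopt. A sketch, assuming glibc; -3 is the value of M_MMAP_THRESHOLD in glibc's <malloc.h>, and mallopt returns 1 on success:

# Pin glibc's mmap threshold to 64 KiB, which also disables the dynamic
# adjustment described above. Only affects allocations made after this call,
# so it should run as early as possible in the session.
const M_MMAP_THRESHOLD_PARAM = -3   # from glibc <malloc.h>; glibc-specific
ret = ccall(:mallopt, Cint, (Cint, Cint), M_MMAP_THRESHOLD_PARAM, 64 * 1024)
ret == 1 || @warn "mallopt(M_MMAP_THRESHOLD) failed"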

That may be, but frankly an allocator hanging on to gigabytes of memory that it isn't using is pretty egregious no matter what the allocation pattern is.
