This code takes less than a minute to fill the 64 GB of RAM on my Linux machine:
julia> versioninfo()
Julia Version 1.0.3
Commit 099e826241 (2018-12-18 01:34 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, haswell)
julia> for i in 1:100
    @eval function foo()
        [fill(1.0, 2_000_000) for _ in 1:100]
        nothing
    end
    @time @eval foo()
    # GC.gc() # uncomment this line to fix the issue
end
EDIT: simplified code
Looks like julia does know about this memory because the full GC is able to reclaim it.
It seems that it's the function redefinition which somehow causes a... lost GC root? I'm out of my depth here, but if I take out the @evals, then there's no leak. I found this by repeatedly running similar code in an IJulia cell.
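For reference, here is a minimal sketch (my reconstruction, not the exact code from the thread) of the variant without @eval, which does not exhibit the unbounded growth:

# Defining foo once, rather than redefining it with @eval on each
# iteration, avoids the runaway memory growth described above.
function foo()
    [fill(1.0, 2_000_000) for _ in 1:100]
    nothing
end

for i in 1:100
    @time foo()
end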
Looks like julia does know about this memory because the full GC is able to reclaim it.
FWIW, if I run it for 30 iterations, then gc(), it seems to only reclaim the last foo()'s garbage.
Here's code that shows that the function hangs on to its temporary's memory even once it has returned. I define 100 identical functions, each of which allocates a large temporary vector of vectors:
julia> memory_usage() = parse(Int, split(read(`ps -p $(getpid()) -o rss`, String))[2]) / 1024 # in MB
memory_usage (generic function with 1 method)
julia> funs = []
for i in 1:100
    f = Symbol(:foo, i)
    @eval function $f()
        [fill(1.0, 2_000_000) for _ in 1:100]
        nothing
    end
    @eval push!(funs, $f)
end
Calling the first function 20 times does not increase memory usage (in MB):
julia> mem_used = [(funs[1](); memory_usage()) for j in 1:20];
julia> using UnicodePlots; lineplot(mem_used)
[UnicodePlots line plot: y-axis 1000–4000 (MB), x-axis 0–20 calls; memory usage fluctuates but does not trend upward.]
But calling the jth function does:
julia> mem_used = [(funs[j](); memory_usage()) for j in 1:20];
julia> using UnicodePlots; lineplot(mem_used)
[UnicodePlots line plot: y-axis 0–20000 (MB), x-axis 0–20 calls; memory usage climbs steadily to roughly 20 GB.]
This is just me thinking out loud here, but could it literally just be code size? You _are_ generating new functions every time, and those functions take a certain amount of memory, so this could just be due to the fact that you're generating new code in a loop.
That code creates 100 simple functions, calls 20 of them, then memory usage goes to 20 GB. That would be excessive for compiled code, no?
......ah. I did not realize those were the units on your graphs. ;) Yes, that does seem excessive.
I spent a couple hours looking at this and it turns out this isn't actually a memory leak on the Julia side. The arrays get allocated using malloc and we appropriately free them on the next GC. However, for some reason glibc refuses to release the memory to the operating system. This can be worked around manually by ccalling malloc_trim, which cuts the memory use right back down to acceptable levels. We may want to consider doing that automatically on full GC, though looking into exactly why glibc doesn't release any of the memory here might be interesting as well.
malloc_trim freed 450 MB out of 1020 MB in our application, after GC!
What I don't get is that these are large allocations, which should be done with mmap, according to the man page:
M_MMAP_THRESHOLD
For allocations greater than or equal to the limit specified
(in bytes) by M_MMAP_THRESHOLD that can't be satisfied from
the free list, the memory-allocation functions employ mmap(2)
instead of increasing the program break using sbrk(2).
...
Balancing these factors leads to a default setting
of 128*1024 for the M_MMAP_THRESHOLD parameter.
And thus malloc_trim should be irrelevant. Is Julia overriding this parameter?
For anyone else like me who had no clue what brk is, this page was instructive. This thread suggests that the Linux man page for malloc_trim is inaccurate (it now scans and frees the whole heap, not just the top).
@Keno could you share the line for ccall of malloc_trim? I believe that will solve https://github.com/JuliaImages/Images.jl/issues/670 as well
ccall(:malloc_trim, Cvoid, (Cint,), 0)
Is this something we can call when users do gc()? Or perhaps periodically?
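As a hedged sketch (not an official API, just one way to wire this up today), the one-liner above can be wrapped in a helper that trims after a full collection, or fired on a schedule from a Timer:

# Hypothetical helper, not part of Base: run a full GC, then ask glibc to
# return freed heap pages to the OS. Linux/glibc only; malloc_trim(0)
# returns 1 if any memory was released back to the system.
function gc_and_trim()
    GC.gc(true)                             # force a full collection
    ccall(:malloc_trim, Cint, (Cint,), 0)   # trim the entire heap
    nothing
end

# Or trim on a fixed schedule, e.g. once a minute:
trim_timer = Timer(60; interval = 60) do t
    ccall(:malloc_trim, Cint, (Cint,), 0)
end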
Is there any chance that the proposed fix could be tried? The work-around solves our immediate issue, but having to pepper the code with malloc_trim() is less than ideal.
Pinging @JeffBezanson and @vtjnash here.
Hi, I found this via Hacker News.
In the last 2 years I got some experience with this topic from debugging memory usage of C modules in my Haskell applications (I also found a bug in realloc() along the way).
What I don't get is that these are large allocations, which should be done with mmap, according to the man page:
@cstjean You skipped quoting the relevant part of the man page:
Note: Nowadays, glibc uses a dynamic mmap threshold by
default. The initial value of the threshold is 128*1024, but
when blocks larger than the current threshold and less than or
equal to DEFAULT_MMAP_THRESHOLD_MAX are freed, the threshold
is adjusted upward to the size of the freed block. When
dynamic mmap thresholding is in effect, the threshold for
trimming the heap is also dynamically adjusted to be twice the
dynamic mmap threshold. Dynamic adjustment of the mmap
threshold is disabled if any of the M_TRIM_THRESHOLD,
M_TOP_PAD, M_MMAP_THRESHOLD, or M_MMAP_MAX parameters is set.
Where DEFAULT_MMAP_THRESHOLD_MAX defaults to 32 MiB on 64-bit systems:
The lower limit for this parameter is 0. The upper limit is
DEFAULT_MMAP_THRESHOLD_MAX: 512*1024 on 32-bit systems or
4*1024*1024*sizeof(long) on 64-bit systems.
That means you can easily get into a situation where allocations < 32 MiB are not served with mmap.
See also the glibc docs on the topic.
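To make the mechanism concrete, here is a small illustrative sketch (my own, reusing nothing from the thread beyond the sizes discussed) of how the dynamic threshold gets raised, assuming glibc on Linux:

# The first 16 MiB block exceeds the initial 128 KiB threshold, so glibc
# serves it with its own mmap. Freeing a block below the 32 MiB
# DEFAULT_MMAP_THRESHOLD_MAX raises the dynamic threshold to that block's
# size, so later allocations *below* the new threshold come from the brk
# heap instead, and glibc may keep those pages after free: the growth seen
# in this issue.
buf = Vector{UInt8}(undef, 16 * 1024 * 1024)   # above 128 KiB: mmap'ed
buf = nothing
GC.gc()                                        # free() raises the threshold
buf = Vector{UInt8}(undef, 8 * 1024 * 1024)    # below new threshold: heap (sbrk)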
@JeffBezanson From https://github.com/JuliaLang/julia/pull/32428#issue-292130996:
what seems to be a glibc bug
That statement seems unfounded; is there anything that hints at this being a bug?
It seems that Julia is simply discovering the effects of memory fragmentation and the corresponding glibc malloc tunables.
On my deployed application I correspondingly use:
# malloc()s larger than this many bytes are served with their own mmap() that
# can be free()d individually.
# This overrides the glibc default (dynamic threshold growing up to 32MB)
# with a fixed value. Not giving a value keeps glibc's default.
# We found that the given value is best for our use case,
# reducing memory fragmentation by 8x, which is many GB for our use case.
M_MMAP_THRESHOLD=65536
which, as stated, reduces memory fragmentation by 8x for my use case.
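For a Julia process, here is a hedged sketch of two ways to pin this threshold (the 65536 value is just the one quoted above, not a universal recommendation; the glibc environment variable has a trailing underscore):

# Option 1: set the glibc environment variable before Julia starts,
# e.g. from a shell:
#   MALLOC_MMAP_THRESHOLD_=65536 julia script.jl

# Option 2: call mallopt from inside the process. M_MMAP_THRESHOLD is
# the constant -3 in glibc's malloc.h; mallopt returns 1 on success.
ccall(:mallopt, Cint, (Cint, Cint), -3, 65536)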
That may be, but frankly an allocator not using gigabytes of space that are available to it is pretty egregious no matter what the allocation pattern is.