First of all, I apologise for opening a new thread related to memory issues - I know there have been quite a few already, but this question (at least in my case) still remains unanswered. For the record, I have read through #955 and a bunch of issues related to that multiple times. I still need some help here, and I didn't know which of the existing issues to revive, so here's a new one.
I have created four test sharp operations (https://github.com/juona/sharp-memory-experiment/blob/master/generatePrintFile.js):
1) Composite - takes 3 images, performs resize, extract and extend on each of them, converts two of them into buffers using toBuffer() and performs the composite operation to produce the final result as a PNG file. _This is the original function that I want to work well in my application._ (A rough sketch of this variant is shown right after this list.)
2) Single - takes 1 image, performs resize, extract and extend and writes the result into a PNG file without using toBuffer().
3) Simple to file - takes a single image and writes it to a new file without any sharp operations.
4) Simple to buffer - takes a single image, calls toBuffer() on it and simply discards the result.
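To make the discussion concrete, here is a rough, hypothetical sketch of what the composite variant does (file names, sizes and offsets are placeholders, not the values from the linked repository):

const sharp = require("sharp");

async function runComposite() {
  // resize, extract and extend two of the images, keeping the results as buffers
  const [overlayA, overlayB] = await Promise.all(
    ["large1.png", "small.png"].map(file =>
      sharp(file)
        .resize(1200)
        .extract({ left: 0, top: 0, width: 500, height: 500 })
        .extend({ top: 10, bottom: 10, left: 10, right: 10, background: "#ffffff" })
        .toBuffer()
    )
  );

  // composite the buffers over the third image and write the final PNG
  await sharp("large2.png")
    .resize(2000)
    .composite([
      { input: overlayA, top: 0, left: 0 },
      { input: overlayB, top: 600, left: 600 }
    ])
    .png()
    .toFile("result.png");
}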
Now, each of these operations would be run many times in a row under four different conditions to see what influence caching and concurrency have on the memory consumption:
1) Sharp concurrency on (sharp.concurrency(0)) and sharp cache on (sharp.cache(true));
2) Sharp concurrency on (sharp.concurrency(0)) and sharp cache off (sharp.cache(false));
3) Sharp concurrency off (sharp.concurrency(1)) and sharp cache on (sharp.cache(true));
4) Sharp concurrency off (sharp.concurrency(1)) and sharp cache off (sharp.cache(false));
Also, the previously described sharp operations would be run in chunks of 10, e.g. I would simultaneously launch 10 "composite" operations, let the program wait until they _all_ finish, record the RSS and do that again.
This would be repeated until either the loop had run 100 times (giving me 100 readings of RSS) or the memory reached 2GB on my Manjaro machine / 1GB on my Mac.
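A minimal sketch of that measurement loop, assuming the hypothetical runComposite() helper from the sketch above (the actual script in the repository may differ):

const CHUNK_SIZE = 10;
const MAX_ITERATIONS = 100;
const RSS_LIMIT_MB = 2048; // 1024 on the Mac

async function measure() {
  for (let i = 1; i <= MAX_ITERATIONS; i++) {
    // launch a chunk of 10 operations and wait until they have all finished
    await Promise.all(Array.from({ length: CHUNK_SIZE }, () => runComposite()));

    // record RSS as reported by NodeJS itself
    const rssMb = process.memoryUsage().rss / 1048576;
    console.log(`iteration ${i}: RSS ${rssMb.toFixed(1)} MB`);
    if (rssMb > RSS_LIMIT_MB) break;
  }
}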
Finally, the images used were two rather large PNG files (3400x2000, 11MB) and one small one, all available in the repository (link below).
I ran these tests using three different memory allocators:
1) On a Manjaro system using the default Manjaro allocator. I am not very good at computers/linux, so I don't know which one it is. :]]
2) On the same Manjaro system using jemalloc, as suggested in other threads, namely #955 (see the example command after this list).
3) On a Mac (10.14.6) using its default allocator.
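For reference, the usual way to run node with jemalloc on Linux is to preload the library when starting the process; the exact library path varies by distribution, so treat this only as an example:

$ LD_PRELOAD=/usr/lib/libjemalloc.so node generatePrintFile.js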
Here are the graphs of how memory consumption changes with each iteration. The X axis represents the iteration number (so, e.g. iteration 15 means the 15th time that a chunk of 10 sharp operations was finished). The Y axis is RSS as printed by NodeJS itself in MB.



As you can see, the results using the default Manjaro allocator are so bad, that we can just ignore them.
On a Mac, the operations which do not involve buffers ("single" and "simple to file") seem to consume only a little memory, and the consumption is basically constant. However, the RSS changes during the "simple to buffer" test look quite odd (I don't know how to explain them), and during the "composite" operation (the one I originally wanted to optimise) the RSS is constantly increasing.
On the Manjaro system with jemalloc enabled, the behaviour is almost identical, except that the memory usage during "simple to buffer" is significantly higher and takes only 13 rounds to reach the 2GB mark.
If libvips is optimised for a specific allocator, then I'd expect the answer to question 2 to be a really easy one, right? And if it's not, then isn't that an indicator of some peculiarities inside libvips?

Here's a graph that better shows how the Mac deals with simpleToBuffer:

And here's another one showing that on Mac RSS does only ever increase during the composite operations:

Hello @juona,
I tried your test in pyvips, the libvips Python binding:
#!/usr/bin/env python3

import sys
import pyvips

for i in range(int(sys.argv[1])):
    print(f"loop {i} ...")

    # load the three source images in sequential mode
    a = pyvips.Image.new_from_file(sys.argv[2], access="sequential")
    b = pyvips.Image.new_from_file(sys.argv[3], access="sequential")
    c = pyvips.Image.new_from_file(sys.argv[4], access="sequential")

    # crop and shrink each one
    a = a.crop(10, 10, 1000, 1000).resize(0.9)
    b = b.crop(10, 10, 1000, 1000).resize(0.9)
    c = c.crop(10, 10, 1000, 1000).resize(0.9)

    # composite them over a black background and write the result
    background = pyvips.Image.black(2000, 2000)
    out = background.composite([c, b, a], "over",
                               x=[900, 450, 0], y=[0, 450, 900])

    out.write_to_file(sys.argv[5])
Then ran with:
./composite3.py 10000 ~/pics/k2.jpg ~/pics/k4.jpg ~/pics/wtc.jpg x.png
And watched RES in top:
loop | RES (MB)
--- | ---
10 | 105
30 | 128
50 | 131
100 | 142
200 | 155
300 | 145
400 | 139
500 | 141
Memory management in node, especially when there are external libraries involved, is complex, with many interacting systems, most of which know little about each other :(
In your buffer case, could it be that the node GC is simply not firing? Have you tried forcing a GC?
Start node with:
$ node --expose-gc file.js
Then add this to your loop:
global.gc();
No use in production, of course, but it might help explain the memory behaviour you are seeing.
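For example, a minimal sketch of how it could be dropped in at the end of each chunk of operations (global.gc is only defined when node was started with --expose-gc):

if (global.gc) {
  global.gc(); // force a full collection before sampling memory
}
console.log("RSS after GC (MB):", Math.round(process.memoryUsage().rss / 1048576));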
Thanks @juona for these detailed test cases.
jemalloc helps reduce fragmentation in long-running, multi-threaded processes, at the cost of slightly _higher_ peak memory usage (RSS), as it allocates from multiple arenas.
The logic in the composite test function appears to keep all intermediate images in scope, so they won't become eligible for garbage collection until after the whole test function is complete, which could explain the pattern you're seeing.
As John suggests, forcing V8 to run garbage collection can help better determine the real peak memory requirement for this task as it may reduce the number of new allocations. Remember it's still up to the OS to request that a process returns unused memory, so RSS can still include this freed memory when it is fragmented or smaller than the current page size.
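To illustrate the scoping point with a hypothetical sketch (not the code from the test repository): intermediates held in an outer array stay reachable until the whole function returns, whereas intermediates created in a narrower scope become collectable as soon as each iteration ends.

const sharp = require("sharp");

// every intermediate buffer stays reachable via `buffers` until
// pinned() returns, so none of them can be collected earlier
async function pinned(files) {
  const buffers = [];
  for (const file of files) {
    buffers.push(await sharp(file).resize(1200).toBuffer());
  }
  return buffers.length;
}

// here `buffer` becomes unreachable at the end of each iteration,
// so V8 is free to reclaim it before the next file is processed
async function scoped(files) {
  for (const file of files) {
    const buffer = await sharp(file).resize(1200).toBuffer();
    console.log(file, buffer.length);
  }
}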
I've checked our production infrastructure while we do some heavy processing with sharp (we resize pictures and generate tiles for them).
Here is an example of our Grafana charts; it seems that it causes one of our services to crash (after ~3k calls to sharp.tiles()). Strangely, I wasn't able to reproduce it locally.

Just reading into this as we are seeing similar activity.

This is an image processing / resizing service that we are currently benchmarking for production; the only code used is:
return sharp(binary)
  .rotate()
  .resize({ width: size, height: size })
  .png()
  .toBuffer()
  .then(thumb => {
    return ThumbnailsBucket
      .file(`${name}@${size}x${size}`, { metadata: { width: size, height: size } })
      .save(thumb)
  })
However, the invocation throughput was also increasing over time, so I cannot confirm or deny a memory leak.

@RaboliotTheGrey @robertpitt Please read and make sure you understand everything discussed in #955 - happy to answer questions after that.
Thank you very much, @lovell.
It fixes the RAM consumption with jemalloc.so.1 (~v3.6) on Debian Stretch, jemalloc.so.2 (~v5) on Debian Buster and musl on Alpine.
But there now seems to be intensive HDD usage even though the process isn't supposed to use the disk at all (I can't explain it, but it's not related to sharp).
Sorry for the late reply, I was busy trying to get the feature that I needed working on time.
To do that I rewrote the whole thing using pyvips last week, and indeed there are no memory problems even with the default allocator, so I am very happy that there is a way around this. As I have to use Python anyway, it does not make a huge difference to me, but it would be nice to find a way to make sharp work equally well for everybody, as the lib is really useful.
So I will run my tests again with manual garbage collection, as proposed, see what happens and report back. Please don't close this yet!
Thanks.
@juona Were you able to make any progress with this?
Hello,
Sadly, no, and I am sorry for that.
My attention is 100% on the project, so I don't really have time left for this. Feel free to close this, and if I free myself up enough, I'll do the tests in the future.
Thanks for your work!
Thank you for the update, I'll close this for now but please feel free to reopen with more details if further help is required.