First of all, I apologise for opening a new thread related to memory issues - I know there have been quite a few already, but this question (at least in my case) still remains unanswered. For the record, I have read through #955 and a bunch of issues related to that multiple times. I still need some help here, and I didn't know which of the existing issues to revive, so here's a new one.
I have created four test sharp operations (https://github.com/juona/sharp-memory-experiment/blob/master/generatePrintFile.js):
1) Composite - takes 3 images, performs resize, extract and extend on each of them, converts two of them into buffers using toBuffer() and performs the composite operation to produce the final result as a PNG file. _This is the original function that I want to work well in my application._ (A rough sketch of this variant is shown right after this list.)
2) Single - takes 1 image, performs resize, extract and extend and writes the result into a PNG file without using toBuffer().
3) Simple to file - takes a single image and writes it to a new file without any sharp operations.
4) Simple to buffer - takes a single image, calls toBuffer() on it and simply discards the result.
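To make the discussion concrete, here is a rough, hypothetical sketch of what the composite variant does (file names, sizes and offsets are placeholders, not the values from the linked repository):

const sharp = require("sharp");

async function runComposite() {
  // resize, extract and extend two of the images, keeping the results as buffers
  const [overlayA, overlayB] = await Promise.all(
    ["large1.png", "small.png"].map(file =>
      sharp(file)
        .resize(1200)
        .extract({ left: 0, top: 0, width: 500, height: 500 })
        .extend({ top: 10, bottom: 10, left: 10, right: 10, background: "#ffffff" })
        .toBuffer()
    )
  );

  // composite the buffers over the third image and write the final PNG
  await sharp("large2.png")
    .resize(2000)
    .composite([
      { input: overlayA, top: 0, left: 0 },
      { input: overlayB, top: 600, left: 600 }
    ])
    .png()
    .toFile("result.png");
}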
Now, each of these operations would be run many times in a row under four different conditions to see what influence caching and concurrency have on the memory consumption:
1) Sharp concurrency on (sharp.concurrency(0)) and sharp cache on (sharp.cache(true));
2) Sharp concurrency on (sharp.concurrency(0)) and sharp cache off (sharp.cache(false));
3) Sharp concurrency off (sharp.concurrency(1)) and sharp cache on (sharp.cache(true));
4) Sharp concurrency off (sharp.concurrency(1)) and sharp cache off (sharp.cache(false));
Also, the previously described sharp operations would be run in chunks of 10, e.g. I would simultaneously launch 10 "composite" operations, let the program wait until they _all_ finish, record the RSS and do that again.
This would be repeated until either the loop had run 100 times (giving me 100 readings of RSS) or the memory reached 2GB on my Manjaro machine / 1GB on my Mac.
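A minimal sketch of that measurement loop, assuming the hypothetical runComposite() helper from the sketch above (the actual script in the repository may differ):

const CHUNK_SIZE = 10;
const MAX_ITERATIONS = 100;
const RSS_LIMIT_MB = 2048; // 1024 on the Mac

async function measure() {
  for (let i = 1; i <= MAX_ITERATIONS; i++) {
    // launch a chunk of 10 operations and wait until they have all finished
    await Promise.all(Array.from({ length: CHUNK_SIZE }, () => runComposite()));

    // record RSS as reported by NodeJS itself
    const rssMb = process.memoryUsage().rss / 1048576;
    console.log(`iteration ${i}: RSS ${rssMb.toFixed(1)} MB`);
    if (rssMb > RSS_LIMIT_MB) break;
  }
}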
Finally, the images used were two rather large PNG files (3400x2000, 11MB) and one small one, all available in the repository (link below).
I ran these tests using three different memory allocators:
1) On a Manjaro system using the default Manjaro allocator. I am not very good at computers/linux, so I don't know which one it is. :]]
2) On the same Manjaro system using jemalloc, as suggested in other threads, namely #955 (see the example command after this list).
3) On a Mac (10.14.6) using its default allocator.
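For reference, the usual way to run node with jemalloc on Linux is to preload the library when starting the process; the exact library path varies by distribution, so treat this only as an example:

$ LD_PRELOAD=/usr/lib/libjemalloc.so node generatePrintFile.js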
Here are the graphs of how memory consumption changes with each iteration. The X axis represents the iteration number (so, e.g. iteration 15 means the 15th time that a chunk of 10 sharp operations was finished). The Y axis is RSS as printed by NodeJS itself in MB.



As you can see, the results using the default Manjaro allocator are so bad, that we can just ignore them.
On a Mac, the operations which do not involve buffers ("single" and "simple to file") seem to consume only a little memory, and the consumption is basically constant. However, the RSS changes during the "simple to buffer" test look quite odd (I don't know how to explain them), and during the "composite" operation (the one I originally wanted to optimise) the RSS is constantly increasing.
On the Manjaro system with jemalloc enabled, the behaviour is almost identical, except that the memory usage during "simple to buffer" is significantly higher and takes only 13 rounds to reach the 2GB mark.
If libvips is optimised for a specific allocator, then I'd expect the answer to question 2 to be a really easy one, right? And if it's not, then isn't that an indicator of some peculiarities inside libvips?

Here's a graph that better shows how the Mac deals with simpleToBuffer:

And here's another one showing that on Mac RSS does only ever increase during the composite operations:

Hello @juona,
I tried your test in pyvips, the libvips Python binding:
#!/usr/bin/env python3

import sys
import pyvips

for i in range(int(sys.argv[1])):
    print(f"loop {i} ...")

    # load the three source images in sequential mode
    a = pyvips.Image.new_from_file(sys.argv[2], access="sequential")
    b = pyvips.Image.new_from_file(sys.argv[3], access="sequential")
    c = pyvips.Image.new_from_file(sys.argv[4], access="sequential")

    # crop and shrink each one
    a = a.crop(10, 10, 1000, 1000).resize(0.9)
    b = b.crop(10, 10, 1000, 1000).resize(0.9)
    c = c.crop(10, 10, 1000, 1000).resize(0.9)

    # composite them over a black background and write the result
    background = pyvips.Image.black(2000, 2000)
    out = background.composite([c, b, a], "over",
                               x=[900, 450, 0], y=[0, 450, 900])

    out.write_to_file(sys.argv[5])
Then ran with:
./composite3.py 10000 ~/pics/k2.jpg ~/pics/k4.jpg ~/pics/wtc.jpg x.png
And watched RES in top:
loop | RES (MB)
--- | ---
10 | 105
30 | 128
50 | 131
100 | 142
200 | 155
300 | 145
400 | 139
500 | 141
Memory management in node, especially when there are external libraries involved, is complex, with many interacting systems, most of which know little about each other :(
In your buffer case, could it be that the node GC is simply not firing? Have you tried forcing a GC?
Start node with:
$ node --expose-gc file.js
Then add this to your loop:
global.gc();
No use in production, of course, but it might help explain the memory behaviour you are seeing.
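For example, a minimal sketch of how it could be dropped in at the end of each chunk of operations (global.gc is only defined when node was started with --expose-gc):

if (global.gc) {
  global.gc(); // force a full collection before sampling memory
}
console.log("RSS after GC (MB):", Math.round(process.memoryUsage().rss / 1048576));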
Thanks @juona for these detailed test cases.
jemalloc helps reduce fragmentation in long-running, multi-threaded processes, at the cost of slightly _higher_ peak memory usage (RSS), as it allocates from multiple arenas.
The logic in the composite test function appears to keep all intermediate images in scope, so they won't become eligible for garbage collection until after the whole test function is complete, which could explain the pattern you're seeing.
As John suggests, forcing V8 to run garbage collection can help better determine the real peak memory requirement for this task as it may reduce the number of new allocations. Remember it's still up to the OS to request that a process returns unused memory, so RSS can still include this freed memory when it is fragmented or smaller than the current page size.
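To illustrate the scoping point with a hypothetical sketch (not the code from the test repository): intermediates held in an outer array stay reachable until the whole function returns, whereas intermediates created in a narrower scope become collectable as soon as each iteration ends.

const sharp = require("sharp");

// every intermediate buffer stays reachable via `buffers` until
// pinned() returns, so none of them can be collected earlier
async function pinned(files) {
  const buffers = [];
  for (const file of files) {
    buffers.push(await sharp(file).resize(1200).toBuffer());
  }
  return buffers.length;
}

// here `buffer` becomes unreachable at the end of each iteration,
// so V8 is free to reclaim it before the next file is processed
async function scoped(files) {
  for (const file of files) {
    const buffer = await sharp(file).resize(1200).toBuffer();
    console.log(file, buffer.length);
  }
}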
I've checked our production infrastructure while we do some heavy processing with sharp (we resize pictures and generate tiles for them).
Here is an example of our Grafana charts; it seems that it causes one of our services to crash (after ~3k calls to sharp.tiles()). Strangely, I wasn't able to reproduce it locally.

Just reading into this as we are seeing similar activity.

This is an image processing / resizing service that we are currently benchmarking for production; the only code used is:
return sharp(binary)
  .rotate()
  .resize({ width: size, height: size })
  .png()
  .toBuffer()
  .then(thumb => {
    return ThumbnailsBucket
      .file(`${name}@${size}x${size}`, { metadata: { width: size, height: size } })
      .save(thumb)
  })
However, the invocation throughput was also increasing over time, so I cannot confirm or deny a memory leak.

@RaboliotTheGrey @robertpitt Please read and make sure you understand everything discussed in #955 - happy to answer questions after that.
Thank you very much, @lovell.
It fixes the RAM consumption with jemalloc.so.1 (~v3.6) on Debian Stretch, jemalloc.so.2 (~v5) on Debian Buster and musl on Alpine.
But there now seems to be intensive HDD usage even though the process isn't supposed to use the disk at all (I can't explain it, but it's not related to sharp).
Sorry for the late reply, I was busy trying to get the feature that I needed working on time.
To do that I rewrote the whole thing using pyvips last week, and indeed there are no memory problems even with the default allocator, so I am very happy that there is a way around this. As I have to use Python anyway, it does not make a huge difference to me, but it would be nice to find a way to make sharp work equally well for everybody, as the lib is really useful.
So I will run my tests again with manual garbage collection, as proposed, see what happens and report back. Please don't close this yet!
Thanks.
@juona Were you able to make any progress with this?
Hello,
Sadly, no, and I am sorry for that.
My attention is 100% on the project, so I don't really have time left for this. Feel free to close this, and if I free myself up enough, I'll do the tests in the future.
Thanks for your work!
Thank you for the update, I'll close this for now but please feel free to reopen with more details if further help is required.