Sharp: Question: memory usage

Created on 21 Apr 2017 · 9 comments · Source: lovell/sharp

I use the following code to generate a thumbnail image on my Parse Server. It does the following:

  • fetch the image from a URL into a buffer
  • run sharp on it to resize, saving the result in a buffer
  • create a Parse file from the buffer
  • save the Parse file to the DB
Parse.Cloud.httpRequest({ url: coverartUrl })
.then(function(response) {
    // Resize the downloaded image into a square JPEG thumbnail
    var sharp = require("sharp");
    return sharp(response.buffer).resize(size, size).toFormat('jpeg').toBuffer();
}).then(function(buffer) {
    console.log("Buffer returned, creating Parse File");
    // Parse.File accepts base64-encoded data
    var base64Buffer = buffer.toString("base64");
    var filename = channel.get("coverart").name() + "_thumbnail.jpg";
    var file = new Parse.File(filename, { base64: base64Buffer }, "image/jpeg");
    return file.save();
}).then(function(file) {
    console.log("Thumbnail saved");
    return channel.set(coverartAttribute, file);
});

I run this code on about 30 images, which causes Node.js to use more than 500 MB of memory afterwards. Garbage collection does not reclaim the memory even after waiting some time.

I tried to fix it by explicitly setting some buffers to null, but it did not help. Any idea what I am doing wrong and what is causing the memory leak?

question

All 9 comments

Hello, I suspect "500MB" is the peak RSS for this task, which will include free memory that has yet to be released back to the OS. It looks like an original image, its thumbnail and a base64 encoded version of the thumbnail will all be in memory and ineligible for GC until the promise chain ends.

If you've not already done so, I'd recommend experimenting with the cache settings and reading #714 and #429.
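For reference, a minimal sketch of those settings; the values here are illustrative examples, not recommendations:

const sharp = require("sharp");

// Disable sharp's libvips operation cache entirely...
sharp.cache(false);

// ...or cap it instead: at most 50 MB of memory, 100 operations,
// and no open files kept around.
sharp.cache({ memory: 50, items: 100, files: 0 });

// Reduce the number of libvips worker threads to lower peak memory,
// at the cost of throughput.
sharp.concurrency(1);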

"until the promise chain ends."

This is my main problem: the memory is not released even after all promises are fulfilled.

I'll take a look at your suggested links.

I tried setting sharp.cache(false); this did not reduce the memory usage either.
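One way to tell a genuine leak from memory that simply hasn't been collected yet is to force a full GC and compare heap usage before and after. A sketch, assuming the process is started with the --expose-gc flag:

// Run with: node --expose-gc app.js
function logHeap(label) {
    const { rss, heapUsed } = process.memoryUsage();
    console.log(label,
        "rss:", (rss / 1048576).toFixed(1), "MB,",
        "heapUsed:", (heapUsed / 1048576).toFixed(1), "MB");
}

logHeap("before gc");
global.gc();         // force a full collection
logHeap("after gc"); // if heapUsed stays high, something still holds references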

Got a memory leak too.

This is my code:

async function resize(imageURL) {
    let thumbnailURLs = {};
    let filename = ulid();
    let storage, file, stream, resizer;

    // A new GCS client and bucket are created on every call
    storage = new GCS({
        projectId: process.env.GCLOUD_PROJECT || config.project_id,
        keyFilename: process.env.GCLOUD_PROJECT ? null : config.keyfile_path,
        // NB: the credentials option expects a credentials object, not a file path
        credentials: process.env.GCLOUD_PROJECT ? null : config.credentials_path
    }).bucket(config.bucket.line);
    // Returns a promise that is not awaited here
    storage.makePublic({
        includeFiles: true
    });
    file = storage.file(filename);
    stream = file.createWriteStream({
        gzip: true,
        public: true,
        metadata: {
            cacheControl: "public, max-age=120",
            contentType: "image/jpeg",
            metadata: {
                createdAt: Date.now()
            }
        }
    });

    // Resize to a fixed size, padding with a background, output as JPEG
    resizer = sharp()
        .resize(config.resize.line.width, config.resize.line.height)
        .background({ r: 255, g: 255, b: 255, alpha: 0 })
        .embed()
        .jpeg();

    // Download, resize and upload as one stream pipeline
    await new Promise((resolve, reject) => {
        request(imageURL)
            .pipe(resizer)
            .pipe(stream)
            .on("error", err => {
                reject(err);
            })
            .on("finish", () => {
                resolve(true);
            });
    });

    thumbnailURLs.example = `https://example.com/${config.bucket.line}/${filename}`;

    return thumbnailURLs;
}
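A side note on the pipeline above: .pipe() forwards data but not errors, so the error handler there only fires for the GCS write stream. A sketch that listens on every stage, reusing the same variable names:

await new Promise((resolve, reject) => {
    // Attach an error handler to each stage, since .pipe() does not propagate errors
    const source = request(imageURL);
    source.on("error", reject);   // network / HTTP errors
    resizer.on("error", reject);  // sharp / libvips errors
    stream.on("error", reject);   // GCS upload errors
    stream.on("finish", () => resolve(true));
    source.pipe(resizer).pipe(stream);
});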

I have to run it to resize about 30,000 files, but it fails after about 100. I don't know what else to try.

Hello, what is being described here as a "memory leak" is almost certainly peak memory usage, memory that is either still ineligible for garbage collection or has been freed but has yet to be returned to the OS.
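One way to see the difference in practice is to log rss alongside heapUsed and external while the batch runs: a steadily growing heapUsed or external points to a real leak, while a high but stable rss with a low heapUsed is memory waiting to be returned to the OS. A sketch, to be run inside an async function; the imageURLs array and batch loop are hypothetical:

let processed = 0;
for (const url of imageURLs) {
    await resize(url);
    if (++processed % 100 === 0) {
        const m = process.memoryUsage();
        console.log(processed,
            "rss:", m.rss,
            "heapUsed:", m.heapUsed,
            "external:", m.external); // external covers Buffer allocations
    }
}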

@lovell, any advice on minimizing memory usage when using sharp?

@JesusIslam, have you tried upgrading to Node.js 7? I've found that V8 5.4+ is more eager to return memory to the OS, which made a big difference in a long-running app that uses sharp's stream API in a way similar to what you're doing.

Also, if you're on Linux, try adding your app to a cgroup and specifying a memory.soft_limit_in_bytes value for it.

@papandreou

I am using v7.8.0 at the moment.
It turned out the culprit was initializing the GCS module inside the async function that is called in a loop. I fixed it by initializing it only once, outside the loop; after that, the memory leak went away. This means I can't use a dynamic bucket to upload the pictures to, but it works for my current use case.
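For anyone hitting the same thing, a sketch of that fix, creating the client and bucket once at module scope; the names follow the code above:

// Created once when the module loads, not on every call
const bucket = new GCS({
    projectId: process.env.GCLOUD_PROJECT || config.project_id,
    keyFilename: process.env.GCLOUD_PROJECT ? null : config.keyfile_path
}).bucket(config.bucket.line);

async function resize(imageURL) {
    const filename = ulid();
    const file = bucket.file(filename);
    // ... createWriteStream, resizer and pipeline as before ...
}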

Not sharp's fault I think.

Thanks all, I'll close.
