React-pdf: Memory leak - Every time a PDF is generated, the memory used increases and never comes back down.

Created on 16 Sep 2019  ·  8 Comments  ·  Source: diegomura/react-pdf

First, thank you for the awesome work on this library!

Describe the bug
We noticed that rendering PDFs caused a memory leak in our prod environment. We spent some time isolating the issue, and it seems to be inside react-pdf, because it only happens when we call it. There's a chance I could be wrong, but I haven't been able to find anything else that causes this besides our calls to the react-pdf library.
When running react-pdf in a Node server, every time a PDF is rendered the memory used by the app increases and never comes back down to what it was before the rendering occurred.

To Reproduce
Steps to reproduce the behavior including code snippet (if applies):

  1. Watch the memory used by an application that uses react-pdf.
  2. Notice that memory usage increases when a PDF is rendered and never comes back down.

The difference is easier to see once you have rendered more than 50 to 75 documents.
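
For anyone who wants numbers rather than a process monitor, the steps above could be sketched roughly as follows. This is only an illustration: `measureMemoryGrowth` is a hypothetical helper, not part of react-pdf, and the forced collection only happens if Node is started with `--expose-gc`.

```javascript
// Hypothetical helper (not part of react-pdf): runs an async task several
// times and reports memory usage before and after.
async function measureMemoryGrowth(task, iterations) {
  const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 100) / 100;
  // global.gc is only defined when Node is started with --expose-gc.
  const collect = () => {
    if (typeof global.gc === 'function') global.gc();
  };
  const sample = () => {
    const { rss, heapUsed } = process.memoryUsage();
    return { rssMb: toMb(rss), heapUsedMb: toMb(heapUsed) };
  };

  collect();
  const before = sample();
  for (let i = 0; i < iterations; i += 1) {
    await task(i); // e.g. one PDF render per iteration
  }
  collect();
  const after = sample();

  return { before, after, iterations };
}
```

In the reproduction project, `task` would be whatever renders one document, for example a function that calls `pdf(...).toBuffer()`. If the `after` values keep climbing as `iterations` grows, that matches the behavior described in this issue.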

Here is a quick and easy way to reproduce it.

Clone this project:
https://github.com/Osuriel/react-pdf-test

The README has all the steps to run it and easily monitor it.

Expected behavior
The memory should eventually go back to normal after a request is made and not increase continually.

Screenshots
before rendering pdfs:
image

after rendering 100 pdfs, and waiting for a while:
image

Desktop (please complete the following information):

  • OS: MacOS, Linux
  • "@react-pdf/renderer": "1.6.4"

All 8 comments

I don't know if this is related, but when running this on the front end, every time I generate a PDF I get three errors like this: Error: stream.push() after EOF (this is within the BlobProvider component).

I also have a problem that is probably related to a memory leak. When I render a document to a file for the first time, everything is OK. Every subsequent attempt (for example, after changing the list of mapped children in the Page component) fails with an error. It looks like the memory is not cleared after the first render.

Child already has a parent, it must be removed first.

The above error occurred in the component:
in VIEW (created by ProjectMap)
in VIEW (created by ProjectMap)
in VIEW (created by ProjectMap)
in PAGE (created by ProjectMap)
in ProjectMap
in DOCUMENT

Uncaught (in promise) abort() at Error
at jsStackTrace (ROOTnode_modules\yoga-layout-prebuilt\yoga-layout\build\Release\nbind.js:443:15)
at stackTrace (ROOTnode_modules\yoga-layout-prebuilt\yoga-layout\build\Release\nbind.js:453:14)
at Object.abort (ROOTnode_modules\yoga-layout-prebuilt\yoga-layout\build\Release\nbind.js:9808:182)
at _abort (ROOTnode_modules\yoga-layout-prebuilt\yoga-layout\build\Release\nbind.js:585:20)
at ge (ROOTnode_modules\yoga-layout-prebuilt\yoga-layout\build\Release\nbind.js:2905:136)
at Vd (ROOTnode_modules\yoga-layout-prebuilt\yoga-layout\build\Release\nbind.js:2443:101)
at ec (ROOTnode_modules\yoga-layout-prebuilt\yoga-layout\build\Release\nbind.js:1928:23)
at kc (ROOTnode_modules\yoga-layout-prebuilt\yoga-layout\build\Release\nbind.js:1951:86)
at Ig (ROOTnode_modules\yoga-layout-prebuilt\yoga-layout\build\Release\nbind.js:3444:37)
at hm (ROOTnode_modules\yoga-layout-prebuilt\yoga-layout\build\Release\nbind.js:4785:228)

I'm using v1.6.7.

Same problem:

  Error: non-error thrown: "abort(5) at Error\n    at jsStackTrace (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/yoga-layout-prebuilt/yoga-layout/build/Release/nbind.js:443:15)\n    at stackTrace (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/yoga-layout-prebuilt/yoga-layout/build/Release/nbind.js:453:14)\n    at abort (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/yoga-layout-prebuilt/yoga-layout/build/Release/nbind.js:9808:182)\n    at mD (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/yoga-layout-prebuilt/yoga-layout/build/Release/nbind.js:9705:27)\n    at hk (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/yoga-layout-prebuilt/yoga-layout/build/Release/nbind.js:4262:188)\n    at fk (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/yoga-layout-prebuilt/yoga-layout/build/Release/nbind.js:4256:186)\n    at cD (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/yoga-layout-prebuilt/yoga-layout/build/Release/nbind.js:9685:57)\n    at Bound.eval (eval at buildCallerFunction (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/yoga-layout-prebuilt/yoga-layout/build/Release/nbind.js:1445:311), <anonymous>:1:51)\n    at Bound.<anonymous> (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/yoga-layout-prebuilt/yoga-layout/dist/entry-common.js:220:23)\n    at Bound.prototype.(anonymous function) [as setMeasureFunc] (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/yoga-layout-prebuilt/yoga-layout/dist/entry-common.js:133:22)\nIf this abort() is unexpected, build with -s ASSERTIONS=1 which can give more information."
      at Object.onerror (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/koa/lib/context.js:113:40)
      at onerror (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/koa/lib/application.js:159:32)
      at process._tickCallback (internal/process/next_tick.js:68:7)
  ---------------------------------------------
      at Application.callback (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/koa/lib/application.js:140:44)
      at Application.listen (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/koa/lib/application.js:75:43)
      at Object.<anonymous> (/Users/davidecarpini/vas/hit-radar-report-2/src/index.tsx:50:5)
      at Module._compile (internal/modules/cjs/loader.js:778:30)
      at Module._extensions..js (internal/modules/cjs/loader.js:789:10)
      at Object.nodeDevHook [as .js] (/Users/davidecarpini/vas/hit-radar-report-2/node_modules/node-dev/lib/hook.js:61:7)
      at Module.load (internal/modules/cjs/loader.js:653:32)
      at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
      at Function.Module._load (internal/modules/cjs/loader.js:585:3)

This happens because the exported library functions used in your example do not call the functions needed to clean up the Yoga layout objects.

A deep dive into the memory snapshots showed that, for each request, the PDF classes aren't released from memory and their references point to Yoga's measureFunc object. There was an earlier fix for cleaning up these objects in memory in issue #378. As we can see, part of that fix was calling this.layout.unsetMeasureFunc.

The way you're using pdf(<PDFDocument />).toBuffer() in the example bypasses the cleanup phase.

If we want to render the PDF into a buffer, what we're supposed to use is ReactPDF.renderToStream, which does call the cleanup functions properly, as you can see from these lines of code:

const renderToStream = async function (element) {
  const instance = pdf(element);
  const buffer = await instance.toBuffer();
  instance.container.finish(); // This cleans up objects from memory
  return buffer;
};

With that line of code changed, there is no more memory leak.
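
If you want the same cleanup to run even when rendering throws, one option is to wrap the calls in try/finally. This is only a sketch built on the three calls shown above (`pdf`, `toBuffer`, `container.finish`). The wrapper name `renderToBufferWithCleanup` is made up, and `pdf` is passed in as a parameter so the sketch does not assume a particular import.

```javascript
// Hypothetical wrapper around the calls shown in this thread.
// `pdf` is expected to be the `pdf` function exported by @react-pdf/renderer.
async function renderToBufferWithCleanup(pdf, element) {
  const instance = pdf(element);
  try {
    return await instance.toBuffer();
  } finally {
    // Same cleanup call as in the snippet above. The guard is an assumption:
    // it avoids a crash if `container.finish` is missing in another version.
    if (instance.container && typeof instance.container.finish === 'function') {
      instance.container.finish();
    }
  }
}
```

The `finally` block means the Yoga cleanup is attempted whether `toBuffer()` resolves or rejects. Later comments report that the leak is not fully solved by this call alone.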

@diegomura This method is not listed anywhere in the documentation, however, so we should add it so that scenarios like this are avoided in the future.

Hello @diegomura, first, thank you for your work!

I am using the library to generate a PDF file with dynamic content taken from a web magazine.
At first I tried to generate it on the front end, but it was really slow and it blocked the UI.

So I decided to generate it on the back end, in a worker process.
I am using the Bull library to queue the jobs, Socket.io to tell the browser client when the PDF has been processed, and Google Cloud Storage to store the file.

Describe the bug
When the job runs in a Heroku worker dyno, the memory increases and never comes back down.

Worker code

// Imports assumed from the identifiers used below.
import React from 'react';
import ReactPDF from '@react-pdf/renderer';
import Queue from 'bull';
import mkdirp from 'mkdirp';
import throng from 'throng';

// REDIS_URL, DocumentPDF, uploadFile, storage and bucketName are defined
// elsewhere in the project and were not included in the post.

let workers = process.env.WEB_CONCURRENCY || 1;
let maxJobsPerWorker = 10;

function start() {
    let workQueue = new Queue('generatePDF', REDIS_URL);

    // Page numbers collected during the first render
    let articlesPage = [];

    // Callback handed to the document so the first render can report page numbers
    let setArticlesPage = (datas) => {
        articlesPage = datas;
    };

    workQueue.process(maxJobsPerWorker, async (job, done) => {
        try {
            // Create the output directory if it does not exist yet
            // (not awaited; errors are only logged)
            mkdirp(`${__dirname}/pdfs/${job.data.magazine_slug}`, function (err) {
                if (err) console.error(err);
            });

            console.log("-- get page numbers render --");

            // First pass: render only to collect page numbers via setArticlesPage
            await ReactPDF.renderToStream(
                <DocumentPDF numero={ job.data.numero } magazine={ job.data.magazine } articlesPage={ job.data.articlesPage }
                             magazineObj={ job.data.magazineObj } url={ job.data.url }
                             setArticlesPage={ setArticlesPage }
                />
            );

            console.log("-- Save it as a file render with page numbers (for summary) --");

            // Second pass: render to a file, now with the collected page numbers
            await ReactPDF.render(
                <DocumentPDF numero={ job.data.numero } magazine={ job.data.magazine }
                             magazineObj={ job.data.magazineObj }
                             url={ job.data.url }
                             setArticlesPage={ setArticlesPage }
                             articlesPage={ articlesPage }
                />,
                job.data.pdf_link
            );

            console.log("-- Upload to GCS --");

            await uploadFile(job.data.pdf_link);
            let file = storage.bucket(bucketName).file(job.data.pdf_name);
            let metaData = await file.getMetadata();
            const gcsURL = metaData[0].mediaLink;

            done(null, {
                pdf_name: job.data.pdf_name,
                pdf_url: gcsURL,
                socketId: job.data.socketId
            });
        } catch (err) {
            console.log("err => ", err);
            // Pass the original error on; new Error('Error => ', err) would drop it
            done(err instanceof Error ? err : new Error(String(err)));
        }
    });
}

throng({ workers, start });

Screenshots
Before rendering PDFs:
Screenshot 2020-01-16 at 11 12 36

After first render - 1 PDF (110 pages):
Screenshot 2020-01-16 at 11 13 13

After second render:
Screenshot 2020-01-16 at 11 13 44

etc.

Heroku RAM limit : 512MB

It's as if the renderToStream and render functions store data in memory but never release it after the job is done.

Environments

  • "@react-pdf/renderer": "^1.6.8",
  • "bull": "^3.12.1",
  • "throng": "^4.0.0"
  • Heroku Worker Dyno
  • Node 12.14.1
  • npm 6.13.4
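
Until the leak itself is tracked down, one possible stopgap for a worker with a hard limit like the 512 MB quoted above is to check resident memory after each job and let the process exit before it reaches the limit. The sketch below shows only the check. `isOverMemoryBudget` and the 80% threshold are assumptions for illustration, and it relies on whatever supervises the worker to start a replacement process.

```javascript
// Hypothetical helper: true once resident memory reaches a fraction of the limit.
function isOverMemoryBudget(rssBytes, limitMb, fraction = 0.8) {
  const limitBytes = limitMb * 1024 * 1024;
  return rssBytes >= limitBytes * fraction;
}

// Possible use after a job has completed (512 is the dyno limit quoted above):
//
//   if (isOverMemoryBudget(process.memoryUsage().rss, 512)) {
//     process.exit(0); // assumes the process manager starts a fresh worker
//   }
```

This does not fix anything in react-pdf. It only bounds how far memory can grow between restarts.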

I am experiencing this memory leak using renderToStream with that line included. I think there may be an additional cause. It only seems to really affect me when the PDFs I am rendering exceed a certain size.

@will-evers After implementing the proposed fix in our production application, we still see leaks too. So I believe you are correct: while that change certainly helps in this example, part of the memory leak is still unsolved.

@willywill It looks like the Root component never calls its finish function, so it never cleans up its children, only its sub pages. My colleague @antoine1anthony is testing this, but it seems a likely candidate for the memory leak, especially since the leak gets worse the larger the PDF is: more children, more memory.

Edit: that seems to have been misinformed. I did notice an open pdfkit issue with similar symptoms, so it could be related: https://github.com/foliojs/pdfkit/issues/1081
