Node: fs.readFile possible memory leak

Created on 11 Jun 2018 · 16 comments · Source: nodejs/node

  • Version: 4.8.6
  • Platform: Linux as-t430s 4.4.0-67-generic #88-Ubuntu SMP Wed Mar 8 16:34:45 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem: fs

fs.readFile appears to leak memory when the same file is read asynchronously many times.
fs.readFileSync behaves very differently and cleans up memory efficiently.

Image showing issue https://github.com/ip413/node-read-file/blob/master/docs/data.png
Code is here https://github.com/ip413/node-read-file/blob/master/index.js
Excel is here https://github.com/ip413/node-read-file/blob/master/docs/data.ods

The same problem occurs on Node 9.5.0.
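
For reference, a minimal sketch of the kind of repro being described (this is not the linked index.js; './foo.txt', the iteration count, and the sampling interval are placeholders):

'use strict';
// Read the same file many times asynchronously and watch rss afterwards.
const fs = require('fs');

const mb = (n) => Math.round(n / (1024 * 1024));
console.log('before (MB):', mb(process.memoryUsage().rss));

for (let i = 0; i < 1000; i++) {
  fs.readFile('./foo.txt', (err) => { if (err) throw err; });
}

// For comparison, the synchronous variant reads sequentially on the main thread:
// for (let i = 0; i < 1000; i++) fs.readFileSync('./foo.txt');

setInterval(() => {
  console.log('rss (MB):', mb(process.memoryUsage().rss));
}, 5000);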

All 16 comments

I'm not convinced there is a leak. Since you're kicking off a large number of requests in a tight loop, there are no available threads to service the requests, so they are placed into a queue. The queued tasks themselves consume memory. That is what you're seeing. You should instead measure memory usage after all of the requests have completed.

The synchronous methods perform the work immediately on the main thread and thus do not consume any extra resources, which is why you see stable memory usage there.
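
A sketch of that measurement approach, counting completed callbacks before sampling memory ('./foo.txt' and TOTAL are placeholders, not code from this thread):

'use strict';
// Sample memory only after every queued read has completed.
const fs = require('fs');

const TOTAL = 1000;
let done = 0;
const mb = (n) => Math.round(n / (1024 * 1024));

console.log('before (MB):', mb(process.memoryUsage().rss));

for (let i = 0; i < TOTAL; i++) {
  fs.readFile('./foo.txt', (err) => {
    if (err) throw err;
    if (++done === TOTAL) {
      // Nothing is left in the threadpool queue at this point.
      const { rss, heapUsed } = process.memoryUsage();
      console.log('after (MB): rss', mb(rss), 'heapUsed', mb(heapUsed));
    }
  });
}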

FYI, neither Node 4.8.6 nor 9.5.0 is a supported version of Node, and you should report issues that affect up-to-date versions of Node. You can have a look at the LTS Release Schedule to see which versions are supported by the Node community, and at the CHANGELOG table for the list of the latest release of each semver-major line (_at the time of writing, supported versions are v6.14.2 (maintenance only), v8.11.2, v9.11.1 (maintenance only) and v10.4.0_).

@ip413 could you please check if this affects master? You could also try LTS, but I'd be more interested in knowing if master has this too.

@aduh95 - yep, sorry for that.
@ryzokuken - will check it

I believe @mscdex's diagnosis is accurate. But we would need the "after requests are completed" data to prove it. I'm guessing this will be an easy-to-clean-up issue if someone wants to collect that data.

@ip413 - is this still an issue?

@ryzokuken - I haven't tested master, but I tested the latest version of Node, 12.14.1, so master is probably affected as well
@gireeshpunathil - yep, I believe it is still an issue

I'm personally no longer interested in this issue, but...

  • I removed the memwatch-next dependency so that I could also test Node v12
  • I checked it once again and my conclusion is the same: there could be some issue with memory during many async reads

Conclusions:

  • the issue is almost the same with LTS Node v6 and v12
  • the bigger the file is, the more memory is used
  • no amount of waiting cleans the memory up (I assume that 1 hour is enough, and that is what I tested)
  • no "cleanup of variable" has any effect (undefined, null, delete); a sketch of this kind of cleanup attempt follows below

Personally I don't believe it is such a big leak... but I have no explanation for this behaviour. Maybe with normal usage of the file everything is fine. Probably @mscdex is right... but I can't prove it.

(Charts attached: file-size, one-hour, read, data)
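
For context, a hedged sketch of the kind of cleanup attempt described in the conclusions above: forcing GC and dropping references ('./foo.txt', the counts, and the timings are placeholders). Run with `node --expose-gc`:

'use strict';
const fs = require('fs');

const mb = (n) => Math.round(n / (1024 * 1024));
let buffers = [];

for (let i = 0; i < 1000; i++) {
  fs.readFile('./foo.txt', (err, data) => {
    if (err) throw err;
    buffers.push(data);
  });
}

setTimeout(() => {
  // Drop every reference and force a full GC.
  buffers = null;
  if (global.gc) global.gc();
  const { rss, heapUsed } = process.memoryUsage();
  // heapUsed should drop back down; rss may well stay near its peak.
  console.log('after cleanup (MB): rss', mb(rss), 'heapUsed', mb(heapUsed));
}, 30000);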

I am able to see the issue.

0000000004a34000 67944K rw--- [ anon ]

Sections such as this never get unmapped.

strace also shows a similar story:

[pid 38519] 1579773078.495439 mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0 <unfinished ...>
[pid 38520] 1579773078.495481 mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0 <unfinished ...>
[pid 38519] 1579773078.495489 <... mmap resumed> ) = 0x7f4640000000
[pid 38520] 1579773078.495516 <... mmap resumed> ) = 0x7f4638000000

There are no matching munmap calls.

(gdb) b mmap if $rsi >= 134217728
Starting program: node --expose-gc 21266
...
Breakpoint 2, 0x00007ffff6ec6d50 in mmap64 () from /lib64/libc.so.6
(gdb) where
#0  0x00007ffff6ec6d50 in mmap64 () from /lib64/libc.so.6
#1  0x00007ffff6e4d231 in new_heap () from /lib64/libc.so.6
#2  0x00007ffff6e4dc64 in arena_get2.isra.3 () from /lib64/libc.so.6
#3  0x00007ffff6e53b0e in malloc () from /lib64/libc.so.6
#4  0x00007ffff792eecd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#5  0x0000000000d87c68 in std::__detail::_Map_base<v8::internal::MemoryChunk*, std::pair<v8::internal::MemoryChunk* const, v8::internal::MemoryChunkData>, std::allocator<std::pair<v8::internal::MemoryChunk* const, v8::internal::MemoryChunkData> >, std::__detail::_Select1st, std::equal_to<v8::internal::MemoryChunk*>, v8::internal::MemoryChunk::Hasher, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::operator[](v8::internal::MemoryChunk* const&) ()
#6  0x0000000000d88681 in v8::internal::ConcurrentMarkingVisitor::ShouldVisit(v8::internal::HeapObject)
    ()
#7  0x0000000000d8e4f4 in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
#8  0x0000000000d0b283 in v8::internal::CancelableTask::Run() ()
#9  0x0000000000b05dc9 in node::(anonymous namespace)::PlatformWorkerThread(void*) ()
#10 0x00007ffff71a3ea5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff6ecc8cd in clone () from /lib64/libc.so.6
(gdb) i r $rsi
rsi            0x8000000    134217728
(gdb) 

but this one looks like heap expansion!

Are expanded V8 heaps never shrunk, even after the usage has long ceased?

Also, is there a race condition by which multiple threads believe (together) that they need to expand?
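
One way to check whether this is V8 heap expansion rather than off-heap growth is to compare V8's own heap accounting with rss; a small sketch using v8.getHeapStatistics() (the sampling interval is arbitrary):

'use strict';
// If total_heap_size stays flat while rss stays high, the retained pages are
// outside the V8 heap (e.g. malloc arenas the allocator has not returned).
const v8 = require('v8');

setInterval(() => {
  const { total_heap_size, used_heap_size } = v8.getHeapStatistics();
  const { rss } = process.memoryUsage();
  const mb = (n) => Math.round(n / (1024 * 1024));
  console.log(`rss=${mb(rss)}MB heapTotal=${mb(total_heap_size)}MB heapUsed=${mb(used_heap_size)}MB`);
}, 5000);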

I have been working with @gireeshpunathil on this. After some investigation, I have found that this is not an issue at all, but just the way rss works. The resident memory shot up due to the large amount of activity in the program (a thousand concurrent reads), but there wasn't anything in the program / system to bring it down, i.e. the rss reduces only when the OS swaps out the pages in response to memory demands from other processes.

Here is the proof:
$ cat 21266.js

var fs = require('fs')
function report() {
  global.gc()
  console.log('after gc:')
  setInterval(() => {
    console.log(Math.round(process.memoryUsage()['rss'] / (1024 * 1024)))
  }, 10000)
}
function readFile() {
  fs.readFile('./foo.txt', (e, d) => {
    if (e) console.log(e)
  })
}
console.log(`process id: ${process.pid}`)
fs.readFile('./foo.txt', (e, d) => {
  console.log('baseline:')
  console.log(Math.round(process.memoryUsage()['rss'] / (1024 * 1024)))
  setTimeout(() => {
    for (var i = 0; i < 1000; i++) {
      readFile()
    }
    report()
  }, 10000)
})

$ cat gc.js

let list2 = []
setInterval(() => {
  let list1 = []
  // make 4K strings, aligning to the linux page size
  for (var i = 0; i < 4096; i++)
    list1.push('x'.repeat(4096))
  list2.push(list1)
  list2.forEach((list) => {
    list.forEach((item) => {
      // make sure we 'touch' every page of our memory
      item.indexOf('x')
    })
  })
}, 10)
setInterval(() => {
  console.log(Math.round(process.memoryUsage()['rss'] / (1024 * 1024)))
}, 1000)

Check the free memory on your system. If it is huge, the above program may need to be modified to create larger arrays / strings, or else you will need to wait for a longer period of time.

My system has only ~2GB free, so it is not too bad.

[root@r811 test]# cat /proc/meminfo | grep MemFree
MemFree:         1704748 kB
[root@r811 test]#

Run the first program in a terminal, and wait for the rss to grow and stabilise.

[root@r811 test]# node --expose-gc 21266
process id: 2355
baseline:
33
after gc:
128
128
128
...

Then run the second program to eat up all the free memory in the system.

[root@r811 test]# node --max-old-space-size=10000 gc
133
335
577
823
1062
1297
1521
1597
^C

When the consumption approaches the free memory limit, check the status of the first program. It would have come down!

83
2
2

Inference: your program caused the resident memory to shoot up, but the system had too much free memory to need to reclaim it.

There is no bug / misbehaviour observed in Node.
Hope this helps!
It would be great if you could validate this on your system.

Yep, the behavior you're seeing here is absolutely normal. While seeing that RSS rise and hold for so long is surprising and concerning for some, it's not the memory metric you need to watch with regards to memory leaks.
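
A minimal sketch of watching the heap metrics rather than rss when looking for a leak (run with --expose-gc so a full GC can be forced before each sample; the interval is arbitrary):

'use strict';
// A real leak shows up as heapUsed / external climbing sample after sample,
// even though rss may stay high after a burst of work either way.
setInterval(() => {
  if (global.gc) global.gc();
  const { rss, heapUsed, external } = process.memoryUsage();
  const mb = (n) => (n / (1024 * 1024)).toFixed(1);
  console.log(`rss=${mb(rss)}MB heapUsed=${mb(heapUsed)}MB external=${mb(external)}MB`);
}, 10000);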

I have the same issue with fs.writeFileSync.

Looks like this was never closed out? #11289 is a similar issue, FWIW.

@machinetherapist If you think you've found a legitimate bug, please open a new issue.

