If I try to read a big file (582,170,692 bytes, ~ 555 MB) into a buffer, it is OK. If I add an encoding and try to get a string, I get an error.
```
> require('fs').readFileSync('ru-ru_Wiki-2007-01-03.dsl').length
582170692
> require('fs').readFileSync('ru-ru_Wiki-2007-01-03.dsl', 'utf16le').length
Error: "toString()" failed
    at Buffer.toString (buffer.js:513:11)
    at Object.fs.readFileSync (fs.js:511:41)
    at repl:1:15
    at sigintHandlersWrap (vm.js:22:35)
    at sigintHandlersWrap (vm.js:96:12)
    at ContextifyScript.Script.runInThisContext (vm.js:21:12)
    at REPLServer.defaultEval (repl.js:313:29)
    at bound (domain.js:280:14)
    at REPLServer.runBound [as eval] (domain.js:293:12)
    at REPLServer.<anonymous> (repl.js:513:10)
```
It seems the string does not exceed the spec limit. Are there any other undocumented (or documented elsewhere) limits for `fs.readFileSync()` or `Buffer.toString()`?
I've found the de facto limit for the current V8: 268,435,440 characters (`Math.pow(2, 28) - 16`), i.e. 536,870,880 bytes in UTF-16.
This test code is OK:

```js
const fs = require('fs');
fs.writeFileSync('bigfile.txt', `\uFEFF${'*'.repeat(Math.pow(2, 28) - 16 - 1)}`, 'utf16le');
console.log(fs.readFileSync('bigfile.txt', 'utf16le').length);
```
If I add just one more character, it throws the error. Should this be documented somewhere?
FWIW the limit comes from here. ChakraCore uses a much different value that depends on `INT_MAX` (on my system that would be 2,147,483,646, roughly 8x larger than V8's static limit). With that in mind, I'm not sure how useful it is to document a VM-specific limit like this...
I think the docs recommendation should be (if it is not already) to use raw Buffers for any very large data. (IIRC, `toString()` on a buffer of that size is not exactly trivial?)
Just in case anyone else finds themselves at this issue from Google: I ran into this while trying to synchronously (no readline, streams, etc.) read a 400 MB JSON file line by line. As suggested, I used raw Buffers to solve this, aided by the buffer-split package.
```js
const fs = require('fs');
const bsplit = require('buffer-split');

function readLineJSON(path) {
  const buf = fs.readFileSync(path); // omitting the encoding returns a Buffer
  const delim = Buffer.from('\n');
  return bsplit(buf, delim)
    .map((x) => x.toString())
    .filter((x) => x !== '')
    .map(JSON.parse);
}
```
@vsemozhetbyt … is there anything here you’d like to see? Would you want to open a docs PR yourself?
I have no definite opinion on what should be added or how. There seems to be no consensus on whether we should document engine-specific limits, so feel free to close this until a new decision is made.
We should certainly improve the error message:

```cpp
#define SB_STRING_TOO_LONG_ERROR \
  v8::Exception::Error(OneByteString(isolate, "\"toString()\" failed"))
```
Edit: @addaleax Just noticed your comment in the code. I could not find an open issue for this, is there? Any specific reason this has not been changed yet?

> I could not find an open issue for this, is there? Any specific reason this has not been changed yet?
@tniessen No, not beyond the discussion in https://github.com/nodejs/node/pull/12765. The reason this has not been changed yet is that since it’s semver-major it would target Node 9, which gives plenty of time, and the fact that at some point we’re going to have to go through our native errors anyway to upgrade them to the new error code system. (Also, most of the ToDos from that PR might be suitable for first-time contributions from people with a C++ background.)
In which version of Node should we expect huge files to be supported?
@Extarys As per this blog post, the max string length was increased in V8 6.2, i.e. the latest Node.js LTS version (8.11.3) already supports them.
To be more exact: the new limit is mentioned in the "Increased max string length" section: `2**30 - 25` on 64-bit systems. That is 1,073,741,799 code units, or nearly 1 GB in ASCII and nearly 2 GB in UTF-16 LE (the UTF-8 limit is less predictable, but around 1 GB should be OK at least).
Thanks for this update! That will be a great help when importing big logs and the like.
Sorry to ping on a closed thread, but I can't Google my way out of asking this: how do I set `buffer.constants.MAX_STRING_LENGTH` to the new maximum? The docs say:

> `<integer>` The largest length allowed for a single string instance.
>
> Represents the largest length that a string primitive can have, counted in UTF-16 code units. This value may depend on the JS engine that is being used.

but I'm not familiar with UTF-16 code units or how to use them. Do I just write `buffer.constants.MAX_STRING_LENGTH = 2**30 - 25`? I found this blog post, which uses the `2**30...` syntax.
You can't change the `buffer.constants.MAX_STRING_LENGTH` value; it is read-only. You can only use it to retrieve information about the limit set by the JS engine.
Ah, thank you for clarifying. Just `console.log` it?

Or use it in `===`, `>`, `<` etc. comparisons.