Sheetjs: xlsx can't read huge files of 90MB

Created on 14 Jun 2018  路  12Comments  路  Source: SheetJS/sheetjs

code:
console.log(1)
const workbook = XLSX.read(buffer, {
type: "buffer",
WTF: true
// cellDates: 'd'
});
console.log(2)

buffer size 24MB , xlsx parse buffer fine and log 1 and 2. but buffer size 90MB, xlsx cant't read buffer and only log 1 , it likes to stop at parsing xlsx files , and don't throw an error.

Most helpful comment

You can use this library to read large XLSX files - https://github.com/Claviz/xlstream

All 12 comments

i firstly read .xlsx file into buffer .

can you provide code ?

    const readable = fs.createReadStream('myExcel.xls');
    process_RS( readable, writeToDatabase);

    function process_RS(stream,callback) {

    var buffers = [];
    stream.on("data", function (data) {
         buffers.push(data);
    });

    stream.on("end", function(){

        let buffer = Buffer.concat(buffers);
        const workbook = XLSX.read(buffer, {
              type: "buffer",
             WTF: true
        });

        const sheetNames = workbook.SheetNames;
        const worksheet = workbook.Sheets[sheetNames[0]];
        const result = XLSX.utils.sheet_to_json(worksheet);
        callback(result);
  });
}

i want to parse excel file to json then i write it to database( influxdb ). i have 97 files , i try many times, if file size is not huge , my program runs normally and my database can get data. but if file size is huge like 50MB or larger, my program will stop at " const workbook = XLSX.read(buffer, {type: "buffer",WTF: true);".

i set node --max_old_space_size=8000.

This is Because , You are reading data is chunks which is true , But again reading whole Buffer at once at the Stream.on('end') which is equals to Normal file Reading , Means Not Support to huge files.
Try to do all this things in stream.on('data') function.

But the API document explain there is no Streaming Read API and say : 'The most common and interesting formats (XLS, XLSX/M, XLSB, ODS) are ultimately ZIP or CFB containers of files. Neither format puts the directory structure at the beginning of the file: ZIP files place the Central Directory records at the end of the logical file, while CFB files can place the storage info anywhere in the file! As a result, to properly handle these formats, a streaming function would have to buffer the entire file before commencing. That belies the expectations of streaming, so we do not provide any streaming read API.'

and Official gives a example code When dealing with Readable Streams as below:

    var fs = require('fs');
    var XLSX = require('xlsx');
    function process_RS(stream/*:ReadStream*/, cb/*:(wb:Workbook)=>void*/)/*:void*/{
    var buffers = [];
    stream.on('data', function(data) { buffers.push(data); });
    stream.on('end', function() {
    var buffer = Buffer.concat(buffers);
    var workbook = XLSX.read(buffer, {type:"buffer"});

    /* DO SOMETHING WITH workbook IN THE CALLBACK */
    cb(workbook);
  });
}

i think we can't parse data in stream.on('data') , js-xlsx can't get the excel header without whole file. mybe my thought is wrong.

my node version is v8.11.1, js-xls version is 0.8.22,my computer has 8GB memory .
my code did can't deal with huge files such as 90MB or larger, if someone has a good proach.

Long story short: the document-based approach taken here does not scale to very large files.

@dreamFlyingCat as stated in that comment, XLSX is a ZIP-based format. So to get at the data, we need to perform an unzip operation and have random access to the individual files within the ZIP (it's possible to stream each individual file, but we need to be able to process them in a specific order, starting from the manifest file, in order to properly understand the spreadsheet). Any sort of alleged streaming-based tool that works in the browser has to do a level or two of buffering behind the scenes.

There are insurmountable limits (a 90MB file can sometimes hide multi-GB files) that would require platform-specific features and would break compatibility with older browsers and environments.

We offer a (not currently open source) different approach in a Pro version performance build which has node-specific features (offloading hard work to a child process and processing row by row), sacrificing compatibility for performance.

Same issue on my side - opening a large XLSX file (110mb) are causing this error:
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

How we can solve it in NodeJS?

Long story short: the document-based approach taken here does not scale to very large files.

@dreamFlyingCat as stated in that comment, XLSX is a ZIP-based format. So to get at the data, we need to perform an unzip operation and have random access to the individual files within the ZIP (it's possible to stream each individual file, but we need to be able to process them in a specific order, starting from the manifest file, in order to properly understand the spreadsheet). Any sort of alleged streaming-based tool that works in the browser has to do a level or two of buffering behind the scenes.

There are insurmountable limits (a 90MB file can sometimes hide multi-GB files) that would require platform-specific features and would break compatibility with older browsers and environments.

We offer a (not currently open source) different approach in a Pro version performance build which has node-specific features (offloading hard work to a child process and processing row by row), sacrificing compatibility for performance.

@SheetJSDev I have tried to reach you about this PRO version (http://sheetjs.com/pro) sending an email, but I had no answer. How to get it?

You can use this library to read large XLSX files - https://github.com/Claviz/xlstream

@jansivans i tried the library its showing fs.open is not a function ,my file path is correct but still im facing the error
do you have any solution using xlsx npm package ?

@rajatjasuja23 please open issue there and provide some steps how to reproduce your issue.

im reading 20k rows using this package in react js.
file size is 12mb im just having one sheet in excel
the browser will crash after some time
How we can solve it in react JS?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dullin picture dullin  路  3Comments

jamespan0 picture jamespan0  路  3Comments

magtuan picture magtuan  路  3Comments

sangpuion picture sangpuion  路  3Comments

lxzhh picture lxzhh  路  3Comments