Sheetjs: How do I read a CSV file?

Created on 29 Oct 2016  Â·  20Comments  Â·  Source: SheetJS/sheetjs

I want to read a CSV file and convert it to JSON. How do I do it with XSLX?

Most helpful comment

+1, annoying to have to use two libs.

All 20 comments

+1

+1

This would be really handy. XLSX.utils.csv_to_json

now I am using csv.js (called comma-separated-values in npm). it works

I am using papaparse for now.

On Thu, Dec 15, 2016 at 2:40 PM, arida notifications@github.com wrote:

now I am using csv.js (called comma-separated-values in npm). it works

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/SheetJS/js-xlsx/issues/489#issuecomment-267275503,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ADTCHdSzxsRn8MltqvZKyGIuh3_wL-9Eks5rIQQHgaJpZM4KkJ4C
.

+1, annoying to have to use two libs.

CSV is a funny topic. RFC 4180 covers CSV but disagrees with Excel in a few important ways. Given that Excel predates the RFC by nearly 2 decades, I'm surprised the authors didn't make a more concerted effort to agree with Excel. For example, consider the following file:

sep=|
=1+1|2|3
=A1+B1+C1|4|5
=A2+B2+C2|6|7
s,h|e,e|t,js
="1"|="2"|="3"

Most JS CSV parsers follow the spec, so you will get strange results. This example highlights a few issues:

  • The sequence sep=<char> as the first line lets you specify the field separator. Excel does not include this line in the content, whereas an RFC compliant parser would treat that as a one-field row in the file
  • Excel permits formulae in the CSV stream. The =1+1 is calculated and the result is displayed. Excel further allows for formulae to reference other cells.

There are other nonobvious issues like codepage encoding (the mac default is codepage 10000 whereas windows default is 1252) and BOM handling.

I agree that it would be nice to also have support for CSV in this library, but the first step would be to understand exactly how Excel handles CSV files. Stay tuned.

We added a first cut in version 0.9.9 and will slowly improve it in future versions

I see that it looks like I can parse a CSV, but is there a way that I can tell XLSX the delimiter? The file I am using is ; delimited instead of , .

@jnystrom The same as you would in Excel: prepend the data with sep=;\n. In absence of the header, the parser counts the number of exposed semicolons, commas, and tab characters in the beginning of the file to guess the likely delimiter.

I dont have control over the data. This is a file that just a CSV, but separated by semi colons instead of commas. I tried running the file as is, and used sheet_to_json, and it is putting the first row (column names) as the name of the json property, and the entire row of data (with semi colons) as the value to that property. I tried changing to delimit with commas and it seems to work correctly. I have no control of the file, any ideas?

Thanks, that worked. I am using this in Node, and using buffers. My columns do have " and so do the values. Again, this all works if I change it all to be , instead of ; delimited.

Hmm https://runkit.com/embed/3efd6o91zh2j same works in node. Can you share the offending file?

I cannot share the file, but i will try to create a similar one. I did see yours worked. One difference with your code. I am initially unzipping a file, and that file is being processed by streams. I used the sample in this repo to convert stream into buffers:

const buffers = [];
stream.on("data", (data) => {
                buffers.push(data);
            });
            stream.on("end", async () => {
                console.log("got data at end", buffers.length);
                const buffer = Buffer.concat(buffers);

                const workbook = XLSX.read(buffer, { type: "buffer" });
                const sheet = workbook.Sheets[workbook.SheetNames[0]];
                const parsedData = XLSX.utils.sheet_to_json(sheet, {raw: true});
                console.log("parsed Data:", parsedData);
                console.log(`Time it took to parse: ${(Date.now() - t0)}`);
            });

@reviewher here is an example file:
https://1drv.ms/u/s!AgFOdN8fF2TMhqYscT1etb1kshMYpg

Hmm checking against http://oss.sheetjs.com/js-xlsx/ (use the input form to submit the file or drag and drop) it seems to generate the correct output:

Using the xlsx command that ships with a global install of the library, the generated XLSX file looks correct:

$ npm install -g xlsx
$ xlsx -X forPublic.csv

forPublic.csv.xlsx

I am very confused now. I am getting this, when I run it from my program:

pasted image at 2017_12_04 11_26 am

But when I change everything to , it works as expected.

You are probably running an older version -- the semicolon detection was contributed by @duzun and landed in 0.11.6. Can you update and check?

DAMN...you are right. That did seem to fix the issue with that file. THANK YOU!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

lxzhh picture lxzhh  Â·  3Comments

m-ketan picture m-ketan  Â·  3Comments

happy0088 picture happy0088  Â·  3Comments

Sankrish picture Sankrish  Â·  4Comments

jamesbillinger picture jamesbillinger  Â·  4Comments