Sheetjs: Wrongly parsing csv string cell as date

Created on 3 Jan 2018 · 11Comments · Source: SheetJS/sheetjs

When reading a CSV string, some fields that should be parsed as strings are parsed as dates.

This happens when the string ends with something that resembles a date.

Use the following snippet to reproduce.

var foo = 'foo,bar\nfoo2,bar2\n"Wololo thing - ends Feb. 15, 2017","Wololo thing - ends Feb. 15, 2017 bar"';
var w = XLSX.read(foo, {type: "string", cellDates: true});
console.log(w.Sheets.Sheet1);

Expected result: the third row contains 2 strings:

{t: "s", v: "Wololo thing - ends Feb. 15, 2017"}
{t: "s", v: "Wololo thing - ends Feb. 15, 2017 bar"}

Actual result: the third row contains

{t: "d", v: Wed Feb 15 2017 00:00:00 GMT-0200 (-02), w: "2/15/17"}
{t: "s", v: "Wololo thing - ends Feb. 15, 2017 bar"}

Dates Read Bug

Source

filipelbc

👍8

Most helpful comment

I'm using xlsx.read(foo, { type: 'string', raw:true }); and I'm getting the dates in the expected plain text format

speedofsound99 on 22 May 2018

🎉8 👍6

All 11 comments

For plaintext formats like CSV and HTML, the current approach for determining the data type is testing whether the string represents a valid Date and whether the month is mentioned. V8 (chrome/nodejs) is extremely lenient in this regard. You can test in the JS console:

> new Date("Wololo thing - ends Feb. 15, 2017")
Wed Feb 15 2017 00:00:00 GMT-0500 (EST)

Realistically, we'll have to pick a small set of explicit formats to test against.

SheetJSDev on 4 Jan 2018

@brollins90 the plaintext (HTML/DSV) date resolution is in the fuzzydate function. At a high level, a string is converted to a Date and the function checks for some known corner cases. It also inspects the string to see if it's likely to be a date.

The guess is trickier because Excel lets you mix text with date formats. For example, the string Today is Jan 25 2018 can be reproduced by setting a cell to 43125 with format "Today is "mmm dd yyyy. Should the string be considered a date or a string?

SheetJSDev on 25 Jan 2018

For example, the string Today is Jan 25 2018 can be reproduced by setting a cell to 43125 with format "Today is "mmm dd yyyy. Should the string be considered a date or a string?

This seems crazy to me! Keep it as a string of course!!!

This behavior should be disabled on request. I'm importing a CSV (tried with xls too) to perform corrections and it keeps on converting values like "VIA MARTUCCI 2" to the date "3/2/01", I tried using raw: true but it simply uses the number "36952". Only with xlsx files it doesn't convert these values.

mcdado on 14 Feb 2018

👍9

I am having the same issue here. A column with IDs like "0002-001-1" gets imported as "2/1/01". I also tried using "raw: true" but I get a weird value: "36892.041666666664"

Is there no way around this?

johnlop on 18 May 2018

👍1

I'm using xlsx.read(foo, { type: 'string', raw:true }); and I'm getting the dates in the expected plain text format

speedofsound99 on 22 May 2018

🎉8 👍6

I'm running into this problem when using sheet_to_json and it doesn't appear that I can override to indicate to just bring in the string and not attempt to type it as a n or date. Parses the string fine for xls/xlsx files, but when I do a csv it casts to n and takes Dec 20 => 12/20/2001.

creedmangrum on 13 Mar 2019

👍1

To those who are getting weird number for the date row when using sheet_to_json function,
remember to include the raw: false option in sheet_to_json

const workbook = excel.readFile(file.path);

const sheet_name_list = workbook.SheetNames;
const json = excel.utils.sheet_to_json(workbook.Sheets[sheet_name_list[0]], {
  raw: false,
});

d30jeff on 2 Apr 2019

👍4 👎1

raw: false

solved for this

sibelius on 30 Jan 2020

👍2 👎1

@SheetJSDev there seems to be a lot of issues with dates and the cellDates option. I think the major issue here is that cellDates causes sheetjs to try and parse everything as a date, including guessing for string cells.

I think this behaviour is wrong. If we're reading in files with string dates (especially CSV files) then it's likely that we know best how to parse them, so leave them alone. But if we're reading in an xlsx file then we need sheetjs to convert that stupid number format into a date for us.

We only have two options at the moment: parse nothing or parse everything. But what we want is to parse everything that's clearly marked as a date (in the type information, not by any guessing), and leave everything else alone.

This should be the default behaviour. If we want sheetjs to also try to parse strings into dates we should have to specify an option for that (although I don't see the point of complicating sheetjs with such an option when it's easier to just do this afterwards). If we want to get the raw values for those silly number dates, then we should have to specific an option for that (either raw or a separate option).

There are two key points here:

SheetJS should always stick to the specified type. If it's typed as a string, leave it as a string. If it's typed as a number, leave it as a number. If there is no type information, leave it alone! Only if it's specifically typed as a date should any conversion be done.
We shouldn't need to call extra utility functions like xlsx.SSF.parse_date_code unless there was something wrong with the type information or we specifically requested raw mode.

Edit: Apparently this guessing happens even with cellDates: false. You have to set raw: true to get it to stop guessing, but then you get strings for everything, including numbers (when parsing a CSV file)

SystemParadox on 7 May 2020

Excel does its own date and value parsing from CSV. To confirm this, save the following to a CSV file and open in Excel:

2020-05-07,=YEAR(INDIRECT("A"&ROW()))
="2020-05-07",=TYPE(INDIRECT("A"&ROW()))
"2020-05-07",=TYPE(INDIRECT("A"&ROW()))
2020,=TYPE(INDIRECT("A"&ROW()))
FALSE,=TYPE(INDIRECT("A"&ROW()))
=NA(),=TYPE(INDIRECT("A"&ROW()))

Cells A1 and A3 are interpreted as dates, while cell A2 is kept as text. In fact, you'll notice that the expressions starting with "=" are actually interpreted as formulae rather than text!

raw: true lets you skip the whole value parsing logic, giving you the original text that you are free to interpret however you wish.

The root of the problem in this issue is the flexibility of V8 in date parsing. Consider:

new Date("How many days does it take for Mars to go around the sun? 687")

We can probably agree that string isn't a date, but V8 thinks it is Tuesday March 1 in the year 687

SheetJSDev on 7 May 2020

Ok so with raw:true, cellDates: true it seems to work how I expect, with the caveat that I also have to parse numbers myself.

I'm really confused by the folks saying they have to use raw: false - that's surely the opposite of what you want here?

SystemParadox on 24 May 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Is it possible to format a column with a currency when using json_to_sheet?

DannyRyman · 3Comments

stream.to_json never ends

Alex0007 · 3Comments

Webpack jszip warning

jamesbillinger · 4Comments

Generate excel with styles in javascript

magtuan · 3Comments

XLSX.utils.sheet_to_json returning empty array

thomasledoux1 · 3Comments