Creating this issue as recommended in #832.
Here is a jsfiddle to witness the problem: https://jsfiddle.net/gtt2srcu/
I have some problems with the datenum function, not behaving correctly when the v is not in the same timezone offset as the current time (ex. current time has DST offset and v does not have DST offset). After playing around with the code, I believe the offsetting needs to be dependent on the v value instead of current time, like this:
var basedate = new Date(1899, 11, 30, 0, 0, 0);
var basetimestamp = basedate.getTime();
var baseoffset = basedate.getTimezoneOffset();
function datenum(v/*:Date*/, date1904/*:?boolean*/)/*:number*/ {
var epoch = v.getTime();
if(date1904) epoch -= 1462*24*60*60*1000;
// Account for offset differences between the basedate and v...
var adj = (v.getTimezoneOffset() - baseoffset) * 60000;
return (epoch - basetimestamp - adj) / (24 * 60 * 60 * 1000);
}
This is because if basedate and v have a different offset, we need to compensate this effect by applying the difference on the numerical value provided to excel. The offset of the current time does not seem relevant.
It's possible that the numdate function suffers from the same problem, but I haven't digged through it.
It's actually apparent from reading raw files. Remove the .zip extension from the linked files:
$ node -pe "require('xlsx').readFile('dates.xlsx', {cellDates:true}).Sheets.Sheet1.A2"
{ t: 'd', v: 2017-02-01T04:00:00.000Z, w: '2017-02-01 00:00:00' }
$ node -pe "require('xlsx').readFile('dates.xlsb', {cellDates:true}).Sheets.Sheet1.A2"
{ t: 'd', v: 2017-02-01T05:00:00.000Z, w: '2017-02-01 00:00:00' }
@ebmike Looking back at the other issue, I don't think the concern was properly explained. There's a fundamental complexity -- transitioning between Excel's concept of local time and JS concept of universal time -- that has to be confronted somewhere. This isn't an issue when dealing with a single timezone, but you have situations where clients in Los Angeles are downloading data from servers in New York. There are three logical paths forward:
Placing the complexity in the raw JS object is arguably the best, since it's neatly tucked away from most use cases, but it complicates code working directly with the worksheet objects. (This is my personal preference)
Placing the complexity in utility interfaces like aoa_to_sheet (more knobs to control what types of Dates you are passing)
Forget Excel's behavior and just treat dates as UTC throughout. This would push timezone awareness to client code and possibly break expectations for what people will see in Excel, but arguably make for the simplest user code.
Thoughts?
I just want to underline that my main use case for now is writing to workbooks (a data export feature). The data I'm working with is provided by the application as a javascript Date instance, that is essentially a timestamp.
As expected, there are cases where sheetjs cannot infer my intentions when converting the Date objects. The library needs to convert the value to a local time, but it has many choices:
Date object is aware of timezone offsets and daylight saving time rules, so we are able to find the numerical value representing a specific timestamp in this timezone. If the recipient of the Excel file is in the same timezone as the machine that generates the file, there shouldn't be any surprises.Date can be offset to utc (by reversing the timezone + DST offsets). This approach would always produce the same output on any machine, but the recipient of the Excel file needs to know dates are in UTC.Date object could be translated to a local time in another timezone. This would allow us to generate a file for a remote user (Los Angeles vs. New York case). I don't expect sheetjs to do this kind of conversion out-of-the-box.I believe this library has made a sensible choice of defaulting to the first approach by default (ex. when using utilities like aoa_to_sheet()). When either the second or third approach is needed, the developer needs to work directly with the cell object and calculate the numerical value in the appropriate timezone.
Just to recap, my issue here is that I think I found a flaw in the current default behavior (first approach), where the DST offset is not taken into account properly.
@SheetJSDev Maybe add a third date "type" which is just an iso-date string. Not everyone needs a "time" on their dates. In fact, for our code, every datetime we encounter is broken into 2 excel columns (for date and time).
To respond to the server timezone vs client timezone comment above.... yes, thats the problem we were encountering that become much harder after the change noted in #832. I did not bring it up in the issue because I didn't think it was relevant at the time, and the issues noted in #832 would be obvious enough.
I believe I'm also being impacted by this.
I'm using XLSX.utils.table_to_book on an HTML table with dates expressed in plain text in cells as MM/DD/YYYY strings. When saving the file to XLS, the date string value is converted to a number which can be formatted in Excel as a date (but that formatting is not there initially). As an example, 10/27/2017 in my HTML table is converted to 43034.95833 when the expected value should be 43035.
After assigning the Short Date format in Excel on 4304.95833, Excel reports 10/26/2017 11:00:00 PM
When saving the table to XLSX, the columns including dates automatically have the Short Date formatting applied. And the date value in Excel is the same as above: 10/26/2017 11:00:00 PM
In all cases, the date is technically off by one hour, but only needing MM/DD/YYYY, the net effect is that all dates are off by a full day.
I think I run into the same issue with numdate, when I read a file with cellDates: true.
In my xlsx I have the following date: 01/08/18 (JJ/MM/AA), which is converted to "Wed Aug 01 2018 01:00:00 GMT+0200 (CEST)".
But when I instanciate a new date manually: new Date(2018, 7, 1), I have "Wed Aug 01 2018 00:00:00 GMT+0200 (CEST)", which actually is the result I expect.
@SheetJSDev Maybe add a third date "type" which is just an iso-date string. Not everyone needs a "time" on their dates. In fact, for our code, every datetime we encounter is broken into 2 excel columns (for date and time).
To respond to the server timezone vs client timezone comment above.... yes, thats the problem we were encountering that become much harder after the change noted in #832. I did not bring it up in the issue because I didn't think it was relevant at the time, and the issues noted in #832 would be obvious enough.
@SheetJSDev This is actually an interesting question. Didn't find a topic about this; is this discussed somewhere? Actually a iso-date string type sounds really useful to me. I went back and used strings instead of dates, because the value of the cell would still always contain the time.
edit: well, It looks like if you set the time to 00:00:00 the time is omitted. So the only thing you need to keep in mind, that the date after sheetjs is done parsing has the time 00:00:00
Having same problem as @pmalouin - I download some time series data from server, present it to user displaying dates in local time, and provide the option for the user to "download" the data as xlsx format. In the resulting xlsx dates are shifted by the difference between the current time zone offset and the time zone offset of the date itself.
@pmalouin's solution worked for me.
@SheetJSDev I am also impacted by this. I cannot read my dates.
"2019-1-31" in excel becomes "2019-1-30 23:00:00" when I read it from node.
I think this library is interpreting the "local" date as CET (GMT-1h) but it is now CEST(GMT-2).
I agree with previous statements that this is a very complex issue: dates are simply broken.
And it is all coming from Excel. The internal type they use is simply incomplete.
My thought is that the current approach, even if it managed to work out properly the actual "local" timezone including DST issues it might not fit all users.
It is not unthinkable that someone might want to open the excel in the timezone in which it was produced. Think, for example, consolidation of billing events in a global company. It does not matter where you open the excel, the important thing is were it was created.
I think it should be possible to pass a parsing parameter with the timezone to use for dates in the spreadsheet. If not provided, the local timezone should be used.
This is more or less what we have here, but would also allow to manually fix some edge cases, including specific timezone bugs.
Buy the way, I believe the EU is considering now ditching the whole summer time:
So, this kind of issue is going to be difficult to maintain.
Any news with this issue? i think #1333 should be merged
Most helpful comment
@SheetJSDev Maybe add a third date "type" which is just an iso-date string. Not everyone needs a "time" on their dates. In fact, for our code, every datetime we encounter is broken into 2 excel columns (for date and time).
To respond to the server timezone vs client timezone comment above.... yes, thats the problem we were encountering that become much harder after the change noted in #832. I did not bring it up in the issue because I didn't think it was relevant at the time, and the issues noted in #832 would be obvious enough.