Hi
Something is wrong when trying to save a dataframe with tz-aware timestamps to xlsx, using pd.Dataframe.to_excel:
df = pd.DataFrame([1], index=[pd.Timestamp('2014-05-02', tz='CET')])
df.to_excel('test.xlsx')
yields an exception on my system:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-4e18a4be2a71> in <module>()
----> 1 df.to_excel('test.xlsx')
/home/silvio/prod34/lib/python3.4/site-packages/pandas/core/frame.py in to_excel(self, excel_writer, sheet_name, na_rep, float_format, cols, header, index, index_label, startrow, startcol, engine, merge_cells)
1202 formatted_cells = formatter.get_formatted_cells()
1203 excel_writer.write_cells(formatted_cells, sheet_name,
-> 1204 startrow=startrow, startcol=startcol)
1205 if need_save:
1206 excel_writer.save()
/home/silvio/prod34/lib/python3.4/site-packages/pandas/io/excel.py in write_cells(self, cells, sheet_name, startrow, startcol)
771 wks.write(startrow + cell.row,
772 startcol + cell.col,
--> 773 cell.val, style)
774
775 def _convert_to_style(self, style_dict, num_format_str=None):
/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in cell_wrapper(self, *args, **kwargs)
55 if len(args):
56 int(args[0])
---> 57 return method(self, *args, **kwargs)
58 except ValueError:
59 # First arg isn't an int, convert to A1 notation.
/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in write(self, row, col, *args)
374 # Write datetime objects.
375 if isinstance(token, date_types):
--> 376 return self.write_datetime(row, col, *args)
377
378 # Write number types.
/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in cell_wrapper(self, *args, **kwargs)
55 if len(args):
56 int(args[0])
---> 57 return method(self, *args, **kwargs)
58 except ValueError:
59 # First arg isn't an int, convert to A1 notation.
/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in write_datetime(self, row, col, date, cell_format)
666
667 # Convert datetime to an Excel date.
--> 668 number = self._convert_date_time(date)
669
670 # Add the default date format.
/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in _convert_date_time(self, dt_obj)
3265 def _convert_date_time(self, dt_obj):
3266 # Convert a datetime object to an Excel serial date and time.
-> 3267 return datetime_to_excel_datetime(dt_obj, self.date_1904)
3268
3269 def _options_changed(self):
/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/utility.py in datetime_to_excel_datetime(dt_obj, date_1904)
577
578 # Convert a Python datetime.datetime value to an Excel date number.
--> 579 delta = dt_obj - epoch
580 excel_time = (delta.days
581 + (float(delta.seconds)
/home/silvio/prod34/lib/python3.4/site-packages/pandas/tslib.cpython-34m.so in pandas.tslib._Timestamp.__sub__ (pandas/tslib.c:11918)()
TypeError: can't subtract offset-naive and offset-aware datetimes
Using Python 3.4.0 on Arch Linux, list of installed packages:
~% pip freeze
Cython==0.20.1
Jinja2==2.7.2
MarkupSafe==0.21
Pygments==1.6
SQLAlchemy==0.9.4
XlsxWriter==0.5.3
ecdsa==0.11
ipdb==0.8
ipython==2.0.0
ipython-sql==0.3.1
lxml==3.3.5
matplotlib==1.3.1
mysql-connector-python==1.1.6
nose==1.3.1
numpy==1.8.1
openpyxl==1.8.5
pandas==0.13.1
paramiko==1.13.0
patsy==0.2.1
prettytable==0.7.2
psycopg2==2.5.2
pycrypto==2.6.1
pyodbc==3.0.7
pyparsing==2.0.2
python-dateutil==2.2
pytz==2014.2
requests==2.2.1
scipy==0.13.3
six==1.6.1
sqlparse==0.1.11
statsmodels==0.6.0
tornado==3.2
xlrd==0.9.3
cc @jmcnamara
does excel support tz's in dates even?
not sure what this should do
does excel support tz's in dates even?
No. Excel doesn't have any support for timezones.
I don't really know how this should be handled.
ok....maybe should just report a better error for now
@sboehler what do you think it should do?
I'd expect either localized or UTC timestamps (both without tz information)
to be written to the file. Personally I vote for localized timestamps, as
the user by localizing his series explicitly opted for a certain timezone,
and this should be respected. If one wanted UTC output, a simple
tz_localize of the data before writing out solved the issue.
Agree?
2014-05-06 18:36 GMT+02:00 jreback [email protected]:
ok....maybe should just report a better error for now
@sboehler https://github.com/sboehler what do you think it should do?
—
Reply to this email directly or view it on GitHubhttps://github.com/pydata/pandas/issues/7056#issuecomment-42326246
.
that makes sense, so choices I think are:
convert_timezones='local'|'utc' keyword; works but is 'silent', and could be ambiguous if you have multiple-columns/indicies that may need conversionor:
The global "convert_timezones" argument is problematic with several columns of timestamps - a user might want to output multiple columns with local timestamps and UTC timestamps to address Excels lack of tz handling.
ok...
@sboehler so this is pretty straightfoward....PR?
I just noticed, the bug does not occur when I uninstall xlsxwriter! Pandas must be internally choosing the xlsx engine, depending on what's available in terms of packages, and use something else if xlsxwriter is not present.
The default seems to be to output the localized timestamp, omitting the timezone info.
@jreback As you can see from my github profile I have no experience with developing pandas, but I am happy to give it a go. This could take a few days, however...
you can specify engine='openpyxl' to get the same behavior (yes , the default is xlsxwriter as its much faster)
give it a go and let us know any issues
from #7060
That sounds right, but then the error message is not informative and it should still be possible to
output the dates as a text string with tz offset, the same way they appear when they output to the
terminal. MIgrating pandas types into excel types when outputting is nice but it's not a requirement.
It's not only a xlsxwriter problem actually, xlwt and xlsxwriter fail with this error and openpyxl
creates the file but throws away the timezone information so it actually silently corrupts the data.
This is a little broken all around.
to clarify here, @kay1793 what would an API for this be like?
e.g. need to have 2 ways of converting a timezone-aware Series/Index (e.g. raise if the underlying engine does not support, which most/all doen't ATM, or stringify to a isoformat).
I can't say about the API, stringify to iso would work perfect for my needs and can still be sorted. Coercing to UTC would also be ok if there was a big fat warning came with it.
using
engine='openpyxl'
creates the excel file. But all times have the same hour:
_01:00:00_
Could be an intermediate solution to add the warning and a note on the docs?
Hello everyone,
delta = dt_obj - epoch
A solution to this bug would be to localize epoch to the timezone of dt_obj, so that the timedelta is correct whatever the time zone is, and time-zoned datetimes can be handled, even if there's no notion of timezone in the resulting xlsx file.
what do you think about this ?
(this bug also appears with normal columns that are not indexes)
@misterjoa you actually just want _naive_ zones, IOW, .tz_localize(None).
I maybe _could_ see an option .to_excel(...., tz='local|utc') which would either by default localize or .tz_convert('UTC').tz_localize(None) for tz-aware (which we typically do in other situations, e.g. when going to a numpy array).
@jreback
thank you for your answer.
In fact, I want the user of an excel file to be shown the correct date and time in an arbitrary time zone.
To do so and avoid the current issue, I do the following before exporting to xlsx using xlsxwriter, using a column with strings containing UTC timestamp data :
(Paris is the target zone)
df['quote_calculation_time'] = pd.to_datetime(
df['quote_calculation_time'],utc=True) \
.dt.tz_localize('utc') \
.dt.tz_convert('Europe/Paris') \
.dt.tz_localize(None)
So having an option like you say would be really great
Most helpful comment
@jreback
thank you for your answer.
In fact, I want the user of an excel file to be shown the correct date and time in an arbitrary time zone.
To do so and avoid the current issue, I do the following before exporting to xlsx using xlsxwriter, using a column with strings containing UTC timestamp data :
(Paris is the target zone)
So having an option like you say would be really great