Pandas: to_excel can't handle index with time zones

Created on 6 May 2014  Â·  17Comments  Â·  Source: pandas-dev/pandas

Hi

Something is wrong when trying to save a dataframe with tz-aware timestamps to xlsx, using pd.Dataframe.to_excel:

df = pd.DataFrame([1], index=[pd.Timestamp('2014-05-02', tz='CET')])
df.to_excel('test.xlsx')

yields an exception on my system:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-4e18a4be2a71> in <module>()
----> 1 df.to_excel('test.xlsx')

/home/silvio/prod34/lib/python3.4/site-packages/pandas/core/frame.py in to_excel(self, excel_writer, sheet_name, na_rep, float_format, cols, header, index, index_label, startrow, startcol, engine, merge_cells)
   1202         formatted_cells = formatter.get_formatted_cells()
   1203         excel_writer.write_cells(formatted_cells, sheet_name,
-> 1204                                  startrow=startrow, startcol=startcol)
   1205         if need_save:
   1206             excel_writer.save()

/home/silvio/prod34/lib/python3.4/site-packages/pandas/io/excel.py in write_cells(self, cells, sheet_name, startrow, startcol)
    771                 wks.write(startrow + cell.row,
    772                           startcol + cell.col,
--> 773                           cell.val, style)
    774 
    775     def _convert_to_style(self, style_dict, num_format_str=None):

/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in cell_wrapper(self, *args, **kwargs)
     55             if len(args):
     56                 int(args[0])
---> 57             return method(self, *args, **kwargs)
     58         except ValueError:
     59             # First arg isn't an int, convert to A1 notation.

/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in write(self, row, col, *args)
    374         # Write datetime objects.
    375         if isinstance(token, date_types):
--> 376             return self.write_datetime(row, col, *args)
    377 
    378         # Write number types.

/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in cell_wrapper(self, *args, **kwargs)
     55             if len(args):
     56                 int(args[0])
---> 57             return method(self, *args, **kwargs)
     58         except ValueError:
     59             # First arg isn't an int, convert to A1 notation.

/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in write_datetime(self, row, col, date, cell_format)
    666 
    667         # Convert datetime to an Excel date.
--> 668         number = self._convert_date_time(date)
    669 
    670         # Add the default date format.

/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in _convert_date_time(self, dt_obj)
   3265     def _convert_date_time(self, dt_obj):
   3266         # Convert a datetime object to an Excel serial date and time.
-> 3267         return datetime_to_excel_datetime(dt_obj, self.date_1904)
   3268 
   3269     def _options_changed(self):

/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/utility.py in datetime_to_excel_datetime(dt_obj, date_1904)
    577 
    578     # Convert a Python datetime.datetime value to an Excel date number.
--> 579     delta = dt_obj - epoch
    580     excel_time = (delta.days
    581                   + (float(delta.seconds)

/home/silvio/prod34/lib/python3.4/site-packages/pandas/tslib.cpython-34m.so in pandas.tslib._Timestamp.__sub__ (pandas/tslib.c:11918)()

TypeError: can't subtract offset-naive and offset-aware datetimes

Using Python 3.4.0 on Arch Linux, list of installed packages:

~% pip freeze
Cython==0.20.1
Jinja2==2.7.2
MarkupSafe==0.21
Pygments==1.6
SQLAlchemy==0.9.4
XlsxWriter==0.5.3
ecdsa==0.11
ipdb==0.8
ipython==2.0.0
ipython-sql==0.3.1
lxml==3.3.5
matplotlib==1.3.1
mysql-connector-python==1.1.6
nose==1.3.1
numpy==1.8.1
openpyxl==1.8.5
pandas==0.13.1
paramiko==1.13.0
patsy==0.2.1
prettytable==0.7.2
psycopg2==2.5.2
pycrypto==2.6.1
pyodbc==3.0.7
pyparsing==2.0.2
python-dateutil==2.2
pytz==2014.2
requests==2.2.1
scipy==0.13.3
six==1.6.1
sqlparse==0.1.11
statsmodels==0.6.0
tornado==3.2
xlrd==0.9.3
API Design Bug Error Reporting IO Excel Timezones

Most helpful comment

@jreback

thank you for your answer.

In fact, I want the user of an excel file to be shown the correct date and time in an arbitrary time zone.

To do so and avoid the current issue, I do the following before exporting to xlsx using xlsxwriter, using a column with strings containing UTC timestamp data :

(Paris is the target zone)

df['quote_calculation_time'] = pd.to_datetime(
    df['quote_calculation_time'],utc=True) \
    .dt.tz_localize('utc') \
    .dt.tz_convert('Europe/Paris') \
    .dt.tz_localize(None)

So having an option like you say would be really great

All 17 comments

cc @jmcnamara

does excel support tz's in dates even?

not sure what this should do

does excel support tz's in dates even?

No. Excel doesn't have any support for timezones.

I don't really know how this should be handled.

ok....maybe should just report a better error for now

@sboehler what do you think it should do?

I'd expect either localized or UTC timestamps (both without tz information)
to be written to the file. Personally I vote for localized timestamps, as
the user by localizing his series explicitly opted for a certain timezone,
and this should be respected. If one wanted UTC output, a simple
tz_localize of the data before writing out solved the issue.

Agree?

2014-05-06 18:36 GMT+02:00 jreback [email protected]:

ok....maybe should just report a better error for now

@sboehler https://github.com/sboehler what do you think it should do?

—
Reply to this email directly or view it on GitHubhttps://github.com/pydata/pandas/issues/7056#issuecomment-42326246
.

that makes sense, so choices I think are:

  • raise with helpful message (unambiguous)
  • provide a convert_timezones='local'|'utc' keyword; works but is 'silent', and could be ambiguous if you have multiple-columns/indicies that may need conversion
  • could do a local translation and add another column with the tz

or:

  • local translation without tz-information, relying on Excel users to know what they're doing (which is generally assumed, given Excel's poor support for date/time handling). If Excel is ever tz-aware, the obvious migration is to add back the tz information.

The global "convert_timezones" argument is problematic with several columns of timestamps - a user might want to output multiple columns with local timestamps and UTC timestamps to address Excels lack of tz handling.

ok...

@sboehler so this is pretty straightfoward....PR?

I just noticed, the bug does not occur when I uninstall xlsxwriter! Pandas must be internally choosing the xlsx engine, depending on what's available in terms of packages, and use something else if xlsxwriter is not present.

The default seems to be to output the localized timestamp, omitting the timezone info.

@jreback As you can see from my github profile I have no experience with developing pandas, but I am happy to give it a go. This could take a few days, however...

you can specify engine='openpyxl' to get the same behavior (yes , the default is xlsxwriter as its much faster)

give it a go and let us know any issues

from #7060

That sounds right, but then the error message is not informative and it should still be possible to   
output the dates as a text string with tz offset, the same way they appear when they output to the 
terminal. MIgrating pandas types into excel types when outputting is nice but it's not a requirement.

It's not only a xlsxwriter problem actually, xlwt and xlsxwriter fail with this error and openpyxl 
creates the file but throws away the timezone information so it actually silently corrupts the data. 
This is a little broken all around.

to clarify here, @kay1793 what would an API for this be like?

e.g. need to have 2 ways of converting a timezone-aware Series/Index (e.g. raise if the underlying engine does not support, which most/all doen't ATM, or stringify to a isoformat).

I can't say about the API, stringify to iso would work perfect for my needs and can still be sorted. Coercing to UTC would also be ok if there was a big fat warning came with it.

using
engine='openpyxl'
creates the excel file. But all times have the same hour:
_01:00:00_

Could be an intermediate solution to add the warning and a note on the docs?

Hello everyone,

delta = dt_obj - epoch

A solution to this bug would be to localize epoch to the timezone of dt_obj, so that the timedelta is correct whatever the time zone is, and time-zoned datetimes can be handled, even if there's no notion of timezone in the resulting xlsx file.

what do you think about this ?

(this bug also appears with normal columns that are not indexes)

@misterjoa you actually just want _naive_ zones, IOW, .tz_localize(None).

I maybe _could_ see an option .to_excel(...., tz='local|utc') which would either by default localize or .tz_convert('UTC').tz_localize(None) for tz-aware (which we typically do in other situations, e.g. when going to a numpy array).

@jreback

thank you for your answer.

In fact, I want the user of an excel file to be shown the correct date and time in an arbitrary time zone.

To do so and avoid the current issue, I do the following before exporting to xlsx using xlsxwriter, using a column with strings containing UTC timestamp data :

(Paris is the target zone)

df['quote_calculation_time'] = pd.to_datetime(
    df['quote_calculation_time'],utc=True) \
    .dt.tz_localize('utc') \
    .dt.tz_convert('Europe/Paris') \
    .dt.tz_localize(None)

So having an option like you say would be really great

Was this page helpful?
0 / 5 - 0 ratings