Pandas: pd.HDFStore issues PerformanceWarnings for string columns

Created on 31 Jul 2013 · 7 Comments · Source: pandas-dev/pandas

I get the following performance warning

C:\Python27\lib\site-packages\pandas\io\pytables.py:1795: PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->block2_values]

warnings.warn(ws, PerformanceWarning)

for the following DataFrame

MSCI.ACWI float64
MarketCap float64
alpha float64
gics_code float64
investable bool
issuer_country object
universe bool
weight_benchmark float64
weight_portfolio float64
weight_active float64
dtype: object

The culprit is the issuer_country column, which is a string column. I find this surprising. Is there a way to get around it?

All 7 comments

This can happen if your string column contains NaN, integers/floats, or other non-string values; the column's inferred type is then 'mixed'. The warning is raised because a storer (e.g. store['key'] = value) pickles such data, which may be inefficient. An alternative is to use the table format (e.g. store.append('key', value)), which handles these types of data (e.g. NaN in a string column).
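To see which columns would be reported as 'mixed', one option is to check the inferred type of each object column before writing. The following is only a rough sketch: the helper name find_mixed_object_columns and the sample data are made up for illustration, and it assumes that pandas.api.types.infer_dtype with skipna=False classifies values in roughly the same way as the check behind the warning.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype


def find_mixed_object_columns(df):
    """Return {column: inferred_type} for object columns that are not purely strings.

    Hypothetical helper, not part of pandas.
    """
    mixed = {}
    for col in df.columns:
        if df[col].dtype == object:
            # skipna=False makes a NaN count as a value, so a string
            # column with missing entries is not inferred as "string".
            kind = infer_dtype(df[col], skipna=False)
            if kind != "string":
                mixed[col] = kind
    return mixed


# Illustrative data only: one string column with a NaN, one without.
portfolio_data = pd.DataFrame({
    "alpha": [0.1, 0.2, 0.3],
    "investable": [True, False, True],
    "issuer_country": pd.Series(["US", np.nan, "DE"], dtype=object),
    "ticker": pd.Series(["AAA", "BBB", "CCC"], dtype=object),
})

mixed_columns = find_mixed_object_columns(portfolio_data)
print(mixed_columns)
```

Any column listed here is a candidate for the pickling path, so it is worth either cleaning it up or switching to the table format.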

Ok. I do this:

store = pd.HDFStore(OUTPUT_FILE_NAME_PORTFOLIO, 'w')
store[OUTPUT_NAME_PORTFOLIO] = portfolio_data
store.close()

How should I change it?

store.append(OUTPUT_NAME_PORTFOLIO, portfolio_data)

Alternatively:

portfolio_data.to_hdf(OUTPUT_FILE_NAME_PORTFOLIO, OUTPUT_NAME_PORTFOLIO, mode='w', table=True)

To read:

pd.read_hdf(OUTPUT_FILE_NAME_PORTFOLIO, OUTPUT_NAME_PORTFOLIO)

(You can also provide a where clause if you'd like.)
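Putting the write, read, and where steps together might look like the sketch below. The file name, key, and sample data are placeholders, not the poster's actual values. It assumes a newer pandas release, where the table format is requested with format="table" rather than table=True, and it assumes the optional PyTables package is installed (the example skips the I/O if it is not). Filtering with where only works on columns declared as data_columns, so alpha is declared here.

```python
import os
import tempfile

import numpy as np
import pandas as pd

try:
    import tables  # noqa: F401  (PyTables, the optional backend used by HDFStore)
    HAVE_PYTABLES = True
except ImportError:
    HAVE_PYTABLES = False

# Illustrative data only: a string column containing NaN.
portfolio_data = pd.DataFrame({
    "alpha": [0.1, 0.2, 0.3],
    "investable": [True, False, True],
    "issuer_country": pd.Series(["US", np.nan, "DE"], dtype=object),
})

round_trip = None
subset = None
if HAVE_PYTABLES:
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "portfolio.h5")  # placeholder file name

        # Write in table format; data_columns makes "alpha" queryable.
        portfolio_data.to_hdf(
            path, key="portfolio", mode="w",
            format="table", data_columns=["alpha"],
        )

        # Read everything back, then only the rows matching a where clause.
        round_trip = pd.read_hdf(path, "portfolio")
        subset = pd.read_hdf(path, "portfolio", where="alpha > 0.15")

    print(round_trip)
    print(subset)
```

The table format is slower to write than the storer format, but it supports appending and querying, which is the trade-off mentioned in this thread.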

Works like magic! Thanks a lot.

http://pandas.pydata.org/pandas-docs/dev/io.html#storer-format and the following section explain the different storage mechanisms. I use tables almost exclusively, because I append and then query. But storers are great for quick-and-dirty work.

Please close if your question was answered. Thanks.
