Pandas: PerformanceWarning: what is actually the problem, and what can I change?

Created on 16 May 2013 · 14 Comments · Source: pandas-dev/pandas

I get several PerformanceWarnings when I store my dataframe in a hdfstore:

C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->axis0]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->block0_values]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->unicode,key->block0_items]

What I can't tell from this is which column causes these problems; at least, I don't have any "block0" columns :-) It would be nice if these warnings gave me an indication of what I can actually do about them.

jankatins

All 14 comments

You are storing Stores (meaning not a Table), which means that PyTables is pickling some type of data. There are several options: split the data out into separate nodes (the node holding the object data will still raise the warning, but the rest will be faster), or save it as a Table (which should support it a little better). Can you show me a sample of the data and df.dtypes?

jreback on 16 May 2013

Also, update to master: I just added #3623, which should make the warnings slightly more informative.

jreback on 16 May 2013

Here is some code which produces these warnings:

import numpy as np
import pandas

from data_names import (hdf_store_name, hdf_aaa, csv_aaa)
aaa = pandas.read_csv(csv_aaa, encoding="iso-8859-15", skiprows=0, sep=";", dtype={"zz id": np.int32})
[... some data cleaning...]

# Open and close once first because there were some errors when the HDF store was initially
# created and immediately written to. Not sure if that is still necessary.
store = pandas.HDFStore(hdf_store_name)
store.close()
store = pandas.HDFStore(hdf_store_name)
store[hdf_aaa] = aaa
store.close()

C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->axis0]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->unicode,key->block0_items]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->block2_values]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->unicode,key->block2_items]

  warnings.warn(ws, PerformanceWarning)

aaa.dtypes

title    object
a        object
b        float64
c        float64
d        float64
e        float64
f        float64
g        float64
h        object
i        object
j        int32
k        int32
l        int32
m        int32
n        int32
o        int32
p        int32
dtype: object

The objects are strings of variable length (some are paragraph length).

Performance is not a problem (seconds, or less than a second, even for my biggest data file, which has ~300k rows), so I don't mind the time it takes, just the warnings, which make my IPython notebook longer and make the important parts harder to read.

jankatins on 17 May 2013

the open/close twice should not be necessary

can u post

df._data.blocks?

jreback on 17 May 2013

not sure if u can but would help if u post your data file (a link on say Dropbox)
can do privately if u want

jreback on 17 May 2013

are some of your object columns actually unicode? this could definitely trigger this

jreback on 17 May 2013

print journals._data.blocks
[FloatBlock: [SNIP2_2009, SJR2_2009, SNIP2_2010, SJR2_2010, SNIP2_2011, SJR2_2011], 6 x 32059, dtype float64, IntBlock: [sjr2_2011_top10_overall, sjr2_2011_top10_nano, sjr2_2011_top10_business, sjr2_2011_top10_BusinessManagementAccounting, sjr2_2011_top10_MaterialsScience, articles_count, sjr2_2011_top10], 7 x 32059, dtype int32, ObjectBlock: [title, ISSN, BusinessManagementAccounting, MaterialsScience], 4 x 32059, dtype object]
type(journals.iloc[0,0]) # This is the "title" column
unicode

jankatins on 17 May 2013
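To narrow down which columns are involved, it may help to inspect the object-dtype columns one by one. The following is only a sketch, assuming a recent pandas on Python 3: the sample frame and the helper name `object_column_types` are hypothetical, and the link between `pandas.api.types.infer_dtype` and the `inferred_type` shown in the warning is an assumption rather than something confirmed in this thread.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical sample frame; the column names are illustrative only.
df = pd.DataFrame({
    "title": pd.Series(["Nature", "Science"], dtype=object),
    "score": pd.Series([1.5, 2.5]),
    "mixed": pd.Series(["a", 1], dtype=object),
})


def object_column_types(frame):
    """Map each object-dtype column to the value type pandas infers for it."""
    object_cols = [col for col in frame.columns if frame[col].dtype == object]
    return {col: infer_dtype(frame[col], skipna=True) for col in object_cols}


report = object_column_types(df)
for column, inferred in report.items():
    print(column, "->", inferred)
```

Columns whose inferred type is not a plain string type are reasonable first candidates to clean up or to store in a separate node.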

Try getting rid of the unicode

In [27]: x = 'foo'

In [28]: type(x)
Out[28]: str

In [29]: type(x.decode('utf-8'))
Out[29]: unicode

you may need something like

df['column_with_unicode'] = df['column_with_unicode'].apply(lambda x: x.encode('utf-8'))

FYI very soon (with the release of PyTables 3.0) I think we will be able to support unicode

jreback on 17 May 2013

Then I will simply wait until that happens. Right now the performance is no problem, just the annoying warnings :-)

jankatins on 21 May 2013

the warning is just to alert the user that u r basically pickling those fields rather than storing them in a c-type
u can filter the warnings as well

import warnings
import pandas
warnings.filterwarnings('ignore',category=pandas.io.pytables.PerformanceWarning)

jreback on 21 May 2013
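A global filter like the one above silences the warning everywhere. If only the store write should be quiet, the standard library's `warnings.catch_warnings()` context manager can scope the filter to that one call. This is a minimal sketch, not pandas code: `PerformanceWarning` here is a hypothetical stand-in class and `write_store` is a hypothetical placeholder for the `store[key] = df` line; with pandas you would pass its own PerformanceWarning class as the category instead.

```python
import warnings


class PerformanceWarning(Warning):
    """Hypothetical stand-in for the pandas PerformanceWarning class."""


def write_store():
    # Placeholder for `store[key] = df`, the call that emits the warning.
    warnings.warn("PyTables will pickle object types", PerformanceWarning)
    return "written"


def write_quietly():
    # The filter is active only inside this block; the previous filters
    # are restored when the block exits.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=PerformanceWarning)
        return write_store()


result = write_quietly()
```

Other warnings, and the same warning raised elsewhere in the notebook, remain visible.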

closing for now, @JanSchulz reopen/new issue if you have questions/concerns

jreback on 21 May 2013

Hi @jreback, I'm on PyTables 3 (tables==3.2.0) and am still facing the same issue as @JanSchulz: warnings when I try to save my 'df' as 'h5'. My data frame does contain unicode. Is there anything I can do to avoid them?

jetpackdata on 30 Jul 2015

make sure you are storing with format='table'

py3 handles the Unicode

pls show code and version if this doesn't work

jreback on 30 Jul 2015

I found a weird case: when I ran the same command a second time, the warning disappeared:

PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->block0_values]

f.to_hdf("dataset_test.h5", key="test")

P.S. I ran it in interactive mode, version: python==3.6.7, pandas==0.23.4
P.P.S Hmm I guess this is its behavior. Not sure though.
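One plausible explanation for the disappearing warning is Python's standard warning handling rather than anything pandas does: under the "default" action, a warning with the same message, category and source line is reported only the first time. The sketch below illustrates that behavior with the standard library alone; `PerformanceWarning` and `save` are hypothetical stand-ins, and whether this is what happened in the session above is an assumption.

```python
import warnings


class PerformanceWarning(Warning):
    """Hypothetical stand-in for the pandas PerformanceWarning class."""


def save():
    # Same message, same category and same source line on every call.
    warnings.warn("PyTables will pickle object types", PerformanceWarning)


# "default" action: report the first occurrence per location only.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("default")
    save()
    save()
    shown_by_default = len(caught)

# "always" action: report every occurrence.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    save()
    save()
    shown_always = len(caught)

print(shown_by_default, shown_always)
```

Calling `warnings.simplefilter("always", category=...)` with the relevant warning class is one way to see the message on every run while debugging.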