Pandas: DataFrame.to_hdf fails in Python 3.4

Created on 9 Jan 2015 · 7 comments · Source: pandas-dev/pandas

This may be more of a PyTables issue, but it affects pandas users too:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3]})
>>> df.to_hdf('myfile.hdf5', '/data', append=True)
  File "/home/mrocklin/Software/anaconda/envs/py34/lib/python3.4/site-packages/tables/attributeset.py", line 381, in _g__setattr
    elif name == "FILTERS" and self._v__format_version >= (2, 0):
TypeError: unorderable types: NoneType() >= tuple()

It appears that self._v__format_version is None. Not sure why.

>>> import sys
>>> sys.version
'3.4.2 |Anaconda 2.1.0 (64-bit)| (default, Oct 21 2014, 17:16:37) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]'
>>> pd.__version__
'0.14.1'
>>> import tables
>>> tables.__version__
'3.1.1'

This works fine with append=False

>>> df.to_hdf('myfile.hdf5', '/data', append=False)
>>> 
IO HDF5 Usage Question


All 7 comments

You are violating the guarantees of HDF5. The only allowed file access is through the HDF5 constructors. You will get this error if the file was previously created by other means and is not a valid HDF5 file.

I suppose a better error message is in order. I would raise this on the PyTables side. This only happens on 3.4.

The obvious solution is to always open with mode='w' when you are not appending to existing data.

In [12]: !rm foo.h5

In [13]: df = pd.DataFrame({'a': [1, 2, 3]})

In [14]: df.to_hdf('foo.h5', '/data', append=True)

In [15]: !rm foo.h5

In [16]: with open('foo.h5', 'wb') as fh:
   ....:     pass
   ....:

In [17]: df.to_hdf('foo.h5', '/data', append=True)
TypeError: unorderable types: NoneType() >= tuple()

In [18]: df.to_hdf('foo.h5', '/data', append=True, mode='w')

Alrighty. Thanks for the explanation. That helps.

On Fri, Jan 9, 2015 at 3:16 PM, jreback [email protected] wrote:

Closed #9219 https://github.com/pydata/pandas/issues/9219.

—
Reply to this email directly or view it on GitHub
https://github.com/pydata/pandas/issues/9219#event-216818788.

I don't understand the above explanation and I'm having a similar issue.

with pd.HDFStore(filename) as store:
    store.put('/', df,
                   mode='w',  # if I comment this I still have the error
                   append=True,  # True or False, the error is here
                   format='table',
                   data_columns=True,
                   encoding='utf-8',
                   dropna=False)

This is also giving me the error mentioned. If I comment out the format='table' line, it works. @jreback, can you elaborate?

Well, in order to have an HDF5 file you must create it with the HDF5 constructors. You cannot create it by doing, say, open('file.h5') and then trying to write HDF5 data to it.
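A minimal sketch of the distinction (file name and data are illustrative): let pandas/PyTables create the file instead of pre-creating it with open():

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
path = os.path.join(tempfile.mkdtemp(), 'file.h5')

# Correct: mode='w' lets PyTables create a valid HDF5 container,
# even on the very first append.
df.to_hdf(path, key='data', format='table', append=True, mode='w')

# Subsequent writes can append with the default mode='a'.
df.to_hdf(path, key='data', format='table', append=True)

print(len(pd.read_hdf(path, 'data')))  # 6 rows after two writes
```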

I don't understand the answer, maybe what I am trying to do cannot be achieved in this way. I am using h5py to create the HDF5 file and its internal structure. A normal workflow for me would be:

  1. Create a new HDF5 file:

import h5py
import pandas as pd
f = h5py.File('file.hdf5', 'a')

  2. Create a group and dataset:

grp = f.create_group('One_Group')
dset = f.create_dataset('One_Group/One_Dset', (100,), dtype='S10')

  3. Every time I have new data, append it to the corresponding dataset:

df = pd.read_table('data.csv', sep='\t')
df.to_hdf('file.hdf5', '/One_Group/One_Dset', append=True, mode='a')

Output of the last line:

D:\...\lib\site-packages\tables\leaf.py in __init__(self, parentnode, name, new, filters, byteorder, _log)
    255             # Get filter properties from parent group if not given.
    256             if filters is None:
--> 257                 filters = parentnode._v_filters
    258             self.__dict__['filters'] = filters  # bypass the property
    259 

AttributeError: 'Array' object has no attribute '_v_filters'

Any idea of what is wrong?

@iipr HDFStore is based upon PyTables; see the docs here

It can read h5py files, but not write them, and certainly not by opening a file with h5py in that way. You could open a PyTables file directly and do what you are doing, but there is really no need for that, as to_hdf (and HDFStore) can take care of the file IO directly.
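A sketch of the same three-step workflow done purely through pandas (group and dataset names borrowed from the example above; HDFStore creates the file and any intermediate groups itself, so no h5py pre-creation is needed):

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'file.hdf5')
df = pd.DataFrame({'value': ['x', 'y', 'z']})

# Steps 1 and 2 collapse into one call: opening the store with mode='w'
# creates the file, and put() creates the group hierarchy on demand.
with pd.HDFStore(path, mode='w') as store:
    store.put('/One_Group/One_Dset', df, format='table')

# Step 3: each time new data arrives, reopen and append to the same key.
with pd.HDFStore(path, mode='a') as store:
    store.append('/One_Group/One_Dset', df)

with pd.HDFStore(path) as store:
    print(len(store['/One_Group/One_Dset']))  # 6 rows after both writes
```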

In the future this may be possible (as h5py and PyTables are combining the formats), but not today.

I see, thanks a lot. You are right, I read this some time ago, but I didn't know if the combination of h5py and PyTables was already happening or not.

I will probably start working with PyTables to see if it works well for my project then.
