Pandas: DOC: HDFStore documentation needs to list all methods

Created on 19 May 2018 · 4 Comments · Source: pandas-dev/pandas

Problem description

The docs (https://pandas.pydata.org/pandas-docs/stable/api.html#hdfstore-pytables-hdf5) don't mention HDFStore methods like .get_storer at all. I found .get_storer quite useful for finding the number of records without loading the whole file into memory, but I can't find any documentation for it, even though it appears in sample code on the Cookbook and IO Tools pages.
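A minimal sketch of the .get_storer pattern described above, assuming the data was written with `format="table"`. The file name and the key `"df"` are placeholders, and whether `nrows` is populated for a fixed-format node is not checked here.

```python
import os
import tempfile

import pandas as pd

# Placeholder file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "h5File.h5")
df = pd.DataFrame({"a": range(5), "b": [x * 0.5 for x in range(5)]})
# Written in table format; this sketch only covers that case.
df.to_hdf(path, key="df", format="table")

with pd.HDFStore(path, mode="r") as store:
    # get_storer returns the storage object for the key without reading
    # the rows; for a table-format node, nrows is the stored row count.
    n_rows = store.get_storer("df").nrows

print(n_rows)  # 5
```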

I keep running into these methods only on forums and can't find official documentation for them. I found a longer list of available methods here: https://stackoverflow.com/a/37986144/4355695, copied below for posterity's sake. Even the poster there only links to the Cookbook page as documentation.

Also, the docs need to make clear what works with fixed format and what works with table format. Many code examples use table-specific commands without noting that they won't work if the data was originally stored in fixed format, and fixed is the default format of pd.to_hdf(). The .to_hdf method itself is also not listed in the main HDF5 part of the docs (API > Input/Output > HDFStore: PyTables (HDF5)); it only appears under Serialization / IO / Conversion in the Series and DataFrame sections. And I'm not sure how it differs from HDFStore.put.
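A sketch of how the two formats and the two writers relate, using a throwaway file. Both `DataFrame.to_hdf` and `HDFStore.put` write fixed format unless `format="table"` is passed. The check uses the storer's `is_table` attribute, which is an internal implementation detail rather than documented API, so it may change between pandas versions.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "formats.h5")
df = pd.DataFrame({"a": range(3), "b": [1.0, 2.0, 3.0]})

# to_hdf without a format argument writes the default fixed format.
df.to_hdf(path, key="via_to_hdf")

with pd.HDFStore(path) as store:
    # put also defaults to fixed; format="table" opts into table format.
    store.put("via_put_fixed", df)
    store.put("via_put_table", df, format="table")

    # is_table is an internal storer attribute, used here only to inspect
    # which format each node ended up in.
    formats = {key: store.get_storer(key).is_table for key in store.keys()}

print(formats)
```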

Then, HDFStore.append stores data in table format and can only append to existing objects that are also stored in table format. But all of my project's data has already been stored in the default fixed format, so now I can't use append on it. I'd also like to know whether it's possible to get the list of columns without loading the whole file, the way .get_storer gives the row count, but that too seems to be available only for table format.
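A sketch of the situation described above, assuming a throwaway file and the placeholder key `"data"`: appending to a fixed-format node raises a `ValueError`, and one workaround is to read the node once (this does load it fully), rewrite it with `format="table"`, and append from then on. Once the node is a table, `select` with `start`/`stop` reads only the requested rows, so a one-row read is enough to see the column labels.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "convert.h5")
df = pd.DataFrame({"a": range(3), "b": [1.0, 2.0, 3.0]})
df.to_hdf(path, key="data")  # default: fixed format

# New rows with matching dtypes and a continuing index.
more = pd.DataFrame({"a": [3, 4], "b": [4.0, 5.0]}, index=[3, 4])

with pd.HDFStore(path) as store:
    try:
        store.append("data", more)
        append_failed = False
    except ValueError:
        # Appending to a fixed-format node is rejected.
        append_failed = True

    # Workaround: load the fixed node once, rewrite it as a table
    # (put replaces the existing node), then append normally.
    existing = store.get("data")
    store.put("data", existing, format="table")
    store.append("data", more)

    # start/stop limit how many rows are read from a table node,
    # so one row is enough to inspect the column labels.
    columns = list(store.select("data", start=0, stop=1).columns)
    total_rows = store.get_storer("data").nrows

print(append_failed, columns, total_rows)
```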

store = pd.HDFStore('h5File.h5')

store.append_to_multiple
store.close
store.copy
store.create_table_index
store.filename
store.flush
store.get
store.get_node
store.get_storer
store.groups
store.is_open
store.items
store.iteritems
store.keys
store.open
store.put
store.remove
store.root
store.select
store.select_as_coordinates
store.select_as_multiple
store.select_column
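Until every method is on the rendered API page, the installed package can list the public methods itself and show their docstrings. A minimal sketch using only `dir` and `inspect`; the exact set of names depends on the pandas version (for example, `iteritems` from the list above may be absent from newer releases).

```python
import inspect

import pandas as pd

# Public (non-underscore) names defined on the HDFStore class.
public_names = sorted(n for n in dir(pd.HDFStore) if not n.startswith("_"))
print(public_names)

# Docstrings ship with the installed package even when a method is
# missing from the rendered API page; inspect.getdoc reads them.
first_lines = {}
for name in ("get_storer", "select_column", "append"):
    doc = inspect.getdoc(getattr(pd.HDFStore, name)) or "(no docstring)"
    first_lines[name] = doc.splitlines()[0]
    print(f"{name}: {first_lines[name]}")
```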

Expected Output

  • Documentation of every method/function available on HDFStore, possibly on a separate page for HDFStore.
  • A note on which methods work with the default fixed format and which work with table format.
  • A note on which methods load the whole object into memory and which don't.
  • Cross-references between similar methods (e.g. pd.to_hdf and HDFStore.put should mention each other and explain any differences).
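As a starting point for the third bullet, a sketch of two table-format reads that avoid pulling the whole node into one DataFrame: iterating with `chunksize`, and `select_column` for a single column. It assumes a throwaway file, the placeholder key `"df"`, and that `"a"` was stored as a data column, since `select_column` cannot extract a column that is stored inside a values block. It shows the calls only and does not measure memory.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "chunks.h5")
df = pd.DataFrame({"a": range(10), "b": [x * 0.5 for x in range(10)]})
# data_columns stores "a" as its own column, which select_column needs.
df.to_hdf(path, key="df", format="table", data_columns=["a"])

with pd.HDFStore(path, mode="r") as store:
    # chunksize returns an iterator; each chunk holds at most 4 rows.
    chunk_lengths = [len(chunk) for chunk in store.select("df", chunksize=4)]

    # select_column reads a single column as a Series.
    col_a = store.select_column("df", "a")

print(chunk_lengths)  # [4, 4, 2]
```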

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 32
OS: Linux
OS-release: 4.13.0-41-generic
machine: i686
processor: i686
byteorder: little
LC_ALL: None
LANG: en_IN
LOCALE: en_IN.ISO8859-1

pandas: 0.22.0
pytest: None
pip: 10.0.1
setuptools: 20.7.0
Cython: 0.27.3
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.4.3
numexpr: 2.6.4
feather: None
matplotlib: None
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 3.5.0
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: Docs, IO HDF5

All 4 comments

PRs to improve documentation are always welcome!

@WillAyd I'd love to help out, but could use some guidance on how to begin. I'm just copying code examples and only use a few of these methods myself, so I don't know all the arguments and options for every function. I'm guessing one can look at the function definitions in the source code? Where can I find those? I could start a page in Markdown with entries, or start a shared Google Doc or similar.

Here's the get_storer method:

https://github.com/pandas-dev/pandas/blob/bc37ea2e05019a89adaa48159b220483598d1898/pandas/io/pytables.py#L1119
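Besides following the GitHub link, `inspect` can locate a method's definition in the installed copy of pandas, along with its signature. A sketch; the line number will differ from the one in the link, which points at a specific commit.

```python
import inspect

import pandas as pd

method = pd.HDFStore.get_storer

# File that defines the method (pandas/io/pytables.py, per the link above).
source_file = inspect.getsourcefile(method)
# Parameters and annotations, without opening the file.
signature = inspect.signature(method)
# Line on which the definition starts; this shifts between pandas versions.
_, start_line = inspect.getsourcelines(method)

print(source_file)
print(f"get_storer{signature} starts at line {start_line}")
```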

The rest of the HDFStore implementation pieces will all be in that module, so feel free to improve the docstrings of any of those methods that you've come across. You should update docstrings to follow the pandas docstring guide:

https://python-sprints.github.io/pandas/guide/pandas_docstring.html
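For orientation, a sketch of the section layout the docstring guide describes (short summary, extended summary, Parameters, Returns, See Also, Examples), applied to a hypothetical helper `count_rows` that is not part of pandas. The wording is illustrative, not proposed text for `get_storer`.

```python
import pandas as pd


def count_rows(store: pd.HDFStore, key: str) -> int:
    """
    Return the number of rows stored under a key.

    Reads the row count from the node's metadata instead of loading
    the data. Only meaningful for nodes written in table format.

    Parameters
    ----------
    store : HDFStore
        An open store.
    key : str
        Key of the node to inspect.

    Returns
    -------
    int
        Number of rows in the node.

    See Also
    --------
    HDFStore.get_storer : Return the storer object for a key.
    HDFStore.select : Retrieve data stored in the file.

    Examples
    --------
    >>> with pd.HDFStore("store.h5") as store:  # doctest: +SKIP
    ...     count_rows(store, "df")
    """
    # Hypothetical helper: delegates to the storer's nrows attribute.
    return int(store.get_storer(key).nrows)
```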

To get those docstrings to render they need to be added to the API:

https://github.com/pandas-dev/pandas/blob/master/doc/source/api.rst#hdfstore-pytables-hdf5

So to sum up:

  1. Update the docstrings for the items in question to follow the pandas docstring standard AND
  2. Add them to the API file to ensure they get rendered

There's a section in the pandas contributing guide geared towards documentation improvements, so be sure to give that a look as well:

https://pandas.pydata.org/pandas-docs/stable/contributing.html#contributing-to-the-documentation

Hope that helps, but let me know if you have any questions.

@WillAyd thanks so much for the detailed guidance. I'll get on this in the coming days. It would be great to have a chance to give back to pandas, and if I can learn the ropes of these documentation practices (how they're automated, and more) along the way, that will add a lot of value to my own work too.
