Joss-reviews: [REVIEW]: h5preserve

Created on 14 Feb 2018 · 23 comments · Source: openjournals/joss-reviews

Submitting author: @aragilar (James Tocknell)
Repository: https://github.com/h5preserve/h5preserve
Version: v0.15.0
Editor: @arfon
Reviewer: @mattpitkin
Archive: 10.5281/zenodo.1179292

Status

status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/2f5fa0cbfc075383d7764d271889b6ea"><img src="http://joss.theoj.org/papers/2f5fa0cbfc075383d7764d271889b6ea/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/2f5fa0cbfc075383d7764d271889b6ea/status.svg)](http://joss.theoj.org/papers/2f5fa0cbfc075383d7764d271889b6ea)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@mattpitkin, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @arfon know.

Conflict of interest

Code of Conduct

General checks

  • [x] Repository: Is the source code for this software available at the repository url?
  • [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • [x] Version: Does the release version given match the GitHub release (v0.15.0)?
  • [x] Authorship: Has the submitting author (@aragilar) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • [x] Installation: Does installation proceed as outlined in the documentation?
  • [x] Functionality: Have the functional claims of the software been confirmed?
  • [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • [x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • [x] Authors: Does the paper.md file include a list of authors with their affiliations?
  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
Labels: accepted, published, recommend-accept, review

All 23 comments

Hello human, I'm @whedon. I'm here to help you with some common editorial tasks. @mattpitkin it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that with GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands
Attempting PDF compilation. Reticulating splines etc...

@mattpitkin - please carry out your review in this issue by updating the checklist above and giving feedback in this issue. The reviewer guidelines are available here: http://joss.theoj.org/about#reviewer_guidelines

Any questions/concerns please let me know.

Hi @aragilar - could you include explicit instructions for installation in the README and documentation? E.g., something like:

install the software using:

pip install h5preserve

or by cloning this repository and running:

python setup.py install --user

Could you also add a requirements file?

Having downloaded h5preserve using pip, I'm trying the example shown here, and in both Python 2.7.12 and 3.5.2 I get the following error:

In [6]: my_cool_experiment = Experiment(np.array([1,2,3,4,5]), 10)

In [7]: with open("my_data_file.hdf5", new_registry_list(registry)) as f:
   ...:     f["cool experiment"] = my_cool_experiment
   ...: 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-f3958ebfc497> in <module>()
----> 1 with open("my_data_file.hdf5", new_registry_list(registry)) as f:
      2     f["cool experiment"] = my_cool_experiment

TypeError: file() argument 2 must be string, not RegistryContainer

If I edit it to use with open("my_data_file.hdf5", "w") as f: I get the following error:

In [10]: with open("my_data_file.hdf5", "w") as f:
    ...:     f["cool experiment"] = my_cool_experiment
    ...: 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-63e87c258186> in <module>()
      1 with open("my_data_file.hdf5", "w") as f:
----> 2     f["cool experiment"] = my_cool_experiment

TypeError: 'file' object does not support item assignment

Do you know what the problem could be?

@mattpitkin That error is due to me not checking that the code runs; it should be h5open, not open (I've updated the docs). That example also used an older version of the API, where dictionaries could be used (I dropped that, as there were a bunch of problems with it).
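
For reference, a minimal sketch of the corrected quick-start usage, assuming the decorator-based registry API described in the h5preserve docs (the Experiment class and registry names follow the example in question; exact details may differ from the shipped example):

import numpy as np
from h5preserve import open as h5open, Registry, new_registry_list, DatasetContainer

registry = Registry("experiment")

class Experiment:
    def __init__(self, data, time_started):
        self.data = data
        self.time_started = time_started

# dumper: converts an Experiment into something h5py can write
@registry.dumper(Experiment, "Experiment", version=1)
def _exp_dump(experiment):
    return DatasetContainer(
        data=experiment.data,
        attrs={"time started": experiment.time_started},
    )

# loader: rebuilds an Experiment from the stored dataset
@registry.loader("Experiment", version=1)
def _exp_load(dataset):
    return Experiment(
        data=dataset["data"],
        time_started=dataset.attrs["time started"],
    )

my_cool_experiment = Experiment(np.array([1, 2, 3, 4, 5]), 10)

with h5open("my_data_file.hdf5", new_registry_list(registry)) as f:
    f["cool experiment"] = my_cool_experiment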

I've also added more explicit instructions and referred to https://packaging.python.org/ as needed (I was avoiding explicit instructions, as I've noticed far too many projects refer to old methods, e.g. python setup.py install or easy_install).

I haven't added a requirements.txt file for the dependencies of h5preserve: h5preserve is a library, not an application, so it shouldn't have a requirements.txt, as per https://caremad.io/posts/2013/07/setup-vs-requirement/. If you meant for the tests, tox handles all of that.
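
For libraries, the usual pattern is to declare loose runtime dependencies in the packaging metadata rather than pinning them in a requirements.txt. A generic illustration of that pattern (this is not h5preserve's actual setup.py, and the package name here is a placeholder):

from setuptools import setup, find_packages

setup(
    name="mylibrary",          # placeholder name
    version="0.1.0",
    packages=find_packages(),
    # a library declares broad runtime dependencies here;
    # an application would instead pin exact versions in requirements.txt
    install_requires=[
        "h5py",
        "numpy",
    ],
)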

Thanks for these additions. Now that you've removed explicit Python 2 support, could you update the version on PyPI so that the badge on the README page no longer lists Python 2.7?

Regarding the automated testing, I'm not really familiar with tox, but what does it mean when the output says

================================ 94 passed, 43 xfailed in 0.78 seconds ================================

Is it ok/expected that 43 xfail?

Just a minor typo thing - the links to the repo in the "Contributing" doc page point to github.org rather than github.com, so they don't work.

Fixed the URLs, added a test to the CI to make sure they don't break, and uploaded a new release to PyPI (so everything should now show Python 3 only).

The xfails come from pytest: it means the test is expected to fail (and hence "passes"); if they passed, then there'd be a problem. (I think most of those xfailing tests are stubs for unit tests I haven't had time to implement yet; most of the code is covered by integration tests.)
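
For context, a stub test marked with pytest's xfail looks roughly like this (the test name here is hypothetical, not taken from the h5preserve test suite):

import pytest

# pytest reports this as "xfailed" rather than "failed",
# so the suite still passes while the stub remains unimplemented
@pytest.mark.xfail(reason="unit test not written yet")
def test_dumper_roundtrip():
    raise NotImplementedError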

I just have a couple more questions/requests:

Is it generally the case that you would define the registry.loader and registry.dumper functions outside of the class they refer to, or could/should you have them as methods of the class, called e.g. dump and load? The latter seems to make more sense to me, but maybe I'm thinking of these methods in the wrong way.

Would it be possible to give explicit examples of creating your own RegistryContainer, and of using the GroupContainer, as only the DatasetContainer is used in the current example? Also, would you be able to show an example of versioning in action, where you create two different versions, as I'm not entirely sure what it means at the moment?

A couple of minor typos to fix in the paper.md file:

  • change "...data structures produce by some modelling..." to "...data structures produced by some modelling..."
  • change "...design flaws in pickle have brought up..." to "...design flaws in pickle have been brought up..."

Can you add the doi = {10.1109/MCSE.2011.37} to the van_der_walt_numpy_2011 reference in the paper.bib file?

@mattpitkin I've added the paper changes, and I've added more examples; do they make sense to you?
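
As a rough sketch of the kind of GroupContainer example being asked about (the Simulation class and attribute names here are invented for illustration; the exact API details are in the h5preserve docs):

from h5preserve import Registry, GroupContainer

registry = Registry("simulation")

class Simulation:
    def __init__(self, initial_conditions, results):
        self.initial_conditions = initial_conditions
        self.results = results

# each keyword argument to GroupContainer becomes a child of the HDF5 group
@registry.dumper(Simulation, "Simulation", version=1)
def _sim_dump(sim):
    return GroupContainer(
        attrs={"code name": "example"},
        initial_conditions=sim.initial_conditions,
        results=sim.results,
    )

@registry.loader("Simulation", version=1)
def _sim_load(group):
    return Simulation(
        initial_conditions=group["initial_conditions"],
        results=group["results"],
    )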

Whilst there's nothing stopping you from making the dumpers/loaders static methods on the class, there are a few reasons not to:

  1. Sometimes it's hard (or inconvenient) to add methods to classes: e.g. namedtuples, or other people's classes.
  2. Subclasses will use the same method for loading/dumping unless you override it. This is in effect the same problem that pickle has (it has options for controlling dumping via a bunch of methods: https://docs.python.org/3/library/pickle.html#pickle-inst): you lose control over what you dump/load (which is why you can choose not to use the built-in dumpers/loaders).
  3. Versioning becomes much harder: you either end up with n different functions, which is the same as the existing API, or you have a whole bunch of if statements, which you can screw up.

I'll try to translate that into an advice section, but once you try to implement the system as suggested in https://www.youtube.com/watch?v=7KnfGDajDQw, you do come to the conclusion that separate functions just work better (though it took me two previous iterations of the code to learn this; it's definitely not obvious).
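
To illustrate the versioning point, a sketch of how two versions of a dumper/loader pair can coexist as separately registered functions, assuming the same decorator-based registry, Experiment class and DatasetContainer as in the earlier sketch (not the project's exact examples):

# version 1 only stored the data array
@registry.dumper(Experiment, "Experiment", version=1)
def _exp_dump_v1(experiment):
    return DatasetContainer(data=experiment.data)

# version 2 also records when the experiment started;
# the registry keeps both dumpers side by side
@registry.dumper(Experiment, "Experiment", version=2)
def _exp_dump_v2(experiment):
    return DatasetContainer(
        data=experiment.data,
        attrs={"time started": experiment.time_started},
    )

# one loader per stored version, so old files remain readable
@registry.loader("Experiment", version=1)
def _exp_load_v1(dataset):
    return Experiment(data=dataset["data"], time_started=None)

@registry.loader("Experiment", version=2)
def _exp_load_v2(dataset):
    return Experiment(
        data=dataset["data"],
        time_started=dataset.attrs["time started"],
    )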

@aragilar - thanks, the examples look good to me, so I'm happy to sign things off.

@arfon - I'm happy for this to be accepted.

@aragilar - At this point could you make an archive of the reviewed software in Zenodo/figshare/other service and update this thread with the DOI of the archive? I can then move forward with accepting the submission.

@whedon set 10.5281/zenodo.1179292 as archive

OK. 10.5281/zenodo.1179292 is the archive.

@mattpitkin many thanks for your review here ✨

@aragilar - your paper is now accepted into JOSS and your doi is https://doi.org/10.21105/joss.00581 :zap: :rocket: :boom:

:tada::tada::tada: Congratulations on your paper acceptance! :tada::tada::tada:

If you would like to include a link to your paper from your README use the following code snippet:

[![DOI](http://joss.theoj.org/papers/10.21105/joss.00581/status.svg)](https://doi.org/10.21105/joss.00581)

This is how it will look in your documentation:

DOI

We need your help!

Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us, please consider doing one (or both) of the following:
