Joss-reviews: [REVIEW]: Category Encoders: a scikit-learn-contrib package of transformers for encoding categorical data

Created on 12 Dec 2017  ·  26 Comments  ·  Source: openjournals/joss-reviews

Submitting author: @wdm0006 (William McGinnis)
Repository: https://github.com/scikit-learn-contrib/categorical-encoding
Version: v1.2.5
Editor: @jakevdp
Reviewer: @desilinguist
Archive: 10.5281/zenodo.1157110

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/d57818316816a19a80112892c3d12ed7"><img src="https://joss.theoj.org/papers/d57818316816a19a80112892c3d12ed7/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/d57818316816a19a80112892c3d12ed7/status.svg)](https://joss.theoj.org/papers/d57818316816a19a80112892c3d12ed7)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@desilinguist, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @jakevdp know.

Conflict of interest

Code of Conduct

General checks

  • [x] Repository: Is the source code for this software available at the repository url?
  • [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • [x] Version: Does the release version given match the GitHub release (v1.2.5)?
  • [x] Authorship: Has the submitting author (@wdm0006) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • [x] Installation: Does installation proceed as outlined in the documentation?
  • [x] Functionality: Have the functional claims of the software been confirmed?
  • [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • [x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • [x] Authors: Does the paper.md file include a list of authors with their affiliations?
  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Labels: accepted, published, recommend-accept, review

All 26 comments

Hello human, I'm @whedon. I'm here to help you with some common editorial tasks. @desilinguist it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' at https://github.com/openjournals/joss-reviews

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

Attempting PDF compilation. Reticulating splines etc...
https://github.com/openjournals/joss-papers/blob/joss.00501/joss.00501/10.21105.joss.00501.pdf

Hello,

I had a general question about the nature of these papers. How should I handle extra authors? I was the original author and am sole maintainer so I put myself there, should I solicit other contributors to add themselves? Should contributors going forward add themselves? Etc.

Thank you, -Will

> I had a general question about the nature of these papers. How should I handle extra authors? I was the original author and am sole maintainer so I put myself there, should I solicit other contributors to add themselves? Should contributors going forward add themselves? Etc.

We generally try not to police authorship here but in the past, package owners/maintainers have opened an issue on the repository (https://github.com/scikit-learn-contrib/categorical-encoding) to ask if other contributors would like to be authors on the paper.

@desilinguist - friendly reminder to get to this review when you get a chance 😄

I'll get it done this week, just back from vacation :)

Ok, cool, I'll open an issue then, that's a good idea.

This is a very useful and well thought out piece of software, speaking as someone who has had to deal with categorical features a lot for various datasets. It's extremely helpful to have a single package that not only contains a large variety of methods to convert such categorical features into numerical ones but is also compatible with the leading Python machine learning package that is most widely used around the world.

I think the authors have addressed almost all aspects of the review except for a few minor shortcomings that can be easily addressed:

  1. I think the statement of need can be improved. For experienced ML folks, it may be obvious why you would want to convert categorical features into numerical ones. However, it may be better to talk about that a bit more explicitly for more novice ML users and to also show a theoretical example, if possible.

  2. Speaking of examples, I was hoping to find an easy to try actual example in the README and the front page of the documentation instead of just a pointer to the examples folder. In addition, the scripts in the examples folder actually don't show how to use these encoders with a Pandas dataframe which is a very nice feature to have and the authors should show that off more explicitly. I had to actually read the source code for some of the encoders to see examples of usage with dataframes which were embedded in the docstring (e.g., BinaryEncoder). Perhaps an actual Examples section in the documentation illustrating all different ways in which the package can be used (pandas input, pipeline support, pandas output, etc.) would serve much better?

  3. It would also be nice to see a section in the README about how to contribute.

Other than improving the documentation and examples, I have no other recommendations for this paper. I think it will prove quite useful to many folks.
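To make the first point of the review concrete, here is a minimal, standard-library-only sketch of why and how a categorical column can be turned into numeric columns. It is not the package's implementation: the function names `ordinal_encode` and `binary_encode` are hypothetical, and the scheme shown (integer codes written out as binary digits) is only an illustration of the general idea behind a binary-style encoder.

```python
def ordinal_encode(values):
    """Map each distinct category to an integer, in order of first appearance.

    Hypothetical helper for illustration only. Codes start at 1 so that 0
    stays free for values that were not seen when the mapping was built.
    """
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def binary_encode(values):
    """Encode each category as the binary digits of its ordinal code.

    Returns one row of 0/1 integers per input value, most significant bit
    first. Every row has the same width.
    """
    mapping = ordinal_encode(values)
    # Number of bits needed to represent the largest code.
    width = max(1, len(mapping).bit_length())
    rows = []
    for value in values:
        code = mapping[value]
        rows.append([(code >> shift) & 1 for shift in reversed(range(width))])
    return rows


colors = ["red", "green", "blue", "green"]
print(ordinal_encode(colors))  # {'red': 1, 'green': 2, 'blue': 3}
print(binary_encode(colors))   # [[0, 1], [1, 0], [1, 1], [1, 0]]
```

Most estimators expect numeric input, so a string column like `colors` has to be mapped to numbers first. Compared with one indicator column per category, the binary form needs only about log2(n) columns for n categories, at the cost of columns that are harder to interpret individually.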

Thanks for the feedback @desilinguist, I'll work on improving the readme and the statement of need.

Thanks for the prompt review, @desilinguist! @wdm0006 – let us know when the comments are addressed.

@desilinguist I've updated the README and added a little bit to the paper, please let me know if the changes are sufficient.

Your changes look good @wdm0006! Thanks for making those changes. I have no more comments and I think the submission can be accepted now.

Thanks @desilinguist and @wdm0006!

@whedon commands

Here are some things you can ask me to do:

# List all of Whedon's capabilities
@whedon commands

# Assign a GitHub user as the reviewer of this submission
@whedon assign @username as reviewer

# List of editor GitHub usernames
@whedon list editors

# List of reviewers together with programming language preferences and domain expertise
@whedon list reviewers

# Change editorial assignment
@whedon assign @username as editor

# Set the software archive DOI at the top of the issue e.g.
@whedon set 10.0000/zenodo.00000 as archive

# Open the review issue
@whedon start review

🚧 🚧 🚧 Experimental Whedon features 🚧 🚧 🚧

# Compile the paper
@whedon generate pdf

@whedon generate pdf

Attempting PDF compilation. Reticulating splines etc...
https://github.com/openjournals/joss-papers/blob/joss.00501/joss.00501/10.21105.joss.00501.pdf

@arfon, we're ready to accept this!

@wdm0006 - At this point could you make an archive of the reviewed software in Zenodo/figshare/other service and update this thread with the DOI of the archive? I can then move forward with accepting the submission.

@arfon fantastic! I'll work on that today. Thanks all

Ok, @arfon, I've registered with zenodo, and have added the DOI badge to the README:

https://zenodo.org/record/1157110#.WmYiuFQ-fdQ

If there's anything else I need to do please just let me know. Thanks, -Will

@whedon set 10.5281/zenodo.1157110 as archive

OK. 10.5281/zenodo.1157110 is the archive.

@desilinguist - many thanks for your review here and to @jakevdp for editing this one ✨

@wdm0006 - your paper is now accepted into JOSS and your DOI is https://doi.org/10.21105/joss.00501 ⚡️ 🚀 💥

:tada::tada::tada: Congratulations on your paper acceptance! :tada::tada::tada:

If you would like to include a link to your paper from your README use the following code snippet:

[![DOI](http://joss.theoj.org/papers/10.21105/joss.00501/status.svg)](https://doi.org/10.21105/joss.00501)

We need your help!

Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us please consider volunteering to review for us sometime in the future. You can add your name to the reviewer list here: http://joss.theoj.org/reviewer-signup.html
