Joss-reviews: [REVIEW]: Confidence Intervals for Random Forests in Python

Created on 25 Nov 2016  ·  45Comments  ·  Source: openjournals/joss-reviews

Submitting author: @kpolimis (Kivan Polimis)
Repository: https://github.com/scikit-learn-contrib/forest-confidence-interval
Version: .1
Editor: @jakevdp
Reviewer: @DannyArends
Archive: 10.5281/zenodo.1044209

Status

status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/b40f03cc069b43b341a92bd26b660f35"><img src="http://joss.theoj.org/papers/b40f03cc069b43b341a92bd26b660f35/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/b40f03cc069b43b341a92bd26b660f35/status.svg)](http://joss.theoj.org/papers/b40f03cc069b43b341a92bd26b660f35)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer questions

Conflict of interest

  • [x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

General checks

  • [x] Repository: Is the source code for this software available at the repository url?
  • [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • [x] Version: Does the release version given match the GitHub release (.1)?
  • [x] Authorship: Has the submitting author (@kpolimis) made major contributions to the software?

Functionality

  • [x] Installation: Does installation proceed as outlined in the documentation?
  • [x] Functionality: Have the functional claims of the software been confirmed?
  • [x] Performance: Have any performance claims of the software been confirmed?

Documentation

  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g. API method documentation)?
  • [x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • [x] Authors: Does the paper.md file include a list of authors with their affiliations?
  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] References: Do all archival references that should have a DOI list one (e.g. papers, datasets, software)?
accepted published recommend-accept review

Most helpful comment

@arfon I merged https://github.com/scikit-learn-contrib/forest-confidence-interval/pull/64 and created a DOI of the archive with Zenodo (https://doi.org/10.5281/zenodo.1044209)

All 45 comments

Hello human, I'm @whedon. I'm here to help you with some common editorial tasks for JOSS.

For a list of things I can do to help you, just type:

@whedon commands

@DannyArends - 👋 this is the actual review issue. Please update the checklist above as you make progress.

Review of scikit-learn-contrib/forest-confidence-interval

I checked the initial boxes, the boxes I left unchecked have issues and are described point by point below:

[edit] Now submitted the issues to the Github repository, added links to here [/edit]

License:
LICENCE file is not an OSI approved licence, see scikit-learn-contrib/forest-confidence-interval#27

Installation:
I couldn't install one of the dependencies, without root rights, please describe that this might be needed in the documentation. See scikit-learn-contrib/forest-confidence-interval#28

Functionality:
The software seems to provide the functionality as described

Performance:
The software makes no performance claims, a performance comparison against
the original R-code would be interesting to add. See scikit-learn-contrib/forest-confidence-interval#29

A statement of need:
There is no clear statement of need is not provided in the readme. See scikit-learn-contrib/forest-confidence-interval#30

Example usage:
1) The examples use two standard datasets from another package, however it is not explained in which
format the users own data needs to be presented before it can be used.
Please provide links to the relevant documentations.

2) The two examples do not show before and after plots of the dataset, as
such it is not clear why the user should use this routine.

3) There is no description of what settings can be used in the calculation of inbag and unbiased variance and how these influence the results

See scikit-learn-contrib/forest-confidence-interval#31

Functionality documentation:
This package provides 2 functions to the user, which have been documented. However there are also 2 undocumented internal functions, which should be documented. . See scikit-learn-contrib/forest-confidence-interval#32

Automated tests:
There is no mention of tests, or how to run the tests in the documentation. Also the level of test coverage for this package is unknown. See scikit-learn-contrib/forest-confidence-interval#33

Community guidelines:
Please provide guidelines in the README file for others to:
1) Contribute to the software
2) Report issues or problems with the software
3) Seek support

See scikit-learn-contrib/forest-confidence-interval#34

Authors:
The submitting author: Kivan Polimis, has no institute listed, while the others from the same university have an institute listed. Is this an oversight or intentional ?

See scikit-learn-contrib/forest-confidence-interval#35

A statement of need:
There is no clear statement of need is not provided in the paper. See scikit-learn-contrib/forest-confidence-interval#30

References:
Both references (wager_confidence_2014 and wager_randomforestci_2016) are missing a DOI. See scikit-learn-contrib/forest-confidence-interval#36

👍 many thanks @DannyArends. @kpolimis please let us know here when you've had a chance to respond/address the reviewer feedback.

@kpolimis please let us know here when you've had a chance to respond/address the reviewer feedback.

Friendly bump @kpolimis.

:wave: @kpolimis, happy new year. When do you think you might have a chance to respond/address to the reviewer's feedback here?

Hi Arfon,

Thanks for the note, happy new year to you as well. I apologize for not
being more responsive during the review process. We have started adding
additional computations from the original Wager et al. 2014 paper into the
package and I was focused on implementing those changes. I appreciate the
reviewer's time and will try to respond to their feedback within the next
two weeks.

Best,

Kivan

On Sat, Jan 7, 2017 at 7:32 PM, Arfon Smith notifications@github.com
wrote:

👋 @kpolimis https://github.com/kpolimis, happy new year. When do you
think you might have a chance to respond/address to the reviewer's feedback
here?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/openjournals/joss-reviews/issues/124#issuecomment-271127200,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AG8CoG7QbuiAA3AToW6xtnkN3bFhm1Vqks5rQFjlgaJpZM4K8eZl
.

OK thanks for the update @kpolimis. Please let us know when you've wrapped up these changes.

:wave: @kpolimis - have you had a chance to review this feedback yet?

@kpolimis I pinged you by E-mail

Hi Arfon,

I've made the changes suggested by the reviewer in the paper and README.
The attached document contains the reviewer's original comments and my
responses are highlighted in yellow. Let me know if you have any questions.
Thanks for your support during this process.

Best,

Kivan

On Sun, Mar 12, 2017 at 12:52 PM, Arfon Smith notifications@github.com
wrote:

👋 @kpolimis https://github.com/kpolimis - have you had a chance to
review this feedback yet?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/openjournals/joss-reviews/issues/124#issuecomment-285970847,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AG8CoPk5rC3Ft3GchWeBYoDHwTQtIgjjks5rlE0OgaJpZM4K8eZl
.

The attachment is not visible. Also can you take a look at the LICENSE file. Is it really a BSD-3? Github should recognize the license contents to display it like in the icon on the top right of https://github.com/lomereiter/sambamba

I've copied the BSD-3 clause License and changed the first line only, not
sure why the license is not being displayed like the repo you sent. Which
attachment are you speaking about?

On Mon, Mar 27, 2017 at 2:44 PM, Pjotr Prins notifications@github.com
wrote:

The attachment is not visible. Also can you take a look at the LICENSE
file. Is it really a BSD-3? Github should recognize the license contents to
display it like in the icon on the top right of
https://github.com/lomereiter/sambamba


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/openjournals/joss-reviews/issues/124#issuecomment-289595665,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AG8CoNnJmdDCjwG6HrXcaHbr4Ldn-Rjnks5rqC3ZgaJpZM4K8eZl
.

Thanks. It is probably not showing because the update is not on the master branch. The attachment you mentioned above with remarks in yellow...

The original software on which yours is based had an MIT license. I think it would be more appropriate to have that then to change it to BSD-3. Even so, the difference is minimal (BSD-3 forces you to show a copyright notice with a binary), so I think it is OK.

@DannyArends can you take a look at the improvements and state that you are happy with them? Use the following branch: https://github.com/kpolimis/forest-confidence-interval/tree/paper

Thanks for the feedback. I've updated the paper (attached) and revised the
License and README to reflect an MIT license. Lastly, my response to the
review is attached as a .docx. Let me know if you have any more questions.

Best,

Kivan

On Mon, Mar 27, 2017 at 6:26 PM, Pjotr Prins notifications@github.com
wrote:

Thanks. It is probably not showing because the update is not on the master
branch. The attachment you mentioned above with remarks in yellow...

The original software on which yours is based had an MIT license. I think
it would be more appropriate to have that then to change it to BSD-3. Even
so, the difference is minimal (BSD-3 forces you to show a copyright notice
with a binary), so I think it is OK.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/openjournals/joss-reviews/issues/124#issuecomment-289634903,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AG8CoLxfy9Wp28XezsbVQMaMHQFbV8-7ks5rqGHhgaJpZM4K8eZl
.

👋 @DannyArends - would you mind taking another look at this submission?

Hey all,

I can look into the submission again this week.
However, I cannot see the attached response to reviewers _"Lastly, my response to the review is attached as a .docx"_

I will go through the issues I submitted and see if they are fixed

Gr,
Danny

Re-Review of scikit-learn-contrib/forest-confidence-interval

Still some issues with the version in the paper branch, also why haven't changes been pushed to the master branch of the scikit-learn-contrib repository ?

I have checked the boxes of the things which are now fixed, the remaining open issues are listed below.

Performance:
The software makes no performance claims, a performance comparison against the original R-code would be interesting to add. See scikit-learn-contrib/forest-confidence-interval#29

A statement of need:
There is no clear statement of need is not provided in the readme. See scikit-learn-contrib/forest-confidence-interval#30

Example usage:
The examples use two standard datasets from another package, however it is not explained in which format the users own data needs to be presented before it can be used. Please provide links to the relevant documentations.

The two examples do not show before and after plots of the dataset, as such it is not clear why the user should use this routine.

There is no description of what settings can be used in the calculation of inbag and unbiased variance and how these influence the results

See scikit-learn-contrib/forest-confidence-interval#31

Functionality documentation:
This package provides 2 functions to the user, which have been documented. However there are also 2 undocumented internal functions (_core_computation and _bias_correction), which should be documented. See scikit-learn-contrib/forest-confidence-interval#32

Community guidelines:
Please provide guidelines in the README file for others to:

Seek support

See scikit-learn-contrib/forest-confidence-interval#34

References:
Both references (wager_confidence_2014 and wager_randomforestci_2016) are missing a DOI. See scikit-learn-contrib/forest-confidence-interval#36

@kpolimis I think these are good suggestions and are easy to fix. You don't have to state anything about performance. We prefer to see comments in the issue tracker or your source tree so they remain visible over time. Docx may not be the best format to use for sharing. Not everyone likes using Microsoft formats.

Let's have a fast turnaround. This paper is overdue!

Thanks for the feedback, everyone. I'm defending my dissertation this
Friday and will try to turn around these revisions by 6/16.

Best,

Kivan

On Jun 6, 2017 11:52 PM, "Pjotr Prins" notifications@github.com wrote:

Let's have a fast turnaround. This paper is overdue!


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/openjournals/joss-reviews/issues/124#issuecomment-306703844,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AG8CoIPb9E61Xh1-chujv3iY3_XRHrLwks5sBki2gaJpZM4K8eZl
.

@kpolimis great and good luck!

@kpolimis - friendly reminder on this.

I've just emailed the submitting author to remind him about this submission. If he doesn't reply in the next two weeks we will reject this submission.

tried to address outstanding review issues in the following PR: https://github.com/scikit-learn-contrib/forest-confidence-interval/pull/56

needed to rebase and created new PR for review: https://github.com/scikit-learn-contrib/forest-confidence-interval/pull/58

@DannyArends can you take a final look?

Re-Re-Review of scikit-learn-contrib/forest-confidence-interval

I have checked the boxes of the things which are now fixed, but again there are some (5) remaining open major issues which according to me should be fixed before publication. Additionally 1 minor issue which may be ignored by the authors.

Major issues

Example usage:
1) One of the examples use a data set from the UC Irvine Machine Learning Repository, however the data set used was not produced by UC Irvine Machine Learning Repository, so please cite the paper which collected the data used:

"Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243."

In the paper this data set is WRONGLY attributed, please update the citation in the paper as well.

2) Again, the examples do not show before and after plots of the data set, as such it is not clear why the user should use this routine.

3) Additionally, there is no description of what settings can be used in the calculation of inbag and unbiased variance and how these influence the results. Additionally no explanation is given why the Y-error bars need to be SQRT transformed before plotting. The examples provided are meaningless to me (and users) without better documentation and explanation of the code used in the examples.

References:
4) A reference to the CAR MPG dataset is WRONGLY attributed to: @llichman_uci_2013

The UCI page that you link to provides the correct citation for the dataset:

"Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243"

5) There are no DOIs in the bib file belonging to the article, this is a requirement of the JOSS journal

Minor issue:

Performance:
The software makes no performance claims, a performance comparison against the original R-code would be interesting to add. See scikit-learn-contrib/forest-confidence-interval#29

Thank you @DannyArends for your precise review. I think this was your last iteration :) @kpolimis those final comments are valuable adjustments. If you fix these we publish!

On Tue, Nov 7, 2017 at 11:07 AM, Danny Arends notifications@github.com
wrote:

Re-Re-Review of scikit-learn-contrib/forest-confidence-interval

I have checked the boxes of the things which are now fixed, but again
there are some (6) remaining open major issues which according to me should
be fixed before publication. Additionally 1 minor issue which may be
ignored by the authors.

Major issues:

Example usage:

  1. One of the examples use a data set from the UC Irvine Machine
    Learning Repository, however the data set used was not produced by UC
    Irvine Machine Learning Repository, so please cite the paper which
    collected the data used:

"Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In
Proceedings on the Tenth International Conference of Machine Learning,
236-243."

In the paper this data set is WRONGLY attributed, please update the
citation in the paper as well.

1.

Again, the examples do not show before and after plots of the data
set, as such it is not clear why the user should use this routine.

Sorry - I am not sure I understand what is meant by before and after. Do
you mean a plot of the same data without error bars and then a plot with
error bars?

Does this PR address this issue?
https://github.com/scikit-learn-contrib/forest-confidence-interval/pull/62

1.

Additionally, there is no description of what settings can be used in
the calculation of inbag and unbiased variance and how these influence the
results. Additionally no explanation is given why the Y-error bars need to
be SQRT transformed before plotting. The examples provided are meaningless
to me (and users) without better documentation and explanation of the code
used in the examples.

>

1.

Functionality documentation:

4) As mentioned before only 1/2 of the functions have documentation, the
internal functions are left undocumented for some reason beyond my
comprehension. Please document all of the functions in the code as required
by the JOSS journal

Wasn't this addressed in this PR:
https://github.com/scikit-learn-contrib/forest-confidence-interval/pull/59?

References:

5) A reference to the CAR MPG dataset is WRONGLY attributed to:
@llichman_uci_2013

The UCI page that you link to provides the correct citation for the
dataset:

"Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In
Proceedings on the Tenth International Conference of Machine Learning,
236-243"

  1. There are no DOIs in the bib file belonging to the article, this is
    a requirement of the JOSS journal

Minor issue:

Performance:
The software makes no performance claims, a performance comparison against
the original R-code would be interesting to add. See
scikit-learn-contrib/forest-confidence-interval#29
https://github.com/scikit-learn-contrib/forest-confidence-interval/issues/29


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/openjournals/joss-reviews/issues/124#issuecomment-342449981,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAHPNufAqrg-hQJsPy4_auL-sAlYtJzhks5s0DoKgaJpZM4K8eZl
.

Dear @arokem

"Does this PR address this issue? scikit-learn-contrib/forest-confidence-interval#62"

  • Yes, that is enough.

However why is there a sqrt in the example code ?

line 59: np.sqrt(spam_V_IJ_unbiased[idx_ham])

"Wasn't this addressed in this PR: scikit-learn-contrib/forest-confidence-interval#59?"

  • Yes it was addressed, that's why I removed it from the review list, and removed it from my comment. However I now realize that the email send might have contained the old text. Since I copy pasted it and removed the issues that were fixed.

@DannyArends are you happy?

Re: "Additionally, there is no description of what settings can be used in the calculation of inbag and unbiased variance and how these influence the results. Additionally no explanation is given why the Y-error bars need to be SQRT transformed before plotting. The examples provided are meaningless to me (and users) without better documentation and explanation of the code used in the examples."

plotting with standard deviations addressed in: https://github.com/scikit-learn-contrib/forest-confidence-interval/pull/63

DOIs are still missing from the citations, but all in all I'm Happy.

I checked all the remaining boxes.

Excellent. Thank you so much all of you! @arfon we can R&R.

@kpolimis - could you please merge this PR https://github.com/scikit-learn-contrib/forest-confidence-interval/pull/64 and then make an archive of the reviewed software in Zenodo/figshare/other service and update this thread with the DOI of the archive? I can then move forward with accepting the submission.

@arfon I merged https://github.com/scikit-learn-contrib/forest-confidence-interval/pull/64 and created a DOI of the archive with Zenodo (https://doi.org/10.5281/zenodo.1044209)

@whedon set 10.5281/zenodo.1044209 as archive

OK. 10.5281/zenodo.1044209 is the archive.

@DannyArends - many thanks for your review here! ✨

@kpolimis - your paper is now accepted into JOSS and your DOI is https://doi.org/10.21105/joss.00124 ⚡️ 🚀 💥

thank you @DannyArends and @arfon for your help/suggestions throughout this process.

Hi Arfon,

I just checked the version of the paper that appears online (
http://www.theoj.org/joss-papers/joss.00124/10.21105.joss.00124.pdf) and it
is a previous edition of the paper and not the version from Zenodo (
https://zenodo.org/record/1044209#.WgTE-GhSzD4). Is it possible to update
the JOSS paper to the latest version on master/Zenodo?

Thanks,

Kivan

On Thu, Nov 9, 2017 at 8:21 AM, Arfon Smith notifications@github.com
wrote:

Closed #124 https://github.com/openjournals/joss-reviews/issues/124.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/openjournals/joss-reviews/issues/124#event-1334307704,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AG8CoDvPMCyo6Kkb7RcjgLsmhEtlT7GRks5s0yZ6gaJpZM4K8eZl
.

Sure. That should be fixed now.

On Thu, Nov 9, 2017 at 4:19 PM, Kivan Polimis notifications@github.com
wrote:

Hi Arfon,

I just checked the version of the paper that appears online (
http://www.theoj.org/joss-papers/joss.00124/10.21105.joss.00124.pdf) and
it
is a previous edition of the paper and not the version from Zenodo (
https://zenodo.org/record/1044209#.WgTE-GhSzD4). Is it possible to update
the JOSS paper to the latest version on master/Zenodo?

Thanks,

Kivan

Kivan Polimis
Postdoctoral Researcher
Università Commerciale Luigi Bocconi
Carlo F. Dondena Centre for Research on Social Dynamics and Public Policy
address: 5-a2-04, Via Guglielmö Röntgen
https://maps.google.com/?q=5-a2-04,+Via+Guglielm%C3%B6+R%C3%B6ntgen&entry=gmail&source=g
1 - 20136 Milano, Italy
B6ntgen,+1,+20136+Milano+MI,+Italy/@45.4507127,9.1852656,
17z/data=!4m17!1m7!3m6!1s0x4786c404480a59b9:0xe5f0cfe57a3877b!2sVia+
Guglielm%C3%B6+R%C3%B6ntgen,+1,+20136+Milano+MI,+Italy!3b1!
8m2!3d45.4507127!4d9.1874543!4m8!1m0!1m5!1m1!1s0x4786c404480a59b9:
0xe5f0cfe57a3877b!2m2!1d9.1874543!2d45.4507127!3e3?hl=en>
office: +39.02.5836.5378 <+39%2002%205836%205378>
email: kivan.[email protected]
kivanpolimis.com

On Thu, Nov 9, 2017 at 8:21 AM, Arfon Smith notifications@github.com
wrote:

Closed #124 https://github.com/openjournals/joss-reviews/issues/124.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
issues/124#event-1334307704>,
or mute the thread
AG8CoDvPMCyo6Kkb7RcjgLsmhEtlT7GRks5s0yZ6gaJpZM4K8eZl>
.

>


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/openjournals/joss-reviews/issues/124#issuecomment-343294882,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGjvLl9cNQdkIiOdLAfu6LOoNmHMw1ydks5s02xggaJpZM4K8eZl
.

Thanks for the interesting submission @kpolimis! And @DannyArends for the great review!

Was this page helpful?
0 / 5 - 0 ratings