Joss-reviews: [REVIEW]: Multilocus sequence typing by blast from de novo assemblies against PubMLST

Created on 14 Nov 2016 · 20 comments · Source: openjournals/joss-reviews

Submitting author: @andrewjpage (Andrew Page)
Repository: https://github.com/sanger-pathogens/mlst_check
Version: v2.1.1630910
Editor: @pjotrp
Reviewer: @harmn
Archive: 10.6084/m9.figshare.4285097.v1

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/0b801d23613c9b626c2b6028f8c14056"><img src="http://joss.theoj.org/papers/0b801d23613c9b626c2b6028f8c14056/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/0b801d23613c9b626c2b6028f8c14056/status.svg)](http://joss.theoj.org/papers/0b801d23613c9b626c2b6028f8c14056)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer questions

Conflict of interest

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Version: Does the release version given match the GitHub release (v2.1.1630910)?
[x] Authorship: Has the submitting author (@andrewjpage) made major contributions to the software?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: Have any performance claims of the software been confirmed?

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g. API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

Paper PDF: 10.21105.joss.00118.pdf

[x] Authors: Does the paper.md file include a list of authors with their affiliations?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] References: Do all archival references that should have a DOI list one (e.g. papers, datasets, software)?


Hello human, I'm @whedon. I'm here to help you with some common editorial tasks for JOSS.

For a list of things I can do to help you, just type:

@whedon commands

whedon on 14 Nov 2016

👋 @harmn - this is where the review for this submission takes place. @pjotrp can help you with any questions you might have.

arfon on 16 Nov 2016

I can confirm that the program works as described, on real world data and meets all of the requirements stated here.

happykhan on 18 Nov 2016

Thanks @happykhan
If you need some alternative suggestions for reviewers, here are some researchers who have written MLST code, are active on GitHub, and whom I have never published with:

@lskatz (CDC - Centers for Disease Control) https://github.com/lskatz/lyve-MLST
@aunderwo (PHE - Public Health England) https://github.com/phe-bioinformatics/MOST
@jacarrico (Universidade de Lisboa, Portugal) https://github.com/jacarrico/MLSTtrimm & ReMatCh

andrewjpage on 22 Nov 2016

Relax @andrewjpage. @harmn has agreed to review.

pjotrp on 23 Nov 2016

Indeed, I'm looking for some time to get on it.

Harm


harmn on 24 Nov 2016

I reviewed mlst_check: it installs and runs. Some issues and comments are below; in general, the documentation is a bit limited.

Installation instructions
Installation works fine on Ubuntu 16.04.1 LTS, but it does require root access/sudo rights or a docker installation. It might be good to mention that.

Example usage
I could not find the sample files described in the README.md file for the docker container (/data/sample1.fa etc.). I used the Salmonella files in "mlst_check/example/input_data" to test the software.

Documentation
The methods used are poorly described (as far as I can tell); for instance, I had to look in the docstrings to find out what is used to find the best hit: "The best hit has the greatest number of matching bases. If two hits have the same number of matching bases, the one with the greater percentage identity is selected."
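The best-hit rule quoted from the docstring can be illustrated with a minimal Python sketch. This is not mlst_check's own code (the tool is written in Perl); the `BlastHit` record and `best_hit` function are hypothetical names, and the sketch assumes each hit already carries its count of matching bases and its percentage identity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BlastHit:
    """Hypothetical record for one BLAST hit against an allele."""
    allele: str
    matching_bases: int
    percent_identity: float


def best_hit(hits: Sequence[BlastHit]) -> Optional[BlastHit]:
    """Return the hit with the most matching bases.

    Ties on matching bases are broken by the higher percentage identity;
    if both values tie, the first such hit in the input is kept.
    Returns None when there are no hits.
    """
    if not hits:
        return None
    return max(hits, key=lambda h: (h.matching_bases, h.percent_identity))


if __name__ == "__main__":
    hits = [
        BlastHit("adk_1", 500, 98.0),
        BlastHit("adk_2", 500, 99.5),
        BlastHit("adk_3", 480, 100.0),
    ]
    print(best_hit(hits).allele)  # prints "adk_2"
```

In this example `adk_3` is a 100% identity hit but loses because it has fewer matching bases; percentage identity only decides between hits that tie on matching bases.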

I had to search the code to find the meaning of the tilde in the ST column ("Add tilde to matches which are not 100%"). It would be good to add that to the README.md.
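A similarly hedged sketch of the tilde convention follows. The quoted comment only says that a tilde is added to matches which are not 100%; here it is assumed, purely for illustration, that the tilde is appended as a suffix to each reported allele number. `mark_inexact` and `format_profile` are hypothetical helpers, not part of mlst_check, and the locus names and values are illustrative only.

```python
from typing import Dict, Tuple


def mark_inexact(value: str, percent_identity: float) -> str:
    """Append '~' to a reported value when its match is below 100% identity."""
    return value if percent_identity >= 100.0 else value + "~"


def format_profile(calls: Dict[str, Tuple[int, float]]) -> Dict[str, str]:
    """Format locus -> (allele number, percent identity) calls for output."""
    return {
        locus: mark_inexact(str(allele), identity)
        for locus, (allele, identity) in calls.items()
    }


if __name__ == "__main__":
    print(format_profile({"adk": (1, 100.0), "fumC": (7, 98.5)}))
    # prints {'adk': '1', 'fumC': '7~'}
```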

What exactly is expected in the input FASTA files (I assume complete genomes)?
What is the output if your sequences have no hits with the selected scheme?

Automated tests
There are quite a number of test scripts, but I could not find any description of the tests. I ran them all and found no problems.

References
The Page et al. reference does not have a DOI (it should be http://dx.doi.org/10.1099/mgen.0.000083).

harmn on 26 Nov 2016

@andrewjpage Do you mind addressing mentioned issues?

pjotrp on 27 Nov 2016

@harmn Thanks for the comprehensive review.
@pjotrp Yes, I will address all of the issues, but it will take me a few days.

andrewjpage on 28 Nov 2016

Hi @harmn
Thanks for reviewing this submission.

Installation instructions
Installation works fine on Ubuntu 16.04.1 LTS, but it does require root access/sudo rights or a docker installation. It might be good to mention that.

I have clarified that root access is needed, and also added instructions for installing with HomeBrew/LinuxBrew (which doesn't need root).

Example usage
I could not find the sample files described in the README.md file for the docker container (/data/sample1.fa etc.). I used the Salmonella files in "mlst_check/example/input_data" to test the software.

I have added test files to the docker container so that the example command works out of the box.

Documentation
The methods used are poorly described (as far as I can tell); for instance, I had to look in the docstrings to find out what is used to find the best hit: "The best hit has the greatest number of matching bases. If two hits have the same number of matching bases, the one with the greater percentage identity is selected."

I had to search the code to find the meaning of the tilde in the ST column ("Add tilde to matches which are not 100%"). It would be good to add that to the README.md.

What exactly is expected in the input FASTA files (I assume complete genomes)?
What is the output if your sequences have no hits with the selected scheme?

I have extended README documentation to clarify the methods used, the meaning of the various notations and extended the examples.

Automated tests
There are quite a number of test scripts, but I could not find any description of the tests. I ran them all and found no problems.

By default, the tests produce minimal output when running, but they can be run in a more verbose mode ('dzil test --test-verbose'), which gives a text string for each test. I have improved the messages they produce and added more descriptive text to blocks of tests to make it clearer what's going on. I have also set TravisCI to run the tests in verbose mode so that it's easier to see.

References
The Page et al. reference does not have a DOI (it should be http://dx.doi.org/10.1099/mgen.0.000083).

Thanks, this has been corrected.

andrewjpage on 30 Nov 2016

@andrewjpage nice work
@pjotrp happy to accept this

harmn on 1 Dec 2016


@arfon ready to accept.

pjotrp on 4 Dec 2016


@andrewjpage - At this point could you make an archive of the reviewed software in Zenodo/figshare/other service and update this thread with the DOI of the archive? I can then move forward with accepting the submission.

arfon on 4 Dec 2016

I've added it to Figshare as: https://dx.doi.org/10.6084/m9.figshare.4285097.v1

andrewjpage on 5 Dec 2016

@whedon set 10.6084/m9.figshare.4285097.v1 as archive

arfon on 5 Dec 2016

OK. 10.6084/m9.figshare.4285097.v1 is the archive.

whedon on 5 Dec 2016

Many thanks for the review @harmn and for editing this one @pjotrp.

@andrewjpage - your paper is now accepted into JOSS and your DOI is http://dx.doi.org/10.21105/joss.00118 ⚡️ 🚀 💥

arfon on 5 Dec 2016


@arfon @pjotrp There are unformatted citations in the paper: [Seeman2016; Jolley2010]

genomematt on 15 Dec 2016

Sorry Matt, I've fixed it now in the original markdown file.

andrewjpage on 15 Dec 2016

Thanks for the heads up @genomematt - this should be fixed now.

arfon on 15 Dec 2016
