Joss-reviews: [REVIEW]: yenpathy: An R Package to Quickly Find K Shortest Paths Through a Weighted Graph

Created on 26 Sep 2019  Â·  39Comments  Â·  Source: openjournals/joss-reviews

Submitting author: @toph-allen (Toph Allen)
Repository: https://github.com/ecohealthalliance/yenpathy
Version: 1.0.0
Editor: @kyleniemeyer
Reviewer: @corybrunson, @szhorvat
Archive: Pending

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/998c20393d946000551fea78959c5343"><img src="https://joss.theoj.org/papers/998c20393d946000551fea78959c5343/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/998c20393d946000551fea78959c5343/status.svg)](https://joss.theoj.org/papers/998c20393d946000551fea78959c5343)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@corybrunson & @szhorvat, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @kyleniemeyer know.

✨ Please try and complete your review in the next two weeks ✨

Review checklist for @corybrunson

Conflict of interest

  • [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • [x] Repository: Is the source code for this software available at the repository url?
  • [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • [x] Contribution and authorship: Has the submitting author (@toph-allen) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • [x] Installation: Does installation proceed as outlined in the documentation?
  • [ ] Functionality: Have the functional claims of the software been confirmed?
  • [ ] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • [ ] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • [ ] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • [ ] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • [ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • [ ] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @szhorvat

Conflict of interest

  • [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • [x] Repository: Is the source code for this software available at the repository url?
  • [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • [x] Contribution and authorship: Has the submitting author (@toph-allen) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • [x] Installation: Does installation proceed as outlined in the documentation?
  • [ ] Functionality: Have the functional claims of the software been confirmed?
  • [ ] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • [ ] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • [ ] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • [ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • [ ] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • [ ] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • [ ] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • [ ] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
review withdrawn

All 39 comments

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @corybrunson, @szhorvat it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...

👋 @toph-allen @corybrunson @szhorvat the actual review will take place in this issue

@kyleniemeyer Since it appears that this package was born partly because R/igraph could not satisfy a need, I should disclose that I am an active contributor to the C core of igraph (not the R interface).

I would have liked to see this functionality be contributed directly into the igraph C core (this is not meant as a criticism though!), and we _might_ add similar functionality to igraph after getting inspired by yenpathy, so that it could be used by igraph interfaces in other languages as well.

@toph-allen

Minor language comments on the paper:

In "Summary", "Langauge" -> "Language"

"Csardi" -> "Csárdi"

In the Performance section, you use quotation marks on "simple" in _"simple" paths_. This is in fact standard graph theory terminology.

Added comments on the types of graphs that the function handles: https://github.com/ecohealthalliance/yenpathy/issues/5

@szhorvat Thanks for the text edits and the detailed comments on the issue! I'll take a look in greater depth in the next day or so.

Hi @toph-allen, thanks for building the package! I was unfamiliar with Yen's algorithm and am enjoying seeing it play out.

The software paper is rich with examples but begins with a minimal justificatory paragraph. This is appropriate to the light weight of the package. I have a minor suggestion: to change "a method to find the _k_ shortest paths through a network" to "a method to find the shortest several paths through a network, up to any number _k_". (To be fair, my own awkward reading at first was as a specialist in a different area, where important parameterized concepts are often called "_k_-things", and i was wondering what a "_k_-shortest path" was.)

Hi again @toph-allen, i'm having trouble reproducing the benchmark analysis at the end of the software paper, specifically the dramatic increase in shortest_paths() runtime with mode = "all". In case it's due to a differently sampled random graph, is there a standard or empirical graph you've demonstrated the comparison on?

library(igraph)
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union
library(yenpathy)
set.seed(42)
network <- sample_smallworld(1, 20, 4, .05)
bench::mark(
  shortest_paths(network, from = 1, to = 10),
  k_shortest_paths(network, from = 1, to = 10, k = 1),
  check = FALSE
)
#> # A tibble: 2 x 6
#>   expression                                            min median
#>   <bch:expr>                                          <bch> <bch:>
#> 1 shortest_paths(network, from = 1, to = 10)          209µs  231µs
#> 2 k_shortest_paths(network, from = 1, to = 10, k = 1) 526µs  574µs
#> # … with 3 more variables: `itr/sec` <dbl>, mem_alloc <bch:byt>,
#> #   `gc/sec` <dbl>
network <- sample_smallworld(1, 8, 3, .05)
bench::mark(
  shortest_paths(network, from = 1, to = 5, mode = "all"),
  k_shortest_paths(network, from = 1, to = 5, k = 1),
  check = FALSE
)
#> # A tibble: 2 x 6
#>   expression                                                min median
#>   <bch:expr>                                              <bch> <bch:>
#> 1 shortest_paths(network, from = 1, to = 5, mode = "all") 207µs  224µs
#> 2 k_shortest_paths(network, from = 1, to = 5, k = 1)      424µs  444µs
#> # … with 3 more variables: `itr/sec` <dbl>, mem_alloc <bch:byt>,
#> #   `gc/sec` <dbl>

Created on 2019-10-01 by the reprex package (v0.2.1)

Additionally, i think it would better demonstrate your point to use k = 6 or some other non-1 number (depending on network) in the second benchmark test, since this is the use case described in the text.

@kyleniemeyer apologies; i meant to move my previous comment to an issue on the repo, to avoid clutter here in the review thread, and i'll do so now.

@corybrunson no worries, it's fine either way

@corybrunson I think your suggested change for the first paragraph is definitely clearer, and will push that change up. I'm going to discuss your other comments on the issues in the repo. Thanks!

@toph-allen cool, glad it suits you. I may not respond to updates this workweek but i'll be back to the review this weekend.

@kyleniemeyer

What is the procedure to follow when a reviewer requests changes? Should I write up the review as-is for now (to complete it on time), or should I wait until the authors address the concerns?

Since the review is open, I assume that the review will keep going continuously until all concerns are addressed (even if this may take a long time). Let me know if this is not how it works.

@szhorvat yes, the review process is meant to be iterative and keep going until concerns are addressed—but this isn't meant to work as slowly as a traditional process where the authors get all reviewer comments at once, prepare a rebuttal, and repeat, but instead more like a conversation.

Feel free to leave comments here or in the software repo as you work through your review. Hope that answers your question.

@corybrunson wrote:

(To be fair, my own awkward reading at first was as a specialist in a different area, where important parameterized concepts are often called "k-things", and i was wondering what a "k-shortest path" was.)

I know exactly what you mean. An easy fix (to avoid this misunderstanding) would be to use n instead of k.

The first stage of my review is complete. I am now waiting for the comments to be addressed.

Summary

This R package provides a function to find the k shortest paths between two given vertices of a graph. It relies on an existing C++ library. The main contribution is exposing the functionality to R, and making it convenient to use from R.

Ease of use and integration with R

I do not use R often, and I do not feel qualified to judge how well integrated the function is. However, @corybrunson has already written multiple comments on this.

Installation was easy on macOS (the only platform where I tested it), but it did require compiling from source. It may be useful to mention what is required for a successful installation (e.g. Xcode being installed on macOS). I am also wondering how easy it would be to get it working on Windows, where a compiler isn't typically available.

Functionality

A typical issue with software that works with graphs is that the authors did not consider all types of graphs which could be given as input. I opened two issues regarding this:

These should be addressed in some way before publishing the package.

Documentation

The paper contains more examples than the documentation. However, users will read the docs, not the paper. I recommend expanding the docs with these examples.

It would be useful to illustrate common tasks such as:

  • getting the path in terms of edges (with their weights)
  • getting the total length for each path

On a related note:

Have you considered returning a data structure which does not simply contain the paths as lists of vertices, but also lists of edges (e.g. edge indices = dataframe row indices), as well as the total length of each path? R/igraph functions typically return all the information that was already computed.

Minor comments

  • It would be useful to mention the time complexity of this implementation (in terms of the number of vertices and edges).

  • In the docs,

    edge_penalty | A constant to be added to THE WEIGHT OF each edge

  • igraph should be spelt as "igraph", not "iGraph"

  • When solving combinatorial problems that may have a very large number of solutions, it is often useful to have some sort of "streaming" or "iterator" interface. I.e., once we found k paths and examined them, we may decide that we want the k+1th path. A "streaming interface" would make it possible to do this without having to re-run the entire function. (Of course, this is a minor point, and I do not expect that you will implement this.)

@szhorvat Thanks for the review! It's helpful to receive comments from someone with more experience in this field than I do.

We'll make the minor copy changes you suggest.

Responding to your other comments:

  • In R, the general convention is for the top-level README.md to contain a high-level overview, with more elaborate examples going in the package vignettes. This package contains one vignette which is at parity with the paper. We could add another vignette with the examples you suggest. We can also refer users to the examples from the README.md.
  • We'll investigate the bug reports you filed.
  • We considered adding additional functionality to return the length of the paths. During our internal development, we tried to keep the functionality as light-weight and simple as possible for the sake of getting it done fast, but returning the length of the paths seems like it would be a useful feature.
  • I can definitely see how a streaming interface would be useful, but it would probably require significant changes to the C++ code, which we have till now avoided rewriting significantly.

[Edit] Although using "n shortest paths" might eliminate some confusion, "k shortest paths" seems to be the standard way of referring to this problem, and using _n_ might cause more confusion.

@toph-allen

This package contains one vignette which is at parity with the paper.

Thanks for the explanation! Could you let me know how to access it? I was using the R Studio interface to browse the documentation and I do not see it there (for other packages I do). I also tried

library(yenpathy)
vignette(package='yenpathy')

but it says no vignettes found.

I am not experienced with R.

Hi @szhorvat and @toph-allen, something i required some time to figure out is that R package vignettes may only become available when a package is installed with a specific flag for it:

# install and build vignettes
remotes::install_github("ecohealthalliance/yenpathy", build_vignettes = TRUE)
# list available vignettes
vignette(package = "yenpathy")
# open specific vignette
vignette("using-yenpathy")

Edit: I would recommend adding build_vignettes = TRUE to the installation instructions on the README.

Thanks @corybrunson, it works this way. Indeed it would be nice to add it to the installation instructions.

I added the vignette-building argument and viewing instructions to README.Rmd.

I also used pkgdown to create a GitHub Pages site which includes the README, R documentation, and vignettes, which you can view here, and added a link to this site to the package's README.md.

@toph-allen @kyleniemeyer I am just giving advance notice that I may not be able to respond or comment here during most of November (travelling in a place where GitHub is blocked).

Sorry for my going awol on this thread — I had a few deadlines come up which have made working on this tough. I’m going to try to begin addressing some of the issues in the next week.

On Oct 21, 2019, at 8:36 AM, Szabolcs Horvát notifications@github.com wrote:

@toph-allen https://github.com/toph-allen @kyleniemeyer https://github.com/kyleniemeyer I am just giving advance notice that I may not be able to respond or comment here during most of November (travelling in a place where GitHub is blocked).

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/openjournals/joss-reviews/issues/1766?email_source=notifications&email_token=AAXZGPHZ5DNJOYVXT6KZUULQPWO6NA5CNFSM4I23M522YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEB2FIWY#issuecomment-544494683, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXZGPDABORQDDHJGBKE2JTQPWO6NANCNFSM4I23M52Q.

Hi @toph-allen how is your work coming along? Just a quick check in here.

@kthyng Thanks for checking in. I've made some minor progress, but some unexpected work deadlines have meant that making changes to Yenpathy have had to take a back burner. I'll do my best to have some commits pushed this week.

@toph-allen Ok. If you want to keep working now, great. Another option is that we can pause this submission for now and then you can ping here when you're ready to pick it back up again. Let us know either way, please.

@kthyng Thanks for clarifying. After discussing with the package's other contributor, we've decided that it's best to pause the submission. We'll check back in the new year, when things have calmed down a little, and let you know when we're ready to continue work.

Thanks again for being flexible. 😅

It's now been a couple more months - is there any news on this?

👋 @toph-allen - It's now been another month - is there any news on this?

Hi @danielskatz. Sorry for my slow response; thank you for your patience.

I'll be able to respond to the issues in the next week. I'll reply to the individual issue threads on the yenpathy repo, and will post here when it's done. I'll raise any other questions I have here.

Dear authors and reviewers

We wanted to notify you that in light of the current COVID-19 pandemic, JOSS has decided to suspend submission of new manuscripts and to handle existing manuscripts (such as this one) on a "best efforts basis". We understand that you may need to attend to more pressing issues than completing a review or updating a repository in response to a review. If this is the case, a quick note indicating that you need to put a "pause" on your involvement with a review would be appreciated but is not required.

Thanks in advance for your understanding.

_Arfon Smith, Editor in Chief, on behalf of the JOSS editorial team._

Hi @toph-allen, it's been a little while and I just wanted to check in. Have you been able to make progress on the issues raised?

Sorry, this email slipped down in my inbox. We decided to withdraw Yenpathy from submission to JOSS — it seems like it's a little under the scope of what qualifies for the journal, and the excellent feedback from the reviewers suggest that to bring it up to snuff, we'd have to do a fair amount of reworking of how it approaches the problem.

The project started life as a kinda hacky way to accomplish something we needed — that is, find an arbitrary number of shortest paths quickly through a large directed graph. When I wrote it, I was able to make a lot of assumptions about the structure of the graph, and didn't need to be as rigorous handling different kinds of edge cases.

Adding analysis of graphs passed in to the package's functions is outside of the scope of our original intent. Furthermore, to properly debug the code, I'd need to work in more depth with the C++ library we're wrapping, and that's beyond my abilities. I think it makes more sense for the package to stand "as-is".

Thanks again to JOSS folks and the reviewers — even if the package isn't getting submitted to the journal, your feedback and even just preparing it for submission has improved the quality of the package and its documentation.

@toph-allen Thanks for the update, and I do understand.

Thanks also to @corybrunson and @szhorvat for your feedabck!

@whedon withdraw

Paper withdrawn.

Was this page helpful?
0 / 5 - 0 ratings