Submitting author: @MAnalytics (Monsuru Adepeju)
Repository: https://github.com/MAnalytics/akmedoids
Version: v0.1.5
Editor: @karthik
Reviewer: @karthik, @njtierney
Archive: Pending
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
Status badge code:
HTML: <a href="https://joss.theoj.org/papers/6b1b3bbe3259b3a3a42c4024de638a08"><img src="https://joss.theoj.org/papers/6b1b3bbe3259b3a3a42c4024de638a08/status.svg"></a>
Markdown: [](https://joss.theoj.org/papers/6b1b3bbe3259b3a3a42c4024de638a08)
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
@karthik & @njtierney , please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @karthik know.
✨ Please try and complete your review in the next six weeks ✨
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @karthik, @njtierney it looks like you're currently assigned to review this paper :tada:.
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Reference check summary:
OK DOIs
- 10.1007/s00180-009-0178-4 is OK
- 10.1007/s10940-014-9228-3 is OK
- 10.1111/j.1745-9125.2004.tb00541.x is OK
- 10.1007/s10940-016-9301-1 is OK
- 10.4324/9781315089256-19 is OK
- 10.1177/1088767909337701 is OK
- 10.1093/bjc/azx022 is OK
- 10.1016/0167-9473(92)90042-e is OK
- 10.1080/03610917408548446 is OK
- 10.1016/0377-0427(87)90125-7 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@whedon generate pdf
Just touching base here, I'll review this article this week!
Just touching base here, I'll review this article this week!
Thank you very much. I appreciate it.
An update: I am in the process of review right now, but I have a few suggestions, collating these together will take me a bit longer than anticipated, I'll touch base later this week.
An update: I am in the process of review right now, but I have a few suggestions, collating these together will take me a bit longer than anticipated, I'll touch base later this week.
@karthik @njtierney I am looking forward to this. Regards.
Here are some general heuristic checks. Particularly problematic issues are using 1:nrow
and importing full packages.
── Good Practice akmedoids
It is good practice to
✖ write unit tests for all functions, and all package code
in general. 0% of code lines are covered by test cases.
R/akmedoids.clust.R:31:NA
R/akmedoids.clust.R:35:NA
R/akmedoids.clust.R:36:NA
R/akmedoids.clust.R:40:NA
R/akmedoids.clust.R:43:NA
... and 658 more lines
✖ use '<-' for assignment instead of '='. '<-' is the
standard, and R users and developers are used it and it is easier
to read your code for them if you use '<-'.
R/akmedoids.clust.R:77:8
R/akmedoids.clust.R:83:6
R/akmedoids.clust.R:245:14
R/akmedoids.clust.R:272:23
R/akmedoids.clust.R:295:23
... and 22 more lines
✖ avoid long code lines, it is bad for readability. Also,
many people prefer editor windows that are about 80 characters
wide. Try make your lines shorter than 80 characters
R/akmedoids.clust.R:3:1
R/akmedoids.clust.R:4:1
R/akmedoids.clust.R:5:1
R/akmedoids.clust.R:6:1
R/akmedoids.clust.R:7:1
... and 106 more lines
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...),
1:NROW(...) and 1:NCOL(...) expressions. They are error prone and
result 1:0 if the expression on the right hand side is zero. Use
seq_len() or seq_along() instead.
R/akmedoids.clust.R:77:10
R/akmedoids.clust.R:81:22
R/akmedoids.clust.R:89:34
R/dataImputation.R:109:8
R/outlierDetect.R:67:11
... and 16 more lines
✖ not import packages as a whole, as this can cause name
clashes between the imported packages. Instead, import only the
specific functions you need.
✖ avoid 'T' and 'F', as they are just variables which are
set to the logicals 'TRUE' and 'FALSE' by default, but are not
reserved words and hence can be overwritten by the user. Hence,
one should always use 'TRUE' and 'FALSE' for the logicals.
R/statPrint.R:NA:NA
Please add a language field to your DESCRIPTION file
Please check your package for spelling mistakes. Not all of these are mistakes but I see a few.
Potential Spelling mistakes
DESCRIPTION does not contain 'Language' field. Defaulting to 'en-US'.
WORD FOUND IN
’Whitespaces’ wSpaces.Rd:19
akmedoids-vignette.Rmd:365
agentlans elbowPoint.Rd:21
al akmedoids.clust.Rd:39
statPrint.Rd:27,57
description:2
README.md:7
akmedoids-vignette.Rmd:358,428,526,567,605
alphaLabel alphaLabel.Rd:19
analysing README.md:3
Appl akmedoids.clust.Rd:40
behavioural akmedoids-vignette.Rmd:126
Calinski akmedoids.clust.Rd:18,24,41
description:6
Caliński README.md:104
categorise statPrint.Rd:27
akmedoids-vignette.Rmd:536
centroids akmedoids.clust.Rd:27
charater statPrint.Rd:19
Commun akmedoids.clust.Rd:41
complementarity akmedoids-vignette.Rmd:504
Comput akmedoids.clust.Rd:40
corresp dataImputation.Rd:17,18,19,20,26
customised statPrint.Rd:43
akmedoids-vignette.Rmd:578
denomin rates.Rd:14
doi description:2,7,10
et akmedoids.clust.Rd:39
statPrint.Rd:27,57
description:2
README.md:7
akmedoids-vignette.Rmd:358,428,526,567,605
GBTM akmedoids-vignette.Rmd:126
Genolini akmedoids.clust.Rd:39
description:2
ggplot description:9
github elbowPoint.Rd:21
Harabasz akmedoids.clust.Rd:24,41
README.md:104
Harabatz akmedoids.clust.Rd:18
description:6
http akmedoids.clust.Rd:39
imput akmedoids-vignette.Rmd:194
initialisation akmedoids.clust.Rd:14
akmedoids-vignette.Rmd:126
jss description:2
jstatsoft akmedoids.clust.Rd:39
judgement akmedoids-vignette.Rmd:504
kmeans README.md:7
akmedoids-vignette.Rmd:126
Kmedoids title:1
kml akmedoids.clust.Rd:39
knitr akmedoids-vignette.Rmd:428
maximisation akmedoids-vignette.Rmd:126
md NEWS.md:2
medoid statPrint.Rd:25,27
akmedoids-vignette.Rmd:526,534,536,567
minimised akmedoids-vignette.Rmd:126
minimises akmedoids-vignette.Rmd:126
modelling akmedoids-vignette.Rmd:126
normalise akmedoids-vignette.Rmd:262
optimality akmedoids.clust.Rd:18
param akmedoids.clust.Rd:21
png akmedoids-vignette.Rmd:428
propotion props.Rd:14
quartile statPrint.Rd:27
recognise akmedoids-vignette.Rmd:194
recognises akmedoids.clust.Rd:12
dataImputation.Rd:12
outlierDetect.Rd:13
props.Rd:12
removeRowsN.Rd:12
statPrint.Rd:23
akmedoids-vignette.Rmd:163
removeRowNA NEWS.md:13
Rousseeuw akmedoids.clust.Rd:40
description:5
README.md:102
sep akmedoids-vignette.Rmd:428
showplots NEWS.md:12
spacebar wSpaces.Rd:19
akmedoids-vignette.Rmd:365
statPrints NEWS.md:12
traj rates.Rd:12,14
vartheta akmedoids-vignette.Rmd:526
ve akmedoids-vignette.Rmd:567
victimisation akmedoids-vignette.Rmd:126
visualisation statPrint.Rd:43
description:8
akmedoids-vignette.Rmd:578
visualise akmedoids-vignette.Rmd:362
Wickham description:9
www akmedoids.clust.Rd:39
Your tests are on a folder called test
NOT tests
but uses testthat
infrastructure. How are you running the tests? Also your tests are not passing and barely cover any of the code. Please fix your tests and coverage before we can proceed.
@karthik Thanks for the efforts. I will address all these issues asap.
Dear @karthik @njtierney
This is to inform you that I have now addressed all the issues raised above.
I know you must be very busy, especially during this time period, I will greatly appreciate it if you can promptly review these corrections so that we can proceed to the next stage.
Thank you very much.
MA
Summary of revisions:
Some of these issues still remain. Here are a few examples.
https://github.com/MAnalytics/akmedoids/blob/master/R/akmedoids.clust.R#L153
https://github.com/MAnalytics/akmedoids/blob/master/R/akmedoids.clust.R#L167
https://github.com/MAnalytics/akmedoids/blob/master/R/dataImputation.R#L175
@karthik I really appreciate your prompt review.
The highlighted issues have now been resolved. Thanks, MA
There are still several more of these issues. Please look through your code where you do 1:nrow or 1:ncol. If the dimensions are 0 you will get silent errors that will go uncaught.
https://github.com/MAnalytics/akmedoids/blob/master/R/statPrint.R#L126
https://github.com/MAnalytics/akmedoids/blob/master/R/statPrint.R#L210
https://github.com/MAnalytics/akmedoids/blob/master/R/statPrint.R#L261
https://github.com/MAnalytics/akmedoids/blob/master/R/statPrint.R#L321
@karthik I have gone through all the functions and corrected these issues accordingly.
Thanks, MA
Running googpractice::gp()
I found a few missed cases:
I note that 1:length
is used here:
https://github.com/MAnalytics/akmedoids/blob/master/tests/testthat/test_rates.R#L19
FYI There are minor also some style changes that can be fixed
;
But I can resume my review from here, these are minor things. I would recommend running googpractice::gp()
@njtierney Thank you very much for the suggestion of googpractice::gp( )
I have now used the package to fix all the issues.
Thanks.
@njtierney @karthik All long code issues have also been fixed. In other words, all code lines are now 80 or less character wide.
I hope to hear from you soonest.
Thanks, MA
@karthik @njtierney Please, can I have a rough idea of when the review will be completed? We are using this work to cite your journal in another publication.
Hope to hear from you soon.
MA
@MAnalytics I appreciate your enthusiasm and desire to see this review completed asap. But please let us complete our review as we manage other commitments during the pandemic. We'll update the thread when ready. There is no need to reply constantly on this thread until action is needed. Thanks for your patience.
Hi all! I am the rotating associate editor in chief this week. I see that nothing has happened on this review for over a month. I'd like to check in and see where we are at. My brief perusal of this review issue looks like items have been brought up by reviewers @karthik and @njtierney and responded to be the author @MAnalytics. However, many review boxes remain unchecked. What is the next step in this review?
Thanks for checking in @kthyng. It's been on our list to finish this. We'll complete the reviews this week. Appreciate the nudge. :pray:
@MAnalytics Can you please add these files
Non-standard files/directories found at top level:
'cran-comments.md' 'paper.bib' 'paper.md'
to your ..Rbuildignore
Please fix devtools installation instructions in your vignette. Currently it is:
devtools::install_github("manalytics/packages/akmedoids")
(remove packages
).
@whedon check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1007/s00180-009-0178-4 is OK
- 10.1007/s10940-014-9228-3 is OK
- 10.1111/j.1745-9125.2004.tb00541.x is OK
- 10.1007/s10940-016-9301-1 is OK
- 10.4324/9781315089256-19 is OK
- 10.1177/1088767909337701 is OK
- 10.1093/bjc/azx022 is OK
- 10.1016/0167-9473(92)90042-e is OK
- 10.1080/03610917408548446 is OK
- 10.1016/0377-0427(87)90125-7 is OK
MISSING DOIs
- None
INVALID DOIs
- None
Please also cite R in the text "A full demonstration is provided in the package vignette of how to deploy Akmedoids
to examine long-term relative exposure to crime in R
"
R citation: https://cran.r-project.org/doc/FAQ/R-FAQ.html#Citing-R
@MAnalytics Once these minor issues are fixed and @njtierney signs off, I'm good to finish up the final steps.
@karthik @njtierney Thanks.
I have now addressed these highlighted issues.
MA.
@whedon check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1007/s00180-009-0178-4 is OK
- 10.1007/s10940-014-9228-3 is OK
- 10.1111/j.1745-9125.2004.tb00541.x is OK
- 10.1007/s10940-016-9301-1 is OK
- 10.4324/9781315089256-19 is OK
- 10.1177/1088767909337701 is OK
- 10.1093/bjc/azx022 is OK
- 10.1016/0167-9473(92)90042-e is OK
- 10.1080/03610917408548446 is OK
- 10.1016/0377-0427(87)90125-7 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@njtierney Please, what step should I take next? Thanks.
@MAnalytics I have checked in with Nick and he will respond in the next few days.
@karthik @njtierney Thank you very much. I look forward to it.
Thank you for your patience everyone, here is my review.
I have a few concerns regarding the functionality of the software and how easy it is to use for a data analysis. I appreciate that perhaps these points might at first appearance appear to be superficial, however I think that making these improvements will make the user experience better, and improve the package overall - I myself would love to use some of this work in my longitudinal data analysis, however at the moment I think that there are some improvements that can be made to the code.
I am very happy to provide more pointers/advice/detail if you have any questions about my review or comments on the code.
Thank you again for your submission, I enjoyed reading it, and appreciate your strong effort in making this project open source.
[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[ ] Contribution and authorship: Has the submitting author ([@MAnalytics]) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
geoMADE
, but this does not appear to be an account (https://github.com/geoMADE), so I assume that these commits are from @MAnalytics and are named as geoMADE somehow.[x] Installation: Does installation proceed as outlined in the documentation?
[ ] Functionality: Have the functional claims of the software been confirmed?
One common theme I've seen in this package is that there is a lot of code being printed out to the console when functions are being run, and there is no way to silence this. For example:
```{r example-clust}
example_clust <- akmedoids.clust(all_traj, k=c(3,10), crit = "Silhouette")
[1] "solution of k = 3 determined!"
[1] "solution of k = 4 determined!"
[1] "solution of k = 5 determined!"
[1] "solution of k = 6 determined!"
[1] "solution of k = 7 determined!"
[1] "solution of k = 8 determined!"
[1] "solution of k = 9 determined!"
[1] "solution of k = 10 determined!"
[1] "Suggested optimal solution contains 3 clusters. See the plot for further examination!"
These print messages are useful, but there should be a `verbose` or `quiet` argument to suppress these. This information should be also stored in the output of the object.
This function also opened a graphic window and plot. There is no way for the user to suppress this. The output should be stored in the object, and if you really want to show the plot to uses, have an option to print the plot when running the function, perhaps with an argument like `show_plot = TRUE`.
Reading the documentation of the package, the function returns different output based on the value of K. The documentation reads:
> #' @return If \code{k} is a vector of two numbers (see param.
#' \code{k} details above), the output is a graphical plot of
#' the quality scores of the cluster solutions. If \code{k} is
#' an exact integer number of clusters, the function returns
#' trajectory labels indicating the group membership of the
#' corresponding trajectory in the \code{traj} object.
I think that this makes the software harder to use, as there is an implied workflow that could be made simpler and easier to use.
It seems then that the next step is to run the same code again but with `k` = 3. I am not sure why this needs to be rerun, shouldn't this be stored as the output? For example, a vector of classes for each value of `k`.
Running the function again with `k = 3`, I get the following:
```{r example-clust-3}
example_clust_3 <- akmedoids.clust(all_traj, k=c(3), crit = "Silhouette")
example_clust_3
Finally to summarise the information we run:
```{r}
clustr <- akmedoids.clust(all_traj, k=3)
clustr <- as.vector(clustr$memberships)$alphabetic_Labels
descrStats <- statPrint(clustr, all_traj,
id_field = FALSE, showplots=FALSE)
print(descrStats)
I believe that this can be improved in a few ways.
Firstly, writing
```r
clustr <- akmedoids.clust(all_traj, k=3)
clustr <- as.vector(clustr$memberships)$alphabetic_Labels
should be avoided as you are overwriting the object.
Secondly, the user should not need to perform special data manipulation (such as accessing a nested list with (as.vector(clustr$memberships)$alphabetic_Labels
) in order to summarise the data.
This could be improved by writing a special print
method for the software. This would mean that printing clustr
out to the console, e.g.,
clustr <- akmedoids.clust(all_traj, k=3)
clustr
Outputting the clustr
object in the console, it could return the information that statPrint
returns. You can read more about creating custom print
methods here - https://adv-r.hadley.nz/s3.html, https://blog.earo.me/2019/11/03/practical-guide-to-s3/.
It seems that there is a workflow implied:
k
with akmedoids.clust
k
akmedoids.clust
againstatPrint
I think that this could be improved in the following way.
I suggestion is the output of this function should contain possible memberships for
I also note that statPrint
is noted as performing two tasks - (1) generating statistics of groups, and (2), creating plots.
In my opinion the output of statPrint
and akmedoids.clust
functions should be separated into two functions, or at least provide options for the plots to be turned off. If they are turned into functions, I recommend:
I have some other comments on the functions in this package:
akmedoids::alphaLabel()
akmedoids::dataImputation()
akmedoids::elbowPoint()
akmedoids::outlierDetect()
akmedoids::props()
akmedoids::rates()
akmedoids::removeRowsN()
na.omit()
?akmedoids::wSpaces()
trimws
function in R?akmedoids::statPrint()
print
methods, which can be problematic when trying to include them in documentsno way to silence console output - suggest adding a quiet
or verbose
option, and to use message
when printing messages to the console.
rates
, akmedoids.clust
functions[X] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
Yes.
The statement in the README says:
identify clusters of trajectories with similar long-term linear trends over time, providing an improved cluster identification as compared with the classic kmeans algorithm
Yes
Although the author provides instructions in the README, but these contain no output of the function. This should be updated to make the README a
README.Rmd
function. Seeusethis::use_readme_rmd()
as an approach on doing this.
In addition, the README simulates some example data for use - this should be encapsulated as an example dataset in the package that can be loaded.
[ ] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
example documentation has unnecessary traj <- traj
line in "examples"
The datasets are not described in the helpfiles, and state that they are matrices when they are data frames. This data should be documented more thoroughly (see https://r-pkgs.org/data.html#documenting-data for details on how to document data, and https://github.com/jforbes14/eechidna/blob/master/R/data.R for an example documentation of data).
[X] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
There are tests
[X] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[X] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[X] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[X] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[X] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
@njtierney Thank you very much for taking the time to provide the above reviews. I will address the issues highlighted asap. Thanks once again.
Dear @njtierney (and @karthik),
We greatly appreciate you taking the time to review our paper.
We have addressed all the issues raised by @njtierney, and we hope that the paper will now be accepted for publication. Please, find our response to each of the issues below.
Kind Regards,
Monsuru (@MAnalytics)
#-----------------------------------
Thank you for your patience everyone, here is my review.
I have a few concerns regarding the functionality of the software and how easy it is to use for a data analysis. I appreciate that perhaps these points might at first appearance appear to be superficial, however I think that making these improvements will make the user experience better, and improve the package overall - I myself would love to use some of this work in my longitudinal data analysis, however at the moment I think that there are some improvements that can be made to the code.
I am very happy to provide more pointers/advice/detail if you have any questions about my review or comments on the code.
Thank you again for your submission, I enjoyed reading it, and appreciate your strong effort in making this project open source.
Review checklist for [@njtierney]
Conflict of interest
- [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.
Code of Conduct
- [x] I confirm that I read and will adhere to the JOSS code of conduct.
General checks
- [x] Repository: Is the source code for this software available at the repository url?
- [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
- [ ] Contribution and authorship: Has the submitting author ([@MAnalytics]) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
## ------------------------------------------
1. Reviewer's comment:
## ------------------------------------------
- @MAnalytics appears to have been a major contributor to the software, although there are a lot of commits from
geoMADE
, but this does not appear to be an account (https://github.com/geoMADE), so I assume that these commits are from @MAnalytics and are named as geoMADE somehow.## ------------------------------------------
Response:
## ------------------------------------------@MAnalytics is the same account as @geoMADE - the account name was changed some time ago.
## ------------------------------------------
2. Reviewer's comment:
## ------------------------------------------Functionality
- [x] Installation: Does installation proceed as outlined in the documentation?
- [ ] Functionality: Have the functional claims of the software been confirmed?
One common theme I've seen in this package is that there is a lot of code being printed out to the console when functions are being run, and there is no way to silence this. For example:
example_clust <- akmedoids.clust(all_traj, k=c(3,10), crit = "Silhouette")
returns
[1] "solution of k = 3 determined!" [1] "solution of k = 4 determined!" [1] "solution of k = 5 determined!" [1] "solution of k = 6 determined!" [1] "solution of k = 7 determined!" [1] "solution of k = 8 determined!" [1] "solution of k = 9 determined!" [1] "solution of k = 10 determined!" [1] "Suggested optimal solution contains 3 clusters. See the plot for further examination!"
These print messages are useful, but there should be a
verbose
orquiet
argument to suppress these. This information should be also stored in the output of the object.This function also opened a graphic window and plot. There is no way for the user to suppress this. The output should be stored in the object, and if you really want to show the plot to uses, have an option to print the plot when running the function, perhaps with an argument like
show_plot = TRUE
.Reading the documentation of the package, the function returns different output based on the value of K. The documentation reads:
' @return If \code{k} is a vector of two numbers (see param.
' \code{k} details above), the output is a graphical plot of
' the quality scores of the cluster solutions. If \code{k} is
' an exact integer number of clusters, the function returns
' trajectory labels indicating the group membership of the
' corresponding trajectory in the \code{traj} object.
I think that this makes the software harder to use, as there is an implied workflow that could be made simpler and easier to use.
It seems then that the next step is to run the same code again but with
k
= 3. I am not sure why this needs to be rerun, shouldn't this be stored as the output? For example, a vector of classes for each value ofk
.Running the function again with
k = 3
, I get the following:example_clust_3 <- akmedoids.clust(all_traj, k=c(3), crit = "Silhouette") example_clust_3
Finally to summarise the information we run:
#get the optimal cluster solution clustr <- akmedoids.clust(all_traj, k=3) clustr <- as.vector(clustr$memberships)$alphabetic_Labels ##get group statistics descrStats <- statPrint(clustr, all_traj, id_field = FALSE, showplots=FALSE) print(descrStats)
I believe that this can be improved in a few ways.
Firstly, writing
clustr <- akmedoids.clust(all_traj, k=3) clustr <- as.vector(clustr$memberships)$alphabetic_Labels
should be avoided as you are overwriting the object.
Secondly, the user should not need to perform special data manipulation (such as accessing a nested list with (
as.vector(clustr$memberships)$alphabetic_Labels
) in order to summarise the data.This could be improved by writing a special
clustr
out to the console, e.g.,clustr <- akmedoids.clust(all_traj, k=3) clustr
Outputting the
clustr
object in the console, it could return the information thatstatPrint
returns. You can read more about creating customIt seems that there is a workflow implied:
- Test multiple different values of
k
withakmedoids.clust
- explore the plots of the optimal solution based on a certain criterion.
- decide on ideal value of K
- Run clustering for the optimally identified value of
k
- using
akmedoids.clust
again
- summarise this information using
statPrint
I think that this could be improved in the following way.
- Provide a function for exploring different values of K
- Provide a function for identifying the cluster labels
- Provide good print methods, so the output that is given to the console could be delivered via the print method.
I suggestion is the output of this function should contain possible memberships for
I also note that
statPrint
is noted as performing two tasks - (1) generating statistics of groups, and (2), creating plots.In my opinion the output of
statPrint
andakmedoids.clust
functions should be separated into two functions, or at least provide options for the plots to be turned off. If they are turned into functions, I recommend:
- One function for identifying clusters or summary statistics that return a data frame output or a list.
- And one function that focuses on creating graphics.
## ------------------------------------------
Response:
## ------------------------------------------Thank you very much for taking the time to explore these functions and suggesting ways to improve their functionalities.
First, in order to improve the usability of the package, we make the following changes to the akmedoids.clust
and statPrint
functions, and how they can be deployed.
Change of names: We changed the names of akmedoids.clust
and statPrint
functions to more intuitive names: akClustr
and print_akStats
(a special print function) functions, respectively.
Workflow: The above two functions can now be deployed to perform clustering and visualize the results, respectively, following two simple steps as follows:
(a) Use akClustr
function to generate the solutions for different values of k
. For example,
```
aksolution <- akClustr(trajectry, k = c(3,7), verbose = FALSE, quality_Plot=FALSE)
To print the results out to the console, a user can just type:
aksolution
(This will print the solutiond for all values of `k` and also indicates which amongst them is the optimal solution – i.e. `$optimal_k`).
As seen above, the function now includes `verbose` argument to suppress the messages during processing, and a `quality_Plot` argument to control the plot of quality criteria. The output of `akClustr` function is exported as object of class `akObject`.
(***b***) Given an object of class `akObject` (as in the above), use `print_akStats` function to generate the properties (i.e. descriptive and change statistics) of any specified `k`-cluster solution (i.e. optimal or not). In the example above, the properties of the `k =4` solution for example can be printed as:
print_akStats(aksolution, k = 4, type="lines", y.scaling="fixed", show_Plots = TRUE)
``Note that the function now includes
show_Plots` argument for controlling the clusters plots.
## ------------------------------------------
3. Reviewer's comment:
## ------------------------------------------
I have some other comments on the functions in this package:
akmedoids::alphaLabel()
- I'm not sure if the output of this needs to be a list, it could instead just be the vector of characters.
- Is it possible to provide an example of the use of this function?
## ------------------------------------------
Response:
## ------------------------------------------
The output is actually a vector of characters.
The goal of akmedoids::alphaLabel()
function is to allow the transformation of any numeric cluster labels (particularly of external packages, such as kml
) into alphabetic labels. The goal is to allow a comparison of results from such packages (e.g. kml
) with the results from our own package. We have now included an @example usage of this type of scenario in the alphaLabel()
function, by showing how the numeric cluster labels of kml::affectIndiv()
function can be transformed into alphabetic labels. We deployed a real-life Time-at-risk for the adjudicated Toronto Youth
datasets (Nielsen, J. 2018
).
The alphaLabel()
function was originally used within our akClustr
function (see line 331 of akClustr
function).
## ------------------------------------------
4. Reviewer's comment:
## ------------------------------------------
akmedoids::dataImputation()
- see note in "common issues" about printing messages
## ------------------------------------------
Response:
## ------------------------------------------A
verbose
argument has now been added to theakmedoids::dataImputation()
function to control the output messages.
## ------------------------------------------
5. Reviewer's comment:
## ------------------------------------------
akmedoids::elbowPoint()
- In example documentation, the function is applied to the plot (which is great!) but is not demonstrated in other contexts. I suggest providing an example outside of plotting as well.
## ------------------------------------------
Response:
## ------------------------------------------We have now modified the @example of the
elbowPoint()
function by using a real-life data set (i.e.Time-at-risk for the adjudicated Toronto Youth
datasets). We deployakClustr
function to demonstrate how theelbowPoint()
function helps theSilhouette
criterion to determine the optimality ofk
values.## ------------------------------------------
6. Reviewer's comment:
## ------------------------------------------
akmedoids::outlierDetect()
- This prints out several messages and objects, see below note on messages.
## ------------------------------------------
Response:
## ------------------------------------------
A verbose
argument is added to control the print out messages.
## ------------------------------------------
7. Reviewer's comment:
## ------------------------------------------
akmedoids::props()
- states output is a matrix, but is a data.frame
## ------------------------------------------
Response:
## ------------------------------------------Thanks. Corrected accordingly.
## ------------------------------------------
8. Reviewer's comment:
## ------------------------------------------
akmedoids::rates()
- This function doesn't return a matrix, instead returning a vector of common IDs. I am not sure what these mean, as their behaviour is not documented in the helpfile.
## ------------------------------------------
Response:
## ------------------------------------------In our checks the function seems to be working fine. Please kindly see the @example provided under the function.
## ------------------------------------------
9. Reviewer's comment:
## ------------------------------------------
akmedoids::removeRowsN()
- Could the documentation clarify how is this different to
na.omit()
?## ------------------------------------------
Response:
## ------------------------------------------We have now added the following statement in the description of the
removeRowsN()
function:
"... . The function is also able to remove records with
Infentries, distinguishing it from the popular
na.omit()function in
R".
>
## ------------------------------------------
10. Reviewer's comment:
## ------------------------------------------*
>
akmedoids::wSpaces()
- The example for this function doesn't contain whitespace characters
- Could the documentation explain how this is different to
trimws
function in R?## ------------------------------------------
Response:
## ------------------------------------------
We have added a new sample datatraj_wSpaces.rda
which contains whitespaces in order to demonstrate the functionality of this function. Please, see the @example. Note: thewSpaces()
function is found to perform the same task as thetrimws()
function.## ------------------------------------------
11. Reviewer's comment:
## ------------------------------------------
akmedoids::statPrint()
- It is very neat that this function can generate a nice looking ggplot, however it would be great if that plot was instead a separate function
## ------------------------------------------
Response:
## ------------------------------------------
Thanks for the comment. Please, see above. Please, also note thatakmedoids::statPrint()
has now been changed toakmedoids::print_akStats()
## ------------------------------------------
12. Reviewer's comment:
## ------------------------------------------Common issues
- plots all open new graphics devices rather than using standard
no way to silence console output - suggest adding a
quiet
orverbose
option, and to usemessage
when printing messages to the console.
- e.g., the
rates
,akmedoids.clust
functions## ------------------------------------------
Response:
## ------------------------------------------
The Common issues have all been addressed (please, see above)
- [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
Yes.
Documentation
- [ ] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
The statement in the README says:
identify clusters of trajectories with similar long-term linear trends over time, providing an improved cluster identification as compared with the classic kmeans algorithm
- [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
Yes
## ------------------------------------------
13. Reviewer's comment:
## ------------------------------------------
- [ ] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
Although the author provides instructions in the README, but these contain no output of the function. This should be updated to make the README a
README.Rmd
function. Seeusethis::use_readme_rmd()
as an approach on doing this.In addition, the README simulates some example data for use - this should be encapsulated as an example dataset in the package that can be loaded.
## ------------------------------------------
Response:
## ------------------------------------------Thank you for the suggestion.
We have now provided a README.Rmd
, addressing the concerns above. We also update the package vignette accordingly. Specifically, we detailed the following in the README.Rmd
document:
## ------------------------------------------
14. Reviewer's comment:
## ------------------------------------------
- [ ] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
- example documentation has unnecessary
traj <- traj
line in "examples"- The datasets are not described in the helpfiles, and state that they are matrices when they are data frames. This data should be documented more thoroughly (see https://r-pkgs.org/data.html#documenting-data for details on how to document data, and https://github.com/jforbes14/eechidna/blob/master/R/data.R for an example documentation of data).
## ------------------------------------------
Response:
## ------------------------------------------
- Unnecessary lines have been removed.
- We have now followed the example in https://github.com/jforbes14/eechidna/blob/master/R/data.R and provide detailed documentation of all the sample data sets included in the package. Please, see
./R/data.R
Other additional revision to note:
population.rda
as popl.rda
as there is a clash with the population.rda
in one of the dependency packages.
- [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
There are tests
- [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Software paper
- [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
- [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
- [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
- [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
- [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
Thanks for your reponse, @MAnalytics !
I will reply shortly.
@njtierney Ok, Thanks
Thank you for your response! I'm looking forward to using this in my own work.
I greatly appreciate your effort in addressing my points, for readability, I will address the relevant points freshly below:
print_akStats
has the arguments, "akObject", "k", "reference", "N.quant",, "show_Plots", "type", "y.scaling". These arguments should be eitherThis isn't required to be addressed for the review, but it might be helpful for your own work - your function, print_akStats
can be written in a way so that when you print an object created by akClustr
, it will do the nice printing when output into the console. Again, not needed for this review, but worth knowing about.
if(show_Plots==TRUE){
#plot
flush.console()
dev.new(width=3, height=3)
print(plt)
}
This code could instead be:
if(show_Plots==TRUE){
print(plt)
}
as there isn't a need to flush the console or create a specific device
I also recommend that this plotting function is separated, perhaps as a plot_akStats
function. This can then be called inside your print method if you prefer, but having just the function to plot the information would be really nice.
akmedoids::rates()
I stated in my previous review:
- This function doesn't return a matrix, instead returning a vector of common IDs. I am not sure what these mean, as their behaviour is not documented in the helpfile.
Your response:
In our checks the function seems to be working fine. Please kindly see the @example provided under the function.
When I run the example code, I get this output:
$common_ids
[1] "E01012628" "E01004768" "E01004803" "E01004804" "E01004807" "E01004808"
[7] "E01004788" "E01004790" "E01004805"
$noncommon_ids_from_trajectories
[1] "E01004806"
$noncommon_ids_from_denominator
[1] "E01004809" "E01004899"
$rates_data
location_ids X2001 X2002 X2003 X2004 X2005 X2006 X2007 X2008 X2009
1 E01012628 3.60 2.86 1.00 2.29 4.67 4.00 1.00 4.00 2.00
2 E01004768 4.00 6.40 3.63 2.29 2.00 3.56 0.80 2.18 0.00
3 E01004803 2.86 2.48 4.53 3.35 3.75 3.30 3.29 4.34 2.00
4 E01004804 2.29 1.60 0.00 4.71 0.89 1.26 2.40 2.29 2.91
5 E01004807 5.09 1.92 5.14 1.55 0.94 1.69 2.40 1.12 0.70
6 E01004808 3.76 2.42 5.00 3.61 4.80 2.21 3.09 3.56 1.85
7 E01004788 0.67 1.36 1.82 1.90 2.40 1.84 1.33 2.35 2.00
8 E01004790 6.40 3.64 5.33 2.46 0.57 2.67 3.00 0.47 0.44
9 E01004805 6.00 0.00 2.00 4.00 2.00 0.00 2.00 8.00 0.00
In the documentation, can you describe what, $common_ids
$noncommon_ids_from_trajectories
, and $noncommon_ids_from_denominator
are?
pkgdown
R package for automatically generating websites for R packages. See https://pkgdown.r-lib.org/ for more details.@njtierney, Thank you very much for your efforts. I will address these issues asap.
Dear @njtierney (and @karthik)
We thank you once again for your efforts. Please, find below our responses.
@MAnalytics
review response
Thank you for your response! I'm looking forward to using this in my own work.
I greatly appreciate your effort in addressing my points, for readability, I will address the relevant points freshly below:
----------------------------------------------------------------------
Function arguments should have consistent names
- This package uses camelCase for it's functions names, (akClustr, alphaLabel, etc)
Some of the function arguments use mixed names, such as the function,
print_akStats
has the arguments, "akObject", "k", "reference", "N.quant",, "show_Plots", "type", "y.scaling". These arguments should be either
- all camelCase (akObject, yScaling, nQuant), or
- all lower case separated by an underscore (e.g., ak_object, y_scaling)
- It should not be a mix of changing case and separator, as this is hard for the user to predict (e.g., N.quant, show_Plots)
----------------------------------------------------------------------
Thank you for the observations. We have now decided to use lowercase in all the cases, i.e. both the names and arguments of the functions are now changed to lowercases.
>
Print methods
This isn't required to be addressed for the review, but it might be helpful for your own work - your function,
print_akStats
can be written in a way so that when you print an object created byakClustr
, it will do the nice printing when output into the console. Again, not needed for this review, but worth knowing about.
Thank you very much for your suggestion. We will explore the print
functionalities later.
Separate functions for plotting
- The plot methods currently open up a new device when plots are shown, from this code:
if(show_Plots==TRUE){ #plot flush.console() dev.new(width=3, height=3) print(plt) }
This code could instead be:
if(show_Plots==TRUE){ print(plt) }
as there isn't a need to flush the console or create a specific device
I also recommend that this plotting function is separated, perhaps as a
plot_akStats
function. This can then be called inside your print method if you prefer, but having just the function to plot the information would be really nice.----------------------------------------------------------------------
>
The plot argument has been modified accordingly.
As suggested, we have now added a plot_akstats
function for plotting. This function is called inside the print_akstats
function to provide a limited plot functionalities. We then recommended (see @param show_plots) that users deploy plot_akstats
for more plot options. The vignette, README.Rmd, test_units, etc. are modified accordingly.
>
regarding
akmedoids::rates()
I stated in my previous review:
- This function doesn't return a matrix, instead returning a vector of common IDs. I am not sure what these mean, as their behaviour is not documented in the helpfile.
Your response:
In our checks the function seems to be working fine. Please kindly see the @example provided under the function.
When I run the example code, I get this output:
$common_ids [1] "E01012628" "E01004768" "E01004803" "E01004804" "E01004807" "E01004808" [7] "E01004788" "E01004790" "E01004805" $noncommon_ids_from_trajectories [1] "E01004806" $noncommon_ids_from_denominator [1] "E01004809" "E01004899" $rates_data location_ids X2001 X2002 X2003 X2004 X2005 X2006 X2007 X2008 X2009 1 E01012628 3.60 2.86 1.00 2.29 4.67 4.00 1.00 4.00 2.00 2 E01004768 4.00 6.40 3.63 2.29 2.00 3.56 0.80 2.18 0.00 3 E01004803 2.86 2.48 4.53 3.35 3.75 3.30 3.29 4.34 2.00 4 E01004804 2.29 1.60 0.00 4.71 0.89 1.26 2.40 2.29 2.91 5 E01004807 5.09 1.92 5.14 1.55 0.94 1.69 2.40 1.12 0.70 6 E01004808 3.76 2.42 5.00 3.61 4.80 2.21 3.09 3.56 1.85 7 E01004788 0.67 1.36 1.82 1.90 2.40 1.84 1.33 2.35 2.00 8 E01004790 6.40 3.64 5.33 2.46 0.57 2.67 3.00 0.47 0.44 9 E01004805 6.00 0.00 2.00 4.00 2.00 0.00 2.00 8.00 0.00
In the documentation, can you describe what,
$common_ids
$noncommon_ids_from_trajectories
, and$noncommon_ids_from_denominator
are?----------------------------------------------------------------------
>
We appreciate your very keen observation. We have now dealt with this issue.
First, we change these output names to more meaningful ones, i.e.
$common_ids
remains as $common_ids
$noncommon_ids_from_trajectories
changed to $ids_unique_to_traj_data
$noncommon_ids_from_denominator
changed to $ids_unique_to_denom_data
$rates_data
changed to $rates_estimates
Second, we then provide the description of each variable (see under @return section)
>
>
Regarding the README.Rmd
- A very minor thing - the README.Rmd file uses html_document instead of github_document (commented out here) - this should then generate the R output that can be seen in the README.md
- Not necessary for this review at all, but the authors might also be interested in looking at the
pkgdown
R package for automatically generating websites for R packages. See https://pkgdown.r-lib.org/ for more details.
>
The README.md now renders nicely in the browser. Please, see here: https://github.com/MAnalytics/akmedoids/blob/master/README.md
Thank you very much for the recommendation of pkgdown
material. We will look into this for our package.
Most helpful comment
Thanks for your reponse, @MAnalytics !
I will reply shortly.