Joss-reviews: [REVIEW]: oolong: An R package for validating automated content analysis tools

Created on 10 Jul 2020 · 74 Comments · Source: openjournals/joss-reviews

Submitting author: @chainsawriot (Chung-hong Chan)
Repository: https://github.com/chainsawriot/oolong
Version: 0.3.11
Editor: @kakiac
Reviewers: @pdwaggoner, @mbod, @Kudusch
Archive: 10.5281/zenodo.4256574

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

[status badge]

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/6e535564e7142d705f4f3d68b18dac62"><img src="https://joss.theoj.org/papers/6e535564e7142d705f4f3d68b18dac62/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/6e535564e7142d705f4f3d68b18dac62/status.svg)](https://joss.theoj.org/papers/6e535564e7142d705f4f3d68b18dac62)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@pdwaggoner & @mbod, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @kakiac know.

✨ Please try and complete your review in the next six weeks ✨

Review checklist for @pdwaggoner

Conflict of interest

  • [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • [x] Repository: Is the source code for this software available at the repository url?
  • [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • [x] Contribution and authorship: Has the submitting author (@chainsawriot) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

  • [x] Installation: Does installation proceed as outlined in the documentation?
  • [x] Functionality: Have the functional claims of the software been confirmed?
  • [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @mbod

Conflict of interest

  • [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • [x] Repository: Is the source code for this software available at the repository url?
  • [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • [x] Contribution and authorship: Has the submitting author (@chainsawriot) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

  • [x] Installation: Does installation proceed as outlined in the documentation?
  • [x] Functionality: Have the functional claims of the software been confirmed?
  • [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [ ] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • [ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @Kudusch

Conflict of interest

  • [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • [x] Repository: Is the source code for this software available at the repository url?
  • [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • [x] Contribution and authorship: Has the submitting author (@chainsawriot) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

  • [x] Installation: Does installation proceed as outlined in the documentation?
  • [x] Functionality: Have the functional claims of the software been confirmed?
  • [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Most helpful comment

Congrats on your new publication @chainsawriot! And thanks to editor @kakiac and reviewers @pdwaggoner, @mbod, and @Kudusch for your time and expertise!! 🎉 🎉

(I will leave this issue open until the DOI resolves)

All 74 comments

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @pdwaggoner, @mbod it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that with GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

[screenshot: repository watch settings]

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

[screenshot: notification settings]

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
Reference check summary:

OK DOIs

- 10.18637/jss.v040.i13 is OK
- 10.21105/joss.00774 is OK
- 10.1080/10584609.2020.1723752 is OK
- 10.1080/19312458.2018.1458084 is OK
- 10.1093/pan/mps028 is OK
- 10.1140/epjds/s13688-016-0085-1 is OK
- 10.18637/jss.v091.i02 is OK

MISSING DOIs

- None

INVALID DOIs

- None

Hi @kakiac , @mbod , and @chainsawriot - Thanks for the package and for the chance to review the paper and software. For the most part this looks like a good, simple package and paper. Still, I think there are a few revisions needed prior to publication at JOSS:

  • There is no statement of or clear path to community contribution (in general or for bug reports) either in the paper or in the GH repo (though it's buried in the DESCRIPTION file via BugReports). Some note to users on how to contribute to the package in these two locations would be nice.

  • In the paper, there is no "state of the field". A few other packages are cited, though only in reference to those supported by oolong; currently there is only a single paragraph at the outset of the paper that gives a few sentences on what validation is according to two papers (Chang et al. 2009 and Song et al. 2020). A few more paragraphs that a) go into more depth on validation and why these techniques were chosen over the many other possible approaches, b) detail the packages that currently exist on related topics and approaches to validating content analytic models, and thus c) defend why this package (oolong) usefully fills a real gap in the literature/software, would all be quite useful.

  • Related to the previous point, I found myself thinking about many other ways to validate commonly fit content analytic models. For example, the simplest and most widely used approaches to validating topic models are to check coherence (and also perplexity) scores, as a core issue with topic models is whether k is an appropriate characterization of the corpus. As topic models are usually unsupervised, this is almost always the first step in validating them. Though this specific flavor of validation doesn't seem to be at the core of this package (by assumption), I wondered why not. This question could likely be satisfied either by a more thorough justification for the selected techniques mentioned in the previous point, or by expanding the package to include other prominent validation techniques for content analysis. But at a minimum, some nod to the partial selection of validation techniques (beyond noting that the authors of those two papers suggested them) would be extremely valuable, would deepen the reliability of the software, and would sharpen the contribution of this piece more broadly.

Other than these issues, the paper and software look good. I look forward to seeing the revisions and then completing my review. Thanks all!

Dear Dr Waggoner @pdwaggoner ,

Thank you very much for your valuable comments.

My coauthor @msaeltzer and I have revised both the paper and the GitHub repo to address your comments. The changes are reflected in the current version, chainsawriot/oolong@a0fa6d0.

  • We have updated the GitHub repo of oolong to provide a clear path to community contribution: we have added a short contribution guide and a contributor code of conduct.
  • We have addressed your other two points under the section "Validation of Automated Content Analysis" of the paper. In this section, we outline what validation of automated methods means (statistical validation, semantic validation, and predictive validation) and explain why we focus on semantic validation.

We hope we have addressed all your concerns with both the GitHub repo and the paper. We look forward to your further feedback.

Regards,
Dr Chung-hong Chan @chainsawriot

@whedon generate pdf

Hi @chainsawriot - Thanks for the revisions! The paper looks much better now, with a very clear picture of how your substantive and technical approaches fit in the broader literature. I also like how you've included code of conduct in addition to contribution guidelines. Great changes and solid software! This is good to go in my opinion.

Back to you @kakiac .

@whedon generate pdf

Dear Professor O'Donnell (@mbod) & @kakiac ,

I have made a minor update to the manuscript to reflect the changes in the bibliography (A cited paper is no longer "in press"). I hope it can facilitate the review. Thank you very much!

Regards,
Chung-hong Chan

Dear @kakiac ,

According to the blog post about the current reduced service mode, a review is expected to be finished in 6 weeks (up from the usual 2 weeks). If it takes longer than that, you, as the editor, are expected to intervene. According to my calendar (probably yours too), it has been 7 weeks. I was wondering if you could either check with the reviewer (@mbod) or assign a new reviewer if @mbod is not available. Thank you very much!

Possible additional reviewers are bachl, cschwem2er, koheiw, pmyteh.

cc. my coauthor @msaeltzer & the original EiC @danielskatz

Regards,
Chung-hong Chan

Dear @arfon,

I've contacted both the editor @kakiac and the reviewer @mbod about this through GitHub and e-mail, and there has been no movement on this review. Could you assign another reviewer? Thank you very much!

Regards,
Chung-hong Chan

@chainsawriot - I'll follow up with @kakiac myself to see what we can do to help move this along.

For your information, six weeks is a _guideline_ for reviewers, not a hard time limit.

/ cc @danielskatz for visibility.

Dear Chung-hong Chan (@chainsawriot), many thanks for your patience and for your work on the submission so far (I have sent you a separate email explaining further).

I am following up with @mbod this week to ask how he is progressing with the review and whether he is still available to review. I will also assign an additional reviewer to speed things up. I should be in touch through this issue again hopefully with some news.

@arfon @danielskatz @pdwaggoner many thanks for the support and for your continued work on this submission whilst I was off sick 🥇 :)

@whedon generate pdf

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

@whedon check references

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.18637/jss.v040.i13 is OK
- 10.21105/joss.00774 is OK
- 10.1177/0093650220944790 is OK
- 10.1080/10584609.2020.1723752 is OK
- 10.1080/19312458.2018.1458084 is OK
- 10.1093/pan/mps028 is OK
- 10.1140/epjds/s13688-016-0085-1 is OK
- 10.18637/jss.v091.i02 is OK
- 10.1016/j.poetic.2013.08.004 is OK
- 10.1073/pnas.1018067108 is OK
- 10.1111/j.1540-5907.2009.00427.x is OK
- 10.5281/zenodo.3591068 is OK
- 10.1080/19312458.2018.1430754 is OK
- 10.1145/1458082.1458317 is OK
- 10.3115/v1/N15-1018 is OK
- 10.1080/10361146.2017.1324561 is OK
- 10.1016/j.gloenvcha.2020.102038 is OK
- 10.1177/0002716215569192 is OK
- 10.1080/19312458.2019.1671966 is OK
- 10.1080/21670811.2015.1096598 is OK
- 10.1080/21670811.2015.1093270 is OK
- 10.1002/sam.11415 is OK

MISSING DOIs

- 10.2307/2288384 may be a valid DOI for title: Content analysis: An introduction to its methodology

INVALID DOIs

- None

@whedon add @Kudusch as reviewer

OK, @Kudusch is now a reviewer

👋 @Kudusch many thanks for agreeing to review, this is the review thread for the paper. All of our communications will happen here from now on.

You should see that there is a checklist at the top of this thread with the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, you are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#2461 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 6 weeks. Please let me know if you require some more time. We can also use Whedon (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@kakiac ) if you have any questions/concerns.

Thank you as well to @kakiac for the chance to review for JOSS!

Dear @chainsawriot,

first of all: Thank you for writing this package! It has been very useful for me and I'm sure it'll be/it has been very useful for the community as well.

I have now read the paper and (re-)installed oolong and went through some of the examples provided by you. I think the paper and the package are almost fit for publication; I just have a few small comments:

  • There is an issue with the R output in the paper on pages 3, 4, and 5. The lines seem to be too long and are not broken correctly. This is mainly an aesthetic issue, as the code itself is readable. Maybe the output can be truncated for the paper?
  • In the installation guidelines in the overview you recommend installing the package from GitHub. Is the development version the recommended version of the package, or are installations through CRAN fine as well?
  • In the paper on page 4, the output of the summarize_oolong() function might need a little more explanation; a guide on how these values can be interpreted. This information is (in part) also found in the package's documentation, but a paragraph or two on the values returned by summarize_oolong() might be helpful.

With these minor issues addressed, I think the paper should be published!

@Kudusch Many thanks for the review and for the feedback.
@chainsawriot many thanks for the changes; we will be keeping an eye on the issue progress here, and then move to the next step in the process:

https://github.com/chainsawriot/oolong/issues/31

@whedon generate pdf

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Dear Mr. Tim Schatto-Eckrodt (@Kudusch ),

Thank you so much for accepting the review and your tremendously useful comments. Regarding your three points, I have made the following changes:

  1. We have followed your advice to truncate some output, e.g. from tibble. However, we want to keep the output from oolong to show how it guides us interactively. The output problem was partly an issue of the verbose output from summarize_oolong. We have made the output from summarize_oolong cleaner (e.g. rounding some numbers).

  2. We have edited the overview to reflect our recommendation: we suggest using the development version because the package is changing constantly. However, we have also included instructions for installing the so-called "stable" (a.k.a. older) version from CRAN (see the short installation sketch after this list).

  3. In the software paper, we have followed your advice to provide guidance on how to interpret the results from summarize_oolong.
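
For reference, a minimal installation sketch in R (an illustration rather than a quote from the README; it assumes devtools is installed and that the CRAN release is current):

# install the "stable" release from CRAN
install.packages("oolong")

# install the development version from GitHub
devtools::install_github("chainsawriot/oolong")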

On top of these, we have also updated some references to reflect the up-to-date version of some cited papers.

We hope we have addressed all your concerns. We look forward to your further feedback. Once again, thank you very much!

Regards,
Dr Chung-hong Chan

Dear Chung-hong Chan (@chainsawriot),

perfect! I just looked through the latest article proof and the changes you made to the repo. You have addressed all of my comments and I have nothing left to complain about!

Thank you for the work you put into this package and the review process!

@kakiac I think, after the two review phases, the paper is now fit for publication and I recommend accepting it.

@chainsawriot

I was not able to get all the tests to pass and didn't immediately see documentation on the scope of the tests and how to run them.

  • topicmodels package required for tests - not stated
> library(oolong)
> 
> test_check("oolong")
Error: No tests found for oolong
> setwd('~/files/oolong/tests/')
> library(testthat)
> library(oolong)
> 
> test_check("oolong")
── 1. Error: github issue #8 - word (@test-topicmodels.R#38) ───────────────────────────────
there is no package called 'topicmodels'
1: library(topicmodels) at testthat/test-topicmodels.R:38

── 2. Error: github issue #8 - topic (@test-topicmodels.R#45) ──────────────────────────────
there is no package called 'topicmodels'
1: library(topicmodels) at testthat/test-topicmodels.R:45

── 3. Error: generate_topic_content (@test-topicmodels.R#58) ───────────────────────────────
object 'abstracts_topicmodels' not found
1: expect_warning(oolong:::.generate_test_content(abstracts_topicmodels, abstracts$text[1:10], exact_n = 12)) at testthat/test-topicmodels.R:58
2: quasi_capture(enquo(object), label, capture_warnings)
3: .capture(act$val <- eval_bare(get_expr(.quo), get_env(.quo)), ...)
4: withCallingHandlers(code, warning = function(condition) {
       out$push(condition)
       invokeRestart("muffleWarning")
   })
5: eval_bare(get_expr(.quo), get_env(.quo))
6: oolong:::.generate_test_content(abstracts_topicmodels, abstracts$text[1:10], exact_n = 12)
7: .extract_ingredients(.convert_input_model_s3(input_model), n_top_terms = n_top_terms, difficulty = difficulty, 
       need_topic = !is.null(input_corpus), n_topiclabel_words = n_topiclabel_words, input_dfm = input_dfm, 
       use_frex_words = use_frex_words, input_corpus = input_corpus, btm_dataframe = btm_dataframe)
8: .convert_input_model_s3(input_model)
9: .is_topic_model(input_model)
10: class(x) %in% c("WarpLDA", "STM", "BTM", "keyATM_output")

  |===================================================================================| 100%INFO  [17:17:12.419] early stopping at 50 iteration 

══ testthat results ════════════════════════════════════════════════════════════════════════
[ OK: 66 | SKIPPED: 4 | WARNINGS: 0 | FAILED: 3 ]
1. Error: github issue #8 - word (@test-topicmodels.R#38) 
2. Error: github issue #8 - topic (@test-topicmodels.R#45) 
3. Error: generate_topic_content (@test-topicmodels.R#58) 

Error: testthat unit tests failed
> install.packages('topicmodels')
also installing the dependencies 'NLP', 'modeltools', 'tm'


  There is a binary version available but the source version is later:
    binary source needs_compilation
NLP  0.2-0  0.2-1             FALSE

trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/modeltools_0.2-23.tgz'
Content type 'application/x-gzip' length 212143 bytes (207 KB)
==================================================
downloaded 207 KB

trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/tm_0.7-7.tgz'
Content type 'application/x-gzip' length 1111260 bytes (1.1 MB)
==================================================
downloaded 1.1 MB

trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/topicmodels_0.2-11.tgz'
Content type 'application/x-gzip' length 1662803 bytes (1.6 MB)
==================================================
downloaded 1.6 MB


The downloaded binary packages are in
    /var/folders/sv/2svwpgtx7ssb0zscpbwcwc480000gp/T//RtmpLtzG2M/downloaded_packages
installing the source package 'NLP'

trying URL 'https://cran.rstudio.com/src/contrib/NLP_0.2-1.tar.gz'
Content type 'application/x-gzip' length 144368 bytes (140 KB)
==================================================
downloaded 140 KB

* installing *source* package 'NLP' ...
** package 'NLP' successfully unpacked and MD5 sums checked
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (NLP)

The downloaded source packages are in
    '/private/var/folders/sv/2svwpgtx7ssb0zscpbwcwc480000gp/T/RtmpLtzG2M/downloaded_packages'
> library(testthat)
> library(oolong)
> 
> test_check("oolong")
── 1. Error: generate_topic_content (@test-topicmodels.R#58) ───────────────────────────────
object 'abstracts_topicmodels' not found
1: expect_warning(oolong:::.generate_test_content(abstracts_topicmodels, abstracts$text[1:10], exact_n = 12)) at testthat/test-topicmodels.R:58
2: quasi_capture(enquo(object), label, capture_warnings)
3: .capture(act$val <- eval_bare(get_expr(.quo), get_env(.quo)), ...)
4: withCallingHandlers(code, warning = function(condition) {
       out$push(condition)
       invokeRestart("muffleWarning")
   })
5: eval_bare(get_expr(.quo), get_env(.quo))
6: oolong:::.generate_test_content(abstracts_topicmodels, abstracts$text[1:10], exact_n = 12)
7: .extract_ingredients(.convert_input_model_s3(input_model), n_top_terms = n_top_terms, difficulty = difficulty, 
       need_topic = !is.null(input_corpus), n_topiclabel_words = n_topiclabel_words, input_dfm = input_dfm, 
       use_frex_words = use_frex_words, input_corpus = input_corpus, btm_dataframe = btm_dataframe)
8: .convert_input_model_s3(input_model)
9: .is_topic_model(input_model)
10: class(x) %in% c("WarpLDA", "STM", "BTM", "keyATM_output")

  |===================================================================================| 100%INFO  [17:20:01.289] early stopping at 50 iteration 

══ testthat results ════════════════════════════════════════════════════════════════════════
[ OK: 68 | SKIPPED: 4 | WARNINGS: 1 | FAILED: 1 ]
1. Error: generate_topic_content (@test-topicmodels.R#58)

@chainsawriot the paper now reads well - installation steps are documented.

  • However, I would anticipate that a segment of the user base who would really benefit from a tool like this might need more hand-holding and a walk-through of the setup. Perhaps assuming that all users will be familiar with R and RStudio is appropriate, given the specific tools/models oolong is designed to interoperate with. For the future, some more basic tutorials or even videos could be a strong addition to your very useful tool!

Thanks! Good work!

@mbod I think the de facto standard way to run the unit tests of an R package is to clone the git repo to your local machine, change the working directory to the base directory of the package, and then run devtools::test() (or, if you prefer, devtools::check(manual = TRUE, remote = TRUE)). Please make sure you have set up an environment that can run those checks (e.g. LaTeX etc.) if you want to run check. It would also be good if you could provide your sessionInfo().
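
In concrete terms, something along these lines (a sketch only; it assumes the repo has already been cloned and that devtools is installed):

# in a fresh R session, after cloning https://github.com/chainsawriot/oolong
setwd("oolong")                                # base directory of the cloned package
devtools::test()                               # run the testthat suite
devtools::check(manual = TRUE, remote = TRUE)  # or the full CRAN-style check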

More information on how to run the tests of an R package can be found in the book R Packages. Also, some tests are skipped on CRAN because they test features that depend on the packages listed in the Suggests field of the DESCRIPTION, e.g. topicmodels, keyATM, BTM, etc. Some data files (e.g. abstracts_topicmodels) are not included in the built package (because they are listed in .Rbuildignore); they are only useful during the development phase. So you will certainly get an error if you only install the package from GitHub (certain files are skipped due to .Rbuildignore during the package-building process of devtools::install_github), download all the tests from GitHub, and then run them.

If you want to test the package on your local machine, it is generally assumed that you have cloned it locally and have all the suggested packages installed (the same is true for standard development tasks such as building vignettes; as a matter of fact, if you don't follow the de facto standard of installing all suggested packages, you don't even have testthat installed). I think this has been a pretty standard convention already.
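
A quick way to satisfy that convention, as a sketch (assuming the remotes package is installed and the working directory is the base directory of the cloned repo):

# install Imports and Suggests (topicmodels, keyATM, BTM, testthat, etc.) in one go
remotes::install_deps(dependencies = TRUE)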

Suggests: your package can use these packages, but doesn't require them. You might use suggested packages for example datasets, to run tests, build vignettes, or maybe there's only one function that needs the package.

I don't need to manually instruct Travis CI, for example, to install all suggested packages in order to run all the tests. I believe the arrangement for GitHub Actions is pretty similar.

Could you please give it a try once again? Thank you very much.

@mbod Could you please also elaborate on what you mean by "need more hand-holding and walk through setup", so that we can improve the paper accordingly?

Do you mean information such as installation instructions, or even more general information on how to train a topic model? This kind of hands-on information can be found in the overview on GitHub or in the vignette.

This is my first submission to JOSS, so maybe it is also a good opportunity for me to consult @kakiac on what kind of information should be included in a JOSS software paper. The current oolong paper is four times longer than the example JOSS paper. A more general text analysis R package, facing probably the same audience as oolong, is quanteda. The JOSS software paper of quanteda does not contain a lot of information on usage. A recently accepted paper for textnets is the standard JOSS 3-pager with no usage information whatsoever.

The current software paper already overlaps a lot with the vignette. Would a hyperlink from the software paper to the vignette be enough as "hand-holding and walk-through setup"?

@mbod 's advice on providing tutorials and/or videos has been taken up. For example, at the previous ICA conference, I gave a tutorial at a preconference. A video was recorded; however, the video is too rough to be included in the README. In the future, we will seek opportunities to give workshops on validating text analysis.

Thanks a lot!

Thank you all @chainsawriot @mbod @Kudusch @pdwaggoner - great work and the paper and repo look much improved as a result! @chainsawriot many thanks for taking on board the advice and recording the tutorial 👍

@mbod can you confirm that you are happy that the following two items in your checklist have been addressed:

  • [ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • [ ] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.

@mbod, could you also confirm that you are happy to recommend acceptance of the paper?

@whedon generate pdf

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

@whedon check references

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.31235/osf.io/np5wa is OK
- 10.18637/jss.v040.i13 is OK
- 10.21105/joss.00774 is OK
- 10.1177/0093650220944790 is OK
- 10.1080/10584609.2020.1723752 is OK
- 10.1080/19312458.2018.1458084 is OK
- 10.1093/pan/mps028 is OK
- 10.1140/epjds/s13688-016-0085-1 is OK
- 10.18637/jss.v091.i02 is OK
- 10.1016/j.poetic.2013.08.004 is OK
- 10.1073/pnas.1018067108 is OK
- 10.1111/j.1540-5907.2009.00427.x is OK
- 10.5281/zenodo.3591068 is OK
- 10.1080/19312458.2018.1430754 is OK
- 10.1145/1458082.1458317 is OK
- 10.3115/v1/N15-1018 is OK
- 10.1080/10361146.2017.1324561 is OK
- 10.1016/j.gloenvcha.2020.102038 is OK
- 10.1177/0002716215569192 is OK
- 10.1080/19312458.2019.1671966 is OK
- 10.1080/21670811.2015.1096598 is OK
- 10.1080/21670811.2015.1093270 is OK
- 10.1002/sam.11415 is OK

MISSING DOIs

- 10.2307/2288384 may be a valid DOI for title: Content analysis: An introduction to its methodology

INVALID DOIs

- None

Editor's "After reviewers recommend acceptance" checks:

  • [x] Get a new proof with the @whedon generate pdf command.
  • [x] Download the proof, check all references have DOIs, follow the links and check the references.
  • [x] Whedon can help check references with the command @whedon check references
  • [x] Proof-read the paper and ask authors to fix any remaining typos, badly formed citations, awkward wording, etc..
  • [x] Ask the author to make a tagged release and archive, and report the version number and archive DOI in the review thread.
  • [x] Check the archive deposit has the correct metadata (title and author list), and request the author edit it if it doesn't match the paper.
  • [x] Run @whedon set as archive.
  • [x] Run @whedon set as version if the version was updated.
  • [x] Run @whedon accept to generate the final proofs, which has Whedon notify the @openjournals/joss-eics team that the paper is ready for final processing.

@kakiac I would like to point out that Whedon's suggested DOI for Krippendorff's book is incorrect. The suggested DOI is for a book review of the book. As far as I know, the book has no DOI.

Thanks @chainsawriot - yes I spotted that too :) whedon is mostly correct with these things but not always :)

Dear @kakiac ,

I was wondering if the paper is now "conditionally accepted" or not?

Are we waiting for @mbod 's response? If yes, should we set a time limit for this?

Thank you very much!

@whedon generate pdf

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Thanks @chainsawriot! On the basis of at least two reviewers having ticked their checklists and the quality of the submission, I am happy to accept the paper, and proceed with the next steps. Can you please:

  • make a tagged release (on github) and
  • archive (on Zenodo or Figshare), and
  • report the tagged version number and archive DOI in this review thread.

Let me know if you need any help with either.

(Note: Upon successful completion of the review, you need to make a tagged release of the software (via github, see here for info https://git-scm.com/book/en/v2/Git-Basics-Tagging), and deposit a copy of the repository with a data-archiving service such as Zenodo or figshare.

You then need to get the DOI you will be assigned from them (it is automatically assigned once you create a repository).

As soon as you have both, copy the release version number (from GitHub) and the DOI into this thread.

See here for more info under section The review process https://joss.readthedocs.io/en/latest/submitting.html#submitting-your-paper - let me know if you need any help!

An example of a recent tagged release on github is here: https://github.com/kgoldfeld/simstudy/ archived on Zenodo with the DOI: https://doi.org/10.5281/zenodo.4134675)

Dear @kakiac

Thanks for the info. I have created a tagged release on both GitHub and Zenodo. It's version 0.3.11.

https://github.com/chainsawriot/oolong/releases/tag/0.3.11

https://doi.org/10.5281/zenodo.4256574

Please let me know what I should do next. Thank you very much!

Regards,
Chung-hong Chan

Thanks @chainsawriot 👍

@whedon set 0.3.11 as version

OK. 0.3.11 is the version.

@whedon set 10.5281/zenodo.4256574 as archive

OK. 10.5281/zenodo.4256574 is the archive.

@whedon generate pdf

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

@whedon accept

Attempting dry run of processing paper acceptance...

:wave: @openjournals/joss-eics, this paper is ready to be accepted and published.

Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/1900

If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/1900, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.
@whedon accept deposit=true

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.31235/osf.io/np5wa is OK
- 10.18637/jss.v040.i13 is OK
- 10.21105/joss.00774 is OK
- 10.1177/0093650220944790 is OK
- 10.1080/10584609.2020.1723752 is OK
- 10.1080/19312458.2018.1458084 is OK
- 10.1093/pan/mps028 is OK
- 10.1140/epjds/s13688-016-0085-1 is OK
- 10.18637/jss.v091.i02 is OK
- 10.1016/j.poetic.2013.08.004 is OK
- 10.1073/pnas.1018067108 is OK
- 10.1111/j.1540-5907.2009.00427.x is OK
- 10.5281/zenodo.3591068 is OK
- 10.1080/19312458.2018.1430754 is OK
- 10.1145/1458082.1458317 is OK
- 10.3115/v1/N15-1018 is OK
- 10.1080/10361146.2017.1324561 is OK
- 10.1016/j.gloenvcha.2020.102038 is OK
- 10.1177/0002716215569192 is OK
- 10.1080/19312458.2019.1671966 is OK
- 10.1080/21670811.2015.1096598 is OK
- 10.1080/21670811.2015.1093270 is OK
- 10.1002/sam.11415 is OK

MISSING DOIs

- 10.2307/2288384 may be a valid DOI for title: Content analysis: An introduction to its methodology

INVALID DOIs

- None

@chainsawriot Did you address the missing DOI above? I just did a quick check and it looks like it does indeed link to a publication of that title.

@chainsawriot Please update the metadata in your Zenodo archive so that the title and author list exactly match your JOSS paper.

@chainsawriot Just finished going through your paper. It looks good! Just one issue: some items in your references that should be capitalized are not. To preserve capitalization in your .bib file, you can put {} around the word, or around the whole title if you prefer. Please look through your references in detail and make sure all appropriate items are capitalized.

> @chainsawriot Did you address the missing DOI above? I just did a quick check and it looks like it does indeed link to a publication of that title.

@kthyng: just to confirm that the missing DOI offered by Whedon actually references a review of the work (rather than the work itself), and there is no DOI we could find for it, unfortunately.

@kakiac Oh ok! I thought it might have been something like that. Thanks.

@whedon generate pdf

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Dear @kthyng and @kakiac

Thank you so much for going through the paper. Regarding the issues:

  1. I have updated the metadata in Zenodo so that it matches the paper, GitHub, and CRAN.
  2. As mentioned above, the missing DOI was a "false positive" by whedon. The alleged missing DOI links to a book review of Krippendorff (a popular textbook on quantitative content analysis). As far as I know, the Krippendorff book doesn't have a DOI.
  3. I have carefully updated the BibTeX file, especially the capitalization: abbreviations and proper nouns such as US, LDA, ANEWS, and Gibbs sampling are capitalized, whereas R package names, e.g. lda, stminsights, etc., are lowercased.

I hope with these revisions, the paper can be published.

Thank you very much!

Regards,
Chung-hong Chan

Yep everything looks good now!

@whedon accept deposit=true

Doing it live! Attempting automated processing of paper acceptance...

🐦🐦🐦 👉 Tweet for this paper 👈 🐦🐦🐦

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited :point_right: https://github.com/openjournals/joss-papers/pull/1905
  2. Wait a couple of minutes to verify that the paper DOI resolves https://doi.org/10.21105/joss.02461
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

    Any issues? Notify your editorial technical team...

Congrats on your new publication @chainsawriot! And thanks to editor @kakiac and reviewers @pdwaggoner, @mbod, and @Kudusch for your time and expertise!! 🎉 🎉

(I will leave this issue open until the DOI resolves)

Dear @kthyng & @kakiac

Thank you so much for handling this. I think the DOI resolves.

I was wondering if it is still possible to modify the metadata of the paper now. I have made a mistake: I forgot to change the default ORCID in the template. The ORCID is now pointing to Adrian Price-Whelan (https://orcid.org/0000-0003-0872-7098). Is it still possible to change it to my ORCID (https://orcid.org/0000-0002-6232-7530)?

Thank you very much!

Regards,
Chung-hong Chan

@chainsawriot - No problem. I've just updated the paper and Crossref deposit metadata with your ORCID. This should be updated on the live site in the next hour or two.

:tada::tada::tada: Congratulations on your paper acceptance! :tada::tada::tada:

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](https://joss.theoj.org/papers/10.21105/joss.02461/status.svg)](https://doi.org/10.21105/joss.02461)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.02461">
  <img src="https://joss.theoj.org/papers/10.21105/joss.02461/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: https://joss.theoj.org/papers/10.21105/joss.02461/status.svg
   :target: https://doi.org/10.21105/joss.02461

This is how it will look in your documentation:

[DOI badge]

We need your help!

Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us, please consider doing either one (or both) of the following:

@arfon Thank you very much!
