Submitting author: @strengejacke (Daniel Lüdecke)
Repository: https://github.com/easystats/insight
Version: 0.2.1
Editor: @karthik
Reviewer: @alexpghayes
Archive: 10.5281/zenodo.3256639
Status badge code:
HTML: <a href="http://joss.theoj.org/papers/30851fde6e76f138423f8f1fd9082831"><img src="http://joss.theoj.org/papers/30851fde6e76f138423f8f1fd9082831/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/30851fde6e76f138423f8f1fd9082831/status.svg)](http://joss.theoj.org/papers/30851fde6e76f138423f8f1fd9082831)
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
@alexpghayes, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @karthik know.
✨ Please try and complete your review in the next two weeks ✨
Authors: Does the paper.md file include a list of authors with their affiliations?

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @alexpghayes it looks like you're currently assigned as the reviewer for this paper :tada:.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that with GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
Attempting PDF compilation. Reticulating splines etc...
Thanks again @alexpghayes! Be sure to unwatch the repo, and then work through the list. You can build the PDF anytime by posting @whedon generate pdf as a comment here.
We're planning to submit a package update to CRAN, which would change the version number. The changes won't affect the paper or the review process, but should we update the version number here somehow?
This review is of the GitHub master branch on 2019-05-07.

insight is a well-written and well-documented package. Reading the source, the pkgdown website and the tests was a pleasure. The goal of insight is to reduce the complexity of working with model objects by providing an easy and intuitive interface for abstracting information from model objects.

My comments are largely about this interface. I think about this primarily in terms of complexity. In some sense insight is competing with the original modeling packages: if the interface to insight is (1) simpler than fiddling with the model objects, and (2) consistent across model objects, then using insight reduces the overall complexity of interacting with model objects.
My primary concern is that insight has a complicated interface. The homepage outlines the interface, which consists of:

- find_terms()
- get_data()
- get_variance()
- find_parameters() / get_parameters()
With such a broad interface, I had a hard time figuring out the entry point. Some questions I asked myself:

- Where should I start? After some digging it seems like model_info() is the key function that returns all the information insight provides at once.
- What is the difference between get_*() and find_*()?
I strongly recommend shrinking this interface. In particular, the get_*() and find_*() pairs seem to provide nearly identical functionality. Also, I struggled to understand the difference between: variables and predictors and response, terms and formula. This meant I had to read the documentation for each of these methods before figuring out which one did what I wanted. These distinctions should be explained clearly on the front page of the pkgdown website. The diagram in the JOSS paper helps somewhat but still needs an accompanying narrative.
For additional discussion of interface design and hiding complexity, I highly recommend A Philosophy of Software Design by John Ousterhout, which I've been enjoying recently.
While the documentation is very clear about function returns, I had difficulty understanding why behavior differs across different model objects.

- In find_parameters(): aov objects fit linear models with conditional and random terms, but this information is instead stored in the between and within fields of the return.
- In find_algorithm(): the returned object differs across object types, so I still would need to check which elements are in this list before trying to access them. Also, I believe the term algorithm should be replaced by estimator, which is more appropriate. Also, for Bayesian models, describing the algorithm as "sampling" is too vague.
- In find_formula(): how is random different from correlation? How is slopes different from conditional? I would have grouped these together. Being somewhat familiar with the model objects under consideration, my impression is that the returns here reflect choices that made the implementation easy for developers, but that do not necessarily provide a generalized interface for users.
On the whole, I get the impression that model components that I conceive of as conceptually similar often end up in different parts of a return object. If I need to understand how get_parameters() behaves for both lm and brm models, it's unclear how beneficial get_parameters() is.
Examples: The documentation shows how a user can extract information from a model, but it doesn't show how the user can use this information to solve larger problems. Some applications showing how the various functions operate together (and how they are distinct) would be greatly beneficial.
- /R is organized by method but /test is organized by model.
- The homepage and README should have links to information about contributing.

My initial impression is that using insight is about as complicated as understanding a model object itself. I think the package needs narrative explaining at a conceptual level how it generalizes across models, and return objects with fewer special cases. Both of these are very difficult tasks due to the heterogeneity of models. insight may benefit from narrowing in on particular families of models. For instance, it seems like most existing functionality is targeted at GLMMs.
@alexpghayes Thanks for the detailed and comprehensive review! I think your comments are very helpful to make the package more accessible to users. We'll try to address your issues point-by-point and respond here once we think we have a revised version that is sufficient for re-review.
@strengejacke We can fix your version number before this review is completed.
Just a short sign that we are working on the revisions; there was some delay due to other deadlines and tasks.
Thanks @strengejacke. We appreciate the update.
@whedon generate pdf
Just to check if the revised paper looks ok, a short in-between build of the PDF...
Attempting PDF compilation. Reticulating splines etc...
PDF failed to compile for issue #1412 with the following error:

```
Error producing PDF.
! Missing number, treated as zero.
\futurelet
l.429 ...n the expression \texttt{x + I(x\^{}2)}
```

Looks like we failed to compile the PDF
@karthik Any ideas why compilation fails? It seems to be a code-block in markdown (x + I(x^2)) which causes the error, although when I build the PDF locally, it works fine.
Dear @alexpghayes,
on behalf of the co-authors, let me thank you again for your thorough comments on our submission. We substantially revised our paper, the package documentation and the accompanying website, and revised package code and functions as suggested. Let me give you a more detailed response to your suggestions and our revisions.
Interface complexity: My primary concern is that insight has a complicated interface. With such a broad interface, I had a hard time figuring out the entry point. Where should I start?
Thank you for this comment. We have revised the paper accordingly, making it clearer for users where they can start. We applied these revisions both to the README on the package's front page (https://github.com/easystats/insight) and to the online documentation (https://easystats.github.io/insight/articles/insight.html).
What is the difference between get_*() and find_*()?
We have added more details to the paper, the README and the online documentation to make this essential distinction clearer to users. Furthermore, in combination with the figure showing how package functions are related to model components (Figure 1 in the paper), it should now be easier to grasp the differences between the get- and the find-functions.
I strongly recommend shrinking this interface.
See our previous point. In revising the paper and the online documentation, the need for both get- and find-functions should be clearer now. When you have the data from one of the get-functions, it is not possible to extract parameter names for specific model parts (like random or fixed effects, or the zero-inflated and conditional components) without additional effort; and vice versa, once you have the names of parameters or predictors from a find-function, you cannot easily get the underlying data. We therefore see the need for having both find- and get-functions. There are several use cases where either one is needed, and getting the required information or data easily is one of the core aspects of the insight package.
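For illustration, here is a minimal sketch of the distinction, using a linear model on the built-in mtcars data (the printed output is abbreviated and may vary by package version):

```r
library(insight)

m <- lm(mpg ~ wt + cyl, data = mtcars)

# find_*() functions return the *names* of model components...
find_parameters(m)
#> $conditional
#> [1] "(Intercept)" "wt"          "cyl"

# ...while get_*() functions return the underlying *values*, here a
# data frame of parameter names and point estimates.
get_parameters(m)
```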
Also, I struggled to understand the difference between: variables and predictors and response, terms and formula. This meant I had to read the documentation for each of these methods before figuring out which one did what I wanted. These distinctions should be explained clearly on the front page of the pkgdown website. The diagram in the JOSS paper helps somewhat but still needs an accompanying narrative.
Thanks for pointing this out. We have added a completely new section on the definition of model components to the paper. Furthermore, we added two figures that help make it clearer what the differences between these terms are, and why it is necessary to have these distinctions. We also added the new section to the README on the package's front page, as well as to the online documentation (including the new figures).
While the documentation is very clear about function returns, I had difficulty understanding why behavior differs across different model objects. In find_parameters(): aov objects fit linear models with conditional and random terms, but this information is instead stored in the between and within fields of the return.
We agree with this point, and revised affected package functions accordingly. There are now fewer exceptions, making it clearer and more predictable for users what will be returned. Regarding your particular example, the functions now return the elements "conditional" and "random", in line with the return values of other model objects.
In find_algorithm(): the returned object differs across object types, so I still would need to check which elements are in this list before trying to access them. Also, I believe the term algorithm should be replaced by estimator, which is more appropriate. Also, for Bayesian models, describing the algorithm as "sampling" is too vague.
We agree with this point. We implemented this function knowing that it is still a bit "work in progress", and there is an open GitHub issue on this (https://github.com/easystats/insight/issues/38). So this is something we plan to address; however, we still need and want to ask other people with the relevant expertise for their opinion in order to adequately address this issue.
In find_formula(): how is random different from correlation? How is slopes different from conditional? I would have grouped these together.
This is a design decision where we think it is not easy to reach a final verdict. Our idea was to reflect the different parts of a model formula in the elements returned by the function. lme(), for instance, can have both a random argument and a correlation argument, and we wanted to preserve this distinction. For feis objects (fixed effects regressions with individual slopes), we have a complicated formula structure with several components. Again, we wanted to preserve the distinction between these parts of the model formula. However, we are happy to reduce interface complexity here and are open to concrete suggestions that help users and developers.
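To make the lme() case concrete, here is a sketch (assuming the nlme package and its built-in Orthodont data; the exact return structure may vary by insight version):

```r
library(nlme)

# Random intercepts per Subject plus an AR(1) residual correlation
# structure: two separate arguments in the lme() call.
m <- lme(distance ~ age,
         random = ~ 1 | Subject,
         correlation = corAR1(),
         data = Orthodont)

# find_formula() mirrors this split in its return value, keeping the
# random part and the correlation structure in separate elements.
insight::find_formula(m)
```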
Examples: The documentation shows how a user can extract information from a model, but it doesn't show how the user can use this information to solve larger problems. Some applications showing how the various functions operate together (and how they are distinct) would be greatly beneficial.
Thanks for this suggestion. We have added a new section with examples of practical use cases (in different R packages) to our paper.
/R is organized by method but /test is organized by model
This is for performance reasons. Since each function should work for (almost) every model object, organizing the tests by function (like the code files in the R folder) would mean fitting all the models again in every test file. To reduce computational time, especially for CRAN, we decided to organize the tests by model instead.
The homepage and README should have links to information about contributing
We have added explicit information about getting support and contributing to the README and website.
I think the package needs narrative explaining at a conceptual level how it generalizes across models, and return objects with fewer special cases.
We have largely revised the paper, README and website (online documentation), providing a clearer narrative (also at the conceptual level) in order to make the idea and functioning of the package clearer for users.
insight may benefit from narrowing in on particular families of models. For instance, it seems like most existing functionality is targeted at GLMMs.
One of the core aims of the insight package is to provide a unified syntax to access model information. Since this information is not easily accessible for many model objects, the idea is to work with as many models as possible. Against this background, it would run against the nature of this package to narrow the scope of supported models. We have added a section to the paper showing examples of use cases, and we hope that these examples explain why it makes sense to support many different models.
Again, thank you for this valuable review! We hope that we have addressed all your points sufficiently and are happy to receive your feedback on our revision.
Best
Daniel, on behalf of the co-authors
@whedon generate pdf
If the paper.md does not yet compile, we can probably use a temporary PDF?
Here is the link: https://github.com/easystats/insight/blob/master/paper/paper.pdf
Attempting PDF compilation. Reticulating splines etc...
PDF failed to compile for issue #1412 with the following error:

```
Error producing PDF.
! Missing number, treated as zero.
\futurelet
l.435 expression \texttt{x + I(x \^{} 2)}
```

Looks like we failed to compile the PDF
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
Response based on comments by @strengejacke above and the GitHub master branch on 2019-06-16.
I'm impressed by the changes you've made, especially to the documentation. After reading more carefully, I agree with your comments that the broad interface is warranted.
Something clicked for me when I realized the importance of differentiating between names and values. Some functions return the names of columns of data, some functions return the values of columns of data. The difference is especially clear with get_weights() and find_weights(). I believe get_*() and find_*() are an attempt to make this differentiation, but I am now convinced that a better approach is to give functions names like the following:

- find_weights() -> weights_name()
- get_weights() -> weights_values()
I suspect users will have an easier time with this sort of naming strategy and encourage you to consider this scheme, or something similar.
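A minimal sketch of the names-versus-values split in the weights case (the wvar weighting column is made up for illustration):

```r
library(insight)

mtcars$wvar <- runif(nrow(mtcars))  # hypothetical observation weights
m <- lm(mpg ~ wt, data = mtcars, weights = wvar)

find_weights(m)  # the *name* of the weighting variable: "wvar"
get_weights(m)   # the numeric weight *values* themselves
```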
I appreciated the new definitions of the various components of a model. Instead of presenting this material all at once, I recommend you group the definitions by components that are easy to confuse and set these groups visually apart.
Note that you have chosen to define terms and variables in a non-standard way for the R community. In an R model formula an entire chunk such as I(x^2) is referred to as a term, rather than a variable. I recommend following this definition, which amounts to flipping the current usage of term and variable.
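Base R itself reflects this convention, as a quick check shows:

```r
f <- y ~ x + I(x^2)

# The *terms* of the formula include the whole I(x^2) chunk...
attr(terms(f), "term.labels")
#> [1] "x"      "I(x^2)"

# ...while the underlying *variables* are just y and x.
all.vars(f)
#> [1] "y" "x"
```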
Finally, I recommend grouping get_*() and find_*() together on the pkgdown site in the Reference section. That is, you want get_parameters() and find_parameters() (or parameter_names() and parameter_values()) to appear next to each other.
I appreciate that there are fewer special cases in the returns.
The test file organization makes sense; I'm on board.
It makes me nervous that the example at https://easystats.github.io/insight/reference/clean_parameters.html warns about overwriting quosure methods. Can you comment on why this is happening and/or necessary?
When is clean_names() different from find_terms()? I would include examples so that users can tell when to use which function.
I appreciate that you've linked to packages that use insight. However, this is not especially useful to users: instead of demonstrating how to solve a specific problem with insight, the user must now go read an entire package on their own, find a usage of insight, and infer the problem that is being solved. I would construct explicit, self-contained examples showing idiomatic usage. If these are inspired by the linked packages, fantastic, but merely linking to another code base isn't that helpful.
I recommend keeping the original paper title insight: Easy Access to Model Information for Various Model Objects rather than insight: A Unified Syntax for Accessing Model Information for Various Model Objects in R. Insight provides a unified interface, but does not introduce any new syntax, which is a language-level feature. Domain specific languages such as dplyr and data.table introduce new syntax (i.e. := and !!) via metaprogramming tricks, but insight does not and makes use of standard evaluation and syntax.
I would rename all_models_equal() to all_models_have_same_class(). The current name makes it seem like all_models_equal() is also checking whether the parameter values match up.
Instead of providing a new generic n_obs() I would provide methods for the existing generic nobs().
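For reference, the generic in question is stats::nobs(), which already works for many base model classes:

```r
# nobs() is an existing generic in the stats package; adding methods to
# it avoids introducing a parallel n_obs() generic.
m <- lm(mpg ~ wt, data = mtcars)
nobs(m)
#> [1] 32
```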
I am about to respond with additional thoughts on find_algorithm() in https://github.com/easystats/insight/issues/38.
@alexpghayes Thanks for your review! Since some of your remaining suggestions involve substantial rework or breaking changes, I would like to clarify some points and suggest how we would address your issues before we actually start working on them. Could you please give short feedback on whether our planned steps are ok for you?
Since _insight_ is part of the larger "easystats" project, we have developed some coding guidelines for our packages (https://github.com/easystats/easystats#convention-of-code-style). Your suggestion would break these guidelines, so for consistency within and across packages, we would prefer to keep the current names. However, we may add aliases to the existing functions.
I'm not sure about this; I could not find clear definitions yet (if you have some references/links, I would be happy if you could share them). However, since these functions are currently only used by one package that depends on _insight_, we could flip the usage of _term_ and _variable_ here.
I'm not sure if this is possible with some _pkgdown_ options; I think we would have to combine the functions' Rd files, so both ?find_parameters and ?get_parameters would appear on the same help page. Would you suggest going this route?
This seems to be new in R 3.6 (at least, that is when it started appearing for me): once namespaces are loaded, R warns about overwritten S3 methods. ggplot2 depends on rlang, afaik, and it seems ggplot2 overwrites some S3 methods from rlang (or vice versa). In short, this issue seems unrelated to _insight_.
We would add some more comprehensive working examples to the README (and as a further subsection of the paper) that clearly show solutions to problems - would that be sufficient?
Since insight is part of the larger "easystats" project, we have developed some coding guidelines for our packages (https://github.com/easystats/easystats#convention-of-code-style). Your suggestion would break these guidelines, so for consistency within and across packages, we would prefer to keep the current names. However, we may add aliases to the existing functions.
If you don't want to change these, I understand. Since the difference between get_*() and find_*() does seem to come down to names versus values, I really do recommend function names reflecting this, but I'm happy to recommend for publication either way.
Definition of "term" and "variable".
Definitely swap. It looks like you found some references in https://github.com/easystats/insight/issues/56#issuecomment-503565591?
Finally, I recommend grouping get_*() and find_*() together on the pkgdown site in the Reference section.
Yeah, you can do this all with pkgdown options, no need for @rdname or anything fancy. See the _pkgdown.yaml file (I think that's what it's called) in dplyr for examples: https://dplyr.tidyverse.org/reference/index.html
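For readers following along, a hypothetical fragment of such a config file (pkgdown reads _pkgdown.yml; the group titles and contents below are made up, not insight's actual grouping):

```yaml
# Hypothetical _pkgdown.yml fragment: list paired functions under a
# shared heading so they appear next to each other in the reference.
reference:
  - title: "Model parameters"
    contents:
      - find_parameters
      - get_parameters
  - title: "Model data"
    contents:
      - find_predictors
      - get_data
```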
This seems to be new in R 3.6 (at least, that is when it started appearing for me): once namespaces are loaded, R warns about overwritten S3 methods. ggplot2 depends on rlang, afaik, and it seems ggplot2 overwrites some S3 methods from rlang (or vice versa). In short, this issue seems unrelated to insight.
Great, wouldn't worry about it then.
We would add some more comprehensive working examples to the README (and as a further subsection of the paper) that clearly show solutions to problems - would that be sufficient?
Yep, that'd be ideal! The current examples show how to access certain kinds of information. The open question is when users would want to access this kind of information. What common problem does insight provide a solution to? My guess is there's some problem where you need to map over many models at once for model comparison and insight makes that process nicer.
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
Hi @alexpghayes
we have revised the paper, package and docs according to your last comments. Let me provide a point-by-point statement again to report the changes we made.
Rename find_weights() -> weights_name() and get_weights() -> weights_values()
See our discussion above; for now we decided to stick with the current function names, but we're thinking about introducing aliases where it makes sense and doesn't obscure the documentation.
I appreciated the new definitions of the various components of a model. Instead of presenting this material all at once, I recommend you group the definitions by components that are easy to confuse and set these groups visually apart.
We revised the paper and the docs by restructuring this section and changing the figures used to illustrate the different components. You find the changes in the paper here and in the online documentation here. Furthermore, we added these revisions to the README page and the documentation homepage, however without figures, to keep the reading time for the README page low.
Note that you have chosen to define terms and variables in a non-standard way for the R community. In an R model formula an entire chunk such as I(x^2) is referred to as a term, rather than a variable. I recommend following this definition, which amounts to flipping the current usage of term and variable.
We have flipped the functions, so the function names are now in line with the common definitions inside (and probably outside) the R community, i.e. find_terms() is now named find_variables() and vice versa. All tests and docs were adapted accordingly, and so was the paper.
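Assuming the renaming behaves as described above, the difference now looks roughly like this sketch:

```r
library(insight)

m <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# find_terms() keeps whole formula chunks such as I(wt^2) intact...
find_terms(m)

# ...while find_variables() returns the underlying variable names,
# i.e. just "mpg" and "wt".
find_variables(m)
```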
Finally, I recommend grouping get_*() and find_*() together on the pkgdown site in the Reference section. That is, you want get_parameters() and find_parameters() (or parameter_names() and parameter_values()) to appear next to each other.
Thanks for the hint about this option (which we weren't aware of, but have now also applied to other package documentations). We have revised the docs accordingly and set the groups visually apart here:
https://easystats.github.io/insight/reference/index.html
It makes me nervous that the example at https://easystats.github.io/insight/reference/clean_parameters.html warns about overwriting quosure methods. Can you comment on why this is happening and/or necessary?
See our discussion above.
When is clean_names() different from find_terms()? I would include examples so that users can tell when to use which function.
We have added more details and examples to the docs, making the differences between the two functions clearer: https://easystats.github.io/insight/reference/clean_names.html
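As a rough illustration of the difference (a sketch; see the linked docs for the authoritative examples):

```r
library(insight)

m <- lm(mpg ~ log(wt) + as.factor(cyl), data = mtcars)

# find_terms() returns the terms as written in the formula,
# e.g. "log(wt)" and "as.factor(cyl)"...
find_terms(m)

# ...while clean_names() strips the function calls and returns the
# plain variable names, e.g. "mpg", "wt", "cyl".
clean_names(m)
```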
I appreciate that you've linked to packages that use insight. However, this is not especially useful to users -- instead of demonstrating how to solve a specific problem with insight, the user must now go read an entire package on their own, find a usage of insight, and infer the problem that is being solved. I would construct explicit, self-contained examples showing idiomatic usage. If these are inspired by the linked packages, fantastic, but merely linking to another code base isn't that helpful.
We have added another short section on practical use cases to the paper as well as to the README file and the documentation homepage (https://easystats.github.io/insight/). These contain short code snippets that demonstrate the major aim of the package, its universal interface for accessing model information, and show how it helps users and package developers to reduce the hassle of dealing with different model objects.
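In the spirit of those snippets, a small self-contained sketch of the kind of class-agnostic helper the package enables (the predictor_summary() function is made up for illustration):

```r
library(insight)

# A helper that works for any supported model class, because get_data()
# and find_predictors() hide the class-specific extraction details.
predictor_summary <- function(model) {
  data <- get_data(model)
  predictors <- find_predictors(model)$conditional
  summary(data[, predictors, drop = FALSE])
}

predictor_summary(lm(mpg ~ wt + cyl, data = mtcars))
```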
I recommend keeping the original paper title insight: Easy Access to Model Information for Various Model Objects rather than insight: A Unified Syntax for Accessing Model Information for Various Model Objects in R. Insight provides a unified interface, but does not introduce any new syntax, which is a language level feature. Domain specific languages such as dplyr and data.table introduce new syntax (i.e. := and !!) via metaprogramming tricks, but insight does not and makes use of standard evaluation and syntax.
We have renamed the paper title according to your suggestion and it now reads: _insight: A Unified Interface to Access Information from Model Objects in R_
I would rename all_models_equal() to all_models_have_same_class(). The current name makes it seem like all_models_equal() is also checking whether the parameter values match up.
We have renamed that function and added an alias to softly deprecate the old name.
Instead of providing a new generic n_obs() I would provide methods for the existing generic nobs().
We have added a nobs() method for those model objects that do not yet provide such a method, so as not to overwrite existing S3 methods.
I am about to respond with additional thoughts on find_algorithm() in easystats/insight#38.
Thanks! We would like to keep this issue open until we have a clearer vision of how to improve this function, if you agree.
best
Daniel, on behalf of the co-authors
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
This looks great! @karthik I recommend publication.
@whedon check references
Attempting to check references...
```
Reference check summary:

OK DOIs
MISSING DOIs
INVALID DOIs
```
@alexpghayes Thanks so much for such a thorough review. Your review has improved this package far beyond the checklist we have here. 🙏 💯
@strengejacke and co-authors: The paper in the current form is a little too long for JOSS. JOSS papers are not meant to replace documentation, and should really only provide a high-level overview of the package and highlight a few applications. It should not read like a vignette. I'd recommend taking out the sections "Example use cases in R" and "Examples of use cases in R packages" (and linking to both sections in your pkgdown docs). Once you make these edits I'm happy to proceed with the remaining steps.
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
Ok, I revised the paper accordingly and just checked the PDF proof. Next step would be creating an archive with a DOI, right?
Here is the DOI linking to the archive with the very latest updates: 10.5281/zenodo.3256612
Something is wrong with your references now. Can you check again?
Some references have been deleted during the removal of the paragraphs you suggested, so the reference list is shorter now. Maybe it was the bib-file, which still contained the old entries? What about now?
Ok, I reduced the paragraphs to one sentence, including the references (I added them back now), and referring to the website. So we have removed the long sections, but still have all references in the paper as before. The new DOI for this archive is: 10.5281/zenodo.3256639
I'll generate a PDF proof to check.
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
Looks good to me, fits into exactly 6 pages.
Thanks. Please archive again and send me the updated DOI.
Sorry, maybe this got lost in the conversation. Here's the updated DOI from the archive containing the very last changes:
10.5281/zenodo.3256639
@whedon set 10.5281/zenodo.3256639 as archive
OK. 10.5281/zenodo.3256639 is the archive.
Over to you @openjournals/joss-eics
Looks good to me!
@whedon accept
Attempting dry run of processing paper acceptance...
```
Reference check summary:

OK DOIs
MISSING DOIs
INVALID DOIs
```
Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/795
If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/795, then you can now move forward with accepting the submission by compiling again with the flag deposit=true
e.g.
@whedon accept deposit=true
@whedon accept deposit=true
Doing it live! Attempting automated processing of paper acceptance...
🐦🐦🐦 👉 Tweet for this paper 👈 🐦🐦🐦
🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨
Here's what you must now do:
Party like you just published a paper! 🎉🌈🦄💃👻🤘
Any issues? notify your editorial technical team...
Congrats on your article's acceptance in JOSS @strengejacke! Thanks to @alexpghayes for reviewing and @karthik for editing.
:tada::tada::tada: Congratulations on your paper acceptance! :tada::tada::tada:
If you would like to include a link to your paper from your README use the following code snippets:
Markdown:
[![DOI](http://joss.theoj.org/papers/10.21105/joss.01412/status.svg)](https://doi.org/10.21105/joss.01412)
HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.01412">
<img src="http://joss.theoj.org/papers/10.21105/joss.01412/status.svg" alt="DOI badge" >
</a>
reStructuredText:
.. image:: http://joss.theoj.org/papers/10.21105/joss.01412/status.svg
:target: https://doi.org/10.21105/joss.01412
This is how it will look in your documentation:
We need your help!
Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us please consider doing either one (or both) of the following: