Submitting author: @seanmylaw (Sean Law)
Repository: https://github.com/TDAmeritrade/stumpy
Version: 1.0.0
Editor: @mbobra
Reviewer: @ejolly, @hooman650
Archive: 10.5281/zenodo.3340125
Status badge code:
HTML: <a href="http://joss.theoj.org/papers/eb91faaf9219d46c9acd373cfee8ac29"><img src="http://joss.theoj.org/papers/eb91faaf9219d46c9acd373cfee8ac29/status.svg"></a>
Markdown: [](http://joss.theoj.org/papers/eb91faaf9219d46c9acd373cfee8ac29)
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
@ejolly & @hooman650, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @mbobra know.
β¨ Please try and complete your review in the next two weeks β¨
paper.md
file include a list of authors with their affiliations?paper.md
file include a list of authors with their affiliations?Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @ejolly, @hooman650 it looks like you're currently assigned to review this paper :tada:.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews πΏ
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
Attempting PDF compilation. Reticulating splines etc...
@ejolly @hooman650 Thank you for agreeing to review this submission! Whedon generated a checklist and linked a reviewer guide above -- let me know if you have any questions.
π @ejolly @hooman650 How is it going? Would you like more time to review? Do you have any questions? Please let me know!
Sorry @mbobra, Iβve been traveling but I can have this done by the end of this week if thatβs ok.
Ok, I have done a preliminary review of STUMPY.
Summary of work from my perspective:
Stumpy simply computes the euclidean distance between a segment with length window to a sequence of data. While a simple operation, it requires a high computational time and space. Stumpy builds upon the ideas published in several papers that employ FFT and algebra to improve the computational time of such process. The space complexity is handled by simply storing the smallest value for each comparison. In general, the work done sounds interesting and can be handy to find patterns in large time-series.
Comments:
your_time_series = np.random.rand(10000)
window_size = 50 # Approximately, how many data points might be found in a pattern
matrix_profile = stumpy.stump(your_time_series, m=window_size)
left_matrix_profile_index = matrix_profile[2]
right_matrix_profile_index = matrix_profile[3]
idx = 10 # Subsequence index for which to retrieve the anchored time series chain for
anchored_chain = stumpy.atsc(left_matrix_profile_index, right_matrix_profile_index, idx)
all_chain_set, longest_unanchored_chain = stumpy.allc(left_matrix_profile_index, right_matrix_profile_index)
I copy the exception :
index 10 is out of bounds for axis 0 with size 4
Of course the arrays are of size 4 and the index is asking for 10. Please fix.
A comment in regards to the performance comparison graph that is mentioned here; How did the authors of GPU-STOMP implement their algorithm? In tensorflow? I feel like a good implementation is totally feasible in tensorflow, given that recently almost every hard computation is performing way better on GPUs compared to CPUs, I have difficulties believing a good GPU implementation would be beaten by CPUs. But, I might be wrong.
My final comment is in regards to the "window size" input of STUMPY. This could be compared to kernel size in CNNs. This will be always challenging to know prior to analysis. Any suggestions for determining that?
There is no version release number in the repository.
Overview:
In general, I feel like the author has done a good job and STUMPY can be a good contribution to the time-series analysis tool-chain. The documentation look good but more work should be done to make sure all the examples run smoothly and correctly.
@hooman650 Thank you for your thorough review.
Regarding 2:
The author has prepared a nice documentation page as well as contributing guidelines and examples. However, I just ran into an exception as I tried to run the following example from the documentation:
Indeed, this was a typo and we have submitted an issue and fixed it accordingly in both our README and ReadTheDocs. Thank you for pointing this out!
Regarding 3:
A comment in regards to the performance comparison graph that is mentioned here; How did the authors of GPU-STOMP implement their algorithm? In tensorflow? I feel like a good implementation is totally feasible in tensorflow, given that recently almost every hard computation is performing way better on GPUs compared to CPUs, I have difficulties believing a good GPU implementation would be beaten by CPUs. But, I might be wrong.
Unfortunately, I am not the original author of the GPU-STOMP publication/code and the numbers shown were simply extracted from their published paper for comparison. Currently, I only have access to CPUs but it is a top priority in our project roadmap to port this work over to GPUs. We are in the process of looking for assistance and resources from folks at NVIDIA. The initial goal of our scalable CPU implementation was to allow non-tech-savvy scientists to be able to get up and running quickly without needing access to any specialized hardware. We believe that we have achieved this goal.
Regarding 4:
My final comment is in regards to the "window size" input of STUMPY. This could be compared to kernel size in CNNs. This will be always challenging to know prior to analysis. Any suggestions for determining that?
This is an excellent question and has been discussed by the original authors (not me) in _Section D_ of their paper matrix profile II. In summary, the window size is certainly a _user input_ that requires some level of domain expertise. However, the original authors have demonstrated that the matrix profile is robust to varying window sizes and that being "in the ballpark" is often enough to find motifs.
Regarding 5:
There is no version release number in the repository.
We currently provide a version number in the standard setup.py
file and the version number is also accessible within Python via stumpy.__version__
. Perhaps there is a better or more standard place to specify the version release number? Any guidance would be greatly appreciated.
π @seanlaw
Github is showing that there aren't any releases in the repository. Could you please go ahead and create a release?
Could you also include your responses to points 3 and 4 above in the text of the paper along with the appropriate references?
@mbobra Thank you for pointing me to the helpful resources. I've gone back and tagged the commit that coincides with the first upload to PyPI (May 3rd) as v1.0.0. Let me know if that is sufficient.
Regarding points 3 and 4, both references (Matrix Profile I and Matrix Profile II) were already included in the original article proof (note that the references that I mention above are just the preprints that are available directly on the original author's group website and the references in the original article proof are referencing the published IEEE manuscripts).
Hi @mbobra sorry for the delay on my review!
First of all, I'd like to note that Stumpy is a great addition to the time-series analyst's toolkit and is very well-documented, explained, and referenced. I also rather enjoyed the talk. Really nice @seanlaw!
My testing was done using Python 3.6 on Mac OS 10.14.2 and everything installed without issues. I updated my installation given the most recent changes made as per @hooman650's review and can attest that fix for comment 2 now works.
I was unable to test out the performance claims given limited current access to a distributed compute system at this time, but I was able to at least test the functionality by running a local dask server without issues.
Just a few minor suggestions:
Tutorial_1
includes inline plots, it would good to add the %matplotlib inline
command to the top code cell of that notebook so that anyone just downloading and running the notebooks immediately reproduces the prerendered files on githubTutorial_2
is a bit lacking in context and explanation. While it's nice to see how to use some additional functionality and be provided links to the original paper, I might suggest at least a bit more of an explanation as to what and why one might use time-series chains. Tutorial_1
for example, does a great job in providing a high-level overview of the matrix profile and elucidating the impact of the window size free parameterWith those minor changes I think this would make a great addition to JOSS.
@ejolly Thank you for your constructive and useful feedback. Please see my responses below:
Regarding 1:
Since the tutorial examples are provided as jupyter notebooks and the prerendered file for Tutorial_1 includes inline plots, it would good to add the %matplotlib inline command to the top code cell of that notebook so that anyone just downloading and running the notebooks immediately reproduces the prerendered files on github
This is an excellent suggestion and we've filed/fixed/closed the this issue as per your recommendation. For completeness, we have also provided interactive Binder notebooks in addition to the pre-rendered notebooks so that the user can "try before installing".
Regarding 2:
In its present form, Tutorial_2 is a bit lacking in context and explanation. While it's nice to see how to use some additional functionality and be provided links to the original paper, I might suggest at least a bit more of an explanation as to what and why one might use time-series chains. Tutorial_1 for example, does a great job in providing a high-level overview of the matrix profile and elucidating the impact of the window size free parameter
We completely agree and it is one of our older issues dating back to May 18th that we are hoping to get some help on identifying a good example dataset for and writing up a more complete tutorial like Tutorial 1. Currently, the tutorial only demonstrates the time series chains API and we'd really like to provide some more intuitive insight with a better data set than the current Taxi data set.
In all fairness, the goal of the STUMPY software is to faithfully implement the algorithms based on the published papers (not written by us) and so we strongly recommend that the user read the papers (clearly referenced) as the papers can provide far more detail and insight than STUMPY can. One needs to keep in mind that without STUMPY, there is really no scalable, performant, and easy to install implementation for computing the matrix profile and so our current focus is to provide a suite of tools based on the published papers and to save the user the time and headache from having to implement the published papers (which are not without errors and missing important implementation details). Eventually, once we've created a community/user base and developed most of the published features then we will certainly spend more time improving the tutorials. It's probably important to point out that STUMPY was created and currently maintained by a single person (me) and, for better or for worse, it is mostly done on my personal time and, without additional assistance, one person can only do so much.
While I completely and wholeheartedly agree that the tutorials could be better (and they will be once the feature set stabilizes), I would respectfully argue that the JOSS requirements make no mention of tutorials and so they are a "nice to have" but should not be used as a criteria to judge the completeness of the software. From an API documentation, unit testing/code coverage, installation instructions, and example usage standpoint, we humbly believe that this open source software meets the JOSS requirements.
Regarding 3:
More of a minor suggestion, but it might be nice to have the text link to the documentation a bit higher up on the github README page, for example in the website field of the repo description or edit the text in the first paragraph to clearly mention the documentation site. While badge and "matrix normal" links work, the actualy documentation site itself isn't discussed until halfway through the README. Feel free to ignore this if you prefer.
This is good feedback. We've filed/fixed/closed the following new issue and added a clearer link in the opening paragraph of the README.
@seanlaw the binder addition is a great one and I think will be great for new users.
Regarding my point 2:
I apologize as I should have been more clear. I completely agree with your response that publication in JOSS should be _not_ contingent on you adding a more comprehensive tutorial 2 and my comment was more of a suggestion for something that would improve the tutorials, i.e. would be "nice to have". From my review, you have already done a fantastic job of documenting, testing, and providing the requisite high-level explanation of the package functionality as per JOSS requirements.
I completely understand that creating and maintaining a solo project is a huge demand on your time and it will be great to see how tutorials and functionality grow with the community base!
@ejolly No need to apologize as I assumed no ill intent. Thank you (as well as to @hooman650 and @mbobra) for taking the time to review! I really appreciate the valuable feedback.
@hooman650 and @ejolly Thank you so much for reviewing! We really appreciate your time and effort βοΈ
@seanlaw We're almost there! Can you please archive your release on Zenodo to obtain a DOI and then put that in your README.rst file? After that I think we're done π
@mbobra I've added the DOI as a badge a the top of the README.rst. Is that what you mean?
@whedon set 10.5281/zenodo.3340125 as archive
OK. 10.5281/zenodo.3340125 is the archive.
@whedon set 1.0.0 as version
OK. 1.0.0 is the version.
@whedon check references
Attempting to check references...
```Reference check summary:
OK DOIs
MISSING DOIs
INVALID DOIs
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@openjournals/joss-eics This paper is ready for acceptance! Nice work @seanlaw π
Thanks @mbobra, @hooman650, and @ejolly! This was a wonderful and pleasant submission experience. Hopefully, I will run into you at a conference one day!
@whedon accept
Attempting dry run of processing paper acceptance...
Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/842
If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/842, then you can now move forward with accepting the submission by compiling again with the flag deposit=true
e.g.
@whedon accept deposit=true
@danielskatz The final proof looks good. Is there anything else that I need to do or is whedonβs command for you to handle? I just want to make sure I am not holding up the process.
Sent with GitHawk
It's fine - between being on a plane where I couldn't see the final PDF to check it, and then driving and sleeping, I've just gotten back to it :)
Thanks to @ejolly & @hooman650 for reviewing and to @mbobra for editing
@whedon accept deposit=true
Doing it live! Attempting automated processing of paper acceptance...
π¦π¦π¦ π Tweet for this paper π π¦π¦π¦
π¨π¨π¨ THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! π¨π¨π¨
Here's what you must now do:
Party like you just published a paper! πππ¦ππ»π€
Any issues? notify your editorial technical team...
:tada::tada::tada: Congratulations on your paper acceptance! :tada::tada::tada:
If you would like to include a link to your paper from your README use the following code snippets:
Markdown:
[](https://doi.org/10.21105/joss.01504)
HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.01504">
<img src="http://joss.theoj.org/papers/10.21105/joss.01504/status.svg" alt="DOI badge" >
</a>
reStructuredText:
.. image:: http://joss.theoj.org/papers/10.21105/joss.01504/status.svg
:target: https://doi.org/10.21105/joss.01504
This is how it will look in your documentation:
We need your help!
Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us please consider doing either one (or both) of the the following:
@danielskatz I just spotted a minor typo in the PDF. Is there some way that I can fix it?
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@arfon I have fixed the typo in the original source repository. Can you please take a look?
@arfon I have fixed the typo in the original source repository. Can you please take a look?
Done. It could take a few hours to show up as fixed on the JOSS site as there's caching in place for the PDFs.
Thanks, @arfon! It looks good now
Most helpful comment
@ejolly @hooman650 Thank you for agreeing to review this submission! Whedon generated a checklist and linked a reviewer guide above -- let me know if you have any questions.