Joss-reviews: [PRE REVIEW]: CSaransh: Software Suite to Study Molecular Dynamics Simulations of Collision Cascades

Created on 6 May 2019 · 28 comments · Source: openjournals/joss-reviews

Submitting author: @haptork (Utkarsh Bhardwaj)
Repository: https://github.com/haptork/csaransh
Version: v0.3.1
Editor: @katyhuff
Reviewers: @jmborr, @arose

Author instructions

Thanks for submitting your paper to JOSS @haptork. Currently, there isn't a JOSS editor assigned to your paper.

@haptork if you have any suggestions for potential reviewers then please mention them here in this thread. In addition, the people on this list have already agreed to review for JOSS and may be suitable for this submission.

Editor instructions

The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you type:

@whedon commands
Labels: C++, CMake, Python, pre-review


All 28 comments

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@whedon commands

What happens now?

This submission is currently in a pre-review state, which means we are waiting for an editor to be assigned and for them to find some reviewers for your submission. This may take anything from a few hours to a couple of weeks. Thanks for your patience :smile_cat:

You can help the editor by looking at this list of potential reviewers to identify individuals who might be able to review your submission (please start at the bottom of the list). Also, feel free to suggest individuals who are not on this list by mentioning their GitHub handles here.

Attempting PDF compilation. Reticulating splines etc...

Hi @katyhuff, it looks like this submission might be up your alley. Can you edit this?

@whedon assign @katyhuff as editor

OK, the editor is @katyhuff

Thanks @kyleniemeyer and @katyhuff for the quick response.

The following are the names of reviewers that I filtered from the spreadsheet based on some keywords:

  • katyhuff
  • jochym
  • jmborr
  • KEIPERTK
  • DanielLenz
  • zhampel
  • arose
  • MDebasish

@jochym: Paweł T. Jochym, are you able and interested in taking on this review?
This submission needs expertise in physics, in particular molecular dynamics related to radiation damage cascades, as well as familiarity with the systems and languages involved: Python and C++ and, ideally, Node.js. I know that you have at least a subset of these areas of expertise. I hope you'll let me know if you're able and willing to review it.

Title: CSaransh: Software Suite to Study Molecular Dynamics Simulations of Collision Cascades
Summary: A software suite to post-process, explore, and visualize Molecular Dynamics (MD) simulations of collision cascades. It is an elaborate software solution for studying the MD results of radiation damage simulations, from identifying defects in an xyz file, to finding correlations, to visualizing subcascades and pattern-matching clusters, and more. There are many novel methods. The algorithms employed are fast, and there is a refreshing interactive web app for exploring the results. You can check the project's web page (link) and read the manual to go through the different sections of the analysis. The web page shows the results for the MD database of the IAEA Challenge on Materials for Fusion 2018.
Article Proof: https://github.com/openjournals/joss-papers/blob/joss.01433/joss.01433/10.21105.joss.01433.pdf
Submitting author: @haptork (Utkarsh Bhardwaj)
Repository: https://github.com/haptork/csaransh
Version: v0.3.1

@jmborr Jose Borreguero, are you able and interested in taking on this review?
This submission needs expertise in physics, in particular molecular dynamics related to radiation damage cascades, as well as familiarity with the systems and languages involved: Python and C++ and, ideally, Node.js. I know that you have at least a subset of these areas of expertise. I hope you'll let me know if you're able and willing to review it.

Title: CSaransh: Software Suite to Study Molecular Dynamics Simulations of Collision Cascades
Summary: A software suite to post-process, explore, and visualize Molecular Dynamics (MD) simulations of collision cascades. It is an elaborate software solution for studying the MD results of radiation damage simulations, from identifying defects in an xyz file, to finding correlations, to visualizing subcascades and pattern-matching clusters, and more. There are many novel methods. The algorithms employed are fast, and there is a refreshing interactive web app for exploring the results. You can check the project's web page (link) and read the manual to go through the different sections of the analysis. The web page shows the results for the MD database of the IAEA Challenge on Materials for Fusion 2018.
Article Proof: https://github.com/openjournals/joss-papers/blob/joss.01433/joss.01433/10.21105.joss.01433.pdf
Submitting author: @haptork (Utkarsh Bhardwaj)
Repository: https://github.com/haptork/csaransh
Version: v0.3.1

@arose Alexander Rose, are you able and interested in taking on this review?
This submission needs expertise in physics, in particular molecular dynamics related to radiation damage cascades, as well as familiarity with the systems and languages involved: Python and C++ and, ideally, Node.js. I know that you have at least a subset of these areas of expertise. I hope you'll let me know if you're able and willing to review it.

Title: CSaransh: Software Suite to Study Molecular Dynamics Simulations of Collision Cascades
Summary: A software suite to post-process, explore, and visualize Molecular Dynamics (MD) simulations of collision cascades. It is an elaborate software solution for studying the MD results of radiation damage simulations, from identifying defects in an xyz file, to finding correlations, to visualizing subcascades and pattern-matching clusters, and more. There are many novel methods. The algorithms employed are fast, and there is a refreshing interactive web app for exploring the results. You can check the project's web page (link) and read the manual to go through the different sections of the analysis. The web page shows the results for the MD database of the IAEA Challenge on Materials for Fusion 2018.
Article Proof: https://github.com/openjournals/joss-papers/blob/joss.01433/joss.01433/10.21105.joss.01433.pdf
Submitting author: @haptork (Utkarsh Bhardwaj)
Repository: https://github.com/haptork/csaransh
Version: v0.3.1

Hi Katy,

I have looked into the repository; this project is 90% javascript, 8% C++, and 2% python. I am proficient with python, average with C++, and totally ignorant of javascript. Thus, I am pretty sure I can't make a proper review of this package. 😞


@jmborr I had the same reaction initially, actually -- It seems to me, though, that all the science is in the post-processor (which is solely in this directory: https://github.com/haptork/csaransh/tree/master/csaransh-pp). The javascript seems to be separate from the science and is more like a support utility. If you don't mind following the installation instructions in the readme, then perhaps you could just review the scientific contributions in the post-processor? The rest, I think, just supports the interface. @haptork: can you confirm that this is the right interpretation?

@jmborr @katyhuff yes, the algorithms and methods are in the post-processor directory mentioned by katyhuff. The post-processor writes the results it finds in JSON format; these are then visualised and presented by the web interface, written in javascript and contained entirely in the csaransh-server directory.

The C++ code includes algorithms for finding the defects from the xyz file of the MD simulation, finding features to characterise the clusters, finding similar clusters, etc. The python code adds machine learning results, such as the dimensionality of clusters and cascades using PCA, the number of sub-cascades using DBSCAN, and dimensionality reduction on cluster features and their classification using UMAP, t-SNE, and HDBSCAN.
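For readers unfamiliar with these techniques, below is a minimal sketch of the kind of clustering and dimensionality analysis described above, written against scikit-learn. The coordinates, parameters, and thresholds are hypothetical illustrations, not CSaransh's actual code or API.

```python
# Sketch only: hypothetical defect coordinates and parameter choices,
# illustrating DBSCAN-based sub-cascade counting and a PCA-based
# dimensionality estimate of the kind described above.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

# Hypothetical defect coordinates (in angstroms), standing in for
# positions extracted from an MD xyz file.
defect_coords = np.random.default_rng(0).uniform(0.0, 100.0, size=(200, 3))

# Count sub-cascades via density-based clustering; label -1 marks noise.
labels = DBSCAN(eps=10.0, min_samples=5).fit_predict(defect_coords)
n_subcascades = len(set(labels)) - (1 if -1 in labels else 0)
print("sub-cascades found:", n_subcascades)

# Estimate the effective dimensionality of the cascade from how many
# principal components carry a non-negligible share of the variance.
pca = PCA(n_components=3).fit(defect_coords)
dimensionality = int(np.sum(pca.explained_variance_ratio_ > 0.1))
print("approximate cascade dimensionality:", dimensionality)
```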

I could take a look at the c++/python source, but then I was wondering what best software practices are required for all submissions. Right off the bat I don't see any tests (unit, integration, or otherwise). From this fact I assume there's no kind of continuous integration either. This would be my first review, so I don't know whether these practices are a must.


@katyhuff I am happy to review this

Thanks both (@jmborr and @arose).

Unfortunately, @jmborr has made an excellent point that I hadn't noticed at first glance. @haptork, I do not see any testing here. There appear to be no unit tests, and while the repository holds some example input files (and, perhaps, corresponding output), these do not seem to constitute integration tests. Can you comment on this? Tests are a requirement among the many features we expect in a JOSS submission. From the author guidelines:

Tests
Authors are strongly encouraged to include an automated test suite covering the core functionality of their software.

Good: An automated test suite hooked up to an external service such as Travis-CI or similar
OK: Documented manual steps that can be followed to objectively check the expected functionality of the software (e.g. a sample input file to assert behaviour)
Bad (not acceptable): No way for you the reviewer to objectively assess whether the software works

If you would like to withdraw your submission for the time being to work on these requirements, please let me know.

As co-author, these are my comments:
1) The physics and results from CSaransh are described in the following paper: https://arxiv.org/abs/1811.10923
2) Regarding tests,
(a) CSaransh reproduces published physics results for the "number of defects" as a function of "primary knock-on atom energy".
(b) The feature where a certain type of cluster is chosen and other similar clusters are identified using pattern matching is a test in itself: one can see that the clusters identified are indeed similar.

Utkarsh, please decide on including a test suite that covers the above, along with anything else you find appropriate.

@katyhuff The software includes sample inputs in the data directory, as you noted. The installation steps in the manual include the steps to run the post-processor on the given sample data and view the results using the interface. I think this passes the "OK" guideline for tests.

Different results, including the number of defects and cluster shapes, have been verified qualitatively against prior published results and against results found using other algorithms, while many new results can also be verified qualitatively, since the software includes visualisation tools, as mentioned by @ManojWarrier.

We understand the importance of unit tests and continuous integration, especially as the software development progresses. We will work on a unit-test suite, possibly after a few more interesting research explorations. However, I believe that the software passes the "OK" guideline as is, since it includes the manual steps to run it on the included sample data, checking the behaviour by loading the processed data for visualisation. In addition to the sample data, a link to the IAEA challenge database is included if someone wants to do more elaborate behavioural testing on 76 cascades at different energies and for different elements.

Thank you @haptork and @ManojWarrier .

From what I can tell, when you say "The installation steps in the manual include the steps to run the post-processor on the given sample data and view the results using the interface," you are referring to this statement:

Run command should look something like this: %PATH%/csaransh_pp data//xyz (this works on the sample data given with the repository).

While that's great, it doesn't constitute instructions for making an objective assertion of the software's capabilities.

Our requirement for "OK" with regard to testing is that there must be an objective assertion of the correct behaviour. When we say assertion, we do mean the "assert" function, or some manual instructions for making an objective comparison of observed results to expected results. Example input files are not tests if the user must determine the accuracy of the results qualitatively, with no guidance from the instructions about what to expect as a correct result.
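As a concrete illustration of what an objective assertion could look like here (a sketch only: the file names, JSON keys, and values are hypothetical, not CSaransh's actual output), a documented manual test might ask the user to run the post-processor on the sample data and then execute something like:

```python
# Sketch of an objective assertion: compare observed output against a
# known-correct reference result shipped with the repository.
import json

with open("cascades-data.json") as f:            # hypothetical post-processor output
    observed = json.load(f)
with open("expected-cascades-data.json") as f:   # hypothetical reference result
    expected = json.load(f)

# The test fails loudly if the observed behaviour deviates from the
# expected behaviour; no qualitative judgement is required of the user.
assert observed["n_defects"] == expected["n_defects"], \
    f'expected {expected["n_defects"]} defects, got {observed["n_defects"]}'
print("OK: defect count matches the reference result")
```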

To be clear, what I'm saying is that when one runs the example files, they'll see results (ideally) but will have to determine for themselves whether those results are correct. On their own, without an assertion command or a correct, expected result to compare against, these are not tests. This is clear from your description: you've used the word "qualitative," while our requirement for "OK" is this objective assertion. The following does not qualify as an objective assertion:

checking the behaviour by loading the processed data for visualisation.

Nor does the unguided, qualitative process you have described:

Different results, including the number of defects and cluster shapes, have been verified qualitatively against prior published results and against results found using other algorithms, while many new results can also be verified qualitatively, since the software includes visualisation tools, as mentioned by @ManojWarrier.

As the editor of this submission, I can start the review (with the help of our volunteer reviewers above), but the submission will be stalled until there are tests that meet the criteria. At the moment, I would classify the current capabilities as "Bad (not acceptable): No way for you the reviewer to objectively assess whether the software works." My recommendation is that you postpone your submission until it is more likely to meet our review requirements.

@katyhuff I have added unit tests with edge cases for the C++ algorithms in the directory csaransh-pp/test, with more than 600 assertions. I have also integrated Travis CI and codecov. The code coverage badge shows around 90% coverage if we exclude the boilerplate in main and the printing of JSON data. The exercise turned out to be really helpful, and I think it can also aid understanding of the behaviour of the algorithm implementations. Thank you for making it part of the submission.

@haptork Thanks for adding tests! I think we can go ahead with the review now that the basics have been covered and we have two review volunteers (thanks @jmborr and @arose!).

@whedon assign @jmborr as reviewer

OK, the reviewer is @jmborr

@whedon add @arose as reviewer

OK, @arose is now a reviewer

@whedon start review

OK, I've started the review over in https://github.com/openjournals/joss-reviews/issues/1461. Feel free to close this issue now!

@haptork @jmborr @arose : Let's all move over to #1461 where we'll conduct the review!
