Below I describe an idea for making ESMValTool friendlier for developers and users, based on the discussion in #769 and elsewhere. Please comment if you have any ideas about this, @ESMValGroup/esmvaltool-coreteam.
As discussed in #769, it would be desirable to split the core esmvaltool code (i.e. the esmvaltool command, recipe parsing code, preprocessor, etc.) from the diagnostics/recipes, so we would get two public GitHub repositories. Private GitHub repositories would then be forks of the public diagnostics/recipes repository.
This would have several advantages:
[...] pip, instead of against the development version that keeps changing
The esmvaltool core, in its own public GitHub repository, could be a pure Python package that can be installed from PyPI.
Out of the diagnostics + recipes GitHub repository, we would probably make several different packages:
and finally we would probably like a single esmvaltool package on PyPI (and Conda) that installs all the others as dependencies, so we can have a single command installation.
EDIT: We will start with a much simpler plan, where we create just two packages on PyPI: esmvalcore and esmvaltool, where esmvaltool has esmvalcore as a dependency.
cheers for the plan, General Bouwe @bouweandela :medal_military:
I completely agree with all, but we need to make sure that the diagnostic packages are maintained at the same level as the independent core (continuous integration, good testing etc) which is gonna be hard since we need to designate people for that, and we're not too many at the moment. How do you see we get around that?
I'm not sure if that is quite on topic here, but just to answer your question: the easiest way to do continuous integration/test diagnostics is probably to get a big machine with all the data on it, run the diagnostic, and compare the output to a previous run whenever something changes in the code. Everything automated as much as possible. I think that we cannot expect every scientist who contributes a diagnostic to write unit tests, so just running the diagnostic will probably be as good as it gets regarding testing.
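To make the "compare the output to a previous run" idea a bit more concrete, here is a minimal sketch using only the Python standard library. It is not existing ESMValTool code: the function names (`file_digest`, `snapshot`, `compare_runs`) and the directory layout are hypothetical, and it assumes that a byte-for-byte comparison is good enough. In practice, output files that embed timestamps or other run-specific metadata would probably need a looser, format-aware comparison.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large output files do not fill memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(output_dir):
    """Map each file's path (relative to output_dir) to its digest."""
    root = Path(output_dir)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_runs(reference_dir, new_dir):
    """Return the sorted relative paths that are missing, added or changed.

    Hypothetical helper: reference_dir holds the output of a previous,
    trusted run and new_dir the output of the run under test.
    """
    reference = snapshot(reference_dir)
    new = snapshot(new_dir)
    return sorted(
        name
        for name in reference.keys() | new.keys()
        if reference.get(name) != new.get(name)
    )
```

An automated job could run a recipe, call `compare_runs` on the stored reference output and the fresh output, and flag the change for review whenever the returned list is not empty.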
In theory it all sounds good, man, but the true logistics will be much harder in the real situations that we'll face, hence my concern about the number of man hours per qualified dev person needed to maintain each of the separate language packages. In any case, let's not dwell on it just yet; I am very happy with the plan you put forward, irrespective of what sort of manpower issues we may face. That's why we need to get a strong team structure in the future too (part of the gov issue) :beer:
As a first step in implementing this, we will need to rename/move some directories and come up with names for them. We need names for the various packages listed above (suggestions welcome!) and will need to move any directories that are not part of the esmvaltool core framework out of the esmvaltool directory into the root of the repository (and give them more meaningful names).
Idea for renaming:
Core repository:
esmvaltool -> esmval? esmvalcore? something else?
Diagnostics repository:
esmvaltool/diag_scripts -> esmvaltool-diagnostics
esmvaltool/utils/cmorizers -> esmvaltool-cmorizers
esmvaltool/utils -> esmvaltool-utils
> Core repository:
> esmvaltool -> esmval? esmvalcore? something else?
What about esmvaltool-core, consistent with the other names you suggested?
One question: shouldn't cmorizers and utils be part of the core?
> What about esmvaltool-core, consistent with the other names you suggested?

That would be a rather long name for a Python package, e.g. to use a preprocessor function, you would need the Python code
from esmvaltool_core.preprocessor import regrid
> One question: shouldn't cmorizers and utils be part of the core?
I think the utils are mostly tools for diagnostic development and not really part of the esmvaltool Python package as it is defined at the moment.
Regarding the cmorizers, wasn't the plan to start using cistools in the future? Why are we writing our own cmorization package? Is that just because cistools doesn't support NCL?
> Regarding the cmorizers, wasn't the plan to start using cistools in the future? Why are we writing our own cmorization package? Is that just because cistools doesn't support NCL?
Not only, also because not all datasets we use are included in CIS.
Yes, but wouldn't it be better to contribute a plugin for those datasets? That would make the cmorization script available to a much wider community, and we would also avoid duplicate work by not having to implement and maintain our own cmorization framework.
I like esmval for the core!
I also like esmval-toolkit that can include cmorizers and utils
I also like esmval-diagnostics
I also like beer :beer:
> Yes, but wouldn't it be better to contribute a plugin for those datasets? That would make the cmorization script available to a much wider community, and we would also avoid duplicate work by not having to implement and maintain our own cmorization framework.
Maybe. But bear in mind that the cmorizers are, like the diagnostics, community developed, and I bet you my boots only 10% of scientists are familiar with CIS and know how to work and develop with it.
Just another thought in case we decide to split the ESMValTool into two repositories (core and diags): what happens to the documentation? So far, I have always seen the documentation as a "user's and developer's guide". Such a manual naturally contains documentation of both the core functionality and the diagnostics. In some cases, things are mixed up because they do not clearly belong to one part or the other, for example documentation of shared code that does not belong to a single diagnostic, or a description of the coding rules and standards that largely apply to both parts. So I'm not sure what to do about the documentation. Any thoughts?
I think we'll need to split the documentation too. This shouldn't be too big a problem, because we can always link from one readthedocs page to another, so there will be no need to duplicate things. A good starting point is that the esmvaltool diagnostics build on top of the esmvaltool core, so we would mostly refer from the diagnostic documentation to the core documentation, where needed.
> documentation of shared code that does not belong to a single diagnostic

This would be part of the diagnostics repository, as it's shared between diagnostics, not used by the core.
> a description of the coding rules and standards that largely apply to both parts
I think we can just create a link to the relevant sections in the core documentation from the diagnostic documentation, and add some extra text to describe the differences (e.g. unit tests are not required for diagnostics).
> I bet you my boots only 10% of scientists are familiar with CIS and know how to work and develop with it

Yes, but at least we wouldn't be forcing them to learn two systems, ESMValTool cmorizers and CIS; just knowing CIS would be enough.
Splitting the ESMValCore from the ESMValTool is done now.
a virtual :beer: to @bouweandela