Esmvaltool: A plan for making esmvaltool more stable and user/developer friendly

Created on 25 Apr 2019 · 15 comments · Source: ESMValGroup/ESMValTool

Below I describe an idea for making ESMValTool more friendly to use for developers and users, based on the discussion in #769 and elsewhere. Please comment if you have any ideas about this @ESMValGroup/esmvaltool-coreteam

Split core from diagnostic code

As discussed in #769, it would be desirable to split the core esmvaltool code (i.e. the esmvaltool command, recipe parsing code, preprocessor, etc) from the diagnostics/recipes, so we would get two public GitHub repositories. Private GitHub repositories would then be forks of the public diagnostic/recipes repository.

This would have several advantages:

diagnostic developers can develop their diagnostics against a stable version of esmvaltool installed simply with pip, instead of against the development version that keeps changing
it will no longer be possible to mix changes to the core of esmvaltool with diagnostic development, making it easier to maintain good quality standards and keep pull requests manageable
much less synchronization needed between public and private repository
it will no longer be necessary to install a hefty list of requirements if you just want to use the esmvaltool core, e.g. in a Jupyter notebook (see #1030), or esmvaltool core + diagnostics in a single language.

Packaging and publishing

The esmvaltool core, in its own public GitHub repository, could be a pure Python package that can be installed from PyPI.

Out of the diagnostics + recipes GitHub repository, we would probably make several different packages:

one for recipes on PyPI
one for Python diagnostics on PyPI
one for NCL diagnostics on PyPI
one for R diagnostics on CRAN
one for Julia diagnostics in the Julia package repository
for diagnostic development in progress it will remain possible to work with (private) diagnostic + recipe GitHub repositories.

and finally we would probably like a single esmvaltool package on PyPI (and Conda) that installs all the others as dependencies, so we can have a single command installation.

EDIT: We will start with a much simpler plan, where we create just two packages on PyPI: esmvalcore and esmvaltool, which has esmvalcore as a dependency.
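A minimal sketch of what the simplified two-package layout could look like as setuptools metadata. This is an illustration only: the version numbers and package lists are placeholders rather than values from the actual repositories, and each dictionary would be passed to setuptools.setup(**...) in its own setup.py.

```python
# Hypothetical metadata for the two PyPI packages named in the EDIT above.
# Only the package names and the dependency direction come from the plan;
# versions and package lists are placeholders.

ESMVALCORE_SETUP = {
    "name": "esmvalcore",
    "version": "0.0.0",          # placeholder
    "packages": ["esmvalcore"],  # placeholder
    "install_requires": [],      # core requirements would be listed here
}

ESMVALTOOL_SETUP = {
    "name": "esmvaltool",
    "version": "0.0.0",          # placeholder
    "packages": ["esmvaltool"],  # placeholder
    # The diagnostics/recipes package depends on the core, so installing
    # esmvaltool with pip would pull in esmvalcore as well.
    "install_requires": ["esmvalcore"],
}
```

With this direction of dependency, the core can be installed on its own, while installing esmvaltool gives the single-command installation of both.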

bouweandela

All 15 comments

cheers for the plan, General Bouwe @bouweandela :medal_military:
I completely agree with all, but we need to make sure that the diagnostic packages are maintained at the same level as the independent core (continuous integration, good testing etc) which is gonna be hard since we need to designate people for that, and we're not too many at the moment. How do you see we get around that?

valeriupredoi on 25 Apr 2019

I'm not sure if that is quite on topic here, but just to answer your question: the easiest way to do continuous integration/test diagnostics is probably to get a big machine with all the data on it, run the diagnostic, and compare the output to a previous run whenever something changes in the code. Everything automated as much as possible. I think that we cannot expect every scientist who contributes a diagnostic to write unit tests, so just running the diagnostic will probably be as good as it gets regarding testing.
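The "compare the output to a previous run" idea could be sketched roughly as follows. This is not part of ESMValTool; the function names are hypothetical, and it assumes a byte-for-byte comparison is acceptable. In practice, output files that embed timestamps or other run-specific metadata would need a more tolerant comparison.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_runs(reference_dir: Path, new_dir: Path) -> dict:
    """Compare all files under two output directories by content hash."""
    ref = {p.relative_to(reference_dir): file_digest(p)
           for p in reference_dir.rglob("*") if p.is_file()}
    new = {p.relative_to(new_dir): file_digest(p)
           for p in new_dir.rglob("*") if p.is_file()}
    return {
        # Files produced by the previous run but not by the new one.
        "missing": sorted(str(p) for p in ref.keys() - new.keys()),
        # Files produced only by the new run.
        "unexpected": sorted(str(p) for p in new.keys() - ref.keys()),
        # Files present in both runs whose contents differ.
        "changed": sorted(str(p) for p in ref.keys() & new.keys()
                          if ref[p] != new[p]),
    }
```

An automated job could run the diagnostic, call compare_runs on the stored reference output and the fresh output, and fail if any of the three lists is non-empty.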

bouweandela on 26 Apr 2019

I'm not sure if that is quite on topic here, but just to answer your question: the easiest way to do continuous integration/test diagnostics is probably to get a big machine with all the data on it, run the diagnostic, and compare the output to a previous run whenever something changes in the code. Everything automated as much as possible. I think that we cannot expect every scientist who contributes a diagnostic to write unit tests, so just running the diagnostic will probably be as good as it gets regarding testing.

In theory it all sounds good, man, but the true logistics will be much harder in the real situations that we'll face, hence my concern about the number of man-hours per qualified developer needed to maintain each of the separate language packages. In any case, let's not dwell on it just yet; I am very happy with the plan you put forward, irrespective of what sort of manpower issues we may face. That's why we need to get a strong team structure in the future too (part of the gov issue) :beer:

valeriupredoi on 26 Apr 2019

As a first step in implementing this, we will need to rename/move some directories and come up with names for them. We need names for the various packages listed above (suggestions welcome!) and will need to move any directories that are not part of the esmvaltool core framework out of the esmvaltool directory into the root of the repository (and give them more meaningful names).

Idea for renaming:

Core repository:
esmvaltool -> esmval? esmvalcore? something else?

Diagnostics repository:
esmvaltool/diag_scripts -> esmvaltool-diagnostics
esmvaltool/utils/cmorizers -> esmvaltool-cmorizers
esmvaltool/utils -> esmvaltool-utils
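The proposed renames could be scripted along these lines. This is only a sketch: the helper names are hypothetical, it uses shutil.move where a real repository would more likely use git mv to keep history, and it defaults to a dry run. The one subtlety it illustrates is ordering: esmvaltool/utils/cmorizers has to be moved before its parent esmvaltool/utils, otherwise the cmorizers would end up inside esmvaltool-utils.

```python
import shutil
from pathlib import Path

# Renames proposed above: old path -> new path, relative to the repository root.
PROPOSED_MOVES = {
    "esmvaltool/diag_scripts": "esmvaltool-diagnostics",
    "esmvaltool/utils/cmorizers": "esmvaltool-cmorizers",
    "esmvaltool/utils": "esmvaltool-utils",
}


def ordered_moves(moves):
    """Return (source, target) pairs, most deeply nested sources first."""
    return sorted(moves.items(),
                  key=lambda item: len(Path(item[0]).parts),
                  reverse=True)


def apply_moves(repo_root: Path, moves, dry_run=True):
    """Move each existing source directory; a dry run only reports the plan."""
    planned = []
    for source, target in ordered_moves(moves):
        src, dst = repo_root / source, repo_root / target
        if not src.is_dir():
            continue  # skip sources that do not exist in this checkout
        planned.append((source, target))
        if not dry_run:
            shutil.move(str(src), str(dst))
    return planned
```

Calling apply_moves(Path("."), PROPOSED_MOVES) from the repository root would list the planned moves without touching anything; passing dry_run=False performs them.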

bouweandela on 1 May 2019

Core repository:
esmvaltool -> esmval? esmvalcore? something else?

What about esmvaltool-core, consistent with the other names you suggested?

One question: shouldn't cmorizers and utils be part of the core?

mattiarighi on 1 May 2019

What about esmvaltool-core, consistent with the other names you suggested?

That would be a rather long name for a Python package. For example, to use a preprocessor function, you would need the Python code:

from esmvaltool_core.preprocessor import regrid

One question: shouldn't cmorizers and utils be part of the core?

I think the utils are mostly tools for diagnostic development and not really part of the esmvaltool Python package as it is defined at the moment.

Regarding the cmorizers, wasn't the plan to start using cistools in the future? Why are we writing our own cmorization package? Is that just because cistools doesn't support NCL?

bouweandela on 2 May 2019

Regarding the cmorizers, wasn't the plan to start using cistools in the future? Why are we writing our own cmorization package? Is that just because cistools doesn't support NCL?

Not only that; also because not all datasets we use are included in CIS.

mattiarighi on 2 May 2019

Yes, but wouldn't it be better to contribute a plugin for those datasets? That will make the cmorization script available to a much wider community, and we also avoid duplicate work by not having to implement and maintain our own cmorization framework.

bouweandela on 2 May 2019

I like esmval for the core!
I also like esmval-toolkit that can include cmorizers and utils
I also like esmval-diagnostics
I also like beer :beer:

valeriupredoi on 2 May 2019

Yes, but wouldn't it be better to contribute a plugin for those datasets? That will make the cmorization script available to a much wider community, and we also avoid duplicate work by not having to implement and maintain our own cmorization framework.

Maybe. But bear in mind that the cmorizers are like the diagnostics, i.e. community developed, and I bet you my boots only 10% of scientists are familiar with CIS and know how to work and develop with it.

valeriupredoi on 2 May 2019

Just another thought in case we decide to split the ESMValTool into two repositories (core and diags): what happens to the documentation? So far, I always saw the documentation as a "user's and developer's guide". Such a manual naturally contains documentation of both the core functionality and the diagnostics. In some cases, things are mixed up because they do not clearly belong to one part or the other, for example documentation of shared code that does not belong to a single diagnostic, or a description of the coding rules and standards that largely apply to both parts. So I am not sure what to do about the documentation. Any thoughts?

axel-lauer on 3 May 2019

I think we'll need to split the documentation too. This shouldn't be too big a problem, because we can always link from one readthedocs page to another, so there will be no need to duplicate things. A good starting point can be that esmvaltool diagnostics build on top of esmvaltool core, so we would mostly refer from the diagnostic documentation to the core documentation, where needed.

documentation of shared code that does not belong to a single diagnostic

This would be part of the diagnostics repository, as it's shared between diagnostics and not used by the core.

a description of the coding rules and standards that largely apply to both parts

I think we can just create a link to the relevant sections in the core documentation from the diagnostic documentation, and add some extra text to describe the differences (e.g. unit tests are not required for diagnostics).

bouweandela on 3 May 2019

I bet you my boots only 10% of scientists are familiar with CIS and know how to work and develop with it

Yes, but at least we wouldn't be forcing them to learn two systems (ESMValTool cmorizers and CIS); just knowing CIS would be enough.

bouweandela on 3 May 2019

Splitting the ESMValCore from the ESMValTool is done now.

bouweandela on 23 Mar 2020

a virtual :beer: to @bouweandela

valeriupredoi on 23 Mar 2020
