Hi ESMValTool community,
I have a question: is there a recipe feature that allows concatenating two observational datasets along the 'time' dimension?
Real life example:
There is an observational dataset "AAA" with a variable 'var', which is available from 1923 to 1967. There is also another observational dataset "BBB" with the same variable 'var', but from 1968 to 2020. What is needed: a continuous dataset "AAA-BBB" with variable 'var' from 1923 to 2020.
I couldn't find any answers myself, so I came to two possible solutions:
a) concatenate in the diagnostic (currently implemented), which is not beautiful. Additionally, I used the Dataset class to add a new entry, but I haven't found any way to remove the "AAA" and "BBB" entries from the class, which is annoying later on in the diagnostic and results in additional if statements.
b) concatenate the datasets at the cmorization step and provide "AAA-BBB" from there. Since "AAA" and "BBB" are different datasets that come from two independent sources, this doesn't sound appealing.
I was wondering whether there is a more elegant solution?
P.S. It's a python diagnostic for IPCC AR6 and deals with snow cover extent in the Northern hemisphere.
You can import the Core's concatenator with from esmvalcore._io import concatenate. This is quite a robust function that should deal with possible data irregularities. Note that this should be done in the diagnostic, since this is an internal core function and you can't call it as a preprocessor (maybe we can generalize it to a preprocessor function in the future).
Thanks a lot for your reply @valeriupredoi! My code generally loops over projects and then datasets, and to do so it uses the Datasets class. So if one concatenates AAA and BBB in the diagnostic, it seems that the concatenated cube and its data_info dictionary do not automatically get into the Datasets, correct? And if not, is there any way to remove the datasets from the Datasets class? Or should I open another issue for it?
Maybe @schlunma would be able to help out with this, as he has implemented the Dataset class.
As an alternative, you could use the group_metadata function to group the metadata describing the preprocessed files, as shown in the example diagnostic. If you then use iris to load the files that you want to combine into a cube, that should probably work fine.
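To make the idea concrete, here is a minimal sketch of the grouping and selection step. It deliberately does not import ESMValTool or iris: the helper functions group_by and files_to_concatenate are hypothetical stand-ins written on plain dictionaries, and the file names and the "MODEL-X" entry are made up. It only shows how one might pick out the "AAA" and "BBB" entries and put them in time order before loading and concatenating the files.

```python
from collections import defaultdict

# Made-up metadata entries; the keys mirror fields commonly found in the
# metadata of preprocessed files, but the values are purely illustrative.
input_data = [
    {"dataset": "BBB", "project": "OBS", "short_name": "var",
     "start_year": 1968, "end_year": 2020, "filename": "BBB_var.nc"},
    {"dataset": "MODEL-X", "project": "CMIP6", "short_name": "var",
     "start_year": 1923, "end_year": 2020, "filename": "MODEL-X_var.nc"},
    {"dataset": "AAA", "project": "OBS", "short_name": "var",
     "start_year": 1923, "end_year": 1967, "filename": "AAA_var.nc"},
]


def group_by(metadata, key):
    """Hypothetical helper: group metadata dictionaries by one attribute."""
    groups = defaultdict(list)
    for entry in metadata:
        groups[entry[key]].append(entry)
    return dict(groups)


def files_to_concatenate(metadata, datasets):
    """Hypothetical helper: return the files of `datasets` in time order."""
    selected = sorted(
        (entry for entry in metadata if entry["dataset"] in datasets),
        key=lambda entry: entry["start_year"],
    )
    # Refuse to continue if the periods leave a gap or overlap.
    for earlier, later in zip(selected, selected[1:]):
        if later["start_year"] != earlier["end_year"] + 1:
            raise ValueError(
                f"{earlier['dataset']} and {later['dataset']} are not contiguous"
            )
    return [entry["filename"] for entry in selected]


by_project = group_by(input_data, "project")
obs_files = files_to_concatenate(by_project["OBS"], {"AAA", "BBB"})
print(obs_files)  # ['AAA_var.nc', 'BBB_var.nc']
```

In a real diagnostic the ordered file list would then be loaded and concatenated into a single cube, and the remaining entries could be looped over as before, without having to remove "AAA" and "BBB" from any container class.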
To help you better with doing this, it would be really helpful if you could share the part of the recipe where you define the variables and the datasets that you want to use.
I would really recommend you use the group_metadata and select_metadata functions mentioned by @bouweandela. The Dataset class is a remnant from version 1 that I implemented in version 2 before @bouweandela added the group_metadata and select_metadata functions. It is missing several key functionalities and is overly complex in comparison to the new interface.
@bouweandela I really think we should deprecate the Dataset, Variable and Variables classes. I already wanted to delete them two years ago (#712), but couldn't since people still used them.
I would be fine with that, would you be willing to open a pull request? Initially it should just throw a deprecation warning and say in the documentation that the classes will be removed two versions after the version in which they are deprecated, e.g. deprecate in v2.2 and remove in v2.4. Before the 2.4 release we would then need to actually remove them and also update any existing diagnostics, but I think it shouldn't be too much work to change the existing recipes.
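As a rough illustration of what "just throw a deprecation warning" could look like, here is a minimal sketch using Python's standard warnings module. The LegacyDatasets class is a hypothetical placeholder, not the actual ESMValTool implementation, and the version numbers in the message are only the example versions mentioned above.

```python
import warnings


class LegacyDatasets:
    """Hypothetical placeholder for a class that is being phased out."""

    def __init__(self):
        # Warn on every instantiation; stacklevel=2 points the warning at
        # the caller's line rather than at this constructor.
        warnings.warn(
            "LegacyDatasets is deprecated since v2.2 and is scheduled for "
            "removal in v2.4; use group_metadata and select_metadata instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.entries = []


# Demonstrate that instantiating the class emits exactly one warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LegacyDatasets()

print(caught[0].category.__name__)  # DeprecationWarning
```

Existing diagnostics keep working during the deprecation period, and users get a clear pointer to the replacement functions before the class is removed.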
@bouweandela and @schlunma Thanks, I'll try to correct the diagnostic and see if there are any problems with group_metadata and select_metadata.