ESMValTool: Multi-model statistics output not further preprocessed

Created on 2 Jul 2018 · 17 Comments · Source: ESMValGroup/ESMValTool

Issue reported by @ledm:

I've found that putting the area_average, volume_average and time_average preprocessors after the multi-model preprocessor means that the multi-model dataset hasn't had the same manipulations as the other models. I.e., in my case, the multi-model data still contains all time points, whereas the rest of the models are a time average. Similarly, the volume_average output should have only a time dimension, whereas the multi-model output contains latitude, longitude, time and depth dimensions.

This previously was no problem because the multi model step was the final step. The reason that I did not implement further preprocessing steps on the multi model statistics cubes, is that there are no settings specified for those datasets in the recipe, so I'm not sure what preprocessor steps with what settings should be applied to them.

enhancement

All 17 comments

The datasets list should be updated after multi_model_stats, so that it contains MultiModelMean and/or MultiModelMedian as additional datasets.

In this way, every preprocessing step coming after multi_model_stats in the chain will be applied consistently to all datasets.

Not at the moment. Currently the settings used by the preprocessor are determined from the recipe before running anything, and those settings are specific to each dataset, because some preprocessor steps depend on the dataset, e.g. the fixes. I guess I could take the settings that are not dataset specific and come after the multi model statistics step, and use those for the multi model output though.
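A minimal sketch of the idea in the comment above: from the ordered preprocessor settings of one dataset, keep only the steps that come after multi_model_statistics and are not dataset specific. This is not ESMValTool's actual implementation. The function name, the (step, settings) list layout and the contents of DATASET_SPECIFIC are assumptions made for illustration.

```python
# Hypothetical set of dataset-specific steps (assumption, for illustration only).
DATASET_SPECIFIC = {"fix_file", "fix_metadata", "fix_data"}


def steps_after_multi_model(ordered_settings):
    """Return the (step, settings) pairs that follow multi_model_statistics.

    ordered_settings is a list of (step_name, settings_dict) tuples in
    execution order. Dataset-specific steps are filtered out, so the result
    could be applied to the multi-model output.
    """
    names = [name for name, _ in ordered_settings]
    if "multi_model_statistics" not in names:
        return []
    start = names.index("multi_model_statistics") + 1
    return [
        (name, settings)
        for name, settings in ordered_settings[start:]
        if name not in DATASET_SPECIFIC
    ]


# Example settings for one dataset, using step names mentioned in this thread.
example_settings = [
    ("fix_data", {}),
    ("multi_model_statistics", {"span": "overlap", "statistics": ["mean"]}),
    ("average_region", {"coord1": "longitude", "coord2": "latitude"}),
    ("time_average", {}),
]
remaining_steps = steps_after_multi_model(example_settings)
```

Here remaining_steps holds the average_region and time_average entries, which are the steps that would still need to run on the MultiModelMean output.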

The problem could be that multi_model is also somewhat dataset specific (due to the exclude key).

But the MultiModel[statistic]*.nc files are part of the dataset list, together with all the other datasets, additional datasets and OBS datasets, no? At least I know I put this in a couple of months ago. Has this been removed?

the exclude key determines which models are not thrown in the multimodel stats computation, it should not exclude the multimodel datasets from being members of the 'datasets' list...

I just stumbled across this issue in one of my recipes, do we have any solution for this yet?

No, then the issue would have been closed. However, I'm working on it in the version2_provenance branch, since I had to rewrite part of the preprocessor anyway to accommodate recording provenance information.

Hi Manuel!

There is a solution! I set the custom_order preprocessor flag to true, then put the multi-model preprocessor at the end of the chain.
Something like this:

  prep_timeseries_1: # For 2D fields
    custom_order: true
    average_region:
      coord1: longitude
      coord2: latitude
    multi_model_statistics:
      span: overlap
      statistics: [mean]

More examples here: https://github.com/ESMValGroup/ESMValTool/blob/version2_development/esmvaltool/recipes/recipe_OceanPhysics.yml

Lee

P.S. Nice to put a face to the name!

Hi Lee,

thanks, that looks really good, I will try that!

P.S. Yeah, definitely! I could have added a profile picture before, though :smile:

According to the design document the multi model statistics function should be at the very end anyway, maybe we should just change that? Or is there a good reason why
'depth_integration',
'average_region',
'average_volume',
'zonal_means',
'seasonal_mean',
'time_average',
are performed after multi_model_statistics? @mattiarighi ?

Multi-model should be performed before the time/area operations. This is because, for example, the multi-model mean of a regional-average is not the same as the regional-average of the multi-model mean, and the latter is more accurate.
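A small NumPy sketch of when the order of operations changes the result. It does not use ESMValTool. The three "models", the 1 x 2 grid and the area_average helper are made up for illustration. Under these assumptions, a plain mean with no missing data gives the same answer in either order, while a median, or a mean over models with different missing points, does not.

```python
import numpy as np


def area_average(field, weights):
    """Weighted mean over the last two axes (lat, lon), skipping NaNs."""
    w = np.broadcast_to(weights, field.shape)
    valid = ~np.isnan(field)
    return np.nansum(field * w, axis=(-2, -1)) / np.sum(w * valid, axis=(-2, -1))


# Three hypothetical models on a 1 x 2 (lat, lon) grid with equal cell weights.
models = np.array([[[1.0, 10.0]], [[2.0, 20.0]], [[9.0, 0.0]]])
weights = np.ones((1, 2))

# Mean, no missing data: both orders agree.
mean_then_area = area_average(np.mean(models, axis=0), weights)
area_then_mean = np.mean(area_average(models, weights), axis=0)

# Median: the two orders give different numbers.
median_then_area = area_average(np.median(models, axis=0), weights)
area_then_median = np.median(area_average(models, weights), axis=0)

# Mean with one grid cell missing in one model: the orders differ again.
masked = models.copy()
masked[2, 0, 1] = np.nan
masked_mean_then_area = area_average(np.nanmean(masked, axis=0), weights)
masked_area_then_mean = np.mean(area_average(masked, weights), axis=0)

print(mean_then_area, area_then_mean)
print(median_then_area, area_then_median)
print(masked_mean_then_area, masked_area_then_mean)
```

So whether the order matters depends on the statistic and on how consistent the masks are across models, which is one reason to keep the order configurable.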

But, as @ledm pointed out some time ago, there might be situations where the multi-model stats are too expensive and performing them after some time/area averaging would reduce the amount of data to process.

The default should be multi-model before time/area, but we need to be flexible.

Thanks for explaining, I now see that time/area operations are not in the design document at all. Maybe that should be updated?

Here is the latest version...

[Figure: esmvaltool_workflow]

Done.

Also, in this figure, Time/Area is not fully correct, as we also have several z-axis preprocessors. Could we replace "Time/Area Subsetting" and "Time/Area Statistics" with "Time & Space" or "Temporal & Spatial"?

Good point!

[Figure: esmvaltool_workflow]
