Esmvaltool: Always apply CMOR short_name/standard_name/long_name?

Created on 13 Sep 2018 · 13 Comments · Source: ESMValGroup/ESMValTool

At the moment, as a safety measure, a cube has to have the correct standard name to be loaded by the preprocessor. This means that if the cube has the wrong standard name, a file-level fix needs to be applied, which in turn means copying the file to the preprocessor directory. We could relax this constraint a bit so that a cheap metadata fix is enough, with the following steps:

1) Try to load with short_name, standard_name and long_name as constraints. If this returns one or more cubes, go to 4); otherwise continue with 2).
2) Try to load without constraints. If we find exactly one cube, go to 4). If no cube is loaded at all, raise an exception.
3) If we find more than one cube in step 2, it is not clear what we have, so we could continue with some checks on the coordinates to find the cube with the data in it, e.g. whether the cube has time, latitude and longitude coordinates. If after this check we still have more than one cube, raise an exception.
4) Apply short_name, standard_name and long_name from the CMOR table in the fix_metadata step, to make sure these are set to the right values.
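The steps above can be sketched roughly as follows. This is only an illustration of the proposed logic, not the preprocessor's code: `FakeCube` and `select_cube` are hypothetical stand-ins, the CMOR table entry is assumed to be a plain dict, and short_name is assumed to map to the cube's `var_name`. The sketch also assumes that exactly one cube is wanted in the end, so several strict matches are treated as an error.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class FakeCube:
    """Hypothetical stand-in for an iris cube; holds only the metadata used below."""
    var_name: Optional[str] = None
    standard_name: Optional[str] = None
    long_name: Optional[str] = None
    coord_names: Set[str] = field(default_factory=set)


def select_cube(cubes, cmor):
    """Pick the data cube following steps 1-4 and apply the CMOR names to it.

    `cmor` is assumed to be a dict with the keys 'short_name',
    'standard_name' and 'long_name'.
    """
    wanted = (cmor['short_name'], cmor['standard_name'], cmor['long_name'])

    # Step 1: strict match on all three names.
    matches = [
        cube for cube in cubes
        if (cube.var_name, cube.standard_name, cube.long_name) == wanted
    ]

    if not matches:
        # Step 2: fall back to every cube in the file.
        matches = list(cubes)
        if len(matches) > 1:
            # Step 3: keep only cubes that look like gridded data.
            required = {'time', 'latitude', 'longitude'}
            matches = [c for c in matches if required <= c.coord_names]

    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one candidate cube, found {len(matches)}")

    # Step 4: overwrite the names with the values from the CMOR table.
    cube = matches[0]
    cube.var_name = cmor['short_name']
    cube.standard_name = cmor['standard_name']
    cube.long_name = cmor['long_name']
    return cube
```

Note that step 3 is a heuristic: two cubes with identical coordinates (for example data and its uncertainty) would still end in an exception here, which is the intended safe behaviour.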

Anyone in favour or against? Note that if we apply this, there is some risk of loading something completely different (e.g. a different variable) without noticing.

standards

bouweandela

👍2

All 13 comments

Sounds like a good approach. :+1:

I think that the risk of loading something wrong is pretty low, since the variable name is also given in the dataset filename.

mattiarighi on 13 Sep 2018

👍1

> If we find more than one cube in step 2: it is not clear what we have, so we could continue with some checks on the coordinates to find the cube with the data in it, e.g. if the cube has time, latitude, longitude coordinates. If after this check we still have more than one cube, raise an exception.

Careful with this one - have just been through the hoops with a totally busted OBS file last week - damn thing had three cubes inside, one time-axis with no DimCoords and two with identical DimCoords but only one was the actual data, the other was the errors for each data point

valeriupredoi on 13 Sep 2018

That means that the OBS file was not generated properly.
But this is an issue of the reformat scripts, not of the preprocessor.

In this case, please report problems here.

mattiarighi on 13 Sep 2018

@mattiarighi hell no, that thing was probably generated on a NOAA research plane while flying through a hurricane - cheers for the link, was not aware of it - I forwarded it to Ranjini, she is currently working on formatting these sorts of crazy OBS files (that one was hers)

valeriupredoi on 13 Sep 2018

The framework for the reformat scripts is not yet ported to v2.
But if Ranjini can write a script for cmorizing the dataset you mentioned, we can plug it in later.

mattiarighi on 13 Sep 2018

👍1

yes, that's what I suggested she should do

valeriupredoi on 13 Sep 2018

> If we find more than one cube in step 2: it is not clear what we have, so we could continue with some checks on the coordinates to find the cube with the data in it, e.g. if the cube has time, latitude, longitude coordinates. If after this check we still have more than one cube, raise an exception.

When I have run into this scenario with iris, it is usually because the file had 2D coordinates and the coordinates attribute was not set on the data variable. Something like this is loaded correctly (if a variable's name matches its dimension name, as for time, you don't need to add it to the coordinates attribute):

var(i,j, time)
   coordinates = "latitude longitude"

latitude(i,j)

longitude(i,j)

time(time)

The following, by contrast, leads to loading three cubes: var, latitude and longitude. I think we should be able to fix at least those cases:

var(i,j, time)

latitude(i,j)

longitude(i,j)

time(time)
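A minimal sketch of how such a file could be untangled, working on variable names and dimension tuples only rather than on real cubes. The function name, the `COORD_NAMES` set and the input layout are assumptions made for illustration. With iris, the attach step would presumably wrap each coordinate cube's data in an `iris.coords.AuxCoord` and call `Cube.add_aux_coord` with the dimension indices computed here.

```python
# Assumed names under which horizontal coordinates might appear as cubes.
COORD_NAMES = {'latitude', 'longitude'}


def split_data_and_coords(variables):
    """Separate the single data variable from stray coordinate variables.

    `variables` maps each loaded variable name to its tuple of dimension
    names. Returns the data variable's name and, for every coordinate
    variable, the indices of the data dimensions it spans.
    """
    coords = {n: d for n, d in variables.items() if n in COORD_NAMES}
    data = {n: d for n, d in variables.items() if n not in COORD_NAMES}

    if len(data) != 1:
        raise ValueError(
            f"Expected exactly one data variable, found {sorted(data)}")
    (data_name, data_dims), = data.items()

    attach = {}
    for name, dims in coords.items():
        # A coordinate can only be attached if the data has all its dims.
        if not set(dims) <= set(data_dims):
            raise ValueError(f"{name} does not share dimensions with {data_name}")
        attach[name] = tuple(data_dims.index(dim) for dim in dims)
    return data_name, attach
```

For the layout shown above, `latitude` and `longitude` would both map to data dimensions `(0, 1)`.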

jvegasbsc on 18 Sep 2018

I am in strong favor of something like the changes proposed by @bouweandela . As an intermediate step before loading the cube with no constraints (step 2), we could try to load the cube with only one out of the three constraints long_name, standard_name, short_name (trying one after the other, only one at a time).
This will help a lot with some of the obs data that we have at the moment. Can we have this change soon please?

axel-lauer on 27 Nov 2018

In that case I think the priority order should be:
1a. try all
1b. try short_name and standard_name only
1c. try short_name only
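That priority order could look roughly like the sketch below. Cubes are represented as plain dicts here, `load_with_relaxation` is a hypothetical helper, and short_name is assumed to correspond to the `var_name` key; none of this is taken from the ESMValTool code base.

```python
# Constraint sets in the proposed priority order (1a, 1b, 1c).
ATTEMPTS = (
    ('short_name', 'standard_name', 'long_name'),
    ('short_name', 'standard_name'),
    ('short_name',),
)

# Assumed mapping from CMOR table keys to cube metadata keys.
CUBE_KEY = {
    'short_name': 'var_name',
    'standard_name': 'standard_name',
    'long_name': 'long_name',
}


def load_with_relaxation(cubes, cmor):
    """Return the first constraint set that matches, plus the matching cubes.

    Returns (None, []) if even the short_name-only attempt finds nothing.
    """
    for keys in ATTEMPTS:
        matches = [
            cube for cube in cubes
            if all(cube.get(CUBE_KEY[key]) == cmor[key] for key in keys)
        ]
        if matches:
            return keys, matches
    return None, []
```

Returning the constraint set that succeeded makes it possible to log how far the match had to be relaxed, which helps spot files that need a proper fix.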

mattiarighi on 27 Nov 2018

Did #755 address this as well?

mattiarighi on 14 Feb 2019

🎉1

Not really. This was addressed in #638. Since that merge, these problems can (and should) be solved by using model fixes at the fix_metadata step.

jvegasbsc on 14 Feb 2019

Wrong question.
New question: can we close this?

mattiarighi on 14 Feb 2019

Yes!

jvegasbsc on 14 Feb 2019

🎉1
