Esmvaltool: Always apply CMOR short_name/standard_name/long_name?

Created on 13 Sep 2018 · 13 comments · Source: ESMValGroup/ESMValTool

At the moment, as a safety measure, a cube must have the correct standard name to be loaded by the preprocessor. This means that if the cube has the wrong standard name, a file-level fix needs to be applied, which means copying the file to the preprocessor directory. We could relax this constraint a bit so that a cheap metadata fix is enough instead, with the following steps:

1) Try to load with short_name, standard_name and long_name as constraints. If this returns one or more cubes, go to 4); otherwise continue with 2).
2) Try to load without constraints. If this finds exactly one cube, go to 4). If it finds no cube at all, raise an exception.
3) If step 2 finds more than one cube, it is not clear what we have, so we could continue with some checks on the coordinates to find the cube with the data in it, e.g. whether the cube has time, latitude and longitude coordinates. If after this check we still have more than one cube, raise an exception.
4) Apply short_name, standard_name and long_name from the CMOR table in the fix_metadata step, to make sure these are set to the right values.
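
The four steps can be sketched in plain Python. To keep the sketch self-contained it does not use iris: `Cube` is a hypothetical stand-in that carries only the three names and the coordinate names, `cmor_entry` is a hypothetical dict standing in for a CMOR table entry, and `REQUIRED_COORDS` is an assumed choice for the step-3 check. One further assumption: the proposal sends several step-1 matches straight to step 4, but since step 4 needs a single cube, this sketch runs them through the step-3 check as well.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed coordinate set used in step 3 to recognise the cube holding the data.
REQUIRED_COORDS = {"time", "latitude", "longitude"}


@dataclass
class Cube:
    """Hypothetical stand-in for an iris cube: names plus coordinate names."""
    var_name: Optional[str]
    standard_name: Optional[str]
    long_name: Optional[str]
    coord_names: set = field(default_factory=set)


def select_cube(cubes, cmor_entry):
    """Pick the data cube from the cubes of one file and apply the CMOR names."""
    names = (cmor_entry["short_name"], cmor_entry["standard_name"],
             cmor_entry["long_name"])
    # Step 1: constrain on short_name (var_name), standard_name and long_name.
    candidates = [c for c in cubes
                  if (c.var_name, c.standard_name, c.long_name) == names]
    if not candidates:
        # Step 2: nothing matched, so consider every cube in the file.
        candidates = list(cubes)
    if not candidates:
        raise ValueError("No cubes found in file")
    if len(candidates) > 1:
        # Step 3: keep only cubes with time, latitude and longitude coordinates.
        candidates = [c for c in candidates if REQUIRED_COORDS <= c.coord_names]
        if len(candidates) != 1:
            raise ValueError(
                f"Cannot identify the data cube: {len(candidates)} candidates left")
    cube = candidates[0]
    # Step 4: overwrite the names with the values from the CMOR table.
    cube.var_name, cube.standard_name, cube.long_name = names
    return cube


tas = {"short_name": "tas", "standard_name": "air_temperature",
       "long_name": "Near-Surface Air Temperature"}
file_cubes = [
    Cube("t2m", None, "2 metre temperature", {"time", "latitude", "longitude"}),
    Cube("time_bnds", None, None, {"time"}),
]
cube = select_cube(file_cubes, tas)
print(cube.var_name, cube.standard_name)  # tas air_temperature
```

Here step 1 finds nothing, step 2 finds two cubes, and the coordinate check in step 3 leaves only the `t2m` cube, which step 4 renames. A file with two cubes on identical coordinates would still end in the step-3 exception.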

Is anyone in favour or against? Note that if we apply this, there is some risk of loading something completely different (e.g. a different variable) without noticing.

standards

All 13 comments

Sounds like a good approach. :+1:

I think that the risk of loading something wrong is pretty low, since the variable name is also given in the dataset filename.

> 3) If we find more than one cube in step 2: it is not clear what we have, so we could continue with some checks on the coordinates to find the cube with the data in it, e.g. if the cube has time, latitude, longitude coordinates. If after this check we still have more than one cube, raise an exception.

Careful with this one - I went through the hoops with a totally busted OBS file last week. The damn thing had three cubes inside: one time axis with no DimCoords, and two with identical DimCoords, of which only one was the actual data; the other held the errors for each data point.

That means that the OBS file was not generated properly.
But this is an issue of the reformat scripts, not of the preprocessor.

In this case, please report problems here.

@mattiarighi hell no, that thing was probably generated on a NOAA research plane while flying through a hurricane - cheers for the link, I was not aware of it. I forwarded it to Ranjini, who is currently working on formatting this sort of crazy OBS file (that one was hers).

The framework for the reformat scripts is not yet ported to v2.
But if Ranjini can write a script for cmorizing the dataset you mentioned, we can plug it in later.

Yes, that's what I suggested she should do.

> 3) If we find more than one cube in step 2: it is not clear what we have, so we could continue with some checks on the coordinates to find the cube with the data in it, e.g. if the cube has time, latitude, longitude coordinates. If after this check we still have more than one cube, raise an exception.

When I have encountered this scenario with iris, it was usually because the coordinates attribute was not set on the data variable and the variable had 2D coordinates. Something like this is loaded correctly (a variable whose name matches its dimension, like time here, does not need to be listed in the coordinates attribute):

    var(i, j, time)
        coordinates = "latitude longitude"
    latitude(i, j)
    longitude(i, j)
    time(time)

This, on the other hand, leads to loading three cubes (var, latitude and longitude), because the coordinates attribute is missing:

    var(i, j, time)
    latitude(i, j)
    longitude(i, j)
    time(time)

I think we should be able to fix at least those cases.
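
Why the second layout yields three cubes can be sketched with a simplified version of the CF rules: a 1-D variable named after its own dimension is a coordinate variable, a variable listed in some data variable's coordinates attribute is an auxiliary coordinate, and everything else becomes a separate cube. This is a sketch under assumptions, not how iris is implemented: the dict-based file description and the `add_missing_coordinates` helper are hypothetical, and the rules ignore bounds, cell measures, grid mappings and other CF references.

```python
def loaded_cubes(variables):
    """Names of variables that a CF-aware loader would turn into separate cubes.

    `variables` maps a variable name to {"dims": (...), "attrs": {...}}.
    """
    referenced = set()
    for info in variables.values():
        referenced.update(info["attrs"].get("coordinates", "").split())
    cubes = []
    for name, info in variables.items():
        is_coord_var = info["dims"] == (name,)  # e.g. time(time)
        if not is_coord_var and name not in referenced:
            cubes.append(name)
    return cubes


def add_missing_coordinates(variables, candidates=("latitude", "longitude")):
    """Hypothetical fix: list unreferenced 2D lat/lon in the data variables."""
    for name, info in variables.items():
        if name in candidates or info["dims"] == (name,):
            continue  # skip the coordinates themselves
        listed = info["attrs"].get("coordinates", "").split()
        for coord in candidates:
            coord_info = variables.get(coord)
            if coord_info is None or coord in listed:
                continue
            # Only attach coordinates whose dimensions the data variable shares.
            if set(coord_info["dims"]) <= set(info["dims"]):
                listed.append(coord)
        if listed:
            info["attrs"]["coordinates"] = " ".join(listed)
    return variables


broken = {
    "var": {"dims": ("i", "j", "time"), "attrs": {}},
    "latitude": {"dims": ("i", "j"), "attrs": {}},
    "longitude": {"dims": ("i", "j"), "attrs": {}},
    "time": {"dims": ("time",), "attrs": {}},
}
print(loaded_cubes(broken))  # ['var', 'latitude', 'longitude']
add_missing_coordinates(broken)
print(loaded_cubes(broken))  # ['var']
```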

I am strongly in favor of something like the changes proposed by @bouweandela. As an intermediate step before loading the cube with no constraints (step 2), we could try to load the cube with only one of the three constraints long_name, standard_name and short_name, trying them one after the other.
This will help a lot with some of the obs data that we have at the moment. Can we have this change soon please?

In that case I think the priority order should be:
1a. try all
1b. try short_name and standard_name only
1c. try short_name only
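
The 1a/1b/1c order can be sketched as a list of attribute groups tried in turn, stopping at the first group that matches anything. As a sketch under assumptions: `Cube` is a hypothetical stand-in holding only the three names, the CMOR short_name is assumed to be compared with the netCDF variable name, and plain string equality is assumed for matching. An empty result means falling back to loading without constraints (step 2).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cube:
    """Hypothetical stand-in for an iris cube (names only)."""
    var_name: Optional[str]
    standard_name: Optional[str]
    long_name: Optional[str]


# Attribute groups tried in order.
PRIORITY = (
    ("var_name", "standard_name", "long_name"),  # 1a: all three names
    ("var_name", "standard_name"),               # 1b: short_name and standard_name
    ("var_name",),                               # 1c: short_name only
)


def load_by_priority(cubes, short_name, standard_name, long_name):
    """Return the cubes matched by the first attribute group that matches any."""
    wanted = {
        "var_name": short_name,  # assumed: short_name is compared with var_name
        "standard_name": standard_name,
        "long_name": long_name,
    }
    for attrs in PRIORITY:
        matches = [c for c in cubes
                   if all(getattr(c, a) == wanted[a] for a in attrs)]
        if matches:
            return matches
    return []  # nothing matched: fall back to loading without constraints


cubes = [
    Cube("tas", None, "temperature at 2m"),
    Cube("pr", "precipitation_flux", "Precipitation"),
]
# Only the short_name matches, so the tas cube is found at step 1c.
print(load_by_priority(cubes, "tas", "air_temperature",
                       "Near-Surface Air Temperature"))
```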

Did #755 address this as well?

Not really. This was addressed in #638. Since that was merged, these problems can (and should) be solved by using model fixes at the fix_metadata step.
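
A file like the OBS file described earlier in the thread (a time axis, the data, and per-point errors on identical coordinates) can then be handled by a dataset-specific fix instead of guessing in the loader, because the fix knows which variable holds the data. The sketch below is hypothetical throughout: the `Fix` base class, the `ExampleObsTas` fix, the `t_err` variable name and the stand-in `Cube` are illustrations, and the only assumption about the real framework is that a fix's fix_metadata step receives all cubes loaded from a file and returns the ones to keep.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cube:
    """Hypothetical stand-in for an iris cube (names only)."""
    var_name: Optional[str]
    standard_name: Optional[str] = None
    long_name: Optional[str] = None


class Fix:
    """Hypothetical base class: by default a fix leaves the cubes untouched."""

    def fix_metadata(self, cubes: List[Cube]) -> List[Cube]:
        return cubes


class ExampleObsTas(Fix):
    """Hypothetical fix for a file storing 'tas' next to its errors 't_err'."""

    def fix_metadata(self, cubes):
        # The fix knows which variable holds the data, so no guessing is needed.
        data = [c for c in cubes if c.var_name == "tas"]
        if len(data) != 1:
            raise ValueError("Expected exactly one 'tas' variable in the file")
        cube = data[0]
        # Set the names prescribed by the CMOR table for tas.
        cube.standard_name = "air_temperature"
        cube.long_name = "Near-Surface Air Temperature"
        return [cube]


file_cubes = [
    Cube("time"),                           # time axis without DimCoords
    Cube("tas", long_name="temp"),          # the actual data
    Cube("t_err", long_name="temp error"),  # per-point errors, same grid
]
print([c.var_name for c in ExampleObsTas().fix_metadata(file_cubes)])  # ['tas']
```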

Wrong question.
New question: can we close this?

Yes!
