ESMValTool: CMORize Duveiller2018

Created on 30 Apr 2019 · 33 Comments · Source: ESMValGroup/ESMValTool

I am building on cmorize_obs_Landschuetzer2016.py to cmorize another obs dataset. This is my yml file:

Content of Duveiller2018.yml

---
# Common global attributes for Cmorizer output
attributes:
  dataset_id: Duveiller2018 #TODO where should I document the full reference to this dataset?
#  version: 'v2016' # There is no version.
  tier: 2
  modeling_realm: clim
  project_id: CMIP5 #TODO What to put here?
  source: 'https://www.nature.com/articles/sdata201814'
  reference: 'Duveiller2018'
  comment: ''

# Variables to cmorize
variables:
  alb:
    mip: Amon
    # Match CMOR variables with input file one
    raw: Delta_albedo
    # input file name
    file: albedo_IGBPgen.nc

I follow the script by @tomaslovato very closely, and the variable 'alb' is defined in the custom tables. However, I run into the following error:

 2019-04-30 15:08:21,986 INFO     esmvaltool.utils.cmorizers.obs.cmorize_obs_Duveiller2018,89    CMORizing var alb from file /net/exo/landclim/PROJECTS/C3S/datadir/rawobsdir/Tier2/Duveiller2018/albedo_IGBPgen.nc
Traceback (most recent call last):
  File "/net/exo/landclim/crezees/conda/envs/esmvaltool-public/bin/cmorize_obs", line 11, in <module>
    load_entry_point('ESMValTool', 'console_scripts', 'cmorize_obs')()
  File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs.py", line 201, in execute_cmorize
    _cmor_reformat(config_user, obs_list)
  File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs.py", line 260, in _cmor_reformat
    module_root + dataset)
  File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs.py", line 122, in _run_pyt_script
    py_cmor.cmorization(in_dir, out_dir)
  File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs_Duveiller2018.py", line 103, in cmorization
    extract_variable(var_info, raw_info, out_dir, glob_attrs)
  File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs_Duveiller2018.py", line 50, in extract_variable
    var = var_info.short_name
AttributeError: 'NoneType' object has no attribute 'short_name'

I tried to trace the problem further, but at some stage I got lost. See my own attempt at following the call chain below, in case it helps.

# PROBLEM: get_variable() returns None, so var_info is None. Where is this function defined?
var_info = cmor_table.get_variable(vals['mip'], var)

# cmor_table is taken from the cmorizer configuration
cmor_table = CFG['cmor_table']

# CFG is read from the yml file
CFG = _read_cmor_config('Duveiller2018.yml')

# _read_cmor_config selects the table based on project_id (excerpt)
def _read_cmor_config(cmor_config):
    cfg['cmor_table'] = \
        CMOR_TABLES[cfg['attributes']['project_id']]

# So the CMOR table is defined in the YML. Makes sense. I provide it as CMIP5. So the above line reads:
CMOR_TABLES['CMIP5']

# But what exactly is the object CMOR_TABLES? It is imported at the top as:
from esmvaltool.cmor.table import CMOR_TABLES

# It starts as an empty dictionary in table.py
CMOR_TABLES = {}

# So where is this object initiated? At this stage I am lost.

Is it correct that I take CMIP5 as a project ID? Or should it indicate 'custom' since this is a custom variable? Any ideas on what goes wrong here?

Labels: help wanted, observations, standards

Since alb is a custom variable you need to read from the custom table.
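
For readers who hit the same AttributeError: the sketch below illustrates why the lookup yields None when the variable is not in the selected table. The class, the table contents and the `lookup` helper are hypothetical stand-ins written for this illustration; they are not the actual ESMValTool classes, and only the `get_variable(mip, short_name)` call shape is taken from the snippet in the question.

```python
# Hypothetical stand-ins for illustration; not the real ESMValTool code.
class TableStub:
    """Minimal table mapping (mip, short_name) to a variable description."""

    def __init__(self, variables):
        self._variables = variables

    def get_variable(self, mip, short_name):
        # dict.get returns None for unknown keys, which later surfaces as
        # "'NoneType' object has no attribute 'short_name'".
        return self._variables.get((mip, short_name))


# Assumed table contents: 'alb' only exists in the custom table.
TABLES = {
    'CMIP5': TableStub({('Amon', 'tas'): {'short_name': 'tas'}}),
    'custom': TableStub({('Amon', 'alb'): {'short_name': 'alb'}}),
}


def lookup(project_id, mip, short_name):
    """Look up a variable and fail early with a readable message."""
    var_info = TABLES[project_id].get_variable(mip, short_name)
    if var_info is None:
        raise ValueError(
            f"Variable '{short_name}' (mip '{mip}') not found in table "
            f"'{project_id}'; is it a custom variable?")
    return var_info
```

With these assumed tables, `lookup('custom', 'Amon', 'alb')` succeeds, while `lookup('CMIP5', 'Amon', 'alb')` raises a ValueError right at the lookup instead of an AttributeError several calls later.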

The above error has been solved, thanks.

I ran into another error. I am quite sure that the 'standard_name' in CMOR_alb.dat is supposed to be left empty, but it raises an error. However, changing it to some random other valid standard name does not remove the error, so it seems not fully related.

2019-05-06 15:24:49,085 INFO     esmvaltool.utils.cmorizers.obs.cmorize_obs_Duveiller2018,89    CMORizing var alb from file /net/exo/landclim/PROJECTS/C3S/datadir/rawobsdir/Tier2/Duveiller2018/albedo_IGBPgen.nc
Traceback (most recent call last):
  File "/net/exo/landclim/crezees/conda/envs/esmvaltool-public/bin/cmorize_obs", line 11, in <module>
    load_entry_point('ESMValTool', 'console_scripts', 'cmorize_obs')()
  File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs.py", line 201, in execute_cmorize
    _cmor_reformat(config_user, obs_list)
  File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs.py", line 260, in _cmor_reformat
    module_root + dataset)
  File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs.py", line 122, in _run_pyt_script
    py_cmor.cmorization(in_dir, out_dir)
  File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs_Duveiller2018.py", line 103, in cmorization
    extract_variable(var_info, raw_info, out_dir, glob_attrs)
  File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs_Duveiller2018.py", line 63, in extract_variable
    _fix_var_metadata(cube, var_info)
  File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/utilities.py", line 43, in _fix_var_metadata
    cube.standard_name = var_info.standard_name
  File "/net/exo/landclim/crezees/conda/envs/esmvaltool-public/lib/python3.6/site-packages/iris/_cube_coord_common.py", line 128, in standard_name
    raise ValueError('%r is not a valid standard_name' % name)
ValueError: '' is not a valid standard_name

@tomaslovato or @valeriupredoi can you help?

@bascrezee At first I would say that it is a problem with the standard_name definition...

I saw that the branch version2_development_cmorize_duveiller2018 exists.
If it is yours or connected to this issue, could you please upload both Duveiller2018.yml and cmorize_obs_Duveiller2018.py there, so that it will be much easier to reproduce the error?

I just staged the files and pushed them. Thanks for looking into this!

The error is a standard Iris error for standard names that are not valid under the CF conventions :grin:

Here is an example of a custom CMOR table for a variable that has no standard name; giving it one would break the CF conventions and trigger the Iris error above:

!----------------------------------
! Variable attributes:
!----------------------------------
standard_name:
units:             1
cell_methods:      time: mean

The problem here is that the custom CMOR table does not contain any entry for standard_name, since it is a derived variable, so the cmorizer will always fail at the line cube.standard_name = var_info.standard_name. We need to add a special case to the cmorizer utilities that accounts for derived variables. That is not going to be easy, because the purpose of the cmorizer is to produce CMOR-compliant data that also adheres to the CF standards. Is there any way you can get the rsds and rsus datasets, so that alb can be derived internally in ESMValTool?

Thanks Valeriu, I think I kind of get what you mean.

What you suggest is not a solution here, since this observational dataset has no rsds or rsus. There are only values of (the difference in) albedo.

In that case, put a check in utilities.py, e.g.:

if var_info.standard_name == '':
    cube.standard_name = None

That will save the cube without errors, and it will also work when running the data through ESMValTool, since standard_name is None anyway when read from the custom table.
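
The same check can be written as a small helper, sketched here under the assumption that a blank or whitespace-only entry in the table should be treated as "no standard name". The function name is made up for this example and is not part of utilities.py.

```python
def normalise_standard_name(name):
    """Return None for a blank standard_name, otherwise the stripped name.

    Hypothetical helper: Iris rejects '' as a standard_name but accepts None.
    """
    if name is None or not name.strip():
        return None
    return name.strip()


# Intended use inside the cmorizer (assumption, shown as a comment only):
# cube.standard_name = normalise_standard_name(var_info.standard_name)
```

Compared with the two-line check, this version also covers a standard_name that consists only of spaces, which can happen if the table entry has trailing whitespace.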

"What you suggest is not a solution here, since this observational dataset has no rsds or rsus. There are only values of (the difference in) albedo."

That's correct.
Derived variables are designed for models only, in order to compare with a variable which is only available in the OBS.

The solution works :) I'll keep the issue open until I have finished the CMORization :)

I have now arrived at handling the 'time' axis. This is a somewhat special case, since it is a climatological dataset. How should I deal with this within ESMValTool? (See the ncdump below.) There are CF conventions describing what NetCDF files with climatological statistics should look like; however, since the original dataset does not adhere to these conventions, it would take considerable effort to get there. Any guidance?

Here is the ncdump:

netcdf albedo_IGBPgen {
dimensions:
    lon = 360 ;
    lat = 180 ;
    mon = 12 ;
    iTr = 6 ;
variables:
    double lon(lon) ;
        lon:units = "degreesE" ;
        lon:long_name = "Longitude" ;
    double lat(lat) ;
        lat:units = "degreesN" ;
        lat:long_name = "Latitude" ;
    int mon(mon) ;
        mon:units = "months" ;
        mon:long_name = "Month" ;
    double iTr(iTr) ;
        iTr:long_name = "Vegetation transition code" ;
    float Delta_albedo(iTr, mon, lat, lon) ;
        Delta_albedo:_FillValue = NaNf ;
        Delta_albedo:long_name = "Difference in surface albedo for a given vegetation cover transition" ;
    float SD_Delta_albedo(iTr, mon, lat, lon) ;
        SD_Delta_albedo:_FillValue = NaNf ;
        SD_Delta_albedo:long_name = "St.Dev. on the diff. in surface albedo for a given vegetation cover transition" ;
    float N_Delta_albedo(iTr, mon, lat, lon) ;
        N_Delta_albedo:units = "samples" ;
        N_Delta_albedo:_FillValue = NaNf ;
        N_Delta_albedo:long_name = "Number of samples from which the aggregated estimate is made" ;
}

Climatological data are not officially supported yet by Iris (https://github.com/SciTools/iris/issues/2904). Soon it will be possible to vote for this functionality in Iris (https://github.com/SciTools/iris/issues/3307). I now wonder if it makes sense to CMORize this dataset at this moment. Is it possible to simply read and plot this dataset in a custom diagnostic without running the CMORizing script? @mattiarighi Thanks for your help :)

@ledm has cmorized some climatological data from the WOA dataset, you can try to use his script as an example.

@bascrezee Actually, you can define the time axis of the dataset using time instead of mon, by setting the correct reference year for the climatology, as done for the WOA data. This makes even more sense since the climatology is representative of a certain period, and it is better to have that period explicitly associated with the data.

You can add a custom variable for the reference year similarly to WOA
https://github.com/ESMValGroup/ESMValTool/blob/0b4ef0e7b1f124897a75981b0c82e47153742068/esmvaltool/utils/cmorizers/obs/cmor_config/WOA.yml#L40-L43

and then read it within the cmorization function of your cmorizer script using CFG['custom']['years'], and finally set the time values on the cube, e.g., within extract_variable.

Sounds like a good approach. The original data contains a monthly climatology over 4 years (2008-2012). Is my understanding correct, that with the approach you suggest, 4 files will be written, one for each year? Each file will hold exactly the same data values. Since the data is not too big, this is a fine workaround.

I now run into another error:

  File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/utilities.py", line 131, in save_variable
    dates = reftime.num2date(cube_time.points[[0, -1]])
  File "/net/exo/landclim/crezees/conda/envs/esmvaltool-public/lib/python3.6/site-packages/cf_units/__init__.py", line 1988, in num2date
    cdf_utime = self.utime()
  File "/net/exo/landclim/crezees/conda/envs/esmvaltool-public/lib/python3.6/site-packages/cf_units/__init__.py", line 1902, in utime
    raise ValueError(emsg.format(interval))
ValueError: Time units with interval of "months", "years" (or singular of these) cannot be processed, got 'months'.

It has been reported before (https://github.com/ESMValGroup/ESMValTool/issues/516). For @schlunma it did work when using Iris v2.2.0, but not for me. I use cf_units v2.0.2. Any ideas what might go wrong here? (I will run the ESMValTool tests just to be sure that my installation is completely fine, and keep you updated.)
Update: The tests are running fine.

@bascrezee Since you have a monthly climatology, you need to set only one reference year; in this case I would suggest 2010 (the middle of the climatological period). Only one file has to be generated.
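
As a sketch of what "one reference year" means for the time axis: the function below computes twelve mid-month time points and the month bounds for 2010, expressed in days since a reference date. The reference date 1950-01-01 and all names are assumptions made for this example; the thread does not state which values the cmorizer ended up using. Only the standard library is used, so the resulting numbers would still have to be attached to the cube.

```python
from datetime import datetime

# Assumed reference date; the thread does not say which one was used.
REFERENCE = datetime(1950, 1, 1)
TIME_UNITS = 'days since 1950-01-01 00:00:00'


def _days_since(date, reference):
    """Convert a datetime to fractional days since the reference date."""
    return (date - reference).total_seconds() / 86400.0


def monthly_climatology_time(year, reference=REFERENCE):
    """Return mid-month time points and month bounds for one year."""
    points, bounds = [], []
    for month in range(1, 13):
        start = datetime(year, month, 1)
        # First day of the following month (rolls over to January).
        end = datetime(year + month // 12, month % 12 + 1, 1)
        middle = start + (end - start) / 2
        points.append(_days_since(middle, reference))
        bounds.append((_days_since(start, reference),
                       _days_since(end, reference)))
    return points, bounds


points, bounds = monthly_climatology_time(2010)
```

Expressing the axis in days rather than months also sidesteps the cf_units restriction on "months" as a time interval that is reported in the traceback above.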

Note that source should point to the exact download path of the data, so
https://github.com/ESMValGroup/ESMValTool/blob/0b4ef0e7b1f124897a75981b0c82e47153742068/esmvaltool/utils/cmorizers/obs/cmor_config/Duveiller2018.yml#L9
should instead give the Nature download link https://ndownloader.figshare.com/files/9969496 or the full path in the Amazon S3 archive https://s3-eu-west-1.amazonaws.com/pstorage-npg-968563215/9969496/albedo_IGBPgen.nc

Ok, thanks. But the start and end of the period should be included somehow as well, to describe the data correctly. I guess adding them to the global attributes makes sense?

Or in the filename?

So far, the time range in the file name matches the data content, so in this case the final CMORized file name should contain 201001-201012. It may be a good idea to add the climatological period to the global attributes.
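
A minimal sketch of the two pieces of metadata discussed here: the time range used in the file name (the reference year) and the global attributes describing the actual climatological period. The attribute names climatology_start and climatology_end match the cube printout further down in this thread; the function name and dictionary layout are assumptions for illustration.

```python
def climatology_metadata(reference_year, first_year, last_year):
    """Build the file-name time range and climatology attributes.

    Hypothetical helper; only the attribute names are taken from the thread.
    """
    return {
        # Time range for the file name: the single reference year.
        'time_range': f'{reference_year}01-{reference_year}12',
        # Global attributes recording the real averaging period.
        'climatology_start': f'{first_year}-01-01T00:00:00Z',
        'climatology_end': f'{last_year}-12-31T23:59:59Z',
    }


metadata = climatology_metadata(2010, 2008, 2012)
```

Keeping the file name tied to the reference year and the true period in the attributes means the file still matches its time coordinate, while the information about the averaging period is not lost.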

@bascrezee To solve the time issue that you reported, it would probably be better to use a callback function when Iris loads the data, to set the cube reference time and units.

Thanks. This callback works fine indeed. It now ran through :)

Now I am checking the file with recipe_check_obs.yml:

# ESMValTool
# recipe_check_obs.yml
---
documentation:
  description: |
    Test recipe for OBS, no preprocessor or diagnostics are applied,
    just to check correct reading of the CMORized data.

  authors:
    - righ_ma

preprocessors:
  nopp:
    extract_levels: false
    regrid: false
    mask_fillvalues: false
    multi_model_statistics: false

diagnostics:
  Duveiller2018:
    description: Duveiller2018
    variables:
      albDiff:
        preproc: nopp
        mip: Amon
    additional_datasets:
      - {dataset: Duveiller2018, project: OBS, tier: 2, version: v2018, start_year: 2010, end_year: 2010, frequency: mon}
    scripts: null

But I run into the following error. I do not fully understand the error message. It does not find the dataset key, but it is specified in the recipe.

File "/home/crezees/ESMValTool/esmvaltool/_data_finder.py", line 117, in _replace_tags
    "your recipe entry".format(tag, variable))
KeyError: "Dataset key type must be specified for {'preproc': 'nopp', 'mip': 'Amon', 'variable_group': 'albDiff', 'short_name': 'albDiff', 'diagnostic': 'Duveiller2018', 'preprocessor': 'default', 'dataset': 'Duveiller2018', 'project': 'OBS', 'tier': 2, 'version': 'v2018', 'start_year': 2010, 'end_year': 2010, 'frequency': 'mon', 'recipe_dataset_index': 0, 'cmor_table': 'OBS', 'standard_name': '', 'long_name': 'Difference in surface albedo for a given vegetation cover transition', 'units': '1', 'modeling_realm': ['atmos']}, check your recipe entry"

Branch: https://github.com/ESMValGroup/ESMValTool/tree/version2_development_cmorize_duveiller2018

The missing key is not dataset but type. Look at the source code for the error:

            raise KeyError("Dataset key {} must be specified for {}, check "
                           "your recipe entry".format(tag, variable))

(look at it next time :grin: )
Type can be eg type: reanalysis but that depends on your data, dunno that :beer:
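
To make the error message easier to read, here is a simplified sketch of tag replacement that fails in the same way. It is a re-implementation for illustration only, not the code in _data_finder.py; the template string and the example value 'clim' for type are assumptions.

```python
import re

# Assumed file-name template with [tag] placeholders (illustrative only).
TEMPLATE = '[project]_[dataset]_[type]_[version]_[mip]_[short_name]'


def replace_tags(template, variable):
    """Replace every [tag] in the template with the value from variable."""

    def substitute(match):
        tag = match.group(1)
        if tag not in variable:
            # The tag name is the word right after "Dataset key".
            raise KeyError(
                f"Dataset key {tag} must be specified for {variable}, "
                "check your recipe entry")
        return str(variable[tag])

    return re.sub(r'\[(\w+)\]', substitute, template)


variable = {
    'dataset': 'Duveiller2018', 'project': 'OBS', 'version': 'v2018',
    'mip': 'Amon', 'short_name': 'albDiff',
}
```

Calling `replace_tags(TEMPLATE, variable)` with this dictionary raises a KeyError that names type, even though dataset is present. Adding a type entry to the dictionary (in the recipe: to the dataset line) lets the replacement go through.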

Oops.. :stuck_out_tongue_closed_eyes: Yes, I will look at the source code next time.

My dataset has a non-standard dimension called vegetation_transition_code. So I added this to the file CMOR_albDiff.dat:

dimensions: longitude latitude time vegetation_transition_code

But I run into the following error:

  File "/home/crezees/ESMValTool/esmvaltool/cmor/table.py", line 648, in _read_table_file
    table[value] = self._read_variable(value, None)
  File "/home/crezees/ESMValTool/esmvaltool/cmor/table.py", line 520, in _read_variable
    var.coordinates[dim] = self.coords[dim]
KeyError: 'vegetation_transition_code'

It seems as if I still need to define this dimension somewhere. Maybe @jvegasbsc can help me, since I noted that CMOR_clisccp.dat includes a non-standard dimension named tau.

@bascrezee You need to add the information about your new axis vegetation_transition_code in CMOR_coordinates.dat, following the structure of the already available non-standard dimension.

Interestingly, we leave the standard_name blank for all custom variable definitions, but not in the CMOR_coordinates.dat file. Do you have any idea why? @jvegasbsc

At least in the case of the derived variables I created, the reason was that the standard name in the variable definition had to be in the list in the Iris std_names.py; otherwise you get an error. The easiest way to avoid that is to leave the standard name blank in the derived variable file.

I picked up this work again today. After moving some files around because of the split into tool/core, I got back to the stage where I was. The script runs through, but the CMOR checker is not happy yet.

esmvalcore.cmor.check.CMORCheckError: There were errors in variable albDiff:
iTr: standard_name should be , not None
 time: Frequency mon does not match input data
 albDiff: does not match coordinate rank
in cube:
Difference in surface albedo for a given vegetation cover transition / (1) (Vegetation transition code: 6; time: 12; latitude: 180; longitude: 360)
     Dimension coordinates:
          Vegetation transition code                                                                  x        -             -               -
          time                                                                                        -        x             -               -
          latitude                                                                                    -        -             x               -
          longitude                                                                                   -        -             -               x
     Attributes:
          Conventions: CF-1.5
          climatology_end: 2012-12-31T23:59:59Z
          climatology_start: 2008-01-01T00:00:00Z
          comment: 
          host: exo
          mip: Amon
          modeling_realm: clim
          project_id: custom
          reference: Duveiller, G., J. Hooker, A. Cescatti, Scientific Data 5, 180014 (2018...
          source: https://ndownloader.figshare.com/files/9969496
          source_file: /net/exo/landclim/PROJECTS/C3S/datadir/obsdir/Tier2/Duveiller2018/OBS_...
          tier: 2
          title: Duveiller2018 data reformatted for ESMValTool v2.0a2
          user: crezees
          version: v2018

I hope to tackle them one-by-one.

iTr: standard_name should be , not None
In the CMOR_coordinates.dat I left standard name blank, as usual for custom variables.

time: Frequency mon does not match input data
Is it possible that the CMOR checker does not know how to handle climatological data? See also the discussion above?

albDiff: does not match coordinate rank
Hope this goes away as soon as iTr has been fixed, maybe it is related to that one.

Branches:
landvariables [core repository ; for custom CMIP definitions]
version2_development_cmorize_duveiller2018 [public repository ; cmorize scripts ]

Any help is appreciated !

Please ask @jvegasbsc for CMOR related issues

@jvegasbsc any thoughts on this?

I tried two options for fixing the custom coordinate, but both fail; see the comments in the code below. Is it possible that the CMOR checker fails to correctly parse a custom-defined CMOR coordinate? I think I am the first one adding a coordinate that does not have a valid standard name.

    for cube in cubes:
        if cube.var_name == rawvar:
            for cubecoord in cube.coords():
                if cubecoord.var_name == 'iTr':
                    # Option 1: the CMOR checker raises
                    # "iTr: standard_name should be , not None"
                    # cubecoord.standard_name = None
                    # Option 2: this script raises
                    # "ValueError: '' is not a valid standard_name"
                    cubecoord.standard_name = ''

I can trace the error back to line 104 (the permalink does not embed, perhaps because it is in a different repository), so it must be one of the checks before that line that fails.

https://github.com/ESMValGroup/ESMValCore/blob/dbcfb85715ea1ee130db7351980f719108ffabde/esmvalcore/cmor/check.py#L96-L104

Update: I decided to extract a certain vegetation cover transition code, after which this is not a dimension coordinate any more. This is a fine workaround for my case. But it might still be good to check if the CMOR checker allows for custom 'non-valid standard name' coordinate names.

Update: Time has been solved as well. CMORization done. Thanks for the support, especially to @tomaslovato. I will submit a PR early next week.
