Esmvaltool: Missing values and _FillValue attribute not saved with Iris 2

Created on 11 Apr 2018 · 18 comments · Source: ESMValGroup/ESMValTool

As discussed here, the new version of Iris (v2.0.0) includes some changes in the way missing values are handled (see here).

In particular:

When saving a cube or list of cubes in NetCDF format, a fill value or list of fill values can be specified via a new fill_value argument. If a list is supplied, each fill value will be applied to each cube in turn. If a fill_value argument is not specified, the default fill value for the file format and the cube's data type will be used.

In the current version of the preprocessor, this results in the missing values being set to 9.96921e36 instead of 1.e20 and, most importantly, in _FillValue disappearing from the variable's attributes. This leads to completely different results, because the diagnostic script cannot recognize these values as missing and interprets them as actual data.
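To illustrate why this matters, here is a minimal NumPy-only sketch. It assumes a diagnostic that falls back to masking on 1.e20 when no _FillValue attribute is present; that fallback is a simplified stand-in, not the actual behaviour of any particular diagnostic script, and the variable names are made up for the example.

```python
import numpy as np

# Fill value reported in this issue for files saved by Iris 2 without fill_value.
NETCDF_DEFAULT_FILL = 9.969209968386869e36
# Fill value expected by CMIP5/CMIP6 conventions.
CMIP_FILL = 1.0e20

# Raw values as they would sit on disk: the middle point is "missing".
raw = np.array([280.0, NETCDF_DEFAULT_FILL, 285.0])

# Assumed diagnostic behaviour: mask everything equal to 1.e20.
# Nothing matches, so the huge fill value is treated as real data.
as_seen_by_diagnostic = np.ma.masked_values(raw, CMIP_FILL)

# Masking on the value that was actually written recovers the missing point.
with_correct_fill = np.ma.masked_values(raw, NETCDF_DEFAULT_FILL)

print("valid points (wrong fill):", as_seen_by_diagnostic.count())
print("mean (wrong fill):", as_seen_by_diagnostic.mean())
print("valid points (right fill):", with_correct_fill.count())
print("mean (right fill):", with_correct_fill.mean())
```

With the wrong fill value the mean is dominated by the unmasked placeholder, which is the kind of "completely different result" described above.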

As pointed out by @valeriupredoi, the solution could be to force the missing values at save level, but we should also make sure that the missing values are correctly propagated in the various cube operations, regardless of whether they are saved to NetCDF or not.

@bjlittle what is the cleanest way to tackle this problem, also in view of future Iris releases?

help wanted

All 18 comments

This problem has a workaround: once a masked cube is loaded in memory, the fill_value of its mask can be reset (see the example below). However, the reassigned fill_value is not preserved at the save stage unless one explicitly passes fill_value=1e+20 to the save call. So it is only a workaround, and it prompts us to be careful at two points: load and save.

Example: loading and manipulating a cube that has been previously saved by iris 2.0 without specifying the desired fill_value:

import iris
import numpy as np

c = iris.load_cube('Python3/namelist_quickrun_20180411_100712/preproc/pp850_ta850/MultiModelMean_T3M_ta_2000-2002.nc')

c.data.fill_value
9.969209968386869e+36

np.ma.set_fill_value(c.data, 1e+20)

c.data.fill_value
1e+20

Note that saving the cube after redefining the fill_value of its mask will not preserve that fill_value:

iris.save(c, 'test.nc')

d = iris.load_cube('test.nc')

d.data.fill_value
9.969209968386869e+36

but saving it with the specific fill_value kwarg will:

iris.save(c, 'test.nc', fill_value=1e20)

d = iris.load_cube('test.nc')

d.data.fill_value
1e+20

Can we make sure that 1.e20 is defined somewhere as a global constant instead of hard-coding it at every occurrence?

This is the standard _FillValue used in CMIP5 and CMIP6, so it makes sense to define it centrally.

The cmor module is also checking in advance that missing values are correctly set to this value, right @jvegasbsc ?

@valeriupredoi would you be willing to implement the above workaround in a dedicated branch that we can test?

@mattiarighi absolutely! But I would hang on for a bit to see what @bjlittle can tell us about this, maybe the old folk at Iris already have a fix or kludge for this in their upcoming version?

@mattiarighi and @valeriupredoi I'd recommend that you specifically nail down the fill-value on saving to preserve it. I'll dig a little further, as fill-value was particularly tricky to handle when it came to lazy masked dask arrays, but if you can nominate a standard fill-value that is representative of what you want in the actual data - then force the fill-value preservation to disk through the kwarg on saving.
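A minimal sketch of what forcing the fill value through the save kwarg could look like. The names MISSING_VALUE, save_cubes and save_func are hypothetical and not taken from the ESMValTool code base; the only Iris call assumed is iris.save(..., fill_value=...), as used in the example earlier in this thread. Iris is imported lazily so the wrapper can also be exercised with a stand-in save function.

```python
# Hypothetical central constant: the CMIP5/CMIP6 missing value.
MISSING_VALUE = 1.0e20


def save_cubes(cubes, target, fill_value=MISSING_VALUE, save_func=None):
    """Save cubes to target, always passing an explicit fill value.

    save_func is a hypothetical hook for testing; by default iris.save is used.
    """
    if save_func is None:
        import iris  # imported lazily, only needed for real saves
        save_func = iris.save
    # Passing fill_value explicitly stops Iris 2 from falling back to the
    # default fill value for the file format and data type.
    save_func(cubes, target, fill_value=fill_value)
    return target
```

Calling save_cubes(cube, 'test.nc') would then be equivalent to iris.save(cube, 'test.nc', fill_value=1e20), with the value defined in one place.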

Does this help?

hey @bjlittle -- so, as I understand it from your answer, you are saying that my workaround (see above post from 12 days ago) is actually the only solution to this problem? :)

Thanks for your suggestion!

A standard _FillValue (=1.e20) is indeed explicitly defined in the CMIP5 and CMIP6 cmor-tables (see e.g., here).

We just need to read it from the tables and force it on saving, right?
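As a rough sketch of reading the value from a table: the snippet below assumes a JSON table with a "Header" section holding a "missing_value" string, modeled on the CMIP6 JSON tables. The inline EXAMPLE_TABLE and the function name read_missing_value are made up for illustration; the CMIP5 tables use a different plain-text format and would need their own parser.

```python
import json

# Hypothetical, heavily trimmed table content (assumed layout).
EXAMPLE_TABLE = """
{
    "Header": {
        "table_id": "Table Amon",
        "missing_value": "1e20"
    }
}
"""


def read_missing_value(table_text, default=1.0e20):
    """Return the missing value declared in a JSON table header.

    Falls back to default if the header or the entry is absent.
    """
    header = json.loads(table_text).get("Header", {})
    return float(header.get("missing_value", default))


fill_value = read_missing_value(EXAMPLE_TABLE)
print(fill_value)
```

The value returned here is what would be passed as fill_value when saving.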

@valeriupredoi, could you take care of this? I would suggest to implement the fix in the REFACTORING_issue283 branch.

@mattiarighi I'm sure the backend regridding used its own definition of fill-value... it might be worthwhile for ESMValTool to define global constants that are then used by all its packages, including regridding.

Good point, this was indeed the approach in v1.0 (see here).

@valeriupredoi, what do you think?

@bjlittle indeed the _MDI was set to 1e20 in regrid, but it was lost at saving :disappointed:
@mattiarighi I'll have a crack at implementing the workaround in the separate branch, I would stay away from global environment variables but rather define this as a fixed MDI per module, what say yous?

It would be good to define this value only once, if possible, to avoid inconsistencies. For example, in preprocessor/__init__.py. Would that make sense?

@mattiarighi @bjlittle this is fixed by this: https://github.com/ESMValGroup/ESMValTool/blob/REFACTORING_fillvalue_issue302/esmvaltool/preprocessor/_io.py note that I resorted to creating a new branch rooted in REFACTORING_backend (it may be overkill, given that the whole new branch differs by exactly 8 lines from REFACTORING_backend, but REFACTORING_issue283/.../_io.py differs a LOT from the _backend one, so I didn't want to mess Bouwe's stuff in there) -- tested and it works :)

That's perfectly OK and it makes PR testing and review much faster. :+1:

PR numero 315 :smile:

Or you could have a preprocessor/constants.py as a name space for all appropriate such constants...

This has been proposed at some point in the past, see #31, but we concluded that we can use scipy for physical constants and a separate file just for fillvalue would be a bit overkill.

could do, but for now it'd be populated with a grand total of one constant, maybe in the future when we get all sorts of constants for the variable derivations? I think it should be ok in io for now

Fixed by #315
