I am trying to load the land-sea mask in my diagnostic script. What I used to do before (in ESMValTool v. 1) was to use the attribute get_cf_lmaskfile from project_info.
From what I can see in the examples, land-sea masks are applied to input fields in the preprocessing phase. But this is not what I want to do. I want to use the land-sea mask as a 2D matrix in the computations of the diagnostic tool.
Is there anything equivalent to what I mentioned in ESMValTool v.1? I saw that there is a function called read_fx_data for NCL. Is there a Python equivalent I am not aware of?
Thanks for helping.
You can make the paths of the mask files available to the diagnostic by specifying the fx files you need in the recipe, see e.g. this recipe:
https://github.com/ESMValGroup/ESMValTool/blob/43598ee48a327fc3ef2a6e53768324d02ad61863/esmvaltool/recipes/recipe_autoassess_landsurface_permafrost.yml#L36
The paths to the files are then made available as part of the dictionary describing the variable/dataset.
Or are you asking about reading in the files?
I have just finished writing the documentation for land/sea/ice masking so here's the bit fresh out of the oven :grin:
In ESMValTool v2 land-sea-ice masking can be done in two places: in the preprocessor, to apply a mask on the data before any subsequent preprocessing step and before
running the diagnostic, or in the diagnostic phase. We present both these implementations below.
To mask out seas in the preprocessor, add mask_landsea: as a step to the preprocessor of your choice in the recipe, for example:
.. code-block:: yaml

    preprocessors:
      my_masking_preprocessor:
        mask_landsea:
          mask_out: sea

The tool will retrieve the corresponding fx: sftof type of mask for each of the used variables and apply it so that only the land points are
kept in the data; conversely, it will retrieve the fx: sftlf files when land needs to be masked out.
mask_out accepts land or sea as values. If the corresponding fx file is not found (some models are missing this
type of file; observational data is missing them altogether), the tool attempts to mask using Natural Earth mask files (which are vectorized rasters).
Note that the resolutions of the Natural Earth masks are much higher than that of any usual CMIP model: 10m for land and 50m for ocean masks.
Note that masking out ice uses a different preprocessor function, so that both land and sea or ice can be masked out without
losing generality. To mask ice out, add the preprocessing step much as above:
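To make the effect of ``mask_out: sea`` concrete, here is a minimal NumPy sketch of the idea. It is not the preprocessor's actual implementation. It assumes that ``sftlf`` holds land-area percentages (0 to 100) on the same grid as the data and treats cells with less than 50% land as sea; both the threshold and the helper name ``mask_out_sea`` are assumptions made for illustration.

```python
import numpy as np


def mask_out_sea(data, sftlf, threshold=50.0):
    """Return `data` as a masked array with sea points masked.

    Hypothetical helper: cells whose land fraction (in percent) is
    below `threshold` are treated as sea. The threshold is an assumption.
    """
    data = np.asarray(data, dtype=float)
    sftlf = np.asarray(sftlf, dtype=float)
    # Broadcast the 2D mask over any leading dimensions (e.g. time).
    sea = np.broadcast_to(sftlf < threshold, data.shape).copy()
    return np.ma.masked_array(data, mask=sea)


# Tiny 2x2 example: the left column is land, the right column is sea.
sftlf = np.array([[100.0, 0.0], [80.0, 10.0]])
data = np.array([[1.0, 2.0], [3.0, 4.0]])
masked = mask_out_sea(data, sftlf)
land_mean = masked.mean()  # averages only the unmasked (land) cells
```

The same masked array can be used directly as a 2D matrix in later computations, since NumPy reductions skip the masked cells.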
.. code-block:: yaml

    preprocessors:
      my_masking_preprocessor:
        mask_landseaice:
          mask_out: ice

To keep only the ice, mask out landsea, i.e. use landsea as the value for mask_out. As in the case of mask_landsea, the tool will automatically
retrieve the fx: sftgif file corresponding to the used variable and extract the ice mask from it.
At the core of the land/sea/ice masking in the preprocessor are the mask files (whether fx or Natural Earth files); these files (bar Natural Earth)
can also be retrieved and used in the diagnostic phase, either in addition to or instead of preprocessor masking. To do so, specify the fx_files: key for the variable in the diagnostic section of the recipe and populate it
with a list of the desired files, e.g.:
.. code-block:: yaml

    variables:
      ta:
        preprocessor: my_masking_preprocessor
        fx_files: [sftlf, sftof, sftgif, areacello, areacella]

Such a recipe will automatically retrieve all the [sftlf, sftof, sftgif, areacello, areacella]-type fx files for each of the variables they are needed for
and then, in the diagnostic phase, these mask files will be available for the developer to use as needed. The fx_files attribute of the big nested variable
dictionary that gets passed to the diagnostic is, in turn, a dictionary of its own, and its members can be accessed in the diagnostic through a simple loop over
the items of the 'config' diagnostic variable, e.g.:
.. code-block:: python

    for filename, attributes in config['input_data'].items():
        sftlf_file = attributes['fx_files']['sftlf']
        areacello_file = attributes['fx_files']['areacello']

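Note that the loop above reassigns the same variables on every iteration, so with several datasets only the last one's paths survive. A minimal sketch of collecting the paths per input file follows. The ``cfg`` dictionary here is a hand-made stand-in with made-up paths that mimics the structure described above (in a real diagnostic it is the dictionary passed in by the tool), and ``collect_fx_files`` is a hypothetical helper, not part of ESMValTool.

```python
def collect_fx_files(cfg, fx_names=('sftlf', 'areacello')):
    """Map each input file to the fx file paths attached to it."""
    fx_paths = {}
    for filename, attributes in cfg['input_data'].items():
        fx_files = attributes.get('fx_files', {})
        # Entries that were not found are recorded as None.
        fx_paths[filename] = {name: fx_files.get(name) for name in fx_names}
    return fx_paths


# Stand-in for the dictionary passed to the diagnostic; paths are made up.
cfg = {
    'input_data': {
        '/preproc/ta_model_a.nc': {
            'dataset': 'model_a',
            'fx_files': {'sftlf': '/fx/sftlf_model_a.nc',
                         'areacello': '/fx/areacello_model_a.nc'},
        },
        '/preproc/ta_obs.nc': {
            'dataset': 'obs',
            'fx_files': {},  # e.g. observational data without fx files
        },
    },
}
fx_paths = collect_fx_files(cfg)
```

Using ``.get`` instead of direct indexing avoids a ``KeyError`` for datasets that have no fx files, which the text above notes can happen for some models and for observational data.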
@ValerioLembo any more issues related to this, or have the bad things been masked out? :grin:
@valeriupredoi I am so sorry that I did not reply to your precious suggestions!
Anyway, I figured out how to include the land-sea mask, but I was actually wondering if there is a more elegant way than making the land-sea mask name a global attribute of one of the input files, then searching through the 'metadata' attribute until I find the filename I need. In the end it is just a few lines of code, I know. But I was not able to run the block you are suggesting at the end of your description, and the way I found looks unnecessarily complicated to me...
wait @ValerioLembo what do you mean you couldn't use the codeblock? config['input_data'].items() (where config is the metadata) contains all the file names including the file names for mask files initialized in the preprocessor with fx_files: [sftlf, sftof, sftgif, areacello, areacella] or whatever the fx files you need to use. How did you find the mask file - ie is it some custom mask file that only you have?
I got error messages telling me that it could not find the files I was looking for, but I do not have the log file at hand now to see exactly what they were. Admittedly, I did not dig too much into it because I was in a rush to get results out of a fully working version of the diagnostic tool, so I came up with a quick, crude workaround (as you can see at lines 381-385 of my code here).
I am sorry for not being more precise. I cannot get back to it now. Maybe someone can give it a try in the meantime...
@ValerioLembo no need to extract info from raw nested dict objects, have a look at this very basic Python diagnostic:
I can simplify it so that things are more straightforward (you can group metadata by standard_name, dataset, project etc):
import logging
import os

import iris

from esmvaltool.diag_scripts.shared import run_diagnostic

logger = logging.getLogger(os.path.basename(__file__))


def _get_fx_cubes(cfg):
    """Extract the fx cubes."""
    for filename, attributes in cfg['input_data'].items():
        sftlf_file = attributes['fx_files']['sftlf']
        areacello_file = attributes['fx_files']['areacello']
        sftlf_cube = iris.load_cube(sftlf_file)
        areacello_cube = iris.load_cube(areacello_file)
    return sftlf_cube, areacello_cube


def main(cfg):
    """Main function, dummy here."""
    sftlf_cube, areacello_cube = _get_fx_cubes(cfg)
    logger.info("My fx cubes are: %s and %s", sftlf_cube, areacello_cube)


if __name__ == '__main__':
    with run_diagnostic() as config:
        main(config)

run_diagnostic is your best friend here and anywhere else you need a lot of metadata from the configuration file, the diagnostic settings, the recipe and the preprocessor output. You can print cfg out to see its structure: it's a massive nested dictionary that is easy to walk through to extract the items you need. Run this little script as a diagnostic on its own and you will see how it works.
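If printing the whole of cfg is too noisy, a small recursive walk gives a compact overview of its keys. This is only a sketch in plain Python: ``describe`` is a hypothetical helper, and the example dictionary is a made-up stand-in for the real cfg, whose actual keys and paths will differ.

```python
def describe(obj, indent=0, lines=None):
    """Collect one line per key of a nested dictionary, indented by depth."""
    if lines is None:
        lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(' ' * indent + f'{key}:')
            describe(value, indent + 2, lines)
        else:
            # For leaves, show only the value type to keep the output short.
            lines.append(' ' * indent + f'{key}: {type(value).__name__}')
    return lines


# Made-up stand-in for cfg; in a diagnostic, pass the real dictionary instead.
example_cfg = {
    'input_data': {
        '/preproc/ta.nc': {
            'dataset': 'model_a',
            'fx_files': {'sftlf': '/fx/sftlf.nc'},
        },
    },
    'plot_dir': '/plots',
}
summary = describe(example_cfg)
print('\n'.join(summary))
```

Inside a diagnostic, the same call could be made as ``describe(cfg)`` within ``main`` and the resulting lines sent to the logger.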
That is a great hint. Thanks! I will test it as soon as possible...
We have similar needs for R recipes. Any suggestions on how to pass the sftlf_file to the diagnostic in R?
Just add it to the recipe as suggested in https://github.com/ESMValGroup/ESMValTool/issues/691#issuecomment-435304052. The paths to the fx files will then be made available in the metadata.yml files in the preproc directory, as extra attributes of the dataset(s).
@ValerioLembo is this sorted out? Can we close it?
Yes I think it is sorted now...