ESMValTool provides (limited) support for data in their native format. In this case, the steps needed to reformat the data are executed as dataset fixes during the execution of an ESMValTool recipe. This has the advantage that the user does not need to store a duplicate (cmorized) copy of the data. Instead, the cmorization is performed ‘on the fly’ when running a recipe. ERA5 is the first dataset for which this ‘cmorization on the fly’ is supported. (See "cmorization as a fix" for more information.)
Currently, there is a recipe_era5.yml under the folder cmorizer. This misleads users into running this recipe to cmorize ERA5 data (converting from native6 to OBS6) before running their own recipe.
This recipe includes three diagnostics sections: hourly, daily, monthly.
The hourly and monthly diagnostics are meant as examples to show users how to write a recipe that works with ERA5 data. Therefore, the example folder is a better place for them.
The daily diagnostic is very useful for converting hourly data to daily data, because the ERA5 archive does not include daily data.
A recipe including only this part can be created in the recipes folder with a name like recipe_daily_era5.yml.
In addition, now recipe_check_obs.yml includes some checks for hourly and monthly era5 under the project OBS6. Those checks should be removed.
Please see also the related issues #1884 and #1889.
@bouweandela, @Peter9192, @katjaweigel please let me know what you think.
Hi @SarahAlidoost,
Good point, maybe we could move the example use for hourly and monthly data to recipe_check_obs.yml, because that contains all examples of using this kind of data and move the daily data diagnostic to a recipe cmorizers/recipe_daily_era5.yml, because there we do want to pre-cmorize?
I'm not sure. Without a hint from Birgit @hb326 I didn't manage to find them there when I first wanted to use ERA5 data, so I agree that it is probably not the best place. But I'm not sure your suggestions make it easier. Also, based on what Birgit told me and my own workflow, I'd expect that a lot of users will actually use them as cmorizers to use OBS6 in the actual recipes to make them faster.
Also, based on what Birgit told me and my own workflow, I'd expect that a lot of users will actually use them as cmorizers to use OBS6 in the actual recipes to make them faster.
What frequency do you mean @katjaweigel? For hourly or monthly data, I would be surprised if using cmorized OBS6 data was any faster than just the native6 data.
It is also confusing that there is the diagnostic:
ESMValTool/esmvaltool/diag_scripts/cmorizers/era5.py
and the fix:
ESMValCore/esmvalcore/cmor/_fixes/native6/era5.py
with the same name. Maybe the diagnostic should be renamed?
Also, based on what Birgit told me and my own workflow, I'd expect that a lot of users will actually use them as cmorizers to use OBS6 in the actual recipes to make them faster.
What frequency do you mean @katjaweigel? For hourly or monthly data, I would be surprised if using cmorized data was any faster than just the native data.
Monthly data, I think. I have only tested short bits so far, but @hb326 mentioned that the on-the-fly cmorizer slows things down.
Hi @SarahAlidoost,
Good point, maybe we could move the example use for hourly and monthly data to
recipe_check_obs.yml, because that contains all examples of using this kind of data and
Thank you. I am still concerned about including monthly/hourly datasets with project native6 in recipe_check_obs.yml. This recipe is meant for checking cmorized data that are OBS/OBS6, am I right? Also, it would be difficult to find examples there. The monthly/hourly diagnostics sections are meant as an example, so it would be nice to have them in the example folder and notify users in the documentation, here.
move the daily data diagnostic to a recipe
cmorizers/recipe_daily_era5.yml, because there we do want to pre-cmorize?
This is fine.
maybe we could move the example use for hourly and monthly data to
recipe_check_obs.yml, because that contains all examples of using this kind of data and move the daily data diagnostic to a recipe cmorizers/recipe_daily_era5.yml, because there we do want to pre-cmorize?
I like this suggestion. Maybe we can do both? Do the check of all variables in recipe check obs, and make a separate example recipe to show in general how to use the native data?
maybe we could move the example use for hourly and monthly data to
recipe_check_obs.yml, because that contains all examples of using this kind of data and move the daily data diagnostic to a recipe cmorizers/recipe_daily_era5.yml, because there we do want to pre-cmorize?
I like this suggestion. Maybe we can do both? Do the check of all variables in recipe check obs, and make a separate example recipe to show in general how to use the native data?
I don't really like it. I think it makes it really hard to find and forces the user to check the code to find available variables (with fixes) and just try it for all others. I also like that the recipe currently provides the translation between ERA5 and CMOR variable names; this is often not easy to find (it could be moved to the description though, like we discussed for ERA5-Land).
also like that the recipe currently provides the translation between ERA5 and cmor variable name
Actually this has changed, and the era5_name and era5_frequency are no longer needed. The current DRS to find the right ERA5 data is:
input_dir:
  default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
input_file:
  default: '*.nc'
so it depends on the user putting the right variables inside the right folders. I agree that this is not pretty. Perhaps we should add a table with this mapping to the documentation instead?
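To make the folder layout implied by that DRS template concrete, here is a minimal sketch that expands it with plain string formatting. The root path, the tier, and the fixed version value are assumptions for illustration only (in practice `{latestversion}` is resolved by the tool from the directories on disk), and `expected_input_dir` is a hypothetical helper, not an ESMValCore function. The dataset, frequency, and short_name values are the ones that appear in the zg error message later in this thread.

```python
from pathlib import PurePosixPath

# DRS template for native6 data, as quoted in this thread.
INPUT_DIR = "Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}"


def expected_input_dir(rootpath, *, tier, dataset, version, frequency, short_name):
    """Return the directory in which files for one variable are expected.

    Hypothetical helper: `version` stands in for `{latestversion}`,
    which the real tool looks up on disk.
    """
    relative = INPUT_DIR.format(
        tier=tier,
        dataset=dataset,
        latestversion=version,
        frequency=frequency,
        short_name=short_name,
    )
    return PurePosixPath(rootpath) / relative


# Illustrative values; the root path and tier are assumptions.
path = expected_input_dir(
    "/data/native6",
    tier=3,
    dataset="ERA5",
    version="1",
    frequency="6hr",
    short_name="zg",
)
print(path)  # /data/native6/Tier3/ERA5/1/6hr/zg
```

Any `*.nc` file placed in that directory would match the `input_file` pattern, which is why the variable has to be sorted into the right folder by the user.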
Perhaps we should add a table with this mapping to the documentation instead?
Yes, I agree, and a check that they did it right in the era5 fix.
Monthly data, I think. I have only tested short bits so far, but @hb326 mentioned that the on-the-fly cmorizer slows things down.
@katjaweigel @hb326 It would be great if you could share a recipe and some run times, if you happen to have this.
Hello @bouweandela, I'd like to ask a question about the use of ERA5. I'd like to process ERA5 geopotential height (daily means).
I am not sure if the best procedure now is to cmorize the data (as I did with ERA-Interim), but from some issues I have the impression a fix on-the-fly is preferred.
For geopotential (height) a fix is available in https://github.com/ESMValGroup/ESMValCore/blob/master/esmvalcore/cmor/_fixes/native6/era5.py but this is not documented here https://docs.esmvaltool.org/projects/ESMValCore/en/latest/develop/fixing_data.html - not sure if it should...
Thanks in advance for the clarification!
A related issue is #1889
Monthly data, I think. I have only tested short bits so far, but @hb326 mentioned that the on-the-fly cmorizer slows things down.
@katjaweigel @hb326 It would be great if you could share a recipe and some run times, if you happen to have this.
I had cmorized the ERA5 data not on the fly, but before I used them in the recipe, since the diagnostics are still fickle and crash a lot. So I cmorized the data beforehand to not have to do this step every time I needed to run the diagnostics again after it crashed. I do not really know if, with a functioning diagnostic, the time to run the on-the-fly-cmorizer would slow things down. I have not timed it so far. Do you still want me to do it?
I used monthly data for the diagnostic.
I used both on-the-fly and pre-cmorized ERA5 monthly data, but never for exactly the same data set (I had that test on my todo list for a while but I somehow never found time, sorry!). I would assume that if there is a time difference for monthly data it is not large; at least the time for preprocessing ERA5 data with the on-the-fly cmorizer was not unusually long compared to CMIP data.
Hello @bouweandela, I'd like to ask a question about the use of ERA5. I'd like to process ERA5 geopotential height (daily means).
@fserva Hi, sorry for the late reply. Just to check: you would like to process ERA5 geopotential, which is an instantaneous pressure-level parameter (please see Table 9 here), and the processing is to convert hourly data to daily data. Is that right?
I am not sure if the best procedure now is to cmorize the data (as I did with ERA-Interim), but from some issues I have the impression a fix on-the-fly is preferred.
You can use the variable in your recipe together with a preprocessor that converts hourly to daily data (fix on-the-fly). Alternatively, you can first run recipe_era5.yml to convert hourly data (native6) to daily data (OBS6), and then use the daily data in your recipe.
For geopotential (height) a fix is available in https://github.com/ESMValGroup/ESMValCore/blob/master/esmvalcore/cmor/_fixes/native6/era5.py but this is not documented here https://docs.esmvaltool.org/projects/ESMValCore/en/latest/develop/fixing_data.html - not sure if it should...
A recipe that includes the variable zg (instantaneous) returns an error similar to the one for monthly zg described in issue #1889. (cc @remi-kazeroni)
Thanks in advance for the clarification!
I am not sure if the best procedure now is to cmorize the data (as I did with ERA-Interim), but from some issues I have the impression a fix on-the-fly is preferred.
@fserva On the fly is preferred for data that you use in the original time resolution. If you're after daily means computed from hourly values and plan to run the recipe more than once, you're probably better off computing the daily means once and storing them as OBS6 data.
For geopotential (height) a fix is available in https://github.com/ESMValGroup/ESMValCore/blob/master/esmvalcore/cmor/_fixes/native6/era5.py but this is not documented here https://docs.esmvaltool.org/projects/ESMValCore/en/latest/develop/fixing_data.html - not sure if it should...
That section of the documentation is a bit awkwardly written. It should have been just instructions on how to implement support for a dataset as a fix, not a list of supported datasets. The list of all supported datasets and variables is here: https://docs.esmvaltool.org/en/latest/input.html#supported-datasets
@fserva I have done something similar last year: computing daily means from hourly zg data (single pressure level). I ran into some issues, I can dig into that again and open a pull request to support the cmorization of zg.
@fserva On the fly is preferred for data that you use in the original time resolution. If you're after daily means computed from hourly values and plan to run the recipe more than once, you're probably better off computing the daily means once and storing them as OBS6 data.
I used the second option because I needed to run the recipe multiple times.
Thanks @SarahAlidoost @bouweandela @remi-kazeroni for your suggestions!
You are right @SarahAlidoost, I need zg in Table 9 line 4. The division by _g0_ is the only fix needed, apart from the daily average.
So in this specific case (since I am averaging the data temporally) I will try the recipe_era5.yml route. I think I've tried with no success before, but now the procedure is clearer. However I don't see zg in the list of supported ERA5 variables :confused:
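As a rough illustration of the two steps mentioned here (division by g0, then temporal averaging), the following sketch works on plain Python lists rather than on ERA5 files or iris cubes. The function names are hypothetical, the input values are made up, and the value of g0 is the standard gravity constant; this is not the code used by the ERA5 fix in ESMValCore.

```python
from statistics import fmean

G0 = 9.80665  # standard gravity in m s-2


def geopotential_to_height(z):
    """Convert geopotential (m2 s-2) to geopotential height (m)."""
    return z / G0


def daily_means(values, steps_per_day=24):
    """Average consecutive sub-daily values into one value per day.

    Assumes the series starts at the first step of a day and
    contains only complete days.
    """
    if len(values) % steps_per_day:
        raise ValueError("series does not contain a whole number of days")
    return [
        fmean(values[start:start + steps_per_day])
        for start in range(0, len(values), steps_per_day)
    ]


# Made-up 6-hourly geopotential at one grid point for two days
# (4 steps per day, as in a 6hrPlev file).
z_6hourly = [G0 * 5500.0] * 4 + [G0 * 5600.0] * 4
zg_daily = daily_means(
    [geopotential_to_height(z) for z in z_6hourly],
    steps_per_day=4,
)
print(zg_daily)
```

Because the division by a constant and the mean are both linear, the two steps can be applied in either order; the sketch converts first only to keep the units of the averaged series obvious.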
Hi, quick update. I've tried with an older environment and the recipe failed due to a problem with units conversion for pressure levels (mbar to unknown). I tried recreating the environment but it was taking several hours (maybe #2150?) and I needed to interrupt the process. Trying again now in debug mode, but messages are not very informative.
After a day or so of 'solving' the environment is ready.
So it seems the files are read in correctly:
INFO [15494] PreprocessingTask daily_mean/zg created. It will create the files:
recipe_era5_fs_20210512_162231/preproc/daily_mean/zg/native6_ERA5_reanaly_1_6hrPlev_zg_2000-2001.nc
but then there is a problem with their dimensions:
ERROR [15512] Failed to run fix_metadata([<iris 'Cube' of geopotential / (m**2 s**-2) (time: 124; pressure_level: 19; latitude: 721; longitude: 1440)>, <iris 'Cube' of geopotential / (m**2 s**-2) (time: 124; pressure_level: 19; latitude: 721; longitude: 1440)>, [...], <iris 'Cube' of geopotential / (m**2 s**-2) (time: 124; pressure_level: 19; latitude: 721; longitude: 1440)>], {'project': 'native6', 'dataset': 'ERA5', 'short_name': 'zg', 'mip': '6hrPlev', 'frequency': '6hr', 'check_level': <CheckLevels.DEFAULT: 3>})
This seems to be an iris problem:
iris.exceptions.CoordinateNotFoundError: 'Expected to find exactly 1 coordinate, but found none.'
Is this caused by the fact that a 3D (i.e., without PLev) file is expected? @remi-kazeroni
Note that zg is a bit of a tricky variable. This may be related to ESMValGroup/ESMValCore#1099 and/or ESMValGroup/ESMValCore#333.
Thanks for the tip @zklaus. Both issues could be relevant. In fact, looking at the raw file headers:
netcdf native6_ERA5_reanaly_1_6hrPlev_zg_200001 {
dimensions:
    longitude = 1440 ;
    latitude = 721 ;
    level = 19 ;
    time = 124 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int level(level) ;
        level:units = "hPa" ;
        level:long_name = "pressure_level" ;
    int time(time) ;
        time:units = "hours since 1900-01-01 00:00:00.0" ;
        time:long_name = "time" ;
        time:calendar = "gregorian" ;
    short z(time, level, latitude, longitude) ;
        z:scale_factor = 7.61496044931561 ;
        z:add_offset = 244700.965957275 ;
        z:_FillValue = -32767s ;
        z:missing_value = -32767s ;
        z:units = "m**2 s**-2" ;
        z:long_name = "Geopotential" ;
        z:standard_name = "geopotential" ;
the dimensions have no standard_name. Maybe that's the issue, as in https://groups.google.com/g/scitools-iris/c/9IwC2Rr7xm? For the usual dimensions, sensible standard_names could perhaps be assigned when they are absent. But I don't know the Core functions well.
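The check suggested above can be sketched on the attributes copied from the header, without touching the file or iris. The `PROPOSED` lookup from long_name to CF standard_name and the function name are assumptions made for this illustration; whether the missing standard_name is really what triggers the CoordinateNotFoundError is not confirmed in this thread.

```python
# Coordinate attributes as they appear in the file header above.
COORDS = {
    "longitude": {"units": "degrees_east", "long_name": "longitude"},
    "latitude": {"units": "degrees_north", "long_name": "latitude"},
    "level": {"units": "hPa", "long_name": "pressure_level"},
    "time": {
        "units": "hours since 1900-01-01 00:00:00.0",
        "long_name": "time",
        "calendar": "gregorian",
    },
}

# Hypothetical lookup from long_name to a CF standard_name.
PROPOSED = {
    "longitude": "longitude",
    "latitude": "latitude",
    "pressure_level": "air_pressure",
    "time": "time",
}


def missing_standard_names(coords):
    """Map each coordinate without a standard_name to a proposed one.

    The proposal is None when the long_name is not in PROPOSED.
    """
    return {
        name: PROPOSED.get(attrs.get("long_name"))
        for name, attrs in coords.items()
        if "standard_name" not in attrs
    }


for name, proposal in missing_standard_names(COORDS).items():
    print(f"{name}: no standard_name, proposed {proposal!r}")
```

For this header all four coordinates are reported, since none of them carries a standard_name; only the data variable z does.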
This seems to be an iris problem:
iris.exceptions.CoordinateNotFoundError: 'Expected to find exactly 1 coordinate, but found none.'
Is this caused by the fact that a 3D (i.e., without PLev) file is expected? @remi-kazeroni
Yes @fserva I think so. It would be great if you could open an issue about the cmorization of the zg variable of ERA5 in ESMValCore documenting the problem you are facing. This problem is too different from the original issue here. Then I could help you to add zg to the list of supported variables for ERA5.
Spot on, @remi-kazeroni! Is the original issue here solved? @SarahAlidoost, what needs to happen to close this issue?
We need a pull request to fix the original issue (i.e. moving files around), including this comment. I can make the PR.
@fserva, I have marked a bunch of comments here as off-topic, so that we can focus on the original issue here. Please don't mistake that for me dismissing your questions and input! Let's just discuss the open points in other issues, e.g. ESMValGroup/ESMValCore#1136 and #1889.
@SarahAlidoost, great if you can take care of this, thanks!