Hey folks, currently (in preprocess.py) there is a kludgey way to get the MIP value:
    if project_name == 'CMIP5':
        table = model['mip']
    else:
        table = project_info['MODELS'][0]['mip']
This is ugly and inefficient (what if the first model doesn't have a mip key?). Could Javier have a look and change things so we get the MIP from a higher-level position? Cheers, V
Maybe the MIP is not even needed, as the CMOR table should be unique given the variable name.
Agreed, it should probably go altogether at some point. @bouweandela is working on centralising the namelist parser, perhaps something to add there.
Wait, the MIP is still needed in the namelist to fully define the CMIP5 input (together with the other keys).
It is (maybe) not needed for searching the cmor table of the given variable.
Can't we do a lookup in the CMIP5 tables (which are included in ESMValTool) to determine the MIP given a variable name? Would get rid of the MIP=* trick used when using variables from multiple MIPs?
Regardless, something to take care of while reading in the namelist, I think :-)
> Can't we do a lookup in the CMIP5 tables (which are included in ESMValTool) to determine the MIP given a variable name? Would get rid of the MIP=* trick used when using variables from multiple MIPs?
Yes, that's what I was thinking of, but @jvegasbsc should confirm whether it is doable at the preprocessor/cmorization level.
I'm sorry to inform you that this is not possible in all cases: there are some variables in more than one table. For example, sithick in CMIP6 is in SIday and SImon.
Anyway, I think it is a good idea to deduce it for all the variables we are going to use.
And, by the way, the mip parameter should be moved from the models configuration to the variable config, because some diagnostics use variables from different MIPs.
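To make the ambiguity concrete, here is a minimal sketch of the lookup idea. The `CMOR_TABLES` dictionary is a small hand-written excerpt for illustration only (it does not read the tables shipped with ESMValTool, and it mixes CMIP5 and CMIP6 names purely for brevity); `build_index` and `deduce_mip` are hypothetical helper names, not existing functions.

```python
from collections import defaultdict

# Hypothetical, incomplete excerpt: table (mip) name -> variable names.
# A variable that looks unique here may well appear in more tables in reality.
CMOR_TABLES = {
    "Amon": ["tas", "ta"],
    "day": ["tas", "ta"],
    "SIday": ["sithick"],
    "SImon": ["sithick"],
    "Omon": ["tos"],
}


def build_index(tables):
    """Invert the mapping: variable name -> list of tables containing it."""
    index = defaultdict(list)
    for table, variables in tables.items():
        for var in variables:
            index[var].append(table)
    return dict(index)


def deduce_mip(short_name, index):
    """Return the mip only if exactly one table defines the variable.

    Returns None when the variable is unknown or ambiguous, in which case
    the mip still has to come from the namelist.
    """
    candidates = index.get(short_name, [])
    if len(candidates) == 1:
        return candidates[0]
    return None


INDEX = build_index(CMOR_TABLES)
print(deduce_mip("tos", INDEX))      # unique within this excerpt
print(deduce_mip("sithick", INDEX))  # ambiguous: SIday and SImon
```

So the deduction can only ever be a convenience for the unambiguous cases; an explicit mip is still needed for the rest.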
> I'm sorry to inform you that this is not possible in all cases: there are some variables in more than one table. For example, sithick in CMIP6 is in SIday and SImon.

That's tricky. I guess the only difference between the two tables is the time coordinate.
Is there any other way we can deduce it given the namelist settings?
Something like checking all mips in the MODELS section and taking the most frequent entry?
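A rough sketch of what "most frequent entry" could mean, assuming the MODELS section has been parsed into a list of dicts (the entries and the `most_frequent_mip` name below are made up for illustration):

```python
from collections import Counter


def most_frequent_mip(models):
    """Return the most common 'mip' value among the model entries.

    Entries without a 'mip' key are ignored. Returns None if no entry
    defines one. On a tie, the value encountered first wins.
    """
    mips = [model["mip"] for model in models if "mip" in model]
    if not mips:
        return None
    return Counter(mips).most_common(1)[0][0]


# Hypothetical parsed MODELS section.
models = [
    {"name": "model-a", "mip": "Amon"},
    {"name": "model-b", "mip": "Amon"},
    {"name": "model-c", "mip": "day"},
    {"name": "obs-x"},  # no mip key at all
]
print(most_frequent_mip(models))
```

Note that this is only a heuristic: it silently picks a winner when the models disagree, which is exactly the situation where an explicit setting would be safer.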
> And, by the way, the mip parameter should be moved from the models configuration to the variable config, because some diagnostics use variables from different MIPs.

That case is already covered by using wildcards in the MODELS section and moving the mip definition to the variable dict (see the yaml concept document). But for the general case I would stick to the current solution.
Let's also see what @axel-lauer thinks about this...
I think all variables that are available for a particular mip are included in the corresponding table (e.g. tas is in the tables 3hr, 6hrPlev, 6hrPlevPt, day and Amon) but usually those definitions are (should be) identical. Anyway, to be on the safe side it might be best to have the mip definition in the variable dict and then simply use the table corresponding to the mip of the variable being processed.
I've checked the CMIP5 tables for the variable ta.
This is defined in 7 mips (6hrLev, 6hrPlev, Amon, cf3hr, cfDay, cfMon, day) and unfortunately there are some differences across these definitions, mostly in the additional variable information section (things like dimensions, valid_min, valid_max, etc.), which could be critical for the preprocessor.
So a mip definition is definitely necessary here. @jvegasbsc's suggestion has the advantage that it is clear, safe and flexible (i.e., it allows using the same models list with multiple variables from different mips).
The only concern I have is for those namelists which do not use CMIP models; in that case the mip entry might be confusing.
Other ideas?
Otherwise I would suggest @jvegasbsc to go ahead and implement his suggestion.
> The only concern I have is for those namelists which do not use CMIP models; in that case the mip entry might be confusing.

This can be confusing only if you are mixing cmorized models (all of them will have the mip, even if they are not part of any CMIP) and raw data models, and I think this is something that only "advanced" users will do.
Anyway, having parameters that only apply to certain types of projects, and managing those kinds of things with ease, is part of the magic of YAML. And if we are thinking of using the tool with non-cmorized models, we will need this kind of thing for most of the models. For example, for Nemo output we will probably need the frequency and the file type (gridT, gridV, icemod, pisces).
Ok!
Let's move the mip key from the model to the variable dictionary then.
I would suggest waiting for PR159 and continuing from that.
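For illustration, the change agreed on above (mip defined per variable rather than per model) could look roughly like this once the namelist is parsed. This is only a sketch under assumed data structures: the dict layout, the model names and the `expand_datasets` helper are hypothetical and not taken from the actual parser.

```python
def expand_datasets(models, variables):
    """Combine every variable with every model into one flat description.

    Keys from the variable dict (including 'mip') take precedence over
    keys with the same name in the model dict.
    """
    return [
        {**model, **variable}
        for variable in variables
        for model in models
    ]


# Hypothetical parsed namelist content: models no longer carry a mip.
models = [
    {"name": "model-a", "project": "CMIP5"},
    {"name": "model-b", "project": "CMIP5"},
]
variables = [
    {"short_name": "ta", "mip": "Amon"},
    {"short_name": "tas", "mip": "day"},
]

for dataset in expand_datasets(models, variables):
    # The table to use is simply the mip of the variable being processed.
    print(dataset["name"], dataset["short_name"], dataset["mip"])
```

With this layout the same models list can be reused for variables from different mips, and there is no need to peek at the first model to find the table.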
I've been working on a namelist parser in the REFACTORING_preprocessor branch that already works with this.
This has been implemented in #172, I think this issue can be closed.
Has the workaround by @valeriupredoi (see above) also been revised accordingly?
PR #172 contained new code for running the preprocessor, independent of preprocess.py. I kept the file around for future reference, there may still be information in there that is useful, but I think it doesn't need to be updated anymore.