I was helping @mattiarighi debug the derivation of lwcre, in particular an exception raised on this line:
lwcre = rlutcs_cube - rlut_cube
ValueError: This operation cannot be performed as there are differing coordinates (latitude) remaining which cannot be ignored.
This is because there are very small differences in the latitude coordinate between the two cubes, on the order of 10^-6. Iris is too strict there. Is there a way to tell Iris an acceptable difference for this, @bjlittle?
If we cannot solve this at the Iris level, we need to decide what to do here. Maybe we can detect the error, check the differences and, if they are small (below 10^-3), assign the values from rlut to rlutcs. What do you think?
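A minimal sketch of that workaround, working on plain NumPy arrays of coordinate points rather than on Iris cubes. The helper name `align_points` is hypothetical, the 10^-3 tolerance is the one suggested above, and which printed latitude belongs to which variable is an assumption made only for the example:

```python
import numpy as np


def align_points(reference, other, atol=1e-3):
    """Return a copy of `reference` if `other` matches it within `atol`.

    Hypothetical helper: raises ValueError when the shapes differ or the
    largest absolute difference exceeds the tolerance.
    """
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    if reference.shape != other.shape:
        raise ValueError("coordinate shapes differ")
    max_diff = np.max(np.abs(reference - other))
    if max_diff > atol:
        raise ValueError(f"coordinates differ by {max_diff:g}, above {atol:g}")
    return reference.copy()


# Assumed assignment of latitude values to variables, for illustration only.
rlut_lat = np.array([-90.0, -89.0575916230366, -88.1151832460733])
rlutcs_lat = np.array([-90.0, -89.05759429931641, -88.11518096923828])

# The rlutcs latitude would be overwritten with these points before subtracting.
aligned = align_points(rlut_lat, rlutcs_lat)
```

Differences above the tolerance raise instead of being silently overwritten, so a genuinely different grid would still surface as an error.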
The problem affects the CCSM4 model. I can provide the data for testing.
@jvegasbsc I've stepped through the Iris code from cube subtraction down to the point of Coord.__eq__, which leans on iris.util.array_equal to determine the equality of coordinate values.
This is essentially using NumPy equality, so it's not Iris that is being too strict - we are merely providing the same numeric array equality that NumPy defines.
So it's your data that's the problem, otherwise you're really looking for a different operation that says "subtract these cubes, even if the metadata defining the cubes are kinda the same". Can't see how we could easily implement that within Iris and allow the user to control the tolerance (absolute or relative) of accuracy and restrict that tolerance to coordinate values.
Is there a practical workaround here?
Is the data for the coordinates on each cube coming from a different data source? Hence the difference in accuracy? ... would it be sensible to apply a pre-processing step that equalises the number of significant digits of the coordinates across the cubes?
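As a rough sketch of such a pre-processing step, the coordinate points could be rounded to a fixed number of decimals before comparison. The helper name `round_points` and the choice of 4 decimals are assumptions for illustration, and the sample values are latitudes of the kind reported for this model:

```python
import numpy as np


def round_points(points, decimals=4):
    """Round coordinate points to a fixed number of decimals (hypothetical helper)."""
    return np.round(np.asarray(points, dtype=float), decimals)


lat_a = np.array([-90.0, -89.05759429931641, -88.11518096923828])
lat_b = np.array([-90.0, -89.0575916230366, -88.1151832460733])

# Exact comparison fails on the raw values but passes after rounding.
equal_before = np.array_equal(lat_a, lat_b)
equal_after = np.array_equal(round_points(lat_a), round_points(lat_b))
```

One caveat with this approach: two values that sit on either side of a rounding boundary can still round to different results, so rounding reduces mismatches but does not guarantee they disappear.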
Indeed this is an error in the coordinate of this particular model, which uses very slightly different latitude arrays for different variables (rlut and rlutcs in our example, but there are other cases).
Ideally, this should be fixed via a model-specific fix file, but it's not so straightforward, since we apply this check on a per-variable basis and there is no clear "correct latitude" to use as a reference for fixing the error.
Another option would be to arbitrarily choose one of the two latitude coordinates this model uses and set it as the correct one, hard-coding it in the fix-file, but it's quite inelegant... :grin:
Better ideas?
I think these kinds of differences are more likely to be due to the nature of float variables than to any other reason.
> Another option would be to arbitrarily choose one of the two latitude coordinates this model uses and set it as the correct one, hard-coding it in the fix-file, but it's quite inelegant...
For these fixes we can use the fix_metadata step, which is almost free in computing time. We can check other variables and use the most common one.
> This is essentially using NumPy equality, so it's not Iris that is being too strict - we are merely providing the same numeric array equality that NumPy defines.
NumPy also provides numpy.allclose() for equality with a tolerance. Anyway, it would probably be too costly in computing power to be useful within Iris as a general comparison.
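To make the contrast concrete, here is a small sketch comparing exact and tolerant equality on latitude values of the kind reported for this model. The absolute tolerance of 1e-5 is an assumption chosen to sit just above the observed differences:

```python
import numpy as np

lat_a = np.array([-90.0, -89.05759429931641, -88.11518096923828])
lat_b = np.array([-90.0, -89.0575916230366, -88.1151832460733])

# Exact element-wise equality: any difference at all makes this False.
exact = np.array_equal(lat_a, lat_b)

# Tolerant equality: rtol=0 so that only the absolute tolerance applies.
close = np.allclose(lat_a, lat_b, rtol=0.0, atol=1e-5)

# Largest absolute difference, useful for choosing or reporting a tolerance.
max_diff = float(np.max(np.abs(lat_a - lat_b)))
```

Setting `rtol=0.0` keeps the check purely absolute, which is easier to reason about for coordinates expressed in degrees.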
> For these fixes we can use the fix_metadata step, which is almost free in computing time. We can check other variables and use the most common one.
Where should we implement this fix?
Only in _derive.py for lwcre or at a more general level?
I've tested swcre and it goes through without problems, toz also combines different variables without problems, so it seems to be quite specific to lwcre.
> Where should we implement this fix?
> Only in _derive.py for lwcre or at a more general level?
As a fix for rlut or rlutcs. We should check which of those is the one having different values (comparing with tas or other CCSM4 variables).
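The check could look roughly like the sketch below, which picks the latitude array shared by most variables and lists the variables that deviate from it. The helper name `most_common_points` is hypothetical, and the assignment of latitude values to tas, rlut and rlutcs is made up for illustration; the thread does not establish which variable is the odd one out:

```python
from collections import Counter

import numpy as np


def most_common_points(points_by_variable):
    """Return (reference_points, deviating_variable_names).

    Hypothetical helper: arrays are grouped by exact byte content, so two
    arrays count as the same only if every value is bit-identical.
    Assumes 1-D float arrays.
    """
    as_bytes = {
        name: np.asarray(points, dtype=float).tobytes()
        for name, points in points_by_variable.items()
    }
    winner, _ = Counter(as_bytes.values()).most_common(1)[0]
    deviating = sorted(name for name, raw in as_bytes.items() if raw != winner)
    return np.frombuffer(winner, dtype=float).copy(), deviating


# Made-up assignment of the two latitude variants to variables.
variant_1 = [-90.0, -89.0575916230366, -88.1151832460733]
variant_2 = [-90.0, -89.05759429931641, -88.11518096923828]
latitudes = {"tas": variant_1, "rlut": variant_1, "rlutcs": variant_2}

reference, deviating = most_common_points(latitudes)
```

With only two variables available there is no majority, which is why comparing against a third variable such as tas is needed before deciding which one to fix.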
Sounds good!
Feel free to use my branch REFACTORING_lwcre for that.
@jvegasbsc Are the differences down to the data being sourced from different models and/or the dtype of the actual coordinates?
Just curious... :confused:
The data type of the coordinates is always double and this is checked by the CMORcheck module.
The problem arises due to slightly different latitude coordinates in the CCSM4 model between two variables (rlutcs and rlut) which are subtracted to calculate a derived variable (lwcre).
The two coordinates look like this:
-90 -90
-89.05759429931641 -89.0575916230366
-88.11518096923828 -88.1151832460733
-87.17277526855469 -87.1727748691099
-86.23036956787109 -86.2303664921466
-85.28795623779297 -85.2879581151832
-84.34555053710938 -84.3455497382199
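One thing that can be checked directly on the printed numbers is whether the left column equals the right column after a round trip through single precision. The sketch below only tests that hypothesis on the values listed above; it is not a confirmed diagnosis of how the model output was written:

```python
import numpy as np

# Left and right columns exactly as printed above.
left = np.array([
    -90.0, -89.05759429931641, -88.11518096923828, -87.17277526855469,
    -86.23036956787109, -85.28795623779297, -84.34555053710938,
])
right = np.array([
    -90.0, -89.0575916230366, -88.1151832460733, -87.1727748691099,
    -86.2303664921466, -85.2879581151832, -84.3455497382199,
])

# Round the right column to float32, then store it back as float64.
roundtrip = right.astype(np.float32).astype(np.float64)

# True would be consistent with the left column having passed through
# single precision at some point before being stored as double.
matches = bool(np.allclose(roundtrip, left, rtol=0.0, atol=1e-12))
```

If `matches` is true, both coordinates can have dtype double on disk while one of them still carries only single-precision accuracy.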
I don't know if this problem also occurs for other models/variables.
In any case, this is not critical as long as we deal with only one variable (the most common case); it only matters when two variables are combined.
Hope this helps.
What is the status here? Is the bugfix implemented?
It would be good to close this before the workshop.
@mattiarighi @jvegasbsc I reckon a standard solution whenever we perform cube subtractions/sums/... would be to regrid one cube onto the other (if no regridding is done at all); that way we ensure consistency between spatial coordinates.
Regridding looks a bit overkill for this case, I would still prefer the fix in the coordinate.
It is a cleaner solution since it does not modify the original variable data, but only the coordinates. This can be a relevant issue, in particular if the user explicitly sets regridding: none in the namelist.
@mattiarighi regrid: none with different MxN grid dimensions will result in complete failure of intercube operations; regrid: none with identical MxN grid dimensions will still be potentially problematic if the grid centers are not identical, something that may cause (lat, lon) points to differ by order of degrees and not only 1e-6 degrees. Hence my suggestion to regrid to a common grid, for sanity, when doing any intercube operation.
True, but this would also slightly change that variable data, even when this is just a numerical error in the coordinates (as the case I mentioned above).
Maybe the results will be similar in the end, but conceptually I believe that the coordinate should be fixed without changing the variable data.
OK, agreed, it's a case of balancing an error introduced by interpolation against an error introduced by tweaking a grid point while keeping fixed the variable value attached to that grid point. Can we set a threshold, e.g. if delta((lat1, lon1) - (lat2, lon2)) is smaller than, say, double precision, then we tweak the grid; if not, we regrid?
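A sketch of what such a threshold rule might look like for one coordinate, again on plain NumPy arrays. The function name `choose_strategy`, the returned labels and the default tolerance of 1e-5 degrees are all assumptions for illustration, not an agreed design:

```python
import numpy as np


def choose_strategy(points_a, points_b, atol=1e-5):
    """Decide how to reconcile two coordinate arrays (hypothetical helper).

    Returns "none" if they are identical, "align" if they differ by at most
    `atol` everywhere, and "regrid" if the shapes or values differ more.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    if a.shape != b.shape:
        return "regrid"
    if np.array_equal(a, b):
        return "none"
    if np.allclose(a, b, rtol=0.0, atol=atol):
        return "align"
    return "regrid"


tiny = choose_strategy([-90.0, -89.05759429931641], [-90.0, -89.0575916230366])
shifted = choose_strategy([-90.0, -89.0], [-89.5, -88.5])
```

Here `tiny` falls in the "tweak the coordinate" branch and `shifted` in the "regrid" branch; the same rule would be applied to longitude separately.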
If the coordinate systems of two variables from the same model/exp/mip/ensemble differ, it is always an error, independently of the delta.
A specific fix for each case is the right solution, in my opinion, also because such cases are not very common I think.