Hey guys, this is a very obscure case, but I think it needs to be addressed nonetheless. When one tries to run with a file, say, blah-blah_2000-2008.nc, and the same data dir also contains a file with exactly the same parameters but slightly different years, blah-blah_2000-2005.nc (it could be an older version, or exactly the same file with relabelled years, as in my case), concatenation throws an error:
  File "/group_workspaces/jasmin/ncas_cms/valeriu/anaconda2_test/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0.0-py3.6.egg/esmvaltool/preprocessor/_io.py", line 67, in concatenate
    cube = iris.cube.CubeList(cubes).concatenate_cube()
  File "/group_workspaces/jasmin/ncas_cms/valeriu/anaconda2_test/envs/esmvaltool/lib/python3.6/site-packages/iris/cube.py", line 509, in concatenate_cube
    raise iris.exceptions.ConcatenateError(msgs)
iris.exceptions.ConcatenateError: failed to concatenate into a single cube.
  An unexpected problem prevented concatenation.
  Expected only a single cube, found 2.
There should be a check for whether the cubes are identical, keeping just one of them if so, me reckons.
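A minimal sketch of such a check, in Python since ESMValTool is a Python package. `drop_identical` is a hypothetical helper, not part of ESMValTool or iris. It compares items with `==` rather than putting them in a set, because value-equal objects are not guaranteed to hash equal (or to be hashable at all); it assumes `==` returns a single bool, as it does for iris cubes.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def drop_identical(items: Sequence[T]) -> List[T]:
    """Return items with exact duplicates removed, keeping first occurrences.

    Hypothetical helper: uses == instead of hashing, so it also works for
    objects that are unhashable or that hash by identity.
    """
    unique: List[T] = []
    for item in items:
        # Pairwise O(n^2) comparison; fine for a handful of files per dataset
        if not any(item == kept for kept in unique):
            unique.append(item)
    return unique
```

In the `concatenate` function of `_io.py` from the traceback, one could then try something like `iris.cube.CubeList(drop_identical(cubes)).concatenate_cube()`. This only helps if the loaded cubes really are identical, attributes included.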
@valeriupredoi Hey fella...
concatenate_cube is pretty strict when attempting to concatenate cubes. I think that it's a pretty safe stance for Iris to take, rather than concatenating dissimilar cubes and throwing away metadata.
If you can re-run and re-create the issue, it would be worthwhile adding a breakpoint to discover the actual difference between the two cubes that you believe should concatenate. It will most likely be a metadata difference, the bane of most people's lives!
Also, did you include the full traceback? Typically, concatenate_cube will attempt to tell you which difference caused the concatenation to fail...
This should not happen if the data are properly structured in subdirectories. Different versions of the same data should be in different subdirectories.
Can you give us some more info on how the data are structured in the case you mentioned?
@bjlittle @mattiarighi
Bill man, good to hear from you, bro!
Guys, my case is very stupid. I have a data directory (the drs value in the config is set to default, so all my CMIP5 data sits in a single default /blah/blah/ directory, i.e. no proper DRS), and in it I have the same file under two different names: CMIP5_BLAH_2000-2002.nc and CMIP5_BLAH_2000-2009.nc. The files are ABSOLUTELY the same bar the name (the years), and I get this issue. What the code does: it sees two files that it should load, loads them, puts them in a CubeList, and then realizes, hang on, I've got the same cube twice, from files with slightly different names. What the code should do: say "I've got exactly the same data twice" and keep only one cube. As I said, this is a dumb situation (so feel free to delete the issue if you think so), but it may arise for a user who has two files: an older version with data from 2000 to 2002 and a newer version with data from 2000 to 2009.
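A flat directory like this could be screened before loading. The sketch below is a hypothetical pre-load check, not an ESMValTool feature. It assumes simple `<stem>_<startyear>-<endyear>.nc` names like the ones above (real CMIP5 filenames usually carry longer date stamps, so the pattern would need adapting) and reports pairs of files with the same stem whose year ranges overlap.

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Assumed naming convention: <stem>_<startyear>-<endyear>.nc
_NAME_RE = re.compile(r"^(?P<stem>.+)_(?P<start>\d{4})-(?P<end>\d{4})\.nc$")


def find_overlapping_files(filenames: List[str]) -> List[Tuple[str, str]]:
    """Return pairs of files with the same stem whose year ranges overlap."""
    by_stem: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
    for name in filenames:
        match = _NAME_RE.match(name)
        if match is None:
            continue  # not in the assumed format; ignore it
        by_stem[match["stem"]].append(
            (int(match["start"]), int(match["end"]), name)
        )

    overlaps: List[Tuple[str, str]] = []
    for entries in by_stem.values():
        for (s1, e1, f1), (s2, e2, f2) in combinations(sorted(entries), 2):
            # Two closed year ranges overlap if each starts before the other ends
            if s1 <= e2 and s2 <= e1:
                overlaps.append((f1, f2))
    return overlaps
```

For the two files described here, `find_overlapping_files(["CMIP5_BLAH_2000-2002.nc", "CMIP5_BLAH_2000-2009.nc"])` flags the pair, which could then be reported to the user before iris ever sees the cubes.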
I think this way of collecting the files should be discouraged. I mean, it's fine for quick tests by developers, but the user should really keep their data organized. So it's good to have this error message from Iris, maybe we could just make it more explicit ("data duplication found", or similar).
yes and yes :grinning:
@valeriupredoi Dude, good to hear from you too!
The thing is, the filename doesn't matter in this case. It's the fact that the cubes are the same (from what I understand of what you're saying), and it's impossible to concatenate identical cubes together. That, I guess, is why Iris is raising an exception in this case.
In order to concatenate two cubes, they must not overlap along the dimension being concatenated (e.g. time), and they must match in all of the other dimensions on which they are defined.
Does this help?
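Put differently, the pieces have to line up end to end along the concatenation axis. A small pure-Python sketch of that condition, with a hypothetical helper name: it takes each cube's values along that axis (for an iris cube, something like `cube.coord('time').points`) and assumes ascending coordinates. Two identical cubes fail the test, which is exactly the situation in this issue.

```python
from typing import Sequence


def can_concatenate_along(pieces: Sequence[Sequence[float]]) -> bool:
    """True if the pieces, ordered by their first value, are each strictly
    increasing and do not overlap one another (assumes ascending coords)."""
    ordered = sorted((list(p) for p in pieces if len(p) > 0), key=lambda p: p[0])
    previous_end = None
    for points in ordered:
        # Each piece must itself be strictly increasing
        if any(b <= a for a, b in zip(points, points[1:])):
            return False
        # ...and must start after the previous piece ends
        if previous_end is not None and points[0] <= previous_end:
            return False
        previous_end = points[-1]
    return True
```

For example, `[[0, 1, 2], [3, 4]]` can be joined, while two copies of `[0, 1, 2]` cannot.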
Yes, correct! The fact that the files have different names only comes into play at the loading stage in the ESMValTool I/O: a list is made of the files that should contain the needed data, that list then yields two identical cubes, and so the iris CubeList concatenation fails. So, as @mattiarighi says, it's good to have iris complaining about this case. But the take-home message from this issue is not that we want to change iris; we want to add a message in the ESMValTool I/O module saying 'you are trying to concatenate two identical cubes, dumbass, make sure your data is up to date', and then continue with one cube rather than failing right away (try/except).
So, it seems like you want to catch the iris.exceptions.ConcatenateError, issue an appropriate warning (?) and continue with either one of the identical cubes.
Seems reasonable to me :+1: ... but you'd want to make sure that the cubes are identical before selecting one of them. Otherwise concatenate is reporting a genuine concatenation fault, and that exception should be propagated to the user.
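A sketch of that try/except logic. To keep it runnable without iris, the concatenation function and the exception type are passed in; with iris they would be something like `lambda cs: iris.cube.CubeList(cs).concatenate_cube()` and `iris.exceptions.ConcatenateError`. The function name is hypothetical. It falls back to a single cube only when all cubes compare equal, and otherwise re-raises the original error.

```python
import logging
from typing import Callable, Sequence, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def concatenate_or_dedupe(
    cubes: Sequence[T],
    concatenate: Callable[[Sequence[T]], T],
    error_type: Type[Exception] = Exception,
) -> T:
    """Concatenate cubes; if that fails and all cubes are identical,
    warn and return the first one. Any other failure is re-raised."""
    try:
        return concatenate(cubes)
    except error_type:
        if len(cubes) > 1 and all(cube == cubes[0] for cube in cubes[1:]):
            logger.warning(
                "Found %d identical cubes (duplicated input data?); "
                "using only one of them", len(cubes))
            return cubes[0]
        raise  # a genuine concatenation fault: propagate it to the user
```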
yes, sir, you read my mind :grinning:
I will take a look, but I'm not really sure we should go for a warning here. I think that in most cases we won't have the information to prefer one cube over the other.
I agree, I think an error with a more informative message would be better.
We have two options after catching the exception:
What do you prefer?
I've no particular preference, but maybe the second one would be slightly clearer.
@valeriupredoi @bjlittle ?
In my opinion, warnings always get ignored... they're just noise.
So, I'd opt for option two. Catch the exception, log it and re-raise with a new exception :+1:
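A sketch of option two as described here: catch the concatenation error, log it, and raise a new, more explicit exception when the cause is duplicated data (along the lines of the "data duplication found" message suggested earlier in the thread), while any other failure propagates unchanged. To stay runnable without iris, the concatenation callable and the caught exception type are parameters; with iris these would be something like `lambda cs: iris.cube.CubeList(cs).concatenate_cube()` and `iris.exceptions.ConcatenateError`. `DuplicateDataError` and `concatenate_strict` are hypothetical names.

```python
import logging
from typing import Callable, Sequence, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


class DuplicateDataError(ValueError):
    """Hypothetical exception raised when the input data are duplicated."""


def concatenate_strict(
    cubes: Sequence[T],
    concatenate: Callable[[Sequence[T]], T],
    error_type: Type[Exception] = Exception,
) -> T:
    """Concatenate cubes, turning a duplicated-data failure into a clear error."""
    try:
        return concatenate(cubes)
    except error_type as exc:
        logger.error("Concatenation of %d cubes failed: %s", len(cubes), exc)
        if len(cubes) > 1 and all(cube == cubes[0] for cube in cubes[1:]):
            # Chain the original error so its details stay in the traceback
            raise DuplicateDataError(
                "Data duplication found: the input files contain identical "
                "data. Check the data directory for duplicated or outdated files."
            ) from exc
        raise  # any other failure is reported as-is
```

Chaining with `raise ... from exc` keeps the original iris message in the traceback, so the user sees both the explicit hint and the underlying cause.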
Sounds good to me either way, guys
Dr Valeriu Predoi.
Computational scientist
NCAS-CMS
University of Reading
Department of Meteorology
Reading RG6 6BB
United Kingdom
In reply to Bill Little's comment of Wed, 25 Apr 2018, 17:40: https://github.com/ESMValGroup/ESMValTool/issues/314#issuecomment-384353208