Esmvaltool: large differences in resources between iris 1.13 and iris 2.0

Created on 30 Apr 2018 · 31 Comments · Source: ESMValGroup/ESMValTool

Hey guys, @mattiarighi has noticed that the same run takes noticeably different amounts of time depending on the iris version, and he suspected iris as the root cause. So I ran some systematic tests to check, first, whether iris is indeed the problem rather than some side effect of moving from a Python 2.7 to a Python 3.6 environment, and second, how much we lose in resources because of the iris version change. Here are my results (the pdf file, 3 slides), obtained for different run configurations with either iris 1.13 or iris 2.0. To summarize:

The iris version is indeed the problem: replacing iris 2.0 with iris 1.13 in the Python 3.6 environment gives the same memory use and run times as iris 1.13 in a Python 2.7 environment; iris 1.13 actually runs faster in Python 3.6 (well, one test only, so don't trust it 100%);
The resources used with iris 2.0 (both memory and time) are much higher than with iris 1.13; this is a major problem when dealing with large numbers of models and heavy operations like masking and multimodel analyses. On average, run times double for these operations, and @mattiarighi has already encountered cases of running out of memory.
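A comparison like the one summarized above needs run time and peak memory recorded the same way in each environment. The following is a minimal sketch using only the Python standard library; `measure` and `toy_workload` are hypothetical names used for illustration and are not part of ESMValTool or iris. Note that `tracemalloc` only sees allocations made through Python's memory allocators, so the reported peak is a lower bound on the real process footprint.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Call func once and return (result, elapsed_seconds, peak_bytes).

    Hypothetical helper: peak_bytes is the peak traced by tracemalloc,
    which only covers allocations made via Python's allocators.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def toy_workload(n=200_000):
    """Stand-in for a real preprocessing step (assumption, not real code)."""
    values = [float(i) for i in range(n)]
    return sum(values) / len(values)


if __name__ == "__main__":
    result, elapsed, peak = measure(toy_workload)
    print(f"result={result:.1f} time={elapsed:.3f} s peak={peak / 1e6:.1f} MB")
```

Running the same script in each environment (for example Python 3.6 with iris 2.0 and Python 3.6 with iris 1.13), with `toy_workload` replaced by the actual run, would give directly comparable numbers.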

@bjlittle is this because of the change from biggus to dask? Have you guys at the MO noticed these issues as well compared to iris 1.XX? And most importantly, what do you recommend we do, e.g. pin iris 1.17?
Iris2_Issue.pdf

help wanted iris

valeriupredoi

Most helpful comment

@nielsdrost and I discussed this issue and we feel that the really important issues at the moment are those that prevent us from doing the v2 alpha release, see this project for a list. Even if iris 2 uses more resources, we are still much faster than ESMValTool V1. Therefore I would suggest not spending too much time on this now, and focusing instead on getting our first release out.

So far we've done very little on optimizing the code for memory usage and runtime, apart from trying to write decent code. After we have the minimum required set of features implemented and a reasonable set of diagnostics working, we can go and see if we can make the implementation more efficient, because then it will also be clear where the bottlenecks are (if any). If you start optimizing too early, you run the risk of spending a lot of time optimizing something that turns out to be not important after all.

If the solution to this issue is indeed as simple as suggested by @pelson above, then I would suggest implementing that and switching back to iris v2 (i.e. revert PR #320). That way we will be encouraged to write code that works well with v2 of iris and we will benefit from new features and bugfixes as they appear.

bouweandela on 4 May 2018

All 31 comments

I have done another test with iris 1.13 and 2.0: read two netCDF files, compute the root mean square of the two cubes' data and assemble a composite mask of the cubes (|=). Nothing is written to file; all operations happen in memory:

Repeated masked array operations: 35 times
Iris 1.13: mean 0.740 s +/- std 0.125 s
Iris 2.00: mean 0.908 s +/- std 0.191 s

So even for a relatively simple load and a couple of mathematical operations, iris 2.0 is roughly 20% slower than iris 1.13 in wall-clock time.
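For reference, the shape of such a test can be sketched as below. This is not the script that produced the numbers above: it uses synthetic NumPy masked arrays instead of cubes loaded from netCDF files with iris, it assumes "root mean square" means the element-wise RMS of the two fields, and all function names are hypothetical.

```python
import time

import numpy as np


def make_masked_field(seed, shape=(12, 90, 180)):
    """Synthetic stand-in for cube data (assumption: no iris, no netCDF)."""
    rng = np.random.default_rng(seed)
    values = rng.normal(size=shape)
    # Mask an arbitrary subset of points to mimic missing data.
    return np.ma.masked_where(values < -1.5, values)


def rms_and_composite_mask(a, b):
    """Element-wise RMS of two masked fields plus the union of their masks."""
    rms = np.ma.sqrt((a ** 2 + b ** 2) / 2.0)
    mask = np.ma.getmaskarray(a).copy()
    mask |= np.ma.getmaskarray(b)
    return rms, mask


def benchmark(repeats=35):
    """Time the masked operations `repeats` times; return (mean, std) in s."""
    a = make_masked_field(0)
    b = make_masked_field(1)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        rms_and_composite_mask(a, b)
        timings.append(time.perf_counter() - start)
    timings = np.asarray(timings)
    return timings.mean(), timings.std()


if __name__ == "__main__":
    mean, std = benchmark()
    print(f"mean {mean:.3f} s +/- std {std:.3f} s")
```

To compare iris versions, the synthetic fields would be replaced by data loaded from the two files in each environment, with the load step included in the timed region since loading behaviour is part of what changed between iris 1.13 and 2.0.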