Esmvaltool: Memory issues after merge of #616

Created on 24 Oct 2018 · 11 Comments · Source: ESMValGroup/ESMValTool

After merging the latest development branch into my branch, this recipe fails with the following Memory Error:

Traceback (most recent call last):
  File "~/anaconda3/envs/esmvaltool/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "~/ESMValTool/esmvaltool/_task.py", line 555, in _run_task
    return task.run()
  File "~/ESMValTool/esmvaltool/_task.py", line 188, in run
    self.output_files = self._run(input_files)
  File "~/ESMValTool/esmvaltool/preprocessor/__init__.py", line 284, in _run
    input_files, self.settings, self.order, debug=self.debug)
  File "~/ESMValTool/esmvaltool/preprocessor/__init__.py", line 209, in preprocess_multi_model
    debug)
  File "~/ESMValTool/esmvaltool/preprocessor/__init__.py", line 240, in preprocess
    result = [function(items, **args)]
  File "~/ESMValTool/esmvaltool/preprocessor/_io.py", line 65, in load_cubes
    if np.ma.is_masked(cube.data):
  File "~/anaconda3/envs/esmvaltool/lib/python3.6/site-packages/iris/cube.py", line 1712, in data
    raise MemoryError(msg)
MemoryError: Failed to create the cube's data as there was not enough memory available.
The array shape would have been (672, 17, 192, 288) and the data type float32.
Consider freeing up variables or indexing the cube before getting its data.

This line was added with #616.

Before that, my recipe took ~2 hrs on 1 node with 8 tasks (on DKRZ's mistral). Right now, it even fails on 4 nodes with 4 tasks each and times out (> 8 hrs) on 4 nodes with 2 tasks each. Is it really necessary to access the cube's data in the following line? Does it justify that I need more than four times the resources to run the recipe?

https://github.com/ESMValGroup/ESMValTool/blob/2cf569d73d97b69560a8fdc33908e12a59a51e82/esmvaltool/preprocessor/_io.py#L65
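For a sense of scale, the size of the array named in the MemoryError can be worked out directly from the shape and data type in the message. This is only back-of-the-envelope arithmetic; it ignores any mask array and whatever else the process holds in memory.

```python
# Shape and dtype are taken from the MemoryError message above.
shape = (672, 17, 192, 288)
bytes_per_value = 4  # float32

n_values = 1
for dim in shape:
    n_values *= dim

n_bytes = n_values * bytes_per_value

print(f"values: {n_values:,}")
print(f"size:   {n_bytes / 2**30:.2f} GiB")
```

That is roughly 2.35 GiB for a single realized cube. If several parallel tasks each realize an array of this size at the same time (an assumption about how the tasks overlap), the totals add up quickly on one node.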

iris

All 11 comments

Yes, there is a typo. @mattiarighi I will make a hotfix and reference it from this issue.

Adding more nodes will not help, as ESMValTool does not have multi-node parallelism yet.

The memory issues appear because that line realizes the data far too early.

@jvegasbsc can you check whether PR #678 fixes the problem?

It does not. The problem will reappear when using iris 2.0.

Ok. I'll add a note on the Iris2 issue so that we do not forget.

This closed issue https://github.com/ESMValGroup/ESMValTool/issues/302 discusses the early loading of the data, which is causing us this trouble now, as a workaround for iris 2.0. Maybe we need to reopen it.

The code entered via PR https://github.com/ESMValGroup/ESMValTool/pull/315 and was commented by PR https://github.com/ESMValGroup/ESMValTool/pull/320

The _fillvalue needs to be set as early as possible. The problem here is with iris 2.1, not with what we do with the data at a given stage: the MemoryError occurs at the cube data load stage. I wouldn't reopen #302, but rather test in iris 2.1 with a cube of the same shape @schlunma is using. I can do this right now. In general, though, the new iris is problematic in more than one respect.

Yes, but in that line we are calling cube.data, so in fact we are realizing the data at a stage where the preprocessor does not expect it. This gives us a monstrous memory consumption (neither time nor area slicing has been applied at that point).
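The effect described here can be illustrated with a toy model. The `LazyCube` class below is a hypothetical stand-in, not the iris API, and the 12-step time selection is an invented example. It only counts how many bytes would be allocated when `.data` is touched before versus after a reduction. (In iris 2.x itself, `cube.has_lazy_data()` and `cube.core_data()` inspect a cube without realizing it.)

```python
class LazyCube:
    """Hypothetical stand-in for a lazily loaded cube (NOT the iris API)."""

    def __init__(self, shape, itemsize=4):
        self.shape = tuple(shape)
        self.itemsize = itemsize  # 4 bytes per value, i.e. float32

    def extract_time(self, n_steps):
        """Return a new lazy cube limited to the first n_steps time steps."""
        return LazyCube((n_steps,) + self.shape[1:], self.itemsize)

    @property
    def data(self):
        """Model realization: return the bytes that would be allocated."""
        n_bytes = self.itemsize
        for dim in self.shape:
            n_bytes *= dim
        return n_bytes


cube = LazyCube((672, 17, 192, 288))

# Touching .data right after loading realizes the full array.
early = cube.data

# Slicing first (here: an assumed 12 time steps) and then touching .data
# realizes only the reduced array.
late = cube.extract_time(12).data

print(f"realized at load time:   {early / 2**20:.0f} MiB")
print(f"realized after slicing:  {late / 2**20:.0f} MiB")
```

The point of the sketch is the ordering only: the same access costs far less once time or area extraction has shrunk the cube.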

We need to find a way to add the fill_value without loading the data

Good point, @jvegasbsc. Could we move that to the point where we first load the (reduced) cube(s)?

> We need to find a way to add the fill_value without loading the data

Please open a separate issue to discuss this.

yessir! am starting to look at it right now
