Esmvaltool: Conda build is broken

Created on 10 Feb 2021 · 32 Comments · Source: ESMValGroup/ESMValTool

The conda build has been broken since February 5: https://github.com/ESMValGroup/ESMValTool/actions?query=workflow%3AConda-build+

On CircleCI it runs out of memory (4GB max) and on GitHub Actions it looks like it times out after running for 6 hours.

bug installation

All 32 comments

so a few updates with the new esmvalcore=2.2.0 off conda and the iris3 branch here:

  • environment solves fine (if a bit slow)
  • installation of ESMValTool and tests flagged with -m "not installation" pass fine
  • conda build locally on my 8GB laptop goes into swap once it uses more than 4GB of memory, and from there it has to be killed manually, otherwise the whole machine grinds to a halt:

(base) valeriu@valeriu-PORTEGE-Z30-C:~/ESMValTool$ conda build package -c conda-forge -c esmvalgroup
No numpy version specified in conda_build_config.yaml.  Falling back to default numpy value of 1.16
WARNING:conda_build.metadata:No numpy version specified in conda_build_config.yaml.  Falling back to default numpy value of 1.16
Adding in variants from internal_defaults
INFO:conda_build.variants:Adding in variants from internal_defaults
Attempting to finalize metadata for esmvaltool-python
INFO:conda_build.metadata:Attempting to finalize metadata for esmvaltool-python
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
Attempting to finalize metadata for esmvaltool-julia
INFO:conda_build.metadata:Attempting to finalize metadata for esmvaltool-julia
INFO conda_build.metadata:finalize_outputs_pass(748): Attempting to finalize metadata for esmvaltool-julia
Attempting to finalize metadata for esmvaltool-ncl
INFO:conda_build.metadata:Attempting to finalize metadata for esmvaltool-ncl
INFO conda_build.metadata:finalize_outputs_pass(748): Attempting to finalize metadata for esmvaltool-ncl
Attempting to finalize metadata for esmvaltool-r
INFO:conda_build.metadata:Attempting to finalize metadata for esmvaltool-r
INFO conda_build.metadata:finalize_outputs_pass(748): Attempting to finalize metadata for esmvaltool-r
Attempting to finalize metadata for esmvaltool
INFO:conda_build.metadata:Attempting to finalize metadata for esmvaltool
INFO conda_build.metadata:finalize_outputs_pass(748): Attempting to finalize metadata for esmvaltool
BUILD START: ['esmvaltool-python-2.1.1-py_0.tar.bz2', 'esmvaltool-julia-2.1.1-0.tar.bz2', 'esmvaltool-ncl-2.1.1-0.tar.bz2', 'esmvaltool-r-2.1.1-0.tar.bz2', 'esmvaltool-2.1.1-0.tar.bz2']
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... Killed

so... whatever is causing conda to go mentally memory-hungry has nothing to do with the esmvalcore version (not sure if that's good or bad). I suspect there is a bug somewhere in conda. I'll try to backtrack to when we had our last successful build on GA and sniff around what changed in that 24h period :+1:
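
A minimal sketch, assuming a Linux machine, of one way to stop a runaway build from dragging the laptop into swap: run the command under a cap on its virtual address space so it fails on its own instead of needing a manual kill. The helper name `run_with_memory_limit` is made up for illustration, and `RLIMIT_AS` caps virtual rather than resident memory, so the limit may need some headroom.

```python
import resource
import subprocess


def run_with_memory_limit(cmd, limit_bytes):
    """Run cmd with a cap on its virtual address space (Unix only).

    Returns the exit code of the child process.
    """

    def set_limit():
        # Runs in the child just before exec; inherited by its own children.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = limit_bytes
        if hard != resource.RLIM_INFINITY:
            # The soft limit may not exceed an existing hard limit.
            soft = min(soft, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    completed = subprocess.run(cmd, preexec_fn=set_limit, check=False)
    return completed.returncode


# Hypothetical usage with the build command from the log, capped at 4 GiB:
# run_with_memory_limit(
#     ["conda", "build", "package", "-c", "conda-forge", "-c", "esmvalgroup"],
#     limit_bytes=4 * 1024**3,
# )
```

With a cap in place the solver should die with a memory error once it crosses the limit, which at least keeps the machine usable while experimenting.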

It looks like the conda installation of ESMValTool v2.1.1 has also required more than 4GB of RAM since the same date. The last successful installation on CircleCI was on February 4: https://app.circleci.com/pipelines/github/ESMValGroup/ESMValTool?branch=master

yeah, all hell broke loose after this merge https://github.com/ESMValGroup/ESMValTool/pull/1963 on Feb 4th, but for the life of me I can't see why that would introduce an issue in the environment; I think it's just circumstantial

here's an even stranger find - the build environment for a job that was fine is the same as for a job that failed

Did the conda version change?

nope - 4.9.2 since November last year

here's an even stranger find - the build environment for a job that was fine is the same as for a job that failed

Are the links correct? I see an XML file when I click them.

yes, I see a nice txt file in browser, but you can just go ARTIFACTS -> build_env from the two jobs (first failed/last passed).

Speaking of conda: I see a Python 3.9 build, linux-64/conda-4.9.2-py39hf3d152e_0.tar.bz2, in https://anaconda.org/conda-forge/conda/files, whereas both my machine and CircleCI are using python38 builds: py38h578d9bd_0 (me laptop, which is slightly newer) and py38h06a4308_0 (Circle). I am going to try and install that (by upgrading the base Python)

there is a Miniconda3-py39_4.9.2-Linux-x86_64.sh released for py39 in late Dec 2020 - I grabbed that and tested the conda build - jumped in the same puddle: when solving the env, memory shoots up to 4.5G in about 5 seconds and keeps going until it completely fills my swap too (so an extra 2G). It's definitely the conda solver reacting badly to a set of changes in dependencies that happened around the Feb 4th mark

ok so I created the env for esmvaltool with conda env create ... no problem, installed just esmvaltool via pip install -e .[develop] no problem too (with esmvalcore=2.2, python=3.9 etc) - all this from scratch in the new miniconda for python39 base - I think it's one of the Julia or R bits that's killing us here @bouweandela

sanity check: it's the esmvaltool-python package build that chugs huge memory, I incrementally turned off the R, Julia and NCL package builds, leaving only the Python one to build. I'll see if I manage to determine which dependency is causing conda to go into an infinite solver loop; btw running with --debug doesn't really give much info
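
The hunt for the offending dependency can be sketched as a leave-one-out search. This is a rough illustration and not something run in this thread: `find_blocking_specs` and `solves_with_conda` are hypothetical names, and the predicate assumes that a `conda create --dry-run` with a timeout is a fair stand-in for the solve step, which may not hold for the solver inside conda-build.

```python
import subprocess


def find_blocking_specs(specs, solves):
    """Return the specs whose removal alone makes `solves` succeed.

    `solves` is a caller-supplied predicate: it takes a list of specs and
    returns True when the environment solves within the allowed budget.
    """
    if solves(specs):
        return []  # the full list already solves, nothing to find
    culprits = []
    for i, spec in enumerate(specs):
        remaining = specs[:i] + specs[i + 1:]
        if solves(remaining):
            culprits.append(spec)
    return culprits


def solves_with_conda(specs, timeout_s=600):
    """Hypothetical predicate: dry-run solve, treating a timeout as failure."""
    cmd = ["conda", "create", "--dry-run", "--name", "solver-probe",
           "-c", "conda-forge", "-c", "esmvalgroup", *specs]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Calling `find_blocking_specs(deps, solves_with_conda)` on the run requirements would list every single dependency whose removal lets the solve finish; it will not catch problems that need two packages removed together.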

pinning (more) packages in meta.yaml like this doesn't do much either:

        - cartopy>=0.18
        - cdo>=1.9.8
        - eccodes!=2.19.0  # cdo dependency; something messed up with libeccodes.so
        - cdsapi
        - cf-units
        - cftime
        - cmocean
        - dask>=2.12
        - ecmwf-api-client  # in esmvalgroup channel
        - eofs
        - esmpy>=8.0.1
        - esmvalcore>=2.2.0,<2.3  # in esmvalgroup channel
        - fiona>=1.8.18
        - gdal>=3.1.4
        - iris>=3.0.1
        - jinja2
        - joblib
        - lime
        - matplotlib>3.3.1  # bug in 3.3.1
        - mpich>=3.3,<3.4  # avoid 3.4.1 external_2 from conda-forge like the plague
        - natsort
        - nc-time-axis
        - netCDF4>=1.5.5
        - numpy>=1.20.1
        - pandas
        - pyproj>=2.1
        - python>=3.6
        - python-cdo>=1.5.4
        - pyyaml
        - rasterio>=1.2.0  # replaces pynio
        - scikit-image>=0.18
        - scikit-learn>=0.24

I found what the problem is: specifying - python>=3.6 in the run section of the build for esmvaltool-python makes it go into an infinite loop. Not asking for python as a dep package solves the immense memory drain, and the env gets solved well and fast. Note that even just unpinning python but keeping it as a dep makes the solver go into madness, loop forever and freeze. Why are we actually listing python as a dep when that is a system constant? Pinning to >=3.6 is useless anyway since nothing will make the env be built with <3.6

We're pinning it to reduce the size of the search space.
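
As a toy illustration of what "reducing the search space" means here: a lower bound removes candidate versions of a package, and the number of combinations the solver may have to consider shrinks accordingly. This is not how conda's SAT-based solver works internally, and the version catalogue below is invented for the example.

```python
from math import prod


def parse_version(text):
    """Turn '3.6' into (3, 6) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def candidates(available, minimum):
    """Keep the versions in `available` that are >= `minimum`."""
    floor = parse_version(minimum)
    return [v for v in available if parse_version(v) >= floor]


def search_space_size(candidate_lists):
    """Upper bound on combinations: product of per-package candidate counts."""
    return prod(len(c) for c in candidate_lists)


# Hypothetical catalogue, for illustration only (not real channel contents).
available = {
    "python": ["2.7", "3.5", "3.6", "3.7", "3.8", "3.9"],
    "numpy": ["1.16", "1.19", "1.20"],
}
unpinned = search_space_size(available.values())
pinned = search_space_size(
    [candidates(available["python"], "3.6"), available["numpy"]]
)
print(unpinned, pinned)  # 18 12
```

With dozens of dependencies, each with many builds per Python version, the same effect is multiplied many times over.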

yeah, that makes sense - but why is it SO important that it makes conda go mental if it's in the deps list, and makes the build hang if it's not? BTW, my latest experiment: I tried to see if an environment with the esmvaltool-python dependencies from meta.yaml gets built. I created an environment file with all those deps (including python>=3.6) and... it bloody solves and gets created. How is conda env create different at env solving from the solver inside conda-build? This smells like a 100% conda bug

here is the list of the packages I have in that env I created from the list of deps from the build file, this could be useful if we start pinning stuff, but note this is done with Miniconda3 for python39
pkgslist.txt

not gonna matter too much anyway since I tried the pinning game inside meta.yaml and it didn't work. The only thing that got me past the env solving was removing python as a dependency, but that makes the thing hang on Circle. Stumped. Calling it a day, need pizza :pizza:

@jvegasbsc could you please also look into this - we need this to be fixed before release, and it's better if more eyes look at it. I've honestly got no idea just yet :beer:

I'm on it

I cannot get even a hint as to why it is failing to resolve. Conda's output, even with the --debug option, does not provide enough info

yeah - the env solves well if from an environment file (see https://github.com/ESMValGroup/ESMValTool/issues/2016#issuecomment-777736240 ) but fails at env solving inside the conda build process - well, it doesn't fail, it just goes into an infinite loop, eating up more and more memory. Having python removed from the deps list in meta.yaml is the secret, but that makes the build hang later, after the env has been solved successfully

I'll try downgrading conda-build and see what we get

I think we are getting more packages from conda in the package build than with the environment, as we rely on pip quite a bit in the latter case

downgrading conda-build to 3.21.3 and making sure all conda-* components are from the main channel is not solving the issue. I am quite sure it's got to do with Python 3.9 and one of the deps that is in conflict with it - that's why removing python from the dep list in meta.yaml makes the env solve fine. And yeah, you're right Javi, that offending dep is probably from pip in the build environment and from conda-forge in the env created via an env file. I'll have a look at a regular build env and compare it with the list I posted above
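
Comparing two environments by eye is tedious, so here is a minimal sketch of a diff helper. It assumes both package lists are in the `name=version=build` form that `conda list --export` prints (the format of the attached pkgslist.txt is not known); the function names are made up for illustration.

```python
def parse_export(text):
    """Parse `name=version=build` lines into {name: (version, build)}."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        # Pad so that lines with a missing build string still unpack.
        name, version, build = (line.split("=") + ["", ""])[:3]
        packages[name] = (version, build)
    return packages


def diff_environments(left, right):
    """Return (only_left, only_right, changed) between two parsed envs."""
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    changed = {
        name: (left[name], right[name])
        for name in sorted(set(left) & set(right))
        if left[name] != right[name]
    }
    return only_left, only_right, changed
```

As far as I know, packages installed through pip show up with a `pypi_0` build string in such lists, so filtering the result on that value would be one way to spot dependencies that come from pip in one environment and from conda-forge in the other.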

I managed to build successfully on CircleCI by setting - python>=3.8 in the run section of the esmvaltool package: https://app.circleci.com/pipelines/github/ESMValGroup/ESMValTool?branch=conda-build-test
Maybe that could provide a hint?

I looked at the conda solver some time ago and I think I remember that packages that are specified directly are treated differently than their dependencies. If I understood it correctly, for the specified direct dependencies it tries a bit harder to get the latest version, while for the rest it tries to find something that works.

very very good @bouweandela - have you tried running this on your laptop? I'd be curious to know what the actual memory consumption is. I tried this exact pin on my machine yesterday and quickly ran out of memory, but given that my available mem is only of order 2G, it may not be enough for me yet good enough for the CI machine and GitHub Actions (the ones we care about)

I just tried again on me laptop and went 1.8G into swap (meaning I've used my 2 odd gigs of free RAM and entered another 1.8G in swap) - but that's with the py39 miniconda, but if that works on the remote machines - go for it!

It was about 1.6 GB on my own machine and the same on CircleCI. The only issue with using this would be that we would like to build the package for Python versions 3.6-3.9, not just 3.8 and up.
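
For comparing memory figures between machines, a minimal sketch (Unix only) that runs a command and reports the peak resident memory of its child processes. The helper name is hypothetical, and the number is the peak of the single largest child process, not a sum over all processes the build spawns.

```python
import resource
import subprocess
import sys


def peak_child_memory_mib(cmd):
    """Run cmd and return (exit_code, peak resident set size in MiB).

    Uses RUSAGE_CHILDREN, so the figure is the largest peak among all child
    processes this Python process has waited for so far, not only `cmd`.
    """
    completed = subprocess.run(cmd, check=False)
    max_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, max_rss / divisor


# Hypothetical usage:
# code, mib = peak_child_memory_mib(
#     ["conda", "build", "package", "-c", "conda-forge", "-c", "esmvalgroup"]
# )
# print(f"exit code {code}, peak {mib:.0f} MiB")
```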

I do not think that we need to worry too much about building the package for the older Python versions. We expect users to install the esmvaltool conda package in its own isolated environment and, if they don't, Python compatibility is likely to be the least of their dependency issues.

Anyway, 3.6 reaches end of life this year, so we should start thinking about dropping support for it

I agree with Javi, but the problem is that if the user has Python<3.8 in their anaconda environment and they want to keep it that way, the installation of a package built with Python>=3.8 will not work. This is probably going to happen in 1% of the cases, but we can't safely say we support Python=3.6 and =3.7 in this case

we can't safely say we support Python=3.6 and =3.7 in this case

We can be more explicit and say that our conda package only supports python >=3.8, but ESMValTool can be installed in >= 3.6 from source

I would be fine with dropping Python 3.6 support, but I think it's too early to already consider dropping support for Python 3.7.
