Ray: Create conda-forge packages for Ray

Created on 22 May 2017 · 23 comments · Source: ray-project/ray

To make conda install ray -c conda-forge work.

Someone (maybe me?) should create a pull request into https://github.com/conda-forge/staged-recipes to get things started. Any pitfalls to be aware of?

P2 enhancement

All 23 comments

I've never really used conda install so I don't know what's involved. Let me look into this more.

This issue is a little bit old but here is the recipe: https://github.com/barrachri/staged-recipes/blob/master/recipes/ray/meta.yaml

It's my first conda-forge recipe, so I am not sure about the build and run requirements (I used the ones inside setup.py).
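For readers unfamiliar with conda-forge recipes, the skeleton of such a meta.yaml looks roughly like the sketch below. This is not the actual recipe linked above; the version, URL, and checksum are placeholders, and ray's real build involves bazel and vendored C++ dependencies, which is exactly why a simple skeleton like this turned out to be insufficient.

```yaml
# Sketch of a conda-forge recipe skeleton (placeholders, not the real recipe).
package:
  name: ray
  version: "0.7.0"            # placeholder version

source:
  url: https://pypi.io/packages/source/r/ray/ray-0.7.0.tar.gz
  sha256: <placeholder>        # real recipes must pin the tarball checksum

requirements:
  host:
    - python
    - pip
  run:
    - python
    # run requirements taken from setup.py, as described above

test:
  imports:
    - ray
```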

cc @xhochy

It seems the build process is a bit too complicated to integrate cleanly into a conda recipe (https://github.com/barrachri/staged-recipes/commit/c1cf6dde6a429ae0706e038d8e03c8b348956dd1#commitcomment-25555553).

How can we simplify it?

For example, is arrow built externally for a specific reason?

Wanted to bump this as well. How does one build this with external arrow? (and other deps)

E.g., there are pyarrow, arrow-cpp, and parquet-cpp conda packages in conda-forge.

There's also boost and flatbuffers in conda-forge -- would be nice to be able to point at them too.

@aldanor currently we don't build with external arrow (and it may not be realistic in the short term given how rapidly the APIs we use in Arrow are changing, so we really rely on specific Arrow commits).

As for boost and flatbuffers, that could make sense.

I may not have a chance to look into this for a while, but if you have some expertise on using conda-forge, would you be interested in giving it a try? Or pointers to other projects that use boost/flatbuffers from conda-forge would be helpful.

I took a closer look at this, and the current build system is a little tricky to manage with conda.

For example, it'd be much easier (as @aldanor mentioned) to use conda packages for boost or redis. In the case of redis, its Makefile assumes some relative paths that make the conda package builder upset.

I suppose once the dependencies themselves have stabilized a bit (or the changes ray requires have been upstreamed), conda might be an attractive option. Until then, we'd have to maintain multiple conda packages, which is probably more trouble than it's worth.

Any updates on allowing conda install?

Someone created a conda recipe that didn't get merged yet - https://github.com/conda-forge/staged-recipes/pull/8163

Someone created a conda recipe that didn't get merged yet - conda-forge/staged-recipes#8163

Main reason: The build fails.

Otherwise, it would be nice if the conda package didn't vendor pyarrow the way the PyPI package does(?).

To add on to what @odelalleau mentioned in https://github.com/ray-project/ray/issues/3377#issuecomment-571669913: because conda installs non-Python dependencies on its own, packages can still work on powerful-but-quirky machines. I've gotten time on some high-powered machines only to find that I couldn't get some packages to work when installed via pip (the required libraries were missing, in the wrong place, misconfigured, or something), but conda installing the same packages worked fine, since conda just ignored the environment and downloaded its own copies of the non-Python dependencies.

More importantly, some packages simply cannot be hosted on PyPI. It's great that ray can be (and it looks like you've done a lot of work to make that possible, which is appreciated), but e.g. graph-tool can only be hosted on conda-forge, due to its thicket of non-Python dependencies.
https://git.skewed.de/count0/graph-tool/-/wikis/installation-instructions#manual-compilation

Anyone making a package that relies on a package like graph-tool must also use conda-forge, because PyPI doesn't have the same power for handling non-Python dependencies.
Because of that, many packages (such as numpy) are mirrored between PyPI and conda-forge, so that downstream packages can depend on both numpy and graph-tool.
It would be very useful to have ray mirrored in the same way. (I'm currently working on a package that requires both ray and graph-tool and will therefore have to be distributed via conda-forge because it cannot be handled by PyPI.)

The build logs from https://github.com/conda-forge/staged-recipes/pull/8163 are no longer viewable, so I created a new attempt: https://github.com/conda-forge/staged-recipes/pull/10637

I'm currently trying to follow the instructions at https://ray.readthedocs.io/en/latest/installation.html#building-ray-from-source, for lack of instructions on how to build the PyPI package.
While I am very much not an expert on conda, I have the vague impression that anything that can be uploaded to PyPI can also be uploaded to conda-forge in the same way, so if there are instructions anywhere for making and uploading the PyPI package, that might be a better starting point; I don't know.

I'm currently failing on:

    /home/conda/staged-recipes/build_artifacts/ray_1579456441783/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/bin/bazel: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /home/conda/staged-recipes/build_artifacts/ray_1579456441783/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/bin/bazel)

I conda installed libgcc like it said in https://ray.readthedocs.io/en/latest/installation.html#dependencies, but ../build.sh doesn't seem to be using the conda-installed version, not sure why.
It has GLIBCXX_3.4.13, but not GLIBCXX_3.4.14. I think (?) GLIBCXX_3.4.14 is the minimum required for ray.

$ /sbin/ldconfig -p | grep stdc++
    libstdc++.so.6 (libc6,x86-64) => /usr/lib64/libstdc++.so.6
$ strings /usr/lib64/libstdc++.so.6 | grep LIBCXX
    GLIBCXX_3.4
    GLIBCXX_3.4.1
    GLIBCXX_3.4.2
    GLIBCXX_3.4.3
    GLIBCXX_3.4.4
    GLIBCXX_3.4.5
    GLIBCXX_3.4.6
    GLIBCXX_3.4.7
    GLIBCXX_3.4.8
    GLIBCXX_3.4.9
    GLIBCXX_3.4.10
    GLIBCXX_3.4.11
    GLIBCXX_3.4.12
    GLIBCXX_3.4.13

Actually, something's going wrong there. libgcc as installed by Anaconda should have the newer versions.

$ ~/anaconda/bin/conda install --name tmpEnv libgcc
    The following NEW packages will be INSTALLED:
        libgcc: 7.2.0-h69d50b8_2
$ strings ~/anaconda/envs/tmpEnv/lib/libstdc++.so | grep LIBCXX
    GLIBCXX_3.4
    GLIBCXX_3.4.1
    GLIBCXX_3.4.2
    GLIBCXX_3.4.3
    GLIBCXX_3.4.4
    GLIBCXX_3.4.5
    GLIBCXX_3.4.6
    GLIBCXX_3.4.7
    GLIBCXX_3.4.8
    GLIBCXX_3.4.9
    GLIBCXX_3.4.10
    GLIBCXX_3.4.11
    GLIBCXX_3.4.12
    GLIBCXX_3.4.13
    GLIBCXX_3.4.14
    GLIBCXX_3.4.15
    GLIBCXX_3.4.16
    GLIBCXX_3.4.17
    GLIBCXX_3.4.18
    GLIBCXX_3.4.19
    GLIBCXX_3.4.20
    GLIBCXX_3.4.21
    GLIBCXX_3.4.22
    GLIBCXX_3.4.23
    GLIBCXX_3.4.24
    GLIBCXX_3.4.25
    GLIBCXX_3.4.26

But bazel isn't finding the libgcc installed by conda, I guess.
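One possible diagnostic (an assumption on my part, not something verified in this thread): explicitly prepend the conda environment's lib directory to the runtime linker search path before invoking the build, so bazel resolves conda's newer libstdc++ instead of the one in /usr/lib64. The CONDA_LIB variable name here is hypothetical; CONDA_PREFIX is the variable conda activate normally sets.

```shell
#!/bin/sh
# Sketch: make the runtime linker look in the conda env's lib dir first.
# CONDA_LIB is a hypothetical helper name; the tmpEnv fallback path is the
# one from the comment above.
CONDA_LIB="${CONDA_PREFIX:-$HOME/anaconda/envs/tmpEnv}/lib"
export LD_LIBRARY_PATH="$CONDA_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"   # check that the conda lib dir now comes first
```

Whether this is appropriate inside a conda-build environment is another question (conda-build normally manages these paths itself), so treat it as a diagnostic step rather than a fix.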

Incidentally, anyone holler if they have preferences for a conda-forge name.
https://github.com/conda-forge/staged-recipes/pull/10637#discussion_r368317883

@dHannasch the procedure for building Python wheels for PyPI is documented in https://github.com/ray-project/ray/blob/master/python/README-building-wheels.md.

the procedure for building Python wheels for PyPI is documented in https://github.com/ray-project/ray/blob/master/python/README-building-wheels.md

Ah, too bad. I don't think we can use Docker.

I've started a new attempt at this in conda-forge/staged-recipes#11160.

It's going to be a mammoth undertaking. Unravelling the bazel build script and building all required dependencies is going to take a while. Any feedback (or help) welcome!

@h-vetinari out of curiosity, what's reason for needing conda support? It sounds great, just want to understand some of the motivation behind the request!

@h-vetinari out of curiosity, what's reason for needing conda support? It sounds great, just want to understand some of the motivation behind the request!

@anabranch one reason I mentioned in https://github.com/ray-project/ray/issues/3377#issuecomment-571669913 is that conda dependencies like NumPy can be better optimized than their pip counterparts.

@odelalleau thanks! Quick follow-up: do you mean during install (faster install), or that the installation better matches the system and therefore gives better performance when actually run? Not familiar with the intricacies, so please excuse my likely stupid question :)

@anabranch I mean better performance after installation. It will depend on your machine, but I remember testing back when I made that comment and I had better runtime performance with conda install numpy vs pip install numpy.

@anabranch: @h-vetinari out of curiosity, what's reason for needing conda support? It sounds great, just want to understand some of the motivation behind the request!

I'm a conda user, plus I help with packaging (on a purely volunteer basis). That being said, conda provides a very comfortable user experience: it works cross-language, packages are pre-compiled (with ecosystem-wide compatibility across packages, per platform), and it does much better dependency management than, say, pip. It's also fundamentally better suited (IMO) for complex projects such as arrow (with complicated C++ requirements, or newer compiler requirements than e.g. manylinuxX can offer), or for GPU-enabled libraries. This is due to the huge amount of work that has gone into the compiler/packaging ecosystem (you can install a working C++ compiler anywhere with one line of code).

The price for all of that (aside from the infrastructure/framework) is that the respective libraries have to be packaged, but this only has to be done once, by the "recipe" maintainers (which is where I volunteer), and not by every user themselves. The maturation of the build system has allowed conda-forge to add more platforms (e.g. aarch64/ppc64le) and even runtimes (PyPy). Like any tool, there is an initial learning curve, but for me personally it has more than paid off (hence contributing something back).

TL;DR: Packaging ray in conda would allow me to install it with much less hassle, on any (standard) platform I choose, across all modern python versions.

PS. I'm not requesting anything - this issue has existed for a while. I've just picked it up again and wanted to notify interested parties here.

Yeah, I know it's not a request, and I'm super appreciative of the feedback. This makes perfect sense, and thanks so much for your contribution! I guess I had never really sat down and thought about why one would make a conda package, and this is a perfect explanation - thank you for taking the time to share it!

No problem. :)

Another thing I forgot to mention, which IMO is a good illustration of the capabilities they have built up: you can switch your BLAS implementation (across all your installed packages) by changing one line in your requirements file. Granted, not every BLAS-dependent library on every platform has builds for all BLAS variants (currently mkl, netlib, openblas, blis) yet, but doing that manually without breaking your environment would normally be an extremely involved undertaking.
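To the best of my knowledge, the one-line switch referred to here is conda-forge's documented BLAS metapackage mechanism: pinning the build string of libblas selects the implementation environment-wide. A sketch of such a pin in an environment file:

```yaml
# environment.yml fragment (sketch): select the BLAS backend env-wide
dependencies:
  - libblas=*=*openblas   # or *mkl, *netlib, *blis
```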

Thanks for humoring me with the details! I had glanced at the conda-forge site and there wasn't a good explanation of the "why" - your comment makes it super clear why someone would want to pursue this, and it makes a ton of sense. Ping me if there are bits of information about the project you'd like me to hunt down; I'm pretty new to it, but happy to help if there are details you're having trouble chasing down.

I am just starting to use ray and found it very impressive. However, the lack of conda packaging makes installation much less convenient as mixing conda and pip installs can cause all sorts of headaches.

It looks like the main two efforts here are:

https://github.com/conda-forge/staged-recipes/pull/11160
https://github.com/IBM/powerai/blob/master/conda-recipes/ray-feedstock/recipe/meta.yaml

The IBM link seems to have something working already; it sounds like we just need to port it to conda-forge.
