Conda-forge.github.io: Should we use gcc from the default channel for Linux (and maybe OS X)?

Created on 17 Feb 2016 · 81 Comments · Source: conda-forge/conda-forge.github.io

Most of the time we are OK using the compilers installed in the CIs because we all have similar build tools pre-installed in our machines. However, every now and then someone tries to use the packages in a docker image without those tools. (For example https://github.com/ioos/conda-recipes/issues/723 and https://github.com/ioos/conda-recipes/issues/700.) A few questions:

  • Will that be fixed using the gcc from the default channel?
  • What would be the downside of that?
  • How about OS X? Are we relying on clang or homebrew gcc? Or does it not matter?

Most helpful comment

@msarahan: Also, if you do decide to experiment with bumping Anaconda's minimum required version to RH6, then I at least will be extremely interested in what you find :-). This transition is going to happen soon-ish one way or another, since RH5's final EOL is <12 months away now (see also a RH engineer commenting on this here: https://github.com/pypa/manylinux/issues/46#issuecomment-206702714 -- "EL5 overall is now well into its wind-down days and we are working with folks still running it to move off").

(I guess in your position I would also explore the viability of convincing the dynd folks to cut it out with this whole "writing software that can't be distributed" thing...)

All 81 comments

I have no idea about most of this, but:

We should build with the same toolchain as Anaconda is built with as much as possible. Which is clang on OS-X, I think.

And the "manylinux" folks have been working on a Docker image for building manylinux wheels, which is derived from the Anaconda experience -- so that might be a good place to go for Linux:

https://github.com/pypa/manylinux

I'm pretty happy with the reach of our existing binaries. @ocefpaf - I know there is no time like the present to get this right, but I don't really have any experience of it going wrong. My hunch therefore would be to stick with what we have until we find a problem with it. :+1: / :-1:?

👍

I'm pretty happy with the reach of our existing binaries. @ocefpaf - I
know there is no time like the present to get this right, but I don't
really have any experience of it going wrong.

Well, it is not a matter of right and wrong. I am pretty happy too. The
issue arises when people use the conda package in minimalistic docker
images.

My hunch therefore would be to stick with what we have until we find a
problem with it. :+1: / :-1:?

+1 let's just document that people should install _build_essentials_ and
etc.

let's just document that people should install _build_essentials_ and

The issue arises when people use the conda package in minimalistic docker images.

Ah OK. I've not seen these. I'm happy to tighten that requirement down somewhat - it sounds like quite a big ask to install build_essentials...

My bad, I am on the phone and the # refs above should point to the ioos conda recipe repo.

Ah OK. I've not seen these. I'm happy to tighten that requirement down
somewhat - it sounds like quite a big ask to install build_essentials...

build_essentials was a lazy solution from my part. Some cases need only
libgomp, others libgfortran.

In the few cases where I have had issues elsewhere, I find I can use install_name_tool or patchelf to link things to something like libgcc from conda to resolve these sorts of issues. A little inelegant I suppose, but I do like using the system compilers if I can.

So, this ( https://github.com/conda-forge/staged-recipes/pull/164 ) might be such a case where we would want to use conda's gcc.

Here's what I understand:

If you ship libgcc (more importantly, libstdc++, which comes with it) and shadow the system libstdc++, and the system libstdc++ is newer than the one you ship, you'll run into unresolved symbol errors at runtime and crash or fail to run.

This has been a huge motivator for me to get GCC 5.2 running in our docker build image.

I have argued very strongly internally against using the gcc that is in defaults. My main argument against even having this package is that people will use it on unknown platforms - and this means their packages will have an unknown version dependency on GLibC.

IMHO, Continuum should just ship all the runtimes, the same way we do with Windows. They are much more nicely backwards/forwards compatible on Linux, but I don't see harm in keeping them controlled on Linux.

I think an argument can also be made that you should ship no gcc, libstdc++, and similar runtimes and instead always depend on the system provided ones. This seems to be what the manylinux folks are doing with wheel files. I'm not sure which option is better but I think both should be on the table.

One of the other ideas I was playing with in that PR is bundling only a few essential components like libgfortran or libgomp from the VMs we are building in. Things that may not already be included on the system, but we are (or will be) linking against. I am just worried they will get crushed when someone installs defaults' libgcc package and am unclear on if (when) that leads to bad behavior. Also, I know a little less about how these fringe components interact with libgcc, which they are all linked to.

Alternatively static linkage remains a valid option here.

I have run into issues where a Fortran compiled extension linked against symbols in my system provided libgfortran that were not in the Anaconda provided one which caused the extension to fail to import. Using conda uninstall -f libgfortran fixed the issue but it is not ideal.

If runtimes are shipped on Linux it seems they must be the most up-to-date versions. Keeping these up to date may require significant maintenance.

Yeah, I am liking the static option more and more.

I'm not clear on how the Manylinux stuff works to depend on libstdc++ on the system. I'm sure they have something figured out, but I just don't understand it.

This is the article that convinced me to pursue the approach I'm behind: http://www.crankuptheamps.com/blog/posts/2014/03/04/Break-The-Chains-of-Version-Dependency/

Note that this is the same approach taken by the Julia team.

Found it. They place tight restrictions on ABI version:

Therefore, as a consequence of requirement (b), any wheel that depends on versioned symbols from the above shared libraries may depend only on symbols with the following versions:

GLIBC <= 2.5
CXXABI <= 3.4.8
GLIBCXX <= 3.4.9
GCC <= 4.2.0
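
To make the restriction concrete, here is a minimal sketch that checks a list of symbol version tags against the four limits quoted above. The function names and the example tags are hypothetical, and how the tags get extracted from a binary is left out. This is not how the manylinux tooling itself is implemented.

```python
import re

# Upper bounds quoted above from the manylinux policy.
MAX_VERSIONS = {
    "GLIBC": (2, 5),
    "CXXABI": (3, 4, 8),
    "GLIBCXX": (3, 4, 9),
    "GCC": (4, 2, 0),
}

_VERSION_RE = re.compile(r"^(GLIBC|CXXABI|GLIBCXX|GCC)_(\d+(?:\.\d+)*)$")


def parse_symbol_version(tag):
    """Split a tag such as 'GLIBCXX_3.4.9' into ('GLIBCXX', (3, 4, 9)).

    Returns None for tags outside the four versioned families.
    """
    match = _VERSION_RE.match(tag)
    if match is None:
        return None
    name, dotted = match.groups()
    return name, tuple(int(part) for part in dotted.split("."))


def violations(tags):
    """Return the tags whose version exceeds the limit for their family."""
    bad = []
    for tag in tags:
        parsed = parse_symbol_version(tag)
        if parsed is None:
            continue  # not a versioned symbol family covered by the policy
        name, version = parsed
        if version > MAX_VERSIONS[name]:
            bad.append(tag)
    return bad
```

A binary whose tags produce an empty list from `violations` stays within the quoted limits; anything returned needs a newer runtime than the policy allows.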

Yup, from my understanding they are defining a base linux system that has a set of "core" libraries which they expect to 1) exist and 2) match a minimum version. But pip does not have an effective method for providing more up-to-date runtimes like conda does.

I'm warming more to the idea of providing the latest runtimes. Would this allow us to compile packages with the GCC 5 libstdc++ ABI and run them on systems using the GCC 4 ABI?

I feel like Conda has a better approach here, making the assumption that we should provide it. People can conceivably pip install something without having libstdc++ installed, and end up confused. My wife had that happen with Steam on her Linux computer, for example. Good times. I never thought work would be so useful at home.

FWIW, I'm pretty sure Continuum is taking this route, and you can be certain that it will be maintained as long as we're pushing it, because we'll have customers screaming otherwise.

@jjhelmus yes. Here's my understanding with GCC5:

Compiled with GCC5, CXXFLAGS="${CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0" => GCC4 compatible, runs fine with libstdc++ from gcc 5 (it is dual-abi). Does not link with libs compiled with GCC5 (abi 5)

Compiled with GCC5, CXXFLAGS="${CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=1" => GCC5 compatible, runs fine with libstdc++ from gcc 5 (it is dual-abi). Does not link with libs compiled with GCC4.

Continuum is planning on the former setting for now, with a planned switch at some point in the future, along with an associated rebuild of (maybe) everything.
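
As a rough illustration of the two settings: to my knowledge, libstdc++ places the new-ABI `std::string` and `std::list` in an inline namespace `std::__cxx11`, so symbols built with `_GLIBCXX_USE_CXX11_ABI=1` tend to carry `__cxx11` in their mangled names. The sketch below only classifies symbol names handed to it as strings; the helper names are hypothetical, and obtaining the symbol list (for example from a symbol-dumping tool) is assumed to happen elsewhere.

```python
def abi_define(new_abi):
    """Return the preprocessor define selecting the libstdc++ ABI."""
    return "-D_GLIBCXX_USE_CXX11_ABI={}".format(1 if new_abi else 0)


def uses_cxx11_abi(symbol_names):
    """Return True if any symbol name refers to the std::__cxx11 namespace.

    This is a heuristic: a library that never exposes std::string or
    std::list in its interface shows no such symbols under either setting.
    """
    return any("__cxx11" in name for name in symbol_names)
```

Because of that caveat, a `False` result does not prove a library was built with the old ABI, only that nothing in the inspected symbols depends on the new one.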

I have tried to make that ABI info readily accessible with startup scripts in the build docker image: https://github.com/ContinuumIO/docker-images/pull/20/files#diff-8320ce46adf2819c0900060bd6c14c43R16

(also see the start_c++??.sh scripts, which are meant to be simple front-ends)

Continuum is planning on the former setting for now...

Alright, this clarifies the Linux stuff for me.

...with a planned switch at some point in the future, along with an associated rebuild of (maybe) everything.

That's going to be fun. Hopefully, conda-forge has everything and is super fast then. :smile:

I have tried to make that ABI info readily accessible with startup scripts in the build docker image...

Thanks. This is really useful.

I'm on board too. Thanks for the great explanation @msarahan. It took a bit but I'm seeing the light. Of course now I'm going to have to build GCC 5 tonight.

Sorry for the long tangent on this PR @jakirkham, did this answer your original concern?

This is what I am still unclear about, are we shipping gcc on Mac too?

My current opinion is yes. I'd like to avoid it if possible, but I see the need for OpenMP and Fortran. I'll keep you all in the loop on any discussions we have here.

Ok. With OpenMP, maybe we can get around it by doing something similar to the Linux strategy namely building the newest clang on our oldest Mac (10.7). Though Fortran remains a different problem.

Thanks for being receptive, both of you! Now let's go rule the world! (or maybe just build great software)

Thanks for keeping us in the loop.

Now let's go rule the world! (or maybe just build great software)

Are they mutually exclusive? :smiling_imp:

So, we might be able to pursue a similar approach as with gcc on Linux, but using clang on Mac, as Apple makes clang's source available. For instance, here is the most recent version of clang for Mac. Based on the llvm version (3.7) in the code, this should support OpenMP. If we build this on 10.7 and/or fix the min framework to 10.7, maybe we could use this to build code that needs OpenMP. As it is the Mac system compiler, it should still support all the special arguments that the actual system compiler would. By using this, it would keep us free from the gcc mess.

Unfortunately, fortran always needs to go through gfortran or some other compiler. I don't believe there is any actively maintained clang frontend for fortran that is stable. DragonEgg was the closest thing, but that has been unmaintained for ~1.5 yrs. However, it might be ok to partition fortran stuff into a special box using gcc. Perhaps we could even use a version from gcc 4.2 so it remains compatible with the old Mac gcc compiler. This way we only would ship libgfortran.dylib with things and not need to ship the other gcc libraries.

Another interim/partial solution for the Linux compiler problems (mainly missing C++11 support) might be just to add another CentOS repo for 6.x that has an acceptable version of gcc for us.

My past experience doing this has been pretty good. I don't find myself ever needing to distribute any libgcc package, even though I dynamically link to the compiler's libraries. This includes situations where the system compiler and libraries are older. I have installed conda packages built this way in very minimal Docker containers and not run into any issues using them. Also, I have installed these on most recent Linux systems and they have worked quite well. There is no need to worry about how the compiler is built as it is designed to integrate smoothly into the existing OS. Plus, the installation is fast. So, it will easily be included in any existing Docker Hub build that we are doing.

According to current information, CentOS provides a copy of gcc 4.9.1 as can be seen in their listing. This will, of course, require adding the appropriate SCL (devtoolset-3). This is not 5.x, but it is pretty new and will have full C++11 support and some C++14 support. Thus it will be easily sufficient in cases where programs expect C++0x support, which seems to be an increasingly common case. Though it appears SCL (devtoolset-4) does have gcc 5.2.1, which supports all of C++11 and C++14. So, it is possible to use that, as well.

In short, I think installing a newer gcc from another CentOS 6 repo would be a step in the right direction (even if it is not the final step). It would provide the functionality that we need from the compiler without otherwise hindering our packaging process. Thoughts?

Just for fun, I put a very simple mock up of this in a Docker container that is hosted on Docker Hub. It took ~5min to build and push. It has a link back to the code (again very simple). I added the gcc 5.2.1 compiler from devtoolset-4 and set it up so that at container startup one immediately has access to this.

To take it for a spin, I tried building a simple program that used C++11 features (shown below) with this compiler.

// hello.cpp

#include <iostream>
#include <functional>

int main()
{
    std::function<void()> f = [](){ std::cout << "Hello World" << std::endl; };
    f();
    return(0);
}

If I build this program ( g++ --std=c++14 hello.cpp ), the a.out file can be easily run (just prints Hello World), but it can also have its linkages inspected as shown below. Note these are all pointing to standard system libraries. Also, note that the libraries associated with our compiler live in this path ( /opt/rh/devtoolset-4/root/usr/lib/gcc/x86_64-redhat-linux/5.2.1/ ).

$ ldd a.out 
    linux-vdso.so.1 =>  (0x00007fff731e8000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f8313358000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f83130d4000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8312ebd000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f8312b29000)
    /lib64/ld-linux-x86-64.so.2 (0x00005570af8f0000)
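
The same check can be scripted. Below is a minimal sketch that parses `ldd`-style text and reports any library resolved under a given prefix, such as the devtoolset directory. The function names are hypothetical, and the sample text is the output shown above.

```python
SAMPLE_LDD_OUTPUT = """\
    linux-vdso.so.1 =>  (0x00007fff731e8000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f8313358000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f83130d4000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8312ebd000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f8312b29000)
    /lib64/ld-linux-x86-64.so.2 (0x00005570af8f0000)
"""


def parse_ldd(output):
    """Map each library name to its resolved path, or None if it has none."""
    resolved = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            name, _, rest = line.partition("=>")
            # Drop the trailing load address, e.g. "(0x00007f...)".
            target = rest.split("(")[0].strip()
            if target in ("", "not found"):
                target = None
            resolved[name.strip()] = target
        else:
            # Entries such as the dynamic loader appear as a bare path.
            path = line.split("(")[0].strip()
            resolved[path] = path
    return resolved


def libs_under(resolved, prefix):
    """Return the names of libraries whose resolved path starts with prefix."""
    return sorted(
        name for name, path in resolved.items()
        if path is not None and path.startswith(prefix)
    )
```

Calling `libs_under(parse_ldd(SAMPLE_LDD_OUTPUT), "/opt/rh/devtoolset-4")` on the output above yields an empty list, which matches the observation that nothing links to the compiler's own directory.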

If I get really creative, I can export the a.out file (using Docker's mounting features) and load it into a vanilla CentOS 6.6 Docker container. The program still runs just the same. No errors.

What is the take away message here? We can use the gcc provided with devtoolset with any version we want. All compiled programs will be linked in a way where it uses standard system libraries so there is nothing for us to ship (from the compiler). As it is CentOS 6, these system libraries will be really old and probably older than any other Linux distro this program will be installed on. Thus we need not worry about portability in the same way we did with the gcc package yet we retain the benefits.

If there is something I am missing or some problem you see here, please feel free to share. I am sure I can learn more about this too. :smile:

This is a bit of a tangent about OpenMP and Fortran that doesn't really pertain to the stuff above. Though, I did try similar things with OpenMP and Fortran, which resulted in binaries linked against the system versions of these libraries. However, as many systems do not have OpenMP and Fortran libraries pre-installed, we can't ship something that will work out of the box. Still this situation is no worse than any of the other compiler solutions available. In other words, how can we let a user install packages built with Fortran and/or OpenMP support without an expectation that they have these?

The case of libgfortran is pretty straightforward if we statically link to it. This avoids the user having to install our libgfortran, which anyone who has been following issues on conda and conda build is aware causes countless problems. All we need to do is set a single flag. It is just a matter of getting the make build to recognize this option.

As for OpenMP, the situation is unfortunately not as clear cut. It is not recommended to statically link with OpenMP using gcc. Some have found distributing packages built with OpenMP so problematic that they have dropped using OpenMP altogether (though they mention Windows, MOSEK is cross-platform, and I am sure they have seen this issue on Linux). Where possible, taking this strategy should work: namely, prefer POSIX threads to OpenMP. This way we can guarantee the needed libraries will be available. However, if that is not an option and OpenMP support is the only threading option, I am not sure what the right answer here is. If we tried to statically link against libgomp (GNU OpenMP) we would also have to statically link against libpthread (POSIX threads). Statically linking to libpthread just sounds like a bad idea to me and I am not alone. So, maybe we could distribute the system libgomp. It will still be tied to an old version of glibc so should work on older systems. I suppose this could apply for libgfortran if we really need to, but I don't think it is necessary in that case.

Honestly, I'm not really sure of the right way forward with OpenMP. If we did package it, I would want to make sure we don't trample on the user's system. In other words, we should check to see if OpenMP support is already present. If it is, we simply don't install our packaged version of OpenMP. Only if there was no copy of OpenMP already, should we try to install ours.
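
A minimal sketch of that "only install ours if the system has none" check, assuming Python's standard `ctypes.util.find_library` is an acceptable probe. The helper names are hypothetical, and conda has no such conditional-install hook as far as this thread describes; this only shows the decision itself.

```python
import ctypes.util


def system_has_library(short_name):
    """Return True if the platform's library search finds the library.

    short_name is given without the 'lib' prefix or suffix, e.g. 'gomp'
    for libgomp. The search covers system locations only, not a conda
    environment prefix.
    """
    return ctypes.util.find_library(short_name) is not None


def should_install_bundled(short_name):
    """Install the packaged copy only when the system provides none."""
    return not system_has_library(short_name)
```

Note that this says nothing about whether the system copy is new enough, which is the harder half of the problem discussed above.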

Does anyone have better ideas on the OpenMP problem? Thoughts on the Fortran problem?

@jakirkham this is good information - especially about GCC 5.2 in devtoolset 4 - but why are you doing it? I don't really see this as an intermediate, more as something completely different.

I think we were already shipping libgfortran and libgomp in the libgcc package (but those need to be split out). Is that not a good idea? Should we reconsider? I think you'll still end up needing to ship libstdc++.so - your example is a good test, but I think it's likely that gcc was able to emit all ABI 4-compatible code here. I don't think that is always the case. Compile with -Wabi=2 to have it warn you about incompatible stuff.

The troubles that people had with conda and libgfortran came down to some sloppy metadata. We updated scipy to build with GCC 4.4 recently because 4.1 was causing issues. The version spec for libgfortran was blank - so users were not necessarily getting the latest version. To make things worse, since the libgcc and libgfortran packages both include libgfortran, there was an ambiguity in which one might clobber the other (primarily just the symlink, since the so versions were different). By both specifying libgfortran specifically, and by applying the appropriate version specifier, I don't think there would be issues.

Let's talk about this in tomorrow's call. I appreciate that we need to do something about a C++11 compiler, and I want to make sure that we're not spending effort now that will end up wasted.

Thanks @msarahan for giving this some thought.

this is good information - especially about GCC 5.2 in devtoolset 4 - but why are you doing it? I don't really see this as an intermediate, more as something completely different.

Not sure which question to answer here. :smile:

I thought of this as a potential intermediate solution for the following reasons:

  • CentOS 5 has been discussed as something we may use in the future (might be able to use a devtoolset there).
  • We discussed using the newest gcc compiler on CentOS 5. (having CentOS 5 support and the latest compiler through a devtoolset looks to be incompatible)
  • Using the pre-packaged gcc has some known downsides and this takes the first step away from that.
  • It fits within our current framework, without asking for much or really constraining us.
  • The step made here is simple enough that people can easily play with it.
  • It ensures our binaries are built with a compiler we already plan to use.
  • The binaries we build remain compatible with the CentOS 6 libraries included by default.

However, you are right in that this could be a complete and final alternative. Here are some reasons we might want to stick with it.

  • Building the compiler from scratch is a lot of work.
  • Debugging subtle compiler bugs is challenging. (for instance I could never compile OpenBLAS with the Continuum gcc compiler on Mac and never could figure out why)
  • This proposal is simple enough to be built on Docker Hub and allow for easy community interaction.
  • Some things seem to have issues compiling or simply won't work on CentOS 5.
  • CentOS 5 maintenance support ends March 2017.
  • It remains unclear (at least to me) whether the lowest common denominator for the community still needs to be CentOS 5.

I think we were already shipping libgfortran and libgomp in the libgcc package (but those need to be split out). Is that not a good idea? Should we reconsider?

On the libgfortran side, I really think static linking is simple and effective. Here is an example where some package expected to use the system libgfortran, but got really messed up. These are my thoughts with regards to linking libgfortran.

  • Static linking

    • Makes the binaries bigger. (how many packages really use Fortran?)

    • No longer misses libgfortran.

    • Has no chance of picking up the wrong or incompatible version of libgfortran.

    • No need to figure out how to ship libgfortran.

  • Dynamic linking

    • Keeps the binaries small.

    • Could miss libgfortran so we must ship it.

    • Could mask the system version of libgfortran.

    • How do we determine when to install libgfortran?

In short, I think statically linking libgfortran is just less risky and the cost paid is so minimal that I am just willing to accept it. Plus, it frees up developer time from this otherwise hairy issue.

As for libgomp, I think we have to package it (static linking is not an option). We could expect users to have it, but that really goes against the spirit of conda. We should just proceed with caution.

I think you'll still end up needing to ship libstdc++.so - your example is a good test, but I think it's likely that gcc was able to emit all ABI 4-compatible code here. I don't think that is always the case. Compile with -Wabi=2 to have it warn you about incompatible stuff.

So admittedly this is a very trivial example (it might be a painful demo if it wasn't :wink:), but I have used this to compile things like VIGRA, Boost, and a variety of other complex C++ code with C++11 support and use it out of the box on machines that don't have this support without shipping any libraries from the newer compiler. My understanding of the devtoolset is to provide the functionality of a newer gcc, but still be able to deploy on the same OS with the standard system libraries. This seems to be a convenience for developers trying to deploy code to a cluster with an old OS. My understanding is it does this by doing some static linking (hence why we see no linkages to libraries shipped with the devtoolset compiler). RedHat provides some good info on how devtoolsets work, where the binaries they build are expected to work, and how C++ compatibility works. If we really need CentOS 5.x support, we could guarantee that support with an older devtoolset (2.1) that still has C++11 support, but we would need to switch to 5.x too. If we decide to use the devtoolset proposal, we basically need to pick one devtoolset version and stick with it.

The troubles that people had with conda and libgfortran came down to some sloppy metadata. We updated scipy to build with GCC 4.4 recently because 4.1 was causing issues. The version spec for libgfortran was blank - so users were not necessarily getting the latest version. To make things worse, since the libgcc and libgfortran packages both include libgfortran, there was an ambiguity in which one might clobber the other (primarily just the symlink, since the so versions were different). By both specifying libgfortran specifically, and by applying the appropriate version specifier, I don't think there would be issues.

Yes, there were these issues. I believe they have been mostly resolved. Though some like the example above remain. Though this raises a really good point that we should think about.

Having to ship libgcc puts us in a situation where we have one package that everything depends on. Any break in it weakens the whole ecosystem.

I think you are right to say we should move away from the gcc package. It had lots of weaknesses and took a lot of magic to work correctly. Some of it is a bit hand wavy unfortunately. For the most part it works, but when it doesn't (except for a few common cases) it is almost impossible to tell what went wrong. Maybe moving away from libgcc mostly (if not entirely) is a good idea too. I don't think any of us enjoys having to deal with these problems. Let's find a way of avoiding letting them happen at all. Perhaps it is not possible, but if it is our lives would be that much simpler. :smile:

Let's talk about this in tomorrow's call. I appreciate that we need to do something about a C++11 compiler, and I want to make sure that we're not spending effort now that will end up wasted.

Absolutely! Thanks for your feedback on this. Certainly there is still room to discuss and think about all of this. It is a fairly challenging problem and there are many reasonable approaches. I do appreciate the thought and work that everyone has already put into this problem.

Just for more food for thought, I added a CentOS 5.11 container to Docker Hub, as well. Was automatically built from the same repo using the centos_5 branch. Took ~7min. Still pretty simple. The latest version of devtoolset for CentOS 5 is 2.1. This provides gcc 4.8.2, which does support C++11. Feel free to play with it if you are curious. Built the sample example from above without issues and worked great on CentOS 5.11 (without devtoolset installed) and CentOS 6.6 (also without devtoolset installed).

So, after a nice discussion with @msarahan, @pelson, and @ocefpaf, one of the important concerns that @msarahan raised was the need to be able to build 32-bit and 64-bit binaries. While there are 32-bit RPMs for devtools, they don't seem to want to install by default. Others have had this same issue, as well. After a bit of searching, I found a workaround in a gist, which influenced a re-write of the CentOS 5 docker image. At this point, it can compile 32-bit and 64-bit using devtools-2.1 with gcc 4.8.2. With this I built Fortran, OpenMP, C++ and C++11 code. I pushed it to Docker Hub and it took only 4mins to build. Please try it out and give me your thoughts. The same should be possible with devtools-2.1 on CentOS 6 though I may give this a try for demonstrative purposes.

For further independent verification, I have shared the test code and run this as part of the latest DockerHub build.

Please let me know your thoughts.

@jakirkham thanks for continuing to work on this. I am going to spend some time this weekend carefully reading the Red Hat articles you linked, and also testing your test program, as well as seeing if there are any other more stringent tests for us to understand our boundaries.

If possible, I would like to keep GCC 5.2. 4.8 is attractive because it is in line with the manylinux effort, and because it is readily obtainable from the devtoolset, but I see little else going for it. It does not support as many modern optimization options (https://access.redhat.com/documentation/en-US/Red_Hat_Developer_Toolset/4/html/User_Guide/sect-Red_Hat_Developer_Toolset-Features.html), and although it does compile C++11, there must be _some_ reason why it was necessary to have the new ABI. I want to understand what we might be missing by using an older approach at C++11.

My gut says it's a good idea to stick with Red Hat's compiler toolset. They have vastly more experience and a much larger user base. The only flaw here is that we're then at their mercy in terms of being able to update in the future. Decoupling the compiler from the OS version is a very good thing, and I put a lot of weight into that consideration because Continuum has been stuck with very old GCC versions for this reason (and because no one explored the devtoolset route earlier). I think it is worth our while to pick up these skills. Use Red Hat as an example, but let's maintain this capability.

Finally, there's already significant momentum at Continuum using the new image. The Linux build workers for Anaconda.org are using my image (building on top of it). Some groups are starting to use it as well - the ones I know of are DyND and the Anaconda group (myself and @groutr). For me to go back and say "oops, this older way was actually right" - I need to know that the older way is actually right, and I think the reason needs to be much more than "the custom compiled gcc takes longer to set up." Especially because this all started with me trying to pitch use of the Holy Build Box (https://github.com/phusion/holy-build-box), which is pretty much exactly what you're proposing - and people asked me to pursue a newer compiler. The kinds of faults that I think might force this issue would be if we can't find a way to avoid shipping libstdc++ with 5.2 (with whatever partial static linking Red Hat does), or some large flaw in GCC 5.2.

More thoughts soon.

So, this is another reason to consider switching to something newer.

What I've found so far:

  • @mingwandroid recommends not statically linking anything (fortran or otherwise). From his point of view, without a way of knowing what you need to rebuild when you update something, it introduces more work than it saves. This is an important point: we should be recording information about the build environment into our packages. If it's a docker image, perhaps the tag or the hash. We also need to record library versions that we link against (and how - statically/dynamically) - it may help understand why something is going wrong at some point in the future.
  • I'm about 95% sure that the OS-level compatibility grid that @jakirkham posted on the RH site is essentially just stating GLibC compatibility. This is the sticking point with CentOS 5. If we build on any newer platform, GLibC is not backwards compatible.
  • I have not found devtoolset-4 available for CentOS. I see it on their build system: https://cbs.centos.org/koji/buildinfo?buildID=7738, but I can't get a CentOS 6 system to install it with yum from any source. I can't find anything confirming that it has been released.
  • Using the files posted on their build system, I have discovered how they do their partial static linking. It's clever. Download the src rpm from http://cbs.centos.org/kojifiles/packages/devtoolset-4-gcc/5.2.1/2.2.el6/src/devtoolset-4-gcc-5.2.1-2.2.el6.src.rpm and check out gcc.spec. You'll see bits like
echo '/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
%{oformat2}
INPUT ( %{?scl:%{_root_prefix}}%{!?scl:%{_prefix}}/lib/libstdc++.so.6 -lstdc++_nonshared )' > 32/libstdc++.so

The nonshared libraries come from stuff they add in their patches. See the gcc5-libstdc++-compat.patch file in that src rpm. This is stuff that is common to all devtoolsets, AFAICT.
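
To show the shape of that trick outside the spec file, here is a minimal sketch that renders a GNU ld script of the same form: the linker is told to use the shared library first and fall back to a static archive for anything missing. The function name is hypothetical, the output format line that `%{oformat2}` expands to is deliberately omitted because its value is not shown above, and the paths in any call are placeholders.

```python
def render_ld_script(shared_lib_path, static_fallback_flag):
    """Return the text of a linker script placed where libstdc++.so would be.

    shared_lib_path: path of the real shared library, tried first.
    static_fallback_flag: linker flag naming the static archive that holds
        the functions absent from the older shared library.
    """
    return (
        "/* GNU ld script\n"
        "   Use the shared library, but some functions are only in\n"
        "   the static library, so try that secondarily.  */\n"
        "INPUT ( {} {} )\n".format(shared_lib_path, static_fallback_flag)
    )
```

For example, `render_ld_script("/usr/lib64/libstdc++.so.6", "-lstdc++_nonshared")` produces an `INPUT` line of the same shape as the one in the quoted spec fragment.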

We can totally replicate that behavior without using RH's devtoolset. Moreover, we can do so with arbitrary compiler versions, rather than being stuck with whatever devtoolset is current on the lowest possible CentOS we can tolerate. I'm happy to take that on, but I don't want to waste my effort if we're not going to commit to maintaining GCC.

DyND uses C++14, and they have stated their minimum requirement as GCC 4.9 for that reason. If we do not either update to CentOS 6 (which I'd like, but may not be possible due to customer support requirements), or employ the compiled compiler (yo dawg), then DyND will have to use a different toolset from us, and that feels like a major loss here, in terms of defining a standard build platform. They're way ahead of the curve here in terms of employing C++14, but this is also (I hope) a reasonably long-term decision.

@mingwandroid recommends not statically linking anything (fortran or otherwise). From his point of view, without a way of knowing what you need to rebuild when you update something, it introduces more work than it saves.

With other things, I think I would agree. With libgfortran (given the problems we have had and still have), I disagree. Our goal is to have a standard compiler in a standard docker image that we always use. This provides clear expectations of what libgfortran we are using to statically link against. So, I don't see this as a problem in that case.

This is an important point: we should be recording information about the build environment into our packages.

Agreed. Though this starts to feel like changes need to go into conda-build to make this work correctly.

If it's a docker image, perhaps the tag or the hash.

Agreed.

We should really find a way to version the docker images, though. It seems we are providing a fixed system and compiler toolset, so it should be versionable. Or are there reasons this would not work?

Again this will likely require changes to conda-build to work correctly.

We also need to record library versions that we link against (and how - statically/dynamically) - it may help understand why something is going wrong at some point in the future.

Yes, we want this information. Again conda-build changes are likely required to make this work correctly.
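As a rough sketch of what such a record could look like, the snippet below collects the docker image tag and hash, the compiler, and each linked library with its linkage into a JSON document. Every field name and value here is hypothetical; conda-build does not provide this function, and the real layout would have to be agreed on there.

```python
import json


def build_record(image_tag, image_digest, compiler, linked_libs):
    """Collect build-environment facts into a JSON-serializable dict.

    `linked_libs` maps library name -> (version, linkage), where linkage
    is "static" or "dynamic". All field names are assumptions.
    """
    for name, (version, linkage) in linked_libs.items():
        if linkage not in ("static", "dynamic"):
            raise ValueError("unknown linkage for %s: %r" % (name, linkage))
    return {
        "docker_image": {"tag": image_tag, "digest": image_digest},
        "compiler": compiler,
        "linked_libraries": {
            name: {"version": version, "linkage": linkage}
            for name, (version, linkage) in sorted(linked_libs.items())
        },
    }


record = build_record(
    image_tag="example/build-image:1",   # placeholder tag
    image_digest="sha256:<digest>",      # placeholder, not a real hash
    compiler="gcc <version>",            # placeholder
    linked_libs={
        "libgfortran": ("<version>", "static"),
        "libstdc++": ("<version>", "dynamic"),
    },
)
text = json.dumps(record, indent=2, sort_keys=True)
print(text)
```

A record like this could ship inside the package metadata, so that a later failure can be traced back to the image and libraries that produced the binary.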

I'm about 95% sure that the OS-level compatibility grid that @jakirkham posted on the RH site is essentially just stating GLibC compatibility. This is the sticking point with CentOS 5. If we build on any newer platform, GLibC is not backwards compatible.

That may be true. I tried to find something more explicit, but did not see anything like this when I had looked.

However, we are already using CentOS 6 and so we already have this issue.

I have not found devtoolset-4 available for CentOS.

They put it in a different location. ( http://buildlogs.centos.org/centos/6/sclo/x86_64/rh/devtoolset-4/ )

...but I can't get a CentOS 6 system to install it with yum from any source.

Not sure why not. Definitely was able to get this to build and Docker Hub can confirm. Did you look at my Dockerfile?

Using the files posted on their build system, I have discovered how they do their partial static linking. It's clever....The nonshared libraries come from stuff they add in their patches. See the gcc5-libstdc++-compat.patch file in that src rpm. This is stuff that is common to all devtoolsets, AFAICT.

Nifty. Would that work for forcing static linking of the system libgfortran too?

We can totally replicate that behavior without using RH's devtoolset. Moreover, we can do so with arbitrary compiler versions, rather than being stuck with whatever devtoolset is current on the lowest possible CentOS we can tolerate.

Are we sure that is all we are missing?

I'm happy to take that on, but I don't want to waste my effort if we're not going to commit to maintaining GCC.

Completely understand. This is why I want to sort this out here.

DyND uses C++14, and they have stated their minimum requirement as GCC 4.9 for that reason. If we do not either update to CentOS 6 (which I'd like, but may not be possible due to customer support requirements), or employ the compiled compiler (yo dawg), then DyND will have to use a different toolset from us, and that feels like a major loss here, in terms of defining a standard build platform. They're way ahead of the curve here in terms of employing C++14, but this is also (I hope) a reasonably long-term decision.

Unfortunately, there are more issues with adopting CentOS 5 than this one. We basically can never have GPU support ( https://github.com/conda-forge/conda-forge.github.io/issues/63 ) AFAICT. NVIDIA only provides support for CentOS 6 and 7, not 5. So, having to go back to CentOS 5 (as we are using CentOS 6 now) is a huge problem IMHO.

To be completely clear, without some hard evidence (stats) as to why the switch to CentOS 5 makes sense I am against it. Even with this information, we may still find ourselves in a situation where we have 2 Docker images because of CentOS 5's limitations. Sorry, to be so strong on this point, but I do hope you understand my reasoning.

@jakirkham I missed your GCC 5 devtoolset image, and saw only your CentOS 5 devtoolset 2 image. Thanks for the example. I wasn't finding it because it only shows up in searches after you do

yum install -y centos-release-scl

I think there are a lot of complicated issues tied up here, and I want to try to untangle them. I see:

  • CentOS version. The variables that this tweaks are: Kernel version and GLibC version. Compilers can be retrofitted, and need not be limited to availability in RPMs here.
  • Compiler version, binutils version, etc. - build tools.
  • Compiler source (RPM vs build from source; if from source, what patches)
  • linking: partially static using Red Hat's patches, or shipping libraries? It's nice to say that it's not necessary to ship libraries, but at the same time, if it makes life much easier to ship them, then conda is well equipped to do so. Are there libraries you know of where having a newer version of the library will break the old one? It seems to me that things are mostly all very backwards compatible, and the real danger in shipping libraries is shipping old ones.

NVidia's last support of CentOS5 was Cuda 6.5: https://developer.nvidia.com/cuda-toolkit-65

Also, I'm not going to be happy with any configuration until everything works - whether it's my attempts at CentOS5, or anything more modern. If GPU stuff doesn't work, then the image is not the image we'll go with. Ultimately if Continuum has to bifurcate its build systems to keep supporting older customers, that's what we'll do, but that complicates the package ecosystem and makes neither Continuum nor conda-forge look good, and will cause some headaches no matter what.

Let's approach this from a list of features that we need, then figure out how to meet those needs. For me:

  • C++14 support (GCC>=4.9)
  • GPU support (NVidia/AMD/Intel Xeon Phi). What versions are necessary to have adequate support and lack of bugs?
  • As much backwards compatibility as possible. GLibC 2.5 would be ideal, but that's a hard pin to CentOS 5, and I don't want to do that.
  • Ability to update compiler and toolsets independently of OS (in the case when kernel upgrades are not necessary to update). If we use devtoolset-4, what are our paths forward if RH ceases to support CentOS 6 for future compilers, similar to how support for CentOS5 stopped with devtoolset-2? What condition(s) triggers motion down that path?

Sorry I am responding to these in reverse order and now it will be posted after another comment, but I think it is valuable for the discussion. Will try to address more recent comments after posting this.

@jakirkham thanks for continuing to work on this. I am going to spend some time this weekend carefully reading the Red Hat articles you linked, and also testing your test program, as well as seeing if there are any other more stringent tests for us to understand our boundaries.

Thanks again for looking into this.

Just to reiterate, I have always seen my proposal to be a partial or intermediate solution. That being said, we do need to do something about this problem. The full change proposed is a bit hard to swallow at present. Personally, just getting rid of the Continuum gcc package and maintaining an equivalent amount of support from the system is a good first step. This seems like something we all want. Let's see if we can find a way to get that.

If possible, I would like to keep GCC 5.2.

Just so that we are clear. We already do not have gcc 5.2 support. We have gcc 4.4 support with the option to install the Continuum gcc package, which provides gcc 4.8. So, this support does not exist now anyways.

That being said, I am willing to skip handling this problem for now as this is a partial fix after all. Forcing 5.2 would make a switch to CentOS 5 with devtools 2.1 harder. Even though I don't like that switch, we need to keep that path open. So, not supporting gcc 5.2 in the first iteration is ok to me.

4.8 is attractive because it is in line with the manylinux effort, and because it is readily obtainable from the devtoolset, but I see little else going for it. It does not support as many modern optimization options (https://access.redhat.com/documentation/en-US/Red_Hat_Developer_Toolset/4/html/User_Guide/sect-Red_Hat_Developer_Toolset-Features.html)...

I disagree. It is nice because it has full C++11 support. A sizeable number of packages here rely on either C++0x or outright C++11. Being able to address that alone is huge. Not to mention the Continuum gcc package is 4.8. Keeping consistency with that during the transition is quite nice.

While it would be nice to have full C++14 support, I have yet to see a package proposed here that needs that. True, DyND needs this. However, we already can't support C++14, as the Continuum gcc package won't do this either. As stated before, I think constraining this on a first pass is too strong of a constraint.

Some other newer features are always nice, but I think we have already improved the situation drastically if we have 4.8 support in the container. It brings us closer to what we both want even if it is not the full solution and it is a change that is easier to accept.

...although it does compile C++11, there must be _some_ reason why it was necessary to have the new ABI. I want to understand what we might be missing by using an older approach at C++11.

While I would like to understand that more too, I don't find this to be a blocker in keeping a compiler version that is already consistent with one that most packages are built with at present.

My gut says it's a good idea to stick with Red Hat's compiler toolset. They have vastly more experience and a much larger user base.

Agreed.

The only flaw here is that we're then at their mercy in terms of being able to update in the future.

Maybe not so much. Red Hat has been moving to dockerize the whole compiler toolset. I think they understand that there is friction between developers and system administrators that they need to address. While this is probably not the full solution yet and may not be usable by us at this point, we should keep an eye on it and see what we can learn from them.

Decoupling the compiler from the OS version is a very good thing, and I put a lot of weight into that consideration because Continuum has been stuck with very old GCC versions for this reason (and because no one explored the devtoolset route earlier). I think it is worth our while to pick up these skills. Use Red Hat as an example, but let's maintain this capability.

Not sure on this point. The general philosophy thus far has been to use system compilers. That still feels like a sound philosophy. As we have learned from trying to improve the gcc package, we discovered all sorts of specialized OS patches just amongst different versions of Linux. Even after all of this work, I still occasionally have had issues with that compiler and have largely gone back to system compilers where possible. Now certainly part of the problem was that we packaged the compiler at all. That much I think we agree on. It is still not clear to me that maintaining this compiler will not be really painful. This leads me to one of my biggest concerns.

It is still not clear to me that there is a good method for community maintenance of this image. While there does seem to be a concerted effort to keep this open sourced (which I do really appreciate), it still is not practical to build fixes. I would really want to see the following things.

  1. Community members are able to make changes.
  2. A rebuild and push is triggered automatically on merge to update the latest version. (ideally with Docker Hub or some other web-based system)
  3. A transparent versioning scheme.
  4. The source code lives at conda-forge.

I understand these are hard things to get and will require a fair bit of discussion, thought, and effort. This is exactly why I am trying to propose a stopgap that will get us some of what we want now without having to wait for a long time for the ideal fix.

Finally, there's already significant momentum at Continuum using the new image. The Linux build workers for Anaconda.org are using my image (building on top of it). Some groups are starting to use it as well - the ones I know of are DyND and the Anaconda group (myself and @groutr). For me to go back and say "oops, this older way was actually right" - I need to know that the older way is actually right, and I think the reason there needs to be much more than "the custom compiled gcc takes longer to set up."

To be clear, I still don't feel like my proposal is the whole solution. However, the complete solution is still a bit hard to support yet. In other words, I don't think there is an "oops". Though it is perfectly reasonable to put a fair bit of thought into how we proceed. At present, I think a step (even a small and not completely satisfying one) in the right direction is an improvement. We really should embrace that step as it brings us closer (while not all the way) in the direction we need to go.

Especially because this all started with me trying to pitch use of the Holy Build Box (https://github.com/phusion/holy-build-box), which is pretty much exactly what you're proposing - and people asked me to pursue a newer compiler. The kinds of faults that I think might force this issue would be if we can or can't find a way to avoid shipping libstdc++ with 5.2 (with whatever partial static linking Red Hat does), or some large flaw in GCC 5.2.

Hardly. All I am proposing (to restate) is we start using devtoolset-2 in our existing image. The reasons we would want to do this are as follows.

  1. Support C++11 (without new dependencies).
  2. Basically drop the Continuum gcc package.
  3. Basically drop libgcc package.
  4. Provide newer tool chain utilities.
  5. Preserve the existing ABI.
  6. Avoid a complete rebuild for now.

This is a stopgap, of course, I won't deny it. However, it is a simple change that gets us much closer to what we want. It is too hard to make this perfect in one go IMHO. Though if we are willing to take steps in that direction, I believe we can get there.

although it does compile C++11, there must be some reason why it was necessary to have the new ABI

The two changes in the C++11 spec that I've seen cited as driving the ABI breakage are (1) std::string is no longer allowed to be copy-on-write (so this means some operations are slower and some are faster), (2) std::list::size is now required to be O(1) instead of O(n). So neither affects code correctness, only the complexity of different operations (ref).

@msarahan: Also, if you do decide to experiment with bumping Anaconda's minimum required version to RH6, then I at least will be extremely interested in what you find :-). This transition is going to happen soon-ish one way or another, since RH5's final EOL is <12 months away now (see also a RH engineer commenting on this here: https://github.com/pypa/manylinux/issues/46#issuecomment-206702714 -- "EL5 overall is now well into its wind-down days and we are working with folks still running it to move off").

(I guess in your position I would also explore the viability of convincing the dynd folks to cut it out with this whole "writing software that can't be distributed" thing...)

The two changes in the C++11 spec that I've seen cited as driving the ABI breakage are (1) std::string is no longer allowed to be copy-on-write (so this means some operations are slower and some are faster), (2) std::list::size is now required to be O(1) instead of O(n). So neither affects code correctness, only the complexity of different operations (ref).

Thanks for the info, @njsmith. I remembered hearing about some change to std::string, but was fuzzy on the details.

I guess in your position I would also explore the viability of convincing the dynd folks to cut it out with this whole "writing software that can't be distributed" thing...

Just to give you a bit more context, @njsmith, (not knowing what you may or may not have read) the proposal that @msarahan is putting forth builds gcc 5.2.1 in a docker container with CentOS 5.11. This would allow C++14 to be supported and would allow compatibility with an old glibc; however, it comes at the cost of having a docker image that can be easily maintained.

builds gcc 5.2.1 in a docker container with CentOS 5.11.

Oh, I see -- so you'd still be able to use the system glibc, but give up on using the system libgcc, libstdc++, libgfortran. I guess that works OK if you're willing to ship those and are willing to accept that existing conda environments get broken every time a new GCC release comes out :-/. (I feel like the folks who want to distribute conda manifests as a mechanism for long-term software reproducibility might have opinions on this...)

Red Hat has been moving to dockerize the whole compiler toolset.

This looks good, but I don't think it's going to solve any fundamental problems. GLibC lives inside any given docker container. That is effectively the exact same argument we're having about choosing a particular CentOS version. Compilers will be tied to their product lifecycle, for better or worse.

Support C++11 (without new dependencies).
Basically drop the Continuum gcc package.
Basically drop libgcc package.
Provide newer tool chain utilities.
Preserve the existing ABI.
Avoid a complete rebuild for now.

With the exception of dropping libgcc (better called "gcc runtimes"- libstdc++, libgcc, libgomp, libquadmath, libgfortran), these were my explicit goals with the docker image that I have created. I had not seen how Red Hat's partial static linking worked, and had assumed that taking Julia's route would be best. I still think it might be. It works pretty well - the default ABI is compatible with GCC 4. It avoids a complete rebuild.

Can you explain why build time of the docker image is a concern to you? It is presently layered, so that the GCC compilation is a totally separate image with a totally separate docker file. It could just as easily be a conda package or an RPM. I don't see how this is any different from using Red Hat's package - just a question of who builds it.

Regarding your wishlist:

Community members are able to make changes.

I think a key idea here is that not everyone needs to use exactly the same build image. The base of it probably all needs to be the same (compiler, binutils, etc), but what you put on top is totally free. Also, community members are welcome to make PRs on the Continuum docker recipes. You'd be talking to me on those PRs.

A rebuild and push is triggered automatically on merge to update the latest version. (ideally with Docker Hub or some other web-based system)

You can do this on top of the GCC image readily. That's why I split them up. I'd like to do it with GCC, but as you know, build time is prohibitive. I could potentially set up Jenkins on an internal build server.

A transparent versioning scheme.

My versioning is presently (CentOS version)-(GCC version)-(docker image build number) for the GCC base image, and the same, but one additional build number for the conda layer on top. How would you improve on that? I certainly need to document it.
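For discussion, here is a small sketch of how that scheme could be formatted and parsed. The `-` separator, the function names, and the build numbers in the example are assumptions; only the field order (CentOS version, GCC version, image build number, optional conda-layer build number) comes from the description above.

```python
from collections import namedtuple

ImageVersion = namedtuple("ImageVersion", "centos gcc image_build conda_build")


def format_tag(centos, gcc, image_build, conda_build=None):
    """Join the fields into a tag; the conda-layer number is optional."""
    parts = [centos, gcc, str(image_build)]
    if conda_build is not None:
        parts.append(str(conda_build))
    return "-".join(parts)


def parse_tag(tag):
    """Split a tag back into its fields (3 for the GCC base, 4 with conda)."""
    parts = tag.split("-")
    if len(parts) not in (3, 4):
        raise ValueError("expected 3 or 4 dash-separated fields: %r" % tag)
    conda_build = int(parts[3]) if len(parts) == 4 else None
    return ImageVersion(parts[0], parts[1], int(parts[2]), conda_build)


# CentOS 5.11 and GCC 5.2.1 are the versions discussed in this thread;
# the build numbers 0 and 3 are made up for the example.
base_tag = format_tag("5.11", "5.2.1", 0)
conda_tag = format_tag("5.11", "5.2.1", 0, 3)
print(base_tag, conda_tag, parse_tag(conda_tag))
```

One consequence of this layout is that version fields must not themselves contain the separator, which is worth stating wherever the scheme gets documented.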

The source code lives at conda-forge.

I perceive this as some distrust towards Continuum. I'm sorry, I'm not sure what I can do here. Continuum must maintain its own build tools as core infrastructure. I would ask that you show the same trust that you do for Conda and conda-build. After all, the source code for devtoolset-4 doesn't live at conda-forge, does it?

it comes at the cost of having a docker image that can be easily maintained.

Please back this up with evidence - use cases at the very least. If the GCC works well (and it does, for me, and for other Continuum folks), then there is no additional maintenance cost for you, relative to Red Hat packages.

willing to accept that existing conda environments get broken every time a new GCC release comes out :-/

I don't think it's anywhere near that bad. I have found GCC releases to be remarkably backwards compatible. The only recent enormous mess-up I know of in conda was with fortran, and that was because gfortran and libgcc both packaged libgfortran, but were not created from the same source (and thus ended up out of sync). Dumb stuff happened.

The C++11 ABI will be a major break, but this docker image is not intended to make that change (though it can, if anyone wants)

No, backwards compatibility I'm not worried about, the (potential) brokenness I'm talking about is the issue with GCC runtimes not being forward compatible. As soon as the system libstdc++ and friends are _newer_ than the version in a conda image, bad things start happening, specifically for any users who have private compiled code that ends up linking against the system libstdc++ and then executing against the conda version. And specifically I'm pointing out that this may cause some headaches for people who download the conda image attached to a several-year-old paper and try to reproduce the results :-/
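The failure mode can be sketched as a set comparison: code linked against a newer system libstdc++ asks for symbol version tags that the older bundled runtime does not define. The tag lists below are illustrative inputs, not measurements, and the function names are hypothetical; in practice the lists would come from inspecting the binaries (for example with objdump or readelf).

```python
def parse_version(tag, prefix="GLIBCXX_"):
    """Turn 'GLIBCXX_3.4.21' into (3, 4, 21) so tags sort numerically."""
    if not tag.startswith(prefix):
        raise ValueError("unexpected tag: %r" % tag)
    return tuple(int(part) for part in tag[len(prefix):].split("."))


def missing_symbol_versions(required, provided):
    """Return tags the binary requires that the runtime does not provide."""
    missing = set(required) - set(provided)
    return sorted(missing, key=parse_version)


# Illustrative: user code built against a newer system libstdc++ ...
needed_by_user_code = ["GLIBCXX_3.4", "GLIBCXX_3.4.21"]
# ... then executed against an older libstdc++ shipped in the environment.
in_bundled_runtime = ["GLIBCXX_3.4", "GLIBCXX_3.4.19"]

missing = missing_symbol_versions(needed_by_user_code, in_bundled_runtime)
print(missing)
```

A non-empty result corresponds to the "version not found" style of load failure described above; an empty result means the bundled runtime is at least as new as what the code was linked against.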

My gut feeling is that for these runtimes, whose version is so tightly tied to the compiler version, the options that make sense are: for everyone to use the system runtimes + system compiler (or an older compiler); for everyone to use whatever compiler but encapsulate the runtime library so that this choice is only visible to the particular project being built (e.g. by statically linking it, or by renaming the runtime libraries like auditwheel is doing now); or to ship a specific runtime + ship a specific compiler and tell everyone that they have to use the conda gcc rather than their system gcc, so you can upgrade them in sync.

Community members are able to make changes.

I think a key idea here is that not everyone needs to use exactly the same build image. The base of it probably all needs to be the same (compiler, binutils, etc), but what you put on top is totally free. Also, community members are welcome to make PRs on the Continuum docker recipes. You'd be talking to me on those PRs.

...

The source code lives at conda-forge.

I perceive this as some distrust towards Continuum. I'm sorry, I'm not sure what I can do here. Continuum must maintain its own build tools as core infrastructure. I would ask that you show the same trust that you do for Conda and conda-build. After all, the source code for devtoolset-4 doesn't live at conda-forge, does it?

Sorry I won't respond to everything now (need to get some sleep), but I do want to address this before it is badly misinterpreted.

Sorry I made this unclear. My intent was never to indicate mistrust. The first point and the last point are connected in the following way (almost redundantly so). Supposing something horrible happens to the build system and you are on vacation, we need a way to apply a simple hotfix. We may need direct access to the repo to effect this change in a timely manner. I completely agree that nearly all of the time it will be direct interaction with you, which is completely fine with me. It is just that rare instance where there is no one there and we need to do something fast that I am concerned about. Based on our conversation about the Heroku buildpack, it was not clear to me that this would be achievable if it lived at the conda org. If it would be achievable there, then this is completely irrelevant.

I will try to respond to your other points later.

Supposing something horrible happens to the build system and you are on vacation, we need a way to apply a simple hotfix

I can see how conda-forge might want this kind of operational flexibility, and a simple way to do that would be for there to be a conda-forge docker image that's initially just the single line FROM continuum/conda-build-image -- but if you need to later apply hotfixes while waiting for stuff to migrate back upstream then you could add more lines for those :-).

That might be possible. I worry that if the problem is in the gcc build layer (as a hypothetical, a missing configure option), we will at best have a workaround that doesn't use that gcc compiler. This seems to be something we want to avoid, as we want to try to use the same compiler.

A bus factor of 1 (me) isn't good for anyone. I will ask whether we can open up administration of some of the conda repositories (I kind of think the continuumio ones won't be open). I think this would benefit everyone, but it is also perceived as a loss of control by some, and I'm not sure how far I'll get.

Thanks for understanding. Sorry for the confusion.

by renaming the runtime libraries like auditwheel is doing now

I'd be totally in favor of this. If it means integrating auditwheel into conda-build, or augmenting auditwheel to work on conda packages, that would be time well spent.

or to ship a specific runtime + ship a specific compiler and tell everyone that they have to use the conda gcc rather than their system gcc, so you can upgrade them in sync.

This is sort of the idea with the docker image, right? We're saying "these are the official build tools" - and it's our responsibility to make sure our build tools stay as new or newer than user systems. I'm well aware of the libstdc++ shadowing problem. It's not intractable. Maybe I'm naive and needlessly hopeful, but I think we can stay on top of it - especially if we can improve the bus factor to be more than just me.

Ultimately, because system libraries can never be completely ruled out with conda, docker (with conda, if you want) will always be a better platform for reproducibility. Still, it is well worth stating the guidelines and limitations to creating pinned environments. If libstdc++ is _ever_ pinned, that's going to break someone at some point, as you point out. If results change because of an update to something like libstdc++, that is also "breaking," but I don't have enough experience to say how likely this is. Is it any different from different users on different systems running the same code linked to different system libraries and getting different outputs? If not, then I'm not sure Conda should be on the hook here, and where people need the utmost reproducibility, point them to docker. I just want to make sure that whatever environments people create with conda will reliably run with conda - now and into the future.

For clarity, I have made my proposal a PR ( https://github.com/pelson/Obvious-CI/pull/61 ) to the existing image. This is being made for the simple purpose of discussion and to better see what my intentions for this intermediate solution are. While it doesn't solve all problems, neither does the current image. This is merely an improvement on the situation. Please share your thoughts and feedback.

Just thinking about the Mac case for a moment, we may want to investigate what Homebrew does here. See this note.

OS X is pretty good about backward compatibility -- you can do pretty well with an up-to-date XCode.

The trick, however, is building stuff that will run on older machines. The way to do that is to set environment variables, something like MACOSX_DEPLOYMENT_TARGET and also an SDK setting.

Something like that -- I'm on my phone now.

But if you set those two for 10.7, you should be able to build with newer compilers.

( I think Anaconda is 10.7+)

Note that distutils does this for you with Python extensions, but we should probably put this in the recipes for libs.
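A minimal sketch of what a recipe-side helper could do, assuming the build is launched from Python: copy the environment and set `MACOSX_DEPLOYMENT_TARGET` (the variable the Apple toolchain reads), plus optionally `SDKROOT` for the SDK. The function name `deployment_env` and the choice of `SDKROOT` for the "SDK setting" are my assumptions, not something conda-build does today.

```python
import os


def deployment_env(target="10.7", sdk_path=None, base_env=None):
    """Return a copy of the environment with OS X deployment settings.

    `target` is the oldest OS X release the binaries should run on.
    `sdk_path` is optional; if given it is exported as SDKROOT.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MACOSX_DEPLOYMENT_TARGET"] = target
    if sdk_path is not None:
        env["SDKROOT"] = sdk_path
    return env


# Start from a tiny explicit environment so the example is reproducible.
env = deployment_env(target="10.7", base_env={"PATH": "/usr/bin"})
print(env["MACOSX_DEPLOYMENT_TARGET"])
```

The returned dict can be passed as `env=` to `subprocess.run` when invoking `./configure` or `make`, which keeps the setting scoped to the build rather than the whole shell session.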

This is all true. We basically try to do this, but we could probably do better by making some tweaks to Travis.

Though I am thinking more about standardization across platforms. We use gcc on Linux (and plan to continue no matter how any of this shakes out). We are planning on using gcc for Windows via the MinGW-64 toolset ( https://github.com/conda-forge/conda-forge.github.io/issues/112 ). So, maybe we should start thinking about how we can employ similar strategies for gcc's use on Mac. Also, if we are really tied to getting C++14, using the system compiler might just not cut it (when we take into account backwards compatibility constraints).

The issue with Mac is we need a Fortran compiler for some things. Also, we need OpenMP support for some other things. This is just unavoidable. If we use the latest version of XCode we can have OpenMP support, but then we don't really have backwards compatibility that far. We can build our own clang, but that really begs the question: why not build a compiler similar to the one we use on the other platforms? Even if we do this, we will end up needing gcc unless we see llvm fortran soon, which I don't expect to see soon enough to affect us.

A little off topic, but it would be nice to include here as it is another compiler issue (just for Windows though). How do we handle OpenMP support on Windows? Should we package the library associated? How do we do this?

cc @msarahan @patricksnape

cc @gillins

So as far as I know:

It gets a bit confusing because I have all the compilers installed so the files are in weird places. So I have vcomp90.dll in a winsxs package but I have vcomp100.dll and vcomp140.dll in System32 and in the appropriate Visual Studio directories. I also have Pro VS2010 and Pro VS2014 so that could make a difference too.

One thing I know for certain is that (really strangely) VS2010 Express does _not_ support OpenMP compilation whereas VS2014 and VS2008 (with newer 64-bit compilers) do. That makes things pretty awkward. Maybe it is something we can ask the Microsoft guys about - getting a "Python C++ Compiler Tools for Python 3.4" that contains a decent VS2010 compiler.

In terms of redistribution, I assume that if we are OK to distribute all the MSVCRT files we are probably fine to distribute these too since they also come in the redistributable packages. If we can, I think we should also distribute these files.

I'm working on a gcc-6-on-centos-6 docker image at the moment, which I'll put up once I can get it into a shape where it can build all of Julia successfully (which depends on openblas).

-- quote from @tkelman



That sounds really cool, @tkelman. Thanks for sharing.

Interesting. Yeah, I think we are staying on CentOS 6 for present, but it is possible if we find it pressing enough that we would go back to CentOS 5. The current thought is that without more pressing reasons (people clamoring for that level of GLIBC compatibility) we will stay on CentOS 6.

There was a docker container that used CentOS 5 and gcc 5.2 that @msarahan had proposed. Though there are some concerns like having to rebuild everything on the old CentOS. Also there is an issue due to CentOS 5 being less than a year from EOL. There were some other concerns about dependencies, which are kind of up in the air (CUDA, cuDNN, etc.).

I know devtoolsets do interesting things when linking libraries so that things remain portable without needing to package libgcc. Though I kind of like this feature. It seems like you were running into issues with it though. Could you please explain? Do you have any thoughts on setting this up in your docker container?

Having a newer gcc sounds nice. However, I'm not sure what breaks there are in gcc 6 and am only aware of the breaks present in gcc 5. Do you know anything about this?

How have you been building this image? Is this (and I know this is a long shot) being built on Docker Hub, Quay, or similar? One challenging aspect here has been having a shared infrastructure to do a build on an image like this. We want to avoid a developer bandwidth problem.

Personally, I would be really interested in being able to share a common framework with Julia (maybe even packages 😉). So would really love to discuss this more with you.

The devtoolset does things in a funny way where it is set up to statically link newer pieces of libstdc++ and libgfortran that might not exist on the default centos system compiler versions. We initially tried to use the devtoolset for Julia, but found when building openblas with the devtoolset the openblas shared library doesn't actually end up statically linked to libgfortran. So there's still a dependency on libgfortran which we have to bundle in our binaries, but we don't want to use the system centos libgfortran version as that's too old. So we transitioned to doing something very similar to what the Conda folks are now doing, building our own GCC 5.x from source on CentOS 5. It was ansible based and hooked up to buildbot and I'm now updating/re-doing that in Docker form with 6.x versions.

GCC 6 does break a fair amount of code. I'm mainly looking at it as slight future-proofing since Arch and unstable versions of Fedora and openSUSE are likely to upgrade to GCC 6 soon. Due to the glibc issue described in detail by @njsmith here https://sourceware.org/bugzilla/show_bug.cgi?id=19884, "generic linux binaries" need to be built on the oldest glibc version of any system a user wants to use (so old centos/rhel drives this), with a gcc version at least as new as whatever any user has installed as their default system compiler (so arch/fedora/non-LTS-ubuntu drives this).
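One way to check the "oldest glibc" half of that rule is to look at the versioned glibc symbols a built binary requires. The sketch below is only an illustration: `required_glibc` is a hypothetical helper, and the sample text is made up to resemble the dynamic symbol listing printed by `objdump -T`, not output from a real Julia or conda build.

```python
import re

# Matches versioned symbol tags such as GLIBC_2.2.5 (GLIBC_PRIVATE is ignored).
_GLIBC_TAG = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def required_glibc(symbol_dump):
    """Return the highest GLIBC version tag found in a symbol dump, or None."""
    versions = [
        tuple(int(part) for part in tag.split("."))
        for tag in _GLIBC_TAG.findall(symbol_dump)
    ]
    return max(versions) if versions else None


# Illustrative input only, shaped like `objdump -T some-binary` output.
sample = """
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 printf
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.14  memcpy
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.3.4 __sprintf_chk
"""

needed = required_glibc(sample)
# CentOS 5 ships glibc 2.5, so this made-up binary would not load there.
runs_on_centos5 = needed is not None and needed <= (2, 5)
```

Comparing version tuples rather than strings avoids ordering `2.14` below `2.5`.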

I'll see whether docker hub's time limit is capable of handling this. I've only used quay a handful of times and haven't hooked it up to github hooks yet (which is really convenient when working with docker hub auto builds) but in manual quay builds it did seem way faster than docker hub.

...there's still a dependency on libgfortran which we have to bundle in our binaries

Correct. We are aware of this. What do you do to solve this?

Wasn't sure if there were other weird things you noticed.

...but we don't want to use the system centos libgfortran version as that's too old.

If it has all been built with a new gfortran, why does one care about this?

GCC 6 does break a fair amount of code.

Do you have any examples?

the oldest glibc version of any system a user wants to use (so old centos/rhel drives this), with as new or newer gcc version as any user has installed as their default system compiler version

I see, so it is just the mad race to stay newer while supplying old GLIBC support. That makes sense.

I'll see whether docker hub's time limit is capable of handling this. I've only used quay a handful of times and haven't hooked it up to github hooks yet (which is really convenient when working with docker hub auto builds) but in manual quay builds it did seem way faster than docker hub.

Would be interesting to see what you discover.

Yeah, I've had so many issues with Docker Hub that I might just want to use quay if for no other reason than it is a little bit more stable.

Also, there is a similar story with OpenMP as with Fortran when using devtoolsets, if you haven't encountered that yet.

Some of the linking to system libgfortran with devtoolset might be resolvable with openblas makefile patches to remove things like hardcoded -lgfortran, but we didn't look too far into that.

If it has all been built with a new gfortran, why does one care about this?

For the standalone Julia binaries to work on systems that might not have libgfortran installed, we need to bundle a libgfortran. The devtoolset doesn't include its own separate modern shared-library version of libgfortran. But when you build gcc from source in the normal way, you will get a shared libgfortran that you can use just fine. So that's what we do. If any users try building C/C++/Fortran libraries with newer compiler versions than what we used to build Julia, they'll need to delete or rename the runtime libraries that we bundle in the Julia binaries in order to call into them from Julia.

Do you have any examples?

https://github.com/JuliaLang/julia/pull/14829 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69550

Here's my WIP so far: https://github.com/tkelman/c6g6/blob/master/Dockerfile

@tkelman: I'm curious whether you've considered handling the system-vs-shipped gfortran issue the same way we are for manylinux builds, by renaming it to avoid triggering that glibc bug.

Considering we've had to deal with blas symbol name collisions for some time even from differently named libraries, I'm not sure changing the library file name alone without also renaming all the symbols would fix matters.

@tkelman: ah, right, you'd need to change the name and also clean up RTLD_GLOBAL usage. But those two things together should work, I think...

AFAIK the stuff to make Linux work without RTLD_GLOBAL should be equivalent to the stuff needed to make Windows and osx work at all, since they don't support elf's weird symbol collision semantics in the first place.

I'm still not entirely sure what visibility the automatic dlopen when you use Julia's C FFI uses by default on Linux. It might not be global at all unless you specifically call dlopen asking for it.
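For comparison only, here is how the same visibility choice looks in Python's `ctypes`, which passes a mode flag through to `dlopen`. This is not how Julia's FFI works (that remains the open question above); `load_library` is a hypothetical helper and the renamed library name in the comment is a placeholder.

```python
import ctypes


def load_library(path, share_symbols=False):
    """Load a shared library with an explicit dlopen visibility mode.

    With RTLD_LOCAL the library's symbols are not added to the global
    namespace, so they are not used to resolve symbols in libraries
    loaded later. RTLD_GLOBAL makes them available for that.
    """
    mode = ctypes.RTLD_GLOBAL if share_symbols else ctypes.RTLD_LOCAL
    return ctypes.CDLL(path, mode=mode)


# Hypothetical usage with a renamed bundled runtime:
# gfortran = load_library("libgfortran-abc123.so.3")
```

Keeping the default at local visibility means a caller has to opt in explicitly before a bundled library can collide with same-named symbols elsewhere.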

I don't know quite the right patchelf invocations to rename all the shared libraries that we ship with Julia and keep them interlinked properly. We already use patchelf at build time for rpath modifications, so I wouldn't be opposed to testing it out. Might not even need a source build of Julia, you could try downloading our binaries and calling patchelf on them directly as a proof of concept?

Just as an FYI. We now use devtoolset-2 (gcc 4.8.2) in our image.

I have gone through and audited all existing feedstocks to make sure that they only use the gcc package if they build OpenMP or Fortran support. In the future, we will want to address these as well, but a formal plan has not been made at this time.

In all cases where the gcc package was used to build C++0x or C++11 (normally on Linux), an issue was raised to note that they should drop it, as the default compiler in the Docker container now supports C++11. PRs are being added to remove gcc in these cases. This is in progress, but not yet complete.

For all new recipes, please only use the gcc package for OpenMP or Fortran code. All other cases should not need this.

How have you been building this image? Is this (and I know this is a long shot) being built on Docker Hub, Quay, or similar? One challenging aspect here has been having a shared infrastructure to do a build on an image like this. We want to avoid a developer bandwidth problem.

I've now tried Docker Hub, Quay, Travis, Circle CI, and Shippable all building the same GCC source-build Dockerfile. I might be spoiled by having ssh access to a pretty nice server where the image takes about half an hour to build. Everywhere else I've tried takes long enough that it's hitting ~1hr timeouts on Hub and Travis, and still going for multiple hours on the others. Building and pushing locally isn't the end of the world as this shouldn't need updating too often, but it would be nicer if one of the hosted automated services were fast enough to handle this without a much longer turnaround time.

edit: quay did eventually finish, it just took a really long time

I don't know quite the right patchelf invocations to rename all the shared libraries that we ship with Julia and keep them interlinked properly. We already use patchelf at build time for rpath modifications, so I wouldn't be opposed to testing it out.

You can look at the auditwheel source code to see a fully automated script for this, but basically:
1) download and build an up-to-date git snapshot of patchelf (you need something with this fix and this one, neither of which is released yet)
2) rename your .so: mv libgfortran.so.3 libgfortran-${UNIQUE}.so.3
3) tell your .so that it's been renamed: patchelf --set-soname libgfortran-${UNIQUE}.so.3 libgfortran-${UNIQUE}.so.3
4) find all your executables and shared libraries (basically the same ones that you're currently setting the rpath on), and tell the ones that are currently looking for libgfortran.so.3 that they should look for your renamed version instead: patchelf --replace-needed libgfortran.so.3 libgfortran-${UNIQUE}.so.3 some-file.so
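Steps 2 to 4 above can be sketched as a small command generator. This is a rough illustration rather than what auditwheel actually does: `rename_commands` is a hypothetical helper, `abc123` stands in for `${UNIQUE}`, and it assumes the commands are run from the directory that holds the library and that the caller passes only files that currently link against it.

```python
import shlex


def rename_commands(lib, unique, dependents):
    """Build the mv/patchelf commands for renaming a bundled shared library.

    lib: file name of the bundled library, e.g. "libgfortran.so.3"
    unique: tag inserted into the new file name (any unique string)
    dependents: executables and shared libraries that link against lib
    """
    stem, _, suffix = lib.partition(".so")
    new_name = f"{stem}-{unique}.so{suffix}"
    commands = [
        # Step 2: rename the file on disk.
        ["mv", lib, new_name],
        # Step 3: update the SONAME recorded inside the library.
        ["patchelf", "--set-soname", new_name, new_name],
    ]
    # Step 4: point every dependent at the renamed library.
    for dep in dependents:
        commands.append(
            ["patchelf", "--replace-needed", lib, new_name, str(dep)]
        )
    return commands


if __name__ == "__main__":
    for cmd in rename_commands("libgfortran.so.3", "abc123", ["some-file.so"]):
        print(shlex.join(cmd))
```

Printing the commands first gives a dry run; each list could be passed to `subprocess.run(cmd, check=True)` once the output looks right.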

Thanks @njsmith. We're actually getting a little off topic here, maybe we should move this to an issue on JuliaLang/julia or one of the gcc-from-source dockerfile repos? In Julia's case there's a really easy workaround for running old Julia binaries on distros with newer gcc, of deleting the bundled runtime libraries so that the system versions get used instead. I'd need to be convinced renaming is worth it and won't break things, since some packages do need to be able to find Julia's libgfortran or libstdc++ for ffi purposes, linking and loading libraries that don't have rpath set right on their own, etc.

I distrust the devtoolset partial static linking approach since I've seen it not work correctly in complicated examples like openblas and other Julia dependencies. The C++ partial static linking had also caused issues, if I remember correctly. On a normal build of gcc, -static-libgfortran rarely works correctly (especially if gcc was built with libquadmath support), and if what you want to build is a shared library, the static copies of libstdc++ and libgfortran have to be carefully built with -fPIC. We couldn't get the devtoolset to do the job for Julia.

In all cases where the gcc package was used to build C++0x or C++11 (normally on Linux), an issue was raised to note that they should drop it, as the default compiler in the Docker container now supports C++11. PRs are being added to remove gcc in these cases. This is in progress, but not yet complete.

Just as an FYI, this is complete.

As this came up at the compiler meeting the other day, I figured I would share it (also posted on gitter). This is an ancient mailing list thread (had to get from archive) on the conditions under which libstdc++ and libc++ can be mixed. Also, there is this info from FreeBSD. Also, an SO answer. The take-home message is that STL objects cannot be shared between a library built with libstdc++ and a library built with libc++. The only exception to this is exceptions, which can be thrown and caught in libraries of either type.

I wouldn't trust anything written prior to gcc 5 to still be relevant on this subject. ABI tags threw an additional wrench into this issue, and have still not been entirely implemented in LLVM. There are various patches floating around that I think Arch and a few others have been using, but nothing merged and released yet AFAIK.

Sure, gcc 5 is different. Unfortunately, when it comes to Mac, we have been using gcc 4.8.5. So, it remains relevant here until we get a newer compiler.

Let's close this and re-discuss once we have a gcc package.
