Julia: `Libdl.dlopen` doesn't find shared libraries anymore

Created on 21 Mar 2018 · 36 Comments · Source: JuliaLang/julia

Libdl.dlopen doesn't seem to be able to find shared libraries the way it did on 0.6.x

On Julia 0.6.x

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

julia> lib = Libdl.dlopen("libz")
Ptr{Void} @0x00007f47adcb91a0

On Julia 0.7.x-DEV

julia> versioninfo()
Julia Version 0.7.0-DEV.4631
Commit 9a55c8fbc* (2018-03-19 03:59 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
Environment:

julia> using Libdl

julia> lib = Libdl.dlopen("libz")
ERROR: could not load library "libz"
libz.so: cannot open shared object file: No such file or directory
Stacktrace:
 [1] dlopen(::String) at /builddir/build/BUILD/julia/build/usr/share/julia/site/v0.7/Libdl/src/Libdl.jl:99
 [2] top-level scope
 [3] macro expansion at /builddir/build/BUILD/julia/build/usr/share/julia/site/v0.7/REPL/src/REPL.jl:117 [inlined]
 [4] (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})() at ./event.jl:92

julia> lib = Libdl.dlopen("libz.so")
ERROR: could not load library "libz.so"
libz.so: cannot open shared object file: No such file or directory
Stacktrace:
 [1] dlopen(::String) at /builddir/build/BUILD/julia/build/usr/share/julia/site/v0.7/Libdl/src/Libdl.jl:99
 [2] top-level scope
 [3] macro expansion at /builddir/build/BUILD/julia/build/usr/share/julia/site/v0.7/REPL/src/REPL.jl:117 [inlined]
 [4] (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})() at ./event.jl:92

julia> lib = Libdl.dlopen("libz.so.1")
Ptr{Nothing} @0x00007f139153b520

As you can see, it works once I specify the exact name of the shared library, but not before then.

Other Details

  • Operating system: Linux (Fedora 27)

Labels: doc, needs news

All 36 comments

Yes, you should either (1) use BinDeps, (2) specify the name of the correct library version, or (3) install the libz-dev package. This was changed intentionally to reduce the number of test configurations, since many systems (Windows, Nix, etc.) require you to specify the correct name in full.
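
A minimal sketch of option (2), assuming a Linux system where zlib is installed under the name libz.so.1 as in the report above. The candidate list is an assumption for illustration; Libdl.find_library, dlopen, dlsym and dlclose are existing Libdl functions, and zlibVersion is zlib's own version query.

```julia
using Libdl

# Versioned names to try, most specific first. These are assumptions for a
# typical Linux system; other platforms name zlib differently.
candidates = ["libz.so.1", "libz"]

# Libdl.find_library returns the first name that dlopen accepts, or "" if none do.
name = Libdl.find_library(candidates)

if isempty(name)
    println("zlib not found under any of: ", join(candidates, ", "))
else
    handle = Libdl.dlopen(name)
    fptr = Libdl.dlsym(handle, :zlibVersion)   # const char *zlibVersion(void)
    println(name, " provides zlib ", unsafe_string(ccall(fptr, Cstring, ())))
    Libdl.dlclose(handle)
end
```

Listing the fully versioned name first keeps the lookup independent of whether the unversioned libz.so symlink happens to be installed.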

Reopening because I think this is the canary in the coal mine and we're going to get a lot of people hitting this problem with no clue what to do and there should be some better way of helping them.

Yeah, I didn't spot anything in NEWS.md about the behaviour change and didn't find anything relevant in a quick search through the PRs and issues. I suspect a lot of package breakage will occur because of this (this example comes from CodecZlib, in fact), and documented alternatives will be very helpful.

Out of curiosity, why does installing the header files (e.g. libz-dev / libz-devel in this case) fix this issue?

It seems like many of the examples of ccall from the documentation will probably need to be updated. This first example from Calling C and Fortran Code for instance:

julia> t = ccall((:clock, "libc"), Int32, ())
ERROR: error compiling top-level scope: could not load library "libc"
/usr/bin/../lib64/libc.so: invalid ELF header
Stacktrace:
 [1] macro expansion at /builddir/build/BUILD/julia/build/usr/share/julia/site/v0.7/REPL/src/REPL.jl:117 [inlined]
 [2] (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})() at ./event.jl:92

julia> t = ccall((:clock, "/usr/lib64/libc.so.6"), Int32, ())
2667541

This is a breaking change and should have a deprecation to let people know what they should do.

I suspect a lot of package breakage will occur because of this

Not very many. CodecZlib is one of the only libraries that doesn't use BinDeps. And it'll force them to fix their package for use on less ubiquitous platforms like NixOS (instead of making the fix conditional on running on Windows). And force us to fix our documentation (while sometimes true that these functions are in a shlib named libc, that's not particularly universal).

This not only needs docs and NEWS, I really think we should not be just breaking this but providing a proper deprecation. We'll see how it goes, but I think this is just the first of many complaints we'll see.

Reading through the linked PR, I disagree that this should have been merged. Yes, there are reasons for wanting to encourage users to use versioned library names when opening them, but we don't force users to specify a major and minor version number of a Julia package when they load it, we just load the latest available and provide the option (through Pkg.pin() or whatever) to specify if they have the need to restrict what they load. I think the same argument could be made regarding dynamic libraries.

we just load the latest available and provide the option

That's not what the option did on the majority of platforms / configurations. Instead, we had to explain how it worked on some distributions in ideal circumstances (aka, pretty much just zlib), then explain that if that didn't work for whatever reasons, you should have actually just specified the actual version of the library you wanted in the first place, which would have simply worked on all platforms in the first place.

That's also not how package resolution works. We don't just randomly upgrade all of the packages, we explicitly list which ones are compatible, and then ensure that the currently visible and enumerable environment is consistent and restricted to the versions that are specified in the local manifest. That's also how our build works now that the linked PR is merged (it creates folders mapping out the currently visible environment and searches that for the requested name).

We don't just randomly upgrade all of the packages

That's exactly what we do in Pkg2 😂

@vtjnash could you live with the old behaviour being back in the code base if it was well documented that it's a bad approach for maintaining binary dependencies? It seems like it was removed in hopes of encouraging better behaviour from package developers, but maybe some documentation could achieve the same result.

I don't think that writing documentation about how not to do something is particularly advisable. I prefer we don't do it and instead explain up front how to write examples and code that'll work with all platforms.

For package development, I completely agree. Is it unreasonable to expect that people who aren't package developers are going to be using the C FFI, though? It seems strange that the barrier to entry to the C FFI should be higher than compiling a similar C program. Compare the following, for instance:

Compiling a C program with libm

$ gcc prog.c -o main -lm

Calling libm from Julia 0.7.x

out = ccall((:sin, "libm.so.6"), Float64, (Float64,), 1.0)

Shouldn't Julia have the same basic level of convenience as GCC?

If we're going to talk about how you would write this in C, why not write it like we're using C and drop the library name? This works on all versions of Julia:

ccall(:sin, Float64, (Float64,), 1.0)

We can add additional entries to the list of libraries that we want to be visible to packages by default (https://github.com/JuliaLang/julia/blob/17e9abf2e2f2189e6353668bc27a491d165856cc/base/Makefile#L175). I initially figured that we don't need to list libraries there that we statically link against, since it's not necessary (or particularly advisable) to give the actual library name (too much variety across platforms). But we can use this build function as a means of normalizing the names across platforms if we decide we want to (something, again, that the old mapping did incorrectly).

We don't just randomly upgrade all of the packages, we explicitly list which ones are compatible, and then ensure that the currently visible and enumerable environment is consistent and restricted to the versions that are specified in the local manifest.

In package loading, I just say using Foo; I don't say using Foo:0.2.3. I argue the using construct is the more direct analogue to dlopen() than anything having to do with Pkg. Pkg is more similar to dpkg dependency lists and whatnot, for which I argue that yes, absolutely, we should enforce as much versioning strictness as possible without making it onerous. But for actual code loading, it doesn't make sense to me to enforce this so strongly. We should provide the option, of course, as it can only help, but making it the default seems extreme to me.

Regarding @nsmith5's point above, there's actually an important difference as GCC is providing compile-time guarantees here; it compiles against version x.y.z of libm, and then encodes which version it was built against into the compiled binary. That dodges a lot of problems that we are trying to solve here in Julia, so I think that's an imperfect argument.

why not write it like we're using C and drop the library name.

Because that doesn't work in the general case? We're not going to symlink a random libfoo onto our library search path to rebuild functionality that we had in Julia. If you didn't like the functionality because it was badly tested, we should have written tests for it.

because it was badly tested

We had a test for it. Not a very good one, but it failed CI pretty frequently anyway, so it was set up internally to return success whenever it failed (specifically, this test: https://github.com/JuliaLang/julia/pull/26581/files#diff-bf20429d6316882a26470433941b41c5R204)

We're not going to symlink a random libfoo onto our library search path

Er, are you not the same staticfloat that's building a package for linking a random libfoo into our library search path in preparation for handling this better in conjunction with Pkg3? :)

Reference: the script for testing the installation of an actual libfoo into the library search path: https://github.com/JuliaPackaging/BinaryProvider.jl/tree/e9dd1a8f39ba6ede973165512788cfa374ad7bf6/test/LibFoo.jl/deps

That dodges a lot of problems that we are trying to solve here in Julia, so I think that's an imperfect argument.

Is this an argument against the feature or the implementation? I mention the C compiler because the feature is evident. I can understand if our implementation is currently wanting, but is there some technical reason we won't ever be able to implement it properly?

In this context, isn't "the C compiler + linker + autoconf scripts" == "the BinDeps.jl compiler"? Like the existing meta-build systems for C, I think we've found it's more reliable to run these as a part of the build process, where it is able to run arbitrary user code, cause side-effects, and provide useful debugging information.

No, in this context the feature set would be just "linker + loader". No autoconf. It's not about being cross-platform or reliable. It's a brittle approach to using a C library, but it is very simple.

The point I'm making with the C example is that the combined behaviour of the linker, loader and environment (LD_LIBRARY_PATH) hide the details of the library version you're using and where it is on your system. I think that we need the C FFI to have the same feature set. It is the entry point to C programming and it should be the entry point to using the C FFI in Julia.
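
To make the "hidden details" concrete, here is a small sketch that asks the loader which file it actually resolved. It assumes the library name libz.so.1 from the report above; Libdl.dlpath is an existing function that returns the path of an opened library, and LD_LIBRARY_PATH is only read here, not modified.

```julia
using Libdl

# Extra directories the loader searches, if the variable is set at all.
println("LD_LIBRARY_PATH = ", get(ENV, "LD_LIBRARY_PATH", "(unset)"))

# "libz.so.1" is an assumed name taken from the report above.
handle = try
    Libdl.dlopen("libz.so.1")
catch err
    println("could not load: ", err)
    C_NULL
end

if handle != C_NULL
    # Path of the file the loader picked for this name.
    println("resolved to ", Libdl.dlpath(handle))
    Libdl.dlclose(handle)
end
```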

I think that we need the C FFI to have the same feature set

Why? Does anyone else do this? What "feature set" are we talking about here? If you just want to make this a feature of the REPL, that's an entirely different question.

Hmm, sorry about the lack of clarity. Let's establish some clearer language: let's say that a feature is some bit of code in the Julia landscape that provides some functionality. In your example, the BinDeps.jl compiler feature provides the functionality of "C Compiler + Linker + Autoconf Scripts".

I think there is a demand for the functionality of "Linker + Loader" in the C FFI. To be clear about what that functionality is, you provide,

  1. The name of a function
  2. The name of a library

and the "Linker + Loader" finds a valid version of that library and calls the member function you specified.

I think the Julia C FFI is the feature that should be providing this functionality. Specifically, ccall and dlopen should provide this functionality. Moreover, ccall and dlopen should behave no better and no worse than the linker and loader when you compile equivalent C code.

Here is a more detailed example of simple C program and equivalent C FFI call in Julia. I've pointed out the functionality I'm talking about in each case.

1.) C Example

foo.h

int foo(int, int);

main.c

#include "foo.h"     // local header that declares foo
int main(void) {
    int a, b, c;
    b = 1;
    c = 2;
    a = foo(b, c);   // Just specify a function name
    (void)a;         // result is unused in this example
    return 0;
}

compile and run

$ gcc main.c -lfoo # <-- Just specify a library name
$ ./a.out  # <-- works because the linker and loader deal with the details

2.) Julia Example

compile and run

julia> c = ccall((:foo, "libfoo"), Cint, (Cint, Cint), 1, 2) # <-- This should work as well. I've specified a function name and library name and I want ccall and dlopen to deal with the details.

Does that help clarify?

No, it doesn't. Why does the C example require 2 steps (the header file is extraneous), but the Julia example requires doing it in 1 step to achieve "equivalent functionality"?

Julia is at least twice as convenient as C.

While that's a nice thought, it's worth pointing out that the distribution you are choosing to use is going out of its way to make sure that dlopen won't work unless you pass it a fully qualified name. So rather than ganging up against me for not wanting to implement work-arounds for your distribution, maybe you should open an issue with your libc maintainer and package manager and ask them why they have policies against this. By contrast, Apple and Microsoft generally do not try to prohibit this. Although on Windows, this feature was affectionately known as "dll hell", so that may provide a slight hint.

FWIW gcc main.c -lfoo will only work if you have libfoo.so somewhere, in which case ccall((:foo, "libfoo"), ...) will also work in Julia. Anyway I'm not sure the comparison with C is very interesting here since it's notoriously not the most convenient language around, and more importantly it's a static language. This means that the library is resolved at compile time and at runtime the version used for compilation will be loaded. In Julia these two steps happen at the same time, as if you called gcc every time you start the program.

Anyway I'm not sure the comparison with C is very interesting

Ok. So let's compare to Python then. Outside of performance, our competition is not (primarily) C.

https://docs.python.org/3/library/ctypes.html#finding-shared-libraries

On Linux, find_library() tries to run external programs (/sbin/ldconfig, gcc, objdump and ld) to find the library file. It returns the filename of the library file.

@vtjnash I am very sorry. I wouldn't want you to feel like you're being ganged up on. It looks like your opinion is in the minority on this topic, but it's far from unwelcome from my perspective. In fact, I'd like to know a lot more, because it seems like you have a heap of expertise in this area.

Speaking of which, what's the distribution issue you mentioned? Is this problem only faced by Red Hat-based distros? I'm happy to try to lobby them for a change if it makes our lives easier.

Having a find_library function that tries its best to find the library path is different from having ccall itself do that though, right? It seems Jameson is against ccall itself trying to do a bunch of magic, and wants to offload that to some stdlib / package. This seems similar to how Python has done it?

I could definitely get on board with having a find_library function, perhaps in Libdl? Would that satisfy your desire to remove this functionality from ccall, @vtjnash? It seems like it would also provide a fairly straightforward deprecation too, i.e. change ccall with a library name to calling find_library explicitly.

Also on board with that. In fact Libdl already has a find_library function, it just needs some improvement to provide the functionality we've been talking about.

It could also have a strict::Bool=false keyword argument that controls whether to do the numbered .so name search or not.
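
A rough sketch of what such a keyword could look like, written as a wrapper around the existing Libdl.find_library. The function name find_library_sketch, the maxver limit, and the Linux-style "name.so.N" suffix scheme are all assumptions for illustration and are not part of Libdl.

```julia
using Libdl

# Hypothetical helper, not part of Libdl: resolve `name` to a string that
# dlopen accepts. With strict=true only the exact name is tried; otherwise
# numbered suffixes such as "libz.so.1" are tried too (Linux-style naming).
function find_library_sketch(name::AbstractString; strict::Bool=false,
                             maxver::Integer=9)
    candidates = String[name]
    if !strict
        for v in 0:maxver
            push!(candidates, string(name, ".", Libdl.dlext, ".", v))
        end
    end
    # Libdl.find_library returns the first loadable candidate, or "" if none.
    return Libdl.find_library(candidates)
end

println(repr(find_library_sketch("libz")))               # may find "libz.so.1"
println(repr(find_library_sketch("libz"; strict=true)))  # "" unless libz.so exists
```

The result depends on what is installed, so callers would still need to handle the empty-string case.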

I would be happy with a find_library() function. (Ironic that I'm working on the counterpart, Sys.which() right now. :P )

The documentation problem is still acute in Julia 1.0.2: the very first ccall example in the manual at https://docs.julialang.org/en/v1/manual/calling-c-and-fortran-code/ fails on Ubuntu 18.04:

julia> t = ccall((:clock, "libc"), Int32, ())
ERROR: error compiling top-level scope: could not load library "libc"
/usr/lib/x86_64-linux-gnu/libc.so: invalid ELF header
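
One possible workaround, following the suggestion earlier in the thread to drop the library name, is sketched below. It relies on the C library already being loaded into the Julia process, so no file named "libc" has to be resolved. The return type Clong is an assumption (clock_t is a C long on glibc and on Windows, but the C standard does not fix its type); the manual's example uses Int32.

```julia
# Look the symbol up among the libraries already loaded into the process
# instead of asking the loader to open "libc" by name.
# Clong is an assumed match for clock_t on the current platform.
t = ccall(:clock, Clong, ())
println(t)
```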