Openssl: fips.so is not self-contained

Created on 5 Feb 2020  路  50Comments  路  Source: openssl/openssl

When we build fips.so we do so with the "-z defs" linker option (on Linux):

https://github.com/openssl/openssl/blob/d8d4e5fb32b3f7d9af28d21ce4c1c46cc1c7614c/Configurations/shared-info.pl#L31-L41

This is supposed to ensure that all symbols are resolved. Without this the link completes successfully even though some symbols are not defined in the resulting .so file. If that happens then I think those symbols get resolved from libcrypto at load time...which obviously would means that fips.so is not self-contained.

This used to work just fine. But it no longer seems to be the case. For example the symbol _bignum_modp_1536_p is defined here:

https://github.com/openssl/openssl/blob/d8d4e5fb32b3f7d9af28d21ce4c1c46cc1c7614c/crypto/bn/bn_dh.c#L630-L632

Since this is guarded with ifndef FIPS_MODE the symbol only exists in libcrypto and not in fips.so. The symbol is referenced from bn_const.c here:

https://github.com/openssl/openssl/blob/d8d4e5fb32b3f7d9af28d21ce4c1c46cc1c7614c/crypto/bn/bn_const.c#L85-L88

This code does get included in fips.so and the reference to _bignum_modp_1536_p is not guarded with FIPS_MODE guards like it should be. This should have resulted in a link failure - but it hasn't.

master help wanted OTC evaluated bug

All 50 comments

The ld manual says this:

       defs
           Report unresolved symbol references from regular object files.
           This is done even if the linker is creating a non-symbolic
           shared library.  This option is the inverse of -z undefs.

So the reference to "regular object files" may mean this doesn't work with ".a" files. Does anyone know what the right flags are to get the same effect with ".a" files?

A perl script to check the symbols could be written if there is no solution?

A perl script to check the symbols could be written if there is no solution?

There's also a half baked idea, that a library name can be used as a virtual container, i.e. when a library is mentioned somewhere, it gets "expanded" on the command line to all the object files it would contain. The idea is to replace this:

DEPEND[something]=libwhatever.a

with:

SOURCE[something]=libwhatever.a

and that would automatically trigger the container view of libwhatever.a

That idea is inspired from CMake's Object Libraries

The flip side of the "virtual container" idea is that we'll end up with really long command lines when building the FIPS module, and we already have a problem there...

I think adding an explicit check that the FIPS provider doesn't require external symbols is prudent even after fixing the linker problem. Something along the lines of:

if [ -n "`nm -g fips.so | grep ' U ' | grep -v GLIBC`" ]; then exit 1; fi

@levitte said to add a test recipe for this

Hmmmm... I currently have some trouble triggering this issue. I tried adding a call to OPENSSL_version_major() in rsa_newctx() (providers/implementations/signature/rsa.c), and here's what I get when linking the FIPS module:

gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O0 -g -DDEBUG_UNUSED -DPEDANTIC -pedantic -Wno-long-long -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wswitch -Wsign-compare -Wshadow -Wformat -Wtype-limits -Wundef -Werror -Wmissing-prototypes -Wstrict-prototypes -Wl,-z,defs -Wl,-znodelete -shared -Wl,-Bsymbolic   \
    -o providers/fips.so -Wl,--version-script=providers/fips.ld \
    providers/fips/fips-dso-fipsprov.o \
    providers/fips/fips-dso-self_test.o \
    providers/fips/fips-dso-self_test_kats.o \
    providers/libimplementations.a providers/libcommon.a providers/libfips.a -ldl -pthread 
/usr/bin/ld: providers/libfips.a(libfips-lib-rsa.o): in function `rsa_newctx':
/home/levitte/gitwrk/openssl.net/official/_build/../master/providers/implementations/signature/rsa.c:162: undefined reference to `OPENSSL_version_major'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:22903: providers/fips.so] Error 1
make[1]: *** Waiting for unfinished jobs....
ranlib libcrypto.a || echo Never mind.
make[1]: Leaving directory '/home/levitte/gitwrk/openssl.net/official/_build'
make: *** [Makefile:3127: build_sw] Fel 2

So, errrr, did we somehow fix the problem already without knowing it, or is it a matter of running a more modern gcc? I currently have this version, which I did not more than half a year ago:

$ gcc --version
gcc (Debian 10.2.0-17) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

If this is still an issue, I would like some help figuring out how to trigger it, whether it's some small but intricate code change, or trying my trick with an older gcc...

I think it is just matter of whether --no-undefined is part of the default configuration for the linker options when invoked from gcc.

I don't think that libcrypto is linked into fips.so, which explains the error. I'd ideally like to have an explicit check for symbol imports in fips.so -- only OS libraries are allowed.

This is sufficient:

if [ -n "`nm -g providers/fips.so | grep ' U ' | grep -v GLIBC`" ]
then
    echo "Bad symbols in fips.so"
    exit 1
fi

@t8m says:

I think it is just matter of whether --no-undefined is part of the default configuration for the linker options when invoked from gcc.

That's a synonym for -z defs

@paulidale, your solution works... on select platforms, and in select circumstances, i.e. it has to be done specifically for the FIPS module, can't be generalised.

Another way to handle the same thing is to have a tiny program, not linked with libcrypto, that simply tries to dlopen() fips.so, and possibly check that the expected symbol OSSL_provider_init is present. If none of that fails, we're good to go.

This also gets us to the question, when do we check this? I've started to lean on testing it as part of our test suite. We already have a shlibloadtest, so having a very similar test for loading a module of any kind isn't hard work.

Another way to handle the same thing is to have a tiny program, not linked with libcrypto, that simply tries to dlopen() fips.so, and possibly check that the expected symbol OSSL_provider_init is present. If none of that fails, we're good to go.

Not sure I understand why - Doesn't just calling the fipsinstall app load the provider and run the entry point??

Not sure I understand why - Doesn't just calling the fipsinstall app load the provider and run the entry point??

The problem is when the FIPS module has unresolved OpenSSL symbols. When that's the case, the dynamic loader links them to whatever is already present in the loaded program, i.e. whatever symbols are available in the already linked in libcrypto.

Since openssl is linked with libcrypto, the fipsinstall app isn't useful for this case.

Note that this is specific for the Unix family, as far as I know... and possibly also just a select set of platforms in that family.

Another way to handle the same thing is to have a tiny program, not linked with libcrypto, that simply tries to dlopen() fips.so, and possibly check that the expected symbol OSSL_provider_init is present. If none of that fails, we're good to go.

No, this is not the way to test it. That only OSSL_provider_init is exported is irrelevant here. We need to check that fips.so doesn't have any undefined external references that are ours.

@paulidale, your solution works... on select platforms, and in select circumstances, i.e. it has to be done specifically for the FIPS module, can't be generalised.

Checking that a FIPS provider built for a Unix like operating system doesn't require external symbols would be sufficient -- it's unlikely that a Windows build of the same would gain references to libcrypto or libssl. Ideally, we'd check for every platform but that might be too difficult.

This is very FIPS specific and we don't want it generalised. FIPS mandates that only OS libraries can be called -- therefore we need it to not call our libraries. Currently, the build process ensures this but I'd like an additional verification step just to be sure it doesn't get changed by accident in the future.

This also gets us to the question, _when_ do we check this? I've started to lean on testing it as part of our test suite. We already have a shlibloadtest, so having a very similar test for loading a module of any kind isn't hard work.

I'd see it as either part of the test suite (sanity, shlibload or a new one) or as part of the build process. I've a preference for a build time failure -- one cannot forget to run the build, one can forget to run the tests.

I've a proposed solution in #13507.

@paulidale, we have seen enough strangenesses with certain tools that I would avoid adding such things to the build files. There's enough specialized hackery in there as it is. I don't trust nm to be consistent enough across platforms where the check you suggest might get run.

I think we only need nm working on one platform to have some degree of confidence that our build process isn't broken.

So do both the nm on a subset and a test also?

Apparently, there's something about my apprehension against more platform specific hackery in the build files that doesn't come through

It's coming through loud and clear, I'm not sure how a dlopen() based approach is going to work.

I'm not sure how a dlopen() based approach is going to work.

I'm not sure how it's not. The problem was unresolved libcrypto symbols in the FIPS module, right? If loaded without libcrypto present (and with RTLD_NOW), the load should fail, shouldn't it?

I think the @levitte proposed approach works fine. However it does not detect the case where we inadvertently add -lcrypto to the FIPS module shared library linkage command of course.

However it does not detect the case where we inadvertently add -lcrypto to the FIPS module shared library linkage command of course.

That's not the issue here

My problem was solved with an LD_LIBRARY_PATH.

However it does not detect the case where we inadvertently add -lcrypto to the FIPS module shared library linkage command of course.

This is exactly the problem I worried about. The FIPS provider won't link if it depends on unresolved symbols. To get past the link stage, -lcrypto needs to be included on the FIPS shared module link line. libcrypto will silently load if it was included.

Is building the FIPS provider with an external dependency but not linking against libcrypto possible? Is it easy? Or likely?

Is building the FIPS provider with an external dependency but not linking against libcrypto possible? Is it easy? Or likely?

This is precisely the situation described in this issue.

So if you are worried about -lcrypto showing up in the FIPS library build, do a grep on the Makefile. :)
Put a comment in the fips build.info file.
In #13507 it would help if @levitte by hand added a call into the FIPS source and did a manual negative test.

@richsalz, I tried that yesterday, see https://github.com/openssl/openssl/issues/11020#issuecomment-732829925
I don't know how to reproduce this issue, so it would be nice if someone gave me the magic recipe.

Looking at your comment, it makes sense that the build would fail because -lcrypto is not linked in. Adding that to the LIBS for fips.so would make the build succeed (I believe). Then you can try your test.

Yes, that would make the build succeed, but that isn't the point. The FIPS must never be linked with libcrypto. However, we ended up in a situation where, despite of that, we had a successful build with unresolved libcrypto symbols. That is possible to do on Unix... but I can't reproduce that situation.

I now FIPS must never be linked with libcrypto. I was suggesting that, for a negative test you try it by hand and see what your test recipe does. Otherwise, I don't see that this test as it stands now, without that out-of-band confirmation, does anything useful.

If the FIPS module has been linked with libcrypto when building it, then it will load successfully, of course.

If the FIPS module has been linked with libcrypto when building it, then it will load successfully, of course.

oh? dlopen tracks down all dependent libraries? I didn't know that. What about if it's a shared-lib reference? Or the application static-linked libcrypto or dlopen's it before this library?

If that is truly the case, then I don't see what this test is actually testing. As your comment above shows, if there's an unref'd symbol, the build fails.If someone adds -lcrypto, then according to the comment above it will work.

@richsalz, I would suggest you read the dlopen(3) manual, it explains all the ways symbols are resolved.

And yes, if you link a shared object with -lcrypto, it will end up having a dependency on libcrypto.so. You can see that for yourself with the legacy provider, which is linked exactly that way:

$ ./util/wrap.pl ldd providers/legacy.so 
    linux-vdso.so.1 (0x00007fff163f0000)
    libcrypto.so.3 => util/../util/../libcrypto.so.3 (0x00007f5d66045000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5d65fd5000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5d65e12000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5d65e0d000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f5d66519000)

Here's the FIPS module for comparison:

$ ./util/wrap.pl ldd providers/fips.so 
    linux-vdso.so.1 (0x00007ffeda9c0000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007faa96312000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007faa9614f000)
    /lib64/ld-linux-x86-64.so.2 (0x00007faa96525000)

One way for checking for undefined symbols is using the RTLD_NOW
option to dlopen(). The default is RTLD_LAZY, which will not
return an error.

I assume that we don't link fips.so to libcrypto. The way to check
what a shared libraries an ELF file is linked to is looking at the
dynamic section's NEEDED entries:
$ readelf -d providers/legacy.so |grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libcrypto.so.3]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
$ readelf -d providers/fips.so |grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
$ readelf -d libcrypto.so.3 |grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libdl.so.2]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]

Note that ldd shows more libraries, it also lists all libraries
those depend on, recursivly.

If dependencies where missing, but one of the other libraries you
link to has the dependency, ld used to add the depenencies, but
recent versions will give an error. For example, you use both
functions from libssl and libcrypto, but only link to libssl, ld
used to add a dependency (NEEDED entry) for libcrypto, but will now
give an error by default.

Since we don't intend to link fips.so to any library, I'm not sure
if we should check those dependencies.

I do wonder if something completely different is happening causing the symptoms noted in the original issue.

This is not the issue we (I) thought it was. In the original report the concern was the symbol _bignum_modp_1536_p which is not included in fips.so. In the report I said the symbol is referred to from BN_get_rfc3526_prime_1536 which is included in fips.so. Well it turns out this function is included in libfips.a which is not quite the same thing. In spite of being in libfips.a that function is never pulled into fips.so, so the missing reference is never noticed.

Making this trial change to fipsprov.c:
```` diff
diff --git a/providers/fips/fipsprov.c b/providers/fips/fipsprov.c
index ffb7510054..5f46d54014 100644
--- a/providers/fips/fipsprov.c
+++ b/providers/fips/fipsprov.c
@@ -496,6 +496,8 @@ int OSSL_provider_init(const OSSL_CORE_HANDLE *handle,
FIPS_GLOBAL *fgbl;
OSSL_LIB_CTX *libctx = NULL;

  • BN_get_rfc3526_prime_1536(NULL);
    +
    if (!ossl_prov_seeding_from_dispatch(in))
    return 0;
    for (; in->function_id != 0; in++) {
    This results in this error during linking:
    /usr/bin/ld: providers/libfips.a(libfips-lib-bn_const.o): in function BN_get_rfc3526_prime_1536': /home/matt/dev/openssl-write/crypto/bn/bn_const.c:87: undefined reference to_bignum_modp_1536_p'
    /usr/bin/ld: /home/matt/dev/openssl-write/crypto/bn/bn_const.c:87: undefined reference to `_bignum_modp_1536_p'
    ````

So, again, I ask what this PR is actually testing. That a program without -lcrypto and load the provider? What is the use of that?

So, again, I ask what this PR is actually testing.

Err, this is an issue, not a PR?

You're right, I'll go copy mjy comment over to #13507

If crypto/bn/bn_const.c ends up in libfips.a, but nothing calls
BN_get_rfc3526_prime_1536(), it's normal that there are references
to _bignum_modp_1536_p in libfips.a but not in fips.so.

So there is no issue?

I think the use of "_" to start the name of a symbol is not
something we should do, those names are reserved at file scope. You
can only use them in a local scope, inside a function.

So there is no issue?

Well, its slightly odd that we include that function in libfips.a, but then never refer to it and it is unusable. But the problem described in this issue does not exist.

Well, its slightly odd that we include that function in libfips.a, but then never refer to it and it is unusable. But the problem described in this issue does not exist.

You don't include a function in a .a file, you include a whole .o
file in a .a file. You can add .o files to a .a file without
needing anything from that .o file. When linking that .a file into
something, everything that's not needed gets removed.

So do we need anything from that file in fips.so?

I'm seeing multiple items that we should be checking for:

  1. That fips.so doesn't link any of our libraries -- readelf -d is the suggested way to catch this on platforms that support it.
  2. That fips.so doesn't have unresolved external references to undefined symbols -- nm is the suggested way to catch these, again on platforms that support it. RTLD_NOW should resolve them. However, there is a concern about what they get resolved by. I think it best, to just not allow any beyond a specific few (libc).
  3. That the public symbol OSSL_provider_init is exported by the module. #13507 is testing this.
  4. That there are no other public symbols exported. Currently, there are no checks for this and I doubt its usefulness. However, the FIPS mandate of a single entry point would imply that this isn't allowed.

A module that passes all four of these tests would meet the FIPS requirements well enough for me.

Currently, fips.so is being built correctly. There is nothing preventing us from saying. _"it's all find and dandy and there is no need for automatic checks"_. Then we'd have to be careful to not let anything slip by the review process. The purpose of these checks is, IMO, to avoid us having to be so careful. That we will make mistakes is a given.

So do we need anything from that file in fips.so?

Yes, we do.

Yes, that would make the build succeed, but that isn't the point. The FIPS must never be linked with libcrypto. However, we ended up in a situation where, despite of that, we had a successful build with unresolved libcrypto symbols. That is possible to do on Unix... but I can't reproduce that situation.

@levitte Passing -Wl,--allow-shlib-undefined when linking the libfips.so should allow you to have undefined symbol from libcrypto without linking it.

Of the items @paulidale line up, I'd say:

  1. Another way is to look at the chain of things the FIPS module depends on is another way. The data necessary for this is in configdata.pm, and we already do figure out that chain when emitting the linking command, so it's not like we need to re-invent anything.
  2. That's what #13507 is meant to do.
  3. Yup.
  4. test/recipes/01-test_symbol_presence.t does this for the shared libraries already. It should be easy to extend it to check the modules as well, given util/providers.num, and if we want to go so far, util/engines.num

@levitte Passing -Wl,--allow-shlib-undefined when linking the libfips.so should allow you to have undefined symbol from libcrypto without linking it.

I understand option is about libraries you're linking too, not the
one being linked. If you're linking libssl.so, linking it to
libcrypto.so, it controls allowing undefined symbols in libcrypto.so.

So for a test you can try to link fips.so into something using
that option.

On February 5th, @mattcaswell said (https://github.com/openssl/openssl/issues/11020#issue-560408735):

This should have resulted in a link failure - but it hasn't.

Yesterday, @mattcaswell said (https://github.com/openssl/openssl/issues/11020#issuecomment-733956605):

This results in this error during linking:
/usr/bin/ld: providers/libfips.a(libfips-lib-bn_const.o): in function `BN_get_rfc3526_prime_1536': /home/matt/dev/openssl-write/crypto/bn/bn_const.c:87: undefined reference to `_bignum_modp_1536_p' /usr/bin/ld: /home/matt/dev/openssl-write/crypto/bn/bn_const.c:87: undefined reference to `_bignum_modp_1536_p'

It's actually a relief, that you can't reproduce the issue either.

I'm actually starting to think that it means the issue is resolved... isn't it? I can't say what fixed it. Was it added FIPS_MODULE guards? Is this reproducible if the guards are removed in some spot? If not, then maybe we should just close this issue and leave it for another rainy day...

Was this page helpful?
0 / 5 - 0 ratings