Dealii: [8.5] p4est without zlib is broken

Created on 30 Mar 2017 · 28 Comments · Source: dealii/dealii

While the current 8.5 branch works on some machines, I now have a setup where pretty much all p4est-based tests fail, including the quick_test:

$ mpirun -n 2 ./tests/quick_tests/p4est.debug
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 1.

Setup:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu Xenial Xerus (development branch)
Release:    16.04
Codename:   xenial

detailed.log:
#        CMAKE_SOURCE_DIR:       /home/bob/deal.ii-candi/tmp/unpack/deal.II-dealii-8.5
#                                (version 8.5.0-rc1, shortrev f756864)
#        CMAKE_CXX_COMPILER:     GNU 5.4.0 on platform Linux x86_64
#            TRILINOS_VERSION = 12.10.1
#            MPI_VERSION = 3.0
#            OMPI_VERSION = 1.0.2
#            P4EST_VERSION = 2.0

It is difficult to debug this because it only happens inside a Docker container, but I know the failure occurs in tria.refine_global.

All 28 comments

also happens with p4est 1.1

also happens with mpich instead of OpenMPI 1.10.2. I am out of ideas.

Does this also occur outside of docker, like, e.g., in a VM?

Does this also occur outside of docker, like, e.g., in a VM?

maybe I should try that next.

Works in a VM. Weird, but I guess I'll close and ignore this.

I managed to get a call stack and this is an annoying problem: our p4est test silently crashes if p4est couldn't find zlib during configuration:

#4  0x00007fffef4ce4d8 in sc_abort_collective (
    msg=0x7fffef7d4e50 "Configure did not find a recent enough zlib.  Abort.\n") at /home/ubuntu/libs/tmp/unpack/p4est-1.1/sc/src/sc.c:694
#5  0x00007fffef71ac25 in p4est_checksum (p4est=0x761dc0)
    at /home/ubuntu/libs/tmp/unpack/p4est-1.1/src/p4est.c:3175
#6  0x00007fffef72a761 in p4est_partition_given (p4est=0x761dc0, 
    new_num_quadrants_in_proc=0x766e10)
    at /home/ubuntu/libs/tmp/unpack/p4est-1.1/src/p4est_algorithms.c:2702
#7  0x00007fffef7194d0 in p4est_partition_ext (p4est=0x761dc0, 
    partition_for_coarsening=1, weight_fn=0x0)
    at /home/ubuntu/libs/tmp/unpack/p4est-1.1/src/p4est.c:2700
#8  0x00007ffff603c21f in dealii::parallel::distributed::Triangulation<2, 2>::execute_coarsening_and_refinement (this=0x7fffffffc670)
    at /home/ubuntu/libs/tmp/unpack/deal.II-v8.4.2/source/distributed/tria.cc:3786

Oh, I have had that problem before. Since then I always run make check after installing p4est. That's an easy way to catch it.

Any suggestions on what to do? p4est in debug mode will always crash silently when it doesn't find zlib. The configuration step produces a warning, but I cannot find a way to make the configuration fail by passing something like --with-zlib.
Ideas:

  • find a way to crash in a helpful way in debug mode, at least for the quick test
  • modify deal.II to require zlib inside p4est (HAVE_ZLIB in p4est_config.h)
  • modify the setup_p4est.sh script to require zlib or at least fail after the fact

Thoughts?

Wow. That is not what I expected the answer to be. Nice work!

It doesn't look like there is any check for zlib when we don't use VTK compression, so without zlib p4est_checksum crashes unconditionally :(

Would modifying the p4est build shell script to include the --enable-vtk-zlib fix things (i.e., prevent compilation of a copy of p4est that we can't use)?

It doesn't look like there is any check for zlib when we don't use VTK compression, so without zlib p4est_checksum crashes unconditionally :(

Correct, and it is always called by partition (though only in debug mode).

Would modifying the p4est build shell script to include the --enable-vtk-zlib fix things (i.e., prevent compilation of a copy of p4est that we can't use)?

Sadly not, it just gives a warning at the end:

VTK compression is enabled, but we did not find a recent zlib.
This is OK if the following does not matter to you:
Calling p4est_vtk_write_file will abort your program.
You can fix this by compiling a working zlib and pointing LIBS to it,

Can't we just run make check in the script? That way we know whether the installation succeeded or not.

It sounds like the reasonable thing to do is just to check DEAL_II_WITH_ZLIB whenever we use p4est.

On 04/05/2017 06:41 AM, David Wells wrote:

Would modifying the p4est build shell script to include the --enable-vtk-zlib fix things (i.e., prevent compilation of a copy of p4est that we can't use)?

I think I remember that we explicitly switched that off because the testsuite
contains many tests that use p4est's VTK writers and we wanted the result to
be just text (for numdiff) rather than compressed data.

What about we let DEAL_II_WITH_P4EST depend on DEAL_II_WITH_ZLIB and
require a minimum version for DEAL_II_WITH_ZLIB? [1]

[1] After all it is just an ugly hack for our bundled boost...

Can't we just run make check in the script?

I don't understand. What script are you talking about?

It sounds like the reasonable thing to do is just to check DEAL_II_WITH_ZLIB whenever we use p4est.

Sadly that won't work: if you install zlib somewhere and set ZLIB_DIR or point deal.II at it, you can have zlib inside deal.II but not in p4est. I don't even know how to tell p4est's configure where to find zlib...

What about we let DEAL_II_WITH_P4EST depend on DEAL_II_WITH_ZLIB

Won't work, see above.

I don't understand. What script are you talking about?

I am talking about the setup_p4est.sh script. I would think the vast majority of people use it to install p4est, and if they don't, it is their responsibility to make sure p4est is installed correctly.

If I see this correctly, there is

#ifndef P4EST_HAVE_ZLIB
#define P4EST_HAVE_ZLIB 1
#endif

in the install directory, in p4est_install/include/p4est_config.h. Can't we use that information to decide whether we can use it?
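
The header check suggested above could be sketched in shell roughly as follows. This is only an illustration, not the check that ended up in deal.II: the function name p4est_has_zlib is made up here, and the sketch assumes the macro appears on a single "#define P4EST_HAVE_ZLIB 1" line, as in the excerpt.

```shell
# Hypothetical helper (the name is an assumption, not part of p4est or deal.II).
# Exit status: 0 if the header defines P4EST_HAVE_ZLIB to 1,
#              1 if it does not,
#              2 if the header cannot be read.
p4est_has_zlib() {
  header="$1"
  [ -r "$header" ] || return 2
  # Match "#define P4EST_HAVE_ZLIB 1", tolerating extra whitespace.
  grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+P4EST_HAVE_ZLIB[[:space:]]+1([[:space:]]|$)' "$header"
}
```

A build script could call `p4est_has_zlib p4est_install/include/p4est_config.h || exit 1` right after installing p4est, so that it stops early instead of producing a library that aborts at run time.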

I am talking about the setup_p4est.sh script.

But that script cannot call the deal.II make test because deal.II is not installed at that point. We could check for P4EST_HAVE_ZLIB in there, though. Or are you referring to some p4est make test that I am not aware of?

in the install directory p4est_install/include/p4est_config.h can't we use that information to decide if we can use it?

Yes, that is one way to at least detect it.

Does anybody know how to point p4est to a custom zlib install (if it is not a system library)?

@tjhei This should be possible to do by setting LIBS and adding more CFLAGS options (i.e., an -I flag for the directory that contains zlib.h). When would zlib not be installed in a standard location?

Yes, that is one way to at least detect it.

Pull request to check this condition incoming. I have to check a couple
of corner cases first.

Pull request to check this condition incoming. I have to check a couple
of corner cases first.

Thanks. I was going to take a stab at it but I like this even better.

Are you adding a compile test that checks for the define or are you parsing the .h?

Are you adding a compile test that checks for the define or are you parsing the .h?

Parsing the .h file

Is that not enough?

Does anybody know how to point p4est to a custom zlib install (if it is not a system library)?

I am not sure how this works, but in Spack p4est ends up being properly linked against the custom zlib, for example on my mac:

$ otool -L ~/spack/opt/spack/darwin-sierra-x86_64/clang-8.1.0-apple/p4est-2.0-b45itslpzikkf5qihry63mtshcektsrd/lib/libp4est.dylib
/Users/davydden/spack/opt/spack/darwin-sierra-x86_64/clang-8.1.0-apple/p4est-2.0-b45itslpzikkf5qihry63mtshcektsrd/lib/libp4est.dylib:
    /Users/davydden/spack/opt/spack/darwin-sierra-x86_64/clang-8.1.0-apple/p4est-2.0-b45itslpzikkf5qihry63mtshcektsrd/lib/libp4est-2.0.dylib (compatibility version 0.0.0, current version 0.0.0)
    /Users/davydden/spack/opt/spack/darwin-sierra-x86_64/clang-8.1.0-apple/zlib-1.2.11-6mjl676npv2ostr56efqo5hda6wkdrmr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
    /Users/davydden/spack/opt/spack/darwin-sierra-x86_64/clang-8.1.0-apple/openmpi-2.1.0-rh7brts6lzesj46zopjj5rzmkcyiktx7/lib/libmpi.20.dylib (compatibility version 31.0.0, current version 31.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.51.1)

as well as on the CentOS 7 cluster:

$ ldd spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/p4est-2.0-jj7ikuvugi4eummhe3i7xjddolnrhcjs/lib/libp4est.so
    linux-vdso.so.1 =>  (0x00007ffe6a1d8000)
    libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fb3c605d000)
    libz.so.1 => /home/woody/iwtm/iwtm108/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/zlib-1.2.11-yrijbmajetpxrkyaa4uclsyw62zbhc5m/lib/libz.so.1 (0x00007fb3c5e3e000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fb3c5b3c000)
    libmpi.so.20 => /apps/OpenMPI/2.0.2-gcc//lib/libmpi.so.20 (0x00007fb3c5859000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb3c563c000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fb3c527b000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fb3c6541000)
    libopen-rte.so.20 => /apps/OpenMPI/2.0.2-gcc/lib/libopen-rte.so.20 (0x00007fb3c4ff9000)
    libopen-pal.so.20 => /apps/OpenMPI/2.0.2-gcc/lib/libopen-pal.so.20 (0x00007fb3c4d05000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fb3c4b01000)
    libnuma.so.1 => /lib64/libnuma.so.1 (0x00007fb3c48f5000)
    libpciaccess.so.0 => /lib64/libpciaccess.so.0 (0x00007fb3c46ea000)
    librt.so.1 => /lib64/librt.so.1 (0x00007fb3c44e2000)
    libutil.so.1 => /lib64/libutil.so.1 (0x00007fb3c42df000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fb3c40c8000)

But indeed, we do not explicitly tell p4est where to find zlib, and zlib is not mentioned anywhere in the build log either.

Apart from the compiler wrappers, which would have -I and -L, the zlib prefix appears in the environment variable PKG_CONFIG_PATH=/Users/davydden/spack/opt/spack/darwin-sierra-x86_64/clang-8.1.0-apple/zlib-1.2.11-6mjl676npv2ostr56efqo5hda6wkdrmr/lib/pkgconfig.
That's my best guess how p4est picks it up.
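
The manual otool/ldd inspection above can also be scripted. The sketch below reads such output on standard input and reports whether a libz entry is present. The function name is hypothetical, and a positive result only shows that the shared library links against some zlib; it says nothing about a statically linked p4est.

```shell
# Hypothetical helper (the name is an assumption).
# Reads the output of `ldd libp4est.so` (Linux) or `otool -L libp4est.dylib`
# (macOS) on stdin and succeeds if one of the entries is a zlib shared library.
links_against_zlib() {
  # Matches e.g. "libz.so.1" and ".../libz.1.dylib", but not "libzstd.so.1".
  grep -Eq '(^|[[:space:]/])libz\.(so|[0-9.]*dylib)'
}
```

For example, `ldd /path/to/libp4est.so | links_against_zlib && echo "linked against zlib"`, where the library path is a placeholder.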

Is that not enough?

somewhat of a hack but probably good enough.

prefix to zlib appears in PKG_CONFIG_PATH environment variable.
That's my best guess how p4est picks it up.

I tried this manually and it didn't work.

I had to do

CFLAGS=-I/home/bob/deal.ii-candi/zlib-1.2.8/include
LIBS=-L/home/bob/deal.ii-candi/zlib-1.2.8/lib
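
One way to pass these two settings is on the configure command line, assuming p4est's configure is a standard autoconf script, which accepts VAR=value arguments. The helper below merely assembles the two assignments for a given zlib prefix. Its name is hypothetical, and it assumes the prefix contains include/ and lib/ subdirectories and no whitespace.

```shell
# Hypothetical helper (the name is an assumption).
# Prints the CFLAGS and LIBS assignments reported above for a zlib
# installed under the given prefix.
zlib_configure_vars() {
  prefix="$1"
  printf 'CFLAGS=-I%s/include LIBS=-L%s/lib\n' "$prefix" "$prefix"
}

# Example (not executed here):
#   ./configure $(zlib_configure_vars /home/bob/deal.ii-candi/zlib-1.2.8)
```

Whether further flags are needed on a given system is not something this thread settles; the sketch only reproduces the two variables reported to work.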

I had to do

that explains why this works in Spack, where those things are in compiler wrappers.
