looks serious:
$ ctest -V -R "mpi/distribute_flux_sparsity_pattern.mpirun=2.release"
UpdateCTestConfiguration from :/home/davydden/libs/dealii-tests/DartConfiguration.tcl
UpdateCTestConfiguration from :/home/davydden/libs/dealii-tests/DartConfiguration.tcl
Test project /home/davydden/libs/dealii-tests
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 6024
Start 6024: mpi/distribute_flux_sparsity_pattern.mpirun=2.release
6024: Test command: /home/davydden/spack/opt/spack/linux-ubuntu16-x86_64/gcc-6.3.0/cmake-3.7.2-qfd24w4jz4yvsvcy5foddwgiwyykqkuq/bin/cmake "-DTRGT=distribute_flux_sparsity_pattern.mpirun2.release.diff" "-DTEST=mpi/distribute_flux_sparsity_pattern.mpirun=2.release" "-DEXPECT=PASSED" "-DBINARY_DIR=/home/davydden/libs/dealii-tests/mpi" "-DGUARD_FILE=/home/davydden/libs/dealii-tests/mpi/distribute_flux_sparsity_pattern.release/interrupt_guard.cc" "-P" "/home/davydden/spack/opt/spack/linux-ubuntu16-x86_64/gcc-6.3.0/dealii-develop-wrha3kpxcvgg2wd54rjm5dxl6ksdihf3/share/deal.II/scripts/run_test.cmake"
6024: Test timeout computed to be: 600
6024: make[3]: *** [distribute_flux_sparsity_pattern.release/mpirun=2/output] Error 1
6024: make[2]: *** [CMakeFiles/distribute_flux_sparsity_pattern.mpirun2.release.diff.dir/all] Error 2
6024: make[1]: *** [CMakeFiles/distribute_flux_sparsity_pattern.mpirun2.release.diff.dir/rule] Error 2
6024: make: *** [distribute_flux_sparsity_pattern.mpirun2.release.diff] Error 2
6024: Test mpi/distribute_flux_sparsity_pattern.mpirun=2.release: RUN
6024: =============================== OUTPUT BEGIN ===============================
6024: Built target distribute_flux_sparsity_pattern.release
6024: Generating distribute_flux_sparsity_pattern.release/mpirun=2/output
6024: mpi/distribute_flux_sparsity_pattern.mpirun=2.release: BUILD successful.
6024: mpi/distribute_flux_sparsity_pattern.mpirun=2.release: RUN failed. ------ Return code 59
6024: mpi/distribute_flux_sparsity_pattern.mpirun=2.release: RUN failed. ------ Result: /home/davydden/libs/dealii-tests/mpi/distribute_flux_sparsity_pattern.release/mpirun=2/failing_output
6024: mpi/distribute_flux_sparsity_pattern.mpirun=2.release: RUN failed. ------ Partial output:
6024:
6024: mpi/distribute_flux_sparsity_pattern.mpirun=2.release: RUN failed. ------ Additional output on stdout/stderr:
6024:
6024: [0]PETSC ERROR: [1]PETSC ERROR: ------------------------------------------------------------------------
6024: [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
The debug version passes.
This happens with [email protected] and [email protected]. For the record, all of the other 8678 tests pass, apart from two OpenCASCADE-related ones (https://github.com/dealii/dealii/issues/3915)!
the test also fails with a single mpi core.
some valgrind output
$ valgrind --tool=memcheck ./distribute_flux_sparsity_pattern.release
==20695== Memcheck, a memory error detector
==20695== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==20695== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==20695== Command: ./distribute_flux_sparsity_pattern.release
==20695==
==20695== Invalid read of size 8
==20695== at 0x4272F2: LinearAdvectionTest::AdvectionProblem<2>::assemble_system() (in /home/davydden/libs/dealii-tests/mpi/distribute_flux_sparsity_pattern.release/distribute_flux_sparsity_pattern.release)
==20695== by 0x41663E: main (in /home/davydden/libs/dealii-tests/mpi/distribute_flux_sparsity_pattern.release/distribute_flux_sparsity_pattern.release)
==20695== Address 0x39c16608 is 8 bytes before a block of size 8 alloc'd
==20695== at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20695== by 0xA4C0815: void std::vector<dealii::internal::Triangulation::TriaLevel<2>*, std::allocator<dealii::internal::Triangulation::TriaLevel<2>*> >::_M_emplace_back_aux<dealii::internal::Triangulation::TriaLevel<2>*>(dealii::internal::Triangulation::TriaLevel<2>*&&) (in /home/davydden/spack/opt/spack/linux-ubuntu16-x86_64/gcc-6.3.0/dealii-develop-wrha3kpxcvgg2wd54rjm5dxl6ksdihf3/lib/libdeal_II.so.8.5.0-pre)
==20695== by 0xA4D7A21: void dealii::internal::Triangulation::Implementation::create_triangulation<2>(std::vector<dealii::Point<2, double>, std::allocator<dealii::Point<2, double> > > const&, std::vector<dealii::CellData<2>, std::allocator<dealii::CellData<2> > > const&, dealii::SubCellData const&, dealii::Triangulation<2, 2>&) (in /home/davydden/spack/opt/spack/linux-ubuntu16-x86_64/gcc-6.3.0/dealii-develop-wrha3kpxcvgg2wd54rjm5dxl6ksdihf3/lib/libdeal_II.so.8.5.0-pre)
==20695== by 0xA50D4A4: dealii::Triangulation<2, 2>::create_triangulation(std::vector<dealii::Point<2, double>, std::allocator<dealii::Point<2, double> > > const&, std::vector<dealii::CellData<2>, std::allocator<dealii::CellData<2> > > const&, dealii::SubCellData const&) (in /home/davydden/spack/opt/spack/linux-ubuntu16-x86_64/gcc-6.3.0/dealii-develop-wrha3kpxcvgg2wd54rjm5dxl6ksdihf3/lib/libdeal_II.so.8.5.0-pre)
==20695== by 0xA825BD7: dealii::parallel::distributed::Triangulation<2, 2>::create_triangulation(std::vector<dealii::Point<2, double>, std::allocator<dealii::Point<2, double> > > const&, std::vector<dealii::CellData<2>, std::allocator<dealii::CellData<2> > > const&, dealii::SubCellData const&) (in /home/davydden/spack/opt/spack/linux-ubuntu16-x86_64/gcc-6.3.0/dealii-develop-wrha3kpxcvgg2wd54rjm5dxl6ksdihf3/lib/libdeal_II.so.8.5.0-pre)
==20695== by 0xA1E85B5: void dealii::GridGenerator::subdivided_hyper_rectangle<2, 2>(dealii::Triangulation<2, 2>&, std::vector<unsigned int, std::allocator<unsigned int> > const&, dealii::Point<2, double> const&, dealii::Point<2, double> const&, bool) (in /home/davydden/spack/opt/spack/linux-ubuntu16-x86_64/gcc-6.3.0/dealii-develop-wrha3kpxcvgg2wd54rjm5dxl6ksdihf3/lib/libdeal_II.so.8.5.0-pre)
==20695== by 0x4265D6: LinearAdvectionTest::AdvectionProblem<2>::AdvectionProblem() (in /home/davydden/libs/dealii-tests/mpi/distribute_flux_sparsity_pattern.release/distribute_flux_sparsity_pattern.release)
==20695== by 0x41662E: main (in /home/davydden/libs/dealii-tests/mpi/distribute_flux_sparsity_pattern.release/distribute_flux_sparsity_pattern.release)
==20695==
==20695== Invalid read of size 8
==20695== at 0x4272F6: LinearAdvectionTest::AdvectionProblem<2>::assemble_system() (in /home/davydden/libs/dealii-tests/mpi/distribute_flux_sparsity_pattern.release/distribute_flux_sparsity_pattern.release)
==20695== by 0x41663E: main (in /home/davydden/libs/dealii-tests/mpi/distribute_flux_sparsity_pattern.release/distribute_flux_sparsity_pattern.release)
==20695== Address 0xf8 is not stack'd, malloc'd or (recently) free'd
==20695==
any hints on how to keep the same level of compiler optimization in Release but add debug info from -g?
Can you run it in a debugger and get a backtrace? You don't need a lot of debug information just to get a backtrace...
since it appears in Release mode, there is not much info from bt:
Thread 1 "distribute_flux" received signal SIGSEGV, Segmentation fault.
0x00000000004272f6 in LinearAdvectionTest::AdvectionProblem<2>::assemble_system() ()
(gdb) bt
#0 0x00000000004272f6 in LinearAdvectionTest::AdvectionProblem<2>::assemble_system() ()
#1 0x000000000041663f in main ()
Hm that's not helpful :-( You can probably just set CXXFLAGS before running cmake again and compiling everything. Or maybe you just recompile the test.
i don't quite know how to append flags (-g or -ggdb) for Release build in deal.II build system.
in detailed.log i see which flags are currently used:
DEAL_II_CXX_FLAGS_RELEASE: -O2 -funroll-loops -funroll-all-loops -fstrict-aliasing -Wno-unused-local-typedefs
but how can I hack in -ggdb into that list?
Set CXXFLAGS, let cmake run, and it should pick up these flags.
I think that you can also set them via ccmake, but I've never really figured that out.
ok, this
cmake -DCMAKE_CXX_FLAGS_RELEASE="-ggdb" ../
resulted in the flag being appended to the defaults used by deal.II. I will rebuild and retry now...
we are getting somewhere:
(gdb) bt
#0 dealii::TriaAccessor<2, 2, 2>::has_children (this=<optimized out>)
at /home/davydden/spack/opt/spack/linux-ubuntu16-x86_64/gcc-6.3.0/dealii-develop-wrha3kpxcvgg2wd54rjm5dxl6ksdihf3/include/deal.II/grid/tria_accessor.templates.h:1537
#1 dealii::CellAccessor<2, 2>::active (this=<optimized out>)
at /home/davydden/spack/opt/spack/linux-ubuntu16-x86_64/gcc-6.3.0/dealii-develop-wrha3kpxcvgg2wd54rjm5dxl6ksdihf3/include/deal.II/grid/tria_accessor.templates.h:3365
#2 LinearAdvectionTest::AdvectionProblem<2>::assemble_system (this=this@entry=0x7fffffffbe90)
at /home/davydden/spack/var/spack/stage/dealii-develop-wrha3kpxcvgg2wd54rjm5dxl6ksdihf3/dealii/tests/mpi/distribute_flux_sparsity_pattern.cc:250
#3 0x000000000041663f in LinearAdvectionTest::AdvectionProblem<2>::run (this=0x7fffffffbe90)
at /home/davydden/spack/var/spack/stage/dealii-develop-wrha3kpxcvgg2wd54rjm5dxl6ksdihf3/dealii/tests/mpi/distribute_flux_sparsity_pattern.cc:307
#4 main (argc=<optimized out>, argv=<optimized out>)
at /home/davydden/spack/var/spack/stage/dealii-develop-wrha3kpxcvgg2wd54rjm5dxl6ksdihf3/dealii/tests/mpi/distribute_flux_sparsity_pattern.cc:321
so it is in dealii::TriaAccessor<2, 2, 2>::has_children:
return (this->objects().children[n_sets_of_two * this->present_index] != -1);
with n_sets_of_two=2 and unknown value of present_index:
p this->present_index
value has been optimized out
line 233:
typename DoFHandler<dim>::active_cell_iterator neighbor_cell =
neighbor_cell = current_cell->neighbor(face_n);
this looks suspicious!
@tjhei unfortunately, changing to typename DoFHandler<dim>::cell_iterator did not help.
Here is what gdb prints for neighbor_cell at the if (neighbor_cell->active()) step in the test:
p neighbor_cell
$1 = {<dealii::TriaRawIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> >> = {<std::iterator<std::bidirectional_iterator_tag, dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false>, long, dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false>*, dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false>&>> = {<No data fields>},
accessor = {<dealii::DoFAccessor<2, dealii::DoFHandler<2, 2>, false>> = {<dealii::CellAccessor<2, 2>> = {<dealii::TriaAccessor<2, 2, 2>> = {<dealii::TriaAccessorBase<2, 2, 2>> = {static space_dimension = 2, static dimension = 2, static structure_dimension = 2, present_level = -1, present_index = -1, tria =
0x7fffffffbe90}, <No data fields>}, <No data fields>}, static dimension = 2, static space_dimension = 2, dof_handler = 0x7fffffffc648}, static dim = <optimized out>,
static spacedim = <optimized out>}}, <No data fields>}
Note that present_index=-1, which explains why it fails above. So the question is: why is it -1 in the first place?
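To make the failure mode concrete, here is a minimal sketch of what an index of -1 does to array-style lookups such as the one in has_children. It is not deal.II code: ToyLevel, byte_offset and can_dereference are hypothetical names used only for illustration. With present_level = -1, a lookup into a vector of level pointers lands one pointer before the start of the allocation, which is 8 bytes on a 64-bit machine. That appears consistent with the first valgrind report above ("8 bytes before a block of size 8"), although this reading is an interpretation and not something confirmed in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for per-level storage; NOT deal.II's real TriaLevel.
struct ToyLevel
{
  std::vector<int> children; // -1 means "no children", as in the check above
};

// Signed byte offset, relative to the start of an array of pointers,
// at which element `index` would be read.
std::ptrdiff_t byte_offset(const int index)
{
  return static_cast<std::ptrdiff_t>(index) *
         static_cast<std::ptrdiff_t>(sizeof(ToyLevel *));
}

// True only if both indices address existing storage.
bool can_dereference(const std::vector<ToyLevel *> &levels,
                     const int                      present_level,
                     const int                      present_index,
                     const int                      n_sets_of_two)
{
  if (present_level < 0 || present_index < 0)
    return false; // invalid iterator state, e.g. level = index = -1
  if (static_cast<std::size_t>(present_level) >= levels.size())
    return false;
  const ToyLevel *level = levels[present_level];
  assert(level != nullptr);
  const std::size_t i = static_cast<std::size_t>(n_sets_of_two) *
                        static_cast<std::size_t>(present_index);
  return i < level->children.size();
}
```

A debug build would typically catch such a state through an assertion, whereas an optimized build without assertions simply performs the read.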
@tjhei -- I must be dense. What exactly looks suspicious?
Just for the record, I suspect that because the test succeeds in debug mode, that it's not something obvious in the source code that breaks the release test. Rather, it may be something where we invoke undefined behavior, or a compiler bug.
I wrote this test. It might be helpful to note that it uses a fairly extreme situation: I think that each processor owns either one or two cells, so we might be hitting some odd assumption about a minimum size.
but the test fails already with a single core MPI run.
I don't know the details of cell iterators, but shouldn't it be impossible to have a neighbouring cell with present_index=-1? I think that's the clue to the issue.
I don't believe that we should ever have a neighbor with that index.
I can reproduce this issue with openmpi but not with mpich. Has anyone else noticed this?
On 02/10/2017 11:57 AM, Denis Davydov wrote:
I don't know the details of cell iterators, but shouldn't it be impossible to have a neighbouring cell with present_index=-1? I think that's the clue to the issue.
Correct. But remember that we also don't have that problem in debug mode.
I must be dense. What exactly looks suspicious?
the assignment of neighbor_cell to itself.
Oh, YOU'RE TOTALLY RIGHT! @davydden, what happens if you remove the extraneous assignment?
I missed that part as well! I thought the problem was the use of active_cell_iterator instead of cell_iterator, because neighbouring cells may not be active.
i don't know, i am away from Ubuntu until Monday :smile:
You have got to be kidding keeping us in suspense for the whole weekend! ;-)
@drwells said he can reproduce it, maybe he can check if this fixes the issue? :smile:
I can take a look later this afternoon (perhaps I should qualify that by saying it is only 15:30 where I am).
Removing that additional assignment fixes it for me.
great, please go ahead with the PR which closes this issue. I will check it on Monday as well and re-open in case it's still there (i hope it is not :smile:)
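For reference, the fix discussed above can be sketched as follows. This is a self-contained toy model, not the actual test: ToyCellIterator, ToyCell and lookup_neighbor are hypothetical names, and only the shape of the declaration mirrors line 233 of the test. The original statement parses as `T x = (x = value);`, so operator= is invoked on an object whose constructor has not run yet. For a class type such as an iterator that is, as far as I can tell, undefined behavior, which would explain why a debug build can appear to work while an optimized build leaves the iterator at level = index = -1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy iterator; -1/-1 models the invalid state seen in gdb.
struct ToyCellIterator
{
  int present_level = -1;
  int present_index = -1;

  bool is_valid() const
  {
    return present_level >= 0 && present_index >= 0;
  }
};

// Hypothetical toy cell that stores one neighbor iterator per face.
struct ToyCell
{
  std::vector<ToyCellIterator> neighbors;

  ToyCellIterator neighbor(const std::size_t face_n) const
  {
    assert(face_n < neighbors.size());
    return neighbors[face_n];
  }
};

ToyCellIterator lookup_neighbor(const ToyCell    &current_cell,
                                const std::size_t face_n)
{
  // Problematic pattern (do not do this):
  //   ToyCellIterator neighbor_cell =
  //     neighbor_cell = current_cell.neighbor(face_n);
  //
  // Corrected form: a single initialization, no self-assignment.
  const ToyCellIterator neighbor_cell = current_cell.neighbor(face_n);
  return neighbor_cell;
}
```

With the single initialization, the returned iterator carries exactly the level and index stored for that face.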