@trilinos/epetraext
I have a use case for multiplying matrices that may have empty columns (with respect to the domain map). These matrices describe interactions between degrees of freedom in a multiphysics application and the matrix-matrix product is used to construct the final sparsity pattern. In some cases, the interaction may be restricted to a subdomain of the global mesh, which leads to empty columns (which are defined on the full domain of a DoF). Unfortunately, this seems to fail in serial runs (but works in parallel).
Consider the following test:
#include "Epetra_MpiComm.h"
#include "Epetra_FECrsMatrix.h"
#include "EpetraExt_MatrixMatrix.h"
int main(int argc, char **argv)
{
MPI_Init(&argc, &argv);
Epetra_MpiComm comm(MPI_COMM_WORLD);
Epetra_Map range(1LL, 0, comm);
Epetra_Map domain(2LL, 0, comm);
Epetra_FECrsMatrix A(Copy, range, 0, false);
long long const col = 1;
double const val = 1.0;
A.InsertGlobalValues(0LL, 1, &val, &col);
A.GlobalAssemble(domain, range);
Epetra_FECrsMatrix B(Copy, domain, false);
EpetraExt::MatrixMatrix::Multiply(A, true, A, false, B);
MPI_Finalize();
return 0;
}
This program creates a (distributed) 1x2 matrix, populates the (0,1) entry with a nonzero value, and attempts to compute B = A^T * A. It works fine with 2 MPI ranks, but in a serial run a segfault is encountered on Epetra_Import_Util.cpp:374 with the following stack trace:
Epetra_Import_Util::TUnpackAndCombineIntoCrsArrays<long long> Epetra_Import_Util.cpp:374
Epetra_Import_Util::UnpackAndCombineIntoCrsArrays Epetra_Import_Util.cpp:463
Epetra_CrsMatrix::FusedTransfer<Epetra_Export> Epetra_CrsMatrix.cpp:5030
Epetra_CrsMatrix::FusedExport Epetra_CrsMatrix.cpp:5211
EpetraExt::MatrixMatrix::Tmult_AT_B_newmatrix<long long> EpetraExt_MatrixMatrix_mult_A_B.cpp:1606
EpetraExt::MatrixMatrix::mult_AT_B_newmatrix EpetraExt_MatrixMatrix_mult_A_B.cpp:1626
EpetraExt::MatrixMatrix::TMultiply<long long> EpetraExt_MatrixMatrix.cpp:1270
EpetraExt::MatrixMatrix::Multiply EpetraExt_MatrixMatrix.cpp:1320
main main.cpp:20
The offending line is:
TargetPids[ToRow + j - FromRow] = (SourcePids[Source_colind[j]] != MyPID) ? SourcePids[Source_colind[j]] : -1;
Here, SourcePids is an empty vector, which was supposed to be populated in Epetra_Util::GetPids(*MyImporter,SourcePids,false) called on line Epetra_CrsMatrix:4890, but never was, because the MyImporter's Epetra_Distributor is null (it was never created, presumably because it is not needed for a serial run), and that error was silently ignored. I'm sorry if this makes no sense, I'm not familiar with Epetra internals at all, this is as far as I could trace it.
Does this look like a bug, or am I doing something wrong? Is there an easy fix/workaround I can do in my code?
-DTRILINOS_DIR=... to cmake): epetra_test.zipmpirun -n 2 ./epetra_test, it should run./epetra_test@trilinos/epetra @trilinos/epetraext
I wouldn't consider this function in EpetraExt a general sparse matrix-matrix multiply. It serves the needs of algebraic multigrid (e.g., constructing the coarse-grid operator R * (A * P)). One of the @trilinos/muelu developers might like to comment more.
Hi Mark,
Thank you for your reply. Just FYI, I did the same test with with Tpetra (using Tpetra::MatrixMatrix::Multiply), and it seems to work fine. The project I'm working on interfaces Epetra at the moment, but looks like it might be a good time for us to switch.
Yeah, this is a bug. And we should fix it.