Kratos: Multiple instances of GidPost detected and causing problems.

Created on 11 Dec 2018 · 26 Comments · Source: KratosMultiphysics/Kratos

I created a customized GidIO for the PfemFluidApplication. It did not work correctly, as detected by @AFranci . These are the symptoms:

  • It runs perfectly in OpenSuse (FullDebug and Release), and it also works well in Ubuntu in FullDebug. It fails in Ubuntu Release.
  • When debugging in Ubuntu in Release mode, GidPost creates a Hash Table (static variable), but on the next access the table appears as not created yet. None of the methods that could destroy the Hash Table are visited.
  • We checked that there's only one instance of GidIO or derived classes. Wherever we are (base GidIO or derived GidIO), the address of the object is the same. msLiveInstances is 1 all the time.
  • Everything points to multiple instances of GidPost.
  • We disabled LTO for the Release mode in Ubuntu and suddenly everything worked nicely.

What could be the problem? Is LTO messing up the linking between libraries? Can we modify some flags to dodge this problem?

Any help is welcome.

The thing is that we use LTO because pybind increased the compilation time of Kratos (linking time, in fact).

I don't know if it can be deactivated in certain apps. Do you know if it is possible, @roigcarlo ?

Have you tried to modify the CMakeLists.txt:

target_link_libraries(KratosPfemFluidDynamicsCore PUBLIC KratosCore KratosDelaunayMeshingCore)

to

target_link_libraries(KratosPfemFluidDynamicsCore PUBLIC KratosDelaunayMeshingCore)

KratosCore is already linked with KratosDelaunayMeshingCore.
I don't know if this can cause some duplicated library linkage or if cmake is smart enough to detect that. I don't know if it solves the problem either.

Just to note that I grepped for "KratosDelaunayMeshingCore" and KratosCore is not linked with it (nor should it be). The inclusion list is the following (and I agree that this inclusion should not be repeated):

~/Kratos  grep -r KratosDelaunayMeshingCore *
applications/DelaunayMeshingApplication/CMakeLists.txt:add_library(KratosDelaunayMeshingCore SHARED ${KRATOS_DELAUNAY_MESHING_APPLICATION_CORE})
applications/DelaunayMeshingApplication/CMakeLists.txt:target_link_libraries(KratosDelaunayMeshingCore PUBLIC KratosCore ${LIBS})
applications/DelaunayMeshingApplication/CMakeLists.txt:set_target_properties(KratosDelaunayMeshingCore PROPERTIES COMPILE_DEFINITIONS "DELAUNAY_MESHING_APPLICATION=EXPORT,API")
applications/DelaunayMeshingApplication/CMakeLists.txt:target_link_libraries(KratosDelaunayMeshingApplication PRIVATE KratosDelaunayMeshingCore)
applications/DelaunayMeshingApplication/CMakeLists.txt:    cotire(KratosDelaunayMeshingCore)
applications/DelaunayMeshingApplication/CMakeLists.txt:install(TARGETS KratosDelaunayMeshingCore DESTINATION libs )
applications/PfemSolidMechanicsApplication/CMakeLists.txt:target_link_libraries(KratosPfemSolidMechanicsCore PUBLIC KratosCore KratosSolidMechanicsCore KratosDelaunayMeshingCore)
applications/PfemFluidDynamicsApplication/CMakeLists.txt:target_link_libraries(KratosPfemFluidDynamicsCore PUBLIC KratosCore KratosDelaunayMeshingCore)
applications/ContactMechanicsApplication/CMakeLists.txt:target_link_libraries(KratosContactMechanicsCore PUBLIC KratosCore KratosDelaunayMeshingCore)
applications/PfemApplication/CMakeLists.txt:target_link_libraries(KratosPfemCore PUBLIC KratosCore KratosDelaunayMeshingCore KratosSolidMechanicsCore)

I guess @josep-m-carbonell meant that KratosDelaunayMeshingCore links with KratosCore (not the opposite).

@josep-m-carbonell We just tried removing KratosCore from
target_link_libraries(KratosPfemFluidDynamicsCore PUBLIC KratosCore KratosDelaunayMeshingCore)
but the problem persists.

It may be a bug in LTO. Is it happening in old gcc or new ones?

LTO may also change the order of initialization of static variables. I would check if it is initialized at the first use
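To make the "initialized at first use" check concrete, here is a minimal sketch in C of a lazily created table behind an accessor. It is not the GidPost code: `demo_table`, `demo_get_table` and `demo_init_count` are hypothetical names invented for illustration. The accessor logs both the table pointer and the address of the static variable that holds it, so a second creation, or a different slot address, would show up in the log.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the GidPost hash table; not the real type. */
typedef struct {
    int n_entries;
} demo_table;

/* File-scope static: each binary that compiles this file in gets its own copy. */
static demo_table *g_table = NULL;

/* Counts real allocations, so a repeated initialization becomes visible. */
static int g_init_count = 0;

/* Create the table on first use instead of relying on load-time ordering. */
demo_table *demo_get_table(void)
{
    if (g_table == NULL) {
        g_table = calloc(1, sizeof *g_table);
        if (g_table != NULL) {
            ++g_init_count;
            /* Log the table and the slot holding it; a changing slot address
               between calls would point to more than one copy of the static. */
            fprintf(stderr, "table created at %p (slot %p)\n",
                    (void *)g_table, (void *)&g_table);
        }
    }
    return g_table;
}

/* Number of times the table was actually created. */
int demo_init_count(void)
{
    return g_init_count;
}
```

If every caller goes through the accessor and the creation message is printed more than once in a single run, that would suggest either a reset of the pointer or a second copy of the static.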

Apparently, @ipouplana observed the anomaly with gcc 7.4.

Sorry, now that I check it, it was with gcc 7.3.0

Have you ruled out the static initialization issue?

Is there something that we can do with this variable?
https://github.com/KratosMultiphysics/Kratos/blob/93d81864ee402b35e819e69afa9467d61e55080b/external_libraries/gidpost/source/gidpostHash.c#L27
I was thinking of adding inline or noinline or whatever helps make this variable unique.

@pooyan-dadvand How can I know when the initialization is done? I have checked that hashTable is NULL, gets a new value, but later on is printed as NULL again. No method is setting it to NULL (I put prints)...

There should be one per linking unit (a library, in our case) that includes this .c file, and that does not depend on inline. With LTO it should be even less likely that there are two.

What I'm not sure about is whether it gets NULL again because of initialization or because some other function is returning NULL.

Could you please try giving it a certain value (like 0xFAFAFAFA) and see if it reverts to NULL? Note that you should change the first initialization test to check against your value instead of NULL.
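The sentinel idea can be sketched as follows. This is an illustration only: `demo_hash_table`, `DEMO_UNSET` and the helper functions are hypothetical names, not the identifiers used in gidpostHash.c. The variable starts at the sentinel instead of NULL, so "never initialized" and "explicitly set to NULL" become distinguishable, and the report also prints the address of the variable itself.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sentinel meaning "never touched by our code". */
#define DEMO_UNSET ((void *)(uintptr_t)0xFAFAFAFAu)

/* Stand-in for the static pointer under suspicion (name is an assumption). */
static void *demo_hash_table = DEMO_UNSET;

typedef enum { DEMO_STATE_UNSET, DEMO_STATE_NULL, DEMO_STATE_LIVE } demo_state;

/* Classify the current value of the pointer. */
demo_state demo_table_state(void)
{
    if (demo_hash_table == DEMO_UNSET) return DEMO_STATE_UNSET;
    if (demo_hash_table == NULL) return DEMO_STATE_NULL;
    return DEMO_STATE_LIVE;
}

/* Setter used here only to drive the example. */
void demo_set_table(void *p)
{
    demo_hash_table = p;
}

/* Print the state together with the address of the variable itself. */
void demo_report(const char *where)
{
    static const char *names[] = { "UNSET", "NULL", "LIVE" };
    fprintf(stderr, "[%s] &hash_table=%p state=%s\n",
            where, (void *)&demo_hash_table, names[demo_table_state()]);
}
```

Reading the log: a value that goes from LIVE back to UNSET, or a `&hash_table` address that differs between two call sites, would point to a second copy of the static; a value that goes from LIVE to NULL at the same address would instead point to some code path writing NULL.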

@pooyan-dadvand , I have not tried what you suggested yet.
However, I followed the indications of the @KratosMultiphysics/technical-committee on testing the influence of LTO on the performance of Kratos.
I ran a DEM case 8 times without LTO and 8 times with LTO. The conclusion was 89.18 s with LTO and 89.066 s without LTO. My opinion is that LTO has no remarkable effect on the performance. In order to run the simulations without LTO, I just commented out lines 199 and 202 in this file:
https://github.com/KratosMultiphysics/Kratos/blob/1f0d15366a4b1db674da843eebed593d1280cb90/CMakeLists.txt#L199
Please @roigcarlo confirm that this was the correct spot.
If anyone else could do the same and test their cases with and without LTO, we could finally remove it and avoid those problems it is generating currently.
Referencing @AFranci , @loumalouomega , @AlejandroCornejo , @jcotela , @philbucher . Maybe you guys are interested in doing the same test and confirm that LTO is useless for us...
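To put the two timings quoted above (89.18 s with LTO, 89.066 s without) on a common scale, the relative difference can be computed with a small helper. This is just a worked-arithmetic sketch; `relative_diff_percent` is a hypothetical name.

```c
#include <stdio.h>

/* Relative difference of a with respect to b, in percent. */
double relative_diff_percent(double a, double b)
{
    return 100.0 * (a - b) / b;
}

/* Print the comparison for two run times given in seconds. */
void print_lto_comparison(double with_lto_s, double without_lto_s)
{
    printf("with LTO: %.3f s, without LTO: %.3f s, difference: %+.3f %%\n",
           with_lto_s, without_lto_s,
           relative_diff_percent(with_lto_s, without_lto_s));
}
```

Calling `relative_diff_percent(89.18, 89.066)` gives roughly 0.13 %, meaning the LTO build was marginally slower in this test rather than faster.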

Yep, those were the lines. Also, in light of the results, I would vote to remove LTO (or in any case make it optional and not the default).

For the Intel compiler it is already disabled in any case (see #4092) because it creates random segfaults!

@maceligueta do you also know how much it affects the linking-times?

No, I did not measure that.
Actually, I don't know how to force the linking without re-compiling...

For me, on a 16-core machine, it takes almost as long as the compilation.

With LTO? And without?

With LTO. Without it, less than a minute.

Hi all!

Lately I've been running some cases with the PFEM Fluid Dynamics App and I'm having some trouble when printing the post-process results.

@roigcarlo proposed adding some Flush calls to the derived PfemFluidGidIO, but that made no difference.

  • I've tried to use the GidIO instead of the PfemFluidGidIO for printing and it works

  • I've modified the GidIO with the method overridden by the PfemFluidGidIO and it works

The case that does not work is when I use the derived class PfemFluidGidIO, so maybe on Windows it is not enough to deactivate LTO.

@AFranci @maceligueta

Thanks Alejandro for debugging it! Just to add that in Linux everything seems to work properly...
Any help is welcome!

It seems like deactivating LTO only worked in Linux. So I wonder what kind of mess is generated in there during linking.

It supports the hypothesis that somehow we are linking the gidpostlib more than once.

Closing this as it will be fixed through #6212

I suspect that the problem there was different (https://github.com/KratosMultiphysics/Kratos/issues/6210#issuecomment-573713899).
The problem there was a compilation problem, while the problem here was a malfunction at runtime.
