Pcl: Ninja build concurrent accesses on Windows PDB files

Created on 4 Sep 2018 · 15Comments · Source: PointCloudLibrary/pcl

Your Environment

Operating System and version: Windows 10
Compiler: Visual Studio Community 2017 - 15.6.27428.2043
PCL Version: 1.8.1-dev

Context

Configuring PCL under Windows 10 with Ninja build and VCPKG toolchain.

Expected Behavior

Configuration and compilation succesfull.

Current Behavior

Configuration is succesfull, but compilation fails due to multiple accesses to PDB files.
_Note that this error does not happen all the times_.
It depend on how PDB file are accessed during compilation.

Code to Reproduce

Configure and compile PCL with Ninja.

Possible Solution

To solve this problem I added /FS compiler option that however should be enabled, by default, setting /MP. See this commit on my PCL fork.
We can also use a guard blcok that enables the option only when Ninja generator is used, but is not strictly necessary.

My guess is that Ninja handles parallel calls to cl.exe and, as a consequence, /MP is (somehow) ignored. As a consequence specifying /FS helps this kind of parallel compilation.

cmake windows

Source

claudiofantacci

All 15 comments

@UnaNancyOwen we're going to need your help here.

Code to Reproduce

Configure and compile PCL with Ninja.

You might want to add more details there because that's exactly what we do on our windows CI script. That one configures and builds without a problem. What's the difference between the way you're building things and the way we do it on the CI?

SergioRAgostinho on 4 Sep 2018

@SergioRAgostinho I'm here to help 👍

I'll have a look at the script and do some comparative tests on my machine to see what's going on.

claudiofantacci on 4 Sep 2018

👍1

My two cents on the problem: Appveyor is running single core and the problem cannot be reproduced.
I'll try to run one test on my fork with Appveyor printing out the number of the core using your ProcessorCount script.

claudiofantacci on 4 Sep 2018

👍1

I'm not using Ninja when creating pre-built binaries. At that time, this problem has not occurred.
On the other hand, Ninja is used in vcpkg. But I have not confirmed this problem yet.
(At long time, I didn't use vcpkg. So, This problem may occur on recently environment.)

I have read the description of this issue by @claudiofantacci and the description of /FS option on MSDN.
I think /FS option should be added when case of Ninja.
(It might make the build time a bit longer. However, That option is relation of debug build, Probably It will not affect CI.)

UnaNancyOwen on 4 Sep 2018

I have read the description of this issue by @claudiofantacci and the description of /FS option on MSDN.
I think /FS option should be added when case of Ninja.

I agree.

It might make the build time a bit longer. However, That option is relation of debug build, Probably It will not affect CI.

AppVeyor uses 2 cores for compilation and yes, the problem is triggered when compiling Debug or RelWithDebInfo. I just activated a Debug build on AppVeyor to see what happens.

claudiofantacci on 4 Sep 2018

Well AppVeyor does not finish the build before the 60 minutes and it is interrupted 😭
Also AppVeyor uses 2 cores and it won't probably be affected by this issue.

I'll try to rerun the build on my Windows machine with the same settings an with 8+ parallel jobs the error should come out.

claudiofantacci on 4 Sep 2018

👍1

I'm able to reproduce the issue with the basic option of AppVeyor configuration plus CUDA and GPU options enabled.

With that one the following kind of errors pop out:

C:/Users/cfantacci/AppData/Local/Temp/tmpxft_00006104_00000000-9_sac_model_1point_plane.compute_61.cudafe1.cpp:
fatal error C1041: cannot open program database 'C:\Users\cfantacci\GitHub\pcl\buildd\cuda\sample_consensus\CMakeFiles\pcl_cuda_sample_consensus.dir\src\vc140.pdb';
if multiple CL.EXE write to the same .PDB file, please use /FS

I'm guessing that something is wrong with /MP and CUDA (?!) and to avoid this problem we can enforce /FS.

claudiofantacci on 5 Sep 2018

Thanks for looking it up. Care to submit a PR with the required changes? Since this is a problem which currently only seems to affect Ninja, I would target that generator specifically.

SergioRAgostinho on 5 Sep 2018

👍1

Thanks for looking it up. Care to submit a PR with the required changes? Since this is a problem which currently only seems to affect Ninja, I would target that generator specifically.

Sure I'll do.

The change will be something like this:

[...]

if(CMAKE_COMPILER_IS_MSVC)
    add_definitions("-DBOOST_ALL_NO_LIB -D_SCL_SECURE_NO_WARNINGS -D_CRT_SECURE_NO_WARNINGS -DNOMINMAX -DPCL_ONLY_CORE_POINT_TYPES /bigobj ${SSE_DEFINITIONS}")
    if("${CMAKE_CXX_FLAGS}" STREQUAL "/DWIN32 /D_WINDOWS /W3 /GR /EHsc")

        [all the present code]

        if(CMAKE_GENERATOR EQUALS "Ninja")
            set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} /FS")
            set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /FS")
        endif()
    endif()
endif()

[...]

otherwise I can move the new lines outside the if("${CMAKE_CXX_FLAGS}" STREQUAL "/DWIN32 /D_WINDOWS /W3 /GR /EHsc") block that, to my understanding, is to avoid changing any non-default (i.e. user made) flags.

claudiofantacci on 5 Sep 2018

👍1

I'm guessing that something is wrong with /MP and CUDA (?!) and to avoid this problem we can enforce /FS.

Do we also need to filter out /MP?

taketwo on 5 Sep 2018

Do you mind doing me a little test just for the sake of curiosity? In the code you pasted, do you notice how we're adding a number of compiler options to CMAKE_FLAGS. There's actually a command to do it it cmake. add_compile_options. Can you try to use that instead of setting CMAKE_C_FLAGS and CMAKE_CXX_FLAGS?

SergioRAgostinho on 5 Sep 2018

Do we also need to filter out /MP?

I reckon we have to keep it. /MP automatically enable /FS, however, for reasons I cannot understand at this moment, there is a clash in the PDB accesses while compiling CUDA code. I would just keep both.

Do you mind doing me a little test just for the sake of curiosity? In the code you pasted, do you notice how we're adding a number of compiler options to CMAKE_FLAGS. There's actually a command to do it it cmake. add_compile_options. Can you try to use those instead of setting CMAKE_C_FLAGS and CMAKE_CXX_FLAGS?

Sure 👍 Let me have a look and be back to you with some thoughts.

claudiofantacci on 5 Sep 2018

After extensive tests, I regret to inform that we cannot use add_compile_options reliably because such flags are not passed to CUDA_HOST_FLAGS. Infact, FindCUDA.cmake reads CMAKE_C/CXX_FLAGS directly, see here.

Probably flags would be passed to CUDA if CUDA is enabled as a language as described here. This choice however produces mandatory changes in the whole projects and is not yet compatible with all the Generators (e.g. Xcode).

Please note that if you want to check CUDA_HOST_FLAGS you need to add message() within FindCUDA.cmake, not outside. Also I checked and cofirm that /MP is passed to CUDA, but apparently it does nothing because you need an extra /FS to avoid error C1041.

The only reliable way is thus to add the /FS to CMAKE_C/CXX_FLAGS.

I will open a PR asap. If you allow me, since I passed through quite some PCL CMake files, I also uniformed some code style here and there. As a consequence, my PR will have 3 commits, 2 of which can be amended or discarded entirely if you feel my changes are not ok.
From now on, I also suggest to move the conversation into the PR 😄

claudiofantacci on 6 Sep 2018

Probably flags would be passed to CUDA if CUDA is enabled as a language as described here. This choice however produces mandatory changes in the whole projects and is not yet compatible with all the Generators (e.g. Xcode).

Also requires us to have a huge leap in minimum required CMake version, I'd be against this.