Meson: Feature Request: Add NVCC CUDA Compiler

Created on 8 Nov 2016 · 29 Comments · Source: mesonbuild/meson

Currently, nvcc can roughly be used via either custom_target() or generator()s (see #315).

It would be nice if NVCC were upgraded to the status of a full compiler with id 'nvcc', language 'cuda' and file extensions cu and ptx, on top of the usual C/C++ extensions.

NVCC these days is a tool that separates CPU and GPU code, then compiles the former with a host C/C++ compiler and the latter with an internal, proprietary compiler. The GPU binary is finally embedded within the host binary.

Labels: compilers, enhancement

All 29 comments

I think we'd be happy to have NVCC support in Meson. Would you be willing to work on this? It's not very difficult to add support for a new programming language in Meson and we've had people who are totally new to the codebase jump in and do so. You can start at mesonbuild/compilers.py; the base classes Compiler and CCompiler would be good places.

@nirbheek There's the beginnings of some work here. Unfortunately this is looking harder than I'd expected, since:

  • There are a lot of methods to implement, almost none of which are documented, and some of which might have to return different things depending on the underlying host compiler nvcc selects.
  • nvcc doesn't implement good Makefile depfile generation. A fuller explanation is in the source, but basically NVCC insists on doing depfile generation and compilation in two steps, doesn't support -MMD, -MQ or -MF but does support the inferior -MT, and there seems to be no easy way to square this with Ninja's needs (e.g. it's not possible to run two commands for one rule, and nvcc prints its dependency lists to stdout rather than to a file).

At any rate the hack I've got up there actually manages to compile CUDA code for my current needs but is nowhere close to ready for primetime. It supports:

  • The language 'cuda', by finding nvcc and extracting its .version()
  • cuda_args: ...
  • Compiling .cu files and linking them.

Anyway, there are quite a few more scattered places than just mesonbuild/compilers.py that I've had to touch already, and I've no idea whether there are more. I also frequently see quite repetitive code of the sort if langname_foo: langname_bar[] elif langname2: langname2_bar[]. That kind of code could generally be automated by computing the names from the language name, or perhaps by factoring it into a class hierarchy.
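As a rough illustration of that refactoring idea, here is a minimal sketch assuming a hypothetical BuildTarget class (not Meson's real one): per-language arguments live in one dictionary keyed by the language name, so adding 'cuda' needs no new if/elif branch, and keyword names such as cuda_args are derived from the language name.

```python
from __future__ import annotations

from collections import defaultdict


class BuildTarget:
    """Hypothetical target that stores per-language arguments by language name."""

    def __init__(self) -> None:
        # Maps 'c', 'cpp', 'cuda', ... to that language's extra arguments,
        # replacing separate c_args / cpp_args / cuda_args attributes.
        self.lang_args: dict[str, list[str]] = defaultdict(list)

    def add_args(self, lang: str, args: list[str]) -> None:
        self.lang_args[lang].extend(args)

    def get_args(self, lang: str) -> list[str]:
        # A language with no arguments yields an empty list; no special case needed.
        return list(self.lang_args[lang])


def kwarg_name(lang: str) -> str:
    # Derive the build-file keyword from the language name: 'cuda' -> 'cuda_args'.
    return f"{lang}_args"
```

A new language then only needs a name, not new attributes and branches scattered across the codebase.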

@nirbheek Can I get some guidance on what to do next...?

Apologies, I've been busy working on the 0.36 release. There might be more places where you need to change things to make it work. My advice is to look at the existing test cases for C support etc., see how they translate to CUDA use cases, and write tests for those. This is a better approach (IMO) than implementing all the compiler methods from the get-go.

As for depfile generation, I'm not sure what you can do there; maybe @jpakkane will be able to comment when he has time (maybe after the release). But this is a common problem, and I'm sure we can sort it out.

We'd also be grateful if you could, in a separate PR, add comments noting which methods are undocumented. I've been trying to add documentation myself, but I now understand the codebase well enough that I have a blind spot regarding what new contributors would be confused about.

@nirbheek Good to know that 0.36.0 is due soon, it has the reproducibility changes I care about.

I've added a bugfix and a testcase. If the test machine has the CUDA Toolkit installed it will run the CUDA testcase, and if it has a GPU it should pass.

The undocumented stuff is in general of two sorts:

  • _Precisely_ what all methods of Compiler() and subclasses should return. E.g., What exact effect should the returned flags achieve? When I return a flag, should it be the standardized Unix-like flag (-L, -l, -I) or not?
  • Methods that are not part of Compiler(), but for some reason are expected by other Meson code. Things like get_buildtype_args(), get_buildtype_linker_args(), get_dependency_gen_args(), thread_flags() and thread_link_flags().
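On the question of which flags to return, one possible convention is sketched here under loud assumptions: the compiler class returns flags nvcc itself understands, and wraps anything meant only for the host compiler in nvcc's -Xcompiler option, which accepts a comma-separated list. The function names are hypothetical (thread_flags() only borrows the name from the list above), and this is not Meson's actual implementation.

```python
from __future__ import annotations


def forward_to_host(flags: list[str]) -> list[str]:
    """Wrap host-compiler flags so nvcc passes them through to the host compiler."""
    if not flags:
        return []
    # Assumption: no individual flag contains a comma, since -Xcompiler
    # splits its argument on commas.
    return ["-Xcompiler", ",".join(flags)]


def thread_flags() -> list[str]:
    # Hypothetical: what a CUDA compiler class might return when the host
    # compiler selected by nvcc is GCC or Clang, which spell it -pthread.
    return forward_to_host(["-pthread"])
```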

What is the current status of the nvcc compiler support?

@adamryczkowski I've left my work on this a bit to the side. However, I've just rebased my clone's cudalang branch changes atop the Meson master branch's tip.

The biggest problem in my opinion is that NVCC has terrible depfile generation, meaning CUDA sources always get rebuilt as if they had changed. I need some help here.

I noticed this and filed NVIDIA internal enhancement request 1900600 to track it.

If this message gets inside the NVIDIA bastion, please get dependency tracking via -MMD, -MQ or -MF (as discussed above) working. That makes integration _so_ much easier.

I requested those options specifically in the enhancement bug, yes. :)

I'm not on the team that handles the compiler front-end, though, so I can't promise my bug report will get any traction.

@aaronp24 Any news about the internal enhancement request?

It seems like my use of raw CUDA is cyclical, and once again I want to use both it and the world's best build system, but I still have to resort to hacks in 2018.

If you can, point out to the compiler team that since last year Meson has gained real traction, especially with the Linux graphics stack and GNOME. Meson isn't going away; there's real value in supporting it, including proper depfile generation.

Thanks for the reminder, @obilaniu. It looks like my bug didn't get any traction, so I'm pinging people to try to get it back on their radar.

It has been possible to compile CUDA with Clang since version 3.9. It would be nice if CUDA support in Meson were not tied to nvcc alone, as it was in CMake before 3.8.

@obilaniu Would the options -M, -MM and -MF be sufficient in nvcc to cover your use case?

@t-lutz I suppose it would make things easier. With the recent evolution of Meson I'd perhaps rethink a bit how to include CUDA support in it. Of late I've considered an alternative, adding a python-like importable module called cuda. It could look like this:

cudamod = import('cuda')
cudainst = cudamod.find_installation(['path/to/nvcc?'], version: ['>=9.0', '<=9.2'])
cuda_files = files('kernels.cu')
cuda_objs = cudainst.build(cuda_files, arch: '3.0 3.5 5.2(PTX)')

I haven't really fleshed this out in my mind though. The cuda_objs will not link properly unless the linking step is done by a C++ compiler (like g++; gcc will not work). My "solution" in my C projects is to add an empty .cpp file, but that's an ugly kludge.

Perhaps a new language & compiler still are a better way to do things. I understand that how Meson receives its dependencies has evolved a bit.

Hi, CUDA code can also be compiled with Clang++ (albeit slightly differently); it would be nice if we could take that into account.

@trivialfis I don't want to rely on that. Clang isn't always installed, but whenever the Toolkit is installed nvcc should be available too. Also, Clang will always lag behind NVCC. For instance, CUDA 10 was just released; only it will support Turing GPUs for the foreseeable future.

@obilaniu That's mostly for development's benefit, clang gives much better error messages.

Currently CMake with CUDA language enabled can be configured to use clang, but linking isn't properly handled. I think it's doable to support both nvcc and clang.

If at all possible, CUDA declarations should behave just like any other language, such as:

project('foo', 'cpp', 'cuda')
executable('cudaprog', 'cudaprog.cpp', 'kernels.cu')

Sorry to disturb, but I'm waiting to port my CUDA project to Meson, and this issue seems to have taken more years than I expected.

And even if NVCC implemented -MM and -MF, the release cycle is too long, and the feature is unlikely to be backported to earlier versions, so it would be barely usable for Meson. (For high-performance computing research, we usually need to run benchmarks on the same version, or we need a specific CUDA version for modified hardware, so we sometimes can't upgrade.)

May I propose two solutions?

  • Write a wrapper program to call nvcc -M, parse its output, filter out system header files, do the format conversion, and ask Ninja to call the wrapper.

    It might be slower, but compared to NVCC's extremely slow compilation it's negligible.

  • Call gcc -MM -x c++ -I"${CUDA_HOME}/include" source.cu and pretend it's NVCC. (It works!)

    This works because dependency walking in current versions of CUDA C++ relies only on #include, which GCC can also parse in standard C++ mode. This might change in the future when C++ has modules, but we will surely have better solutions by then.
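The first proposal can be sketched as a few pure helpers that a wrapper script would feed with the stdout of nvcc -M. This is a minimal sketch assuming Unix paths without spaces or colons and a hand-picked, hypothetical list of "system" prefixes; the escaping rules of Make/Ninja depfiles are ignored.

```python
from __future__ import annotations

# Hypothetical prefixes treated as "system" headers (roughly what -MMD would skip).
SYSTEM_PREFIXES = ("/usr/include/", "/usr/lib/", "/usr/local/cuda/")


def parse_make_deps(text: str) -> list[str]:
    """Return every prerequisite listed in Make-style 'target: deps' output."""
    # Join backslash-continued lines into one logical line each.
    joined = text.replace("\\\n", " ")
    deps: list[str] = []
    for line in joined.splitlines():
        _target, sep, rest = line.partition(":")
        if sep:
            deps.extend(rest.split())
    return deps


def filter_system(deps: list[str],
                  prefixes: tuple[str, ...] = SYSTEM_PREFIXES) -> list[str]:
    """Keep only headers that do not live under a system prefix."""
    return [d for d in deps if not d.startswith(prefixes)]


def depfile_text(output: str, deps: list[str]) -> str:
    """Format a single-rule depfile: 'output: dep1 dep2 ...'."""
    return f"{output}: {' '.join(deps)}\n"
```

A wrapper built from these would run nvcc -M, pass its stdout through parse_make_deps() and filter_system(), write depfile_text() to the path Ninja expects, and then run the real nvcc compile, so Ninja sees a single command per rule (e.g. with deps = gcc).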

@m13253 I like your second solution best, but it's not quite complete. There's extra complexity with NVCC in that it defines custom preprocessor macros, and under the control of such macros (especially __NVCC__, __CUDACC__ and __CUDA_ARCH__, but others as well) you can end up #include'ing various things. Which exact macros exist also depends on the exact host compiler selected, and this of course depends on NVCC's -ccbin.
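One way to narrow that gap, sketched under assumptions: build the gcc -MM command from the second proposal with the NVCC macros named above predefined. The helper name and macro choice are illustrative; it still cannot reproduce every macro a given nvcc/-ccbin combination defines.

```python
from __future__ import annotations

# Macros nvcc is known to define; the real set is larger and host-compiler dependent.
NVCC_MACROS = ["__NVCC__", "__CUDACC__"]


def gcc_depscan_cmd(src: str, cuda_home: str, cuda_arch: int | None = None,
                    gcc: str = "gcc") -> list[str]:
    """Build a gcc -MM command that approximates nvcc's view of #includes."""
    cmd = [gcc, "-MM", "-x", "c++", f"-I{cuda_home}/include"]
    cmd += [f"-D{m}" for m in NVCC_MACROS]
    if cuda_arch is not None:
        # __CUDA_ARCH__ is only defined in nvcc's device-side passes
        # (e.g. 520 for sm_52); setting it mimics one such pass.
        cmd.append(f"-D__CUDA_ARCH__={cuda_arch}")
    cmd.append(src)
    return cmd
```

Because __CUDA_ARCH__ exists only in device-side passes, a more faithful scan might run once without it and once per target architecture, then merge the dependency lists.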

Hence it would be much easier if NVIDIA did in fact support proper dependency management, and backported it. But as you point out, the chances of them actually doing so seem slim to none, so we're at a practical impasse for perfect dependency tracking. Some sacrifices will be required.

As it stands, my pattern for compiling CUDA code requires that:

  1. CUDA code be in .cpp files (otherwise various things go horribly wrong)
  2. A generator(find_program('nvcc'), ...) process()'es the *.cpp files() into *.o.
  3. A static_library() collects the "pre-dlink" objects *.o together
  4. A custom_target() runs nvcc -dlink to produce one "dlink" object file .o.
  5. A static_library() wraps the "dlink" object
  6. A declare_dependency() is used to link in the two static libraries whenever needed.
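The nvcc invocations behind steps 2 and 4 might look like this minimal sketch. Assumptions: relocatable device code is requested with -dc, the .cpp sources are forced to be treated as CUDA with -x cu, and the same -arch value (sm_52 here, purely illustrative) is presumably used for both the compile and the device link. The helper names are hypothetical.

```python
from __future__ import annotations


def compile_cmd(src: str, obj: str, arch: str = "sm_52",
                nvcc: str = "nvcc") -> list[str]:
    """Step 2: compile one CUDA-in-.cpp source into a "pre-dlink" object."""
    # -x cu makes nvcc treat the following .cpp input as CUDA source.
    return [nvcc, "-x", "cu", f"-arch={arch}", "-dc", src, "-o", obj]


def dlink_cmd(objs: list[str], out: str, arch: str = "sm_52",
              nvcc: str = "nvcc") -> list[str]:
    """Step 4: device-link all pre-dlink objects into a single "dlink" object."""
    return [nvcc, f"-arch={arch}", "-dlink", *objs, "-o", out]
```

The final host link still needs the pre-dlink objects and the dlink object together (the two static libraries of steps 3 and 5), plus the CUDA runtime library.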

I could not combine the object files into one static_library() because of various Meson-breaking dysfunctions when using .extract_all_objects(), link_whole: and all other alternatives I've explored.

I recently got a graphics card that supports CUDA, so I can actually debug this meaningfully now. It would be useful if someone could provide me with some simple test projects that actually do something and show how people are using CUDA in real projects out in the wild.

@jpakkane You can try to build the CUDA samples distributed with CUDA itself.

Someone needs to do a license review first, because we need to be able to change and redistribute them, and the linked page makes no mention of what license they are under.

Note the linked EULA at the bottom. I got to chapter 1.1.2 when it got really icky and infeasible.

IANAL, nor do I play one on TV, but I wonder whether this only applies to the case in 1.1.1(3), where you want to redistribute parts of the SDK "as incorporated in object code format into a software application"?

I think the samples themselves are licensed under a 3-clause BSD license, which is also what the headers in the source files I've looked at say?

Except the simpleD3D12 sample, which was added in the same commit as the link to EULA: https://github.com/NVIDIA/cuda-samples/commit/fcb23487a8a919a88f82bc5b9792e7b30166ccdc#diff-9879d6db96fd29134fc802214163b95a

Closed with #3919 and #4835. Support for the Ninja backend on Linux is now present; support on other platforms is dubious.

Dependency problems are still not resolved, however; this can be the subject of another tracking issue.