Dear Developers,
I'm having some issues building alpaka with CUDA.
I am building alpaka on CentOS 7 in a conda environment that provides GCC 7.3. I am using OpenMP 4.5 and have tried CUDA 9.2 and 10.0. I have no problems building with OpenMP.
I currently run the following
git clone https://github.com/alpaka-group/alpaka.git
cd alpaka
mkdir build && cd build
cmake ..
make
If I turn off building with CUDA using the CMake flag, I have no problems. Are you able to reproduce this error? I've attached the output below.
build.log
Thanks
John
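The CUDA on/off switch mentioned in the post above can be sketched as a small shell helper. This is only a sketch under an assumption: the thread never names the flag, so the option name `ALPAKA_ACC_GPU_CUDA_ENABLE` used here is assumed from alpaka releases of that period and should be checked against the output of `cmake -LH ..` in your own checkout. The helper name `configure_alpaka` is made up for illustration.

```shell
# Configure alpaka from inside the build directory with the CUDA back-end
# switched on or off.
# ASSUMPTION: the option is called ALPAKA_ACC_GPU_CUDA_ENABLE; the thread
# does not name it, so verify it with `cmake -LH ..` for your checkout.
configure_alpaka() {
    cuda_switch="${1:-OFF}"   # expected values: ON or OFF
    cmake -DALPAKA_ACC_GPU_CUDA_ENABLE="$cuda_switch" ..
}

# Example, run from alpaka/build:
#   configure_alpaka OFF && make    # CPU back-ends only
#   configure_alpaka ON  && make    # also builds the CUDA back-end
```

Comparing the two configurations in separate, empty build directories avoids stale CMake cache entries influencing the result.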
Hi @jcob95,
just for completeness, can you please provide the full set of commands you used to set up conda and the conda environment in which you compile? Commands and packages, so that others can try to reproduce this :)
Hi @ax3l ,
I set up conda a while ago so I don't remember exactly, but I've attached the YAML, which should hopefully be enough for you to reproduce the environment.
name: alpaka-env
channels:
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=1_llvm
- afterimage=1.21=h37b8349_1003
- attrs=19.3.0=py_0
- backcall=0.1.0=py_0
- binutils-meta=1.0.4=0
- binutils_impl_linux-64=2.34=h53a641e_0
- binutils_linux-64=2.34=hc952b39_18
- bleach=3.1.4=pyh9f0ad1d_0
- boost=1.72.0=py38h9de70de_0
- boost-cpp=1.72.0=h8e57a91_0
- bzip2=1.0.8=h516909a_2
- c-compiler=1.0.4=h516909a_0
- ca-certificates=2020.4.5.1=hecc5488_0
- cairo=1.16.0=hcf35c78_1003
- certifi=2020.4.5.1=py38h32f6830_0
- cfitsio=3.470=hb60a0a2_2
- cmake=3.17.0=h28c56e5_0
- compilers=1.0.4=0
- cxx-compiler=1.0.4=hc9558a2_0
- cycler=0.10.0=py_2
- davix=0.7.5=hb54d6fb_1
- dbus=1.13.6=he372182_0
- decorator=4.4.2=py_0
- defusedxml=0.6.0=py_0
- entrypoints=0.3=py38h32f6830_1001
- expat=2.2.9=he1b5a44_2
- fftw=3.3.8=nompi_h7f3a6c3_1110
- fontconfig=2.13.1=h86ecdb6_1001
- fortran-compiler=1.0.4=he991be0_0
- freetype=2.10.1=he06d7ca_0
- fribidi=1.0.9=h516909a_0
- ftgl=2.4.0=hc56bac0_0
- gcc_impl_linux-64=7.3.0=hd420e75_5
- gcc_linux-64=7.3.0=h553295d_18
- gdk-pixbuf=2.38.2=h3f25603_3
- gettext=0.19.8.1=hc5be6a0_1002
- gfortran_impl_linux-64=7.3.0=hdf63c60_5
- gfortran_linux-64=7.3.0=h553295d_18
- giflib=5.2.1=h516909a_2
- gl2ps=1.4.0=he06d7ca_0
- glew=2.1.0=he1b5a44_0
- glib=2.64.2=h6f030ca_0
- gobject-introspection=1.58.2=py38h03d966d_1004
- graphite2=1.3.13=he1b5a44_1001
- graphviz=2.42.3=h0511662_0
- gsl=2.6=h294904e_0
- gst-plugins-base=1.14.5=h0935bb2_2
- gstreamer=1.14.5=h36ae1b5_2
- gxx_impl_linux-64=7.3.0=hdf63c60_5
- gxx_linux-64=7.3.0=h553295d_18
- harfbuzz=2.4.0=h9f30f68_3
- icu=64.2=he1b5a44_1
- importlib-metadata=1.6.0=py38h32f6830_0
- importlib_metadata=1.6.0=0
- ipykernel=5.2.1=py38h23f93f0_0
- ipyparallel=6.2.5=py38h32f6830_0
- ipython=7.13.0=py38h32f6830_2
- ipython_genutils=0.2.0=py_1
- jedi=0.17.0=py38h32f6830_0
- jinja2=2.11.2=pyh9f0ad1d_0
- joblib=0.14.1=py_0
- jpeg=9c=h14c3975_1001
- jsonschema=3.2.0=py38h32f6830_1
- jupyter_client=6.1.3=py_0
- jupyter_core=4.6.3=py38h32f6830_1
- kiwisolver=1.2.0=py38hbf85e49_0
- krb5=1.17.1=h2fd8d38_0
- ld_impl_linux-64=2.34=h53a641e_0
- libblas=3.8.0=16_openblas
- libcblas=3.8.0=16_openblas
- libclang=9.0.1=default_hde54327_0
- libcroco=0.6.13=h8d621e5_1
- libcurl=7.69.1=hf7181ac_0
- libcxx=10.0.0=0
- libcxxabi=10.0.0=0
- libedit=3.1.20170329=hf8c457e_1001
- libffi=3.2.1=he1b5a44_1007
- libgcc-ng=9.2.0=h24d8f2e_2
- libgfortran-ng=7.3.0=hdf63c60_5
- libglu=9.0.0=he1b5a44_1001
- libgomp=9.2.0=h24d8f2e_2
- libiconv=1.15=h516909a_1006
- liblapack=3.8.0=16_openblas
- libllvm9=9.0.1=he513fc3_1
- libopenblas=0.3.9=h5ec1e0e_0
- libpng=1.6.37=hed695b0_1
- librsvg=2.48.3=h33a7fed_0
- libsodium=1.0.17=h516909a_0
- libssh2=1.8.2=h22169c7_2
- libstdcxx-ng=9.2.0=hdf63c60_2
- libtiff=4.1.0=hc7e4089_6
- libtool=2.4.6=h14c3975_1002
- libuuid=2.32.1=h14c3975_1000
- libuv=1.34.0=h516909a_0
- libwebp-base=1.1.0=h516909a_3
- libxcb=1.13=h14c3975_1002
- libxkbcommon=0.10.0=he1b5a44_0
- libxml2=2.9.10=hee79883_0
- llvm-openmp=10.0.0=hc9558a2_0
- lz4-c=1.9.2=he1b5a44_0
- markupsafe=1.1.1=py38h1e0a361_1
- matplotlib=3.2.1=0
- matplotlib-base=3.2.1=py38h2af1d28_0
- metakernel=0.24.4=pyh9f0ad1d_0
- mistune=0.8.4=py38h1e0a361_1001
- nbconvert=5.6.1=py38h32f6830_1
- nbformat=5.0.6=py_0
- ncurses=6.1=hf484d3e_1002
- notebook=6.0.3=py38_0
- nspr=4.25=he1b5a44_0
- nss=3.47=he751ad9_0
- numpy=1.18.1=py38h8854b6b_1
- openssl=1.1.1g=h516909a_0
- pandas=1.0.3=py38hcb8c335_1
- pandoc=2.9.2.1=0
- pandocfilters=1.4.2=py_1
- pango=1.42.4=h7062337_4
- parso=0.7.0=pyh9f0ad1d_0
- pcre=8.44=he1b5a44_0
- pexpect=4.8.0=py38h32f6830_1
- pickleshare=0.7.5=py38h32f6830_1001
- pip=20.0.2=py_2
- pixman=0.38.0=h516909a_1003
- portalocker=1.7.0=py38h32f6830_0
- prometheus_client=0.7.1=py_0
- prompt-toolkit=3.0.5=py_0
- pthread-stubs=0.4=h14c3975_1001
- ptyprocess=0.6.0=py_1001
- pygments=2.6.1=py_0
- pyparsing=2.4.7=pyh9f0ad1d_0
- pyqt=5.12.3=py38hcca6a23_1
- pyrsistent=0.16.0=py38h1e0a361_0
- pythia8=8.244=py38h950e882_0
- python=3.8.2=he5300dc_6_cpython
- python-dateutil=2.8.1=py_0
- python_abi=3.8=1_cp38
- pytz=2019.3=py_0
- pyzmq=19.0.0=py38ha71036d_1
- qt=5.12.5=hd8c4c69_1
- readline=8.0=hf8c457e_0
- rhash=1.3.6=h14c3975_1001
- root=6.20.4=py38h0a2fd18_3
- root-binaries=6.20.4=py38h0a2fd18_3
- root-dependencies=6.20.4=py38h0a2fd18_3
- root_base=6.20.4=py38h974813e_3
- scikit-learn=0.22.2.post1=py38hcdab131_0
- scipy=1.4.1=py38h18bccfc_3
- send2trash=1.5.0=py_0
- setuptools=46.1.3=py38h32f6830_0
- six=1.14.0=py_1
- sqlite=3.30.1=hcee41ef_0
- tbb=2020.1=hc9558a2_0
- tbb-devel=2020.1=hc9558a2_0
- terminado=0.8.3=py38h32f6830_1
- testpath=0.4.4=py_0
- tk=8.6.10=hed695b0_0
- tornado=6.0.4=py38h1e0a361_1
- traitlets=4.3.3=py38h32f6830_1
- vdt=0.4.3=he1b5a44_0
- wcwidth=0.1.9=pyh9f0ad1d_0
- webencodings=0.5.1=py_1
- wheel=0.34.2=py_1
- xorg-fixesproto=5.0=h14c3975_1002
- xorg-kbproto=1.0.7=h14c3975_1002
- xorg-libice=1.0.10=h516909a_0
- xorg-libsm=1.2.3=h84519dc_1000
- xorg-libx11=1.6.9=h516909a_0
- xorg-libxau=1.0.9=h14c3975_0
- xorg-libxcursor=1.2.0=h516909a_0
- xorg-libxdmcp=1.1.3=h516909a_0
- xorg-libxext=1.3.4=h516909a_0
- xorg-libxfixes=5.0.3=h516909a_1004
- xorg-libxft=2.3.3=h71203ad_0
- xorg-libxpm=3.5.13=h516909a_0
- xorg-libxrender=0.9.10=h516909a_1002
- xorg-libxt=1.1.5=h516909a_1003
- xorg-renderproto=0.11.1=h14c3975_1002
- xorg-xextproto=7.3.0=h14c3975_1002
- xorg-xproto=7.0.31=h14c3975_1007
- xrootd=4.11.3=py38h84ce106_2
- xxhash=0.7.2=h516909a_0
- xz=5.2.5=h516909a_0
- zeromq=4.3.2=he1b5a44_2
- zipp=3.1.0=py_0
- zlib=1.2.11=h516909a_1006
- zstd=1.4.4=h6597ccf_3
- pip:
- pyqt5-sip==4.19.18
- pyqtwebengine==5.12.1
prefix: /hepgpu5-data1/johncob/condaDir/envs/alpaka-env
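For anyone trying to reproduce the environment from the export above, a minimal sketch follows. It assumes the YAML was saved locally as `alpaka-env.yml`, a hypothetical file name that does not appear in the thread, and the helper name `recreate_env` is likewise made up for illustration.

```shell
# Recreate a conda environment from an exported YAML file.
# ASSUMPTION: the export above was saved as alpaka-env.yml (hypothetical name).
recreate_env() {
    env_file="${1:-alpaka-env.yml}"
    conda env create --file "$env_file"
}

# Example:
#   recreate_env alpaka-env.yml
#   conda activate alpaka-env    # name taken from the `name:` field of the export
#   conda list gcc               # show the GCC-related packages that were installed
```

Because the export pins exact build strings, solving it on a different platform or much later may fail; in that case the pins would have to be relaxed by hand.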
If I see it correctly, you are using the system's CUDA installation and not a conda package, while GCC is provided by a conda package. Or am I wrong?
This could be the problem.
Hi @SimeonEhrig
You are correct, I am using a CUDA installation on that system and not a conda package. I was under the impression that it is not possible to get CUDA using conda. The gcc I am using is indeed from conda.
Is it possible to use conda to install CUDA and then compile it with whatever compiler I choose in conda?
Theoretically, it should be possible to install CUDA via conda. There is a conda package: https://anaconda.org/anaconda/cudatoolkit
But in the past, I had some problems with the conda package. My workaround is to use the CUDA and GCC versions provided by the module environment system of the HPC cluster. But this is a lazy workaround. At the moment, I am trying to find out whether it is possible to compile Alpaka with conda packages.
I found out that the cudatoolkit package only provides the libraries needed to run CUDA applications. It doesn't contain development tools like the nvcc compiler.
How is CUDA provided on your system?
The CUDA installation is in /usr/local, with one directory per version, and contains the nvcc compiler and development tools. I believe this is a driver installation.
I can confirm that the problem is caused by the conda package gxx_linux-64 in combination with CUDA. The error is typical for CUDA if the host compiler is not set correctly. Unfortunately, I have no solution for the conda package. Do you have the option to use a modern GCC provided by the host system (the default GCC 4.8.5 of CentOS 7 is too old)?
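To see which host compiler candidates are visible next to nvcc in a given shell, a small diagnostic along the following lines may help. It is only a sketch: it assumes that the conda compiler packages export `CC` and `CXX` when the environment is activated, and the helper names `report_tool` and `report_toolchain` are made up for illustration. It reports what is on the PATH and does not fix the mismatch described above.

```shell
# Print the resolved path of one tool, or a note if it is not on PATH.
report_tool() {
    if resolved=$(command -v "$1" 2>/dev/null); then
        printf '%s: %s\n' "$1" "$resolved"
    else
        printf '%s: not found on PATH\n' "$1"
    fi
}

# Summarise the compilers the current shell would hand to a build.
# ASSUMPTION: an activated conda compiler package sets CC and CXX.
report_toolchain() {
    printf 'CC=%s\n'  "${CC:-<unset>}"
    printf 'CXX=%s\n' "${CXX:-<unset>}"
    report_tool gcc
    report_tool g++
    report_tool nvcc
}

report_toolchain
```

Running it once inside the activated conda environment and once in a plain login shell shows whether nvcc from the system installation is being paired with a compiler that lives under the conda prefix.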
This isn't really practical for me on my system without using conda, since building CUDA requires root access. Could a workaround be to use a container? Do you have Singularity or Docker containers with alpaka builds?
Yes, we have containers. There is a container recipe generator: https://github.com/ComputationalRadiationPhysics/crp-container/tree/master/Alpaka
The recipe generator is under development and there is no release yet, but it works in general. This means there is no continuous testing via CI and there are no images in a registry. I would create and test a recipe for you. Is a Singularity container with CUDA 10.0 and GCC 7 fine? If you want, I can upload the image to a registry.
@SimeonEhrig
Thanks for the information. The Singularity container you suggest is fine, certainly for testing. Uploading it to the Singularity library would also be helpful.
Thanks!
The container is ready. Unfortunately, I have some problems with the registry and cannot upload the image. Instead, I uploaded it to our HZDR ownCloud. I have sent you an email with the access data and a curl command so that you can easily download the image on your cluster.
@jcob95 Does the container work?
@SimeonEhrig My apologies, other projects have distracted me from this. I will try the container as soon as I can.
No Problem :-)
Hi, the container worked perfectly with the GPU backend on our machines. Thanks for making the container.
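For readers who want to try a comparable setup, running a command inside a Singularity image with GPU access can be sketched as below. The image file name `alpaka-cuda.sif` is hypothetical, since the thread does not say what the distributed image was called, and the helper name `run_in_container` is made up for illustration. The `--nv` option asks Singularity to make the host's NVIDIA driver libraries available inside the container.

```shell
# Run a command inside a Singularity image with NVIDIA GPU support enabled.
# ASSUMPTION: the image name passed in (e.g. alpaka-cuda.sif) is hypothetical.
run_in_container() {
    image="$1"
    shift
    singularity exec --nv "$image" "$@"
}

# Example:
#   run_in_container alpaka-cuda.sif nvidia-smi        # is the GPU visible?
#   run_in_container alpaka-cuda.sif nvcc --version    # which CUDA toolkit is inside?
```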
That's great!
Can this issue be closed now?
I think we solved the problem for @jcob95 and can close the issue.
@alpaka-group/alpaka-developers
Following this discussion, we talked in a VC about whether we want to provide official Alpaka development containers. The result is that containers will probably become more and more important for HPC, but we do not want to provide official containers at the moment. The reason is that we have limited developer resources and currently lack in-house container users. This is not a general rejection of containers, but without a concrete need, the priority is currently too low to implement it.