Apex: Segmentation fault

Created on 10 Apr 2019 · 3 Comments · Source: NVIDIA/apex

Hi, I get a segmentation fault when I use apex for mixed precision training. The test script is as follows:

import torch
import apex

input = torch.rand(3, 10).cuda()
fln = apex.normalization.FusedLayerNorm(10).cuda()
fln(input)

I then followed the answer to issue #156 and ran the following commands in order. Environment: CUDA 10.0.30, torch-1.0.1.post2-cp37.

$ pip uninstall apex
$ cd apex_repo_dir
$ rm -rf build
$ python setup.py install --cuda_ext --cpp_ext

This did not fix the problem for me. Looking forward to your reply.

extension build

All 3 comments

python setup.py install --cuda_ext --cpp_ext is no longer the recommended install command. Try:

$ pip uninstall apex
$ pip uninstall apex # repeat until you're sure it's gone, different install methods may create redundant installs
$ cd apex_repo_dir
$ rm -rf build # if "build" exists
$ rm -rf apex.egg-info # if "apex.egg-info" exists
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
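
To confirm that the repeated `pip uninstall apex` really removed every copy before rebuilding, one option is to ask Python whether it can still locate the package. This is a minimal sketch using only the standard library. The helper name `is_importable` is hypothetical and is not part of apex or pip. Run it from a directory other than the apex checkout, otherwise the local `apex/` source folder may be found on the import path.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("apex"):
        print("apex is still importable; run `pip uninstall apex` again")
    else:
        print("apex is not importable; no installed copy was found")
```

If the script still reports apex as importable after several uninstalls, a redundant install from a different install method may be left over, as mentioned in the comment on the second `pip uninstall` line.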

I updated https://github.com/NVIDIA/apex/issues/156#issuecomment-465301976 to show the new command as well.

Thank you for your reply.
I don't know why the problem persists with gcc 4.8.5. However, after I upgraded gcc to 4.9.2, re-cloned the repository, and followed your commands, it works for me.

I'm also not sure why gcc 4.8.5 fails, but thank you for the information. I'll keep this in mind if future users hit the same issue.
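
Since the only difference reported in this thread was the compiler, it may be worth checking the gcc version before building the extensions. The sketch below is an illustration under stated assumptions. The 4.9 threshold comes only from the report above (4.8.5 failed, 4.9.2 worked), not from any documented apex requirement. `parse_gcc_version` and `gcc_version` are hypothetical helper names. The build may also pick up a different compiler than the `gcc` on `PATH`, so treat the result as a hint.

```python
import re
import shutil
import subprocess

# Assumption: taken from this thread only (4.8.5 failed, 4.9.2 worked).
MIN_GCC = (4, 9)


def parse_gcc_version(text):
    """Extract (major, minor, patch) from `gcc -dumpversion`-style output."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        return None
    # Newer gcc releases may print only the major version; pad with zeros.
    return tuple(int(part or 0) for part in match.groups())


def gcc_version():
    """Return the version of the gcc found on PATH, or None if unavailable."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(
        ["gcc", "-dumpversion"], stdout=subprocess.PIPE, universal_newlines=True
    )
    return parse_gcc_version(result.stdout)


if __name__ == "__main__":
    version = gcc_version()
    if version is None:
        print("gcc not found on PATH")
    elif version[:2] < MIN_GCC:
        print("gcc %d.%d.%d is older than the version that worked in this thread" % version)
    else:
        print("gcc %d.%d.%d" % version)
```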
