Turicreate: issue running on Google Colab

Created on 27 Jul 2018 · 21 Comments · Source: apple/turicreate

I was able to run turicreate on Colab until recently using some workarounds, but lately it has stopped working.
Environment
Google Colab
Python 3.6
GPU: Tesla K80
When I tried to create a model, the following error was displayed:

ERROR: Incomplete installation for leveraging GPUs for computations.
Please make sure you have CUDA installed and run the following line in
your terminal and try again:

pip uninstall -y mxnet && pip install mxnet-cu90==1.1.0

Adjust 'cu90' depending on your CUDA version ('cu75' and 'cu80' are also available).
You can also disable GPU usage altogether by invoking turicreate.config.set_num_gpus(0)
An exception has occurred, use %tb to see the full traceback.

Running %tb gave the following traceback:


MXNetError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/turicreate/toolkits/_mxnet_utils.py in get_mxnet_context(max_devices)
64 for ctx_i in ctx:
---> 65 _mx.nd.array([1], ctx=ctx_i)
66

/usr/local/lib/python3.6/dist-packages/mxnet/ndarray/utils.py in array(source_array, ctx, dtype)

/usr/local/lib/python3.6/dist-packages/mxnet/ndarray/ndarray.py in array(source_array, ctx, dtype)

/usr/local/lib/python3.6/dist-packages/mxnet/ndarray/ndarray.py in empty(shape, ctx, dtype)

/usr/local/lib/python3.6/dist-packages/mxnet/ndarray/ndarray.py in _new_alloc_handle(shape, ctx, delay_alloc, dtype)

/usr/local/lib/python3.6/dist-packages/mxnet/base.py in check_call(ret)

MXNetError: [22:40:31] src/storage/storage.cc:118: Compile with USE_CUDA=1 to enable GPU usage

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x192112) [0x7f25fc9c5112]
[bt] (1) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x192738) [0x7f25fc9c5738]
[bt] (2) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x27c553a) [0x7f25feff853a]
[bt] (3) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x27ca134) [0x7f25feffd134]
[bt] (4) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x27ca607) [0x7f25feffd607]
[bt] (5) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x22ac511) [0x7f25feadf511]
[bt] (6) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(MXNDArrayCreateEx+0x169) [0x7f25feadfd99]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f26880eee18]
[bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x32a) [0x7f26880ee87a]
[bt] (9) /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x4cd) [0x7f268830296d]

During handling of the above exception, another exception occurred:

SystemExit Traceback (most recent call last)
in ()
1
----> 2 model = tc.image_similarity.create(reference_data)

/usr/local/lib/python3.6/dist-packages/turicreate/toolkits/image_similarity/image_similarity.py in create(dataset, label, feature, model, verbose, batch_size)
121 feature = _tkutl._find_only_image_column(dataset)
122
--> 123 feature_extractor = _image_feature_extractor._create_feature_extractor(model)
124
125 # Extract features

/usr/local/lib/python3.6/dist-packages/turicreate/toolkits/_image_feature_extractor.py in _create_feature_extractor(model_name)
22 if system() != 'Darwin' or _mac_ver() < (10, 13):
23 ptModel = MODELS[model_name]()
---> 24 return MXFeatureExtractor(ptModel)
25
26 download_path = _get_model_cache_dir()

/usr/local/lib/python3.6/dist-packages/turicreate/toolkits/_image_feature_extractor.py in __init__(self, ptModel)
84 self.feature_layer = ptModel.feature_layer
85 self.image_shape = ptModel.input_image_shape
---> 86 self.context = _mxnet_utils.get_mxnet_context()
87
88 @staticmethod

/usr/local/lib/python3.6/dist-packages/turicreate/toolkits/_mxnet_utils.py in get_mxnet_context(max_devices)
77 print("Adjust 'cu90' depending on your CUDA version ('cu75' and 'cu80' are also available).")
78 print('You can also disable GPU usage altogether by invoking turicreate.config.set_num_gpus(0)')
---> 79 _sys.exit(1)
80 return ctx
81

SystemExit: 1
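
The check that fails in the traceback above (allocating a tiny array on a GPU context) can be tried in isolation before calling any toolkit. The following is a minimal sketch, not Turi Create's own code: the helper name `mxnet_gpu_usable` is hypothetical, and it assumes only that `mxnet` may or may not be importable in the current environment.

```python
def mxnet_gpu_usable(device_id=0):
    """Return True if MXNet can allocate a small array on the given GPU.

    Hypothetical helper for illustration; it mirrors the idea of the probe
    shown in the traceback, not Turi Create's actual implementation.
    """
    try:
        import mxnet as mx

        # Allocate on the GPU context; asnumpy() forces the result to be
        # computed so that any device error is raised here.
        mx.nd.array([1], ctx=mx.gpu(device_id)).asnumpy()
        return True
    except Exception:
        # Covers a missing mxnet install, a CPU-only build, or no usable GPU.
        return False


if __name__ == "__main__":
    if mxnet_gpu_usable():
        print("MXNet can use the GPU")
    else:
        # The error message above suggests this fallback:
        #   turicreate.config.set_num_gpus(0)
        print("MXNet cannot use the GPU")
```

If this prints that the GPU is not usable, the installed mxnet package is likely a CPU-only build or does not match the CUDA version on the machine.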

Labels: bug, image similarity, linux, gpu, need user repro, p3, toolkits

@johnyquest7 - It looks like the CUDA version installed on the system is not CUDA 9. Installing mxnet-cu90 will only work if a version of CUDA 9 is installed on the system.

You'll need to figure out which CUDA version is installed and install the corresponding mxnet-cuXX package, where XX is the version.

The next most likely version of CUDA is 8, so you could just try pip install mxnet-cu80==1.1.0 and see if that works.
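
The naming rule described here (mxnet-cuXX, where XX is the CUDA major and minor version) can be written down as a small helper. This is only a sketch: the function name `mxnet_package_for_cuda` is made up for illustration, the default of 1.1.0 is taken from the error message in this thread, and the code does not check that a wheel for a given combination actually exists on PyPI.

```python
def mxnet_package_for_cuda(cuda_version, mxnet_version="1.1.0"):
    """Build a pip requirement such as 'mxnet-cu80==1.1.0' from '8.0'.

    Hypothetical helper: it only formats the name and does not verify
    that the package is published.
    """
    parts = cuda_version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError("expected a version like '9.0', got %r" % cuda_version)
    major, minor = parts[0], parts[1]
    return "mxnet-cu%s%s==%s" % (major, minor, mxnet_version)


print(mxnet_package_for_cuda("8.0"))  # mxnet-cu80==1.1.0
print(mxnet_package_for_cuda("9.1"))  # mxnet-cu91==1.1.0
```

The resulting string is what would be passed to pip install after uninstalling the CPU-only mxnet package.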

I tried installing CUDA 8, but it did not work.
This could be due to the way CUDA is installed in Google Colab.

@johnyquest7 What version of CUDA is installed in Google Colab? Where is that CUDA installation located on disk? Note that you'll need to install the corresponding version of mxnet, so if you have CUDA 9.1, you'll need to use mxnet-cu91 specifically.

Unable to find cuda version using standard commands :-(

@johnyquest7 Can you try either running nvcc --version, or typing /usr/local/cuda (the most common installation destination) and hitting tab? The directory might be named something like /usr/local/cuda-8.0, in which case the version is clear. If there is no dash after cuda, look at /usr/local/cuda/version.txt.
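
In a notebook, where tab completion on paths is awkward, the same three checks (nvcc on PATH, cuda* directories under /usr/local, and version.txt) can be gathered from Python. This is a rough sketch using only the standard library; the function name `find_cuda` and the shape of the returned dictionary are illustrative choices, not part of any tool mentioned in this thread.

```python
import glob
import os
import shutil


def find_cuda(prefix="/usr/local"):
    """Collect hints about an installed CUDA toolkit under `prefix`."""
    hints = {
        # Full path to nvcc, or None if it is not on PATH.
        "nvcc": shutil.which("nvcc"),
        # Directories such as /usr/local/cuda or /usr/local/cuda-8.0.
        "dirs": sorted(glob.glob(os.path.join(prefix, "cuda*"))),
        # Contents of <prefix>/cuda/version.txt, if that file exists.
        "version_txt": None,
    }
    version_file = os.path.join(prefix, "cuda", "version.txt")
    if os.path.isfile(version_file):
        with open(version_file) as f:
            hints["version_txt"] = f.read().strip()
    return hints


print(find_cuda())
```

If all three entries come back empty, the toolkit is either not installed or lives under a different prefix.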

@gustavla
nvcc --version
returned

/bin/sh: 1: nvcc: not found

Under /usr/local there is no cuda directory.

Unfortunately, I could not find a reference for the location of the CUDA directory in Google Colab.

@johnyquest7 Are you sure then that CUDA is installed? Since it can't find nvcc, it means at least your PATH is not pointing to the CUDA location.

If CoLab installs it in a different path, perhaps it also sets an environment variable describing where it is. Check:

env | grep CUDA

You can also try something like this to see if the CUDA include directory is being picked up:

echo '#include <cuda.h>' | cpp -H -o /dev/null 2>&1 | head -n1

If neither of those works, then from our end it looks like CUDA is not installed. As a sanity check, also see whether you can run nvidia-smi. That command ships with the NVIDIA drivers rather than with CUDA, but the drivers are a prerequisite for installing CUDA, so it should at least be available.
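
For notebook users, the environment-variable check and the nvidia-smi check above can also be done from Python. The sketch below uses only the standard library; the helper names `cuda_env_vars` and `nvidia_smi_output` are hypothetical, and the first one approximates env | grep CUDA by matching on variable names and values.

```python
import os
import shutil
import subprocess


def cuda_env_vars(environ=None):
    """Return environment variables whose name or value mentions CUDA."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items() if "CUDA" in k or "CUDA" in v}


def nvidia_smi_output():
    """Run nvidia-smi if it is on PATH; return its text, or None otherwise."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,  # decode output as text
        )
    except OSError:
        return None
    return result.stdout


print(cuda_env_vars())
print(nvidia_smi_output())
```

An empty dictionary together with working nvidia-smi output would suggest that the driver is present but the CUDA toolkit is not advertised through the environment.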

I was able to use turicreate with the GPU in the past, so I am assuming that CUDA is installed. TensorFlow is able to use the GPU in Colab.
Tried this
echo '#include <cuda.h>' | cpp -H -o /dev/null 2>&1 | head -n1
Output was
<stdin>:1:10: fatal error: cuda.h: No such file or directory

Tried
nvidia-smi
output was
Wed Aug  1 01:47:38 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   31C    P8    27W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Since this is not resolved, I tried using my laptop instead. Surprisingly, it was fast even without a GPU: it took only 3 minutes!
Thank you all for looking into this.

@johnyquest7 Can you still reproduce this issue? If so, can you try looking in /usr/local to see if there is a hint there as to the installed CUDA version?

@znation
Following is a Colab notebook where I tried to look under different folders.
https://colab.research.google.com/drive/12uvYRM04mdrI12B0zAtsTfvtRqDes0jE

@johnyquest7 From that notebook, I don't see any evidence that CUDA is installed at all. Are you sure it is?

TuriCreate used to work in CoLab and was able to use GPU. TensorFlow is able to use GPU in CoLab. Not sure whether it is in a docker container.

@johnyquest7 I see - can you confirm that TensorFlow with GPU still works in the same environment that you're trying to run Turi Create in? Please follow the instructions here and paste the output from the first cell. From my reading, you may need to change some settings in the notebook UI (something like Runtime -> Change or Accelerator -> GPU). Those settings may also affect whether CUDA is present in the environment for other purposes, so make sure that's enabled for Turi Create, and when looking on the filesystem for the presence of CUDA.

@znation GPU is enabled in my notebooks.
Output for the first cell was

Found GPU at: /device:GPU:0
According to CoLab documentation this means GPU is available.

I also ran the second cell to confirm, and the following was the result:

Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images (batch x height x width x channel). Sum of ten runs.
CPU (s):
9.67492699623
GPU (s):
0.87723493576
GPU speedup over CPU: 11x

There's a relevant thread on Stackoverflow: https://stackoverflow.com/questions/50560395/how-to-install-cuda-in-google-colab-gpus/51029933#51029933

Perhaps start with the second response: https://stackoverflow.com/a/51029933

@johnyquest7 Can you give this a try? Let us know how it goes.

Initially when I encountered the problem, I tried all the solutions mentioned in that stackoverflow thread. Nothing worked for me.

Works for me: https://medium.com/@nickzamosenchuk/training-the-model-for-ios-coreml-in-google-colab-60-times-faster-6b3d1669fc46
You need to uninstall CUDA 9 (along with many dependent packages), add the CUDA 8 repo, and install CUDA 8; a couple of packages will need manual installation. Then install MXNet for CUDA 8.
Feel free to ping me if you need any help.

(upd 11.Nov) The code is now properly formatted in the post and has a couple of minor improvements.

Thanks @nzamosenchuk Will try it out

Closing this right now. Please re-open if we have a repro.
