I have a project that contains both dlib and TensorFlow models. During inference, I get a CUDA error:
F tensorflow/stream_executor/cuda/cuda_driver.cc:334] current context was not created by the StreamExecutor cuda_driver API: 0x793dc70; a CUDA runtime call was likely performed without using a StreamExecutor context
I don't know why. How can I disable CUDA support by default so that I can use the two at the same time?
It's a cmake option. You can use cmake's GUI (ccmake or cmake-gui) or the CLI to disable it when you initially invoke cmake.
Sorry to bother you, but I disabled USE_CUDA in cmake-gui and even cleared CUDA_DIR, leaving it blank. After cmake --build ., I ran python3 setup.py install to build the Python dependencies, and it still finds CUDA:
-- Looking for cuDNN install...
-- Building a CUDA test project to see if your compiler is compatible with CUDA...
-- Checking if you have the right version of cuDNN installed.
-- Found cuDNN: /usr/local/cuda/lib64/libcudnn.so
-- Configuring done
-- Generating done
This is really not what I want to see. Setting CUDA_VISIBLE_DEVICES to blank in the environment does not work at all.
Could you kindly tell me in detail how to disable CUDA (what am I missing)?
You don't call cmake directly if you are using Python, so I'm not sure what you compiled when you say you ran cmake --build .
In any case, you can pass options like --no DLIB_USE_CUDA. This is discussed in a comment at the top of the setup.py file.
Is it OK to run python3 setup.py install without mkdir build; cd build; cmake ..; cmake --build . first? I thought running the Python install would invoke the cmake files. If I run python3 setup.py install directly, how do I stop it from finding CUDA?
python setup.py install --no DLIB_USE_CUDA
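After reinstalling with that flag, it can help to confirm from Python that the build really has CUDA turned off. The following is a minimal sketch, assuming the installed dlib module exposes the DLIB_USE_CUDA attribute (recent builds of the Python bindings do). The helper name dlib_cuda_enabled is made up for illustration.

```python
import importlib


def dlib_cuda_enabled(module_name="dlib"):
    """Report whether the named module was built with CUDA.

    Returns True or False when the module exposes DLIB_USE_CUDA,
    and None when the module cannot be imported or lacks the flag.
    (Hypothetical helper; only the DLIB_USE_CUDA attribute comes from dlib.)
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    flag = getattr(module, "DLIB_USE_CUDA", None)
    return None if flag is None else bool(flag)
```

Calling dlib_cuda_enabled() should give False if the --no DLIB_USE_CUDA build is the one Python is actually importing. If it still gives True, an older CUDA-enabled install is probably being picked up instead.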
Is it possible to disable CUDA support at runtime? Maybe with some environment variable?
No
What prevents runtime support toggling? What if two modules were built:
1) native CPU
2) CUDA support.
Then the access layer could toggle or switch between them?
This is important because in one step of a process I need accurate face extraction using a CNN on a high-resolution feed, which runs into memory issues due to a malloc issue (https://github.com/davisking/dlib/issues/1725). The workaround would be running it on the CPU.
The other step would be running a face_recognition_model_v1 encoder to generate face embeddings, which can benefit greatly from a GPU and should be able to run without the same malloc issue since the face crops are smaller.
There is no deep reason why dlib couldn't be upgraded to support this. See https://github.com/davisking/dlib/issues/1852. But it is not currently an option.
However, in this case what you are suggesting is not a good idea. This stuff runs much faster on the GPU than CPU. If your image is really super huge, you should just chop the image up into parts and run them individually. Like cut the image into 4 subimages or something like that. How best to divide it up depends on your application (i.e. where faces can appear, how big they might be, do you need to divide over scale space, etc.)
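To make the chopping suggestion concrete, here is a rough sketch of one way to compute the subimage boxes and map detections back to the full image. The function names, the 2x2 default grid, and the overlap parameter are all assumptions for illustration and are not part of dlib. The overlap is there so that a face sitting on a seam still appears whole in at least one tile.

```python
def tile_boxes(width, height, rows=2, cols=2, overlap=0):
    """Return (left, top, right, bottom) boxes that cover a width x height image.

    Each tile is grown by `overlap` pixels on every side and clamped
    to the image bounds. (Hypothetical helper, not a dlib function.)
    """
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = max(0, c * width // cols - overlap)
            top = max(0, r * height // rows - overlap)
            right = min(width, (c + 1) * width // cols + overlap)
            bottom = min(height, (r + 1) * height // rows + overlap)
            boxes.append((left, top, right, bottom))
    return boxes


def to_full_image(box, tile_box):
    """Shift a detection box found inside a tile back to full-image coordinates."""
    left, top, _, _ = tile_box
    x0, y0, x1, y1 = box
    return (x0 + left, y0 + top, x1 + left, y1 + top)
```

With a NumPy image, each tile would be cropped as img[top:bottom, left:right] and passed to the detector on its own. When tiles overlap, the same face can be reported twice, so the shifted boxes would need some deduplication afterwards. A reasonable starting point for the overlap is the largest face size you expect, but that depends on the application.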