Running dlib via Python should use my GPU, not the CPU. (I haven't tried the dlib C++ examples yet; I'm currently building them. I assume the Python package is a wrapper that invokes the C++ code, so the Python examples should behave the same way.)
Running dlib results in nvidia-smi appearing to show that the process is using the CPU rather than the GPU (and face recognition in cnn mode takes ~3-4 seconds on a 4-core 3.1 GHz Intel Xeon with 32 GB of RAM and an NVIDIA 1050 Ti).
Run any program that uses dlib. I use Adam Geitgey's face_recognition Python tests.
python ./setup.py install
Potentially relevant cmake output:
-- The C compiler identification is GNU 7.4.0
-- The CXX compiler identification is GNU 7.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
<deleted>
-- Searching for BLAS and LAPACK
-- Searching for BLAS and LAPACK
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1")
-- Checking for module 'cblas'
-- No package 'cblas' found
-- Checking for module 'lapack'
-- Found lapack, version 3.10.3
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void*
-- Check size of void* - done
-- Found LAPACK library
-- Found ATLAS BLAS library
-- Looking for cblas_ddot
-- Looking for cblas_ddot - found
-- Looking for sgesv
-- Looking for sgesv - not found
-- Looking for sgesv_
-- Looking for sgesv_ - found
-- Found CUDA: /usr/local/cuda (found suitable version "10.1", minimum required is "7.5")
-- Looking for cuDNN install...
-- Found cuDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
-- Building a CUDA test project to see if your compiler is compatible with CUDA...
-- Checking if you have the right version of cuDNN installed.
-- Enabling CUDA support for dlib. DLIB WILL USE CUDA
-- C++11 activated.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/pp/fiddle/dlib/build/temp.linux-x86_64-3.6
Platform:
64 bit, Ubuntu 18.04
Compiler:
gcc 7
Python version: 3.6.7
I have an NVIDIA 1050 Ti GPU installed on my machine.
Drivers are correct:
Thu Jun 20 18:50:54 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26 Driver Version: 430.26 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... Off | 00000000:02:00.0 Off | N/A |
| 36% 42C P0 N/A / 75W | 0MiB / 4039MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I have DLIB compiled with GPU:
>>> import dlib
>>> dlib.DLIB_USE_CUDA
True
>>> print (dlib.cuda.get_num_devices())
1
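The same check can be wrapped in a small script. This is only a sketch: it uses the two attributes shown in the session above (`dlib.DLIB_USE_CUDA` and `dlib.cuda.get_num_devices()`), and the helper name `dlib_cuda_status` is made up for illustration.

```python
def dlib_cuda_status():
    """Return (use_cuda, num_devices), or None if dlib cannot be imported."""
    try:
        import dlib
    except ImportError:
        return None
    # True only when dlib was compiled with CUDA support.
    use_cuda = bool(dlib.DLIB_USE_CUDA)
    # Only ask for the device count when CUDA support is compiled in.
    num_devices = dlib.cuda.get_num_devices() if use_cuda else 0
    return use_cuda, num_devices


if __name__ == "__main__":
    print(dlib_cuda_status())
```

A result of `(True, 1)` would match the interactive output above. It only says dlib was built with CUDA and can see a device, not that a particular call actually ran on the GPU.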
However, when I run face_recognition examples, this is what I see:
pp@homeserver:~$ nvidia-smi pmon
# gpu pid type sm mem enc dec command
# Idx # C/G % % % % name
0 1755 C 0 0 0 0 python
0 1755 C 1 0 0 0 python
0 1755 C 6 2 0 0 python
The output of nvidia-smi seems to show it's using "C" (CPU?) and not the GPU?
Please let me know if you need any more info.
I think you're confusing the terms G and C in the nvidia-smi util:
G means graphical
C means computing
But both happen on the GPU; processes not using the GPU are not even listed in nvidia-smi.
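To make the meaning of the type column concrete, here is a rough sketch that parses the `nvidia-smi pmon` lines quoted in the question and labels each row. The function name `parse_pmon` and the column positions are assumptions based on that sample only; other driver versions may print extra columns, and idle rows may show `-` instead of a number.

```python
# Sample copied from the pmon output in the question.
PMON_SAMPLE = """\
# gpu pid type sm mem enc dec command
# Idx # C/G % % % % name
0 1755 C 0 0 0 0 python
0 1755 C 1 0 0 0 python
0 1755 C 6 2 0 0 python
"""

# "C" = compute, "G" = graphics. Both refer to work running on the GPU.
TYPE_NAMES = {"C": "compute", "G": "graphics", "C+G": "compute+graphics"}


def parse_pmon(text):
    """Parse pmon-style text into a list of dicts, skipping '#' header lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        sm = fields[3]
        rows.append({
            "gpu": int(fields[0]),
            "pid": fields[1],
            "kind": TYPE_NAMES.get(fields[2], fields[2]),
            # SM utilisation in percent; left as None when not numeric ("-").
            "sm": int(sm) if sm.isdigit() else None,
            "command": fields[-1],
        })
    return rows


if __name__ == "__main__":
    for row in parse_pmon(PMON_SAMPLE):
        print(row)
```

Every row in the sample comes out as a compute process on GPU 0, which is what a CUDA-enabled dlib build would be expected to show.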
Oh darn! Talk about goofy assumptions. A simple Google search would have answered this if I had bothered to look up what the terms mean! Thank you very much.
That being said, I'm surprised dlib face_detection for a 600px wide image (cnn mode) is taking 3-4 seconds on a 1050 GPU. Is this normal?
It does seem slow, yes... Is it just the first image? The first one usually takes longer than the rest, since the network has to be loaded.
I'll give you a tip to monitor GPU processes:
watch -n0.1 nvidia-smi
This will call nvidia-smi every 0.1 seconds.
Ahh. Thanks for helping me here. So you are correct, the first call loads the models. Seems normal after that!
Test program:
import face_recognition
from timeit import default_timer as timer
from datetime import timedelta
import time

# Load the image once; only the detection call is timed.
image = face_recognition.load_image_file("./face.jpg")
while True:
    start = timer()
    face_locations = face_recognition.face_locations(image, model="cnn")
    end = timer()
    print('Time to detect', timedelta(seconds=end - start))
    print('sleeping...')
    time.sleep(4)
Output:
Time to detect 0:00:02.583082
sleeping...
Time to detect 0:00:00.605862
sleeping...
Time to detect 0:00:00.583118
sleeping...
Time to detect 0:00:00.584995
sleeping...
Time to detect 0:00:00.604632
sleeping...
Time to detect 0:00:00.586001
sleeping...
Time to detect 0:00:00.583874
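Since the first call includes loading the model, it can help to report it separately from the later calls. Below is a minimal sketch of that idea; `time_calls` is a hypothetical helper, not part of dlib or face_recognition, and it takes any zero-argument callable.

```python
from timeit import default_timer as timer


def time_calls(fn, repeats=5):
    """Call fn() `repeats` times.

    Returns (first_call_seconds, mean_of_remaining_seconds).
    The second value is None when repeats is 1.
    """
    timings = []
    for _ in range(repeats):
        start = timer()
        fn()
        timings.append(timer() - start)
    first, rest = timings[0], timings[1:]
    steady = sum(rest) / len(rest) if rest else None
    return first, steady


if __name__ == "__main__":
    # Stand-in workload; with the test program above one would instead pass
    # lambda: face_recognition.face_locations(image, model="cnn")
    first, steady = time_calls(lambda: sum(range(100000)))
    print("first call:", first, "steady state:", steady)
```

With the numbers printed above, the first call took about 2.58 seconds and the later calls about 0.6 seconds each, so the steady-state figure is the one to compare against expectations for the card.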