I have dlib installed with GPU support.
>>> import dlib
>>> dlib.DLIB_USE_CUDA
True
I'm using batch_face_locations() to get face locations. Then, for each set of locations pulled out of the batch, I get the encodings using face_encodings(). I timed both operations, and getting the encodings takes about 3x longer than getting the locations. I figured I could speed up the encodings by computing them all in parallel, so I tried something like this:
import multiprocessing as mp
import face_recognition

def get_encoding(frame, face_locations, return_queue):
    encode = face_recognition.face_encodings(frame, face_locations)
    return_queue.put(encode)

all_batch_face_locations = ... # the frames and associated batch_face_locations returned for all images in my dataset
encodings = []
for frames, batch_face_locs in all_batch_face_locations:
    # get the encodings for the current batch of images in parallel
    procs = []
    queues = []
    for frame_number_in_batch, face_locations in enumerate(batch_face_locs):
        q = mp.Queue()
        p = mp.Process(
            target=get_encoding,
            args=(frames[frame_number_in_batch], face_locations, q))
        p.start()
        procs.append(p)
        queues.append(q)
    for p, q in zip(procs, queues):
        p.join()
        encoding = q.get()
        encodings.append(encoding)
Yet this gives me an error:
...
RuntimeError: Error while calling cudaGetDevice(&the_device_id) in file /tmp/pip-install-2vh9r_rp/dlib/dlib/cuda/gpu_data.cpp:178. code: 3, reason: initialization error
Now I can't actually find anywhere that says that face_recognition.face_encodings() uses the GPU. Even the dlib documentation for the function that face_recognition eventually calls doesn't mention it. But it seems to be using it nonetheless.
I see references in other issues (#98 #374 #649) to running face_encodings() on multiple CPU cores, and I'd at least like to try and experiment with that to see if I can get some improvement. Is there something I'm missing to allow me to run batch_face_locations on the GPU and face_encodings on the CPU? Or, if not, is there some way to also run the encodings on the GPU in batches?
Have the same issue here.
@dav-ell @AndyYangMao I got the same error. Is there any solution for this bug now?
face_recognition.face_encodings() runs a forward pass through a neural net, using the GPU if available. The only way to not use a GPU for that call is to compile dlib with GPU support disabled. Sorry that it doesn't document that it uses CUDA there.
If you do want to run it in parallel to see what happens, try importing face_recognition inside the function that runs in parallel instead of importing it at the top level of the program. The problem is that multiprocessing starts a new Python process in the background for each worker, but those new processes don't have a handle to an initialized GPU, since that initialization happens when face_recognition is imported.
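For anyone who wants to try that, here is a rough, untested sketch of what the deferred import could look like. The helper name encode_batch and the workers parameter are made up for illustration. Using the "spawn" start method is my own assumption (it starts fresh interpreters instead of forking a parent that may already have touched CUDA); it is not something confirmed above. Each worker loads its own copy of the model, so GPU memory use grows with the number of workers.

```python
import multiprocessing as mp


def get_encoding(job):
    """Worker: compute the encodings for the faces in one frame."""
    # Deferred import: face_recognition (and dlib's CUDA setup) is loaded
    # inside the worker process rather than at the top level of the program.
    import face_recognition

    frame, face_locations = job
    return face_recognition.face_encodings(frame, face_locations)


def encode_batch(frames, batch_face_locs, workers=2):
    """Hypothetical helper: encode one batch in parallel, preserving order."""
    # Assumption: "spawn" gives each worker a fresh interpreter, so no
    # CUDA state is inherited from the parent process.
    ctx = mp.get_context("spawn")
    jobs = list(zip(frames, batch_face_locs))
    with ctx.Pool(processes=workers) as pool:
        return pool.map(get_encoding, jobs)
```

With "spawn", the calling code has to sit under an if __name__ == "__main__": guard, and the call would be something like encodings.extend(encode_batch(frames, batch_face_locs)) inside the loop over all_batch_face_locations. Whether this ends up faster than a single process on one GPU is something you would have to measure.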