Face_recognition: Running face_encodings in parallel gives RuntimeError: cudaGetDevice()... reason: initialization error

Created on 21 Feb 2019 · 3 Comments · Source: ageitgey/face_recognition

face_recognition version: 1.2.3
Python version: 3.7.2
Operating System: Scientific Linux 7.6

I have dlib installed with GPU support.

>>> import dlib
>>> dlib.DLIB_USE_CUDA
True

I'm using batch_face_locations() to get face locations. Then, for each location pulled out of the batch, I get the encodings using face_encodings(). I timed both operations, and getting the encodings takes about 3x longer than getting the locations. I supposed that I could speed up the encodings by computing them all in parallel, so I tried something like this:

import multiprocessing as mp
import face_recognition

def get_encoding(frame, face_locations, return_queue):
    encode = face_recognition.face_encodings(frame, face_locations)
    return_queue.put(encode)

all_batch_face_locations = ... # the frames and associated batch_face_locations returned for all images in my dataset

encodings = []
for frames, batch_face_locs in all_batch_face_locations:
    # get the encodings for the current batch of images in parallel
    procs = []
    queues = []
    for frame_number_in_batch, face_locations in enumerate(batch_face_locs):
        q = mp.Queue()
        p = mp.Process(
            target=get_encoding, 
            args=(frames[frame_number_in_batch], face_locations, q))
        p.start()
        procs.append(p)
        queues.append(q)

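    for p, q in zip(procs, queues):
        # read the result before joining; joining first can block while the child still has queued data to flush
        encoding = q.get()
        p.join()
        encodings.append(encoding)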

Yet this gives me an error:

...
RuntimeError: Error while calling cudaGetDevice(&the_device_id) in file /tmp/pip-install-2vh9r_rp/dlib/dlib/cuda/gpu_data.cpp:178. code: 3, reason: initialization error

Now I can't actually find anywhere that says that face_recognition.face_encodings() uses the GPU. Even the dlib documentation for the function that face_recognition eventually calls doesn't mention it. But it seems to be using it nonetheless.

I see references in other issues (#98 #374 #649) to running face_encodings() on multiple CPU cores, and I'd at least like to try and experiment with that to see if I can get some improvement. Is there something I'm missing to allow me to run batch_face_locations on the GPU and face_encodings on the CPU? Or, if not, is there some way to also run the encodings on the GPU in batches?

dav-ell

Most helpful comment

face_recognition.face_encodings() runs a forward pass through a neural net, using the GPU if available. The only way to not use a GPU for that call is to compile dlib with GPU support disabled. Sorry that it doesn't document that it uses CUDA there.

If you do want to run it in parallel to see what happens, try importing face_recognition inside the function that runs in parallel instead of importing it at the top level of the program. The problem is that multiprocessing starts a new Python process for each worker, but those processes don't have a handle to an initialized GPU, since initialization happens when face_recognition is imported.
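
A minimal sketch of that suggestion, not a tested fix: the worker imports face_recognition itself, so the library is initialized inside each child process. The helper name encode_batch and its workers parameter are hypothetical. Using the "spawn" start method is an extra assumption that the thread does not mention; it is chosen here so that each worker starts as a fresh interpreter rather than a fork of the parent. Whether several processes sharing one GPU are actually faster than a single process is not established here.

```python
import multiprocessing as mp


def get_encoding(frame, face_locations):
    # Import inside the worker, as suggested above, so that the library
    # (and whatever GPU state it sets up) is initialized in this process.
    import face_recognition

    return face_recognition.face_encodings(frame, face_locations)


def encode_batch(frames, batch_face_locs, workers=2):
    """Hypothetical helper: encode one batch of frames in worker processes."""
    if not frames:
        # Nothing to encode, so avoid starting any worker processes.
        return []

    # Assumption: "spawn" gives each worker a fresh interpreter instead of
    # a forked copy of the parent process.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=workers) as pool:
        # One task per frame; results come back in input order.
        return pool.starmap(get_encoding, zip(frames, batch_face_locs))
```

With "spawn", the calling script needs to run under an `if __name__ == "__main__":` guard, and the parent process should avoid importing face_recognition at the top level. For example, `encodings.extend(encode_batch(frames, batch_face_locs))` could replace the inner loop from the question, assuming frames and batch_face_locs are lists of equal length.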