Google-cloud-python: Is the Python client library thread safe when using gRPC?

Created on 5 Apr 2017  ·  9 Comments  ·  Source: googleapis/google-cloud-python

Hello all,

The documentation [1] makes it clear that httplib2 objects aren't thread safe in the Python client library. Are clients that have gRPC support (such as Pubsub) thread safe when using gRPC? This question has been asked about the Java client library before [2], but I'd appreciate a firm answer for Python too.

Thank you.

[1] https://developers.google.com/api-client-library/python/guide/thread_safety
[2] https://github.com/GoogleCloudPlatform/google-cloud-java/issues/1320

question core

All 9 comments

Hi @AdamLazarus,
Thanks for asking.

The short answer is: We _think_ so. :-)
(Additionally, if you find thread-safety issues, feel free to open them as bugs.)

Just for information, it seems that you cannot share a datastore.Client() object across threads. You end up with errors that look like this:

E1130 10:54:55.377618000 140736526345152 ssl_transport_security.c:435] Corruption detected.
E1130 10:54:55.377821000 140736526345152 ssl_transport_security.c:411] error:100003fc:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_RECORD_MAC
E1130 10:54:55.377891000 140736526345152 secure_endpoint.c:185]        Decryption error: TSI_DATA_CORRUPTED

@philipperemy I'd love to see an example that reproduces this. I've used Client()-s based on gRPC connections across multiple threads without issue.

Sure! This is roughly the code where I have one datastore.Client() per thread:

from multiprocessing import Pool

from google.cloud import datastore


def get_data(symbol_):
    # Each call builds its own client, so every worker gets a separate one.
    print('Init...')
    data_store_client = datastore.Client()
    print('Done...')
    query = data_store_client.query(kind=symbol_)
    for entity in query.fetch():
        print(entity)


def parallel_function(f, sequence, num_threads=None):
    # Note: multiprocessing.Pool starts worker processes, not threads.
    pool = Pool(processes=num_threads)
    result = pool.map(f, sequence)
    cleaned = [x for x in result if x is not None]
    pool.close()
    pool.join()
    return cleaned


def run_query():
    [...]
    parallel_function(f=get_data, sequence=symbols, num_threads=4)

The other version is very similar, except that I define a global variable DATA_STORE_CLIENT that is visible to all the threads.

Neither version works.

With num_threads=1 it runs smoothly.
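
For comparison, here is a minimal sketch of the same parallel_function built on real threads. multiprocessing.Pool starts worker processes, while multiprocessing.pool.ThreadPool offers the same map() API on threads inside the current process. StandInClient is a hypothetical placeholder for datastore.Client() so the sketch runs without credentials; it says nothing about how the real client behaves when shared.

```python
import os
import threading
from multiprocessing.pool import ThreadPool


class StandInClient:
    """Hypothetical placeholder for datastore.Client(); makes no network calls."""

    def __init__(self):
        self.created_in_pid = os.getpid()

    def fetch(self, kind):
        # Pretend to run a query and return one fake entity for the kind.
        return [f"{kind}-entity"]


# One client, created once and shared by every worker thread.
SHARED_CLIENT = StandInClient()


def get_data(symbol):
    entities = SHARED_CLIENT.fetch(symbol)
    # Report which process and thread handled this symbol.
    return symbol, os.getpid(), threading.current_thread().name, entities


def parallel_function(f, sequence, num_threads=None):
    # ThreadPool has the same map() API as Pool but uses threads in this process.
    with ThreadPool(processes=num_threads) as pool:
        return pool.map(f, sequence)


results = parallel_function(get_data, ["AAA", "BBB", "CCC", "DDD"], num_threads=4)
```

Every tuple in results carries the same process ID, because all workers are threads of one process. With multiprocessing.Pool each worker would instead be a separate process.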

Has this ever been addressed? Creating a new client for each thread can effectively double the number of threads in the system.

@speedplane if you want something that can run in production, you might want to use something else. Those libs are not very stable unfortunately.

@philipperemy what other options are there for accessing the datastore? Isn't this the official library?

I'm looking at the code now, and it's much worse than 1 new thread per client. It seems that when using gRPC, there are 4 threads: a consumption thread, a channel spin thread, a delivering thread, and a polling thread (I'm not sure what these threads do or whether they're always used). This seems to be per client, and it can get bad. Take the following example:

  • You have a 4-core server with 4 worker request handlers.
  • Each worker has 20 threaded request handlers (so it can handle 80 simultaneous requests).
  • Each request handler thread needs access to 2 clients: the datastore and cloud storage.
  • Each of those clients spawns 4 gRPC threads.

That results in 720 threads (= 4 * 20 * (1 + 2 * 4)) when 80 would have worked fine.
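
The arithmetic above can be checked with a short sketch. The function name and parameters are made up for illustration, and the figure of 4 gRPC threads per client is the estimate from this comment, not a documented number. The share_clients case assumes one client of each kind per worker, which only makes sense if the clients are thread safe.

```python
def total_threads(workers, handlers_per_worker, clients, grpc_threads_per_client,
                  share_clients=False):
    """Estimate the thread count for the scenario described above."""
    grpc_threads = clients * grpc_threads_per_client
    if share_clients:
        # One set of clients per worker, shared by all of its handler threads.
        return workers * (handlers_per_worker + grpc_threads)
    # Every handler thread owns its own set of clients.
    return workers * handlers_per_worker * (1 + grpc_threads)


per_handler = total_threads(4, 20, 2, 4)                  # 4 * 20 * (1 + 2 * 4)
shared = total_threads(4, 20, 2, 4, share_clients=True)   # 4 * (20 + 2 * 4)
print(per_handler, shared)
```

Under these assumptions, one client set per handler gives 720 threads, while one shared set per worker gives 112.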

@speedplane We expect gRPC-based clients to be thread safe. The issues we know of have to do with multiprocessing (forking after creating a client).
