Google-cloud-python: [BigQuery Data Transfer Service, all GAPIC clients]: provide a close() method and context manager to clean up client-owned resources

Created on 11 Oct 2019 · 13 comments · Source: googleapis/google-cloud-python

Environment details

pip list output:

cachetools                         3.1.1    
certifi                            2019.9.11
chardet                            3.0.4    
google-api-core                    1.14.3   
google-api-python-client           1.7.11   
google-auth                        1.6.3    
google-auth-httplib2               0.0.3    
google-cloud-bigquery              1.20.0   
google-cloud-bigquery-datatransfer 0.4.1    
google-cloud-core                  1.0.3    
google-cloud-pubsub                1.0.2    
google-resumable-media             0.4.1    
googleapis-common-protos           1.6.0    
grpc-google-iam-v1                 0.12.3   
grpcio                             1.24.1   
httplib2                           0.14.0   
idna                               2.8      
pip                                19.2.3   
protobuf                           3.10.0   
pyasn1                             0.4.7    
pyasn1-modules                     0.2.6    
pytz                               2019.3   
requests                           2.22.0   
rsa                                4.0      
setuptools                         41.2.0   
six                                1.12.0   
uritemplate                        3.0.0    
urllib3                            1.25.6   
wheel                              0.33.6   

OS: Ubuntu
Python 3.5.2
API: BigQuery Data Transfer Service

Steps to reproduce

Initialize more than one data transfer client.

Code example

from google.cloud import bigquery_datatransfer_v1
import time

while True:
  dts_client = bigquery_datatransfer_v1.DataTransferServiceClient()
  try:
    dts_client.get_transfer_run('some run id').state
  except Exception:
    print('error!')

  time.sleep(2)

Output of ll /proc/<pid>/fd after three seconds:
0 1 10 11 2 3 4 5 6 7 8 9
And keeps growing.

There's also no way to clean up the client as it doesn't implement __exit__.

Labels: feature request, core, bigquerydatatransfer

Most helpful comment

Thank you both.
@busunkim96 indeed unrelated. My specific use case required creating the client on the fly using some parameters that I receive as method args. The example above is a simplification that only reproduces the problem.

All 13 comments

This client is 100% auto-generated via our gRPC toolchain. I'll try to summon some of our folks who work at those lower levels to investigate.

@yebrahim This is unrelated, but you only need to create the dts_client once.

from google.cloud import bigquery_datatransfer_v1
import time

dts_client = bigquery_datatransfer_v1.DataTransferServiceClient()
while True:
  try:
    dts_client.get_transfer_run('some run id').state
  except Exception:
    print('error!')

  time.sleep(2)

Thank you both.
@busunkim96 indeed unrelated. My specific use case required creating the client on the fly using some parameters that I receive as method args. The example above is a simplification that only reproduces the problem.

This isn't a "leak", per se: every GAPIC client has a transport attribute, which in turn holds a gRPC channel as its _channel attribute: each channel holds two open sockets.

There's also no way to clean up the client as it doesn't implement __exit__.

__exit__ is part of Python's context manager protocol, designed for use via the with statement: our clients don't fit that usage. You can, if desired, manually close the channel, e.g., client.transport.channel.close().
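That manual cleanup can also be wrapped in contextlib.closing, which turns any object exposing a close() method into a context manager. A minimal sketch (using a stub channel here, since a real DataTransferServiceClient requires credentials; a real gRPC channel exposes close() the same way):

```python
from contextlib import closing

class StubChannel:
    """Stand-in for client.transport.channel; a real gRPC channel also has close()."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

channel = StubChannel()
with closing(channel):
    pass  # issue API calls with the client here
print(channel.closed)  # True: the channel is closed when the block exits
```

This releases the channel's sockets deterministically instead of relying on garbage collection.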

@tseaver:

  • Why doesn't the client fit the usage? Is it not Pythonic to encapsulate client library calls within a Python context?
  • If I do close the transport channel manually, can the client be reused afterwards or should I then del and recreate it if needed?

FYI: I am able to reproduce this by using the psutil.Process.connections method to observe leaked resources.

import psutil

from google.cloud import bigquery_datatransfer_v1


current_process = psutil.Process()
print("connections before creating client: {}".format(len(current_process.connections())))
client = bigquery_datatransfer_v1.DataTransferServiceClient()
print("connections after creating client: {}".format(len(current_process.connections())))
sources = list(client.list_data_sources("projects/swast-scratch"))
print("connections after API request: {}".format(len(current_process.connections())))
del client
print("connections after deleting client: {}".format(len(current_process.connections())))

Output from the snippet above:

connections before creating client: 0
connections after creating client: 0
connections after API request: 2
connections after deleting client: 2

The workaround of client.transport.channel.close() does indeed clean up resources.

import psutil

from google.cloud import bigquery_datatransfer_v1


current_process = psutil.Process()
print("connections before creating client: {}".format(len(current_process.connections())))
client = bigquery_datatransfer_v1.DataTransferServiceClient()
print("connections after creating client: {}".format(len(current_process.connections())))
sources = list(client.list_data_sources("projects/swast-scratch"))
print("connections after API request: {}".format(len(current_process.connections())))
client.transport.channel.close()
print("connections after closing client.transport.channel: {}".format(len(current_process.connections())))

Output:

connections before creating client: 0
connections after creating client: 0
connections after API request: 2
connections after closing client.transport.channel: 0

Does it make sense to wrap this call in an API method? Something like client.close()?
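A hedged sketch of what that surface could look like: a close() method delegating to the transport's channel, plus __enter__/__exit__ so the client works in a with statement. The ClosableClient and StubChannel names are illustrative, not the actual generated API:

```python
class StubChannel:
    """Illustrative stand-in for the gRPC channel held by the transport."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class ClosableClient:
    """Hypothetical client exposing close() and the context manager protocol."""
    def __init__(self):
        self._channel = StubChannel()

    def close(self):
        self._channel.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

with ClosableClient() as client:
    pass  # issue API calls here
print(client._channel.closed)  # True: resources released on block exit
```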

We have just added such a method in the main BigQuery client. Personally, I agree that this makes sense to do across all our client libraries. I've drafted an internal proposal to be reviewed by our core libraries team.

Googlers: see approved proposal to update generated clients at go/closable-clients-python

As discussed offline, since this requires changes to the generator, this feature will wait until after the new Python 3-only generator migration is underway.

I'm only asking this in terms of viability, regarding @tseaver's comment:

__exit__ is part of Python's context manager protocol, designed for use via the with statement: our clients don't fit that usage

Is there anything wrong with this approach?

from google.cloud import bigquery
from google.cloud.bigquery.dbapi import Connection
from google.cloud.bigquery.dbapi.cursor import Cursor

class Pep249Cursor(Cursor):
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()


def create_tbl_if_not_exists(c):
    c.execute("CREATE TABLE IF NOT EXISTS ...")


client = bigquery.Client()
conn = Connection(client)
with Pep249Cursor(conn) as c:
    create_tbl_if_not_exists(c)

The DB-API is a bit of an oddball with regard to context managers. I would expect a Cursor context manager to commit a transaction but not close the connection, which doesn't apply to BigQuery since it is not a transactional database.
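For comparison, the stdlib sqlite3 module shows that transactional convention: using the connection as a context manager commits on success (or rolls back on error) but leaves the connection open and usable afterwards:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
with conn:  # __exit__ commits; it does NOT close the connection
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
# the connection is still usable after the with block
print(conn.execute("SELECT x FROM t").fetchall())  # [(1,)]
conn.close()
```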
