pip list output:
cachetools 3.1.1
certifi 2019.9.11
chardet 3.0.4
google-api-core 1.14.3
google-api-python-client 1.7.11
google-auth 1.6.3
google-auth-httplib2 0.0.3
google-cloud-bigquery 1.20.0
google-cloud-bigquery-datatransfer 0.4.1
google-cloud-core 1.0.3
google-cloud-pubsub 1.0.2
google-resumable-media 0.4.1
googleapis-common-protos 1.6.0
grpc-google-iam-v1 0.12.3
grpcio 1.24.1
httplib2 0.14.0
idna 2.8
pip 19.2.3
protobuf 3.10.0
pyasn1 0.4.7
pyasn1-modules 0.2.6
pytz 2019.3
requests 2.22.0
rsa 4.0
setuptools 41.2.0
six 1.12.0
uritemplate 3.0.0
urllib3 1.25.6
wheel 0.33.6
OS: Ubuntu
Python 3.5.2
API: BigQuery Data Transfer Service
Initialize more than one data transfer client:

from google.cloud import bigquery_datatransfer_v1
import time

while True:
    dts_client = bigquery_datatransfer_v1.DataTransferServiceClient()
    try:
        dts_client.get_transfer_run('some run id').state
    except Exception:
        print('error!')
    time.sleep(2)
Output of ll /proc/<pid>/fd after three seconds:
0 1 10 11 2 3 4 5 6 7 8 9
And keeps growing.
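For anyone wanting to watch the growth from inside the process rather than re-running ll by hand, a Linux-only sketch (the helper name is mine, not part of any library):

```python
import os

def open_fd_count():
    # On Linux, each entry under /proc/self/fd is one open file
    # descriptor (sockets included), so the count tracks the leak.
    return len(os.listdir("/proc/self/fd"))

baseline = open_fd_count()
# ... create clients in a loop here ...
print(open_fd_count() - baseline)
```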
There's also no way to clean up the client as it doesn't implement __exit__.
This client is 100% auto-generated via our gRPC toolchain. I'll try to summon some of our folks who work at those lower levels to investigate.
@yebrahim This is unrelated, but you only need to create the dts_client once.
from google.cloud import bigquery_datatransfer_v1
import time

dts_client = bigquery_datatransfer_v1.DataTransferServiceClient()
while True:
    try:
        dts_client.get_transfer_run('some run id').state
    except Exception:
        print('error!')
    time.sleep(2)
Thank you both.
@busunkim96 indeed unrelated. My specific use case required creating the client on the fly using some parameters that I receive as method args. The example above is a simplification that only reproduces the problem.
This isn't a "leak", per se: every GAPIC client has a transport attribute, which in turn holds a gRPC channel as its _channel attribute, and each channel holds two open sockets.
There's also no way to clean up the client as it doesn't implement __exit__.
__exit__ is part of Python's context manager protocol, designed for use via the with statement: our clients don't fit that usage. You can, if desired, manually close the channel, e.g., client.transport.channel.close().
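If a with statement is still desired, contextlib.closing from the standard library can wrap the channel without the client itself implementing __exit__. A minimal sketch using stand-in objects (the real channel would be client.transport.channel; only its close() method matters here):

```python
import contextlib

# Stand-ins for the real objects: a GAPIC client's transport holds a
# gRPC channel that exposes close().
class FakeChannel:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class FakeTransport:
    def __init__(self):
        self.channel = FakeChannel()

class FakeClient:
    def __init__(self):
        self.transport = FakeTransport()

client = FakeClient()
# contextlib.closing turns anything with a close() method into a
# context manager; the channel is closed even if the body raises.
with contextlib.closing(client.transport.channel):
    pass  # make API calls with `client` here

print(client.transport.channel.closed)  # True
```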
@tseaver:
del and recreate it if needed?

FYI: I am able to reproduce this by using the psutil.Process.connections method to observe leaked resources.
import psutil
from google.cloud import bigquery_datatransfer_v1
current_process = psutil.Process()
print("connections before creating client: {}".format(len(current_process.connections())))
client = bigquery_datatransfer_v1.DataTransferServiceClient()
print("connections after creating client: {}".format(len(current_process.connections())))
sources = list(client.list_data_sources("projects/swast-scratch"))
print("connections after API request: {}".format(len(current_process.connections())))
del client
print("connections after deleting client: {}".format(len(current_process.connections())))
Output from the snippet above:
connections before creating client: 0
connections after creating client: 0
connections after API request: 2
connections after deleting client: 2
The workaround of client.transport.channel.close() does indeed clean up resources.
import psutil
from google.cloud import bigquery_datatransfer_v1
current_process = psutil.Process()
print("connections before creating client: {}".format(len(current_process.connections())))
client = bigquery_datatransfer_v1.DataTransferServiceClient()
print("connections after creating client: {}".format(len(current_process.connections())))
sources = list(client.list_data_sources("projects/swast-scratch"))
print("connections after API request: {}".format(len(current_process.connections())))
client.transport.channel.close()
print("connections after closing client.transport.channel: {}".format(len(current_process.connections())))
Output:
connections before creating client: 0
connections after creating client: 0
connections after API request: 2
connections after closing client.transport.channel: 0
Does it make sense to wrap this call in an API method? Something like client.close()?
We have just added such a method in the main BigQuery client. Personally, I agree that this makes sense to do across all our client libraries. I've drafted an internal proposal to be reviewed by our core libraries team.
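As a rough sketch of what such a method could look like on a generated client (the names and structure here are assumptions for illustration, not the approved design):

```python
class Client:
    """Sketch of a generated client with an explicit close()."""

    def __init__(self, transport):
        self.transport = transport

    def close(self):
        # Release the underlying gRPC channel and its two sockets.
        self.transport.channel.close()

    # With close() in place, context-manager support is one step away.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()
```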
Googlers: see approved proposal to update generated clients at go/closable-clients-python
As discussed offline, since this requires changes to the generator, this feature will wait until after the new Python 3-only generator migration is underway.
I'm only asking this in terms of viability, regarding @tseaver's comment:
__exit__ is part of Python's context manager protocol, designed for use via the with statement: our clients don't fit that usage
Is there anything wrong with this approach?
from google.cloud import bigquery
from google.cloud.bigquery.dbapi import Connection
from google.cloud.bigquery.dbapi.cursor import Cursor
class Pep249Cursor(Cursor):
    def __init__(self, connection):
        super(Pep249Cursor, self).__init__(connection)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

def create_tbl_if_not_exists(c):
    c.execute("CREATE TABLE IF NOT EXISTS ...")

client = bigquery.Client()
conn = Connection(client)
with Pep249Cursor(conn) as c:
    create_tbl_if_not_exists(c)
The DB-API is a bit of an oddball with regard to context managers. I would expect a Cursor context manager to commit a transaction but not close the connection, which doesn't apply to BigQuery, as it is not a transactional database.
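The standard library's sqlite3 module illustrates that convention (at the connection level rather than the cursor level): using the object as a context manager controls the transaction, not the connection's lifetime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

# DB-API convention: the with block commits on success (or rolls back
# on error) but does NOT close the connection.
with conn:
    conn.execute("INSERT INTO t VALUES (1)")

# The connection is still open and usable afterwards.
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1
conn.close()
```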