Google-cloud-python: Pubsub: Memory leak in Publisher Client

Created on 29 Oct 2018 · 5 comments · Source: googleapis/google-cloud-python

Summary

When publishing many messages, memory usage keeps growing until the process hits an OOM error.
This is similar to the issues reported here and here, except that the suggested fix of clearing out batch.message_ids was tested and did not resolve the problem.

Environment

  • Google AppEngine, Python37 and Flexible (Python 3.6)
  • requirements.txt:
Flask==1.0.2
google-cloud-storage==1.7.0
google-cloud-pubsub==0.38.0
google-cloud-bigquery==1.1.0
google-api-python-client==1.6.2
google-cloud-datastore==1.6.0
google-cloud-logging==1.6.0
google-api-core==1.1.0
grpcio==1.15.0
gunicorn==19.7.1
jellyfish==0.5.6
numpy==1.15.0
pandas==0.23.4
pandas-gbq==0.4.1
protobuf==3.6.1
requests==2.18.4
retry==0.9.2
scikit-learn==0.19.2
scipy==1.1.0
unidecode==1.0.22

To Reproduce

Set up a simple publisher:

# app.py
from flask import Flask, jsonify

from google.cloud import pubsub

TOPIC = 'projects/my-project/topics/my-test-topic'
pubsub_client = pubsub.PublisherClient(
    batch_settings=types.BatchSettings(max_messages=500)
)

def status():
    """Return the current status of the application."""
    pubsub_client.publish(TOPIC, 'healthcheck'.encode('utf8'))
    return jsonify({'status': 'ok'})

def create_app(config_object=None):
    app = Flask(__name__)
    app.config.from_object(config_object)
    app.add_url_rule('/_ah/health', view_func=status, methods=['GET'])
    return app

def main():
    app = create_app()
    app.run(host='0.0.0.0', port=80)

if __name__ == '__main__':
    main()

And make many requests

from time import sleep
from requests import get

cnt = 0
try:
    while True:
        get('http://localhost/_ah/health')
        sleep(0.1)
        cnt += 1
        print(f'{cnt} requests')
except KeyboardInterrupt:
    print('aborted')
Labels: duplicate, question, pubsub

All 5 comments

Trying to reproduce locally (not on GAE), I hit a couple of issues:

  • Your requirements.txt contains some entries with conflicting dependencies. I trimmed it down to just Flask==1.0.2 and google-cloud-pubsub==0.38.0.
  • Your app.py needs to use pubsub.types.BatchSettings.
  • When reproducing, I needed to pre-create the topic in the console.
$ python3.6 -m venv /tmp/gcp/6324
$ cd /tmp/gcp/6324
$ bin/pip install --upgrade setuptools pip wheel
...
Successfully installed pip-18.1 setuptools-40.5.0 wheel-0.32.2
$ vim requirements.txt
$ cat requirements.txt
Flask==1.0.2
google-cloud-pubsub==0.38.0
$ bin/pip install -r requirements.txt
...
Successfully installed Flask-1.0.2 Jinja2-2.10 MarkupSafe-1.0 Werkzeug-0.14.1 cachetools-2.1.0 certifi-2018.10.15 chardet-3.0.4 click-7.0 google-api-core-1.5.0 google-auth-1.5.1 google-cloud-pubsub-0.38.0 googleapis-common-protos-1.5.3 grpc-google-iam-v1-0.11.4 grpcio-1.16.0 idna-2.7 itsdangerous-1.1.0 protobuf-3.6.1 pyasn1-0.4.4 pyasn1-modules-0.2.2 pytz-2018.7 requests-2.20.0 rsa-4.0 six-1.11.0 urllib3-1.24
$ vim app.py  # add 'pubsub' as indicated above to 'types.BatchSettings'
$ . /path/to/my/local_google_cloud_env  # set up credentials / project env vars
$ bin/flask run -p 8080
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: off

When running the client in a separate terminal:

$ cd /tmp/gcp/6324
$ vim client.py
$ bin/python client.py
1 requests
2 requests
3 requests
4 requests
5 requests
6 requests
7 requests
8 requests
9 requests
...

I don't see any growth in the size of the flask process on my machine. Can you provide more detail to help reproduce the leak?
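For anyone trying to confirm growth on their own machine, one general technique (a sketch, not something used in this thread) is the stdlib tracemalloc module, which can diff allocations across a workload and show where surviving objects were created:

```python
import tracemalloc

# Snapshot, run the suspect workload, then diff allocations.
tracemalloc.start()
before = tracemalloc.take_snapshot()

leaked = []
for _ in range(1000):
    # Stand-in for "handle one request"; a leaking app would be
    # accumulating client objects here instead of bytearrays.
    leaked.append(bytearray(1024))

after = tracemalloc.take_snapshot()
top = after.compare_to(before, 'lineno')
print(top[0])  # the biggest growth site, with its size delta
```

If memory is genuinely leaking per request, the top entries of the diff will keep growing as more requests are made.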

Hi @tseaver, I've done a bit more digging and determined that the leak occurs because we're actually creating a new client for each request. My previous sample was:

# app.py
from flask import Flask, jsonify, g

from google.cloud import pubsub
from google.cloud.pubsub import types

TOPIC = 'projects/my-project/topics/my-test-topic'

def get_pubsub_client():
    # ``g`` is request-scoped in Flask, so this branch runs on every
    # request and a brand-new PublisherClient is created each time.
    if 'pubsub' not in g:
        g.pubsub = pubsub.PublisherClient(
            batch_settings=types.BatchSettings(max_messages=500)
        )
    return g.pubsub

def status():
    """Return the current status of the application."""
    pubsub_client = get_pubsub_client()
    pubsub_client.publish(TOPIC, 'healthcheck'.encode('utf8'))
    return jsonify({'status': 'ok'})

def create_app(config_object=None):
    app = Flask(__name__)
    app.config.from_object(config_object)
    app.add_url_rule('/_ah/health', view_func=status, methods=['GET'])
    return app

def main():
    app = create_app()
    app.run(host='0.0.0.0', port=80)

if __name__ == '__main__':
    main()

Running it locally, this is absolutely causing the memory leak.
By instead binding the pubsub client to current_app, only a single client is created and the memory stays stable.
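The fix can be sketched without Flask at all: the point is to construct the client once per process rather than once per request. A minimal stand-alone illustration of the pattern (all names here are hypothetical):

```python
_client = None

def get_client(factory):
    """Create the (expensive) client once and reuse it afterwards."""
    global _client
    if _client is None:
        _client = factory()
    return _client

calls = []

def fake_factory():
    # Stand-in for pubsub.PublisherClient(...); counts constructions.
    calls.append(1)
    return object()

first = get_client(fake_factory)
second = get_client(fake_factory)
assert first is second and len(calls) == 1  # constructed exactly once
```

With Flask, the equivalent is creating the client at module import time (or in create_app) and reading it via current_app, rather than re-creating it under the request-scoped g.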

Still, creating many PublisherClients leaks memory: even once the requests end (and the references are dropped), the memory is not freed.

Interesting to note, the same leak occurs when creating a google.cloud.bigquery.Client instance, also binding it to the g context for Flask.

Leaking RAM by creating a client-per-request is essentially the same issue as https://github.com/googleapis/google-cloud-python/issues/5523 (there the issue shows up as leaked filehandles). Apps which do this are going to need to clean the clients up manually, which for pubsub and other GAPIC-based API libraries is blocked on https://github.com/googleapis/gapic-generator/issues/2337 (I have a fix for that pending in https://github.com/googleapis/gapic-generator/pull/2396). Once that fix lands, you will be able to clean up the transport / channel via:

publisher_client.transport.channel.close()

For the BigQuery client (and the others which bind to REST endpoints), the underlying "transport" object is a google.auth.transport.requests.AuthorizedSession instance. That class derives from requests.Session, which allows you to clean it up via:

bq_client._http.close()
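Until the generator fix lands, the intended cleanup contract can be illustrated with stand-in objects; the hypothetical FakeClient below only mirrors the transport/channel shape described above, it is not the real PublisherClient:

```python
class FakeChannel:
    """Stand-in for a gRPC channel."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class FakeTransport:
    def __init__(self):
        self.channel = FakeChannel()

class FakeClient:
    """Mimics a GAPIC client's transport/channel layout."""
    def __init__(self):
        self.transport = FakeTransport()

def teardown_client(client):
    # What a per-request teardown hook (e.g. Flask's
    # teardown_appcontext) would do once close() is available.
    if client is not None:
        client.transport.channel.close()

client = FakeClient()
teardown_client(client)
print(client.transport.channel.closed)  # True
```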

Awesome. Thanks!
