Google-cloud-python: Django freezes when using Stackdriver logging

Created on 2 Mar 2017  路  9Comments  路  Source: googleapis/google-cloud-python

OS: MacOS 10.11.6
Python: Python 2.7.10
Package:

Name: google-cloud-logging
Version: 0.23.1
Summary: Python Client for Stackdriver Logging
Home-page: https://github.com/GoogleCloudPlatform/google-cloud-python
Author: Google Cloud Platform
Author-email: [email protected]
License: Apache 2.0
Location: /Users/briandeheus/.virtualenvs/proj/lib/python2.7/site-packages
Requires: gapic-google-cloud-logging-v2, google-cloud-core, grpcio

Steps to reproduce

  1. start a new Django project
  2. set up your logging as follows:
LOGGING = {
    'version': 1,
    'handlers': {
        'stackdriver': {
            'class': 'google.cloud.logging.handlers.CloudLoggingHandler',
            'client': logging.Client()
        }
    },
    'loggers': {
        '': {
            'handlers': ['stackdriver'],
            'level': 'INFO',
            'propagate': True,
        }
}

Your app will freeze after a few seconds. I've narrowed it down by adding some print statements to the stackdriver module:

_start: Starting
_start: Started
_Run: RUN WORKER
_Run: NOT STOPPING
_Run: WAIT
[2017-03-02 08:18:44,595] DEBUG Skipping environment variable credential check because profile name was explicitly set.
Enqueue: Acquiring
Enqueue: Acquired
Enqueue: Notifying
Enqueue: Notified
Enqueue: Releasing
Enqueue: Released
[2017-03-02 08:18:44,595] DEBUG Looking for credentials via: env
_Run: COMMITING
Commit: BATCH COMMIT
 Enqueue: Acquiring
Commit: WRITE ENTRIES

The Worker class has some modified methods to achieve this:

    def _run(self):
        """The entry point for the worker thread.

        Loops until ``stopping`` is set to :data:`True`, and commits batch
        entries written during :meth:`enqueue`.
        """
        try:
            self._entries_condition.acquire()
            self.started = True
            print "_Run: RUN WORKER"
            while not self.stopping:
                print "_Run: NOT STOPPING"
                if len(self.batch.entries) == 0:
                    print "_Run: WAIT"
                    # branch coverage of this code extremely flaky
                    self._entries_condition.wait()  # pragma: NO COVER

                if len(self.batch.entries) > 0:
                    print "_Run: COMMITING"
                    self.batch.commit()

            print "_Run: STOPPING"
        finally:
            print "_Run: RELEASING"
            self._entries_condition.release()

        # main thread may be waiting for worker thread to finish writing its
        # final entries. here we signal that it's done.
        self._stop_condition.acquire()
        self._stop_condition.notify()
        self._stop_condition.release()

    def _start(self):
        """Called by this class's constructor

        This method is responsible for starting the thread and registering
        the exit handlers.
        """
        try:
            print "_start: Starting"
            self._entries_condition.acquire()
            self._thread = threading.Thread(
                target=self._run, name=self.name)
            self._thread.setDaemon(False)
            self._thread.start()
            print "_start: Started"
        finally:
            self._entries_condition.release()
            atexit.register(self._stop)

    def enqueue(self, record, message):
        """Queues up a log entry to be written by the background thread."""
        try:
            print "Enqueue: Acquiring"
            self._entries_condition.acquire()
            if self.stopping:
                print "Enqueue: Stopping"
                return
            print "Enqueue: Acquired"
            info = {'message': message, 'python_logger': record.name}
            self.batch.log_struct(info, severity=record.levelname)
            print "Enqueue: Notifying"
            self._entries_condition.notify()
            print "Enqueue: Notified"
        finally:
            print "Enqueue: Releasing"
            self._entries_condition.release()
            print "Enqueue: Released"

I've tried both the SyncTransport and the BackgroundThreadTransport but both lock up. Whether running your project in the Django development mode or using uWSGI, the end result is always the same.

logging p1

Most helpful comment

Not sure will try to repro.

All 9 comments

Thanks for reporting @briandeheus!

@waprin's the most familiar with this service I believe. WDYT @waprin?

Not sure will try to repro.

@waprin were you able to reproduce this?

If not, I can try and reproduce it and get you some more information if possible.

Very sorry for delay here.

Unfortunately I am currently not able to reproduce this. This is on Django 1.10.5 , Python 2.7.10 (assuming you are also Django 1.10?) Copying and pasting your logging config is actually working fine for me and Django doesn't seem to be freezing up, and I see the logs.

@briandeheus any way you can provide any more steps to reproduce? Django version, Python version all helpful. A Docker container would be most helpful.

The most common reason I tend to see lockups is infinite recursions (a log statement that occurs in the logging code path restarts the process). The default excluded loggers should exclude all the loggers in the logging path, but that's something to consider.

The fact that it's locking up on the synchronous handler is the most strange, but could also fall victim to infinite recursion problem.

@daspecster if you could try to repro would also be great.

@waprin I'll see if I can create a case. We're actually using Django 1.9 on our end. I should have mentioned that.

Ah sorry, I got distracted and this slipped my memory.

@briandeheus that would be great! Thanks!

Hey guys, terribly sorry for the (really) late reply. I haven't been able to reproduce it with new installations, it must've been something I've done wrong.

If I have any more details I'll update this ticket, I think it's best to close it for now.

Thanks for coming back @briandeheus, sorry for the issues.

This is probably related to #3310 so there's probably an issue lurking somewhere (though maybe fixed in some release?) that's just been hard to repro :-

Was this page helpful?
0 / 5 - 0 ratings