My app is built with Django 1.11 and Celery 4.1 (celery[sqs]). It is deployed on Amazon AWS through Elastic Beanstalk, with AWS SQS as the broker.
The issue shows up every night around 10:42 pm and lasts about an hour. Tens of thousands of backend_cleanup tasks start executing. According to the list of task results in the Admin, each task succeeds. CPU usage sometimes stays at 100%, and every page of the app is slow to open. About an hour later, CPU usage returns to normal, but the queues are wedged: no delay() task gets executed. I have to purge the queue in SQS, restart the app, and re-upload the entire code to AWS Elastic Beanstalk before delay() tasks run normally again. Without the purge, restart, and re-upload, no async task is executed; even the built-in backend_cleanup task cannot start. My app has no periodic or crontab tasks other than the built-in celery.backend_cleanup task. Could anyone help me with this issue?
I need the entire app to perform normally after the backend_cleanup tasks are finished.
Described in the "Steps to reproduce" section.
I just found that the problem is probably related to the built-in task, celery.backend_cleanup. I ran the app in my local development environment. At 10:42 pm, a huge number of backend_cleanup tasks started and executed successfully. However, after a couple of seconds, an error occurred. Here is the error:
[2017-10-23 22:51:27,677: CRITICAL/MainProcess] Unrecoverable error: Exception('Request Empty body HTTP 599 Server aborted the SSL handshake (None)',)
Traceback (most recent call last):
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/celery/bootsteps.py", line 370, in start
return self.obj.start()
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/celery/worker/consumer/consumer.py", line 320, in start
blueprint.start(self)
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/celery/worker/consumer/consumer.py", line 596, in start
c.loop(*c.loop_args())
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/celery/worker/loops.py", line 88, in asynloop
next(loop)
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/kombu/async/hub.py", line 354, in create_loop
cb(*cbargs)
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/kombu/async/http/curl.py", line 111, in on_readable
return self._on_event(fd, _pycurl.CSELECT_IN)
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/kombu/async/http/curl.py", line 124, in _on_event
self._process_pending_requests()
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/kombu/async/http/curl.py", line 132, in _process_pending_requests
self._process(curl, errno, reason)
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/kombu/async/http/curl.py", line 178, in _process
buffer=buffer, effective_url=effective_url, error=error,
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/vine/promises.py", line 150, in __call__
svpending(*ca, **ck)
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/vine/promises.py", line 143, in __call__
return self.throw()
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/vine/promises.py", line 140, in __call__
retval = fun(*final_args, **final_kwargs)
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/vine/funtools.py", line 100, in _transback
return callback(ret)
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/vine/promises.py", line 143, in __call__
return self.throw()
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/vine/promises.py", line 140, in __call__
retval = fun(*final_args, **final_kwargs)
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/vine/funtools.py", line 98, in _transback
callback.throw()
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/vine/funtools.py", line 96, in _transback
ret = filter_(*args + (ret,), **kwargs)
File "/Users/user/code/accusize-021/lib/python3.4/site-packages/kombu/async/aws/connection.py", line 253, in _on_status_ready
raise self._for_status(response, response.read())
Exception: Request Empty body HTTP 599 Server aborted the SSL handshake (None)
Meanwhile, the queue disappeared from AWS SQS. Then a few of the backend_cleanup tasks continued to execute. And finally, a warning showed up.
[2017-10-23 22:51:28,967: WARNING/MainProcess] Restoring 10 unacknowledged message(s)
After that, the HTTPS connection to the SQS queue stopped. The app could not reconnect to SQS until I redeployed it to AWS Elastic Beanstalk.
Could anyone help me with this issue? Any response will be greatly appreciated.
I think I solved the issue. The problem has not happened again since I changed CELERY_RESULT_BACKEND from "django-db" to "redis" and changed CELERY_TIMEZONE to "UTC". The Task Results page in Django Admin no longer receives any new task results, but my problem is largely solved.
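For anyone landing here later, the change described above might look roughly like this in a Django settings module. This is only a sketch: the Redis URL (host, port, database number) is an assumption for illustration, and it assumes the Celery app reads its settings with the CELERY_ prefix, as the setting names in the post suggest.

```python
# settings.py (sketch of the change described above)

# Previously: CELERY_RESULT_BACKEND = 'django-db'
# The Redis URL below is an assumed local instance; replace it with your own.
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'

# Run Celery in UTC, as described in the post.
CELERY_TIMEZONE = 'UTC'
```

With results stored in Redis instead of the Django database, the Task Results page in Django Admin stays empty, which matches what was observed.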
You can simply override the default schedule entry and avoid using crontab:
'celery.backend_cleanup': {
    'task': 'celery.backend_cleanup',
    'schedule': 86400,  # every 24 hours, instead of the crontab('0', '4', '*') that celery beat uses by default
    'options': {'expires': 12 * 3600},
},
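To show where that entry goes, here is a minimal sketch of a complete beat schedule in a Django settings module. It assumes the Celery app loads its configuration with the CELERY_ prefix, so the schedule lives in CELERY_BEAT_SCHEDULE; without that prefix the setting would be beat_schedule on the app configuration instead.

```python
# settings.py (sketch): replace the default backend_cleanup entry
# with a plain interval schedule instead of a crontab schedule.

CELERY_BEAT_SCHEDULE = {
    # Using the same key as the built-in entry replaces the default.
    'celery.backend_cleanup': {
        'task': 'celery.backend_cleanup',
        # Interval in seconds: run once every 24 hours.
        'schedule': 24 * 60 * 60,
        # Drop the task if it has not started within 12 hours.
        'options': {'expires': 12 * 3600},
    },
}
```

A numeric schedule is treated as an interval in seconds, so the task no longer depends on how a crontab expression is evaluated against the configured timezone.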