Output of celery -A proj report:
software -> celery:4.0.2 (latentcall) kombu:4.0.2 py:3.4.3
billiard:3.5.0.2 py-amqp:2.1.4
platform -> system:Linux arch:64bit, ELF imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:amqp results:disabled
The issue exists against the master branch of Celery.
Steps to reproduce: upgrade to Celery 4.0.2.
Expected behavior: no error.
Actual behavior: I'm getting this error from my workers:
[2017-01-19 04:07:57,411: CRITICAL/MainProcess] Couldn't ack 461, reason:BrokenPipeError(32, 'Broken pipe')
Traceback (most recent call last):
File "/opt/app/current/venv/lib/python3.4/site-packages/kombu/message.py", line 130, in ack_log_error
self.ack(multiple=multiple)
File "/opt/app/current/venv/lib/python3.4/site-packages/kombu/message.py", line 125, in ack
self.channel.basic_ack(self.delivery_tag, multiple=multiple)
File "/opt/app/current/venv/lib/python3.4/site-packages/amqp/channel.py", line 1408, in basic_ack
spec.Basic.Ack, argsig, (delivery_tag, multiple),
File "/opt/app/current/venv/lib/python3.4/site-packages/amqp/abstract_channel.py", line 64, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/opt/app/current/venv/lib/python3.4/site-packages/amqp/method_framing.py", line 174, in write_frame
write(view[:offset])
File "/opt/app/current/venv/lib/python3.4/site-packages/amqp/transport.py", line 269, in write
self._write(s)
BrokenPipeError: [Errno 32] Broken pipe
This causes the worker to reset its connection to RabbitMQ every few seconds, and sometimes leads to hitting the max connection limit allowed on the RabbitMQ server.
Possibly related to #3328 and #3377.
Not sure what's causing this, but it starts happening every day or two. Restarting the workers fixes it for a while.
I'm also seeing this issue, albeit on a slightly older version of celery.
software -> celery:3.1.23 (Cipater) kombu:3.0.35 py:3.5.3
billiard:3.3.0.23 py-amqp:1.4.9
platform -> system:Linux arch:64bit, ELF imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:amqp results:djcelery.backends.database:DatabaseBackend
+1
+1 seeing this issue lately on celery 3.2.1
+1 Seeing this issue lately on 3.1.18
+1 Seeing the issue on 4.0.2
+1 Seeing the issue on 4.0.2
+1 Seeing the issue on 4.0.2
+1 Seeing the issue on 4.0.2
+1 seeing this issue on 4.0.2
FYI, this and other Celery 4.x issues went away when I downgraded to 3.x. The long-term solution is to switch to https://github.com/Bogdanp/dramatiq, which has an API very similar to Celery's .delay(), so it's very easy to switch.
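For reference, a rough sketch of what the switch looks like (task and broker URL are hypothetical; note that dramatiq actors are enqueued with .send() rather than .delay()):

import dramatiq
from dramatiq.brokers.rabbitmq import RabbitmqBroker

# hypothetical broker URL; dramatiq also ships a Redis broker
dramatiq.set_broker(RabbitmqBroker(url="amqp://guest:guest@localhost:5672"))

@dramatiq.actor
def add(x, y):
    return x + y

# roughly equivalent to add.delay(2, 3) in Celery
add.send(2, 3)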
+1 seeing on 4.0.2
+1 seeing on 4.0.2
+1 seeing on 4.1.0
+1 seeing on 4.1.0
+1 seeing on celery (4.1.0), gevent
+1 celery 4.1.0
+1
Seeing this on 4.1.0 with rabbitmq broker
+1
This happens when revoking a task with the terminate argument: app.control.revoke(t.id, terminate=True).
The same happens when terminating the task from Flower.
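For context, a minimal sketch of the trigger described above (app name, broker/backend URLs and task are hypothetical):

import time
from celery import Celery

app = Celery('proj', broker='amqp://guest@localhost//', backend='redis://localhost/0')

@app.task
def long_running():
    time.sleep(600)

result = long_running.delay()
# Revoking with terminate=True signals the worker child running the task;
# this is the call that coincides with the control-command Broken pipe error below.
app.control.revoke(result.id, terminate=True)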
Python 3.6.2rc2
Celery 4.1.0 (latentcall)
broker: rabbitmq:3.6.12
backend: redis:3.2.10
[2017-10-11 09:31:49,471: ERROR/MainProcess] Control command error: BrokenPipeError(32, 'Broken pipe')
Traceback (most recent call last):
File "/home/gustavorps/.pyenv/versions/3.6.2rc2/lib/python3.6/site-packages/celery/worker/pidbox.py", line 42, in on_message
self.node.handle_message(body, message)
File "/home/gustavorps/.pyenv/versions/3.6.2rc2/lib/python3.6/site-packages/kombu/pidbox.py", line 129, in handle_message
return self.dispatch(**body)
File "/home/gustavorps/.pyenv/versions/3.6.2rc2/lib/python3.6/site-packages/kombu/pidbox.py", line 112, in dispatch
ticket=ticket)
File "/home/gustavorps/.pyenv/versions/3.6.2rc2/lib/python3.6/site-packages/kombu/pidbox.py", line 135, in reply
serializer=self.mailbox.serializer)
File "/home/gustavorps/.pyenv/versions/3.6.2rc2/lib/python3.6/site-packages/kombu/pidbox.py", line 265, in _publish_reply
**opts
File "/home/gustavorps/.pyenv/versions/3.6.2rc2/lib/python3.6/site-packages/kombu/messaging.py", line 181, in publish
exchange_name, declare,
File "/home/gustavorps/.pyenv/versions/3.6.2rc2/lib/python3.6/site-packages/kombu/messaging.py", line 203, in _publish
mandatory=mandatory, immediate=immediate,
File "/home/gustavorps/.pyenv/versions/3.6.2rc2/lib/python3.6/site-packages/amqp/channel.py", line 1734, in _basic_publish
(0, exchange, routing_key, mandatory, immediate), msg
File "/home/gustavorps/.pyenv/versions/3.6.2rc2/lib/python3.6/site-packages/amqp/abstract_channel.py", line 50, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/home/gustavorps/.pyenv/versions/3.6.2rc2/lib/python3.6/site-packages/amqp/method_framing.py", line 166, in write_frame
write(view[:offset])
File "/home/gustavorps/.pyenv/versions/3.6.2rc2/lib/python3.6/site-packages/amqp/transport.py", line 258, in write
self._write(s)
BrokenPipeError: [Errno 32] Broken pipe
+1 celery 4.0.0
see the last explanation of @mihajenko https://github.com/celery/celery/issues/3377
+1 celery (4.0.2)
Traceback (most recent call last):
File "/Users/celery/.virtualenvs/opb/lib/python2.7/site-packages/kombu/message.py", line 130, in ack_log_error
self.ack(multiple=multiple)
File "/Users/celery/.virtualenvs/opb/lib/python2.7/site-packages/kombu/message.py", line 125, in ack
self.channel.basic_ack(self.delivery_tag, multiple=multiple)
File "/Users/celery/.virtualenvs/opb/lib/python2.7/site-packages/amqp/channel.py", line 1408, in basic_ack
spec.Basic.Ack, argsig, (delivery_tag, multiple),
File "/Users/celery/.virtualenvs/opb/lib/python2.7/site-packages/amqp/abstract_channel.py", line 64, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/Users/celery/.virtualenvs/opb/lib/python2.7/site-packages/amqp/method_framing.py", line 174, in write_frame
write(view[:offset])
File "/Users/celery/.virtualenvs/opb/lib/python2.7/site-packages/amqp/transport.py", line 269, in write
self._write(s)
File "/Users/celery/.virtualenvs/opb/lib/python2.7/site-packages/eventlet/greenio/base.py", line 397, in sendall
tail = self.send(data, flags)
File "/Users/celery/.virtualenvs/opb/lib/python2.7/site-packages/eventlet/greenio/base.py", line 391, in send
return self._send_loop(self.fd.send, data, flags)
File "/Users/celery/.virtualenvs/opb/lib/python2.7/site-packages/eventlet/greenio/base.py", line 378, in _send_loop
return send_method(data, *args)
error: [Errno 32] Broken pipe
@auvipy please re-open. This is a bug... a hack doesn't qualify as a fix.
@auvipy Could you give a quick fix? I'm not familiar with celery. Thank you!
+1 4.4
+1@_@
+1
Rather than posting +1, please take your time to investigate the related issues so that you could find a possible solution. It would be great if anyone could report whether the issue still persists in 4.2rc2.
@auvipy We have been investigating the issue; we have a rather large Celery + RabbitMQ infra. We switched today from 3.1.25 to a 4.2 patch and started seeing about 4% of tasks being dropped. This error coincided with the task drops/losses. Have you ever faced/heard of a similar issue in the transition?
Could you first try https://github.com/celery/celery/releases/tag/v3.1.26 locally, then upgrade to 4.2rc2? We got a report from Zapier when they were moving from 3.1 to 4.1. Another thing: although celery 4.2rc2 is on PyPI, its dependencies are not yet on PyPI. If you could take the time to run 4.2rc2 with the dependencies updated from the master branch and let us know what happens, it would be really great. After that we could mark this as a high priority/blocker for the 4.2 final release.
I also ask that you raise the issue on the mailing list.
Sorry @auvipy, just to clarify: you'd like us to try the code @ 4.2rc2 from pypi, but with the dependencies updated to match the dependency list in the master branch?
Not everything, but if anyone can: celery 4.2rc2 from PyPI, and the other dependencies of celery from their GitHub master branches, in local/staging servers.
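For anyone else trying this, a hedged sketch of what that install could look like (assuming the rc is published on PyPI as 4.2.0rc2; pin exact git refs as appropriate):

pip install celery==4.2.0rc2
pip install git+https://github.com/celery/kombu.git
pip install git+https://github.com/celery/billiard.git
pip install git+https://github.com/celery/py-amqp.git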
I work with @burakbostancioglu, so yes, we can try this =]
thank you :)
any update guys? using 4.2rc4?
I'm seeing this occur fairly regularly on Celery 4.2.0 when under load (testing 3.1.25 -> 4.2.0 migration, but this is blocking it). I'm not monkey patching greenlets. RMQ as broker, redis as results backend.
We're dug in far too hard for a migration if we can avoid it, but I appreciate the tip :-)
Please let me know if there's anything I can dig into regarding this error.
I applied the fix mentioned in https://github.com/celery/celery/issues/4226 and it solved the problem for me.
So is this issue resolved, or on the way to being resolved? All my workflows depend on Celery and I really enjoy this tool.
If the bug is on the way to being resolved, is there a quick fix until the real resolution?
@xirdneh please investigate the related issues whenever you can.
I set a static --concurrency instead of --autoscale and this solved the issue for me.
(I had the bug with concurrency 1 and no autoscale)
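For anyone trying the same workaround, the two invocations differ roughly like this (app name and pool sizes are hypothetical):

# autoscaling pool: grows up to 10 and shrinks down to 3 processes
celery -A proj worker --autoscale=10,3

# fixed pool size: the workaround described above
celery -A proj worker --concurrency=4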
Still having the same issue with celery 4.2.1. Hope one more report will be helpful.
software -> celery:4.2.1 (windowlicker) kombu:4.2.1 py:2.7.15rc1
billiard:3.5.0.4 redis:2.10.6
platform -> system:Linux arch:64bit imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:redis results:redis://127.0.0.1:6379/0
broker_url: u'redis://127.0.0.1:6379/0'
result_backend: u'redis://127.0.0.1:6379/0'
task_routes: {
u'celery.crawler.fbk.download_alt_text': { u'queue': u'fbk_alt_text'}}
Anyone up for testing with celery master?
This still exists.
Couldn't ack 23, reason:BrokenPipeError(32, 'Broken pipe')
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/kombu/message.py", line 130, in ack_log_error
self.ack(multiple=multiple)
File "/usr/local/lib/python3.7/site-packages/kombu/message.py", line 125, in ack
self.channel.basic_ack(self.delivery_tag, multiple=multiple)
File "/usr/local/lib/python3.7/site-packages/amqp/channel.py", line 1399, in basic_ack
spec.Basic.Ack, argsig, (delivery_tag, multiple),
File "/usr/local/lib/python3.7/site-packages/amqp/abstract_channel.py", line 51, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/usr/local/lib/python3.7/site-packages/amqp/method_framing.py", line 172, in write_frame
write(view[:offset])
File "/usr/local/lib/python3.7/site-packages/amqp/transport.py", line 288, in write
self._write(s)
File "/usr/local/lib/python3.7/site-packages/gevent/_socket3.py", line 458, in sendall
return _socketcommon._sendall(self, data_memory, flags)
File "/usr/local/lib/python3.7/site-packages/gevent/_socketcommon.py", line 374, in _sendall
timeleft = __send_chunk(socket, chunk, flags, timeleft, end)
File "/usr/local/lib/python3.7/site-packages/gevent/_socketcommon.py", line 303, in __send_chunk
data_sent += socket.send(chunk, flags)
File "/usr/local/lib/python3.7/site-packages/gevent/_socket3.py", line 439, in send
return _socket.socket.send(self._sock, data, flags)
BrokenPipeError: [Errno 32] Broken pipe
It seems to be a gevent issue?
File "/usr/local/lib/python3.7/site-packages/gevent/_socket3.py", line 439, in send
return _socket.socket.send(self._sock, data, flags)
BrokenPipeError: [Errno 32] Broken pipe
celery 4.4.0rc1?
Tried the master branch with the patch proposed in https://github.com/celery/celery/issues/3377; it still gets stuck after an hour of operation.
+1 celery 4.2.1
@auvipy I've seen this issue with Celery 4.3.0 + gevent + RabbitMQ. Using strace I found that the issue relates to the recvfrom and epoll_wait syscalls. It seems the socket cannot receive messages from RabbitMQ and falls into an infinite loop; the Celery process was stuck there.
26738 is the Celery process ID:
sudo strace -p 26738 -f
...
[pid 26738] recvfrom(5, 0x7fbead4995c4, 7, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 26738] clock_gettime(CLOCK_MONOTONIC, {3399616, 441000388}) = 0
[pid 26738] clock_gettime(CLOCK_MONOTONIC, {3399616, 441208787}) = 0
[pid 26738] epoll_wait(15, [], 64, 502) = 0
[pid 26738] clock_gettime(CLOCK_MONOTONIC, {3399616, 944133989}) = 0
[pid 26738] recvfrom(21, 0x7fbead4995c4, 7, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 26738] clock_gettime(CLOCK_MONOTONIC, {3399616, 944706657}) = 0
[pid 26738] clock_gettime(CLOCK_MONOTONIC, {3399616, 944870229}) = 0
[pid 26738] epoll_wait(15, [], 64, 999) = 0
[pid 26738] clock_gettime(CLOCK_MONOTONIC, {3399617, 944471272}) = 0
[pid 26738] clock_gettime(CLOCK_MONOTONIC, {3399617, 944612143}) = 0
[pid 26738] epoll_wait(15, [], 64, 1) = 0
[pid 26738] clock_gettime(CLOCK_MONOTONIC, {3399617, 945931383}) = 0
[pid 26738] recvfrom(21, 0x7fbead4995c4, 7, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
More information about file descriptors 5, 15 and 21:
$ sudo ls -la /proc/26738/fd/5
lrwx------ 1 xxx xxx 64 Aug 2 06:05 /proc/26738/fd/5 -> socket:[99157475]
$ sudo ls -la /proc/26738/fd/21
lrwx------ 1 xxx xxx 64 Aug 2 06:05 /proc/26738/fd/21 -> socket:[99144296]
$ sudo ls -la /proc/26738/fd/15
lrwx------ 1 xxx xxx 64 Aug 2 06:05 /proc/26738/fd/15 -> anon_inode:[eventpoll]
$ sudo lsof -p 26738 | grep 99157475
celery 26738 xxx 5u IPv4 99157475 0t0 TCP xxx-1084:50954->rabbit.xxx-1084:amqp (ESTABLISHED)
$ sudo lsof -p 26738 | grep 99144296
celery 26738 xxx 21u IPv4 99144296 0t0 TCP xxx-1084:38194->rabbit.xxx-1084:amqp (ESTABLISHED)
$ sudo head -n1 /proc/26738/net/tcp; grep -a 99157475 /proc/26738/net/tcp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
10: 8A01010A:C70A 5E00010A:1628 01 00000000:00000000 02:00000351 00000000 1005 0 99157475 2 0000000000000000 20 4 30 10 -1
Hope it helps.
+1
+1
@duydo I'm hitting the same situation as you, and RabbitMQ couldn't ack anything.
Traceback (most recent call last):
File "/home/ymserver/.pyenv/versions/adspider/lib/python3.6/site-packages/kombu/message.py", line 130, in ack_log_error
self.ack(multiple=multiple)
File "/home/ymserver/.pyenv/versions/adspider/lib/python3.6/site-packages/kombu/message.py", line 125, in ack
self.channel.basic_ack(self.delivery_tag, multiple=multiple)
File "/home/ymserver/.pyenv/versions/adspider/lib/python3.6/site-packages/amqp/channel.py", line 1399, in basic_ack
spec.Basic.Ack, argsig, (delivery_tag, multiple),
File "/home/ymserver/.pyenv/versions/adspider/lib/python3.6/site-packages/amqp/abstract_channel.py", line 51, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/home/ymserver/.pyenv/versions/adspider/lib/python3.6/site-packages/amqp/method_framing.py", line 172, in write_frame
write(view[:offset])
File "/home/ymserver/.pyenv/versions/adspider/lib/python3.6/site-packages/amqp/transport.py", line 288, in write
self._write(s)
File "/home/ymserver/.pyenv/versions/adspider/lib/python3.6/site-packages/gevent/_socket3.py", line 458, in sendall
return _socketcommon._sendall(self, data_memory, flags)
File "/home/ymserver/.pyenv/versions/adspider/lib/python3.6/site-packages/gevent/_socketcommon.py", line 374, in _sendall
timeleft = __send_chunk(socket, chunk, flags, timeleft, end)
File "/home/ymserver/.pyenv/versions/adspider/lib/python3.6/site-packages/gevent/_socketcommon.py", line 303, in __send_chunk
data_sent += socket.send(chunk, flags)
File "/home/ymserver/.pyenv/versions/adspider/lib/python3.6/site-packages/gevent/_socket3.py", line 439, in send
return _socket.socket.send(self._sock, data, flags)
BrokenPipeError: [Errno 32] Broken pipe
I used strace to trace the PID:
$ sudo strace -p 27977 -f
[pid 27977] getpid() = 27977
[pid 27977] clock_gettime(CLOCK_MONOTONIC, {tv_sec=10372759, tv_nsec=108490973}) = 0
[pid 27977] clock_gettime(CLOCK_MONOTONIC, {tv_sec=10372759, tv_nsec=108664276}) = 0
[pid 27977] epoll_wait(4, [], 64, 1999) = 0
[pid 27977] clock_gettime(CLOCK_MONOTONIC, {tv_sec=10372761, tv_nsec=108549437}) = 0
[pid 27977] recvfrom(10, 0x7fc8c406a758, 7, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 27977] clock_gettime(CLOCK_MONOTONIC, {tv_sec=10372761, tv_nsec=109251966}) = 0
[pid 27977] getpid() = 27977
[pid 27977] clock_gettime(CLOCK_MONOTONIC, {tv_sec=10372761, tv_nsec=109598283}) = 0
[pid 27977] clock_gettime(CLOCK_MONOTONIC, {tv_sec=10372761, tv_nsec=109777281}) = 0
[pid 27977] epoll_wait(4, [], 64, 1529) = 0
[pid 27977] clock_gettime(CLOCK_MONOTONIC, {tv_sec=10372762, tv_nsec=640754845}) = 0
[pid 27977] getpid() = 27977
[pid 27977] clock_gettime(CLOCK_MONOTONIC, {tv_sec=10372762, tv_nsec=641809883}) = 0
[pid 27977] clock_gettime(CLOCK_MONOTONIC, {tv_sec=10372762, tv_nsec=641940654}) = 0
[pid 27977] epoll_wait(4, [], 64, 467) = 0
[pid 27977] clock_gettime(CLOCK_MONOTONIC, {tv_sec=10372763, tv_nsec=109901002}) = 0
[pid 27977] recvfrom(10, 0x7fc8c406a758, 7, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 27977] clock_gettime(CLOCK_MONOTONIC, {tv_sec=10372763, tv_nsec=110880127}) = 0
[pid 27977] getpid() = 27977
$ sudo lsof -p 27977|grep 1651852686
python 27977 ymserver 10u IPv4 1651852686 0t0 TCP vuljp-ag-proxy-01:33006->10.55.4.70:amqp (ESTABLISHED)
+1 celery v4.4.2
+1 celery 4.3.0
+1 celery 4.3.0
Rather than +1 spam, please check the related PR & improve it with unit tests.
+1 for celery 4.4.6, in both Python 2 and Python 3 environments.
software -> celery:4.4.6 (cliffs) kombu:4.6.11 py:2.7.10
billiard:3.6.3.0 py-amqp:2.6.0
platform -> system:Darwin arch:64bit
kernel version:17.7.0 imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:pyamqp results:redis://10.28.218.6/
Log is here:
worker: Warm shutdown (MainProcess)
[2020-07-17 22:27:51,844: DEBUG/MainProcess] | Worker: Closing Hub...
[2020-07-17 22:27:51,844: DEBUG/MainProcess] | Worker: Closing Pool...
[2020-07-17 22:27:51,845: DEBUG/MainProcess] | Worker: Closing Consumer...
[2020-07-17 22:27:51,845: DEBUG/MainProcess] | Worker: Stopping Consumer...
[2020-07-17 22:27:51,845: DEBUG/MainProcess] | Consumer: Closing Connection...
[2020-07-17 22:27:51,845: DEBUG/MainProcess] | Consumer: Closing Events...
[2020-07-17 22:27:51,845: DEBUG/MainProcess] | Consumer: Closing Mingle...
[2020-07-17 22:27:51,845: DEBUG/MainProcess] | Consumer: Closing Gossip...
[2020-07-17 22:27:51,845: DEBUG/MainProcess] | Consumer: Closing Heart...
[2020-07-17 22:27:51,845: DEBUG/MainProcess] | Consumer: Closing Tasks...
[2020-07-17 22:27:51,845: DEBUG/MainProcess] | Consumer: Closing Control...
[2020-07-17 22:27:51,845: DEBUG/MainProcess] | Consumer: Closing event loop...
[2020-07-17 22:27:51,845: DEBUG/MainProcess] | Consumer: Stopping event loop...
[2020-07-17 22:27:51,845: DEBUG/MainProcess] | Consumer: Stopping Control...
[2020-07-17 22:27:52,153: DEBUG/MainProcess] Closed channel #3
[2020-07-17 22:27:52,153: DEBUG/MainProcess] | Consumer: Stopping Tasks...
[2020-07-17 22:27:52,153: DEBUG/MainProcess] Canceling task consumer...
[2020-07-17 22:27:53,279: DEBUG/MainProcess] | Consumer: Stopping Heart...
[2020-07-17 22:27:53,280: DEBUG/MainProcess] | Consumer: Stopping Gossip...
[2020-07-17 22:27:53,842: DEBUG/MainProcess] Closed channel #2
[2020-07-17 22:27:53,842: DEBUG/MainProcess] | Consumer: Stopping Mingle...
[2020-07-17 22:27:53,842: DEBUG/MainProcess] | Consumer: Stopping Events...
[2020-07-17 22:27:53,842: DEBUG/MainProcess] | Consumer: Stopping Connection...
[2020-07-17 22:27:53,843: DEBUG/MainProcess] | Worker: Stopping Pool...
^@^@[2020-07-17 22:31:07,678: INFO/ForkPoolWorker-1] Task tasks.handle_compressed[5e493207-a3c5-422e-ab82-1f382342273c] succeeded in 200.757257249s: {'name': 'ray'}
[2020-07-17 22:31:09,188: DEBUG/MainProcess] | Worker: Stopping Hub...
[2020-07-17 22:31:09,189: CRITICAL/MainProcess] Couldn't ack 1, reason:error(32, 'Broken pipe')
Traceback (most recent call last):
File "/Users/ranc/p2dev/lib/python2.7/site-packages/kombu/message.py", line 131, in ack_log_error
self.ack(multiple=multiple)
File "/Users/ranc/p2dev/lib/python2.7/site-packages/kombu/message.py", line 126, in ack
self.channel.basic_ack(self.delivery_tag, multiple=multiple)
File "/Users/ranc/p2dev/lib/python2.7/site-packages/amqp/channel.py", line 1394, in basic_ack
spec.Basic.Ack, argsig, (delivery_tag, multiple),
File "/Users/ranc/p2dev/lib/python2.7/site-packages/amqp/abstract_channel.py", line 59, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/Users/ranc/p2dev/lib/python2.7/site-packages/amqp/method_framing.py", line 172, in write_frame
write(view[:offset])
File "/Users/ranc/p2dev/lib/python2.7/site-packages/amqp/transport.py", line 305, in write
self._write(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe
[2020-07-17 22:31:09,195: DEBUG/MainProcess] | Consumer: Shutdown Control...
[2020-07-17 22:31:09,196: DEBUG/MainProcess] | Consumer: Shutdown Tasks...
[2020-07-17 22:31:09,196: DEBUG/MainProcess] Canceling task consumer...
[2020-07-17 22:31:09,196: DEBUG/MainProcess] Closing consumer channel...
[2020-07-17 22:31:09,196: DEBUG/MainProcess] | Consumer: Shutdown Heart...
[2020-07-17 22:31:09,196: DEBUG/MainProcess] | Consumer: Shutdown Gossip...
[2020-07-17 22:31:09,196: DEBUG/MainProcess] | Consumer: Shutdown Events...
[2020-07-17 22:31:09,431: DEBUG/MainProcess] Closed channel #1
[2020-07-17 22:31:09,663: DEBUG/MainProcess] | Consumer: Shutdown Connection...
[2020-07-17 22:31:09,665: DEBUG/MainProcess] removing tasks from inqueue until task handler finished
FYI, disabling broker_heartbeat (broker_heartbeat = 0) fixed my issue. Not sure it's good practice.
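A minimal sketch of that workaround in the Celery config (whether disabling heartbeats is advisable depends on your broker and any proxies in between):

# celeryconfig.py (sketch; broker URL is hypothetical)
broker_url = 'amqp://guest@localhost//'
# 0 disables AMQP heartbeats entirely; the worker then relies on
# TCP keepalives to notice dead connections.
broker_heartbeat = 0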
Hope this triage info helps root cause analysis.
> ps -ef | grep worker
502 92665 75622 0 10:27PM ttys008 0:00.78 /Users/ranc/p2dev/bin/python /Users/ranc/p2dev/bin/celery -A tasks worker -l debug -c 1
502 92672 92665 0 10:27PM ttys008 0:00.01 /Users/ranc/p2dev/bin/python /Users/ranc/p2dev/bin/celery -A tasks worker -l debug -c 1
> sudo dtruss -p 75622 -f
......
92665/0xf94bb7: fcntl(0x5, 0x4, 0x4) = 0 0
92665/0xf94bb7: wait4(0x16A00, 0x7FFEEB4B0C10, 0x1) = 0 0
92665/0xf94bb7: wait4(0x16A00, 0x7FFEEB4B13E0, 0x0) = 92672 0
92665/0xf94bb7: close(0xB) = 0 0
dtrace: error on enabled probe ID 2173 (ID 947: syscall::write_nocancel:return): invalid kernel access in action #13 at DIF offset 68
92665/0xf94bb7: sendto(0xD, 0x1068AF000, 0x15) = -1 Err#32
92665/0xf94bb7: stat64("/Users/ranc/p2dev/lib/python2.7/site-packages/kombu/message.py\0", 0x7FFEEB4AF280, 0x0) = 0 0
92665/0xf94bb7: open_nocancel("/Users/ranc/p2dev/lib/python2.7/site-packages/kombu/message.py\0", 0x0, 0x1B6) = 11 0
92665/0xf94bb7: fstat64(0xB, 0x7FFEEB4AF0F8, 0x0) = 0 0
92665/0xf94bb7: fstat64(0xB, 0x7FFEEB4AD098, 0x0) = 0 0
dtrace: error on enabled probe ID 2175 (ID 945: syscall::read_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2175 (ID 945: syscall::read_nocancel:return): invalid kernel access in action #13 at DIF offset 68
92665/0xf94bb7: close_nocancel(0xB) = 0 0
92665/0xf94bb7: stat64("/Users/ranc/p2dev/lib/python2.7/site-packages/kombu/message.py\0", 0x7FFEEB4AF680, 0x0) = 0 0
92665/0xf94bb7: stat64("/Users/ranc/p2dev/lib/python2.7/site-packages/amqp/channel.py\0", 0x7FFEEB4AF280, 0x0) = 0 0
92665/0xf94bb7: open_nocancel("/Users/ranc/p2dev/lib/python2.7/site-packages/amqp/channel.py\0", 0x0, 0x1B6) = 11 0
92665/0xf94bb7: fstat64(0xB, 0x7FFEEB4AF0F8, 0x0) = 0 0
92665/0xf94bb7: fstat64(0xB, 0x7FFEEB4AD098, 0x0) = 0 0
dtrace: error on enabled probe ID 2175 (ID 945: syscall::read_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2175 (ID 945: syscall::read_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2175 (ID 945: syscall::read_nocancel:return): invalid kernel access in action #13 at DIF offset 68
92665/0xf94bb7: close_nocancel(0xB) = 0 0
92665/0xf94bb7: stat64("/Users/ranc/p2dev/lib/python2.7/site-packages/amqp/abstract_channel.py\0", 0x7FFEEB4AF280, 0x0) = 0 0
92665/0xf94bb7: open_nocancel("/Users/ranc/p2dev/lib/python2.7/site-packages/amqp/abstract_channel.py\0", 0x0, 0x1B6) = 11 0
92665/0xf94bb7: fstat64(0xB, 0x7FFEEB4AF0F8, 0x0) = 0 0
92665/0xf94bb7: fstat64(0xB, 0x7FFEEB4AD098, 0x0) = 0 0
dtrace: error on enabled probe ID 2175 (ID 945: syscall::read_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2175 (ID 945: syscall::read_nocancel:return): invalid kernel access in action #13 at DIF offset 68
92665/0xf94bb7: close_nocancel(0xB) = 0 0
92665/0xf94bb7: stat64("/Users/ranc/p2dev/lib/python2.7/site-packages/amqp/method_framing.py\0", 0x7FFEEB4AF280, 0x0) = 0 0
92665/0xf94bb7: open_nocancel("/Users/ranc/p2dev/lib/python2.7/site-packages/amqp/method_framing.py\0", 0x0, 0x1B6) = 11 0
92665/0xf94bb7: fstat64(0xB, 0x7FFEEB4AF0F8, 0x0) = 0 0
92665/0xf94bb7: fstat64(0xB, 0x7FFEEB4AD098, 0x0) = 0 0
dtrace: error on enabled probe ID 2175 (ID 945: syscall::read_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2175 (ID 945: syscall::read_nocancel:return): invalid kernel access in action #13 at DIF offset 68
92665/0xf94bb7: close_nocancel(0xB) = 0 0
92665/0xf94bb7: stat64("/Users/ranc/p2dev/lib/python2.7/site-packages/amqp/transport.py\0", 0x7FFEEB4AF280, 0x0) = 0 0
92665/0xf94bb7: open_nocancel("/Users/ranc/p2dev/lib/python2.7/site-packages/amqp/transport.py\0", 0x0, 0x1B6) = 11 0
92665/0xf94bb7: fstat64(0xB, 0x7FFEEB4AF0F8, 0x0) = 0 0
92665/0xf94bb7: fstat64(0xB, 0x7FFEEB4AD098, 0x0) = 0 0
dtrace: error on enabled probe ID 2175 (ID 945: syscall::read_nocancel:return): invalid kernel access in action #13 at DIF offset 68
92665/0xf94bb7: mmap(0x0, 0x40000, 0x3, 0x1002, 0xFFFFFFFFFFFFFFFF, 0x0) = 0x1069D1000 0
dtrace: error on enabled probe ID 2175 (ID 945: syscall::read_nocancel:return): invalid kernel access in action #13 at DIF offset 68
92665/0xf94bb7: close_nocancel(0xB) = 0 0
92665/0xf94bb7: stat64("/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py\0", 0x7FFEEB4AF280, 0x0) = 0 0
92665/0xf94bb7: open_nocancel("/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py\0", 0x0, 0x1B6) = 11 0
92665/0xf94bb7: fstat64(0xB, 0x7FFEEB4AF0F8, 0x0) = 0 0
92665/0xf94bb7: fstat64(0xB, 0x7FFEEB4AD098, 0x0) = 0 0
dtrace: error on enabled probe ID 2175 (ID 945: syscall::read_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2175 (ID 945: syscall::read_nocancel:return): invalid kernel access in action #13 at DIF offset 68
92665/0xf94bb7: close_nocancel(0xB) = 0 0
dtrace: error on enabled probe ID 2173 (ID 947: syscall::write_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2173 (ID 947: syscall::write_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2173 (ID 947: syscall::write_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2173 (ID 947: syscall::write_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2173 (ID 947: syscall::write_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2173 (ID 947: syscall::write_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2173 (ID 947: syscall::write_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2173 (ID 947: syscall::write_nocancel:return): invalid kernel access in action #13 at DIF offset 68
dtrace: error on enabled probe ID 2173 (ID 947: syscall::write_nocancel:return): invalid kernel access in action #13 at DIF offset 68
92665/0xf94bb7: sendto(0xE, 0x1068F0000, 0x13) = 19 0
92665/0xf94bb7: recvfrom(0xE, 0x1069970E4, 0x7) = 7 0
92665/0xf94bb7: recvfrom(0xE, 0x1069970B4, 0x4) = 4 0
92665/0xf94bb7: recvfrom(0xE, 0x10676348C, 0x1) = 1 0
dtrace: error on enabled probe ID 2173 (ID 947: syscall::write_nocancel:return): invalid kernel access in action #13 at DIF offset 68
92665/0xf94bb7: madvise(0x1068CF000, 0x21000, 0x9) = 0 0
92665/0xf94bb7: sendto(0xE, 0x1068F0000, 0x13) = 19 0
92665/0xf94bb7: recvfrom(0xE, 0x1069970E4, 0x7) = 7 0
92665/0xf94bb7: recvfrom(0xE, 0x1069970B4, 0x4) = 4 0
92665/0xf94bb7: recvfrom(0xE, 0x10676348C, 0x1) = 1 0
92665/0xf94bb7: shutdown(0xE, 0x2, 0x0) = 0 0
dtrace: error on enabled probe ID 2173 (ID 947: syscall::write_nocancel:return): invalid kernel access in action #13 at DIF offset 68
92665/0xf94bb7: sendto(0xD, 0x1068AF000, 0x13) = -1 Err#32
92665/0xf94bb7: sendto(0xD, 0x1068AF000, 0x13) = -1 Err#32
92665/0xf94bb7: shutdown(0xD, 0x2, 0x0) = -1 Err#57
dtrace: error on enabled probe ID 2173 (ID 947: syscall::write_nocancel:return): invalid kernel access in action #13 at DIF offset 68
92665/0xf94bb7: select(0x4, 0x7FFEEB4A6E58, 0x7FFEEB4A6ED8, 0x7FFEEB4A6F58, 0x7FFEEB4A6FD8) = 0 0
92665/0xf94bb7: fcntl(0x4, 0x3, 0x0) = 1 0
92665/0xf94bb7: fcntl(0x4, 0x4, 0x1) = 0 0
dtrace: error on enabled probe ID 2172 (ID 161: syscall::write:return): invalid kernel access in action #13 at DIF offset 68
92665/0xf94bb7: lstat64("/var/folders/d9/xww6x1gj50j9769f0x47nw_h0000gp/T/pymp-kJBMDs\0", 0x7FFEEB4B34F0, 0x0) = 0 0
92665/0xf94bb7: open_nocancel("/var/folders/d9/xww6x1gj50j9769f0x47nw_h0000gp/T/pymp-kJBMDs\0", 0x1100004, 0x1213200) = 11 0
92665/0xf94bb7: fstatfs64(0xB, 0x7FFEEB4B2DB8, 0x0) = 0 0
92665/0xf94bb7: getdirentries64(0xB, 0x7FBE640A0A00, 0x1000) = 64 0
92665/0xf94bb7: getdirentries64(0xB, 0x7FBE640A0A00, 0x1000) = 0 0
92665/0xf94bb7: close_nocancel(0xB) = 0 0
92665/0xf94bb7: rmdir(0x7FBE61640B10, 0x0, 0x0) = 0 0
92665/0xf94bb7: sigaction(0x2, 0x7FFEEB4B4548, 0x7FFEEB4B4588) = 0 0
92665/0xf94bb7: sigaction(0x3, 0x7FFEEB4B4548, 0x7FFEEB4B4588) = 0 0
92665/0xf94bb7: sigaction(0xF, 0x7FFEEB4B4548, 0x7FFEEB4B4588) = 0 0
92665/0xf94bb7: sigaction(0x1E, 0x7FFEEB4B4548, 0x7FFEEB4B4588) = 0 0
92665/0xf94bb7: madvise(0x105752000, 0x3C000, 0x9) = -1 Err#22
92665/0xf94bb7: close(0xD) = 0 0
92665/0xf94bb7: madvise(0x1068AF000, 0x20000, 0x9) = 0 0
92665/0xf94bb7: madvise(0x106A11000, 0x23000, 0x9) = -1 Err#22
92665/0xf94bb7: close(0xE) = 0 0
92665/0xf94bb7: madvise(0x1068F0000, 0x20000, 0x9) = 0 0
92665/0xf94bb7: madvise(0x10486D000, 0x30000, 0x9) = 0 0
92665/0xf94bb7: madvise(0x7FBE62C25000, 0x5000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62C81000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62C83000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62C85000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62C87000, 0x2000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62C8A000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62C8F000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62C92000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62C99000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62C9E000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62CA1000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62CA3000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62CA6000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62CAB000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62CAE000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62CB2000, 0x2000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62CB6000, 0x4000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62CBB000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62CBE000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62CEA000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: madvise(0x7FBE62CFB000, 0x1000, 0x7) = 0 0
92665/0xf94bb7: close(0x6) = 0 0
dtrace: error on enabled probe ID 2366 (ID 897: syscall::thread_selfid:return): invalid user access in action #5 at DIF offset 0
^@^@^@^@^@^C
ranc-a01:~ ranc$ lsof -p 92665
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 92665 ranc cwd DIR 1,4 2176 12342436 /Users/ranc/vsan_vsancertification/tools/graphrunner
python 92665 ranc txt REG 1,4 51744 12719059 /Users/ranc/p2dev/bin/python
python 92665 ranc txt REG 1,4 52768 378470 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_locale.so
python 92665 ranc txt REG 1,4 84240 378488 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_struct.so
python 92665 ranc txt REG 1,4 83232 378468 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_json.so
python 92665 ranc txt REG 1,4 60192 378500 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/binascii.so
python 92665 ranc txt REG 1,4 71392 378444 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_collections.so
python 92665 ranc txt REG 1,4 85136 378540 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/operator.so
python 92665 ranc txt REG 1,4 108192 378532 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/itertools.so
python 92665 ranc txt REG 1,4 59984 378462 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_heapq.so
python 92665 ranc txt REG 1,4 48256 378458 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_functools.so
python 92665 ranc txt REG 1,4 253840 378466 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_io.so
python 92665 ranc txt REG 1,4 73456 378552 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/strop.so
python 92665 ranc txt REG 1,4 200880 378448 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_ctypes.so
python 92665 ranc txt REG 1,4 65952 378558 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/time.so
python 92665 ranc txt REG 1,4 88816 378494 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/array.so
python 92665 ranc txt REG 1,4 78896 378534 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/math.so
python 92665 ranc txt REG 1,4 58240 378460 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_hashlib.so
python 92665 ranc txt REG 1,4 52656 378478 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_random.so
python 92665 ranc txt REG 1,4 57040 378508 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/cStringIO.so
python 92665 ranc txt REG 1,4 74592 378446 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_csv.so
python 92665 ranc txt REG 1,4 43536 378526 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/grp.so
python 92665 ranc txt REG 1,4 65184 378562 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/zlib.so
python 92665 ranc txt REG 1,4 144736 378482 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_socket.so
python 92665 ranc txt REG 1,4 172352 378486 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_ssl.so
python 92665 ranc txt REG 1,4 44752 378480 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_scproxy.so
python 92665 ranc txt REG 1,4 43708 12722902 /Users/ranc/p2dev/lib/python2.7/site-packages/_scandir.so
python 92665 ranc txt REG 1,4 152272 378514 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/datetime.so
python 92665 ranc txt REG 1,4 65920 378550 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/select.so
python 92665 ranc txt REG 1,4 52048 378520 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/fcntl.so
python 92665 ranc txt REG 1,4 67920 378476 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_multiprocessing.so
python 92665 ranc txt REG 1,4 146112 378506 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/cPickle.so
python 92665 ranc txt REG 1,4 52048 378548 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/resource.so
python 92665 ranc txt REG 1,4 84000 378504 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/bz2.so
python 92665 ranc txt REG 1,4 43104 378430 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_bisect.so
python 92665 ranc txt REG 1,4 104256 378544 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/pyexpat.so
python 92665 ranc txt REG 1,4 25512 15160994 /Users/ranc/p2dev/lib/python2.7/site-packages/tornado/speedups.so
python 92665 ranc txt REG 1,4 138464 378452 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_curses.so
python 92665 ranc txt REG 1,4 1409984 378560 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/unicodedata.so
python 92665 ranc txt REG 1,4 27108 12722492 /Users/ranc/p2dev/lib/python2.7/site-packages/markupsafe/_speedups.so
python 92665 ranc txt REG 1,4 43588 14636845 /Users/ranc/p2dev/lib/python2.7/site-packages/_billiard.so
python 92665 ranc txt REG 1,4 65776 378536 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/mmap.so
python 92665 ranc txt REG 1,4 4096 20515699 /private/var/folders/d9/xww6x1gj50j9769f0x47nw_h0000gp/T/pymp-kJBMDs/pym-92665-St6myq
python 92665 ranc txt REG 1,4 841456 2716638 /usr/lib/dyld
python 92665 ranc txt REG 1,4 1172373504 13558190 /private/var/db/dyld/dyld_shared_cache_x86_64h
python 92665 ranc 0u CHR 16,8 0t1188168 1405 /dev/ttys008
python 92665 ranc 1u CHR 16,8 0t1188168 1405 /dev/ttys008
python 92665 ranc 2u CHR 16,8 0t1188168 1405 /dev/ttys008
python 92665 ranc 3 PIPE 0x75a36a27dcf37cad 16384 ->0x75a36a27dcf355ad
python 92665 ranc 4 PIPE 0x75a36a27dcf355ad 16384 ->0x75a36a27dcf37cad
python 92665 ranc 5 PIPE 0x75a36a27dcf3746d 16384 ->0x75a36a27dcf36dad
python 92665 ranc 6r CHR 14,1 0t15192 584 /dev/urandom
python 92665 ranc 7 PIPE 0x75a36a27dcf36dad 16384 ->0x75a36a27dcf3746d
python 92665 ranc 8u REG 1,4 4096 20515699 /private/var/folders/d9/xww6x1gj50j9769f0x47nw_h0000gp/T/pymp-kJBMDs/pym-92665-St6myq
python 92665 ranc 9u REG 1,4 4096 20515699 /private/var/folders/d9/xww6x1gj50j9769f0x47nw_h0000gp/T/pymp-kJBMDs/pym-92665-St6myq
python 92665 ranc 10r PSXSEM 0t0 /mp-r6SMtW
python 92665 ranc 11 PIPE 0x75a36a27dcf351ed 16384 ->0x75a36a27dcf36e6d
python 92665 ranc 12u systm 0x75a36a27dd0002cd 0t0 [ctl com.apple.netsrc id 8 unit 47]
python 92665 ranc 13u IPv4 0x75a36a280ae00f75 0t0 TCP 10.117.234.174:62477->prme-vsan-hol-vm-dhcp-218-6.eng.vmware.com:amqp (ESTABLISHED)
python 92665 ranc 14u IPv4 0x75a36a27fae89f75 0t0 TCP 10.117.234.174:62480->prme-vsan-hol-vm-dhcp-218-6.eng.vmware.com:amqp (ESTABLISHED)
python 92665 ranc 15u IPv4 0x75a36a27db0e88d5 0t0 TCP 10.117.234.174:62485->prme-vsan-hol-vm-dhcp-218-6.eng.vmware.com:amqp (ESTABLISHED)
python 92665 ranc 16 PIPE 0x75a36a27dcf351ed 16384 ->0x75a36a27dcf36e6d
Has anyone discovered a version without this bug with gevent?
I am seeing this (or possibly a similar) issue using eventlet.
Hi everyone!
+1 for celery 4.4.6.
I had this problem today. Thanks to the comments above and careful research on my servers, I found the issue: it's in RabbitMQ and Redis.
To solve it, we need to put HAProxy in front of RabbitMQ and Redis. Unfortunately, without it they break the connection and we get this CRITICAL error. There is also additional information in the documentation:
https://www.rabbitmq.com/networking.html#proxy-effects
CELERY_BROKER_URL - RabbitMQ
CELERY_RESULT_BACKEND - Redis
I also configured Celery retries and other parameters so that tasks are reliably executed, because this data is important to me.
It works with and without gevent. I hope this helps a lot!
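For illustration, a sketch of what the Celery side looks like once HAProxy fronts the broker and result backend (hostnames, ports and credentials are hypothetical and point at the HAProxy frontends rather than directly at RabbitMQ/Redis; setting names assume the Django-style CELERY_ namespace the comment above uses):

# settings.py (sketch)
CELERY_BROKER_URL = 'amqp://user:password@haproxy-host:5672//'
CELERY_RESULT_BACKEND = 'redis://haproxy-host:6379/0'
# acknowledge tasks only after they finish, so tasks in flight on a
# dropped connection are redelivered rather than lost
CELERY_TASK_ACKS_LATE = True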
celery is not the issue!
Yes, Celery is the issue, otherwise downgrading to Celery 3 wouldn't fix the problem.
@alanhamlett Hi!
It has been working with HAProxy for over 5 days and everything is fine.
I was getting this error all the time and my workers crashed within a few minutes.
I understand that this could be fixed in Celery, but the fix could take a very long time; it is difficult to make the logic work like HAProxy does, and it is not necessary. HAProxy solves this problem perfectly.
+1 Same symptoms as @duydo
Celery 4.4.7
Not sure how this still hasn't been fixed.
File "/usr/local/lib64/python3.6/site-packages/gevent/_socket3.py", line 534, in sendall
return _socketcommon._sendall(self, data_memory, flags)
File "/usr/local/lib64/python3.6/site-packages/gevent/_socketcommon.py", line 392, in _sendall
timeleft = __send_chunk(socket, chunk, flags, timeleft, end)
File "/usr/local/lib64/python3.6/site-packages/gevent/_socketcommon.py", line 321, in __send_chunk
data_sent += socket.send(chunk, flags)
File "/usr/local/lib64/python3.6/site-packages/gevent/_socket3.py", line 515, in send
return self._sock.send(data, flags)
BrokenPipeError: [Errno 32] Broken pipe
getpid() = 6
epoll_wait(3, [{EPOLLIN, {u32=125, u64=266287972477}}], 262, 767) = 1
recvfrom(125, "\1\0\1\0\0\0#", 7, 0, NULL, NULL) = 7
recvfrom(125, "\0<\0<\6None12\0\0\0\0\0\0\4\220\0\rcelery.pidb"..., 35, 0, NULL, NULL) = 35
recvfrom(125, "\316", 1, 0, NULL, NULL) = 1
recvfrom(125, "\2\0\1\0\0\0G", 7, 0, NULL, NULL) = 7
recvfrom(125, "\0<\0\0\0\0\0\0\0\0\0c\370\0\20application/json\5"..., 71, 0, NULL, NULL) = 71
recvfrom(125, "\316", 1, 0, NULL, NULL) = 1
recvfrom(125, "\3\0\1\0\0\0c", 7, 0, NULL, NULL) = 7
recvfrom(125, "{\"method\": \"enable_events\", \"arg"..., 99, 0, NULL, NULL) = 99
recvfrom(125, "\316", 1, 0, NULL, NULL) = 1
getpid() = 6
write(2, "[2020-10-18 01:21:47,952: DEBUG/"..., 112) = 112
recvfrom(125, 0x7f8fbffe1078, 7, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
getpid() = 6
epoll_wait(3, [], 262, 108) = 0
getpid() = 6
epoll_wait(3, [], 262, 805) = 0
recvfrom(6, 0x7f8fbffe1de8, 7, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
getpid() = 6
epoll_wait(3, [], 262, 85) = 0
recvfrom(125, 0x7f8fbffe1d98, 7, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
getpid() = 6
epoll_wait(3, ^Cstrace: Process 4844 detached
@yshaban Have you tried HAProxy?
Stuck my stuff behind haproxy today, I'll let you know.
As Rostislav suggested, creating a rabbitmq cluster and sticking it behind haproxy solved the issue for myself.
@yshaban nice!
This is only fixed when using haproxy, but the bug still exists when using Celery directly with RabbitMQ.
+1 celery v5.0.1
Celery 4.3: HAProxy solves the problem, you guys should try it.
Either make HAProxy a required part of using Celery, or re-open this issue.
An issue requires actionable items.
Since we don't know how to fix this and lack an environment to reproduce the issue, I'd say it should remain closed until someone with access to an environment where it reproduces can provide some insight into what should be fixed, or at least what is wrong.
I agree with others: there is obviously some root cause within Celery that is causing this to happen, and sticking it behind HAProxy, while solving the issue, doesn't tackle the underlying cause.
my suspicion is that, as the amqp transport is not async, the connection might not live long.
People facing this: can you check & try #6528?