Pyzmq: IPC queue delay

Created on 28 Apr 2017 · 8Comments · Source: zeromq/pyzmq

Hello,

I am designing a new app which makes use of multiprocessing. From multiple sources I heard that zmq IPC may be faster than Python's Pipe (is that correct?).

Inside the program, there are two queues using zmq IPC whose logic is pretty simple: read -> process -> queue. If you would need more background on this, please let me know, I would be happy to expand.

What I have noticed is that even with a relatively low pace (100 messages per second) going through both queues, it takes about 14 seconds. The complete debug log is below:

2017-04-28 13:45:13,904,904 [napalm_logs.listener][DEBUG   ] Received <149>Mar 30 12:45:19 re0.edge01.bjm01 rpd[40158839]: BGP_PREFIX_THRESH_EXCEEDED: 172.17.17.1 (External AS 123456): Configured maximum prefix-limit threshold(160) exceeded for inet-unicast nlri: 181 (instance master) from ('127.0.0.1', 63026)
2017-04-28 13:45:19,082,082 [napalm_logs.server][DEBUG   ] Dequeued message from 127.0.0.1: <149>Mar 30 12:45:19 re0.edge01.bjm01 rpd[40158839]: BGP_PREFIX_THRESH_EXCEEDED: 172.17.17.1 (External AS 123456): Configured maximum prefix-limit threshold(160) exceeded for inet-unicast nlri: 181 (instance master)
2017-04-28 13:45:19,082,082 [napalm_logs.server][DEBUG   ] Identified OS: junos
2017-04-28 13:45:19,082,082 [napalm_logs.server][DEBUG   ] Queueing message to junos
2017-04-28 13:45:27,549,549 [napalm_logs.device][DEBUG   ] Received {'processId': '40158839', 'pri': '149', 'processName': 'rpd', 'host': 're0.edge01.bjm01', 'tag': 'BGP_PREFIX_THRESH_EXCEEDED', 'time': '12:45:19', 'date': 'Mar 30', 'message': '172.17.17.1 (External AS 123456): Configured maximum prefix-limit threshold(160) exceeded for inet-unicast nlri: 181 (instance master)'} from 127.0.0.1
2017-04-28 13:45:27,549,549 [napalm_logs.device][DEBUG   ] Getting the YANG model binding
2017-04-28 13:45:27,549,549 [napalm_logs.device][DEBUG   ] Filling the OC model
2017-04-28 13:45:27,551,551 [napalm_logs.device][DEBUG   ] Generated OC object
2017-04-28 13:45:27,551,551 [napalm_logs.device][DEBUG   ] {'bgp': {'neighbors': {'neighbor': {u'172.17.17.1': {'state': {'peer_as': 123456L}, 'afi_safis': {'afi_safi': {u'inet': {'state': {'prefixes': {'received': 181L}}, 'afi_safi_name': u'inet', 'ipv4_unicast': {'prefix_limit': {'state': {'max_prefixes': 160L}}}}}}, 'neighbor_address': u'172.17.17.1'}}}}}
2017-04-28 13:45:27,551,551 [napalm_logs.device][DEBUG   ] Queueing to be published:
2017-04-28 13:45:27,551,552 [napalm_logs.device][DEBUG   ] {'host': 're0.edge01.bjm01', 'open_config': {'bgp': {'neighbors': {'neighbor': {u'172.17.17.1': {'state': {'peer_as': 123456L}, 'afi_safis': {'afi_safi': {u'inet': {'state': {'prefixes': {'received': 181L}}, 'afi_safi_name': u'inet', 'ipv4_unicast': {'prefix_limit': {'state': {'max_prefixes': 160L}}}}}}, 'neighbor_address': u'172.17.17.1'}}}}}, 'timestamp': '1490877919', 'error': 'BGP_PREFIX_THRESH_EXCEEDED', 'ip': '127.0.0.1', 'os': 'junos', 'model_name': 'openconfig_bgp', 'message_details': {'processId': '40158839', 'pri': '149', 'processName': 'rpd', 'host': 're0.edge01.bjm01', 'tag': 'BGP_PREFIX_THRESH_EXCEEDED', 'time': '12:45:19', 'date': 'Mar 30', 'message': '172.17.17.1 (External AS 123456): Configured maximum prefix-limit threshold(160) exceeded for inet-unicast nlri: 181 (instance master)'}}

Very interesting that when there are 10k it takes 19 seconds, vs 14 when there are 100.

Looking straight at the first queue:

2017-04-28 13:45:13,904,904 [napalm_logs.listener][DEBUG   ] Received <149>Mar 30 12:45:19 re0.edge01.bjm01 rpd[40158839]: BGP_PREFIX_THRESH_EXCEEDED: 172.17.17.1 (External AS 123456): Configured maximum prefix-limit threshold(160) exceeded for inet-unicast nlri: 181 (instance master) from ('127.0.0.1', 63026)

this is when the message is received and queued (code)

2017-04-28 13:45:19,082,082 [napalm_logs.server][DEBUG   ] Dequeued message from 127.0.0.1: <149>Mar 30 12:45:19 re0.edge01.bjm01 rpd[40158839]: BGP_PREFIX_THRESH_EXCEEDED: 172.17.17.1 (External AS 123456): Configured maximum prefix-limit threshold(160) exceeded for inet-unicast nlri: 181 (instance master)

this is when it's read from the queue (code)

Which introduces a considerable delay of about 6 seconds.

Please note that the SUB and the PUB are running in separate sub-processes started by the same parent. The app is using multiprocessing.

Is there anything I am doing wrong?

My versions:

libzmq-4.1.6
pyzmq-16.0.2
Python-2.7.9 (default, Jun 29 2016, 13:08:31)
[GCC 4.9.2]

Thank you,
-Mircea

Source

mirceaulinic

Most helpful comment

I can't find a good doc to explain the proper bind/connect

Because of zmq's abstractions, every connection order and direction is valid. So it shouldn't be an issue for SUB to bind and PUB to connect. However, I have seen some issues mainly because this scenario is uncommon, and thus bugs are less likely to be found/prioritized. That said, it is rare that I would recommend using PUB-SUB if the receiver is the socket that binds.

Some tips about which socket should connect and which should bind:

if there is only one instance of one side, it should bind.
If a side has many instances, or they come and go often, it should connect.

Some tips for when to choose PUB-SUB:

Messages should be dropped if there are no SUB sockets connected and subscribed.
You have many (or sometimes zero) sockets receiving messages.

If the following conditions are met, PUB-SUB is probably not right for you:

messages should not be dropped
you have only one receiver
that receiver is ~always running

Often when I see PUB connecting and SUB binding, what makes more sense is the sink pattern, the bottom half of ventillator-sink in the Guide. That is, many PUSH sockets connecting to one PULL socket.

A PUSH-PULL sink has these differences from using PUB-SUB in the same arrangement:

messages will not be dropped
the receiver should always be running
if the receiver is not ready or not yet connected, messages will wait/block instead of being dropped.

Could you please share the implementation you used for the benchmark above?

Here's a gist that does a simple push/pull test. On my current machine (macOS 10.12.5, 4GHz Core i7, libzmq 4.1.6, pyzmq master, Python 3.5), I get ~1.3 million msgs/sec over tcp and ~1.6 million msgs/sec over ipc.

minrk on 25 Jul 2017

👍2

All 8 comments

I have switched to a PULL/PUSH pair rather than PUB/SUB, without seeing any improvements.

I think it would be also important to note that the message size varies between 200 and 300 bytes, being received over an UDP socket.
Reading more about IPC, my understanding that it's based on the unix pipe, hence the limit being the buffer size. According to https://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer the buffer size is 65536 bytes (thus capable to store only 200-300 of the messages I send).

Switching to the multiptocessing Pipe, the performances have increased, the latency is _way_ lower, but I don't understand what's the difference. In particular, what's the difference between zmq IPC and multiprocessing Pipe? Could a maintainer please bring some light here and/or point me to the right docs?

Thanks,
-Mircea

mirceaulinic on 1 May 2017

@minrk does this sound familiar to you at all? Or I am doing something really dumb there that borked the performances?
How would you suggest to investigate?

Thanks,
-Mircea

mirceaulinic on 5 May 2017

I'm not sure what could precisely be the cause of a six second, but there may be a failure in libzmq to correctly wake the sub socket when the pub connects to it. Do you mainly see a delay for the first message after a socket connects, or continuous slow performance? Do you see a significant change if you switch to tcp on 127.0.0.1?

With a simple PUSH/PULL throughput test, I get a bit under one million 200B msgs/sec on ipc (macOS, libzmq 4.2.1, zmq 16.0.2).

minrk on 8 May 2017

Hi @minrk - much appreciated your answer.

Yes, I would have expected the same: to have a very high number of PPS. Just a quick question: do you think the size of the messages might affect the performances at all? Just a reminder (and trying to clarify and bring more background, without polluting this with unnecessary info):

There are two child processes (started with multiprocessing, from a different parent process):

listener: takes the messages from a UDP socket and pushes them on the IPC
server: pulls the messages from the IPC the does some parsing (which takes 1-2ms, if I recall correctly)

Do you see any potential culprit?
I am almost sure that something is wrong in my approach, but not finding the root cause. Any idea/indication would be very welcome.

Thanks,
-Mircea

mirceaulinic on 13 May 2017

When I've seen weird issues with ipc, it's sometimes been because two processes accidentally 'bind' to the same ipc interface, which doesn't work but won't raise errors. I have also seen issues with unusual bind/connect relationships such as SUB sockets binding. One thing to check is switching transport (try tcp, etc.) and see if things change.

minrk on 14 May 2017

👍1

One thing to check is switching transport (try tcp, etc.) and see if things change.

I did that, but I didn't see any improvement.

I have also seen issues with unusual bind/connect relationships such as SUB sockets binding.

it's very likely to be that. Unfortunately, I can't find a good doc to explain the proper bind/connect: http://learning-0mq-with-pyzmq.readthedocs.io/en/latest/pyzmq/patterns/pubsub.html says that the PUB executes the bind, while the SUB executes the connect. Looking at someone else's question (https://stackoverflow.com/questions/5060508/cant-get-zeromq-python-bindings-to-receive-messages-over-ipc), it looks like they did the opposite: PUB does the connect and the SUB does the bind. This is also the pattern you used in your example from https://github.com/zeromq/pyzmq/issues/710#issuecomment-291076906 -- which, apparently, doesn't seem to actually work.

With a simple PUSH/PULL throughput test, I get a bit under one million 200B msgs/sec on ipc (macOS, libzmq 4.2.1, zmq 16.0.2).

@minrk Could you please share the implementation you used for the benchmark above? It would help me too if you could point me to the right documentation; looking at the reference I linked above
vs. your example from the other thread, I'm a bit confused and not sure what's the right way.

Thank you!

mirceaulinic on 23 Jul 2017

I can't find a good doc to explain the proper bind/connect

Some tips about which socket should connect and which should bind:

if there is only one instance of one side, it should bind.
If a side has many instances, or they come and go often, it should connect.

Some tips for when to choose PUB-SUB:

Messages should be dropped if there are no SUB sockets connected and subscribed.
You have many (or sometimes zero) sockets receiving messages.

If the following conditions are met, PUB-SUB is probably not right for you:

messages should not be dropped
you have only one receiver
that receiver is ~always running

Often when I see PUB connecting and SUB binding, what makes more sense is the sink pattern, the bottom half of ventillator-sink in the Guide. That is, many PUSH sockets connecting to one PULL socket.

A PUSH-PULL sink has these differences from using PUB-SUB in the same arrangement:

messages will not be dropped
the receiver should always be running
if the receiver is not ready or not yet connected, messages will wait/block instead of being dropped.

Could you please share the implementation you used for the benchmark above?

minrk on 25 Jul 2017

👍2

Many thanks @minrk for the detailed answer, much appreciated your time!

I am going to close this, it is purely a design fault on my side, that I need to identify. I think your pointers already gave me enough input to know where to start looking from. When I will identify the culprit I will update this in case anyone else will need it later. Thanks once again!

mirceaulinic on 25 Jul 2017

Was this page helpful?

0 / 5 - 0 ratings