Pyzmq: zmq.Context() hangs

Created on 29 Aug 2018 · 13 Comments · Source: zeromq/pyzmq

zmq.Context() hangs

Complete example

import zmq
zmq.Context()  # hangs

System and software info

>>> zmq.__version__
'17.0.0'
>>> zmq.zmq_version()
'4.2.5'

output of uname -a

Linux *** 2.6.32-696.23.1.el6.x86_64 #1 SMP Tue Mar 13 17:46:31 CDT 2018 x86_64 x86_64 x86_64 GNU/Linux

output of gcc --version

gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16)

Python version 3.6.5, Anaconda version 5.1.0. Full list of packages: conda_list.txt

Output of print(sysconfig.get_config_vars()): python_config_vars2.txt

All 13 comments

That's unusual! What about upgrading pyzmq: conda upgrade pyzmq

I looked a bit deeper. This problem seems to be related to random number generation (see below).
The problem only appeared with pyzmq 17.0.0. With version 16.0.2, the code worked fine.

$ gdb python
[...]
(gdb) r
Starting program: /software/Anaconda3-5.1.0-el6-x86_64/bin/python 
[Thread debugging using libthread_db enabled]
[...]
>>> import zmq
>>> zmq.Context()
^C
Program received signal SIGINT, Interrupt.
0x00007ffff73b4348 in poll () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.209.el6_9.2.x86_64
(gdb) bt
#0  0x00007ffff73b4348 in poll () from /lib64/libc.so.6
#1  0x00007ffff48d6a90 in randombytes_sysrandom_init ()
   from /software/Anaconda3-5.1.0-el6-x86_64/lib/python3.6/site-packages/zmq/backend/cython/../../../../.././libsodium.so.23
#2  0x00007ffff48d6bc9 in randombytes_sysrandom_stir ()
   from /software/Anaconda3-5.1.0-el6-x86_64/lib/python3.6/site-packages/zmq/backend/cython/../../../../.././libsodium.so.23
#3  0x00007ffff48cb228 in randombytes_stir () from /software/Anaconda3-5.1.0-el6-x86_64/lib/python3.6/site-packages/zmq/backend/cython/../../../../.././libsodium.so.23
#4  0x00007ffff48cc284 in sodium_init () from /software/Anaconda3-5.1.0-el6-x86_64/lib/python3.6/site-packages/zmq/backend/cython/../../../../.././libsodium.so.23
#5  0x00007ffff4b3f6ca in zmq::random_open() () from /software/Anaconda3-5.1.0-el6-x86_64/lib/python3.6/site-packages/zmq/backend/cython/../../../../../libzmq.so.5
#6  0x00007ffff4b18a2e in zmq::ctx_t::ctx_t() () from /software/Anaconda3-5.1.0-el6-x86_64/lib/python3.6/site-packages/zmq/backend/cython/../../../../../libzmq.so.5
#7  0x00007ffff4b69ae3 in zmq_ctx_new () from /software/Anaconda3-5.1.0-el6-x86_64/lib/python3.6/site-packages/zmq/backend/cython/../../../../../libzmq.so.5
#8  0x00007ffff3f3d35d in __pyx_tp_new_3zmq_7backend_6cython_7context_Context ()
   from /software/Anaconda3-5.1.0-el6-x86_64/lib/python3.6/site-packages/zmq/backend/cython/context.cpython-36m-x86_64-linux-gnu.so
[...]

I've certainly never seen zmq_ctx_new block before. @bluca any idea what calls in instantiating a zmq context could block?

That's in libsodium. Given it's running on RHEL 4 with kernel 2.6, which is ancient and EoL for more than 5 years, I assume libsodium doesn't support such an ancient and dead kernel anymore for the crypto primitives.

I strongly suspect that the libsodium initialization hangs on the getrandom() call. If the kernel's entropy pool runs low, getrandom() blocks. I checked the available entropy, following the libsodium documentation. In agreement with the documentation, the initialization blocks and ZMQ hangs if fewer than 160 bits of entropy are available (the kernel reports entropy_avail in bits).

So is this the expected behaviour, i.e., is it my responsibility to make sure that enough entropy is available in order to be able to use ZMQ? Or is that a bug?

The funny thing is that I don't even need a secure connection. So probably if I could prevent initializing libsodium, this would solve the problem for me. Or is libsodium a vital dependency of ZMQ?

Let me add that there is likely also something wrong with the Linux kernel that prevents it from collecting entropy. However, I'm running ZMQ on a compute node of a scientific computing cluster. There are no HIDs connected to that machine, nor is there much network traffic. So possibly the kernel just has a hard time gathering entropy.
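
One way to test whether a machine is in this state without hanging the interpreter is to mimic the wait seen in the backtrace, but with a timeout. A minimal sketch, assuming that the poll() in randombytes_sysrandom_init is waiting for /dev/random to become readable (suggested by the backtrace frames above, not verified against the libsodium source); dev_random_ready is a hypothetical helper name:

```python
import os
import select


def dev_random_ready(timeout_s: float = 1.0, path: str = "/dev/random") -> bool:
    """Return True if `path` becomes readable within `timeout_s` seconds.

    Assumption: this approximates the wait libsodium performs at startup.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        poller = select.poll()
        poller.register(fd, select.POLLIN)
        # poll() takes its timeout in milliseconds and returns a list of
        # (fd, event) pairs; an empty list means the timeout expired.
        return bool(poller.poll(int(timeout_s * 1000)))
    finally:
        os.close(fd)
```

If dev_random_ready() returns False on the affected node, zmq.Context() would probably block there too; a True result only says that the device was readable at that moment.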

You can compile the library yourself without curve support if you want, there's a build option for that.

Or you can try a daemon like jitterentropy-rngd to get more entropy at boot. But I'm surprised there is getrandom in kernel 2.6.

Hi, we are seeing this very same issue, but we are running pyzmq inside a container (Ubuntu 18.04) on various hosts with kernels around 4.15.0. We attempt to get the context in a newly forked sub-process.

>>> import zmq
>>> zmq.__version__
'17.1.2'
>>> zmq.zmq_version()
'4.2.5'

PyZMQ downloaded from PyPI as wheel.

Preliminary testing indicates that we were able to avoid the issue by compiling libzmq from source without curve and installing pyzmq with --no-binary.

In our Dockerfile:

RUN git clone https://github.com/zeromq/libzmq.git --branch v4.3.0 \
    && mkdir libzmq/build \
    && pushd libzmq/build \
    && cmake --disable-curve .. \
    && make -j $(python3 -c 'import multiprocessing; print(multiprocessing.cpu_count())') \
    && make install \
    && ldconfig \
    && popd \
    && rm -rf libzmq \
    && pip3 install pyzmq==17.1.2 --no-binary=pyzmq
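
After a rebuild like this, it can be worth confirming from Python that the libzmq actually loaded lacks CURVE support. A minimal sketch using pyzmq's zmq.has(); curve_status is a hypothetical helper name, and reading "no CURVE" as "libsodium will not be initialized" is an assumption based on this thread, not something zmq.has() reports directly:

```python
def curve_status():
    """Return True/False for CURVE support, or None if pyzmq is not installed."""
    try:
        import zmq  # importing does not create a context
    except ImportError:
        return None
    # zmq.has() asks the loaded libzmq whether a capability was compiled in.
    return bool(zmq.has("curve"))


if __name__ == "__main__":
    print("CURVE support:", curve_status())
```

With the build above, the expected result is False. A True result would suggest that pip installed a binary wheel with its bundled libzmq instead of linking against the freshly built one.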

Has anyone else experienced this and been able to find the root cause? I'm on Debian, and it often hangs on zmq.Context()

pyzmq==19.0.2 and zmq==4.3.2

I have not yet tried building libzmq from source but will try that next.

^C^C^CTraceback (most recent call last):
  File "server.py", line 5, in <module>
    context = zmq.Context()
  File "zmq/backend/cython/context.pyx", line 53, in zmq.backend.cython.context.Context.__cinit__
  File "zmq/backend/cython/checkrc.pxd", line 13, in zmq.backend.cython.checkrc._check_rc
KeyboardInterrupt

For me the issue was that jupyter kernels were hanging during start at open("/dev/random", O_RDONLY) as reported in https://github.com/jupyter/help/issues/480#issuecomment-451353915

from jupyter_client.manager import start_new_kernel
km, kc = start_new_kernel()  # hangs
km.shutdown_kernel()

I confirm that the issue disappeared after compiling libzmq with --disable-curve as described in https://github.com/zeromq/pyzmq/issues/1224#issuecomment-444314061

As mentioned in https://github.com/ipython/ipykernel/issues/342#issuecomment-424478188, there is another workaround: make /dev/random inaccessible via chmod 640 /dev/random (works for me, without recompiling).

This is on a herokuish docker container (Dockerfile, based on Ubuntu 18.04), also using pyzmq==19.0.2 and zmq==4.3.2 (using pyzmq-19.0.2-cp37-cp37m-manylinux1_x86_64.whl).

Unfortunately, neither of those workarounds is an adequate solution for me, since I don't have control over the Docker container at that level when users are deploying apps via herokuish.

It would be really great if there were a way to configure this from the Python side, e.g. to use /dev/urandom instead of the blocking /dev/random.

P.S. My issue might be a slightly different one (e.g. a different function reading from /dev/random) than the originally posted one, since for me

import zmq
zmq.Context()

would not hang.
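
To tell which call hangs on a given machine without locking up the current session, each candidate can be run in a child interpreter with a timeout. A minimal sketch using only the standard library; probe and its return values are hypothetical names chosen for illustration:

```python
import subprocess
import sys


def probe(snippet: str, timeout_s: float = 10.0) -> str:
    """Run `snippet` in a fresh interpreter; return 'ok', 'error' or 'hang'."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child once the timeout expires.
        return "hang"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    print("zmq.Context():", probe("import zmq; zmq.Context()"))
```

The same helper could be pointed at the start_new_kernel() snippet above with a longer timeout, with the caveat that a kernel process started by the child may outlive it and need to be cleaned up separately.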

I observed that it often occurs in a docker container due to low entropy there when booting.

To solve the problem, this question along with all answers may be helpful.

I write the Dockerfile as

RUN yum install rng-tools -y
CMD ["/app/run.sh"]

And in "/app/run.sh" I add one line in the very beginning:

rngd -r /dev/urandom 

It works.

In fact, I am building an enclave image rather than a Docker image, using the AWS Nitro Enclave CLI, which does not support file mappings between container and host at run time, and yum install haveged does not work with AWS's package source either. So this is how I finally got the image running. The question has many more elegant solutions for reference.

@fabian-paul Hi there, have you found a workaround for this issue? I have run into exactly the same issue in an HPC environment.

I've been experiencing this issue from time to time, particularly now that everything we deploy is container-based. In some situations we create our own container and build PyZMQ from source, as in my last comment. Other times the deployment can be simplified a lot by using official Python containers, so we decided to write some code that checks the Linux entropy level before starting PyZMQ, to make sure it won't deadlock, and exits if necessary.

Similar to what is described at https://doc.libsodium.org/usage/#sodium_init-stalling-on-linux, but in Python.

Snippet, if someone is interested:

import logging
from pathlib import Path
from time import monotonic, sleep

log = logging.getLogger(__name__)

timeout_s = 60.0
entropytarget = 160  # threshold from the libsodium documentation, in bits
entropyfile = Path('/proc/sys/kernel/random/entropy_avail')
assert entropyfile.is_file()

start = monotonic()
while True:
    # The kernel reports the available entropy as a plain integer (bits).
    entropy = int(entropyfile.read_text().strip())
    if entropy > entropytarget:
        log.info(
            'Kernel available entropy at {}, which is > {}'.format(
                entropy,
                entropytarget,
            )
        )
        break

    # Give up instead of waiting forever.
    if monotonic() > start + timeout_s:
        raise TimeoutError(
            'Available entropy never reached {} bits or more '
            'after {:.2f} seconds. Last read value was {}.'.format(
                entropytarget,
                timeout_s,
                entropy,
            )
        )

    log.warning(
        'Not enough entropy available in the Linux kernel. '
        'This software requires at least {} bits. '
        'Current value is {}. '
        'Startup will be delayed for 5 seconds in hope of gathering '
        'more entropy.'.format(
            entropytarget,
            entropy,
        )
    )
    sleep(5)