node-rdkafka 2.3.0 Segmentation fault

Created on 11 Mar 2018 · 17 Comments · Source: Blizzard/node-rdkafka

Since 2.3.0, a segmentation fault occurs as soon as the connect() method is called (on either a consumer or a producer).

Steps to reproduce, using Docker and the e2e tests for brevity:

Dockerfile:

```
FROM node:9-stretch
RUN npm i node-rdkafka
WORKDIR /node_modules/node-rdkafka
RUN npm i
ENV KAFKA_HOST=localhost:29092
```

Running the e2e tests:

```
$ docker build -t rdkafka .
(...)

$ docker run --net host rdkafka make e2e
  Consumer
    commit
Makefile:63: recipe for target 'e2e' failed
make: *** [e2e] Segmentation fault (core dumped)
```

Most helpful comment

The problem here is that the librdkafka that gets built is not used by the node module, because the node module links against the system librdkafka rather than the local one. The system librdkafka is usually linked against OpenSSL 1.1, but node is built against OpenSSL 1.0, and this mismatch causes the segmentation fault. See PR #388 for the fix.

Also, on Debian you should make sure that you have libssl1.0-dev installed, not libssl-dev (1.1), so that npm builds librdkafka correctly against OpenSSL 1.0.
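The mismatch described above can be checked before calling connect(). Below is a minimal sketch, assuming you capture the `ldd` output for `build/Release/node-librdkafka.node` and compare it with Node's own `process.versions.openssl`. The helper names are hypothetical, and treating Fedora's `libssl.so.10` as OpenSSL 1.0 is an assumption based on the compat-openssl10 listings later in this thread.

```javascript
'use strict';

// Reduce an OpenSSL version string such as "1.0.2o" or "1.1.0g" to "major.minor".
function opensslMajorMinor(version) {
  const m = /^(\d+)\.(\d+)/.exec(version);
  if (!m) throw new Error(`unrecognised OpenSSL version: ${version}`);
  return `${m[1]}.${m[2]}`;
}

// Map the suffix of a libssl soname to an OpenSSL "major.minor".
// ASSUMPTION: Fedora's compat package ships OpenSSL 1.0.x as "libssl.so.10",
// so "10" is treated as "1.0". Unknown suffixes are returned unchanged, so they
// never equal Node's version and are reported as a mismatch.
function sonameToMajorMinor(suffix) {
  if (suffix === '10') return '1.0';
  const m = /^(\d+)\.(\d+)/.exec(suffix);
  return m ? `${m[1]}.${m[2]}` : suffix;
}

// Collect the OpenSSL versions implied by the libssl entries in `ldd` output.
function linkedSslVersions(lddOutput) {
  const versions = new Set();
  for (const line of lddOutput.split('\n')) {
    const m = /\blibssl\.so\.([0-9.]+)\s*=>/.exec(line);
    if (m) versions.add(sonameToMajorMinor(m[1]));
  }
  return [...versions];
}

// True when every libssl the addon links against matches Node's own OpenSSL.
// An addon built without SSL (no libssl entry at all) counts as matching.
function sslMatchesNode(lddOutput, nodeOpensslVersion) {
  const nodeVersion = opensslMajorMinor(nodeOpensslVersion);
  return linkedSslVersions(lddOutput).every((v) => v === nodeVersion);
}
```

For example, pass `execFileSync('ldd', [addonPath], { encoding: 'utf8' })` from `child_process` together with `process.versions.openssl`, where `addonPath` points at `node_modules/node-rdkafka/build/Release/node-librdkafka.node`. A `false` result means the rebuild picked up a different OpenSSL than the one Node.js bundles.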

All 17 comments

Another try, same issue with Node 8 on an up-to-date Debian stretch:

**producer.js:**

```
const Kafka = require('node-rdkafka');
const producer = new Kafka.Producer({
  'metadata.broker.list': 'localhost:29092',
});
producer.connect();
producer.on('ready', () => console.log('producer ready'));
producer.on('event.error', err => console.error(err));
```

**test:**

```
$ uname -a
Linux laptop 4.9.0-4-rt-amd64 #1 SMP PREEMPT RT Debian 4.9.65-3+deb9u1 (2017-12-23) x86_64 GNU/Linux

$ ldd --version
ldd (Debian GLIBC 2.24-11+deb9u3) 2.24

$ node --version
v8.10.0

$ npm i node-rdkafka
[....]

$ node producer.js
[1] 10260 segmentation fault node producer.js

$ rm -rf node_modules
$ npm i [email protected]
[...]

$ node producer.js
producer ready
```
Happy to help if I can provide more information.

gdb session from the last failure:
```
(gdb) run
Starting program: ~/.nvm/versions/node/v8.10.0/bin/node producer.js
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6b73700 (LWP 20874)]
[New Thread 0x7ffff6372700 (LWP 20875)]
[New Thread 0x7ffff5b71700 (LWP 20876)]
[New Thread 0x7ffff5370700 (LWP 20877)]
[New Thread 0x7ffff7ff4700 (LWP 20878)]
[New Thread 0x7fffe76e9700 (LWP 20879)]
[New Thread 0x7fffe6ee8700 (LWP 20880)]
[New Thread 0x7fffe66e7700 (LWP 20881)]
[New Thread 0x7fffe5ee6700 (LWP 20882)]

Thread 7 "node" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe76e9700 (LWP 20879)]
__strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:31
(gdb)
```

I also get a segmentation fault, but when invoking the consumer.committed() method.

My Dockerfile starts from Ubuntu, and the Node.js version is 9.

Can you guys try this again on 2.3.1? I haven't tested it specifically against this issue, but I have made a change to the build scripts.

Hi @webmakersteve, @4ng3l0 and I are still having the same problem, both on the Ubuntu setup he mentioned and on certain Mac versions.

Could the problem be the librdkafka that is being used?

@webmakersteve I'm now getting a missing-symbol error when using 2.3.1:

```
$ node producer.js
node: symbol lookup error: ./node_modules/node-rdkafka/build/Release/librdkafka++.so.1: undefined symbol: rd_kafka_conf_get
```

I get segfaults with 2.3.0 and 2.3.1 on Debian 9.4 x64 with Node.js 9.8.

Output of npm install for 2.2.3 and 2.3.1 here. There are a bunch of new build-time warnings with 2.3.1, but they look harmless.

Approximate Dockerfile:

```
FROM debian:stretch

ENV LC_ALL C.UTF-8
ENV TERM xterm

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        git \
        less \
        mercurial \
        nano \
        netcat \
        ssh-client \
        tcpdump \
        vim \
    && apt-get remove -y git-man \
    && apt-get autoremove -y \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

ENV NODE_PATH="/project:/usr/lib/node_modules"

#COPY .npmrc /root/

RUN curl -sL https://nodejs.org/dist/v9.8.0/node-v9.8.0-linux-x64.tar.gz -o /tmp/node.tar.gz \
    && mkdir -p /tmp/node \
    && tar -xzf /tmp/node.tar.gz -C /tmp/node --no-same-owner --strip-components=1 \
    && mv /tmp/node/bin/* /usr/bin/ \
    && rm /tmp/node.tar.gz \
    && rm -rf /tmp/node/

# this is just local copy of npm-install.sh with a fixed version number
COPY npm-install-5.sh /tmp/npm-install.sh
RUN chmod a+x  /tmp/npm-install.sh
RUN /tmp/npm-install.sh \
    && rm -rf /usr/lib/node_modules/npm/.nyc_output \
    && npm install -g nodemon \
    && npm cache clean --force \
    && mkdir -p /project


RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        librdkafka-dev librdkafka++1 librdkafka1 \
        libssl-dev liblz4-dev libsasl2-dev \
        libpthread-stubs0-dev \
        gcc g++ make build-essential pkg-config \
    && npm install -y -g node-gyp \
    && npm install -y -g --unsafe-perm [email protected] \
    && apt-get remove -y \
        gcc g++ make pkg-config \
        binutils cpp cpp-6 dpkg-dev g++-6 gcc-6 libasan3 libatomic1 \
        libc-dev-bin libc6-dev libcc1-0 libcilkrts5 libdpkg-perl libgcc-6-dev \
        libglib2.0-0 libgomp1 libisl15 libitm1 liblsan0 libmpc3 libmpfr4 \
        libmpx2 libquadmath0 libstdc++-6-dev libtsan0 libubsan0 linux-libc-dev \
        libsasl2-dev libpthread-stubs0-dev liblz4-dev \
        librdkafka-dev libssl-dev \
    && apt-get install -y libgomp1 \
    && apt-get autoremove -y \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
```

Segfaults every time I try to connect or create a stream.

On Fedora x86_64, with Node 8.9.4 and the relevant debug parts enabled (ugh!):

~~~~
Thread 10 "node" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe6108700 (LWP 3481)]
__strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:31
31 movdqu (%rdi), %xmm1
(gdb) where
#0  __strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:31
#1  0x0000000000994b9a in lh_insert ()
#2  0x00000000009a0081 in OBJ_NAME_add ()
#3  0x00007fffe7938841 in ossl_init_ssl_base () at ssl/ssl_init.c:72
#4  ossl_init_ssl_base_ossl_ () at ssl/ssl_init.c:25
#5  0x00007ffff6ed4227 in __pthread_once_slow (once_control=0x7fffe7b77c3c , init_routine=0x7fffe7938600 ) at pthread_once.c:116
#6  0x00007ffff6ed42e5 in __GI___pthread_once (once_control=once_control@entry=0x7fffe7b77c3c , init_routine=init_routine@entry=0x7fffe7938600 ) at pthread_once.c:143
#7  0x00007fffe7d2d779 in CRYPTO_THREAD_run_once (once=once@entry=0x7fffe7b77c3c , init=init@entry=0x7fffe7938600 ) at crypto/threads_pthread.c:106
#8  0x00007fffe793897b in OPENSSL_init_ssl (opts=opts@entry=2097154, settings=settings@entry=0x0) at ssl/ssl_init.c:227
#9  0x00007ffff43fb13d in rd_kafka_transport_ssl_init () at rdkafka_transport.c:481
#10 0x00007ffff43d811a in rd_kafka_global_cnt_incr () at rdkafka.c:134
#11 rd_kafka_new (type=type@entry=RD_KAFKA_CONSUMER, app_conf=0x7fffd4000b90, errstr=errstr@entry=0x7fffe6107b70 "", errstr_size=errstr_size@entry=512) at rdkafka.c:1319
#12 0x00007ffff46c0960 in RdKafka::KafkaConsumer::create (conf=, errstr="") at KafkaConsumerImpl.cpp:63
#13 0x00007ffff48f036e in NodeKafka::KafkaConsumer::Connect() () from /home/andreas/project/collaborne-event-stream-library/node_modules/node-rdkafka/build/Release/node-librdkafka.node
#14 0x00007ffff48fd847 in NodeKafka::Workers::KafkaConsumerConnect::Execute() () from /home/andreas/project/collaborne-event-stream-library/node_modules/node-rdkafka/build/Release/node-librdkafka.node
#15 0x00007ffff48edcf2 in Nan::AsyncExecute(uv_work_s*) () from /home/andreas/project/collaborne-event-stream-library/node_modules/node-rdkafka/build/Release/node-librdkafka.node
#16 0x000000000143c001 in worker (arg=) at ../deps/uv/src/threadpool.c:83
#17 0x00007ffff6ecc50b in start_thread (arg=0x7fffe6108700) at pthread_create.c:465
#18 0x00007ffff6c0416f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
~~~~

I'm using a non-SSL connection to a local Kafka broker, but as far as I can tell this is generic SSL-support initialization code?

EDIT: On a whim I modified node_modules/node-rdkafka/configure in my project, added --disable-ssl, and ran npm rebuild. This fixes the immediate problem, and I can work again.

Maybe OpenSSL vs. LibreSSL?

Good one, I think this is "standard OpenSSL":
~~~~
$ ldd node_modules/node-rdkafka/build/Release/librdkafka.so.1
linux-vdso.so.1 (0x00007fff681a5000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6a19ca1000)
libz.so.1 => /lib64/libz.so.1 (0x00007f6a19a8a000)
libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x00007f6a19602000)
libssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007f6a19396000)
librt.so.1 => /lib64/librt.so.1 (0x00007f6a1918e000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f6a18f8a000)
libc.so.6 => /lib64/libc.so.6 (0x00007f6a18bd4000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6a1a1a5000)
$ rpm -q --whatprovides /lib64/libcrypto.so.1.1
openssl-devel-1.1.0g-1.fc27.x86_64
$ rpm -q openssl-devel-1.1.0g-1.fc27.x86_64
Name : openssl-devel
Epoch : 1
Version : 1.1.0g
Release : 1.fc27
Architecture: x86_64
Install Date: Fri 17 Nov 2017 11:37:47 AM CET
Group : Development/Libraries
Size : 3066371
License : OpenSSL
Signature : RSA/SHA256, Mon 13 Nov 2017 08:47:30 AM CET, Key ID f55e7430f5282ee4
Source RPM : openssl-1.1.0g-1.fc27.src.rpm
Build Date : Mon 06 Nov 2017 09:26:11 AM CET
Build Host : buildvm-18.phx2.fedoraproject.org
Relocations : (not relocatable)
Packager : Fedora Project
Vendor : Fedora Project
URL : http://www.openssl.org/
Summary : Files for development of applications which will use OpenSSL
Description :
OpenSSL is a toolkit for supporting cryptography. The openssl-devel
package contains include files needed to develop applications which
support various cryptographic algorithms and protocols.
$ rpm -q --whatprovides `which openssl`
openssl-1.1.0g-1.fc27.x86_64
$ openssl version -a
OpenSSL 1.1.0g-fips 2 Nov 2017
built on: reproducible build, date unspecified
platform: linux-x86_64
compiler: gcc -DZLIB -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DPURIFY -DSYSTEM_CIPHERS_FILE="/etc/crypto-policies/back-ends/openssl.config" -DOPENSSLDIR="\"/etc/pki/tls\"" -DENGINESDIR="\"/usr/lib64/engines-1.1\"" -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -Wa,--noexecstack
OPENSSLDIR: "/etc/pki/tls"
ENGINESDIR: "/usr/lib64/engines-1.1"
engines: dynamic
~~~~

The problem here is that the librdkafka that gets built is not used by the node module, because the node module links against the system librdkafka rather than the local one. The system librdkafka is usually linked against OpenSSL 1.1, but node is built against OpenSSL 1.0, and this mismatch causes the segmentation fault. See PR #388 for the fix.

Also, on Debian you should make sure that you have libssl1.0-dev installed, not libssl-dev (1.1), so that npm builds librdkafka correctly against OpenSSL 1.0.

> The problem here is that librdkafka built is not used inside node module, because node module links against system librdkafka, not the local librdkafka.

This doesn't seem to be the case for me: I don't have a system librdkafka. To be sure, I also checked by doing the following:

  1. Clone node-rdkafka at 75cc037b9e8f0b8dc2315bdc98d28d167ddd71ff
  2. Run npm install && npm rebuild && npm link
  3. Run npm link node-rdkafka in the project where I can reproduce the segfaults
  4. Start the project, and observe the same segfaults happening

The change from #388 looks all right, although the --rpath setting should have ensured this anyway. ldd node-librdkafka.node resolves the same libraries with or without that change:
~~~~
$ objdump -x ./build/Release/node-librdkafka.node | grep RPATH
RPATH /home/andreas/project/node-rdkafka/build/deps
$ ldd ./build/Release/node-librdkafka.node
linux-vdso.so.1 (0x00007ffe4f3fb000)
librdkafka.so.1 => /home/andreas/project/node-rdkafka/build/deps/librdkafka.so.1 (0x00007f7c333ab000)
librdkafka++.so.1 => /home/andreas/project/node-rdkafka/build/deps/librdkafka++.so.1 (0x00007f7c3318e000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f7c32e07000)
libm.so.6 => /lib64/libm.so.6 (0x00007f7c32abc000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f7c328a5000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7c32687000)
libc.so.6 => /lib64/libc.so.6 (0x00007f7c322d1000)
libz.so.1 => /lib64/libz.so.1 (0x00007f7c320ba000)
libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x00007f7c31c32000)
libssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007f7c319c6000)
librt.so.1 => /lib64/librt.so.1 (0x00007f7c317be000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f7c315ba000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7c338d1000)
~~~~

Without d9c75d9:
~~~~
$ objdump -x ./build/Release/node-librdkafka.node | grep RPATH
RPATH /home/andreas/project/node-rdkafka/build/deps
$ ldd ./build/Release/node-librdkafka.node
linux-vdso.so.1 (0x00007ffc7abfd000)
librdkafka++.so.1 => /home/andreas/project/node-rdkafka/build/deps/librdkafka++.so.1 (0x00007f8b37cc9000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f8b37942000)
libm.so.6 => /lib64/libm.so.6 (0x00007f8b375f7000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8b373e0000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8b371c2000)
libc.so.6 => /lib64/libc.so.6 (0x00007f8b36e0c000)
librdkafka.so.1 => /home/andreas/project/node-rdkafka/build/deps/librdkafka.so.1 (0x00007f8b36b26000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8b38126000)
libz.so.1 => /lib64/libz.so.1 (0x00007f8b3690f000)
libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x00007f8b36487000)
libssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007f8b3621b000)
librt.so.1 => /lib64/librt.so.1 (0x00007f8b36013000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f8b35e0f000)
~~~~
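The RPATH check in the outputs above can also be scripted instead of eyeballed. Below is a minimal sketch, assuming you pass in the text printed by `objdump -x` and `ldd` for `node-librdkafka.node`. It lists every librdkafka library that did not resolve from an RPATH/RUNPATH directory. The helper names are hypothetical, and `$ORIGIN`-relative entries are not expanded, so those would need separate handling.

```javascript
'use strict';

// Pull RPATH / RUNPATH directories out of `objdump -x` output.
function rpathDirs(objdumpOutput) {
  const dirs = [];
  for (const line of objdumpOutput.split('\n')) {
    const m = /^\s*(?:RPATH|RUNPATH)\s+(\S+)/.exec(line);
    if (m) dirs.push(...m[1].split(':').filter(Boolean));
  }
  return dirs;
}

// Map each "name => /path (addr)" or "name => not found" line of `ldd` output
// to the resolved location. Lines without "=>" (vdso, the loader) are skipped.
function resolvedLibs(lddOutput) {
  const libs = new Map();
  for (const line of lddOutput.split('\n')) {
    const m = /^\s*(\S+)\s+=>\s+(not found|\/\S+)/.exec(line);
    if (m) libs.set(m[1], m[2]);
  }
  return libs;
}

// Return the librdkafka libraries that were NOT loaded from an RPATH directory
// (for example, a system librdkafka.so.1 shadowing the locally built one).
// NOTE: "$ORIGIN" in an RPATH entry is compared literally, not expanded.
function strayRdkafkaLibs(objdumpOutput, lddOutput) {
  const dirs = rpathDirs(objdumpOutput).map((d) => d.replace(/\/+$/, '') + '/');
  const stray = [];
  for (const [name, resolved] of resolvedLibs(lddOutput)) {
    if (!name.startsWith('librdkafka')) continue;
    if (!dirs.some((d) => resolved.startsWith(d))) stray.push(`${name} => ${resolved}`);
  }
  return stray;
}
```

An empty result corresponds to the situation in both listings above: librdkafka.so.1 and librdkafka++.so.1 both come from build/deps, which points the investigation back at the OpenSSL linkage rather than the RPATH.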

> And system's librdkafka is usually linked against OpenSSL 1.1 but node is built against OpenSSL 1.0, and this causes the segmentation fault.

So I looked further, and can definitely confirm that part:
~~~~
$ node -pe process.versions
{ http_parser: '2.8.0',
node: '8.11.1',
v8: '6.2.414.50',
uv: '1.19.1',
zlib: '1.2.11',
ares: '1.10.1-DEV',
modules: '57',
nghttp2: '1.25.0',
openssl: '1.0.2o',
icu: '60.1',
unicode: '10.0',
cldr: '32.0',
tz: '2017c' }
~~~~

The question therefore becomes: how do I make the node process agree with the librdkafka SSL version? It seems I have both versions available:
~~~~
$ ls -l /lib64/libssl*
lrwxrwxrwx. 1 root root 16 Mar 29 18:32 /lib64/libssl.so -> libssl.so.1.1.0h
lrwxrwxrwx. 1 root root 16 Nov 13 12:52 /lib64/libssl.so.10 -> libssl.so.1.0.2m
-rwxr-xr-x. 1 root root 448640 Nov 13 12:52 /lib64/libssl.so.1.0.2m
lrwxrwxrwx. 1 root root 16 Mar 29 18:32 /lib64/libssl.so.1.1 -> libssl.so.1.1.0h
-rwxr-xr-x. 1 root root 451880 Mar 29 18:32 /lib64/libssl.so.1.1.0h
~~~~

Poking further through NodeJS: essentially, a third-party module cannot really link against OpenSSL without risking a conflict with whatever OpenSSL NodeJS comes bundled with.

> How do I make the node process agree with the librdkafka SSL version?

Actually, it's the reverse: I needed to make librdkafka/node-rdkafka agree with the version used by NodeJS. On Fedora this required removing the openssl-devel package and installing the compat-openssl10-devel package instead:
~~~~
$ sudo dnf install compat-openssl10-devel.x86_64 --allowerasing
$ rm -rf build && npm install # rebuild node-rdkafka
$ ldd build/Release/node-librdkafka.node
linux-vdso.so.1 (0x00007ffcc358e000)
librdkafka.so.1 => /home/andreas/project/node-rdkafka/build/deps/librdkafka.so.1 (0x00007f72323d6000)
librdkafka++.so.1 => /home/andreas/project/node-rdkafka/build/deps/librdkafka++.so.1 (0x00007f72321b9000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f7231e32000)
libm.so.6 => /lib64/libm.so.6 (0x00007f7231ae7000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f72318d0000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f72316b2000)
libc.so.6 => /lib64/libc.so.6 (0x00007f72312fc000)
libz.so.1 => /lib64/libz.so.1 (0x00007f72310e5000)
libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f7230c87000)
libssl.so.10 => /lib64/libssl.so.10 (0x00007f7230a1b000)
librt.so.1 => /lib64/librt.so.1 (0x00007f7230813000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f723060f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f72328fc000)
~~~~

With this setup, my reproducible segfaults go away after running npm rebuild node-rdkafka in the affected projects.

Maybe this could be added as a requirement (or similar) to the README here?

@ankon
These directions worked for me as well, on Fedora 27. Thanks!

I keep getting a SIGSEGV error whenever the connect method is called. I installed all the libraries mentioned at https://github.com/Blizzard/node-rdkafka/blob/master/examples/docker-alpine.md in my Dockerfile, but the issue still persists.
