Pulsar: sql worker on Ubuntu 18: libprocname.so failing, and failing JVM vendor check

Created on 11 Oct 2019 · 8 comments · Source: apache/pulsar

Describe the bug

On Ubuntu 18.04, using the pulsar 2.4.1 binary distribution, the sql worker doesn't start.

After fixing python/python3 (#5369), an error is returned about the ELF format of libprocname.so. Both file and nm report that the file is corrupt.

To Reproduce

$ bin/pulsar sql-worker run
ERROR: ld.so: object '/home/ubuntu/apache-pulsar-2.4.1/lib/presto/bin/procname/Linux-x86_64/libprocname.so' from LD_PRELOAD cannot be preloaded (ELF file's phentsize not the expected size): ignored.
Presto requires an Oracle or OpenJDK JVM (found Private Build)
$ file lib/presto/bin/procname/Linux-x86_64/libprocname.so
lib/presto/bin/procname/Linux-x86_64/libprocname.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), corrupted program header size, corrupted section header size
$ nm -D --defined-only lib/presto/bin/procname/Linux-x86_64/libprocname.so
nm: lib/presto/bin/procname/Linux-x86_64/libprocname.so: File format not recognized
$ ls -l lib/presto/bin/procname/Linux-x86_64/libprocname.so
-rw-r--r-- 1 ubuntu ubuntu 4700 Aug 28 09:06 lib/presto/bin/procname/Linux-x86_64/libprocname.so
$ shasum lib/presto/bin/procname/Linux-x86_64/libprocname.so
5849887050c21d07eb87b4cd513e9135de20942f  lib/presto/bin/procname/Linux-x86_64/libprocname.so
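The ld.so message refers to the e_phentsize field of the ELF header. As a cross-check that does not rely on file or nm, here is a minimal sketch that reads e_phentsize and e_shentsize directly. It assumes a little-endian ELF64 object (which is what file reports above); in that layout the two 2-byte fields sit at offsets 54 and 58 and should equal 56 and 64. The helper names read_u16_le and check_elf64_entsizes are hypothetical, not part of Pulsar or Presto.

```shell
# Minimal sketch: sanity-check the header entry sizes of a little-endian ELF64 file.
# ELF64 header layout: e_phentsize is the 2-byte field at offset 54,
# e_shentsize the 2-byte field at offset 58. For ELF64 they should be
# 56 (sizeof Elf64_Phdr) and 64 (sizeof Elf64_Shdr).

# Print the little-endian unsigned 16-bit value at byte offset $2 of file $1.
# Prints nothing if the file is too short.
read_u16_le() {
  od -An -t u1 -j "$2" -N 2 "$1" | awk 'NF == 2 { print $1 + 256 * $2 }'
}

# Hypothetical helper: print both sizes, return 0 only if they match ELF64.
check_elf64_entsizes() {
  ph_size=$(read_u16_le "$1" 54)
  sh_size=$(read_u16_le "$1" 58)
  echo "e_phentsize=${ph_size:-?} e_shentsize=${sh_size:-?}"
  [ "$ph_size" = 56 ] && [ "$sh_size" = 64 ]
}

# Usage (a non-zero exit status means at least one size is wrong):
#   check_elf64_entsizes lib/presto/bin/procname/Linux-x86_64/libprocname.so
```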

Desktop:

  • OS: Ubuntu 18.04, openjdk-8-jre-headless:amd64 8u222-b10-1ubuntu1~18.04.1

Additional context
I have checked that my extracted file matches the one in the tarball:

$ tar -tvzf apache-pulsar-2.4.1-bin.tar.gz | grep procname
drwxr-xr-x jia/staff         0 2019-08-28 09:06 apache-pulsar-2.4.1/lib/presto/bin/procname/
drwxr-xr-x jia/staff         0 2019-08-28 09:06 apache-pulsar-2.4.1/lib/presto/bin/procname/Linux-x86_64/
drwxr-xr-x jia/staff         0 2019-08-28 09:06 apache-pulsar-2.4.1/lib/presto/bin/procname/Linux-ppc64le/
-rw-r--r-- jia/staff      4700 2019-08-28 09:06 apache-pulsar-2.4.1/lib/presto/bin/procname/Linux-x86_64/libprocname.so
-rw-r--r-- jia/staff     70397 2019-08-28 09:06 apache-pulsar-2.4.1/lib/presto/bin/procname/Linux-ppc64le/libprocname.so
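The same check can be done byte-for-byte instead of by comparing listings. A minimal sketch, assuming GNU tar or bsdtar (both accept -O to write an extracted member to stdout): it streams the member out of the archive and compares it with the extracted copy using cmp. same_as_tarball is a hypothetical name.

```shell
# Minimal sketch: check that a file extracted from a .tar.gz is byte-identical
# to the member stored in the archive.
# Assumes GNU tar or bsdtar (-O writes the extracted member to stdout).
# $1 = tarball, $2 = member path inside the tarball, $3 = file on disk.
same_as_tarball() {
  tar -xzOf "$1" "$2" | cmp -s - "$3"
}

# Usage:
#   same_as_tarball apache-pulsar-2.4.1-bin.tar.gz \
#     apache-pulsar-2.4.1/lib/presto/bin/procname/Linux-x86_64/libprocname.so \
#     lib/presto/bin/procname/Linux-x86_64/libprocname.so && echo identical
```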

4700 bytes does seem rather small for an .so, but not impossibly so. Indeed, if I separately download presto-server-320.tar.gz (895 MB!), it contains an even smaller file:

$ tar -tvzf presto-server-320.tar.gz | grep \\.so
-rw-r--r--  0 0      0        4144  6 Oct 20:27 presto-server-320/bin/procname/Linux-x86_64/libprocname.so
-rw-r--r--  0 0      0       69576  6 Oct 20:27 presto-server-320/bin/procname/Linux-ppc64le/libprocname.so

If I examine this file, file and nm on Ubuntu do not consider it corrupt:

$ file libprocname.so
libprocname.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, stripped
$ nm -D --defined-only libprocname.so
0000000000200838 A __bss_start
0000000000200838 A _edata
0000000000200848 A _end
00000000000005c8 T _fini
0000000000000428 T _init
$

If I use this file to replace the one supplied by pulsar, the original ELF message goes away, but the sql worker still won't start; it continues to complain about the JVM vendor.

$ bin/pulsar sql-worker run
Presto requires an Oracle or OpenJDK JVM (found Private Build)

However, I am using OpenJDK:

$ dpkg-query -l | grep openjdk
ii  openjdk-8-jre-headless:amd64   8u222-b10-1ubuntu1~18.04.1         amd64        OpenJDK Java runtime, using Hotspot JIT (headless)

That error message appears in older versions of the Presto source, but was removed by a commit in Aug 2018. Could it simply be that pulsar is bundling an obsolete version of presto?

I believe the source of the "Private Build" text is here:

$ java -XshowSettings 2>&1 | grep vendor
    java.specification.vendor = Oracle Corporation
    java.vendor = Private Build
    java.vendor.url = http://java.oracle.com/
    java.vendor.url.bug = http://bugreport.sun.com/bugreport/
    java.vm.specification.vendor = Oracle Corporation
    java.vm.vendor = Private Build
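To see exactly which value the check trips over, the sketch below pulls java.vendor out of the -XshowSettings:properties output and applies a vendor test. It assumes, without having checked the Presto 0.206 source line by line, that the check accepts only the exact string "Oracle Corporation"; that assumption is consistent with the -D java.vendor workaround reported later in this thread. Both function names are hypothetical.

```shell
# Extract the value of java.vendor from `java -XshowSettings:properties` output on stdin.
# Matches only the java.vendor line, not java.vendor.url or java.vm.vendor.
java_vendor_from_settings() {
  sed -n 's/^[[:space:]]*java\.vendor = //p'
}

# Hypothetical re-creation of the vendor check, ASSUMING Presto 0.206 accepts
# only the exact string "Oracle Corporation".
presto_vendor_ok() {
  [ "$1" = "Oracle Corporation" ]
}

# Usage (requires a JDK on PATH):
#   vendor=$(java -XshowSettings:properties -version 2>&1 | java_vendor_from_settings)
#   presto_vendor_ok "$vendor" || echo "Presto 0.206 would reject vendor: $vendor"
```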

All 8 comments

"Could it simply be that pulsar is bundling an obsolete version of presto?"

I think that's it:

-rw-r--r-- jia/staff   6858764 2018-09-12 09:54 apache-pulsar-2.4.1/lib/presto/lib/presto-main-0.206.jar

Presto 0.206 was tagged on Jul 17 2018. The current version is 0.226 (Sep 22 2019).
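A quick way to confirm which Presto a given Pulsar binary distribution bundles is to read the version off the presto-main jar name. Minimal sketch, assuming the lib/presto/lib/presto-main-<version>.jar layout seen in the 2.4.1 listing; bundled_presto_version is a hypothetical helper.

```shell
# Print the Presto version bundled with a Pulsar distribution rooted at $1,
# inferred from the presto-main-<version>.jar file name (assumed layout).
bundled_presto_version() {
  for jar in "$1"/lib/presto/lib/presto-main-*.jar; do
    [ -e "$jar" ] || continue    # glob matched nothing
    name=${jar##*/}              # strip the directory part
    name=${name#presto-main-}    # strip the jar prefix
    echo "${name%.jar}"          # strip the .jar suffix
  done
}

# Usage:
#   bundled_presto_version apache-pulsar-2.4.1
```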

@candlerb the presto version was just updated recently in master.

"the presto version was just updated recently in master."

Thanks!

I can't see the commit which does this, and pulsar-sql/presto-distribution/pom.xml still says <presto.version>0.206</presto.version>, but I'll test again when the next pulsar release comes out.
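Likewise, the version a source checkout will fetch can be read from the presto.version property in the pom. This is a minimal sketch that assumes the property is written on a single line, as quoted in the comment above; pom_presto_version is a hypothetical name, and an XML-aware tool would be more robust.

```shell
# Print the <presto.version> property from a Maven pom.xml ($1).
# Assumes the opening and closing tags are on the same line.
pom_presto_version() {
  sed -n 's/.*<presto\.version>\([^<]*\)<\/presto\.version>.*/\1/p' "$1"
}

# Usage:
#   pom_presto_version pulsar-sql/presto-distribution/pom.xml
```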

@candlerb Ah, sorry. The pull request has not been merged yet: https://github.com/apache/pulsar/pull/5386

Has this been fixed? I downloaded the Pulsar 2.4.1 binary from the official page, https://pulsar.apache.org/download/, and the Presto version is still 0.206.

@dramaPainter: pulsar 2.4.1 includes presto 0.206.

Note that #5386 was closed without merging, and git head https://github.com/apache/pulsar/blob/master/pulsar-sql/presto-distribution/pom.xml still fetches presto 0.206.

Therefore, unless this changes, the problem will still be in 2.4.2/2.5.0 when they are released.

Didn't realize #5386 was closed. We will pick it up again.

I got it working with OpenJDK 8 by running:
./bin/pulsar sql-worker run -D "java.vendor"="Oracle Corporation"
I also tested the same thing with OpenJDK 11, but it failed with an unrelated NullPointerException.
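The workaround can be applied only when it is needed. The sketch below reads java.vendor from the JVM and adds the override only if the vendor is not already "Oracle Corporation". The shell joins -D "java.vendor"="Oracle Corporation" into the single argument java.vendor=Oracle Corporation, so the quoting used here is equivalent. It assumes it is run from the Pulsar install directory; run_sql_worker and the JAVA_CMD/PULSAR_CMD overrides are hypothetical. Per the comment above, the override worked on OpenJDK 8 but not on OpenJDK 11.

```shell
# Minimal sketch: start the SQL worker, adding the java.vendor override
# only when the JVM does not already report "Oracle Corporation".
# JAVA_CMD and PULSAR_CMD are hypothetical overrides (defaults: java, ./bin/pulsar).
run_sql_worker() {
  java_cmd=${JAVA_CMD:-java}
  pulsar_cmd=${PULSAR_CMD:-./bin/pulsar}
  # -XshowSettings:properties prints the system properties to stderr.
  vendor=$("$java_cmd" -XshowSettings:properties -version 2>&1 |
    sed -n 's/^[[:space:]]*java\.vendor = //p')
  if [ "$vendor" = "Oracle Corporation" ]; then
    "$pulsar_cmd" sql-worker run "$@"
  else
    # Workaround from this thread: override the vendor string Presto checks.
    "$pulsar_cmd" sql-worker run -D "java.vendor=Oracle Corporation" "$@"
  fi
}

# Usage:
#   run_sql_worker
```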
