Node: zlib Illegal Instruction on CPU without SSSE3 support

Created on 29 Mar 2020  Â·  19Comments  Â·  Source: nodejs/node

  • Node.js Version: 13 (works on 12) from nodesource ubuntu repos
  • OS: Ubuntu 16.04
  • Scope (install, code, runtime, meta, other?): npm install

Running npm i for the following package.json:

  "devDependencies": {
    "@types/react": "^16.9.26",
    "@types/react-dom": "^16.9.5",
    "@types/styled-components": "^5.0.1",
    "parcel": "^1.12.4",
    "typescript": "^3.8.3"
  },
  "dependencies": {
    "react": "^16.13.1",
    "react-dom": "^16.13.1",
    "styled-components": "^5.0.1"
  }

gives Illegal Instruction. In gdb, the backtrace is

(gdb) bt
#0  0x000000000132a53f in fill_window ()
nodejs/help#1  0x000000000132bdf0 in deflate_slow ()
nodejs/help#2  0x000000000132d06f in deflate ()
nodejs/help#3  0x0000000000ac098c in node::(anonymous namespace)::ZlibContext::DoThreadPoolWork() ()
nodejs/help#4  0x000000000134170e in worker (arg=0x0) at ../deps/uv/src/threadpool.c:122
nodejs/help#5  0x00007ffff711c6ba in start_thread (arg=0x7fffde7fc700) at pthread_create.c:333
nodejs/help#6  0x00007ffff6e5241d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

With specifically >│0x132a53f <fill_window+1007> pshufb 0xebc708(%rip),%xmm0 # 0x21e6c50 being the issue, where pshufb is an SSSE3 instruction not supported by the CPU

zlib

Most helpful comment

The portable C code should work fine without SSSE3 flags.

Chromium has SSE2 as minimal requirements (https://support.google.com/chrome/a/answer/7100626?hl=en), so crashing on a Xeon without SSSE3 is not acceptable.

@addaleax yes, we are moving towards using __attribute__(target()) (e.g. https://cs.chromium.org/chromium/src/third_party/zlib/contrib/optimizations/insert_string.h?l=23) which I believe is better than using explicit compiler switches.

IIRC, node has its own build system so it should be simple to fix (i.e. don't pass the compiler flag to deflate.c).

If that requires further changes (i.e. moving code around), we can coordinate the work to make sure that upstream (Chromium zlib) adopts the same fixes.

I will have breakfast and start looking on which changes would be required on Chromium side.

All 19 comments

@mscdex This was almost certainly introduced in https://github.com/nodejs/node/pull/31201/.

That is a bit odd, since we guard the optimization on a flag that will be set only if the CPU supports the required instructions (https://cs.chromium.org/chromium/src/third_party/zlib/deflate.c?l=1525).

It seems that SSE3 was introduced in early 2004 (https://en.wikipedia.org/wiki/SSE3), or 16 years ago.

Unfortunately I don't have any hardware available without SSE3 to debug.

Could you provide further details about the environment? (i.e. CPU model, etc)

@Adenilson It looks like it’s fill_window_c, the plain C version of fill_window, and not fill_window_sse that’s leading to the pshufb being generated. It’s a bit ironic, but I think it makes sense given that the file is compiled with -mssse3 – that leaves the compiler the option to use SSSE3, whether it’s used through intrinsics or not.

I would have assumed that instead of passing compiler flags, the optimized zlib code uses __attribute__((target(...))) to specify that a function may contain code for certain CPU features. Is changing that an option for you?

(Also, minor side note just so we’re on the same page: SSSE3 !== SSE3.)

That makes sense, the 7100 series were probably the last Xeons without SSSE3 and SSE4.2 support.

I'm not sure what to do here. This kind of breakage in a release line isn't great but on the other hand, we're talking 14 year old systems here.

I'm inclined to mark the PR as "don't backport to the LTS branches" (edit: or remove the -mssse3 -msse4.2 -mpclmul flags when backporting) and leave it at that. Thoughts?

Agreed @bnoordhuis, even though we don't have any guarantees on which CPUs are supported within a major release (at least not according to https://github.com/nodejs/node/blob/master/BUILDING.md), we should prevent this breaking change at least on LTS.

or remove the -mssse3 -msse4.2 -mpclmul flags when backporting

We should test this firsts to make sure it works. Any tips on how @Strum355 can test it?

A docker image would probably be a convenient way for me to test this if that suits :)

The portable C code should work fine without SSSE3 flags.

Chromium has SSE2 as minimal requirements (https://support.google.com/chrome/a/answer/7100626?hl=en), so crashing on a Xeon without SSSE3 is not acceptable.

@addaleax yes, we are moving towards using __attribute__(target()) (e.g. https://cs.chromium.org/chromium/src/third_party/zlib/contrib/optimizations/insert_string.h?l=23) which I believe is better than using explicit compiler switches.

IIRC, node has its own build system so it should be simple to fix (i.e. don't pass the compiler flag to deflate.c).

If that requires further changes (i.e. moving code around), we can coordinate the work to make sure that upstream (Chromium zlib) adopts the same fixes.

I will have breakfast and start looking on which changes would be required on Chromium side.

Would be ok to share a test binary here so we validate if it is crashing (or not) on a machine without SSSE3?

Or maybe someone could provide me remote access to such machine?

Fwiw, here’s the somewhat straightforward diff I put together for testing this:

diff in the fold

diff --git a/deps/zlib/adler32_simd.c b/deps/zlib/adler32_simd.c
index 1354915cc099..0170c8df870d 100644
--- a/deps/zlib/adler32_simd.c
+++ b/deps/zlib/adler32_simd.c
@@ -53,6 +53,7 @@

 #include <tmmintrin.h>

+__attribute__((target("ssse3")))
 uint32_t ZLIB_INTERNAL adler32_simd_(  /* SSSE3 */
     uint32_t adler,
     const unsigned char *buf,
diff --git a/deps/zlib/crc32_simd.c b/deps/zlib/crc32_simd.c
index c8e5592f38ef..a77d629f2288 100644
--- a/deps/zlib/crc32_simd.c
+++ b/deps/zlib/crc32_simd.c
@@ -21,6 +21,7 @@
 #include <smmintrin.h>
 #include <wmmintrin.h>

+__attribute__((target("sse4.2,pclmul")))
 uint32_t ZLIB_INTERNAL crc32_sse42_simd_(  /* SSE4.2+PCLMUL */
     const unsigned char *buf,
     z_size_t len,
diff --git a/deps/zlib/crc_folding.c b/deps/zlib/crc_folding.c
index 48d77744aaf4..24621d6c5449 100644
--- a/deps/zlib/crc_folding.c
+++ b/deps/zlib/crc_folding.c
@@ -39,6 +39,7 @@
         _mm_storeu_si128((__m128i *)s->crc0 + 4, xmm_crc_part);\
     } while (0);

+__attribute__((target("sse4.2,pclmul")))
 ZLIB_INTERNAL void crc_fold_init(deflate_state *const s)
 {
     CRC_LOAD(s)
@@ -53,6 +54,7 @@ ZLIB_INTERNAL void crc_fold_init(deflate_state *const s)
     s->strm->adler = 0;
 }

+__attribute__((target("sse4.2,pclmul")))
 local void fold_1(deflate_state *const s,
         __m128i *xmm_crc0, __m128i *xmm_crc1,
         __m128i *xmm_crc2, __m128i *xmm_crc3)
@@ -79,6 +81,7 @@ local void fold_1(deflate_state *const s,
     *xmm_crc3 = _mm_castps_si128(ps_res);
 }

+__attribute__((target("sse4.2,pclmul")))
 local void fold_2(deflate_state *const s,
         __m128i *xmm_crc0, __m128i *xmm_crc1,
         __m128i *xmm_crc2, __m128i *xmm_crc3)
@@ -113,6 +116,7 @@ local void fold_2(deflate_state *const s,
     *xmm_crc3 = _mm_castps_si128(ps_res31);
 }

+__attribute__((target("sse4.2,pclmul")))
 local void fold_3(deflate_state *const s,
         __m128i *xmm_crc0, __m128i *xmm_crc1,
         __m128i *xmm_crc2, __m128i *xmm_crc3)
@@ -153,6 +157,7 @@ local void fold_3(deflate_state *const s,
     *xmm_crc3 = _mm_castps_si128(ps_res32);
 }

+__attribute__((target("sse4.2,pclmul")))
 local void fold_4(deflate_state *const s,
         __m128i *xmm_crc0, __m128i *xmm_crc1,
         __m128i *xmm_crc2, __m128i *xmm_crc3)
@@ -219,6 +224,7 @@ local const unsigned zalign(32) pshufb_shf_table[60] = {
    0x0201008f,0x06050403,0x0a090807,0x0e0d0c0b  /* shl  1 (16 -15)/shr15*/
 };

+__attribute__((target("sse4.2,pclmul")))
 local void partial_fold(deflate_state *const s, const size_t len,
         __m128i *xmm_crc0, __m128i *xmm_crc1,
         __m128i *xmm_crc2, __m128i *xmm_crc3,
@@ -269,6 +275,7 @@ local void partial_fold(deflate_state *const s, const size_t len,
     *xmm_crc3 = _mm_castps_si128(ps_res);
 }

+__attribute__((target("sse4.2,pclmul")))
 ZLIB_INTERNAL void crc_fold_copy(deflate_state *const s,
         unsigned char *dst, const unsigned char *src, long len)
 {
@@ -425,6 +432,7 @@ local const unsigned zalign(16) crc_mask2[4] = {
     0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF
 };

+__attribute__((target("sse4.2,pclmul")))
 unsigned ZLIB_INTERNAL crc_fold_512to32(deflate_state *const s)
 {
     const __m128i xmm_mask  = _mm_load_si128((__m128i *)crc_mask);
diff --git a/deps/zlib/zlib.gyp b/deps/zlib/zlib.gyp
index c73b9adba902..6f5a8ce3464b 100644
--- a/deps/zlib/zlib.gyp
+++ b/deps/zlib/zlib.gyp
@@ -91,20 +91,6 @@
                 'x86.c',
               ],
               'conditions': [
-                ['OS!="win" or llvm_version!="0.0"', {
-                  'cflags': [
-                    '-mssse3',
-                    '-msse4.2',
-                    '-mpclmul',
-                  ],
-                  'xcode_settings': {
-                    'OTHER_CFLAGS': [
-                      '-mssse3',
-                      '-msse4.2',
-                      '-mpclmul',
-                    ],
-                  },
-                }],
                 ['target_arch=="x64"', {
                   'defines': [ 'INFLATE_CHUNK_READ_64LE' ],
                 }],

I assume that would need to be polished up a bit, but I can upload the node binary that I’m getting from it, maybe that helps: https://addaleax.net/node-32553.gz

@addaleax any chance that could be built with glibc 2.23? Ubuntu 16.04 only provides 2.23 in the repos, upgrading to 2.31 fails because locales requires a higher libc-bin version, which requires a higher libc6 version which requires a higher locales version...

@Strum355 I … probably don’t want to downgrade glibc on my laptop. :sweat_smile:

It probably doesn’t make sense to put a test build together from this, but I can see if I could build that patch on one of the Ubuntu 16.04 CI machines.

Upon a closer look, it doesn't seem to be an issue upstream.

We have separated modules with their own flags, for optimized modules (https://cs.chromium.org/chromium/src/third_party/zlib/BUILD.gn?l=201) and vanilla ones (https://cs.chromium.org/chromium/src/third_party/zlib/BUILD.gn?l=246).

My GYP-fu is a bit rusty, but my guess is that you shouldn't use this compiler switches (https://github.com/nodejs/node/blob/master/deps/zlib/zlib.gyp#L94) on the portable code.

While we work upstream towards removing compiler switches, I guess the quick fix (without diverging from upstream) for Node would be to follow the same build strategy as Chromium (i.e. use source modules with the same compiler switches).

Does that sound reasonable?

Yeah, if somebody from @nodejs/build-files knows how to specify build flags only for specific files, I guess that works too.

@addaleax had the better idea of just running an ubuntu:18.04 container with glib 2.27, the binary you provided didnt SIGILL :+1:

Yeah, if somebody from @nodejs/build-files knows how to specify build flags only for specific files, I guess that works too.

I think we'd have to split the files into a new static library target and set the flags for that.

cc @nodejs/gyp

That's right. cflags are per-target.

Yeah, if somebody from @nodejs/build-files knows how to specify build flags only for specific files, I guess that works too.

Excelent! Please feel free to reach out if you guys run into any unexpected issue.
:-)

Opened https://github.com/nodejs/node/pull/32627 which is mostly the same as my patch above, since that has been confirmed to work for two people now. If somebody comes up with a gyp-only solution, great, but I’d like to get this resolved in time for v14.0.0 one way or another.

Was this page helpful?
0 / 5 - 0 ratings