I'm getting a segfault when I run wasm-pack with no arguments. I've made some small non-unsafe changes to the codebase.
Sometimes, instead of a segfault, I get a failed assertion in curl-sys:
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
left: `2`,
right: `0`', **/home/.cargo/registry/src/github.com-1ecc6299db9ec823/curl-0.4.20/src/lib.rs:92:13**
which is here: curl assert fail.
Here is the stack trace from the segfault core dump:
May 19 15:44:30 systemd-coredump[11016]: Process 11013 (wasm-pack) of user 1000 dumped core.
Stack trace of thread 11014:
#0 0x00007fb5df46836d __pthread_rwlock_unlock (libpthread.so.0)
#1 0x00007fb5df17e3ba CRYPTO_THREAD_unlock (libcrypto.so.1.1)
#2 0x00007fb5df0f1e07 n/a (libcrypto.so.1.1)
#3 0x00007fb5df0f21c8 ERR_load_strings_const (libcrypto.so.1.1)
#4 0x00007fb5df679aea ERR_load_SSL_strings (libssl.so.1.1)
#5 0x00007fb5df679b1a n/a (libssl.so.1.1)
#6 0x00007fb5df46a5cf __pthread_once_slow (libpthread.so.0)
#7 0x00007fb5df17e40a CRYPTO_THREAD_run_once (libcrypto.so.1.1)
#8 0x00007fb5df679f24 OPENSSL_init_ssl (libssl.so.1.1)
#9 0x0000560063ae2fff n/a (/home/devel/non-work/wasm-pack/target/debug/wasm-pack
wasm-pack version: latest git (e.g. run via cargo run)
rustc version: 1.34.2
Update: I've updated my system, rebooted my computer and done a full rebuild. The segfault has changed slightly: now I occasionally get a message like:
double free or corruption (out)
and the stack trace has changed:
systemd-coredump[7725]: Process 7721 (wasm-pack) of user 1000 dumped core.
Stack trace of thread 7723:
#0 0x00007fbfae74e36d __pthread_rwlock_unlock (libpthread.so.0)
#1 0x00007fbfae3d23ba CRYPTO_THREAD_unlock (libcrypto.so.1.1)
#2 0x00007fbfae345e07 n/a (libcrypto.so.1.1)
#3 0x00007fbfae3461c8 ERR_load_strings_const (libcrypto.so.1.1)
#4 0x00007fbfae2a3a8d ERR_load_BN_strings (libcrypto.so.1.1)
#5 0x00007fbfae3472de n/a (libcrypto.so.1.1)
#6 0x00007fbfae3679ca n/a (libcrypto.so.1.1)
#7 0x00007fbfae7505cf __pthread_once_slow (libpthread.so.0)
#8 0x00007fbfae3d240a CRYPTO_THREAD_run_once (libcrypto.so.1.1)
#9 0x00007fbfae367ec6 OPENSSL_init_crypto (libcrypto.so.1.1)
#10 0x00007fbfae346792 ERR_get_state (libcrypto.so.1.1)
#11 0x00007fbfae3468da ERR_clear_error (libcrypto.so.1.1)
#12 0x00007fbfae3678da n/a (libcrypto.so.1.1)
#13 0x00007fbfae7505cf __pthread_once_slow (libpthread.so.0)
#14 0x00007fbfae3d240a CRYPTO_THREAD_run_once (libcrypto.so.1.1)
#15 0x00007fbfae3680a3 OPENSSL_init_crypto (libcrypto.so.1.1)
#16 0x00007fbfae993d72 n/a (libcurl.so.4)
#17 0x00007fbfae958ffe n/a (libcurl.so.4)
#18 0x000055ee7dda8f8a n/a (/home/devel/non-work/wasm-pack/target/release/wasm-pack
Hi @derekdreery! Thanks for the bug report! What OS do you have? I have no problem running wasm-pack without any arguments on a Linux x64 system. I built the latest master with cargo build and then ran it like this: ./target/debug/wasm-pack.
You might have some problem with curl on your OS.
Maybe @alexcrichton has some ideas?
Yeah this unfortunately looks like an issue with the build process or the built binary, probably unrelated to wasm-pack itself. The prebuilt binaries should likely work for you though.
The interesting thing is that the program works fine without the changes I made. So it seems that the changes introduce the error. I've just done a sys upgrade so I will re-test to see if the problem has gone.
@derekdreery are you still running into this? let us know!
Hey, I'll have another go! :)
I've still got the issue, but I'm also seeing it with the master branch, so it's unrelated to my patch. It's probably a version mismatch somewhere between a library and its bindings.
Just to follow up, it's fine to close this issue. I'd still personally like to get to the bottom of what's happening, but I lack the skills.
I'm seeing what's probably the same segfault. It looks like a race condition within openssl triggered by exiting (running atexit handlers) while the thread spawned by background_check_for_updates is initializing openssl.
Even if openssl handled shutdown mid-initialization, I suspect you'd still be in trouble if the version check thread gets as far as actually using openssl while the main thread exits the process. It'd be nice if wasm-pack could wait for that other thread before exiting, but unless it can interrupt the check in flight I'm guessing that'd be a bit slow... Haven't investigated what it would take to interrupt the check and exit cleanly.
I'm also seeing this with a locally built wasm-pack (from revision 30a95a42f14abb5b7d6e50f98256da7400046199, built with Rust 1.35.0, with a local change that shouldn't matter if we exit this early). I haven't investigated if official builds of wasm-pack are less crashy (or why that might be).
The exact stack of the crashing thread varies, but usually looks like:
Thread 1 (Thread 0x7f9ca167f700 (LWP 17184)):
#0 __pthread_rwlock_wrlock_full (abstime=0x0, rwlock=0x0) at pthread_rwlock_common.c:581
#1 __GI___pthread_rwlock_wrlock (rwlock=0x0) at pthread_rwlock_wrlock.c:27
#2 0x00007f6461bb6109 in CRYPTO_THREAD_write_lock (lock=<optimized out>) at crypto/threads_pthread.c:66
#3 0x00007f9ca196c245 in OBJ_NAME_add (name=0x7f9ca19e0500 "md5", type=type@entry=1, data=data@entry=0x7f9ca1a5ef20 <md5_md> "\004") at crypto/objects/o_names.c:230
...
#5 0x00007f9ca207aa4f in ossl_init_ssl_base () at ssl/ssl_init.c:77
...
#10 0x00007f646227abdb in OPENSSL_init_ssl (opts=2097152, settings=<optimized out>) at ssl/ssl_init.c:201
...
#12 0x000055af94bf51e0 in openssl_sys::init ()
#13 0x000055af94bf1eb5 in std::sync::once::Once::call_once::{{closure}} ()
#14 0x000055af94d4cf16 in call_inner () at src/libstd/sync/once.rs:387
...
#18 0x000055af94b5d2c3 in wasm_pack::manifest::Crate::return_wasm_pack_latest_version ()
...
There's another thread that looks like:
Thread 2 (Thread 0x7f9ca1680980 (LWP 17183)):
#0 0x00007f9ca1c690f8 in _fini () from /usr/lib64/libsmime3.so
#1 0x00007f9ca216a605 in _dl_fini () at dl-fini.c:143
#2 0x00007f9ca1ebd8fc in __run_exit_handlers (status=1, listp=0x7f9ca2049718 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#3 0x00007f9ca1ebda4a in __GI_exit (status=<optimized out>) at exit.c:139
#4 0x00005565384311b7 in std::sys::unix::os::exit () at src/libstd/sys/unix/os.rs:541
#5 0x000055653842fc3f in std::process::exit () at src/libstd/process.rs:1485
#6 0x000055653830c1cd in clap::app::App::get_matches ()
#7 0x00005565381e84a1 in wasm_pack::main ()
...
or:
Thread 2 (Thread 0x7f6461880980 (LWP 17749)):
#0 0x00007f6462369995 in _dl_fixup (l=0x7f64620408b0, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:93
#1 0x00007f646237035a in _dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:126
#2 0x00007f6461d358a3 in __do_global_dtors_aux () from /usr/lib64/libnss3.so
#3 0x00007ffccfedee00 in ?? ()
#4 0x00007f646236a5e3 in _dl_fini () at dl-fini.c:138
Backtrace stopped: frame did not save the PC
Looking at openssl (I'm using 1.1.0k), it uses pthread_once (or equivalent) to run its initialization function once, and quite early in that initialization it registers an atexit handler that tears everything down. We're crashing trying to use a static lock in o_names.c that got nulled out by OBJ_NAME_cleanup. That function is called from evp_cleanup_int, which is called by OPENSSL_cleanup, which is registered as an atexit handler by ossl_init_base, which runs early in OPENSSL_init_crypto. OPENSSL_init_crypto in turn gets called by OPENSSL_init_ssl before the call to ossl_init_ssl_base that the crashing thread is in.
The backtraces from @derekdreery also contain openssl initialization functions and also occur when exiting immediately, so I think it's the same crash.
The curl-sys assert was curl failing to initialize. I haven't confirmed it has a similar race (and handles it better by failing initialization instead of crashing) but it seems plausible.
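To make the ordering described above easier to follow, here is a toy, single-threaded model of it in Rust. None of this is OpenSSL code: the `Library` type, its fields, and the two `interleaving_*` functions are hypothetical, and the method names only mirror the functions named in the comment above. The model replays the steps in the order the two threads would hit them, and reports an error where the real process dereferences a nulled lock.

```rust
/// Toy model of the teardown ordering; not OpenSSL code.
#[derive(Default)]
struct Library {
    /// Models the static lock in o_names.c; `None` plays the nulled pointer.
    names_lock: Option<()>,
    /// Models the process-wide list of atexit handlers.
    exit_handlers: Vec<fn(&mut Library)>,
}

impl Library {
    /// Models the early init step: create state, register the cleanup.
    fn init_base(&mut self) {
        self.names_lock = Some(());
        self.exit_handlers.push(Library::cleanup);
    }

    /// Models the registered teardown, which clears the lock.
    fn cleanup(&mut self) {
        self.names_lock = None;
    }

    /// Models the later init step, which needs the lock to exist.
    fn init_ssl_base(&self) -> Result<(), &'static str> {
        self.names_lock.ok_or("lock already torn down")
    }

    /// Models the main thread calling exit(): run the registered handlers.
    fn run_exit_handlers(&mut self) {
        while let Some(handler) = self.exit_handlers.pop() {
            handler(self);
        }
    }
}

/// Initialization runs to completion with no exit in between.
fn interleaving_without_exit() -> Result<(), &'static str> {
    let mut lib = Library::default();
    lib.init_base();
    lib.init_ssl_base()
}

/// The main thread exits between the two initialization steps.
fn interleaving_with_exit() -> Result<(), &'static str> {
    let mut lib = Library::default();
    lib.init_base(); // worker thread: early init registers cleanup
    lib.run_exit_handlers(); // main thread: exit() runs the cleanup
    lib.init_ssl_base() // worker thread: later step finds no lock
}

fn main() {
    println!("{:?}", interleaving_without_exit());
    println!("{:?}", interleaving_with_exit());
}
```

In the real process the second interleaving is not a recoverable error but a write lock taken on a null rwlock, which matches the `rwlock=0x0` frames in the crashing thread's backtrace.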
@marienz great work, I didn't know enough about ssl/curl to work out what was going on.
I've now recreated this with the pre-built binary.
So basically, if you initialize openssl in a thread that's not the main thread, you can get a race condition between an atexit handler on the process, and the initialization function. So you must run OPENSSL_init_ssl on the main thread, or synchronize it finishing with any exit call.
@marienz does that sound right?
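A minimal sketch of the second option (synchronizing the end of initialization with any exit), assuming a hypothetical `init_tls_library` stand-in for whatever ends up calling OPENSSL_init_ssl. The function names and the channel are illustrative and not taken from wasm-pack. It only closes the initialization window, so it does not cover a thread that is still using the library when the process exits.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the TLS library's one-time initialization.
fn init_tls_library() {
    // Real code would trigger the library's init here.
}

/// Spawns the background check and blocks until the worker reports that
/// initialization has finished. Returns false if the worker died first.
fn spawn_check_and_wait_for_init() -> bool {
    let (init_done_tx, init_done_rx) = mpsc::channel::<()>();
    thread::spawn(move || {
        init_tls_library();
        // Tell the main thread that initialization is complete.
        let _ = init_done_tx.send(());
        // The network request for the version check would follow here.
    });
    // recv() returns Err if the sender was dropped without sending,
    // for example because the worker panicked during initialization.
    init_done_rx.recv().is_ok()
}

fn main() {
    let initialized = spawn_check_and_wait_for_init();
    // Exit handlers can no longer overlap with initialization.
    std::process::exit(if initialized { 0 } else { 1 });
}
```

The first option is simpler still: call the initializer on the main thread before spawning the worker, so the worker never runs it.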
I don't think this is limited to initialization. I think if we run atexit handlers on the main thread while another thread is still using openssl (either initializing it or actually using it to talk to something over the network) we'll potentially crash.
Moving openssl initialization to the main thread might end up making this harder to hit, because anything involving the network is most likely slower than running atexit handlers and exiting. If the version-checking thread is blocked in a network-related syscall while we exit, we shouldn't crash.
But I'd argue the "proper" fix here is to just join the version-checking thread from the main thread, after making sure wasm-pack does not do its version check more often than necessary (#653 and related issues). This will make wasm-pack a little slower to exit when it decides it has to do the version check and it doesn't have to do any work at the same time, but it should be pretty rare to hit both of those conditions simultaneously (for example: you'd never see this twice in a row). And if wasm-pack doing actual work is faster than the version check, waiting for the version check seems like a good idea (we do want that check to go through every now and then... if wasm-pack is consistently "too fast" the version check will never happen).
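A minimal sketch of the join-before-exit approach, using only the standard library. `check_for_updates`, `run_command`, and `run_with_version_check` are hypothetical stand-ins, not wasm-pack's actual functions, and the sleep merely simulates network latency.

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the version check; the real one performs a
/// network request through curl/openssl.
fn check_for_updates() -> Option<String> {
    thread::sleep(Duration::from_millis(50)); // simulated network latency
    Some("0.0.0-example".to_string())
}

/// Hypothetical stand-in for the actual command; returns the exit code.
fn run_command() -> i32 {
    0
}

/// Runs the command while the check is in flight, then joins the checker
/// before returning, so no thread is still inside the TLS library when
/// the caller exits the process.
fn run_with_version_check() -> (i32, Option<String>) {
    let checker: JoinHandle<Option<String>> = thread::spawn(check_for_updates);
    let code = run_command();
    // A panic in the checker is treated as "no version information".
    let latest = checker.join().unwrap_or(None);
    (code, latest)
}

fn main() {
    let (code, latest) = run_with_version_check();
    if let Some(version) = latest {
        eprintln!("latest version reported by the check: {}", version);
    }
    // Exit only after the checker has been joined.
    std::process::exit(code);
}
```

For this to help, every early exit has to happen either before the spawn or after the join. The Thread 2 backtrace above shows clap::app::App::get_matches calling std::process::exit, so in an arrangement like this, argument parsing would presumably need to run before the checker thread is started.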
Hi everyone!
I ran into a similar issue with 0.9.1. The race condition surfaces when wasm-pack is compiled in release mode, but does not appear if it's compiled in debug mode.
My configuration:
OS: NixOS 19.09
rustc version: 1.41.0
GCC version: 9.2.0
OpenSSL version: 1.1.1d
I'm happy to report that with LibreSSL 3.0.2, the race conditions I experienced are gone in both release and debug modes.
@dhl thanks for letting us know!! do you think we should document this somewhere?
I would love to see this documented somewhere, but maybe it's a bit too early to say LibreSSL fixes the segfaults before others could confirm the same?
Thanks for the great work on wasm-pack, by the way! :heart:
I had a quick look to see if there's an obvious difference between LibreSSL and OpenSSL, or if it's just more timing changes hiding the problem. There is a real difference, but I don't know if it's intentional or not.
Comparing OpenSSL 1.1.1d and LibreSSL 3.0.2, the important difference is that I see no atexit() handlers in LibreSSL.
However, that might be because LibreSSL currently does not need them, rather than by design. The lock whose cleanup triggered the segfault I saw before is also not present in libressl. It looks like this lock was introduced in openssl relatively recently (https://github.com/openssl/openssl/pull/3525), and (assuming https://github.com/libressl-portable/openbsd/commits/master/src/lib/libcrypto/objects/o_names.c is current) libressl has not yet picked up those changes.
So it's possible libressl still has that race condition, and that if they choose to fix it the same way openssl did it'll start crashing wasm-pack the same way. (Or if they start using atexit() for cleanup for other reasons.) Unless someone confirms libressl intentionally does not use atexit handlers I wouldn't recommend LibreSSL as the fix for this one.
#653 (and any other changes that rate-limit the version check) might mostly hide this issue.