Mandatory Information
jcli 0.5.5 (master-c3f2dcc4+, release, linux [x86_64]) - [rustc 1.38.0 (625451e37 2019-09-23)]jormungandr 0.5.5 (master-c3f2dcc4+, release, linux [x86_64]) - [rustc 1.38.0 (625451e37 2019-09-23)]To Reproduce
$RUST_BACKTRACE=1 jormungandr --config node-config.yaml --genesis-block-hash adbdd5ede31637f6c9bad5c271eec0bc3d0cb9efb86a5b913bb55cba549d07
70 --secret node-secret.yaml --log-level debug
0: error_chain::backtrace::imp::InternalBacktrace::new
1: <error_chain::State as core::default::Default>::default
2: jormungandr::blockchain::chain::Blockchain::post_check_header
3: futures::future::chain::Chain<A,B,C>::poll
4: futures::future::chain::Chain<A,B,C>::poll
5: <futures::future::map::Map<A,F> as futures::future::Future>::poll
6: futures::task_impl::std::set
7: std::thread::local::LocalKey<T>::with
8: futures::future::Future::wait
9: jormungandr::blockchain::process::handle_input
10: <futures::stream::for_each::ForEach<S,F,U> as futures::future::Future>::poll
11: std::panicking::try::do_call
12: __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:80
13: <futures::future::catch_unwind::CatchUnwind<F> as futures::future::Future>::poll
14: <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll
15: futures::task_impl::std::set
16: std::panicking::try::do_call
17: __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:80
18: tokio_threadpool::task::Task::run
19: tokio_threadpool::worker::Worker::run_task
20: tokio_threadpool::worker::Worker::run
21: std::thread::local::LocalKey<T>::with
22: std::thread::local::LocalKey<T>::with
23: std::thread::local::LocalKey<T>::with
24: tokio::runtime::threadpool::builder::Builder::build::{{closure}}
25: std::thread::local::LocalKey<T>::with
26: std::thread::local::LocalKey<T>::with
27: std::sys_common::backtrace::__rust_begin_short_backtrace
28: std::panicking::try::do_call
29: __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:80
30: core::ops::function::FnOnce::call_once{{vtable.shim}}
31: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
at /rustc/625451e376bb2e5283fc4741caa0a3e8a2ca4d54/src/liballoc/boxed.rs:787
32: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
at /rustc/625451e376bb2e5283fc4741caa0a3e8a2ca4d54/src/liballoc/boxed.rs:787
std::sys_common::thread::start_thread
at src/libstd/sys_common/thread.rs:13
std::sys::unix::thread::Thread::new::thread_start
at src/libstd/sys/unix/thread.rs:79
33: start_thread
34: __clone
) } })
thread 'leadership0' panicked at 'Expect the event to not close', src/libcore/option.rs:1166:5
stack backtrace:
0: backtrace::backtrace::libunwind::trace
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.34/src/backtrace/libunwind.rs:88
1: backtrace::backtrace::trace_unsynchronized
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.34/src/backtrace/mod.rs
Thanks, the issue is the one just above. We are looking into it. The second one Expect the event to not close is a side effect of the first issue. What are the few lines above?
I captured this prior to running backtrace.
Oct 04 09:52:26.002 INFO leader event starting, date: 224.3566, leader: 1, task: leadership
Oct 04 09:52:26.003 DEBG processing block from leader event, date: 224.3566, parent: aa8f5206849d8c9ae5a8520a3fc03bcc0bfb1cfd2ac234ed269e80abc8f29210, hash: 0876e3f069bca
2ee133a5872803e857cfa7bddf1ad70fa2f32d6982d1c78b334, task: block
thread 'block0' panicked at 'called Result::unwrap() on an Err value: Error(BlockHeaderVerificationFailed("Block signature is invalid"), State { next_error: None, bac
ktrace: InternalBacktrace { backtrace: None } })', src/libcore/result.rs:1084:5
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace.
called Result::unwrap() on an Err value: Error(BlockHeaderVerificationFailed("Block signature is invalid"), State { next_error: None, backtrace: InternalBacktrace { b
acktrace: None } })thread 'leadership0' panicked at 'Expect the event to not close
', src/libcore/option.rs:1166:5
Expect the event to not close
did you register a stakepool at the last epoch (yesterday)?
My pool was registered several days ago, yesterday I delegated stake to it.
The pool ID:
0f30fcf32277b69afe537d6af47f9ebc04955e8ea138d703981cc7420bab5290
I am experiencing the same issue. It only happens /w a registered stake pool.
But why does a node panic when a Block's header verification failed (message is corrupted, invalid or any other issues with any incoming data)?
As, every node, at least, should be highly resilient (would say silver-bullet-proof) to any message they're receiving from any (adversary or friendly) nodes or servers as otherwise the whole network can be shut down /w some malformed message.
@ilap You are correct, there should be proper error handling instead of a panic on invalid input, and we still have a few places that were left with a quick and dirty unwrap.
i ran this passive - without secret node
Oct 06 14:32:33.249 INFO update current branch tip, date: 225.40769, parent: 1de18f2726aa138a83351f43d5d72e7ea8caee262a9f75acb8e5674a9cc7c7cd, hash: b4b9077be0c6e6305d2491759c7887558dd9f5ff0be36736343dff17b547f36c, task: block
thread 'block0' panicked at 'called Result::unwrap() on an Err value: Error(Msg("cannot check if the block is in the storage"), State { next_error: Some(BackendError(SqliteFailure(Error { code: CannotOpen, extended_code: 14 }, Some("unable to open database file")))), backtrace: InternalBacktrace { backtrace: None } })', src/libcore/result.rs:1084:5
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace.
called Result::unwrap() on an Err value: Error(Msg("cannot check if the block is in the storage"), State { next_error: Some(BackendError(SqliteFailure(Error { code: CannotOpen, extended_code: 14 }, Some("unable to open database file")))), backtrace: InternalBacktrace { backtrace: None } })
thread 'leadership0' panicked at 'Expect the event to not close', src/libcore/option.rs:1166:5
Thanks, the issue is the one just above. We are looking into it. The second one
Expect the event to not closeis a side effect of the first issue. What are the few lines above?
Interestingly, it's working now. Is it due to the fact that we've entered to the next epoch?
It still crashes bit for different reason.
It seams to me some memory, memleak or ulimit issue as the rust crashed /w memory allocation of 694 bytes failedAborted.
So, how the mem is cleaned /w GC?
Also what is the recommended ulimit settings for a stake-pools.
As the srv has enough memory, 4GB, and jormangundr only used ~370MB total (~270MB heap, 128KB stack, and the others), about 15 mins before the crash.
What鈥檚 your --full-version?
What鈥檚 your
--full-version?
jormungandr 0.5.5 (HEAD-aec29a1, release, linux [x86_64]) - [rustc 1.38.0 (625451e37 2019-09-23)]
But, it's a bit misleading, as it's running on a 32bit OS, and I use the following binary:
jormungandr-v0.5.5-armv7-unknown-linux-gnueabihf.tar.gz
Here comes the total memory consumption of jormungandr (collected in every minute) since I started it:
Rss: 69692 kB
Rss: 86908 kB
Rss: 100616 kB
Rss: 114320 kB
Rss: 128164 kB
Rss: 149320 kB
Rss: 291524 kB
Rss: 292716 kB
Rss: 292968 kB
Rss: 309044 kB
Rss: 334752 kB
Rss: 366172 kB
Rss: 369904 kB
Rss: 371684 kB
Rss: 378924 kB
Rss: 381676 kB
Rss: 387240 kB
Rss: 393340 kB
Rss: 404864 kB
Rss: 410200 kB
Rss: 410208 kB
Rss: 410220 kB
Rss: 410236 kB
Rss: 404812 kB
Rss: 404836 kB
Rss: 404844 kB
Rss: 407488 kB
Ok, it works now and it seems to me stable (running more than an our /w same memory).
The issue was that I accidentally DID NOT download and use the
jormungandr-v0.5.5-armv7-unknown-linux-gnueabihf.tar.gz (for newer ARMHF CPUs) I mentioned above, but the other (the older one for ARMEL)
jormungandr-v0.5.5-arm-unknown-linux-gnueabi.tar.gz, which behaved very unstable.
i ran this passive - without secret node
Oct 06 14:32:33.249 INFO update current branch tip, date: 225.40769, parent: 1de18f2726aa138a83351f43d5d72e7ea8caee262a9f75acb8e5674a9cc7c7cd, hash: b4b9077be0c6e6305d2491759c7887558dd9f5ff0be36736343dff17b547f36c, task: block
thread 'block0' panicked at 'calledResult::unwrap()on anErrvalue: Error(Msg("cannot check if the block is in the storage"), State { next_error: Some(BackendError(SqliteFailure(Error { code: CannotOpen, extended_code: 14 }, Some("unable to open database file")))), backtrace: InternalBacktrace { backtrace: None } })', src/libcore/result.rs:1084:5
note: run withRUST_BACKTRACE=1environment variable to display a backtrace.
calledResult::unwrap()on anErrvalue: Error(Msg("cannot check if the block is in the storage"), State { next_error: Some(BackendError(SqliteFailure(Error { code: CannotOpen, extended_code: 14 }, Some("unable to open database file")))), backtrace: InternalBacktrace { backtrace: None } })
thread 'leadership0' panicked at 'Expect the event to not close', src/libcore/option.rs:1166:5
My node crashed with this same error on version:
jormungandr 0.5.6 (master-8d96e03c+, release, linux [x86_64]) - [rustc 1.38.0 (625451e37 2019-09-23)]
Did you delete the storage file in between ?
Did you delete the storage file in between ?
I deleted the contents of /tmp/jormungandr/ prior to updating to 0.5.6, with the expectation that the files would be repopulated. I can see now that I am missing blocks.sqlite-shm and blocks.sqlite-wal. Funny thing is that the newly updated node ran overnight with no problems (was receiving blocks).
Hi,
im facing the same issue.
Oct 09 16:45:17.303 INFO Starting jormungandr 0.5.6+lock (master-5ddbe91, release, linux [x86_64]) - [rustc 1.38.0 (625451e37 2019-09-23)], task: init
Oct 09 16:45:17.304 INFO storing blockchain in '"/root/IOHK/jor-test/jormungandr/scripts/jormungandr-storage-test/blocks.sqlite"', task: init
Oct 09 16:45:17.305 WARN No trusted peers joinable to bootstrap the network, task: bootstrap
Oct 09 16:45:17.306 INFO starting task, task: client-query
Oct 09 16:45:17.306 INFO starting task, task: network
Oct 09 16:45:17.306 INFO adding P2P Topology module: trusted-peers, task: network
Oct 09 16:45:17.306 INFO start listening and accepting gRPC connections on 127.0.0.1:8299, task: network
thread 'leadership0' panicked at 'Expect the event to not close', src/libcore/option.rs:1166:5
note: run withRUST_BACKTRACE=1environment variable to display a backtrace.
Expect the event to not close
I did compine the sources multiple times, but it didn't work...
Any Idea?
Thank you
Cheers
I reproduced this issue running a node with a secret.yaml file and not registering that stake pool on the blockchain.
Config of node:
{
"log": {
"format": "plain",
"level": "info",
"output": {
"gelf": {
"backend": "monitoring.stakepool.cardano-testnet.iohkdev.io:12201",
"log_id": "steak-and-eggs"
}
}
},
"p2p": {
"public_address": "/ip4/45.55.58.244/tcp/3100",
"topics_of_interest": {
"blocks": "high",
"messages": "high"
},
"trusted_peers": [
"/ip4/3.123.177.192/tcp/3000",
"/ip4/52.57.157.167/tcp/3000",
"/ip4/3.123.155.47/tcp/3000",
"/ip4/3.115.57.216/tcp/3000",
"/ip4/3.112.185.217/tcp/3000",
"/ip4/18.139.40.4/tcp/3000",
"/ip4/18.140.134.230/tcp/3000"
]
},
"rest": {
"listen": "10.40.33.1:3101"
},
"storage": "/var/lib/jormungandr"
}
Ran with command:
/nix/store/ixdq1m73ahkk75d0r3bc57m5rdjvicyv-jormungandr-0.5.6+lock/bin/jormungandr --genesis-block-hash adbdd5ede31637f6c9bad5c271eec0bc3d0cb9efb86a5b913bb55cba549d0770 --config /nix/store/39fr7qyv161zr1085ylqyz0fav5v7ijy-config.yaml --secret /run/keys/jormungandr-secret.yaml
Started Jormungandr node service.
Oct 10 13:10:27.990 INFO Starting jormungandr 0.5.6+lock (, release, linux [x86_64]) - [rustc 1.37.0 (eae3437df 2019-08-13)], task: init
Oct 10 13:10:27.991 INFO storing blockchain in '"/var/lib/jormungandr/blocks.sqlite"', task: init
Oct 10 13:11:30.971 INFO connecting to bootstrap peer 3.115.57.216:3000, peer_addr: 3.115.57.216:3000, task: bootstrap
Oct 10 13:11:31.295 WARN with initial bootstrap, reason: bootstrap pull stream failed, peer_addr: 3.115.57.216:3000, task: bootstrap
Oct 10 13:11:31.297 INFO connecting to bootstrap peer 3.112.185.217:3000, peer_addr: 3.112.185.217:3000, task: bootstrap
Oct 10 13:11:31.624 WARN with initial bootstrap, reason: bootstrap pull stream failed, peer_addr: 3.112.185.217:3000, task: bootstrap
Oct 10 13:11:31.627 INFO connecting to bootstrap peer 52.57.157.167:3000, peer_addr: 52.57.157.167:3000, task: bootstrap
Oct 10 13:11:31.809 WARN with initial bootstrap, reason: bootstrap pull stream failed, peer_addr: 52.57.157.167:3000, task: bootstrap
Oct 10 13:11:31.811 INFO connecting to bootstrap peer 3.123.177.192:3000, peer_addr: 3.123.177.192:3000, task: bootstrap
Oct 10 13:11:31.996 WARN with initial bootstrap, reason: bootstrap pull stream failed, peer_addr: 3.123.177.192:3000, task: bootstrap
Oct 10 13:11:31.998 INFO connecting to bootstrap peer 18.139.40.4:3000, peer_addr: 18.139.40.4:3000, task: bootstrap
Oct 10 13:11:32.442 WARN with initial bootstrap, reason: bootstrap pull stream failed, peer_addr: 18.139.40.4:3000, task: bootstrap
Oct 10 13:11:32.445 INFO connecting to bootstrap peer 18.140.134.230:3000, peer_addr: 18.140.134.230:3000, task: bootstrap
Oct 10 13:11:32.897 WARN with initial bootstrap, reason: bootstrap pull stream failed, peer_addr: 18.140.134.230:3000, task: bootstrap
Oct 10 13:11:32.897 INFO connecting to bootstrap peer 3.123.155.47:3000, peer_addr: 3.123.155.47:3000, task: bootstrap
Oct 10 13:11:33.079 WARN with initial bootstrap, reason: bootstrap pull stream failed, peer_addr: 3.123.155.47:3000, task: bootstrap
Oct 10 13:11:33.081 INFO starting task, task: network
Oct 10 13:11:33.081 INFO adding P2P Topology module: trusted-peers, task: network
Oct 10 13:11:33.082 INFO start listening and accepting gRPC connections on 45.55.58.244:3100, task: network
Oct 10 13:11:33.082 INFO starting task, task: client-query
Oct 10 13:11:33.083 INFO connecting to initial gossip peer, peer_addr: 3.112.185.217:3000, task: network
Oct 10 13:11:33.085 INFO connecting to initial gossip peer, peer_addr: 3.115.57.216:3000, task: network
Oct 10 13:11:33.086 INFO connecting to initial gossip peer, peer_addr: 52.57.157.167:3000, task: network
Oct 10 13:11:33.086 INFO connecting to initial gossip peer, peer_addr: 18.139.40.4:3000, task: network
Oct 10 13:11:33.087 INFO connecting to initial gossip peer, peer_addr: 3.123.177.192:3000, task: network
Oct 10 13:11:33.088 INFO connecting to initial gossip peer, peer_addr: 3.123.155.47:3000, task: network
Oct 10 13:11:33.090 INFO connecting to initial gossip peer, peer_addr: 18.140.134.230:3000, task: network
thread 'leadership0' panicked at 'Expect the event to not close', src/libcore/option.rs:1034:5
stack backtrace:
0: 0x5641add6c34b - std::panicking::default_hook::{{closure}}::h847a2eb38b396f14
1: 0x5641add6c027 - std::panicking::default_hook::h2ca0f9a30a0e206b
2: 0x5641add6cac0 - std::panicking::rust_panic_with_hook::hffcefc09751839d1
3: 0x5641add6c642 - std::panicking::continue_panic_fmt::hc0f142c930c846fc
4: 0x5641add6c526 - rust_begin_unwind
5: 0x5641add897ed - core::panicking::panic_fmt::h2daf88b2616ca2b2
6: 0x5641add89857 - core::option::expect_failed::h4b77ebe6e62ec3a1
7: 0x5641ad759f03 - futures::future::chain::Chain<A,B,C>::poll::ha355895fbd52404c
8: 0x5641ad6c2e68 - <futures::future::loop_fn::LoopFn<A,F> as futures::future::Future>::poll::h81adca4422d6b8a6
9: 0x5641ad4dbca7 - std::panicking::try::do_call::hfba1fe95d49ab22e
10: 0x5641add70e1a - __rust_maybe_catch_panic
11: 0x5641ad649c8c - <futures::future::catch_unwind::CatchUnwind<F> as futures::future::Future>::poll::hcb3290538a78d67a
12: 0x5641ad422163 - <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll::hf286e919e0bc929a
13: 0x5641add46a4d - futures::task_impl::Spawn<T>::poll_future_notify::hb23b3373513e934c
14: 0x5641add46641 - std::panicking::try::do_call::h228954743a9e9cf4
15: 0x5641add70e1a - __rust_maybe_catch_panic
16: 0x5641add44e59 - tokio_threadpool::task::Task::run::h3c1ea0ba7374fc64
17: 0x5641add3dd79 - tokio_threadpool::worker::Worker::run_task::h5a2aca95cc3c9906
18: 0x5641add3d4f7 - tokio_threadpool::worker::Worker::run::hc73017d12b48a593
19: 0x5641add1ba1f - std::thread::local::LocalKey<T>::with::h9a23aa648baa86b7
20: 0x5641add1b908 - std::thread::local::LocalKey<T>::with::h76110cfda97f3d8c
21: 0x5641add1b1df - tokio_reactor::with_default::h492df3d33b38d57c
22: 0x5641add21e6f - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::h084c0bd1fdbad0f9
23: 0x5641add444b6 - std::thread::local::LocalKey<T>::with::hd8efb6f6b3779307
24: 0x5641add445cf - std::thread::local::LocalKey<T>::with::hda880bf1bdd22f0e
25: 0x5641add3e671 - std::sys_common::backtrace::__rust_begin_short_backtrace::h90d0a2345cf9a3bb
26: 0x5641add4668c - std::panicking::try::do_call::h31c1fb4fa4bc192b
27: 0x5641add70e1a - __rust_maybe_catch_panic
28: 0x5641add3f910 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h8451465f67620f58
29: 0x5641add5d36f - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h42806b83647d4c79
30: 0x5641add701c0 - std::sys::unix::thread::Thread::new::thread_start::h4570080769500bcd
31: 0x7f8757154ef7 - start_thread
32: 0x7f8756e7222f - __GI___clone
33: 0x0 - <unknown>
Expect the event to not close
jormungandr.service: Main process exited, code=exited, status=66/NOINPUT
jormungandr.service: Failed with result 'exit-code'.
jormungandr.service: Consumed 1min 3.405s CPU time.
jormungandr.service: Service RestartSec=100ms expired, scheduling restart.
jormungandr.service: Scheduled restart job, restart counter is at 440.
Stopped Jormungandr node service.
jormungandr.service: Consumed 1min 3.405s CPU time.
@disassembler Great, thanks.We need to repro with a debug build to see what's causing it, this backtrace looks like some call frames are optimized away.
Seeing this panic on 0.6.0 too.
This should be fixed in 0.7. Let me know if this is not the case.
still there it seems. 0.7.0-rc1
Oct 24 19:37:21.001 INFO leader event starting, date: 1.386, leader: 1, task: leadership
thread 'block1' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Msg("cannot process leadership block"), State { next_error: Some(Error(BlockHeaderVerificationFailed("Block proof is invalid"), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })), backtrace: InternalBacktrace { backtrace: None } })', src/libcore/result.rs:1084:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
called `Result::unwrap()` on an `Err` value: Error(Msg("cannot process leadership block"), State { next_error: Some(Error(BlockHeaderVerificationFailed("Block proof is invalid"), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })), backtrace: InternalBacktrace { backtrace: None } })thread 'leadership0' panicked at 'Expect the event to not close', src/libcore/option.rs:1166:5
ubuntu@ip-172-31-32-11:~/jor-test/jormungandr$ jormungandr --full-version
jormungandr 0.7.0-rc1 (HEAD-5583e5e8, release, linux [x86_64]) - [rustc 1.38.0 (625451e37 2019-09-23)]