Go-ethereum: Geth Does Not Graceful Exit on SIGTERM

Created on 19 Feb 2018  Â·  8Comments  Â·  Source: ethereum/go-ethereum

I'm not sure if this is due to the way that I start/stop Geth, or a bug, or just the way it is...

I run geth using supervisor on Ubuntu which is a process control service like systemd

I was on v1.8.0 and I ran apt-get to update to v1.8.1 - but I had to restart the service so I ran supervisorctrl restart geth

That resulted in my fully sync'd node dropping 676 blocks. That seems like a lot of blocks to drop 😕

Lightly redacted log:

INFO [02-19|09:13:53] Imported new chain segment               blocks=1  txs=216  mgas=7.993   elapsed=450.119ms mgasps=17.758  number=5119264 hash=01f218…7b9e13 cache=200.80mB
INFO [02-19|09:14:41] Imported new chain segment               blocks=1  txs=163  mgas=7.993   elapsed=315.894ms mgasps=25.304  number=5119265 hash=38f2fd…113901 cache=201.31mB
INFO [02-19|09:14:53] Imported new chain segment               blocks=1  txs=291  mgas=7.985   elapsed=815.572ms mgasps=9.790   number=5119266 hash=9f243a…9d4d68 cache=201.77mB
INFO [02-19|09:15:25] Maximum peer count                       ETH=10 LES=40 total=50
INFO [02-19|09:15:25] Starting peer-to-peer node               instance=Geth/MysticRyuujin-GETH01/v1.8.1-stable-1e67410e/linux-amd64/go1.9.4
INFO [02-19|09:15:25] Allocated cache and file handles         database=/geth/.ethereum/geth/chaindata cache=768 handles=512
INFO [02-19|09:15:27] Initialised chain configuration          config="{ChainID: 1 Homestead: 1150000 DAO: 1920000 DAOSupport: true EIP150: 2463000 EIP155: 2675000 EIP158: 2675000 Byzantium: 4370000 Engine: ethash}"
INFO [02-19|09:15:27] Disk storage enabled for ethash caches   dir=/geth/.ethereum/geth/ethash count=3
INFO [02-19|09:15:27] Disk storage enabled for ethash DAGs     dir=/home/geth/.ethash          count=2
INFO [02-19|09:15:27] Initialising Ethereum protocol           versions="[63 62]" network=1
WARN [02-19|09:15:27] Head state missing, repairing chain      number=5119266 hash=9f243a…9d4d68
INFO [02-19|09:15:28] Rewound blockchain to past state         number=5118586 hash=1a86e5…7b6dd6
INFO [02-19|09:15:28] Loaded most recent local header          number=5119266 hash=9f243a…9d4d68 td=2624927619703869778791
INFO [02-19|09:15:28] Loaded most recent local full block      number=5118586 hash=1a86e5…7b6dd6 td=2622904571753520301193
INFO [02-19|09:15:28] Loaded most recent local fast block      number=5119266 hash=9f243a…9d4d68 td=2624927619703869778791
WARN [02-19|09:15:28] Blockchain not empty, fast sync disabled 
INFO [02-19|09:15:28] Loaded CHT                               section=155 head=122be1…21f3c7 root=f687e5…a872cf
INFO [02-19|09:15:28] Loaded bloom trie                        section=155 head=122be1…21f3c7 root=5354f8…cb62a3
INFO [02-19|09:15:28] Starting P2P networking 
INFO [02-19|09:15:28] Mapped network port                      proto=udp extport=30301 intport=30301 interface=ExtIP(1.1.1.1)
INFO [02-19|09:15:28] UDP listener up                          self=enode://5bcfb28162bf02400c228f2383447e079e75b089cea49b8f718bd06ef0cf27c61233e856311d3811148d424eabccfa0bff9393e939b3624de44910c70f204c9c@1.1.1.1:30301
INFO [02-19|09:15:28] UDP listener up                          net=enode://5bcfb28162bf02400c228f2383447e079e75b089cea49b8f718bd06ef0cf27c61233e856311d3811148d424eabccfa0bff9393e939b3624de44910c70f204c9c@[::]:30301
INFO [02-19|09:15:28] RLPx listener up                         self=enode://5bcfb28162bf02400c228f2383447e079e75b089cea49b8f718bd06ef0cf27c61233e856311d3811148d424eabccfa0bff9393e939b3624de44910c70f204c9c@1.1.1.1:30301
INFO [02-19|09:15:28] Stats daemon started 
INFO [02-19|09:15:28] Mapped network port                      proto=tcp extport=30301 intport=30301 interface=ExtIP(1.1.1.1)
INFO [02-19|09:15:28] Starting topic registration              topic=LES2@d4e56740f876aef8
INFO [02-19|09:15:28] IPC endpoint opened                      url=/geth/.ethereum/geth.ipc
INFO [02-19|09:15:28] HTTP endpoint opened                     url=http://localhost:8545 cors=* vhosts=geth01
INFO [02-19|09:15:38] Block synchronisation started 
INFO [02-19|09:15:46] Imported new chain segment               blocks=4 txs=441 mgas=21.577 elapsed=4.013s mgasps=5.376 number=5118590 hash=83c758…59a479 cache=2.12mB
INFO [02-19|09:16:00] Imported new chain segment               blocks=9 txs=1545 mgas=58.868 elapsed=8.339s mgasps=7.059 number=5118599 hash=344a39…6167c8 cache=8.95mB

Most helpful comment

All 8 comments

According to @karalabe tweet:

If you terminate Geth gracefully, it will write out the state of the HEAD-128th block, so the next restart will reprocess that many blocks. If you pull the rug from underneath Geth, you get whatever was last written to disk (5 mins of active processing).

I'm unclear if the loss of 676 blocks is in line with the first sentence? According to the supervisor documentation the default kill signal is SIGTERM which should be the graceful exit path? I'll note that it took me about 15 minutes to re-sync.

https://ethereum.stackexchange.com/a/712/4757 suggests:

kill -INT `pidof geth`

Tried here - seems to shut down gracefully.


P.S. When using systemd, the default TimeoutStopSec of 90 seconds might not be enough - at least when interrupting a node that hasn't yet caught up after a previous termination.

(JIC, here's my service file.)

So I think @veox is correct. Geth seems to be expecting SIGINT or the exit command from the console. Only then does it gracefully shutdown. It seems that Geth doesn't handle SIGTERM which I think it should since that's the standard "Please shutdown" code from *nix...

When I altered supervisor to send SIGINT instead - the logs reflected a graceful shutdown.

According to most documentation SIGTERM:

It is the normal way to politely ask a program to terminate

You will still lose up to 128 blocks with a graceful termination (with default settings)

Hmmm, ok, so we have two issues here. One is to make geth play nice with sigint SIGTERM, I think that's a reasonable request, will do it. The other issue is that geth start up with 128 blocks behind. This was meant as a way to quickly pull in 128 blocks into memory on startup, but I think there's a lot of confusion stemming form it. Maybe we should just write out the last state on shutdown instead.

This should be fixed on master.

Tested with 1.8.2-unstable-4c845bdc. Does indeed terminate gracefully, and pick up where it left upon restart. Haven't yet encountered an ommer situation, where reverting to an earlier trie was required (but only tested a few times).

geth does seem to be busy with something else after closing the database, resulting in systemd force-terminating the process - but that's a different issue, and needs some more testing anyway.

Thanks for implementing this!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

VenusHu picture VenusHu  Â·  3Comments

AdrianScott picture AdrianScott  Â·  3Comments

leonzhao picture leonzhao  Â·  3Comments

keitaj picture keitaj  Â·  3Comments

freshonline picture freshonline  Â·  3Comments