Lisk-sdk: Reduce rebuild time by using PostgreSQL custom dump format

Created on 26 Feb 2018 · 8Comments · Source: LiskHQ/lisk-sdk

Currently the snapshot function is dumping the database to pure SQL and then gzipping.

The PostgreSQL tool pg_dump also supports a custom format (-Fc) which is compressed by default and that can be restored utilizing multiple threads using pg_restore with the - j <Threads> argument.

Snapshotting:
https://github.com/LiskHQ/lisk-scripts/blob/1bb2a9245e1b6a0d09bd96c8c811a025a24c6b98/packaged/lisk_snapshot.sh#L208

would become

pg_dump -Fc "$TARGET_DB_NAME" -f "$BACKUP_FULLPATH"

Restoring:
https://github.com/LiskHQ/lisk-scripts/blob/1bb2a9245e1b6a0d09bd96c8c811a025a24c6b98/packaged/lisk.sh#L149

would become

if ! gunzip -fcq  | pg_restore -Fc -j <numcpu*2> -U "$DB_USER" -d "$DB_NAME" "$DB_SNAPSHOT" >> "$SH_LOG_FILE" 2>&1; then

The format is incompatible with the old one but rebuild speed went down by ~50% in my tests.

I'd be happy to discuss whether this could be worth the breaking change in favor of faster node rebuilding

Source

hendrikhofstadt

👍1

All 8 comments

Nice proposal @Slamper, we will certainly contemplate it. Thanks 👍

karmacoma on 27 Feb 2018

👍1

Why use twice as many threads as CPUs in -j <numcpu*2>. Shouln't we use the number of logical cores that nproc on Linux and sysctl -n hw.ncpu on macOS report?

webmaster128 on 29 Mar 2018

Great idea, btw. I took the time to dig a little deeper into the topic. Looking at the pg_dump docu, I think the formats directory and tar are not relevant. Thus I compared plain+gzip and custom as proposed by @Slamper. From looking at the CPU use, I thought that compression/decompression is the performance bottle neck, so I added pigz into the mix, a multi-core gzip replacement.

All timings ware made using Ubuntu 18.04 in a 2 core virtual machine on my 4 core Laptop.

Dumping

Note: multi job dumping is not supported for format custom.

plain+gzip: pg_dump -O lisk_test | gzip > dump.sql.gz
plain+pigz: pg_dump -O lisk_test | pigz > dump.sql.pgz
custom: pg_dump -O -Fc lisk_test -f dump.pg_dump

_ | plain+gzip | plain+pigz | custom
-|--------------|-----------|-----------
run time (s) | 88 | 56 | 92
file size (MB) | 643 | 644 | 645

Restoring

plain+gzip: gunzip -fcq dump.sql.gz | psql -q -U $USER -d lisk_test
plain+pigz: pigz -dcq dump.sql.pgz | psql -q -U $USER -d lisk_test
custom: pg_restore -Fc -j 2 -U $USER -d lisk_test dump.pg_dump

_ | plain+gzip | plain+pigz | custom (2 jobs) | custom (4 jobs)
-|--------------|-----------|-----------|-----------
run time (s) | 237 | 231 | 182 | 165

Results

Dumping speed is limited by gzip compression and can be improved significantly when a multi-core gzip tool is used
Decompression is not the bottleneck when restoring
pg_restore's jobs count should be increased above the number of cores as this is how many database connections are doing work at the same time, but each of those connections does not necessarily use much CPU
pg_restore saves 30 % of time when restoring. This is 1 minute on my machine for testnet

webmaster128 on 10 Apr 2018

👀1 👍1

@webmaster128 Thanks for providing such detailed statistics 👍
I think that the custom format can be really useful because restoring is the task where speed is the most relevant and your results clearly support that this is the right way to go.
On my 8 core dedicated server (not HT) restoring using the custom format even saved ~50%.

As for <numcpu*2>: I saw a benchmark on Stackoverflow saying that 2x the number of logical cores yields the best results. Your benchmark also supports that.

hendrikhofstadt on 11 Apr 2018

@Slamper I implemented your idea using custom dump and restore script (both with "lisk_test" hardcoded): https://github.com/prolina-foundation/setup-lisk-node/tree/master/roles/lisk-node/files/blockchain
Both scripts are independent of the Ansible setup around them. An example snapshot is available at http://lisk.prolina.org/snapshots/ Please feel free to use, share, comment, improve.

webmaster128 on 12 Apr 2018

I tested the postgres dump format for a while now. I works nicely. Everything is fine.

However, I don't think it is with the change. I can restore a mainnet snapshot in less than 8 minutes on a very weak VPS (1 core, 2 GM ram, 3 EUR/month). Saving those handful of minutes is not worth the engineering time, the risk of new bugs, breaking existing tooling and the compatibility of existing snapshot repositories.

I'd consider this a nice to have compared to other tasks in the Lisk ecosystem.

webmaster128 on 3 May 2018

I'll transfer this issue to LiskHQ/lisk since this is where lisk_snapshot.sh and lisk.sh live now.