Lisk-sdk: How to indicate that lisk core is done creating a snapshot

Created on 29 May 2018  ·  7 Comments  ·  Source: LiskHQ/lisk-sdk

Description

We recently closed https://github.com/LiskHQ/lisk-scripts/issues/73. The existence of that issue demonstrates the fragility of grepping a log file to find out if the snapshotting process is done or not.

Expected behavior

To be defined. Propose suggestions in the comments. Some have been suggested in the linked issue above.

One approach that comes to mind: snapshot.sh already creates a lock file to stop other instances of snapshot.sh from doing anything in case someone starts it twice. We could use that file with flock: core holds a flock on it while snapshotting, and once core is done and releases the lock, the script obtains it and continues, shutting down core and starting it up normally as it currently does. I'm not sure how well macOS supports this, though.

Actual behavior

The only indication given by core that it's done snapshotting is an entry in a log file

Which version(s) does this affect? (Environment, OS, etc...)

All

All 7 comments

Referring to the comment in the linked issue: the process should terminate when the snapshot is finished (beta.8). If that's not the case, then it's a bug.

@4miners nice idea.

The issue https://github.com/LiskHQ/lisk/issues/2075 will ensure it's happening.

Then:

The issue https://github.com/LiskHQ/lisk-scripts/issues/80 will ensure that created snapshots are working: after the Lisk Core process has finished and shut down, cold start a new Lisk Core instance and check that it stays in sync.

New Jenkins nightly jobs using the provided snapshotting script will then provide tested snapshots every day for Mainnet, Testnet and Betanet.

I confirmed this: after snapshotting, the process terminated successfully.

[inf] 2018-05-30 13:23:27 | Snapshot creation finished
[inf] 2018-05-30 13:23:27 | Cleaning up...
[dbg] 2018-05-30 13:23:27 | Cache - Clean up database
[dbg] 2018-05-30 13:23:27 | Cache - Quit database
[dbg] 2018-05-30 13:23:27 | Export peers to database failed: Peers list empty
[inf] 2018-05-30 13:23:27 | Cleaned up successfully

Since we use pm2 in production, it restarts the script after it exits.

pm2 start app.js -- -s 1
1|app      | [inf] 2018-05-30 13:29:25 | Cleaned up successfully
PM2        | App [app] with id [1] and pid [36350], exited with code [1] via signal [SIGINT]
PM2        | Starting execution sequence in -fork mode- for app name:app id:1
PM2        | App name:app id:1 online
1|app      | [dbg] 2018-05-30 13:29:26 | Cache Enabled
1|app      | [inf] 2018-05-30 13:29:26 | App connected with redis server
1|app      | [inf] 2018-05-30 13:29:27 | Socket Cluster ready for incoming connections

But we have a proper config file to keep it from restarting... :)

https://github.com/LiskHQ/lisk-scripts/blob/252493405c8e39f103dc1b2f79b3e60a3ed14371/packaged/etc/pm2-snapshot.json#L10

Now I will look into the bash file to see what's wrong there.

I suggest the following approach. In the script lisk_snapshot.sh, where the lines are:

https://github.com/LiskHQ/lisk-scripts/blob/252493405c8e39f103dc1b2f79b3e60a3ed14371/packaged/lisk_snapshot.sh#L177-L178

until tail -n10 "$LOG_LOCATION" | (grep -q "Snapshot finished"); do

Instead of using logs we should rely on our process manager, which is `pm2`. So we can use either of the following:

pm2 jlist | jq -c '.[] | select(.name | contains("lisk.snapshot")) | .pm2_env.status'

or

pm2 info lisk.snapshot | grep status | awk '{print $4}'

And use it

until [ "$(pm2 info lisk.snapshot | grep status | awk '{print $4}')" = "stopped" ]; do

This gets the status of the snapshot process from pm2, so any dependency on Lisk Core's log output is removed.
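A minimal sketch of how such a wait loop could look, assuming the process is registered in pm2 under the name lisk.snapshot as in the commands above. The function names, the retry limit and the polling interval are hypothetical and not taken from lisk_snapshot.sh. Note that jq is called with -r here so the status is printed without surrounding quotes, which makes the string comparison straightforward.

```shell
#!/usr/bin/env bash

# Print the pm2 status of the snapshot process, e.g. "online" or "stopped".
# Assumes the process name is exactly "lisk.snapshot".
snapshot_status() {
  pm2 jlist | jq -r '.[] | select(.name == "lisk.snapshot") | .pm2_env.status'
}

# Poll until pm2 reports "stopped".
# $1: maximum number of polls (hypothetical default: 720)
# $2: seconds to sleep between polls (hypothetical default: 5)
# Returns 1 if the limit is reached before the process stops.
wait_for_snapshot() {
  local max_tries="${1:-720}" interval="${2:-5}" tries=0
  until [ "$(snapshot_status)" = "stopped" ]; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      echo "Timed out waiting for the snapshot process to stop" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

Calling `wait_for_snapshot` with no arguments would then block until pm2 reports the process as stopped or the limit is hit. The limit is there so the script cannot hang forever if the status never changes.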

@Nazgolze What are your thoughts?

@nazarhussain I tested this out, this works like a charm

Yes, it should work, as we are already using similar stuff in lisk.sh at https://github.com/LiskHQ/lisk-scripts/blob/0fc090254df42d1e7b32928b07eb3d515dd8d311/packaged/lisk.sh#L315

@MaciejBaj We can close this issue.
