Lightning: Backup solutions

Created on 5 Nov 2020 · 18 Comments · Source: ElementsProject/lightning

Hi, this is just a general question.
After my precious 1 BTC lnd node got corrupted and died (although all the hardware was new and bought specifically to create a node), https://github.com/lightningnetwork/lnd/issues/4720, I wonder what backup solutions are available in c-lightning. I read somewhere that it is possible to do replication or something like that, but at the same time SCB is not supported, so I could potentially lose all my money (#3083). Is there a place where I can read a bit about what you support?
(I guess the corruption happened so I can keep my promise ;-) https://github.com/ElementsProject/lightning/issues/2400#issuecomment-474346091)

@rustyrussell
Thanks

EDIT
For whoever is interested:
Christian Decker on the latest state of LN backups

All 18 comments

With a PostgreSQL backend you should be able to do DB-level replication. I am unsure how to actually set that up, but you might ask @cdecker . In addition, there is also the backup plugin which replicates to another file: https://github.com/lightningd/plugins/blob/master/backup/backup.py though documentation on it is very sparse. This is intended to be the basis of a "better" backup strategy, such as using an NFS or other network mount for the backup file.

You can also use filesystem-level RAID-1 replication with a filesystem such as ZFS or btrfs; for example, the Raspiblitz setup can set up a btrfs RAID-1 between a partition of your 1 TB HDD/SSD and a 32 GB flash disk.

CLDCB is not yet ready for production use (I see you filed an issue there); I am currently working on CLBOSS.

Generally in C-lightning we consider a continuous dynamic backup like the above backup plugin (or CLDCB when I get around to completing it, sorry) or replicating a PostgreSQL database to be superior to SCBs. Continuous dynamic backups subsume SCBs and you do not have to remember to make an SCB for each channel you create.

Thank you for the detailed information.
Off-topic:
Why did you decide to use C++ for CLBOSS? Wouldn't a scripting language (Python :)) be easier if speed is not important?

People have complained about C-Lightning requiring Python in its build scripts before, and have subsequently avoided C-Lightning (many people are attracted to the "C" in C-Lightning, under the impression that it has lower system requirements; not everyone is a Pythonista). Recent C-Lightning has finally weaned itself off Python in its build scripts. So I think it is better to reduce requirements. Xref. https://lists.ozlabs.org/pipermail/c-lightning/2020-October/000197.html for more info on many decisions in CLBOSS.

Generally in C-lightning we consider a continuous dynamic backup like the above backup plugin (or CLDCB when I get around to completing it, sorry) or replicating a PostgreSQL database to be superior to SCBs. Continuous dynamic backups subsume SCBs and you do not have to remember to make an SCB for each channel you create

Why not both? What happens if there is corruption that is not detected or does not crash the system? Then you just replicate the corruption. Maybe I am not making sense and those things are being tested. But then a bug could cause erroneous writes to the DB.

you do not have to remember to make an SCB for each channel you create.

See this: https://github.com/lightningnetwork/lnd/issues/4729 (they autosave on each channel change).

Thanks.

Why not both?

The new doc/BACKUP.md does suggest backing up the database as a worst-case fallback. #4207

What happens if there is corruption that is not detected or does not crash the system? Then you just replicate the corruption. Maybe I am not making sense and those things are being tested. But then a bug could cause erroneous writes to the DB.

If it is because of a bug in the software, then it probably affects more than just a few users and we need some way to safely recover from the corruption in the software and heal from the erroneous write to the DB.

If it is because of a hardware glitch, then things like ECC memory (so in-memory glitches are caught and corrected) and storage systems with checksums, such as ZFS or BTRFS or some RAID-1 setup with 3 or more devices, would help remove those problems just as well. Network communications tend to have checksums (I believe TCP has those built-in?), and I assume something like PostgreSQL will have those; CLDCB uses a MAC for each blob of data that it transports to the server (but CLDCB is not finished yet).
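To illustrate the idea of a per-blob MAC mentioned above, here is a minimal sketch using HMAC-SHA256 from the Python standard library. This is not CLDCB's actual scheme: the function names, the choice of HMAC-SHA256, and the tag-appended layout are all assumptions made for illustration.

```python
import hashlib
import hmac

TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag


def seal(key: bytes, blob: bytes) -> bytes:
    """Return the blob with an HMAC-SHA256 tag appended (hypothetical layout)."""
    return blob + hmac.new(key, blob, hashlib.sha256).digest()


def open_sealed(key: bytes, sealed: bytes) -> bytes:
    """Return the original blob if its tag verifies, else raise ValueError."""
    if len(sealed) < TAG_LEN:
        raise ValueError("sealed blob is too short to contain a tag")
    blob, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(key, blob, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC mismatch: blob was corrupted or tampered with")
    return blob


if __name__ == "__main__":
    key = b"k" * 32  # placeholder key, for demonstration only
    sealed = seal(key, b"db update")
    print(open_sealed(key, sealed))
```

Unlike a plain checksum, a MAC also detects deliberate modification by anyone who does not hold the key, which matters when the backup travels to a machine you do not fully trust.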

See this: lightningnetwork/lnd#4729 (they autosave on each channel change).

If it autosaves in the same medium as the database, and that single medium crashes, it is still not much help. Continuous backups to other hardware are still better in general.

I agree that continuous backups are better, but I don't see the harm in having the SCB as a backup.
Thanks for all the great info. I think I'm done with the Rock64; I'll get a $230 machine and add two 500GB SSDs.

A backup of the database, using the VACUUM INTO method described in doc/BACKUP.md, is equivalent to a mass SCB of all channels, and has similar safety. (SCBs also cannot back up channels to nodes that do not support option_data_loss (possibly lnd no longer allows channels to such nodes), require your peers to honestly implement option_data_loss, and cannot recover channels created after the SCB snapshot.) Is that not enough?

@MrManPew sorry it turns out the VACUUM INTO method described in doc/BACKUP.md has a race condition that might crash lightningd, bleah. Lemme think of some other way to get a kind of snapshot backup. https://github.com/ElementsProject/lightning/pull/4207#discussion_r528206037

I think the benefit of separating the SCB file (or the equivalent one in your case) from the main database is that it only needs to be saved when you have a new channel, and it is smaller; it is therefore much easier to move it to a different location, or to set a different saving location. No need for continuous replication. It is in a way like the first backup option that you suggest, but a bit easier to implement due to the small file size and the need to save only when new channels are opened. (Sorry if I am not understanding correctly and writing nonsense.)

Not sure what VACUUM INTO means as it is not in the documentation anymore

Well, if you are relying on the honesty of your peers (which you are if you are relying on SCBs) then you only need to copy the backup file when a new channel is opened as well. Again, they are equivalent backup methods.

VACUUM INTO is a SQLITE3 query to create a backup copy of a database file. It locks the database, which lightningd might not like. In my experimental node I had a crontab running that used VACUUM INTO and it never had problems, but (1) I often bring down this experimental node for various development reasons and (2) I could just be lucky. Recently while investigating database performance on a BTRFS filesystem I wrote a program that uses an SQLITE3 database and spams it with various inserts and deletes and sorted select queries, then tried out the VACUUM INTO command in a separate process, which showed that it could cause an SQLITE_BUSY error on the main process.
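For readers unfamiliar with the statement, here is a minimal sketch of VACUUM INTO using Python's sqlite3 module. The table name and file names are made up for the demonstration, and it assumes an SQLite library of version 3.27 or later (where VACUUM INTO was introduced). As described above, running this against a database that another process is actively writing to may cause SQLITE_BUSY in that process, so treat it as an illustration rather than a recommended procedure for a live lightningd.

```python
import os
import sqlite3
import tempfile


def snapshot(db_path: str, dest_path: str) -> None:
    """Write a compacted copy of db_path to dest_path using VACUUM INTO.

    dest_path must not already exist as a non-empty file.
    """
    # isolation_level=None selects autocommit mode; VACUUM cannot run
    # inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("VACUUM INTO ?", (dest_path,))
    finally:
        conn.close()


def demo() -> int:
    """Create a toy database, snapshot it, and count rows in the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        live = os.path.join(tmp, "live.sqlite3")   # hypothetical file names
        copy = os.path.join(tmp, "copy.sqlite3")
        conn = sqlite3.connect(live)
        conn.execute("CREATE TABLE channels (id INTEGER PRIMARY KEY, state TEXT)")
        conn.executemany(
            "INSERT INTO channels (state) VALUES (?)",
            [("open",), ("open",), ("closed",)],
        )
        conn.commit()
        conn.close()
        snapshot(live, copy)
        check = sqlite3.connect(copy)
        try:
            return check.execute("SELECT COUNT(*) FROM channels").fetchone()[0]
        finally:
            check.close()


if __name__ == "__main__":
    print(demo())
```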

I've been messing around with this; the somewhat fiddly thing about the backup.py backup protocol to me is that it's not a simple stream but requires being able to rewind a record. This makes it more involved to mirror these backups remotely, or, say, send them over a SSH-encapsulated TCP stream to a remote host.

So I've been thinking of adding a new record type \x03 that means 'rewind one' and leaving it up to the program interpreting/restoring the backup to delete one record in this case. Is there any drawback to this? Sure, there will be some garbage in the backup, but apart from that?

Edit: so I'd ideally like to have a backup mechanism where a backup client connects to the server running lightningd using ssh, receives the state up until now, then stays connected and receives continuous updates. If the connection drops, the backup client will try to reconnect. When it succeeds in reconnecting, it will again catch up and receive updates from the point where it was disconnected. And so on.
This way, if the server crashes, the client will (assuming it was connected at the time it happened) always have an up-to-date backup.

so I'd ideally like to have a backup mechanism where a backup client connects to the server running lightningd using ssh, receives the state up until now

This sort of requires us to have a way to create a snapshot copy of the SQLITE3 database.

A plugin with a db_write hook can safely copy the SQLITE3 database (and any WAL file as well) during the write hook, since the write-hook ensures the database is "at rest" while the plugin has not responded yet (I did extensive mock-testing). On the other hand, a plugin with db_write cannot safely provide any commands or access any command-line options. So we need this snapshot copying of the SQLITE3 database in lightningd. But we also need some kind of atomicity guarantee, that after this snapshot creation, we are able to also receive any intermediate queries.

So what you want is more easily done in a backup.py-type plugin (the lightningd server connects to the backup server, not the other way around). If the backup plugin is unable to contact the backup server, it can optionally continue operating, or stall until it can re-contact the backup server, but once it manages to re-contact the backup server, it can wait for the next db_write and re-upload the database file and the latest queries.

Thanks for the explanation! It's more clear to me now how the backup.py works and why it works like that.

the lightningd server connects to the backup server, not the other way around

Thinking of it, that's okay. Either some netcat magic or using ssh with -R can turn around the direction of the socket, while keeping the "backup device connects to the server" concept (which is important to me because the backup device can be anything: have a dynamic IP, be behind a NAT, etc.). But the important part is somehow streaming updates in real time over a socket.

Stalling lightningd while the backup server is disconnected is a clever idea! This makes the setup fail-safe instead of fail-dangerous. No state will ever go without a backup, not even for a small time window.

Yes, although your peers might disconnect from you due to data not flowing properly to the lightningd process --- when a db_write plugin is blocked, everything is blocked, including some interactions with peers. This has not been tested extensively I think; the general assumption is that a backup plugin would not take more than a second to save queries, so nobody has tested the situation where a backup plugin stalls for a long time while it is trying to contact the backup server (or waiting for a backup client to contact it, as in your plan), while peers are chatting happily pushing data at us. I would suggest some extensive testing of a testnet node first and see what happens if a db_write plugin stalls for minutes at a time.

Also note that you need confirmation from the backup software that it has received the data 100% okay and stored it on-disk, as that is the model that we assume for db_write, i.e. once it returns, the data is now "safe" on some disk somewhere.
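The "only return once the data is safe on disk" model can be sketched as follows. The payload shape (`data_version` plus a `writes` list) and the `{"result": "continue"}` response follow the documented db_write hook; the JSON-lines log format, the function name, and the file path are assumptions for illustration and are not what backup.py does. Wiring this into an actual plugin is not shown.

```python
import json
import os
import tempfile


def handle_db_write(payload: dict, log_path: str) -> dict:
    """Append one db_write payload to a log and return only once it is durable."""
    record = json.dumps(
        {"data_version": payload["data_version"], "writes": payload["writes"]}
    )
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(record + "\n")
        log.flush()             # push Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to push it to the storage device
    # Only now do we tell lightningd that it may commit the transaction.
    return {"result": "continue"}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "backup.log")  # hypothetical location
        reply = handle_db_write(
            {"data_version": 42, "writes": ["PRAGMA foreign_keys = ON"]}, path
        )
        print(reply)
```

For a remote backup, the fsync would have to happen on the remote side, with the reply to lightningd delayed until the remote confirms it.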

I've been messing around with this; the somewhat fiddly thing about the backup.py backup protocol to me is that it's not a simple stream but requires being able to rewind a record. This makes it more involved to mirror these backups remotely, or, say, send them over a SSH-encapsulated TCP stream to a remote host.

So I've been thinking of adding a new record type \x03 that means 'rewind one' and leaving it up to the program interpreting/restoring the backup to delete one record in this case. Is there any drawback to this? Sure, there will be some garbage in the backup, but apart from that?

Good idea, that is indeed an option. Since the update mechanism of the header gave me a trivial way to go back in time by just storing the previous position as well as the current one I went with that, allowing dumb restores. But I think having some complexity on the occasional restore rather than the way more common backup step is likely better :+1:
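A rough sketch of how an append-only stream with a "rewind one" record could be interpreted at restore time. The framing here (1-byte type, 4-byte big-endian length, payload) and the 0x01 change-record type are hypothetical and are not the actual backup.py file format; only the \x03 rewind marker comes from the proposal above.

```python
import io
import struct

TYPE_CHANGE = 0x01  # hypothetical type for an ordinary change record
TYPE_REWIND = 0x03  # "rewind one", as proposed in the thread
HEADER = struct.Struct(">BI")  # type byte + payload length


def encode(rtype: int, payload: bytes = b"") -> bytes:
    """Frame one record for the append-only stream."""
    return HEADER.pack(rtype, len(payload)) + payload


def restore(stream) -> list:
    """Replay the stream, dropping the previous record on each rewind marker."""
    records = []
    while True:
        header = stream.read(HEADER.size)
        if not header:
            break  # clean end of stream
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        rtype, length = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record payload")
        if rtype == TYPE_CHANGE:
            records.append(payload)
        elif rtype == TYPE_REWIND:
            if records:
                records.pop()  # the rewound record stays in the file as garbage
        else:
            raise ValueError(f"unknown record type {rtype:#x}")
    return records


if __name__ == "__main__":
    data = (
        encode(TYPE_CHANGE, b"a")
        + encode(TYPE_CHANGE, b"b")
        + encode(TYPE_REWIND)
        + encode(TYPE_CHANGE, b"c")
    )
    print(restore(io.BytesIO(data)))
```

The writer never seeks backwards, so the stream can be piped to a remote host as-is; the cost is the extra bookkeeping at restore time and the discarded records left in the file.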

Also note that you need confirmation from the backup software that it has received the data 100% okay and stored it on-disk, as that is the model that we assume for db_write, i.e. once it returns, the data is now "safe" on some disk somewhere.

Yes, I've looked at this a bit more. It seems that what I first imagined, a dumb pipe to send the backup over, is not really enough. It needs a bidirectional protocol. E.g.

  • At startup the plugin queries what the version and prev_version are. Based on this it might decide to rewind a version. This information could be cached locally, but without feedback there's no guarantee the remote backup matches.
  • To implement "never lose any state" we'd want acknowledgement that the remote has stored a record before continuing. Otherwise a disconnect in the middle of a record might leave the backup in an inconsistent state.
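The acknowledgement step in the second point could look roughly like this. It is a toy, in-process sketch: the length-prefixed framing, the single-byte ACK, and all function names are assumptions, and the "server" only appends to a list where a real one would write and fsync before acknowledging.

```python
import socket
import struct
import threading

ACK = b"\x06"  # hypothetical one-byte acknowledgement


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer disconnects first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-record")
        buf += chunk
    return buf


def send_record(sock: socket.socket, payload: bytes) -> None:
    """Send one record and block until the remote acknowledges storing it."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)
    if recv_exact(sock, 1) != ACK:
        raise ConnectionError("unexpected reply instead of ACK")


def serve(sock: socket.socket, store: list, count: int) -> None:
    """Receive `count` records, acknowledging each only after it is stored."""
    for _ in range(count):
        (length,) = struct.unpack(">I", recv_exact(sock, 4))
        store.append(recv_exact(sock, length))  # real server: write + fsync here
        sock.sendall(ACK)


def demo() -> list:
    client, server = socket.socketpair()
    stored = []
    worker = threading.Thread(target=serve, args=(server, stored, 2))
    worker.start()
    send_record(client, b"update-1")
    send_record(client, b"update-2")
    worker.join()
    client.close()
    server.close()
    return stored


if __name__ == "__main__":
    print(demo())
```

Because `send_record` does not return until the ACK arrives, a disconnect mid-record surfaces as an exception on the sending side instead of silently leaving a partial record behind.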

I've also noticed that the backups grow rather large quickly, due to large updates. The backup server might, in limited space scenarios, be better off to execute the database commands immediately (and do "poor man's replication") instead of storing the stream as-is. What the server does here is inconsequential to the client, of course.
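A minimal sketch of the "poor man's replication" idea: apply each batch of SQL statements to a replica database instead of storing the stream. It assumes each batch is a list of plain SQL strings that contain no transaction-control statements of their own, and that the replica connection was opened with `isolation_level=None`; the function name and table are hypothetical.

```python
import sqlite3


def apply_writes(replica: sqlite3.Connection, writes: list) -> None:
    """Apply one batch of statements to the replica as a single transaction."""
    replica.execute("BEGIN")
    try:
        for statement in writes:
            replica.execute(statement)
    except Exception:
        # Leave the replica at the previous consistent state.
        replica.execute("ROLLBACK")
        raise
    replica.execute("COMMIT")


if __name__ == "__main__":
    # Autocommit mode, so the explicit BEGIN/COMMIT above are the only
    # transaction boundaries.
    replica = sqlite3.connect(":memory:", isolation_level=None)
    apply_writes(
        replica,
        ["CREATE TABLE t (x INTEGER)", "INSERT INTO t VALUES (1)"],
    )
    print(replica.execute("SELECT COUNT(*) FROM t").fetchone()[0])
```

The replica stays roughly the size of the live database rather than growing with every update, at the cost of the server needing to see the statements in the clear.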

I've also noticed that the backups grow rather large quickly, due to large updates. The backup server might, in limited space scenarios, be better off to execute the database commands immediately (and do "poor man's replication") instead of storing the stream as-is. What the server does here is inconsequential to the client, of course.

This may matter if the backup server is trusted only to the extent of retaining information, but is not trusted by the LN node to not steal or otherwise resell information. In this model, the LN node would provide encrypted records to the backup service; then the only thing that the backup server can resell would be the IP address of the LN node and the timing of updates, but not any details such as which channels updated, by how much, or any payment preimages. This is the model that my CLDCB project has (currently moribund, will go update it once I am reasonably satisfied with CLBOSS), as the backup server might be run on a cloud server.

The server would not be able to execute the database commands in that case, and can only save the encrypted stream (presumably the encryption can be decrypted by knowledge of the node private key, which is backed up elsewhere). To "roll up" the large log of database records, the server would ask the backup plugin to re-sample the current db and resend it.
