We definitely need to mention snapshot/restore on here. Things to cover:
consul snapshot inspect shows a snapshot's metadata (including its Raft index), which can help you decide which snapshot to restore. https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/consul-tool/_TitQGHdRSA/18mZiFnJCQAJ
It would also be good to discuss how to use the automatically created snapshots (they seem to be created regularly, every few hours).
cd /var/lib/consul/raft/snapshots/2-146226-1481465408058
[in snapshot directory]
# sha256sum * >SHA256SUMS
# tar -czf /tmp/recreated.snap *
# consul snapshot restore -token=... /tmp/recreated.snap
Restored snapshot
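The manual steps above can be wrapped in a small helper. This is only a sketch under a few assumptions: repack_snapshot is a made-up name, the output path must be absolute, and the work is done on a temporary copy so that nothing is written under the Consul data directory. It relies on sha256sum, so it assumes a Linux host.

```shell
# Repack an on-disk Raft snapshot directory into an archive that can be
# passed to `consul snapshot restore`. out_file must be an absolute path.
# (repack_snapshot is a hypothetical helper name, not part of Consul.)
repack_snapshot() {
  local src_dir=$1 out_file=$2 work_dir status
  work_dir=$(mktemp -d) || return 1
  # Work on a copy so the live raft/snapshots directory is left untouched.
  if ! cp "$src_dir"/* "$work_dir"/; then
    rm -rf "$work_dir"
    return 1
  fi
  (
    cd "$work_dir" || exit 1
    # The glob is expanded before the redirection creates SHA256SUMS,
    # so in a fresh copy the checksum file does not list itself.
    sha256sum * > SHA256SUMS || exit 1
    tar -czf "$out_file" *
  )
  status=$?
  rm -rf "$work_dir"
  return "$status"
}
```

Usage would look like: repack_snapshot /var/lib/consul/raft/snapshots/2-146226-1481465408058 /tmp/recreated.snap. Running consul snapshot inspect on the resulting file before restoring is a cheap way to check that the archive is readable.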
I'm exploring using the new snapshots feature as a backup mechanism as a mitigation tactic against accidental data loss.
I noticed that in the snapshot docs it says:
Restores involve a potentially dangerous low-level Raft operation that is not designed to handle server failures during a restore. This operation is primarily intended to be used when recovering from a disaster, restoring into a fresh cluster of Consul servers.
Can you add some clarification of what exactly that means? Conventional wisdom about database backups is that you should exercise them regularly. If we were to use the restore operation on our most recent snapshot weekly would we be at risk of data loss?
Probably unrelated to this issue, but is there some feasible mechanism that could be added to restore only certain keys? For example, imagine that 1000 keys were deleted 6 hours ago and many other keys were modified/updated since then. Would there be a way to restore only the 1000 deleted keys? Or do we need to keep our own separate dump of kv pairs and restore them through the normal /v1/kv API?
Edit: it looks like once we upgrade we can use kv export and kv import to replace our JSON dumping process, but they use the same /v1/kv API, so they will perform at the same speed.
Can you add some clarification of what exactly that means? Conventional wisdom about database backups is that you should exercise them regularly. If we were to use the restore operation on our most recent snapshot weekly would we be at risk of data loss?
There's a little more detail in the comment here. The restore is implemented by having the leader take on the state of the snapshot and then bump the raft index which creates a "hole" in the Raft log, which causes the snapshot to go out to its followers. This means that the server commits the restore before replicating anything to its followers, which is weird from a Raft perspective, and could leave the cluster in an incorrect state if the leader were to die during that restore operation. If that happened you might have to blow away your server state and do the restore into a fresh cluster to recover. This should be a very unusual case to hit in practice (and the restore API returns success only once the followers have replicated the snapshot itself), but we wanted to fully disclose this possibility.
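Given that caveat, one way to exercise backups regularly without touching production is to restore into a separate scratch cluster. The following is a rough sketch, not an official procedure: restore_drill is a made-up name, and both addresses are placeholders supplied by the caller. ACL tokens, if needed, are assumed to come from the environment (for example CONSUL_HTTP_TOKEN).

```shell
# Save a snapshot from one cluster, sanity-check it, and restore it into a
# separate scratch cluster. (restore_drill is a hypothetical helper name;
# source_addr and scratch_addr are placeholders chosen by the caller.)
restore_drill() {
  local source_addr=$1 scratch_addr=$2 snap_file=$3
  # Take the snapshot from the cluster being backed up.
  consul snapshot save -http-addr="$source_addr" "$snap_file" || return 1
  # Print the snapshot metadata; this fails if the archive is unreadable.
  consul snapshot inspect "$snap_file" || return 1
  # Restore into the scratch cluster only, never into the source cluster.
  consul snapshot restore -http-addr="$scratch_addr" "$snap_file"
}
```

A call might look like: restore_drill http://prod-consul:8500 http://scratch-consul:8500 /tmp/drill.snap. If the scratch cluster ends up in a bad state during the restore, it can simply be wiped and rebuilt.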
Where is this outage recovery guide?
The link mentioned in the Google Groups thread returns a 404.
@richard-mauri https://www.consul.io/docs/guides/outage.html
The thing that's still missing from
https://www.consul.io/docs/guides/outage.html is the simple restore of a
snapshot. On our cluster we take and save regular snapshots; of course
Consul takes them as well. Snapshots can easily be restored to build a
cluster even from scratch.
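For anyone wanting to do the same, taking regular snapshots could look roughly like the sketch below, run from cron or a similar scheduler. This is not our actual script: save_and_prune, the consul-<timestamp>.snap naming, and the retention count are all assumptions for illustration.

```shell
# Save a timestamped snapshot into backup_dir and keep only the newest
# `keep` files. (save_and_prune and the file naming are hypothetical.)
save_and_prune() {
  local backup_dir=$1 keep=$2 snap_file total excess old
  snap_file="$backup_dir/consul-$(date -u +%Y%m%dT%H%M%SZ).snap"
  consul snapshot save "$snap_file" || return 1
  total=$(ls "$backup_dir"/consul-*.snap | wc -l)
  excess=$((total - keep))
  if [ "$excess" -gt 0 ]; then
    # Timestamped names sort chronologically, so the first `excess`
    # entries are the oldest snapshots.
    ls "$backup_dir"/consul-*.snap | sort | head -n "$excess" |
      while IFS= read -r old; do
        rm -f -- "$old"
      done
  fi
}
```

For example, save_and_prune /var/backups/consul 24 would keep the 24 most recent snapshots in that directory.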
Our disaster restore process is at
https://github.com/drud/vault-consul-on-kube/blob/master/troubleshooting.md#complete-loss-and-rebuild-with-recovery-using-a-consul-snapshot
-Randy
On Fri, May 26, 2017 at 9:17 AM, James Phillips wrote:
@richard-mauri https://www.consul.io/docs/guides/outage.html
--
Randy Fay