Hi
I'm looking for a backup/restore procedure in the docs and can't find one. Any help here?
(source) ...step two is to completely rework the state management such that you no longer need to persist state in k8s. This is a much larger goal. I've fooled around a lot with the persistence layer in k8s (this project is running on sqlite3) so I'm basically doing a lot of work of figuring what state I can throw away. The theory is that all desired state comes from your yaml files. Actual state is actual (what really exists). Everything else in k8s should be purely derived and not important.
(source) ...but right now HA is not available, you can only create a high resilient system by backing up and restoring the single master
It sounds like the ultimate goal is for the backup/restore process to be dead simple. At this point, is all state that can't be regenerated automatically stored in SQLite or is there important state elsewhere?
The video from the meetup yesterday answers my question, and it sounds like all important state is indeed stored in SQLite:
(source) ...it's just a single file you have to backup if you want to manage it and do some type of automated backup and restore
(source) ...you can get a highly-resilient setup, which means if the master goes down, you can restore it very quickly somewhere else because the state is stored in a single file
If that's the case, a cron job regularly running sqlite3 .backup should do the trick. Then restoring should be as simple as spinning up a new master from your original image and running sqlite3 .restore. I haven't tested this out myself, though.
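An untested sketch of what that cron job could run (the database path is an assumption based on a default k3s install, and `backup_k3s_db` is just an illustrative name):

```shell
#!/bin/sh
# Untested sketch: take a consistent online snapshot of the k3s SQLite
# state file. sqlite3's .backup dot command works even while k3s holds
# the database open, unlike a plain cp.
backup_k3s_db() {
  db="${1:-/var/lib/rancher/k3s/server/db/state.db}"   # assumed default path
  dest_dir="${2:-/var/backups/k3s}"
  mkdir -p "$dest_dir"
  sqlite3 "$db" ".backup '$dest_dir/state-$(date +%Y%m%d%H%M%S).db'"
}
```

A cron entry could then just call this function from a small script once an hour or so.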
That's great. But sqlite should switch to etcd in HA mode, and then etcd snapshots can be used for this.
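If that switch happens, backup would presumably become a thin wrapper around etcd's built-in `snapshot save` command. A hedged sketch (the endpoint is an assumption, and `snapshot_etcd` is just an illustrative name):

```shell
#!/bin/sh
# Hedged sketch: if k3s moves to etcd for HA, etcd ships its own
# snapshot facility, so backup would no longer involve sqlite3 at all.
snapshot_etcd() {
  dest="$1"
  # Endpoint is an assumption; adjust to the actual etcd listen address.
  ETCDCTL_API=3 etcdctl --endpoints="${2:-https://127.0.0.1:2379}" \
    snapshot save "$dest"
}
```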
FYI: https://github.com/rancher/k3s/issues/42#issuecomment-468343584
Since we use sqlite and maybe etcd in the future it might be nice to have a solution which doesn't depend on implementation. I was trying to find a backup solution that might just use shell scripts and kubectl, not sure of the feasibility of something like that tho.
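One untested way to sketch that datastore-agnostic idea: export every listable namespaced resource with plain kubectl. This only captures API objects, not certs or tokens, so it's not a full state backup; `export_all_resources` is just an illustrative name.

```shell
#!/bin/sh
# Untested sketch of a datastore-agnostic backup: dump API objects via
# kubectl, independent of whether the backing store is sqlite or etcd.
export_all_resources() {
  outdir="${1:-.}"
  kubectl api-resources --verbs=list --namespaced -o name |
  while read -r res; do
    # One YAML file per resource type, across all namespaces.
    kubectl get "$res" --all-namespaces -o yaml > "$outdir/backup-$res.yaml"
  done
}
```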
So to confirm, cluster state backups are essentially just backing up /var/lib/rancher/k3s/server/db/state.db?
I have file/database backups, but my database backup solution (KubeDB) requires that the database superuser credentials are identical for WAL restores, and those are autogenerated and stored in secrets. So I need some way of keeping those secrets around if I have to restore backups, and I'd rather just let them be autogenerated and leave them in-cluster than manage them myself.
Thanks.
This issue is a bit old now, I'm wondering if things changed. Can we get some pointers about the _current_ recommended method of backing up/restoring a k3s environment?
Thanks!
+1
I'd like to reset k3s without having to download the container images again.
So beyond the database, I'm not sure which directories should also be restored after a reinstall.
I have a single master+agent server.
With some investigation, it appears restoring the state.db file works (as long as the shm and wal files are deleted before restore).
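For reference, that restore procedure could be scripted roughly like this (untested beyond the file shuffling; paths assume a default install, k3s must be stopped first, and `restore_k3s_db` is just an illustrative name):

```shell
#!/bin/sh
# Untested restore sketch based on the observation above: the stale
# WAL/SHM journal files must go, or SQLite may replay old state over
# the restored database.
restore_k3s_db() {
  backup="$1"
  db_dir="${2:-/var/lib/rancher/k3s/server/db}"   # assumed default path
  rm -f "$db_dir/state.db-shm" "$db_dir/state.db-wal"
  cp "$backup" "$db_dir/state.db"
}
# Usage (run with the k3s service stopped):
#   systemctl stop k3s
#   restore_k3s_db /var/backups/k3s/state-20190101000000.db
#   systemctl start k3s
```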
I looked through containerd folders and note that the image blobs are here:
/var/lib/rancher/k3s/agent/containerd/io.containerd.content.v1.content
However, deleting the other folders inside /var/lib/rancher/k3s/agent/containerd
and then restarting the k3s service appears to cause issues. My pods then show:
MountVolume.SetUp failed for volume "cert-manager-cainjector-token-92bgb" : failed to sync secret cache: timed out waiting for the condition
followed by a Pulling image event. This is unfortunate as I'd like to use k3s in a low-bandwidth location and pulling images constantly is painful.
@ibuildthecloud do you have any recommendations or should I just be snapshotting the entire "/var/lib/rancher/k3s" folder?
In my mind I should only need to back up the blobs and the database to revert to a working state with minimal downloads, but it doesn't appear to work that way.
@brandonkal this isn't really a backup/restore issue, but it sounds like you want the airgap install option with pre-downloaded images.
At least there needs to be documentation on how to backup k3s cluster state. It could be really simple, it just needs to be documented!
What other files should be backed up aside from the db?
I assume the following are required as well:
/var/lib/rancher/k3s/server/cred
/var/lib/rancher/k3s/server/node-token
/var/lib/rancher/k3s/server/tls
/var/lib/rancher/k3s/server/token
Can anyone confirm?
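Pending confirmation, here's an untested sketch that bundles the database plus those candidate paths into a single tarball. Whether that list is complete is exactly the open question in this thread; `k3s_backup_tar` is just an illustrative name.

```shell
#!/bin/sh
# Untested sketch: archive the db plus the credential/cert paths listed
# above. If the list turns out to be incomplete, add to the tar line.
k3s_backup_tar() {
  out="$1"
  dir="${2:-/var/lib/rancher/k3s/server}"   # assumed default path
  tar -czf "$out" -C "$dir" \
    db/state.db cred node-token tls token
}
```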