```
$ restic version
restic 0.8.1
compiled with go1.9.2 on linux/amd64

$ export GOOGLE_PROJECT_ID=xxx
$ export GOOGLE_APPLICATION_CREDENTIALS=
$ restic --no-lock -p
password is correct
scan [/home/gebi/doc]
scanned 1852 directories, 5764 files in 0:00
Remove(
```
GCS (Google Cloud Storage bucket)
restic should not create lock files when called with --no-lock, because the service account has no permission to delete objects in the GCS bucket.
Instead, restic creates lock files which it is then unable to delete.
The backup is created successfully, but restic then hangs trying to delete the lock file and needs a kill (Ctrl+C does not work).
Create a GCS bucket and a service account that only has permission to create and read objects in the bucket (no delete).
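For illustration, one way to set up such a restricted account with the GCS Go client; the bucket and service account names here are made up, and the exact role set isn't part of this report:

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/iam"
	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	// Hypothetical bucket and service account names.
	handle := client.Bucket("my-backup-bucket").IAM()
	policy, err := handle.Policy(ctx)
	if err != nil {
		log.Fatal(err)
	}
	member := "serviceAccount:backup@my-project.iam.gserviceaccount.com"
	// Grant create and read on objects, but no delete/overwrite.
	policy.Add(member, iam.RoleName("roles/storage.objectCreator"))
	policy.Add(member, iam.RoleName("roles/storage.objectViewer"))
	if err := handle.SetPolicy(ctx, policy); err != nil {
		log.Fatal(err)
	}
}
```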
Yes: missing delete permissions, combined with --no-lock still creating lock files.
Not create lock files :)?
AWESOME tool, I use it daily, especially since GCS support was added (thanks again for the effort!).
We are currently testing the new permissions in GCS, trying to get a setup where the local machine is not able to delete its own backups anymore.
(GC not working is a non-issue in this case for me.)
Hey, thanks for raising this issue. The --no-lock option is only honored by operations that explicitly support it, like check. Operations that may add data (such as backup) do not support it; that's not how the repo was designed.
Is it maybe an option to grant the service account deletion on the locks/ subdir?
Or backup to a local directory, and then use e.g. rclone to sync new files to the cloud?
Is it maybe an option to grant the service account deletion on the locks/ subdir?
That is not possible in GCS.
@fd0 restic backup with --no-lock works here though, and I'm deleting everything under /locks after a few hours automatically.
```
% myrestic check
password is correct
load indexes
check all packs
check snapshots, trees and blobs
no errors were found
```
No, that's not possible because of size constraints.
@gebi "it works" means restic does not return an error when you specify --no-lock, but for the backup operation it will still create a lock file. The --no-lock switch is checked for each operation individually, and only some (like check) respect it.
I understand your use case, but I must say that I'm very reluctant to add support for --no-lock to backup, because of the high potential that people use it without having understood what it's for. For example, let's say a backup takes much longer than anticipated, and while the backup (without a lock) is still running, the prune operation is started on a different machine. It won't see any lock, and since the other process isn't finished yet, it won't see the snapshot it created. So the prune process won't know which data is referenced by the new snapshot, and it will even remove newly uploaded files, since these aren't referenced by any existing snapshot. Then the backup process is done, uploads a new index (referencing removed files) and a new snapshot that cannot be restored any more.
How do you make sure that there's no restic backup process running when you run prune?
@fd0 The service account used by restic is simply not allowed to delete files on the GCS bucket. So a collision between those two commands is not possible.
IMHO it would be a very important property and worthwhile goal to support GCS as tamper-proof storage in restic, because intruders more often than not go on and delete everything they can find, including backups. With a service account that has full delete/rewrite permissions, the backup is effectively worthless.
@fd0 But as you said, there should be a warning printed that there is a possibility of data corruption if some privileged account executes prune in parallel (maybe possible to suppress with some i-know-what-i-m-doing switch).
Ah... and restic prune is never called on these data pools; they are sharded by year, and only whole years are deleted (because deleting individual snapshots out of a repository is unnecessary and too slow).
@fd0 Any thoughts on my suggestion?
If I understand the code correctly, it would just mean not creating any lock files, and thus automatically a clean exit from restic (currently it needs to be killed with -9).
Supporting tamper-proof backups in restic would be awesome, as very few backup systems support such a mode and it is a property you nearly always want.
As for parallel prunes, why not use a two-phase "commit" for prunes?
restic prune would only write the objects it would delete out to a file. On the next run, it would first build a fresh list of objects to prune from the whole repository as usual, and then delete only those files that also appear in the list written on the previous run; all objects not in that list are written out into a new list. This would remove the need for any locking, with the safety guarantee of not deleting objects from running backups that were shorter than the prune interval (which, if prune only runs every month, is quite a good safety margin).
This idea would have the additional advantage of not needing a new repository format.
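A minimal sketch of the idea (hypothetical helper, not restic code):

```go
package prune

// twoPhasePrune deletes only objects that were already unreferenced on the
// previous run, and records the rest for the next run. An object uploaded by
// a backup that finished between the two runs becomes referenced again and
// drops out of the candidate set, so it is never deleted.
func twoPhasePrune(candidates []string, previous map[string]bool, del func(string) error) (map[string]bool, error) {
	next := make(map[string]bool)
	for _, obj := range candidates {
		if previous[obj] {
			// Unreferenced on two consecutive runs: safe to delete.
			if err := del(obj); err != nil {
				return nil, err
			}
			continue
		}
		// Newly unreferenced: only record it; the next run decides.
		next[obj] = true
	}
	return next, nil
}
```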
See #1141 for lock-less prune discussion.
@ifedorenko Yes, I've read the discussion, but as already mentioned there, the duplicacy approach needs an update to the repository format. This simple, albeit not that fancy, approach does not, and it also avoids extensive requirements regarding atomic operations in the backend store.
I was merely suggesting to discuss lock-free prune in #1141, so we have all ideas in one place.
I've decided to not add support for --no-lock for backup, at least not for now. If you want this behavior (and it feels to me you know what you're doing), one way is to patch it into restic manually and build it yourself. Here's a patch: https://gist.github.com/fcaf7a0cbc35b4e0bebc901fbacd3860
@fd0 That's unfortunate, as we really require the functionality for WORM backups (write once, read many), and most backup users do too; they just don't realize it, or only after their first compromise, when most backups are deleted as well.
I've forked restic and made the first release of restic-worm, which we have to use and will keep up to date with your upstream restic as far as it makes sense for our use case.
It's currently backing up about 10PB of data and running fine so far. I really wish we had found a way to work together and add this functionality upstream, even if it meant we would have to test it and keep it in shape; a fork is of no help to either side.
https://github.com/mgit-at/restic/tree/v0.8.3-worm
https://github.com/mgit-at/restic/tree/backup-nolock
@gebi I can understand your use case and what you're trying to do. In my opinion, just adding the small patch to allow --no-lock during backup has the potential to be (ab)used by way too many users in the wrong way, which may lead to data loss. That's the reason I don't like just adding it.
In general, the pruning process is not optimal, and even using lock files is unfortunate, especially for your use case. It was what I came up with during the initial design phase, and it's the simplest to implement. We will change it and move to something better in the long run, for sure.
I agree that a fork is unfortunate and won't help either of us, even if it is only the added --no-lock to support your use case. I could live with a patch that enables this only with a special worm build flag that disables the prune command altogether and runs backup and check without lock files. Would that maybe work for you?
I'm very interested in your results of working with 10PB within a restic repo, that's awesome!
@fd0 Hmm... the more I think about it, the more I come to the conclusion that maybe we should just refuse to run restic prune on such backends receiving snapshots with no-lock.
So why not let restic snapshot --no-lock create a lock file if one isn't already there, but just not delete it?
This would prevent restic from running any prune operations on the data in parallel, but would not restrict future snapshots (as far as I can see they just work, regardless of how many lock files are present).
If the user wants to prune such storage backends, _he_ has to ensure that no one is writing to it concurrently, which is totally fine for the intended use case.
If we get #1141 into usable shape, that might lift this restriction later on, but for now I would be totally fine with restricting it; it would just be awesome to have the functionality for writing snapshots to WORM GCS buckets in upstream.
Would this path work for you?
It would create a safe way to do WORM backups, have the functionality included in upstream, and concurrent prune would be the responsibility of the user if he wants to use/implement it.
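A sketch of the intended behavior (function names made up): create a lock only when none exists, and never remove it, so a service account without delete permission suffices.

```go
package worm

// ensureLock creates a lock file only if the repository has none yet and
// never deletes it afterwards; prune stays blocked by the lingering locks.
func ensureLock(list func() ([]string, error), create func() error) error {
	locks, err := list()
	if err != nil {
		return err
	}
	if len(locks) > 0 {
		// Some lock is already present; that is enough to keep prune away,
		// and it avoids piling up one lock file per backup run.
		return nil
	}
	return create()
}
```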
I assume you mean restic backup --no-lock? This would work for backup, but you'll end up with many lock files over time...
For other operations (such as snapshots), changing the behavior won't work: we originally added the --no-lock switch in order to support accessing a repository on read-only media (like DVDs).
@fd0 Yes, I meant restic backup --no-lock. Many lock files would indeed be the outcome, but they could either be deleted from the machine pruning the data, or restic could first check whether at least one lock file exists and only create a new one if none does.
And for operations such as snapshots, would it either be the same behaviour as now, or ignore the error?
Hm. Maybe we can really do that: Not exit with an error if the lock file could not be removed. That'd work in most cases, especially in the ones you're interested in.
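Roughly, as a sketch (the distinct exit code is a hypothetical refinement, not current behavior):

```go
package worm

import (
	"fmt"
	"os"
)

// unlock tries to remove the lock file and degrades a failure to a warning
// instead of an error loop. A distinct exit code would let callers notice
// that a stale lock was left behind.
func unlock(remove func() error) int {
	if err := remove(); err != nil {
		fmt.Fprintf(os.Stderr, "warning: could not remove lock file: %v\n", err)
		return 3 // hypothetical "stale lock left behind" exit code
	}
	return 0
}
```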
Yea, that would be awesome!
btw... the current behaviour is a loop with error output, in which restic can't be killed normally but only with kill -9 (in this case).
Uh, that's not good, thanks for pointing it out again.
I think you should be able to add full write and delete permissions only for the lock folder by using ACLs:
https://cloud.google.com/storage/docs/access-control/lists
I did not test it myself though and could be mistaken.
@lukastribus It would be awesome if that worked, but it doesn't.
There is no "folder" locks/ to put ACLs on; Google Cloud Storage buckets don't work like that, sorry.
One would have to put ACLs on each individual object within the locks/ namespace, but that would defeat the purpose.
@fd0 Any news on the feature "Hm. Maybe we can really do that: Not exit with an error if the lock file could not be removed. That'd work in most cases, especially in the ones you're interested in."?
It would be really awesome if you could add this to restic; it would make our life a whole lot easier and it would IMHO be a worthwhile addition.
What if it was a specific exit code? It could be useful to know that there's a stale lock in the repo...
@fd0 Any news on the feature "Hm. Maybe we can really do that: Not exit with an error if the lock file could not be removed. That'd work in most cases, especially in the ones you're interested in."?
Nope, no news unfortunately. I don't have much time at the moment, so somebody needs to actually do the work here and build a prototype ;)
One small issue which isn't mentioned here (as far as I can see): while long-running operations such as backup run, restic replaces its own lock file every few minutes with a new one under a new name. So there's no single lock file, but a bunch of them.
What if it was a specific exit code? It could be useful to know that there's a stale lock in the repo...
Good point, hm.
The basis of this discussion is making the --no-lock flag available for backup, which already exists as code (from you). It's also what we are currently using to back up everything.
@fd0 What would be your preferred way? I've submitted the patch we are using on top of restic (which was thankfully provided by you; I've just rebased it onto master): #1917
Maybe another way could be the option of having lock files stored separately (for example, in a bucket that you DO have write access to)?
I wonder what everyone else does to resolve this security situation? If the machine doing the backups gets compromised, how do you make sure the attacker does not just delete all the backups?
I'm running into this same issue as well. I have no need to (and because of data requirements can't) prune backups. Running backup --no-lock would be the perfect solution for me. Is there a viable workaround out there?
@parkerp1 We're using effectively the method suggested here, which is to have restic talking to two copies of rclone fronted by tinyproxy, and having one read/write bucket for the lock files and one write-only bucket for the data. The article is about Wasabi, but the back end is irrelevant and we're using it with GCS. While not particularly clean, Dockerizing this solution helps to hide the complexity.
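The routing at the heart of that setup can be sketched as a small Go reverse proxy (the ports and backend endpoints are assumptions about the setup, e.g. two rclone instances fronting the two buckets): requests touching the locks/ directory go to the read/write backend, everything else to the write-only one.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	// Hypothetical local endpoints for the two backends:
	// the read/write lock bucket and the write-only data bucket.
	locks, _ := url.Parse("http://127.0.0.1:8081")
	data, _ := url.Parse("http://127.0.0.1:8082")
	lockProxy := httputil.NewSingleHostReverseProxy(locks)
	dataProxy := httputil.NewSingleHostReverseProxy(data)

	log.Fatal(http.ListenAndServe("127.0.0.1:8080", http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			if strings.Contains(r.URL.Path, "/locks/") {
				lockProxy.ServeHTTP(w, r) // locks need create + delete
				return
			}
			dataProxy.ServeHTTP(w, r) // everything else is write-only
		})))
}
```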
@parkerp1 We are still using a small patch on top of restic https://github.com/mgit-at/restic/tree/backup-nolock to back up a few hundred TB of data. Works like a charm.
Sadly it was rejected upstream...
Thanks @gebi and @sdudley. Both look like good options
I agree with @fd0 that adding a --no-lock flag would be dangerous. If someone was getting a lock error they might try adding that flag to get past it and end up corrupting their data.
Instead, perhaps on init an alternate lock file location could be specified (with the fact that an alternate lock location is in use stored in the original repo). Any command that is then invoked without specifying the alternate lock location would fail (e.g. "error: repo uses alternate lock location and no alternate location given"). Commands would also fail if an alternate lock location was provided but the repo wasn't set up to use one.
This would make the advanced behavior possible and also keep the regular commands pretty foolproof for those not using the advanced behavior.
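A sketch of how that check could look (the config field and error strings are hypothetical, not restic's actual repo config):

```go
package config

import "errors"

// Config mirrors the idea of recording the alternate lock location in the
// repository config at init time. LockLocation is a hypothetical field.
type Config struct {
	Version      uint   `json:"version"`
	ID           string `json:"id"`
	LockLocation string `json:"lock_location,omitempty"`
}

// checkLockLocation fails commands that disagree with the repo about
// whether an alternate lock location is in use.
func checkLockLocation(cfg Config, given string) error {
	switch {
	case cfg.LockLocation != "" && given == "":
		return errors.New("repo uses alternate lock location and no alternate location given")
	case cfg.LockLocation == "" && given != "":
		return errors.New("alternate lock location given, but repo is not set up to use one")
	case cfg.LockLocation != "" && given != cfg.LockLocation:
		return errors.New("given lock location does not match the one recorded in the repo")
	}
	return nil
}
```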
I'm running into this same issue as well. I have no need to (and because of data requirements can't) prune backups. Running backup --no-lock would be the perfect solution for me. Is there a viable workaround out there?
Two options off the top of my head:
- Use rest-server with the --append-only flag; this lets your users only back up to their repositories, they won't be able to delete data.
- Use a filesystem on the repository server that allows you to snapshot the relevant parts of the storage. I would recommend ZFS, because its snapshots are extremely cheap and easily accessible if need be. This is not the same thing as disallowing deletions, but you will always still have a copy of the latest snapshots, so any deletions are pointless.
Another approach would be to use object versioning. With that enabled, deletions may be allowed for the service account used by restic, since they only add deletion markers to the version history. Only the DeleteObjectVersion permission must be denied in order to prevent an attacker from doing permanent damage.
Frankly, I am not sure about GCS, but I'm currently implementing this on S3/Wasabi and it looks promising.
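As a sketch of the S3 side with the AWS Go SDK (bucket name hypothetical; the deny rule for s3:DeleteObjectVersion lives in the bucket/IAM policy and isn't shown here):

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)
	// With versioning on, a delete only adds a delete marker; the backup
	// account is additionally denied s3:DeleteObjectVersion elsewhere, so
	// it cannot remove the underlying versions.
	_, err := svc.PutBucketVersioning(&s3.PutBucketVersioningInput{
		Bucket: aws.String("my-backup-bucket"), // hypothetical bucket
		VersioningConfiguration: &s3.VersioningConfiguration{
			Status: aws.String(s3.BucketVersioningStatusEnabled),
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```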
@fd0 Would an environment variable ala RESTIC_DANGEROUSLY_DO_NOT_LOCK_BACKUP be an option? :)
@fd0 Would an environment variable ala RESTIC_DANGEROUSLY_DO_NOT_LOCK_BACKUP be an option? :)
No, I don't think so. As I've outlined in https://github.com/restic/restic/issues/1544#issuecomment-386549926:
I could live with a patch that enables this only with a special worm build flag that disables the prune command altogether and runs backup and check without lock files.