Borg: Implement a "safe" append-/readonly-mode

Created on 28 Oct 2016 · 19 Comments · Source: borgbackup/borg

The docs claim append-mode can be used to prevent hacked clients from _permanently_ altering existing archives. This can be achieved by granting only append-mode access to the client. Changes to the repository are then appended to the transaction log/journal and can be reverted by removing the latest transactions from the journal.

First, this kind of manual rollback is not state-of-the-art. ;)

Second, disk space is not infinite. Sooner or later a trusted client (or the server itself) will need to free disk space. This requires "true" write access to the repository and is done by prune. However, archives that have been marked as (to-be-)deleted in append-mode will be wiped out by prune even if the retention policy specified along with the prune invocation should have preserved them.

See: #1689 and #1744

Therefore the trusted client that invokes prune on the repository is responsible for checking the integrity of the repository. But how could this be achieved? When a trusted client runs prune at a time when a hack of a client has not been detected yet, the prune action will apply any malicious transactions permanently. Then even archives that were created before the hack, and that should not have been purged according to the retention policy, might be purged or compromised. This would make disaster recovery from borgbackup-based backups impossible.
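To make the failure mode concrete, here is a minimal toy model in Python. It is only a sketch: `ToyRepo` and its methods are hypothetical names, and the in-memory log is an assumption standing in for borg's real segment files. It shows that a delete recorded in append-only mode can be rolled back, but not once a later full-access write has compacted the log.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy append-only log; NOT borg's real on-disk format."""
    # Each entry is (op, chunk_id, data) with op in {"PUT", "DELETE", "COMMIT"}.
    log: list = field(default_factory=list)

    def put(self, chunk_id, data):
        self.log.append(("PUT", chunk_id, data))

    def delete(self, chunk_id):
        # In append-only mode a delete is only recorded as a tag.
        self.log.append(("DELETE", chunk_id, None))

    def commit(self):
        self.log.append(("COMMIT", None, None))

    def view(self):
        """Replay the log to obtain the current logical contents."""
        state = {}
        for op, chunk_id, data in self.log:
            if op == "PUT":
                state[chunk_id] = data
            elif op == "DELETE":
                state.pop(chunk_id, None)
        return state

    def rollback(self, n_transactions):
        """Manually drop the latest n committed transactions."""
        for _ in range(n_transactions):
            if self.log and self.log[-1][0] == "COMMIT":
                self.log.pop()
            while self.log and self.log[-1][0] != "COMMIT":
                self.log.pop()

    def compact(self):
        """Model of a full-access write: delete tags are applied for good."""
        state = self.view()
        self.log = [("PUT", k, v) for k, v in state.items()]
        self.log.append(("COMMIT", None, None))
```

In this model, calling `rollback(1)` right after a malicious `delete` plus `commit` restores the chunk, while calling `compact()` first leaves nothing to roll back to.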

I would like to suggest the implementation of a (new) safe append-, readonly-, worm-mode or whatever-mode that restricts clients to adding new archives and rejects any action that would delete or change existing archives. Prohibited actions should be rejected immediately and therefore should not go into the journal at all.

Bountysource security

MichaelHierweck

All 19 comments

Yes, currently one has to be sure about having a "valid" (untampered) repo state before writing to it with append-mode=0.

borg list repo, borg list archive, borg extract --dry-run archive can help here, but making really really sure might be difficult (and slow).

We could have something better if we could disallow delete tags within a no-delete mode.

ThomasWaldmann on 28 Oct 2016

I reviewed the code where repository.delete(id) is used:

- by borg delete archive in Archive.delete() (via chunk_decref())
- by borg debug delete-obj
- by borg check --repair
  - with --verify-data in verify_data() to remove corrupt objects (so that they will be replaced by non-corrupt ones by later backups, hopefully)
  - in orphan_chunks_check() (to remove unreferenced objects)
- by borg create in Archive.write_checkpoint() to remove the checkpoint archive item again after it has been saved/committed (so that the next checkpoint [or final] save/commit will replace it without creating unreferenced stuff)

The first ones are more or less expected and unproblematic (we just need to fail them early if there is no delete capability) - they don't need to be done from a not-that-much-trusted client (but can be done from a more trusted machine).

The last one is more problematic, can we solve it better than just switching off checkpoints completely?

ThomasWaldmann on 30 Oct 2016

We could also just keep checkpoints in "no delete" mode.
But I think the real problem is not "delete" operations, it is put. Mostly, put for the manifest is a very big hole. (We could ignore all other puts, because they are supposed to contain the same data, although we can't check because of encryption.)
I think what we need for a safe append-only mode is that the appended archives are _not_ stored in the repo manifest but managed by the borg server, i.e. we would need a new RPC operation "add_archive" that takes either the chunk id or maybe even the whole archive chunk. That way the server could even implement a policy where only the last one in one connection is persisted. Thus there would not be a pile-up of archives for each checkpoint.
This of course is a bigger change, as all clients that interact with such a repo need to be able to see the append-only archives using further new RPC commands.
The trusted client might merge all of these into the manifest to create a repo that would be compatible with older clients again, or maybe just because it is more efficient.

Still problematic: A client can put chunks that claim to contain the data for some chunk-id but do not (either corrupted, or something else). I don't think there is anything we really can do about this. The trusted client could download and check these chunks, but that's a bit late. Also, a bad client can put chunks not linked from any archive, although borg check would be able to clean this up.

textshell on 1 Nov 2016

I'm working on some ideas in this direction, but don't want to commit to anything until I see how it pans out.

enkore on 1 Nov 2016

@textshell yes, put is also a problem. :| and we can not ignore non-manifest puts as we are defending against an evil client here. it could just put bad replacement chunks for all content data in the repo and the only way to notice is a very expensive --verify-data operation. it could also additionally replace all metadata to make everything look valid (even for borg check --verify-data) as long as you do not (manually) look at content.

I'd say this is pretty much doomed to be unsolvable without fundamental changes.

ThomasWaldmann on 1 Nov 2016

I don't think we need to lose all hope for something that works well enough. Fundamentally the borg model distrusts the server, so we can't get perfect security here. But I hope we can do enough that borg backups can have a reasonable trust level.

We basically want to prevent one evil client from interfering with other clients' backups and with backups of the client from before it became evil. I don't think there is any way (in any setup) to make sure that a client doesn't sabotage its new backups.

Maybe we should think about kinds of attacks here. One that springs to mind is, for example, the crypto trojan: an evil client just wants to destroy the backups to prevent undoing its damage. For evil clients that want to do data exfiltration we already have #672 or #1164. What are other major attacks an evil client might want to do?

One nice thing would be to be able to restrict clients to a certain (set?) of prefixes. This would likely be another --restrict--something option.

I think just using the first put is a viable strategy. Excluding the manifest (maybe just by refusing puts to its id in this mode), a bad client needs to predict the id of a chunk another client will want to save. This should be hard for most client-unique data. On the other hand, it would be easy for data from, say, a distribution update. But restore errors in distribution files are just a hassle, nothing that would force a user to, for example, pay ransom to a crypto trojan.

Going even further, a client could validate already known chunks with a certain probability. This would guard against non-malicious corruption or against a client massively poisoning the repository. Ideally it would check "new" chunks with a higher probability. (Detecting new chunks would mean tracking trusted chunks, i.e. those written from this client, separately on the client, which of course is more work.)
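A rough sketch of such probabilistic validation could look like the following. It assumes the chunk id is a keyed MAC over the plaintext (HMAC-SHA256 is used here for illustration), and `fetch` is a hypothetical callback that returns the decrypted chunk data for an id; neither name is taken from borg's code.

```python
import hashlib
import hmac
import random


def chunk_id(id_key: bytes, plaintext: bytes) -> bytes:
    """Keyed id of a chunk (assumption: HMAC-SHA256 over the plaintext)."""
    return hmac.new(id_key, plaintext, hashlib.sha256).digest()


def spot_check(fetch, known_ids, id_key, probability, rng=random):
    """Re-verify each known chunk with the given probability.

    Returns the ids whose stored data does not hash back to the id.
    """
    bad = []
    for cid in known_ids:
        if rng.random() < probability:
            data = fetch(cid)  # hypothetical: download and decrypt the chunk
            if not hmac.compare_digest(chunk_id(id_key, data), cid):
                bad.append(cid)
    return bad
```

With `probability=1.0` this degenerates into a full verification pass. Smaller values trade detection speed for bandwidth, and a client that tracks its own trusted chunks could pass a higher probability for the rest.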

textshell on 2 Nov 2016

Still don't get how one would defend against a low-level crap-chunk-putting client while being able to run delete or prune now and then (see first post).

ThomasWaldmann on 4 Nov 2016

Another threat scenario would be a user that uses some kind of cloud syncing solution.
Evil client syncs some file (thesis.tex) first. It now knows how this will be chunked and can poison those chunk ids with bogus data.
Now even if the file is synced to a good client later, that client can hardly fix the damage done by the evil client. I don't see a feasible way to defend against this, apart from the cloud syncing service also having backups. Then again, the evil client could also just replace the file in the synced folder with crap and hope it will be synced to the good client before that client has backed up the correct version.

textshell on 5 Nov 2016

To summarize:
Add a new client restriction to borg that restricts delete and overwriting capabilities of a client.
Such a client:

- cannot write to the manifest
- cannot prune or delete anything
- has to register new archives with the server using a new remote call
- The server should save a secure client id with each archive that is registered in this way, for later validation.
- The client should be able to replace a previous checkpoint that was created in the same connection with a new one. The server has to check that this is really in the same connection.
- Checkpoints that are only later "resumed" cannot be deleted.
- The chunks that would be deleted in checkpoint rollover need to be added as metadata in the most recent checkpoint while replacing checkpoints.
- Puts to chunk ids that the server already has are ignored (they should contain the same data as already stored, or are evil).

All borg clients:

- need to use a new API to load all separately registered archives in addition to using the list from the manifest.
- could validate already known chunks with a certain probability to guard against corruption.

A trusted client that e.g. does purge:

- needs to check that an archive was created from the expected client, else report to the admin
- might want to merge correct archives into the manifest and remove them from the separate list
- might want to check new chunks added (possibly a random sample)
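The server-side rules from this summary can be sketched roughly as below. This is a toy in-memory model, not a proposal for the real RPC layer: `RestrictedRepo`, `Rejected` and the `add_archive` signature are hypothetical, and the manifest id of 32 zero bytes is assumed here for illustration.

```python
class Rejected(Exception):
    """Raised when a restricted client attempts a forbidden operation."""


MANIFEST_ID = b"\0" * 32  # assumed manifest id for this sketch


class RestrictedRepo:
    def __init__(self):
        self.chunks = {}
        # Archives registered via add_archive, kept outside the manifest
        # as (client_id, archive_chunk_id) pairs.
        self.archives = []
        # connection_id -> checkpoint entry created in that connection
        self._checkpoints = {}

    def put(self, chunk_id, data):
        if chunk_id == MANIFEST_ID:
            raise Rejected("restricted clients cannot write the manifest")
        # First put wins: puts to already known ids are silently ignored.
        self.chunks.setdefault(chunk_id, data)

    def delete(self, chunk_id):
        # Rejected immediately, so nothing reaches the journal.
        raise Rejected("restricted clients cannot delete")

    def add_archive(self, connection_id, client_id, archive_id, checkpoint=False):
        # Only a checkpoint from the SAME connection may be replaced.
        previous = self._checkpoints.pop(connection_id, None)
        if previous is not None:
            self.archives.remove(previous)
        entry = (client_id, archive_id)
        self.archives.append(entry)
        if checkpoint:
            self._checkpoints[connection_id] = entry
```

A trusted client could later walk `archives`, compare each `client_id` against the expected one, and merge the accepted entries into the manifest.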

textshell on 5 Nov 2016

> I'm working on some ideas in this direction, but don't want to commit to anything until I see how it pans out.

Status update for that: Prototype is working.

What I've been up to here is essentially a backup system built on top of Borg, where you only have one trusted party, a central backup server that controls access to repositories.

This works by having (among some higher level coordination that is kinda required to make it all work) a reverse proxy that the (untrusted!) clients use to access a view of the target repository.

This provides:

- Clients can't read or mutate archives in the repository
- Clients can't push bad data (id != id(data)). They can still write bogus metadata etc.; my plan is to thwart that in the cache sync phase on the server. Ditto for bogus orphans (the RP can create a delta-index).
- Clients don't know the location of the real repository
- Clients don't get the encryption keys for the real repository
- Hence clients could not access the data in the real repository even if they gained access to it
- Clients don't maintain a cache, and no archive caches are needed anywhere
- But still full deduplication across all clients

Code: https://github.com/enkore/borgcube (please heed the notes in the readme)

enkore on 6 Nov 2016

Actually, I got a little lost in all those issues about 'hacked-server', 'append-only', 'append-only not safe with prune' and so on. So excuse me if I'm not commenting in the right/most-appropriate place...

If I understood the current situation correctly:

- --append-only will save your backup data in case some client tries to delete stuff from your repo (by only tagging chunks 'to-be-deleted', but not being able to delete them)
- when leaving --append-only and running prune (or some other operation), that will delete everything that is to-be-pruned as well as everything tagged 'to-be-deleted' by previous repo accesses from --append-only runs.

are those assumptions correct? I'm new to borg and try to get my head around all this stuff, so please correct me if I'm wrong.

What I'm thinking about is:

the combination of --append-only and pruning from a trusted client is safe, as long as you are sure that your clients/your repo have/has not been tampered with when you do the pruning.

so what about introducing something like an 'incubation period', i.e. prune all transactions that are older than [insert user-supplied time span here]? That would mean: if I have plenty of space, I will keep all transactions of the current year, but prune the stuff that is older than that year.
My intention with that is: if one of my clients gets evil, I will notice that at some point in time. If I have kept the transactions 'unpruned' since that client got evil, I can easily recover from that by deleting its transactions. The conclusion is: if I am sure that none of my clients were evil in the last year, I can prune the transactions that are older than one year without losing data.

That would allow me to save some space, prune now and then, and still have some kind of 'incubation period' in which I can notice that one of my clients got evil, without it tampering with all my backups.

Depending on the user's choice and trust in their machines, they could choose a reasonable 'incubation period' for noticing that something went wrong before it could creep into their backups.
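A minimal sketch of that selection step, purely as an illustration: it assumes every transaction carries a commit timestamp, which is an assumption of this example and not something taken from borg, and `split_by_incubation` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def split_by_incubation(transactions, now, incubation):
    """Split an oldest-first list of (commit_time, txn_id) pairs.

    Returns (compactable, kept): the prefix older than the incubation
    period, whose delete tags may be applied for good, and the recent
    tail that must remain revertible.
    """
    cutoff = now - incubation
    split = 0
    for commit_time, _txn_id in transactions:
        if commit_time > cutoff:
            break  # everything from here on is still incubating
        split += 1
    return transactions[:split], transactions[split:]
```

Because the journal is sequential, the function only ever returns a prefix as compactable, so a recent transaction is never applied while an older one is still held back.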

As I couldn't get my head around the --append-only logic completely, I'm not sure if that is even possible like that, but wanted to share that idea. Is it possible like this?

MK-42 on 13 Feb 2017

@MK-42 yes, that's correct.

repo commits do not have timestamps, so we can't consider time.

ThomasWaldmann on 13 Feb 2017

In -ao mode there is the transaction log which could be parsed back, but this sort of thing definitely requires RPC updates -> something for 1.1+

Also I'm not super-convinced that this would be a big improvement over simple -ao, since it requires even more knowledge of internals to grasp and is even harder to use. Either is stop-gappy...

enkore on 13 Feb 2017

I created a $100 bounty. I encourage others who would find this useful to contribute!

lucassz on 1 Jun 2019

> @textshell yes, put is also a problem. :| and we can not ignore non-manifest puts as we are defending against an evil client here. it could just put bad replacement chunks for all content data in the repo and the only way to notice is a very expensive --verify-data operation. it could also additionally replace all metadata to make everything look valid (even for borg check --verify-data) as long as you do not (manually) look at content.
>
> I'd say this is pretty much doomed to be unsolvable without fundamental changes.

To reiterate the problem and make sure that I understand it correctly now, after reading the ~10 various currently existing issues loosely requesting new types of read-only/write-only/etc. mode: they all seemingly stem from the fact that "--append-only" mode as it exists right now is mostly broken in real-life usage. (It is not technically broken, as it does what it says in the docs, but in reality most users will want to combine it with pruning old data on the server, which will make every deletion/corruption previously masked and prevented by the append-only mode permanent. Thus if administrators want to use pruning, they are now expected to somehow inspect all repositories before every real prune (which is usually run frequently by a scheduling mechanism), which is completely unrealistic. The only real use case for append-only, the only time when it can prevent corruption/hack, is when an attack has been detected immediately, and an administrator has been notified and reacted immediately to stop pruning batch jobs and started inspecting the state of the repository immediately after the attack. (Or if no prune commands are ever issued on a repository at all.))

The difficulty in implementing a fix seems to be rooted in the fact that the client-server model of Borg allows a client to issue simple low-level commands (who ever thought that up as a viable way to design it?) such as "PUT" or "GET" on individual blocks or indexes or repo files. Most of these commands are required for creating, deleting, removing and purging alike, so simply banning certain low-level commands does not work: they are used in a normal "create" command as well, and banning them would prevent any operation (even creating a new backup). Does this assessment sound correct?

If so, the only two ways we have to implement "real" append-only/write-only in the meaningful way that many people expect are:
1) To implement a clever heuristic/well planned analyzer or rights management system on the server which will interpret and disentangle the stream of low-level commands sent to it by the client in order to make an educated guess about whether the high level operation the client is trying to do is legitimate and valid, and then restrict/allow it accordingly.
2) Change the underlying architecture of the client-server model in Borg, and finally stop exposing low-level commands that should only be done in the server to clients, giving a new API which will then be easily restricted.

Judging by the number of open issues, the breadth of discussion and the different ideas, the lack of consensus, the timespan, etc., solution 1) is proving to be very difficult to design and implement.

How far are developers from the decision to invest in solution 2)? Is it a viable alternative at all, and how much reorganization would it require? How long would it take to implement? Can it be done? Would such a big change even be accepted as a pull request?

imperative on 11 Oct 2019

@imperative Not exactly. The basic security model says that the server is the untrusted part. This is needed for (data at rest) encryption to be actually meaningful. So the server cannot do many high-level operations. This is on purpose. Of course the server can always drop data to make the backup disappear.

I've outlined my view of this in https://github.com/borgbackup/borg/issues/1772#issuecomment-258575677, which I still think is viable.

This adds a bit more trust to the server, as now the server sees encrypted archive data separately instead of all in one big block, but this should be tolerable, because it is still encrypted and the previous usage patterns are likely to leak the exact same data for creates (assuming the crypto is good). prune/manifest compaction should not expose too many details either.

In a situation with multiple (untrusted) clients accessing one repository it still has the problem that an evil client can poison the repository with chunks claiming an id that does not match the contained data. In my model the (weak) defense against this is having the client check random chunks. A secure defense would be to have a client keep track of validated chunks and download and validate each chunk that is needed in an archive that this client did not yet validate.

For single client repos this is not really a problem as long as you keep in mind that only backups done before your client has been compromised are reliable. As those will always have their data already in the repository before the evil client comes along and already existing chunks can not be erased or replaced it can not spoil the old archives. (defend against the crypto malware use case)

textshell on 13 Oct 2019

I am testing borgbackup and I also found this problem with its architecture.

I would like to share an idea, though I don't know if it is realistic. Could we implement the pruning on the server side? If the server saves the last date on which a chunk was required by any archive, then maybe the server can delete the chunks that have a date older than the one configured for pruning.

If a chunk belongs to 3 archives: 1 month old, 1 a week old and 1 a day old. The date for the chunk would be the one of the "day old" archive. If that chunk stops being used in new archives, it will retain that date so when a month (or whatever date) passes, the server can remove the chunk, bypassing the append-only.

This way:

- You can still have an append-only so clients cannot remove the backups, but the server will free space.
- You have to configure the pruning on the server side.
- The amount of information required for the server to do the pruning is minimal and can be acquired by the trusted client.
- The client cannot make a chunk older than it is, so I don't think this can be exploited.
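As a rough illustration of this idea only, the server-side bookkeeping could look like the sketch below. `ChunkAges`, `touch` and `expired` are hypothetical names, and the sketch assumes the server is told which chunk ids an archive references, which is not something the current protocol provides.

```python
from datetime import datetime, timedelta


class ChunkAges:
    """Per-chunk 'last referenced' dates kept by the server (hypothetical)."""

    def __init__(self):
        self.last_referenced = {}

    def touch(self, chunk_id, when):
        # Dates only ever move forward, so a client cannot age a chunk.
        previous = self.last_referenced.get(chunk_id)
        if previous is None or when > previous:
            self.last_referenced[chunk_id] = when

    def expired(self, now, retention):
        """Ids of chunks not referenced by any archive within the retention window."""
        cutoff = now - retention
        return sorted(cid for cid, ts in self.last_referenced.items() if ts < cutoff)
```

A chunk shared by archives that are a month, a week and a day old ends up with the date of the day-old archive, matching the example above.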

However, I don't know if this is feasible or if I am getting anything wrong (probably I am). Anyway I just wanted to share it.

diego-treitos on 23 Oct 2020

The server doesn't know when an archive references a chunk due to encryption.

enkore on 23 Oct 2020

> The server doesn't know when an archive references a chunk due to encryption.

I guess so, but I was wondering if it would be possible for the server to store that information (the client could send it). The amount of information is minimal and it doesn't look like it could disclose anything about the contents of the backup. Just having that information should allow pruning the archives on the server side, which would be a big improvement in security.

The only additional information required is to associate a chunk with a date.

diego-treitos on 24 Oct 2020
