The docs claim append-mode can be used to prevent hacked clients from _permanently_ altering existing archives. This can be achieved by granting only append-mode access to the client. Then changes to the repository are appended to the transaction log/journal and can be reverted by removing the latest transactions from the journal.
First, this kind of manual rollback is not state-of-the-art. ;)
Second, disk space is not infinite. Sooner or later a trusted client (or the server itself) will need to free disk space. This requires "true" write access to the repository and is done by prune. However, archives that have been marked as (to-be-)deleted in append-mode will be wiped out by prune even if the retention policy specified along with the prune invocation should have preserved them.
See: #1689 and #1744
Therefore the trusted client that invokes prune on the repository is responsible for checking the integrity of the repository. But how could that be achieved? When a trusted client runs prune at a time when a hack of a client has not been detected yet, the prune action will apply any malicious transactions permanently. Then even archives that were created before the hack, and that should not have been purged according to the retention policy, might be purged or compromised. This would make disaster recovery from borgbackup based backups impossible.
I would like to suggest the implementation of a (new) safe append-, readonly-, worm-mode or whatever-mode that restricts clients to adding new archives and rejects any action that would delete or change existing archives. Prohibited actions should be rejected immediately and therefore should not go into the journal at all.
Yes, currently one has to be sure about having a "valid" (untampered) repo state before writing to it with append-mode=0.
borg list repo, borg list archive, borg extract --dry-run archive can help here, but making really really sure might be difficult (and slow).
We could have something better if we could disallow delete tags within a no-delete mode.
I reviewed the code where repository.delete(id) is used:
- borg delete archive in Archive.delete() (via chunk_decref())
- borg debug delete-obj
- borg check --repair
  - --verify-data in verify_data() to remove corrupt objects (so that they will be replaced by non-corrupt ones by later backups, hopefully)
  - orphan_chunks_check() (to remove unreferenced objects)
- borg create in Archive.write_checkpoint() to remove the checkpoint archive item again after it has been saved/committed (so that the next checkpoint [or final] save/commit will replace it without creating unreferenced stuff)

The first ones are more or less expected and unproblematic (we just need to fail them early if there is no delete capability) - they don't need to be done from a not-that-much-trusted client (but can be done from a more trusted machine).
The last one is more problematic, can we solve it better than just switching off checkpoints completely?
We could also just keep checkpoints in "no delete" mode.
But I think the real problem is not "delete" operations, it is put. Mostly, put for the manifest is a very big hole. (We could ignore all other puts, because they are supposed to contain the same data, although we can't check because of encryption.)
I think what we need for a safe append only mode is that the appended archives are _not_ stored in the repo manifest but managed by the borg server. i.e. we would need a new RPC operation "add_archive" that either takes the chunk-id or maybe even the whole archive chunk. That way the server could even implement a policy where only the last one in one connection is persisted. Thus there would not be a pile up of archives for each checkpoint.
This of course is a bigger change, as all clients that interact with such a repo need to be able to see the append-only archives using further new RPC commands.
The trusted client might merge all of these into the manifest to create a repo that would be compatible with older clients again, or maybe just because it is more efficient.
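To make the "add_archive" idea a bit more concrete, here is a minimal sketch of the server-side bookkeeping. Everything in it is hypothetical: `ArchiveRegistry`, `add_archive`, `close_connection` and `list_archives` are names made up for illustration, not existing borg RPC calls, and a real server would persist this state instead of keeping it in memory.

```python
class ArchiveRegistry:
    """Hypothetical server-side registry for archives appended by restricted clients."""

    def __init__(self):
        self._pending = {}    # connection id -> latest archive chunk id seen on it
        self._persisted = []  # archive chunk ids kept after their connection closed

    def add_archive(self, connection_id, archive_chunk_id):
        # A later call on the same connection replaces the earlier one,
        # so checkpoint archives do not pile up.
        self._pending[connection_id] = archive_chunk_id

    def close_connection(self, connection_id):
        # Only the last archive announced on this connection is persisted.
        archive_chunk_id = self._pending.pop(connection_id, None)
        if archive_chunk_id is not None:
            self._persisted.append(archive_chunk_id)

    def list_archives(self):
        # What a (new) client would query to see the append-only archives.
        return list(self._persisted)
```

A trusted client could later read `list_archives()` and merge the entries into the manifest.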
Still problematic: A client can put chunks that claim to contain the data for some chunk-id but do not (either corrupted, or something else). I don't think there is anything we really can do about this. The trusted client could download and check these chunks, but that's a bit late. Also, a bad client can put chunks not linked from any archive, although borg check would be able to clean this up.
I'm working on some ideas in this direction, but don't want to commit to anything until I see how it pans out.
@textshell yes, put is also a problem. :| and we can not ignore non-manifest puts as we are defending against an evil client here. it could just put bad replacement chunks for all content data in the repo and the only way to notice is a very expensive --verify-data operation. it could also additionally replace all metadata to make everything look valid (even for borg check --verify-data) as long as you do not (manually) look at content.
I'd say this is pretty much doomed to be unsolvable without fundamental changes.
I don't think we need to lose all hope for something that works well enough. Fundamentally the borg model distrusts the server, so we can't get perfect security here. But I hope we can do enough that borg backups can have a reasonable trust level.
We basically want to prevent one evil client from interfering with other clients' backups and with backups made by the same client before it became evil. I don't think there is any way (in any setup) to make sure that a client doesn't sabotage its new backups.
Maybe we should think about kinds of attacks here. One that springs to mind is, for example, the crypto trojan: an evil client just wants to destroy the backups to prevent its damage from being undone. For evil clients that want to do data exfiltration we already have #672 or #1164. What are other major attacks an evil client might want to do?
One nice thing would be to be able to restrict clients to a certain (set?) of prefixes. This would likely be another --restrict--something option.
I think just using the first put is a viable strategy. Excluding the manifest (maybe just by refusing puts to its id in this mode), a bad client needs to predict the id of a chunk another client will want to save. This should be hard for most client-unique data. On the other hand, it would be easy for data from, say, a distribution update. But restore errors in distribution files are just a hassle, nothing that would force a user to, for example, pay ransom to a crypto trojan.
Going even further, a client could validate already known chunks with a certain probability. This would guard against non-malicious corruption, or against a client massively poisoning the repository. Ideally it would check "new" chunks with a higher probability. (Detecting new chunks would mean tracking trusted chunks, i.e. those written from this client, separately on the client, which of course is more work.)
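A rough sketch of what "first put wins" could look like on the server side for a restricted client. The class and exception names are invented for illustration, the store is an in-memory dict rather than borg's segment files, and the fixed all-zero manifest id is an assumption of the sketch.

```python
class RejectedOperation(Exception):
    """Raised when a restricted client attempts a prohibited operation."""


# Assumption: the manifest lives under one fixed, well-known id.
MANIFEST_ID = bytes(32)


class FirstPutWinsStore:
    """Hypothetical object store as seen by a restricted (untrusted) client."""

    def __init__(self):
        self._objects = {}

    def put(self, chunk_id, data):
        if chunk_id == MANIFEST_ID:
            raise RejectedOperation("restricted clients may not write the manifest")
        # First put wins: a later put for an existing id is silently ignored,
        # so data that is already stored cannot be replaced.
        self._objects.setdefault(chunk_id, data)

    def get(self, chunk_id):
        return self._objects[chunk_id]

    def delete(self, chunk_id):
        # Rejected immediately, so nothing ever reaches the journal.
        raise RejectedOperation("restricted clients may not delete objects")
```

Ignoring the second put (instead of raising) keeps deduplicating clients working, since they are expected to send identical data for an identical id anyway.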
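The probabilistic check could look roughly like this. It is only a sketch under assumptions: `fetch` is a hypothetical callable that returns the already decrypted chunk for an id, and the id is assumed to be an HMAC-SHA256 over the plaintext with an id key known to the client.

```python
import hashlib
import hmac
import random


def chunk_id(id_key, data):
    # Assumption: ids are a keyed hash (HMAC-SHA256) over the plaintext chunk.
    return hmac.new(id_key, data, hashlib.sha256).digest()


def spot_check(fetch, known_ids, id_key, probability, rng=random):
    """Re-fetch each known chunk with the given probability.

    Returns the ids whose stored content does not hash back to the id.
    """
    bad = []
    for cid in known_ids:
        if rng.random() < probability:
            if not hmac.compare_digest(chunk_id(id_key, fetch(cid)), cid):
                bad.append(cid)
    return bad
```

Checking "new" chunks more often would just mean calling `spot_check` twice with different id sets and probabilities.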
Still don't get how one would defend against a low-level crap-chunk-putting client while being able to run delete or prune now and then (see first post).
Another threat scenario would be a user that uses some kind of cloud syncing solution.
Evil client syncs some file (thesis.tex) first. It now knows how this will be chunked and can poison those chunk ids with bogus data.
Now even if the file is synced to a good client later, that client can hardly fix the damage of the evil client. I don't see a feasible way to defend against this, apart from the cloud syncing service also having backups. Then again, the evil client could also just replace the file in the synced folder with crap and hope it will be synced to the good client before it has backed up the correct version.
To summarize:
Add a new client restriction to borg that restricts delete and overwriting capabilities of a client.
Such a client:
All borg clients:
A trusted client that e.g. does purge:
> I'm working on some ideas in this direction, but don't want to commit to anything until I see how it pans out.
Status update for that: Prototype is working.
What I've been up to here is essentially a backup system built on top of Borg, where you only have one trusted party, a central backup server that controls access to repositories.
This works by having (among some higher level coordination that is kinda required to make it all work) a reverse proxy that the (untrusted!) clients use to access a view of the target repository.
This provides:
Code: https://github.com/enkore/borgcube (please heed the notes in the readme)
Actually, I got a little lost in all those issues about 'hacked-server', 'append-only', 'append-only not safe with prune' and so on. So excuse me if I'm not commenting in the right/most-appropriate place...
If I understood the current situation correctly:
- --append-only will save your backup data in case some client tries to delete stuff from your repo (by only tagging chunks 'to-be-deleted', but not being able to delete them)
- --append-only and executing --prune (or some other operation), that will delete everything that is to-be-pruned and tagged as 'to-be-deleted' by previous repo accesses from --append-only runs.

Are those assumptions correct? I'm new to borg and trying to get my head around all this stuff, so please correct me if I'm wrong.
What I'm thinking about is:
- --append-only and pruning from a trusted client is safe, as long as you are sure that your clients/your repo have/has not been tampered with when you do the pruning.

So what about introducing something like an 'incubation period', i.e. prune all transactions that are older than [insert user-supplied time span here]? That would mean, if I have plenty of space, I will keep all transactions of the current year, but prune the stuff that is older than that year.
My intention with that is: If one of my clients gets evil, I will notice that at some point in time. If I have kept the transactions 'unpruned' since that client got evil, I can easily recover from that by deleting its transactions. The conclusion is: If I am sure that none of my clients were evil in the last year, I can prune the transactions that are older than one year without losing data.
That would allow to save some space, prune now and then and have some kind of 'incubation-period' for me noticing that one of my clients got evil without it tampering with all my backups.
Depending on the users choice and trust in their machines they could choose a reasonable 'incubation-period' for them to notice something went wrong before that could creep in their backups.
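In code, the incubation-period idea boils down to a cutoff on transaction age. The sketch below assumes every transaction carries a commit timestamp, which is purely an assumption of this sketch; `split_by_incubation` and the `(transaction_id, committed_at)` pairs are made up for illustration.

```python
from datetime import datetime, timedelta


def split_by_incubation(transactions, now, incubation):
    """Split (transaction_id, committed_at) pairs by age.

    Returns (compactable, quarantined): ids old enough to be applied
    permanently, and ids still inside the incubation period.
    """
    cutoff = now - incubation
    compactable = [tid for tid, committed_at in transactions if committed_at <= cutoff]
    quarantined = [tid for tid, committed_at in transactions if committed_at > cutoff]
    return compactable, quarantined
```

Only the `compactable` transactions would be made permanent by the trusted prune run; the `quarantined` ones stay revertible.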
As I couldn't get my head around the --append-only logic completely, I'm not sure if that is even possible like that, but wanted to share that idea. Is it possible like this?
@MK-42 yes, that's correct.
repo commits do not have timestamps, so we can't consider time.
In -ao mode there is the transaction log which could be parsed back, but this sort of thing definitely requires RPC updates -> something for 1.1+
Also I'm not super-convinced that this would be a big improvement over simple -ao, since it requires even more knowledge of internals to grasp and is even harder to use. Either is stop-gappy...
I created a $100 bounty. I encourage others who would find this useful to contribute!
> @textshell yes, put is also a problem. :| and we can not ignore non-manifest puts as we are defending against an evil client here. it could just put bad replacement chunks for all content data in the repo and the only way to notice is a very expensive --verify-data operation. it could also additionally replace all metadata to make everything look valid (even for borg check --verify-data) as long as you do not (manually) look at content.
> I'd say this is pretty much doomed to be unsolvable without fundamental changes.
To reiterate the problem and make sure that I understand it correctly now, after reading the ~10 various currently existing issues loosely requesting new types of read-only/write-only/etc. mode: they all seemingly stem from the fact that "--append-only" mode as it exists right now is mostly broken in real-life usage. (It is not technically broken, as it does what it says in the docs, but in reality most users will want to combine it with pruning old data on the server, which will make every deletion/corruption previously masked and prevented by the append-only mode permanent. Thus if administrators want to use pruning, they are now expected to somehow inspect all repositories before every real prune (which is usually done often, using a scheduling mechanism), which is completely unrealistic. The only real use case for append-only, the only time when it can prevent corruption/hack, is when an attack has been detected immediately, and an administrator has been notified and reacted immediately to stop pruning batch jobs and started inspecting the state of the repository immediately after the attack. (Or if no prune commands are ever issued on a repository at all.))
The difficulty in implementing a fix seems to be rooted in the fact that the client-server model of Borg allows a client to issue low-level simple commands (who ever thought that up as a viable way to design it?) such as "PUT" or "GET" on individual blocks or indexes or repo files, and most of these commands are required for both creating, deleting, removing and purging at the same time, and so simply banning certain low-level commands does not work because they are used in a normal "create" command as well, and so banning them would prevent any operation (even creating a new backup). Does this assessment sound correct?
If so, the only two ways we have to implement "real" append-only/write-only, in the meaningful way that many people expect, are
1) To implement a clever heuristic/well planned analyzer or rights management system on the server which will interpret and disentangle the stream of low-level commands sent to it by the client in order to make an educated guess about whether the high level operation the client is trying to do is legitimate and valid, and then restrict/allow it accordingly.
2) Change the underlying architecture of the client-server model in Borg, and finally stop exposing low-level commands that should only be done in the server to clients, giving a new API which will then be easily restricted.
Judging by the number of open issues, the breadth of discussion and the different ideas, the lack of consensus, the timespan, etc., solution 1) is proving to be very difficult to design and implement.
How far are developers from the decision to invest in solution 2)? Is it a viable alternative at all, and how much reorganization would it require? How much time would it take to implement? Can it be done? Would such a big change even be accepted as a pull request?
@imperative Not exactly. The basic security model says that the server is the untrusted part. This is needed for (data at rest) encryption to be actually meaningful. So the server cannot do many high-level operations. This is on purpose. Of course the server can always drop data to make the backup disappear.
I've outlined my view of this in https://github.com/borgbackup/borg/issues/1772#issuecomment-258575677. Which i still think is viable.
This adds a bit more trust to the server, as now the server sees encrypted archive data separately instead of all in one big block, but this should be tolerable, because it is still encrypted and the previous usage patterns are likely to leak the exact same data for creates (assuming the crypto is good). prune/manifest compaction should not expose too many details either.
In a situation with multiple (untrusted) clients accessing one repository it still has the problem that an evil client can poison the repository with chunks claiming an id that does not match the contained data. In my model the (weak) defense against this is having the client check random chunks. A secure defense would be to have a client keep track of validated chunks and download and validate each chunk that is needed in an archive that this client did not yet validate.
For single client repos this is not really a problem as long as you keep in mind that only backups done before your client has been compromised are reliable. As those will always have their data already in the repository before the evil client comes along and already existing chunks can not be erased or replaced it can not spoil the old archives. (defend against the crypto malware use case)
I am testing borgbackup and I also found this problem with its architecture.
I would like to share an idea, though I don't know if it is realistic. Could we implement the pruning on the server side? If the server saves the last date on which a chunk was required by any archive, then maybe the server can delete the chunks that have a date older than the one configured for pruning.
If a chunk belongs to 3 archives: 1 month old, 1 a week old and 1 a day old. The date for the chunk would be the one of the "day old" archive. If that chunk stops being used in new archives, it will retain that date so when a month (or whatever date) passes, the server can remove the chunk, bypassing the append-only.
This way:
However, I don't know if this is feasible or if I am getting anything wrong (probably I am). Anyway I just wanted to share it.
The server doesn't know when an archive references a chunk due to encryption.
> The server doesn't know when an archive references a chunk due to encryption.
I guess so, but I was wondering if it could be possible for the server to store that information (the client could send it). The amount of information is minimal and it doesn't look that it could disclose anything about the contents of the backup. Only getting that information should allow to prune the archives server side which is a big improvement in security.
The only additional information required is to associate a chunk with a date.
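A small sketch of that chunk-to-date association, assuming the client reports the ids an archive references together with the archive date. `ChunkLastSeen`, `touch` and `expired` are hypothetical names, nothing like this exists in borg, and the server would be trusting whatever dates the client sends.

```python
from datetime import datetime, timedelta


class ChunkLastSeen:
    """Hypothetical server-side map: chunk id -> date of the newest archive using it."""

    def __init__(self):
        self._last_seen = {}

    def touch(self, chunk_ids, archive_time):
        # Called with the ids a client reports for a new archive.
        # The stored date only ever moves forward.
        for cid in chunk_ids:
            previous = self._last_seen.get(cid)
            if previous is None or archive_time > previous:
                self._last_seen[cid] = archive_time

    def expired(self, now, retention):
        # Chunks not referenced by any archive newer than the retention window.
        cutoff = now - retention
        return sorted(cid for cid, seen in self._last_seen.items() if seen < cutoff)
```

With the example from above, a chunk used by archives that are a month, a week and a day old would carry the date of the day-old archive and would only show up in `expired()` once that date falls out of the retention window.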