Borg: Borg backup to Amazon S3 on FUSE?

Created on 19 Jul 2015 · 88 Comments · Source: borgbackup/borg

Hi everyone,

I'm interested in using Borg to backup my webserver to an Amazon S3 bucket. I've been using Duplicity, but I'm sick of the full/incremental model, as well as the difficulty of pruning backups. I love the ease of use and features that Borg provides, but I don't really understand the internals and I'm not sure if it will work with Amazon S3 storage.

Specifically, I'm considering mounting my S3 bucket over FUSE, using one of the following three options:

Any comments on which, if any, would be more appropriate? And how tolerant would Borg be to S3's "eventual consistency" weirdness?

Additionally, I need to plan against the worst-case scenario of a hacker getting root access to my server and deleting the backups on S3 using the stored credentials on my server. To eliminate this possibility, I was thinking about enabling S3 versioning on the bucket so that files deleted with my server's S3 user account can still be recovered via my main Amazon user account. Then, I would have S3 lifecycle management configured to delete all versions of deleted files after X amount of time. In this case,

  • How much of my S3 data would Borg routinely need to download in order to figure out which files have changed and need to be backed up? (I'm worried about bandwidth costs.)
  • How much accumulated clutter and wasted space could I expect from files that Borg "deletes", (which will actually be retained on S3 due to the versioning)?
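For what it's worth, the versioning-plus-lifecycle idea described above could look roughly like the following S3 lifecycle configuration. This is only a sketch: the 90-day retention and the rule ID are made-up placeholders, versioning is assumed to be enabled already, and the dict is just printed as JSON rather than sent to AWS.

```python
import json


def lifecycle_rules(noncurrent_days):
    """Build an S3 lifecycle configuration that expires old object versions.

    Assumption: versioning is already enabled on the bucket, so a "delete"
    issued with the server's credentials only hides the object behind a
    delete marker instead of destroying it.
    """
    return {
        "Rules": [
            {
                "ID": "expire-deleted-borg-files",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                # Permanently remove versions N days after they stop being current.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
                # Clean up delete markers that no longer hide any version.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    }


config = lifecycle_rules(90)  # 90 is a placeholder for "X amount of time"
print(json.dumps(config, indent=2))
```

The server's own S3 user would additionally need to be denied permission to delete specific object versions or change the lifecycle rules, otherwise an attacker with its credentials could undo the protection.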

Again, my concerns are based on me not really understanding all the black magic that happens with all the chunks and indexes inside a Borg repository, and how much they change from one backup to the next.

Thanks in advance for the help!


:moneybag: there is a bounty for this

Bountysource

All 88 comments

I'm still trying to get an idea of what exactly happens in the Borg repo from one run to the next. I used it to backup my ~/ directory (about 72GB on disk) last night, and I messed around with creating and deleting files and re-combining ISO images to see how well the de-dupe works. (It works extremely well, I might add!) I ran around 30 backups with no pruning. That was last night, and then today I used my computer for some web browsing and then ran another backup with a before and after ls -sl on the repo/data/1 directory . Here's a diff of repo/data/1 before and after:
http://paste.ubuntu.com/11910814/
(1 chunk deleted, 4 added, total change of 5)

Then I pruned all but the most recent backup and ran another diff:
http://paste.ubuntu.com/11910824/
And here's the repo/data/0 directory, just the names of deleted files:
http://paste.ubuntu.com/11910839/
(580 chunks deleted, 75 added, total change of 655)

So assuming that all the chunks are around 5MB, that would be around 3GB of deleted data taking up wasted space in Amazon S3, which would cost me about $0.05/month in Glacier according to Amazon's calculator, and it would have to stay there for 90 days to avoid a penalty. Or else in regular S3 storage it would cost something like $0.11/month. Additionally there would be far fewer changes and much less total data stored in the case of my webserver I want to back up with this scheme.
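The size estimate above is just a multiplication, sketched here so the inputs can be swapped out. The 5 MB average file size is the poster's own assumption, and the per-GB price passed to `monthly_cost` is a placeholder parameter, not a quoted AWS rate.

```python
def wasted_gb(deleted_files, avg_file_mb):
    """Estimate space held by deleted-but-versioned files, in GB (1 GB = 1000 MB)."""
    return deleted_files * avg_file_mb / 1000


def monthly_cost(gb, price_per_gb_month):
    """Monthly storage cost for `gb` at a caller-supplied per-GB price."""
    return gb * price_per_gb_month


size = wasted_gb(580, 5)  # 580 deleted files at an assumed ~5 MB each
print(f"{size:.1f} GB retained after pruning")
print(f"{monthly_cost(size, 0.01):.3f} USD/month at a hypothetical 0.01 USD/GB")
```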

So I would tentatively think this could be a good option?

I might add that you can get 10 TB (that's ten terabytes) of "nearly" OpenStack Swift compatible storage from HubiC.com for 50 euros a year (no kidding). I use this together with my Hubic Swift Gateway and the Swift duplicity backend.

This also is EU storage (located in france) which solves some problems with German laws.

I also think that it is fairly easy to implement as backend for software with a chunked approach.

P.S.: Their desktop client (still) sucks imho... but you even get 25 GB for free. Which can also be used for experiments with the API.

Thanks @oderwat for the tip! Good to know.

I must say that I don't use "cloud data storage services", so I can't advise about their API/capabilities.

Borg's backend is similar to a key/value store, and segment files only get created/written but never modified (except for complete segment files being deleted), so it could be possible if someone writes such a backend.

Borg has an "internals" doc that might be interesting for anybody wanting to write such a backend. If information is missing there, please file a docs issue here.

borg has _some_ level of abstraction of remote repositories... there's currently only a single RemoteRepository implementation, and it hardcodes ssh in a bunch of places. we nevertheless have a list of methods we use in RPC calls that would need to be defined more clearly, maybe cleaned up, and then implemented in such a new implementation:

    rpc_methods = (
        '__len__',
        'check',
        'commit',
        'delete',
        'destroy',
        'get',
        'list',
        'negotiate',
        'open',
        'put',
        'repair',
        'rollback',
        'save_key',
        'load_key',
    )

this list is from remote.py, and is passed through the SSH pipe during communication with the borg serve command...
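To make the idea of a "new implementation" a bit more concrete, here is a minimal sketch of what a backend for a dumb object store might have to provide. The class names are hypothetical and this is not Borg's actual interface; it only borrows the low-level method names (`put`, `get`, `delete`, `list`) from the tuple above and leaves out the higher-level ones such as `check` or `commit`.

```python
from abc import ABC, abstractmethod


class KeyValueBackend(ABC):
    """Hypothetical minimal interface a 'dumb' object store would need to offer."""

    @abstractmethod
    def put(self, key, value):
        """Store `value` (bytes) under `key` (bytes)."""

    @abstractmethod
    def get(self, key):
        """Return the bytes stored under `key`; raise KeyError if missing."""

    @abstractmethod
    def delete(self, key):
        """Remove `key`; raise KeyError if missing."""

    @abstractmethod
    def list(self):
        """Return all keys in sorted order."""


class InMemoryBackend(KeyValueBackend):
    """Stand-in for S3 and friends, only for illustration and testing."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]

    def list(self):
        return sorted(self._objects)


backend = InMemoryBackend()
backend.put(b"chunk-2", b"more data")
backend.put(b"chunk-1", b"data")
backend.delete(b"chunk-2")
print(backend.list())
```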

notice the similar issue in https://github.com/jborg/attic/issues/136

Supporting storage services like AWS S3 would be huge and make borg a real alternative to tools like tarsnap. I would support a bounty for a) generic storage interface layer b) and S3 support based on it.
I suggest libcloud https://libcloud.readthedocs.org/en/latest/storage/supported_providers.html to design interfaces/deal with cloud storage services.

Another interesting backend storage might be sftp/scp, as provided by some traditional hosting providers, like Hetzner or Strato HiDrive

@rmoriz your contribution would of course be welcome. bounties are organised on bountysource, in this case: https://www.bountysource.com/issues/24578298-borg-backup-to-amazon-s3-on-fuse

the main problem with S3 and other cloud providers is that we can't run native code on the other side, which we currently expect for remote server support. our remote server support involves calling fairly high-level functions like check on the remote side, which can't possibly be implemented directly in the native S3 API: we'd need to treat those as different remotes. see also https://github.com/borgbackup/borg/issues/191#issuecomment-145749312 about this...

the assumptions we make about the remotes also imply that the current good performance we get on SSH-based remotes would be affected by "dumb" remotes like key/object value storage. see also https://github.com/borgbackup/borg/issues/36#issuecomment-145918610 for this.

Please correct me if I'm wrong.

It looks like we have/need a three-tier architecture:

  • borg client
  • borg server (via ssh)
  • (dumb) storage.

So the borg server part needs a storage abstraction model where backends like S3, ftps, Google Cloud Storage, etc. can be added.

Is that correct? I think using FUSE adapters is not a reliable approach (IMHO).

Update:

the server is not necessarily needed

borg's internal structure would allow using something like a different k/v store as well - but someone needs to do and test it

Thanks for putting a bounty on this.

If someone wants to take it: please discuss the implementation here beforehand, do not work in the dark.

+1 for me on this. I want exactly what the original poster is talking about. Also since I am worrying about deduplicating I want to use some really highly durable storage like amazon has. Also the versioning life-cycles to protect against the "compromised" host problem would be fantastic... (I added to the bounty :) )

I've written up some of my thoughts on the limitations of S3, along with a WIP discussion of some possible methods to address them. It is organised as a single document right now, but as it gets fleshed out, I will expand it as appropriate. Please comment there and I will try to keep the document up to date with as much information as possible. See https://gist.github.com/asteadman/bd79833a325df0776810

Any feedback is appreciated. Thank you.

the problematic points (as you have partly noticed already):

  • using 1 file per chunk is not gonna work practically - too many chunks, too much overhead. you have to consider that 1 chunk is not just the usual 64kiB (or soon: 1MiB) target chunk size, but can be way smaller if the input file is smaller. you can't really ignore that in the end, this is something that has to be solved.
  • the archive metadata (list of all files, metadata of files, chunk lists) can be quite large, so you won't be able / you won't want to store this in one piece. borg currently runs this metadata stream through chunker / deduplication also, which is quite nice because we always have the full(!) item list there and a lot of it is not changing usually.
  • "skipping chunks that already exist" - if you want to do that quickly, you need an up-to-date (consistent) local index / hash table. otherwise, you may have 1 network roundtrip per chunk.
  • that "eventually consistent" S3 property is scary. it's already hard enough to design such a system without that property.
  • "chunk staleness" is an interesting idea. but i think you could run into race conditions - e.g. you just decided that this 3 months old chunk shall be killed, when a parallel backup task decided to use it again. guess either atomicity or locking is needed here.
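The point about needing a local index to skip existing chunks can be illustrated with a toy example. This is a sketch under loud assumptions: the remote is a fake in-memory object that merely counts requests, and plain SHA-256 stands in for Borg's chunk id hash.

```python
import hashlib


class CountingRemote:
    """Fake remote store that counts how many requests it receives."""

    def __init__(self):
        self.objects = {}
        self.requests = 0

    def put(self, key, data):
        self.requests += 1
        self.objects[key] = data


def backup(chunks, remote, local_index):
    """Upload only chunks whose id is not yet in the local index."""
    for chunk in chunks:
        chunk_id = hashlib.sha256(chunk).hexdigest()  # stand-in for the real id hash
        if chunk_id in local_index:
            continue  # known chunk: no network round trip at all
        remote.put(chunk_id, chunk)
        local_index.add(chunk_id)


remote, index = CountingRemote(), set()
backup([b"a", b"b", b"a"], remote, index)  # third chunk duplicates the first
backup([b"a", b"b", b"c"], remote, index)  # only b"c" is new
print(remote.requests)
```

Six chunks were processed but only the three distinct ones caused a request. Without a trustworthy local index, every chunk would need at least an existence check against the remote, which is the "1 network roundtrip per chunk" problem.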

Yes, the target chunk size in 1.0 will be 1 or 2MiB. That doesn't mean that there will be no tiny chunks - if your file only has 1 byte, it will still be 1 chunk. So the average might be lower than the target size.

BTW, it is still unclear to me how you want to work without locking, with parallel operations allowed (including deletion). I also do not think that making this github issue longer and longer with back-and-forth discussion posts is helping here very much - if we want to implement this, we need ONE relatively formal description of how it works (not many pages in discussion mode).

So I'd suggest that you rather edit one of your posts and update it as needed until it implements everything needed or until we find it can't be implemented. Also, the other posts (including mine) should be removed after integration. I am also not sure a gh issue is the best place for that; maybe a github repo, where one can see diffs and history, would be better.

http://www.daemonology.net/blog/2008-12-14-how-tarsnap-uses-aws.html doesn't sound too promising about the possibility of reliably using S3 directly from a backup tool (he wrote a special server that sits between the backup client and S3).

@ThomasWaldmann - actually it's promising - it's not too different from what borg is already doing in the local format - and it might not need too much of a change to make borg work against it

Don't forget BackBlaze's B2. Cheapest storage around. Hashbackup already does all that but it's closed source so who knows how that is done.

Amazon Cloud Drive offers unlimited storage for just 50$ a year. Would be great if it'd be supported! :)

There's a FUSE FS for it: https://github.com/yadayada/acd_cli

That should work okayish (maybe not the best performance).

This thread here is about directly using the S3 key-value store as a backup target (no intermediate FS layer), at least that's how I understand it.

I think it's kinda unrealistic, at least for now, to completely redo the Repository layer. An alternative Repository implementation could be possible, but I don't see how you could do reliable locking with only S3 as the IPC, when it explicitly states that all operations are only eventually consistent. Parallel operation might be possible, but really, it's not a good idea for a first impl. Also, Repository works only on a chunk-level, and most chunks are _very_ small. That just won't work. (As mentioned above)

Working on the LoggedIO level (i.e. alternate implementation of that, which doesn't store segments in the FS, but S3) sounds more promising to me (but - eventual consistency, so the Repository index must be both local and remote, i.e. remote updated after a successful local transaction, so we will actually need to re-implement _both_ LoggedIO _and_ Repository).

Locking: Either external (e.g. simple(!) database. Are there ACID RESTful databases, those wouldn't need a lot of code or external deps?) or "User promise locking" (i.e. 'Yes dear Borg, I won't run things in parallel').

Eventual consistency: Put last (id_hash(Manifest), timestamp) in locking storage or local, refuse to operate if Manifest of S3 isn't ==?
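The manifest comparison suggested above might look something like this. It is a sketch only: `manifest_id` uses plain SHA-256 as a stand-in for `id_hash(Manifest)`, and where the last known id is kept (local file or external locking store) is left open.

```python
import hashlib


def manifest_id(manifest_bytes):
    """Stand-in for id_hash(Manifest); plain SHA-256 is an assumption here."""
    return hashlib.sha256(manifest_bytes).hexdigest()


def check_remote_is_current(remote_manifest, last_known_id):
    """Refuse to operate if the remote manifest is not the one we last wrote."""
    if manifest_id(remote_manifest) != last_known_id:
        raise RuntimeError("remote manifest is stale or foreign; refusing to operate")


# Recorded locally (or in the locking store) after our last successful commit.
last_id = manifest_id(b"manifest v2")

check_remote_is_current(b"manifest v2", last_id)  # up to date: passes silently
try:
    # An eventually consistent store may still serve the previous version.
    check_remote_is_current(b"manifest v1", last_id)
except RuntimeError as exc:
    print(exc)
```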

For what it's worth, I'm currently using borg on top of a Hubic FUSE-based filesystem for my off-site backups. It's painfully slow - my net effective writing speed is around only 1 Mb/s - but other than that works pretty well.

Issues as I see them

  • Writes have a very high latency. Once you're writing it's fast (10 Mb/s, intentionally limited within Hubic), but there seems to be a two second delay at the beginning of each file write.
  • Reads are reasonably fast. There's certainly nothing like the write latency but I've yet to turn this from an empirical value into a quantifiable one.
  • The process is slow, so avoiding inter-feature locking would be a very good thing. (borg list, and borg extract, specifically).

It might help to cache KV updates locally before writing them out in a periodic burst, but I don't have any easy way of testing this. (It would be nice if there were a generic FUSE caching layer, but I have not been able to find one.)

Increasing the segment size in the repo config might help if there is a long-ish ramp-up period for uploads. (And increasing filesystem level buffer sizes if possible)
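As a rough illustration of changing the segment size: the repository's `config` file is INI-style, so it can be edited with `configparser`. This sketch assumes the setting is called `max_segment_size` and lives in the `[repository]` section (worth verifying against your Borg version), and it works on a made-up sample string rather than a real repository.

```python
import configparser
import io

# Sample of what a repository `config` file might contain (values are made up).
SAMPLE = """[repository]
version = 1
segments_per_dir = 1000
max_segment_size = 5242880
"""


def set_max_segment_size(config_text, size_bytes):
    """Return the config text with max_segment_size replaced."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    parser["repository"]["max_segment_size"] = str(size_bytes)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = set_max_segment_size(SAMPLE, 64 * 1024 * 1024)  # 64 MiB, an arbitrary choice
print(updated)
```

Only newly written segments would be affected by such a change; existing segment files keep their size until they are rewritten.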

http://rclone.org/ may be interesting as a component for the cloud support plan.

here's the most original solution I have heard yet for "cloud" backups with borg:

https://juliank.wordpress.com/2016/05/11/backing-up-with-borg-and-git-annex/

TL;DR: backup locally, then use git-annex (!) to backup to... well, anything. in this case, a webdav server, but yeah, git-annex supports pretty much anything (including rclone) and can watch over directories. I'm surprised this works at all!

Yeah, so I've already gone down the git-annex route through research and testing, and it's extremely complicated. The way you're suggesting is really dirty and tedious... git-annex is a whole other beast to learn. Really, users of Borg could already just rclone their backups to whatever cloud provider is supported by rclone (most of them). You'd only need to add git-annex if you're looking for even more versioning and/or encryption.

Well, that was the whole point wasn't it, encryption... Then again, it's unclear to me why they didn't use the built-in encryption.

The whole setup, even with just rclone, also has the problem that you have a local repository which takes local disk space. Obviously, this is not a complete solution for the problem here, but I thought it would be interesting to share nonetheless.

So.... if I rclone the entire borg repo to my favorite cloud storage provider, and then I later want to restore something, do I have to re-download the _entire_ repo? And what if one or two "chunks" get corrupted, can I still recover the rest?

https://github.com/gilbertchen/duplicacy-beta

It looks like duplicacy has most or all of the same features as borg-backup, and it also supports backing up to cloud storage like Amazon S3. Unfortunately, it is not currently open-source, and the development seems to happen behind closed doors.

The developer does share a design document so it is possible to get a general idea how it works. If I understand it correctly, the reason duplicacy is able to work with cloud storage is because it does not have a specialized index or database to keep track of chunks, but rather uses the filesystem and names the files/chunks by their hash.
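The "name the chunk by its hash and let the filesystem be the index" idea can be sketched in a few lines. This is an illustration of the general technique, not duplicacy's actual layout: the two-character directory fan-out and the use of SHA-256 are assumptions made here.

```python
import hashlib
import tempfile
from pathlib import Path


def store_chunk(root, data):
    """Write `data` under its hash; return False if it was already present."""
    name = hashlib.sha256(data).hexdigest()
    path = root / name[:2] / name  # fan out into subdirectories by hash prefix
    if path.exists():  # the "index" is just the filesystem itself
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    results = [
        store_chunk(root, b"hello"),
        store_chunk(root, b"hello"),  # duplicate, skipped
        store_chunk(root, b"world"),
    ]
    stored = sum(1 for p in root.rglob("*") if p.is_file())

print(results, stored)
```

On an object store the `path.exists()` check becomes one request per chunk, which is the trade-off against keeping a local index.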

It's super-shady that they're using github while not being open source.

I agree that the way they are doing it is not very cool. I did not mention them to recommend their software, but rather because it seems that they have a good design. Since the design document is on github, others should be free to use the same design, no?

Also, if you need specifics, they forgot to strip their executable :P. It's built in go from what I can tell.

See also related #1070.

WRT the git-annex thing, I have to say I dropped that and am thinking about other solutions. The major issue being that you basically have to fetch _everything_ back to restore the latest backup, as we'd otherwise need borg to be able to tell git-annex which files it needs. The WebDAV server I'm on also has 1TB of storage, but apparently there's a file number limit or something, because it stopped working at some point.

Another thing I wondered about is pruning: doesn't that repack a lot of files, which would force me to retransmit a lot of data to the server compared to running prune on a server-side borg (append-only would probably be about 1GB per day, so I'd quickly fill up 1TB)? This would also be an issue for FUSE file systems.

I'm thinking about just running borg once for each location, instead of duplicating local borg backups, and just pay a tiny bit for an rsync.net account ($0.03/GB is OK) and/or re-use a few machines at my local sites.

@julian-klode if a bigger segment size would help you for that webdav server, you can set that in the repo config. borg 1.1 (beta right now) will create bigger segments.

Prune: you can't run prune server-side (except if you give the server-side borg the encryption key - you can do that if you trust the server, but borg doesn't trust the server by design).

@rmoriz considering the size of your bounty, could you clarify if it is really dependent on a "generic storage interface layer" other than a FUSE filesystem? as things stand, the bounty is in the process of being claimed by @bgemmill in https://github.com/yadayada/acd_cli/pull/374, but that only covers a S3 over FUSE filesystem implementation.

it seems to me #1070 is the place for a more generic implementation (which I would be interested in working on, more than just S3), but there's no bounty on that other task, and even the amount here isn't quite enough to compensate for the time that would be needed to complete a more generic design...

Hi, I added 5 dollars to this great bounty!
Does it also work with the unlimited storage solution from Amazon (Cloud Drive)?

Does it also work with the unlimited storage solution from Amazon (Cloud Drive)?

I believe the Amazon Cloud Drive thing is a different API than S3. Even worse, the API is invite-only:

https://developer.amazon.com/amazon-drive

So I think it's out of scope for this specific issue here, but would be a good fit, again, for #1070...

@anarcat Why not? Perhaps we can start a bounty on that. I can give $5 to $10 for this. I'm really interested in automatic backups to a cheap cloud storage solution like the one Amazon Drive offers.

head over to #1070 then :)

With my PR on acdcli, you can use borg on amazon cloud drive right now. I've been using it for a month or so without issue.

You also don't need to sign up for an API as @anarcat mentioned; that's only if you're going to be creating your own access system like acdcli or a fork thereof. Just using it is fine.

As to the bounty, acd isn't technically s3, so I'm not sure if my work qualifies there. If the demand was merely to run borg on cheap amazon storage, you can do so now :-)

Any plans to make this more general and support something like libcloud? An Amazon Cloud Drive or Backblaze option would be great. Rsync.net will probably be more cost effective than S3.

bgemmill wrote:

With my PR on acdcli, you can use borg on amazon cloud drive right now. I've been using it for a month or so without issue.

How did you do that? I mounted Amazon Cloud Drive with acd_cli, but get "assert transaction_id is not None" @ borg init, and "Invalid segment magic" @ borg create... Are there any special parameters to set or other software to install?

You probably have to apply his PR first.

Considering rclone was banned from ACD (they may ban similar automated backup programs too) and Amazon removed their unlimited storage option Amazon Cloud Drive might have lost a lot of its appeal for use with Borg...

Agreed. I'm dropping ACD. I'm really disappointed in them. They offered that unlimited plan after the OneDrive fiasco. If they really couldn't afford it, they should have learned from Microsoft's mistake. I think it was an intentional ploy to acquire users.

Mark Penner wrote:

Agreed. I'm dropping ACD. I'm really disappointed in them. They offered that unlimited plan /after/ the OneDrive fiasco. If they really couldn't afford it, they should have learned from Microsoft's
mistake. I think it was an intentional ploy to acquire users.

Ditto. I'm just evaluating Google Cloud now. There's even a Linux client
("gsutil") in my distro's repository which has an "rsync" option. Looks
promising.

-Matt

Anyone tried using RioFS to mount S3 and then using that mount point for your Borg repo? I've used RioFS and it's pretty stable. Much better performance than yas3fs, in my experience.

@davetbo https://github.com/skoobe/riofs#known-limitations doesn't sound like it could work, but maybe just try, maybe docs there are outdated.

Does borg append to existing files? I thought it either created or deleted the blocks, but never appended to them. Maybe it appends to other files, though.
Does it rename folders? I wouldn't know.
Does it expect "posix filesystem semantics?" I wouldn't know that either.

I will give it a try and post back with my results. In the meantime, if anyone else comes along and has any feedback on having tried it, maybe they'll share :)

S3 changed their consistency guarantees so a file will always be consistent at least with an older version of it (which I think I can deal with), but if you try to read a file that was deleted or never existed it can get weird again. Aside from the lock files, does borg ever test for files' existence?

@davetbo when it writes a chunk and the currently open segment file has not reached its size limit, it appends the chunk data at the end (== it does not write the whole segment file with one write() call). Not sure if that already is a problem for that filesystem.
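The append behaviour described above can be modelled roughly as follows. This is a simplified sketch, not Borg's code: real segment files also carry headers, checksums and commit markers, and the class name and the tiny size limit are made up for the example.

```python
class SegmentWriter:
    """Append chunks to an open segment; start a new one once the limit is reached."""

    def __init__(self, max_segment_size):
        self.max_segment_size = max_segment_size
        self.closed_segments = []  # finished segments, never modified again
        self.current = bytearray()

    def append(self, chunk):
        # If the open segment has already reached its limit, close it first.
        if len(self.current) >= self.max_segment_size:
            self.close_current()
        # Otherwise the chunk simply goes at the end of the open segment.
        self.current += chunk

    def close_current(self):
        if self.current:
            self.closed_segments.append(bytes(self.current))
            self.current = bytearray()


writer = SegmentWriter(max_segment_size=8)
for chunk in [b"aaaa", b"bbbb", b"cccc", b"d"]:
    writer.append(chunk)
writer.close_current()
print([len(s) for s in writer.closed_segments])
```

The repeated small appends to the open segment are the part a write-once object store cannot express directly; a backend would have to buffer the open segment locally and upload it only once it is closed.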

If borg has written some data, it expects to be able to read them in a consistent way.

Guess we'll only know if someone tries.

Does the bounty only apply if it's implemented via FUSE? IMO S3 doesn't behave enough like a filesystem to make sense, though I think I can make it work hooking into Borg.

I ended up going with EFS (Amazon's NFS implementation) instead of S3. I
don't have that much data, so it will be manageable.

I tried S3 and it didn't work. I got an error during init that made it look
like it didn't like not editing files, so I switched to EFS.
Unfortunately, I didn't write the error down to share it here.

So far, it works great on EFS! It's 10x more expensive than S3, but I
don't have a lot of data yet, so for now this is sufficient.

I suppose you could also point Borg to a local repository and then use aws
s3 sync to copy that repository to S3 regularly. That might work well.
Any anticipated issues with that?

Best,
Dave

Sorry, I should have added this to that last response. As it stands, using
Borg directly to EFS takes about 45 minutes to backup about 2.5GB of data,
and that's on subsequent passes even. Does that sound right to you? I know
it's deduplicating properly, because the reports show that each new backup
of 2.5GB is deduped to less than 1MB if no files changed.

Here's an example from a subsequent pass of a folder that hasn't changed:

    BORG BACKUP RESULTS FOR /var/www/efs_prod

    Archive name: fsm-DEV-2017-07-28-14-28
    Archive fingerprint: 22220dd9a859fcb44da8a257802d1b4e943a5d5d5b3a8c7dcd1395b01327024e
    Time (start): Fri, 2017-07-28 14:28:02
    Time (end):   Fri, 2017-07-28 15:11:56
    Duration: 43 minutes 54.20 seconds
    Number of files: 59923

                   Original size   Compressed size   Deduplicated size
    This archive:        2.66 GB           2.55 GB           447.03 kB
    All archives:       12.46 GB          11.95 GB             2.44 GB

                   Unique chunks      Total chunks
    Chunk index:           51056            275786


BORG PRUNE RESULTS

So it goes through those 59K files and takes 43 minutes to do it, even
though the deduplicated size is 447.03kb (meaning it's probably only
metadata about this backup, I'm guessing).

Does that sound right to you? I'm thinking this may be reason to switch to
a local Borg repo followed by aws s3 sync after. Thoughts?

Best, Dave

Try running at --verbose or --debug level to get a better idea on what's going on exactly.

Backblaze is a great solution, and cost efficient as well. If this feature is ever implemented, it would be a good place to start.

Facing this same problem (fuse s3 being too slow to be practical) I concocted this:

https://github.com/luispabon/borg-s3-home-backup

Uses a regular local borg repo, pruning and aws sync. Daily backups take around a minute or so for a ~100MB "delta" on a 10mbit/s home connection, 20GB folder with 1.7 million small files on a regular SSD.
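The local-repo-then-sync approach boils down to three commands run in sequence. The sketch below only builds and prints them; all paths, the archive name, the bucket and the retention count are hypothetical placeholders.

```python
import shlex


def backup_commands(repo, source, archive, bucket_url, keep_daily):
    """Return the commands of a local-repo-then-sync backup, in order."""
    return [
        ["borg", "create", f"{repo}::{archive}", source],
        ["borg", "prune", f"--keep-daily={keep_daily}", repo],
        # --delete also removes remote objects for segments that prune dropped locally
        ["aws", "s3", "sync", repo, bucket_url, "--delete"],
    ]


commands = backup_commands(
    repo="/var/backups/borg",  # hypothetical paths and bucket
    source="/home/me",
    archive="home-2018-01-01",
    bucket_url="s3://example-bucket/borg",
    keep_daily=7,
)
for cmd in commands:
    # To actually run a step: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Note that `--delete` propagates local deletions to the bucket, so a damaged or wiped local repository would be mirrored too; bucket versioning, as discussed at the top of this thread, is one way to guard against that.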

After reading this thread I tried backblaze, and managed to make it work, but the setup is very fragile.

  1. Install either minio or s3proxy to translate B2 API to S3 API.

    • They have various deviations from full S3 API, so I couldn't make some S3 FUSE filesystems work at all; and for those which did work, there were troubles with Borg, which wants to rename some files, because B2 doesn't support renaming. But it is enough for the next step.

  2. Install s3backer to provide a block device on top of ~S3~ B2 bucket.
  3. Format that block device in ext4 or other usual FS of your choice.
  4. Point Borg at it.

While you're welcome to follow my steps, I myself probably won't use it... https://libcloud.readthedocs.io/en/latest/storage/index.html would be a cleaner solution, if/when implemented.

@DarthGandalf have you tried rclone mount? Same end result, but with only one tool (less fragile). If you have, how does the performance compare?

@Artefact2 AFAIR I discarded rclone mount because it claims to not retry writes, if RPC failed; and s3backer seems to have support for retries, though I didn't actually test that it works as expected.

For me, reliability of backups is more important than performance, and this setup doesn't look reliable at all, so I didn't get to measuring performance.

That's a very good reason against rclone, indeed. It seems you can use some vfs caching to retry on failed uploads. It's worth a try.

That's an argument against rclone mount, not against rclone :) rclone sync would work, but it requires too much local space.
I didn't notice the caching feature before, but worth trying, yeah.

No combination of vfs-cache option I tried enabled 'move' operations on rclone mount, which are required for borg.
I keep getting:
Dir.Rename error: Fs "B2 bucket draget-backup" can't rename files (no Move)

Pity. :(

Thinking about @DarthGandalf s3backer idea .. I was wondering if one of the log-structured file systems might be a good fit on top of a cloud-backed block device?

I used rclone on the borg storage server to back up the encrypted borg repository onto an S3 service. This way I can mount it (read-only) on another computer and use rclone's cache and filesystem to access it with the local borg and extract data from it. But lately I switched from borg to https://www.duplicati.com/ for most of my backup/archiving. It has no filesystem but works very transparently and comes with nice daemons for all systems. Borg is really hitting a limit by not supporting all those new remote storage services.

Borg does almost everything better than Arq in my opinion (including supporting Linux ;)), but not providing a reliable way to back up directly to remote repositories (other than over ssh) really makes it harder to archive big files to somewhat big repos that I don't want to keep locally (photos, old videos, etc., which I might want to update later).

I tried with s3fs and it seems to work fine; however, the borg cache has issues being written correctly on s3fs. It seems like an s3fs issue; please check https://github.com/borgbackup/borg/issues/4096

Works fine for me using S3QL. Only borg prune seems to need to download the entire archive it's pruning, which takes a lot of time given my archive sizes.

I'm somewhat interested in throwing something like this together - I'm thinking it might be better to treat b2/s3 as a backend for local storage or something along those lines. I started skimming code, but I haven't done anything substantial yet. is anyone currently working on a native implementation of this?

Also interested in remote storage, I think adding rclone integration is a way to support multiple destinations at once.
Maybe http://www.hashbackup.com/destinations/rclone/rclone.py is a start, I don't know how local repository works, but it could be used. Maybe segment size should be smaller, and rclone configured with some cache to improve speed.

There seems to be no progress here, and the ticket is getting more unfocused over time (note: it started as "S3 via FUSE", went over to "borg backend for S3" (backer @rmoriz), then to "rclone integration" and other stuff).

For most backers, it's not quite clear what they are backing (except maybe "something that lets borg work with S3").

So not sure how we should proceed here. Close the issue? What to do with the money? I am not sure if there is a way to give back money with bountysource. Another way would be to transfer it to general borg organisation funds, but that would need the backers to agree with that. Or we just keep it open for anything that might come in future...

I am not sure if there is a way to give back money with bountysource

Their support was quite responsive when I wrote them (a few years ago)

I'm no backer, but when I started using borg about a month ago I thought I needed this to more easily fire-and-forget back up my stuff into S3 Glacier.

But now I'm wiser as I found out it doesn't work and instead opted to do a two-phase solution where I borg backup to a local disk and then sync over to s3 with rclone.

This has the advantage that I'm able to instantly restore files from backup from my local backup and won't be punished with s3's retrieval rates.

I would prefer to use rclone as a _gateway_ to S3, I don't have enough free space to keep a mirror of my backup, and I think rclone is better than fuse because it supports multiple backends.

Facing a "will not resolve" outcome for this issue, I think I should mention that we simply moved away from Borg + rclone (as a simple backup) for some of our projects that use S3 as backend storage. We switched to Duplicati for those use cases. They don't even promote the deduplication, but it works very well for our data, which consists of large files with small changes. It also supports a lot of backends and has a command line client. So maybe this is a good solution for some of you.

I have been using "2-pass" solution (i.e. from my laptop to my NAS with native Borg SSH, and from NAS to B2 with rclone) since 2017 and it worked quite well.

Just now I stumbled upon this: https://aws.amazon.com/sftp/

So S3 supports SFTP (SSH) now. You could spin up an SFTP gateway and back up directly to S3 with Borg!

The gateway is quite expensive though (0.3 USD/hr), and data transferred through the gateway is metered (0.04 USD/GiB). So anyone who would like to try this should use a wrapper script to spin the gateway up and down on demand.
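To put the on-demand argument into numbers, here is a small worked calculation using the two prices quoted above. The usage pattern (one hour of gateway time and 1 GiB transferred per day, over a 30-day month) is an assumption picked purely for illustration.

```python
HOURLY_USD = 0.30   # gateway price per hour, as quoted above
PER_GIB_USD = 0.04  # transfer price per GiB, as quoted above


def monthly_gateway_cost(hours_up, gib_transferred):
    """Cost of keeping the gateway up for `hours_up` hours and moving `gib_transferred` GiB."""
    return round(hours_up * HOURLY_USD + gib_transferred * PER_GIB_USD, 2)


DAYS = 30
always_on = monthly_gateway_cost(hours_up=24 * DAYS, gib_transferred=1 * DAYS)
on_demand = monthly_gateway_cost(hours_up=1 * DAYS, gib_transferred=1 * DAYS)
print(always_on, on_demand)
```

Under these assumptions the always-on gateway comes to roughly 217 USD a month versus about 10 USD when it only runs for the backup window, with the hourly charge dominating in both cases.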

So S3 supports SFTP (SSH) now. You could spin up an SFTP gateway and back up directly to S3 with Borg!

AFAICT it's sftp only. I don't believe Borg supports SFTP? My understanding is it needs to be able to invoke itself on the remote and talk to it over forwarded stdin/stdout..

Oh... I'm dumb. I forgot that Borg needs another Borg executable on the server side...

borg can work either with a (working, potentially remote) filesystem or client/server.

If anyone is still trying to do this, it's totally possible using the s3fuse plugin -- I know it isn't native. But here's a writeup of doing it with Linode object storage that could easily be adapted to Amazon S3: https://jthan.io/blog/borg-backups-on-linode-object-storage/

Stumbled upon this hoping to find some progress on borg + S3 like (minio) backed

I'd definitely be willing to pitch in for Borg S3 backend

Amazon S3 is now strongly consistent, eliminating the most annoying part of a theoretical borg S3 backend.
Does this bounty still stand? If a working S3 borg backend were written, would it be merged?

My only suggestion would be that it works with any s3-compatible storage so
we aren't locked to AWS, but I'm still interested if anyone is wondering.
Borg is in every way superior to Restic, other than a lack of s3
compatibility. My two cents.

There is a new project, Kopia (written in Go, like Restic), which is not as mature as Borg but has many of the same features plus an S3 backend.

@milkey-mouse that is hard to say in advance. i guess the code change would have to be somehow clean and not risking stability of the existing file/ssh-based backend. not sure whether that is possible.
