The "Unlimited Everything" plan of Amazon Cloud Drive is a quite affordable backup storage option. Amazon Cloud Drive has its own RESTful API.
Just thought the same thing.
I may look at this once #21 is in place - uploading without compression seems like a waste of bandwidth.
That depends on your use case ;)
There’s some proof of concept code I found: http://sprunge.us/fdQF — it requires that an oauth token is in /tmp/token.json, but seems to work for me.
Motivated people could turn that into a clean backend for ACD :).
+1 from me :)
I compiled the backend @stapelberg found, and it indeed appears to work; it uses code from rclone.
+1 too
I'm trying to use Restic + acd_cli (a FUSE Python client for ACD), but it's very unreliable for now: some operations do not behave as expected (file truncate, rename) and Restic randomly panics as a result.
A working ACD storage backend would do wonders.
I'm currently reworking the interface to the backends; this includes a radical simplification. This is basically done, but not yet merged. For the plan see #383; the PR is #395.
Afterwards it will be much easier to implement new backends.
Before implementing many new backends, I'd like to have a list of rules that services we write backends for must fulfill; this may include requiring that a test instance of the service be available for us to run the integration tests against.
Do you by chance know whether there is a test service for ACD we can use for tests?
As far as I know, there is no test instance for ACD. No mention of such a thing here: https://developer.amazon.com/public/apis/experience/cloud-drive/content/restful-api
But the https://github.com/ncw/rclone project has already implemented an ACD backend in Go. It seems fairly reusable, as demonstrated by the proof of concept shared by @stapelberg.
Actually, looking at the revised interface, it would be reasonably easy to do a full wrapper for rclone filesystems. Maybe that way separate implementations aren't needed?
I don't know what @fd0's vision for restic's future is, but it would seem logical to focus the project on the backup intelligence instead of re-implementing a ton of remote filesystems one by one. Besides, both projects' licenses are compatible.
It would also solve the worries about how to test all those backends.
@klauspost, was your idea to create a wrapper around rclone/fs/fs.go? Is it doable without tight coupling to rclone's internal logic?
Hm, interesting idea, I have to think about it. Not having to implement all the backends ourselves looks like a good idea; on the other hand (at least at the moment) I must admit that I don't like the thought of a tight coupling between restic and rclone, as this introduces a dependency we can't control...
I envision that restic should be easy to configure and use with a variety of suitable backends. This includes (in my opinion) only one place for configuration, e.g. of the backends. Maybe that's possible with rclone or at least part of their code. The interface looks suitable to be used with restic.
I pledged a $5 bounty for this feature.
Some thoughts:
GET requests are always redirected with a Location header to CloudFront, so you could use the Range header to request only a portion of a pack file, as required by the new backend API. See: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RangeGETs.html (a sketch of such a ranged request is below)

In case the priority of this FR depends on the popular vote: +1
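For illustration, a minimal Go sketch of such a ranged GET; the URL and byte range are invented for this example:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Hypothetical pack file URL; in practice this would be the
	// CloudFront location returned by the ACD content API.
	req, err := http.NewRequest("GET", "https://example.cloudfront.net/pack/1234", nil)
	if err != nil {
		log.Fatal(err)
	}
	// Request only bytes 512-1023 of the pack file.
	req.Header.Set("Range", "bytes=512-1023")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// A server honoring the Range header answers with 206 Partial Content.
	fmt.Println("status:", resp.StatusCode)
	buf, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("read", len(buf), "bytes")
}
```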
👍
👍
Yes please 👍
How about adding some more bounties to this feature?
See: https://www.bountysource.com/issues/23684796-storage-backend-amazon-cloud-drive
The bounty is now 35 USD: https://www.bountysource.com/issues/23684796-storage-backend-amazon-cloud-drive
Hey, thanks for your interest in restic in general and this backend in particular.
Just to give you a heads-up on what the blocker here is: I'm not sure how to handle third-party web services. For the local and sftp backends we have extensive tests in place that are run for every push/PR as part of the CI tests on Travis. This is also true for the s3 backend, where we're using a local Minio s3 server instance; thanks to this, I've found several bugs in the minio client library we're using in the s3 backend.
How can we run CI tests for backend implementations that require a third-party service? Is there e.g. a test service for ACD we could use? Or maybe just take well-tested code from other projects such as rclone?
One solution might be to register an account with Amazon, whitelist it for the Cloud Drive API and then use that for the CI tests? The downside is that such a test depends on Cloud Drive being available, but I guess we can wait for an hour or so occasionally before merging a PR? :)
That's the only solution I can imagine right now that allows us to run the tests against a live service (and that's desirable in my opinion).
When we add more backends for other services, the following will happen:
Did I forget anything?
Your list looks good. There are of course more effects, but I’m not sure whether they are in scope for the question you’re trying to answer:
> This test account must allow parallel connections, as the tests (e.g. for a PR) are run in parallel
I think a simple way to take care of this requirement is to use different directories for each test invocation. Sending requests in parallel is usually not an issue with these services, and the different directories make sure the tests don’t clash.
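A sketch of what per-invocation directories could look like; the naming scheme here is invented for this example:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"
)

// testPrefix returns a unique directory name for one test invocation, so
// that parallel CI runs against the same account never share a directory.
func testPrefix() string {
	var rnd [4]byte
	if _, err := rand.Read(rnd[:]); err != nil {
		panic(err)
	}
	return fmt.Sprintf("restic-test-%s-%x",
		time.Now().UTC().Format("20060102T150405"), rnd)
}

func main() {
	// e.g. "restic-test-20170215T120000-4f3a2b1c"
	fmt.Println(testPrefix())
}
```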
What I meant was more of a question how many parallel connections a service accepts. For most web-based services this won't be limited (at least concerning the number of connections we require), but this may not be the case for other, more obscure services.
When a service limits the number of connections so aggressively that our testing is impacted, we could ask the service owner for an exception or rate-limit on our end as well. As a last resort, we could disable the tests for the backend in question or remove that backend altogether.
But, I suggest we cross that bridge when we get there :).
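For reference, client-side rate limiting would be cheap to add later with golang.org/x/time/rate. A sketch; the 10 requests/second figure is a placeholder, not a known limit of any service:

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/time/rate"
)

func main() {
	// Allow at most 10 requests per second with bursts of 1;
	// the numbers are placeholders, not a known ACD limit.
	limiter := rate.NewLimiter(rate.Limit(10), 1)
	ctx := context.Background()

	for i := 0; i < 3; i++ {
		// Wait blocks until the limiter permits another request.
		if err := limiter.Wait(ctx); err != nil {
			panic(err)
		}
		fmt.Println("request", i)
		// ...perform the backend request here...
	}
}
```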
I tried to use ACD with restic (through FUSE) and the system is still unreliable (same errors as kisscool). I tried rclone to check their backend and there wasn't any problem.
But I do not like rclone (snapshots...).
Conclusion: +1 for a native ACD backend!
Is there a problem with trying to write this? The issue seems to imply it's basically done, but delayed due to other architecture goals. This was a while ago, though, so... should I write this backend from scratch and pull request, or is it still being done internally?
Writing the backend is not the hard part; figuring out a good user interface for configuring access to the service is what's still missing here. What's the workflow for acquiring an oauth token for ACD? What does a user need to do to access ACD via the API?
rclone does this using a local webserver, and the backend for duplicity provided a link to a solution hosted by the developer. The former is cumbersome for headless machines, and the latter turns this into a service that you'd have to keep providing for free, even though the resources would cost quite a bit. That makes the former the only viable option, in my opinion.
Are you opposed to a webserver, in the style of rclone, to accomplish this? Perhaps restic could ask you what interface to bind on, or give you the opportunity to use a different copy of the same program (this is what rclone does) on a non-headless machine, and simply copy credentials from one host to the other (JSON prints in console, or on the page).
I'll have to check how rclone automates this process for testing, if it does that at all.
Yeah, testing is another story. How do we run the CI tests for these backends? Please don't get me wrong, I'd love to add cloud-based backends, but we need a clear strategy for this.
For configuration, I can also imagine having a CLI-based process, where restic prints instructions to the user. I'm not super familiar with the process at the moment; is that even possible?
What do you think?
What about simply integrating with rclone? Kind of a "Unix philosophy" idea: you continue to be great at doing backups, they continue to be great at cloud copy, and users get the best of both worlds.
I'd have to see what their API affords, if it exists, but since we'd be leaving configuration solely to rclone (users would use rclone to add their cloud accounts, then would configure restic to "use rclone") you'd add far less complexity to restic.
I'll preface this by saying I've not read the docs for ACD, but I have implemented some basic oauth stuff in the past.
If we implemented a webserver, we could simply test the following things:
and a few others, but the point is, we mock what we have to, and test the rest. I don't know if that'd test _everything_, but it'd test as much as we possibly could. I've not written extensive tests for my projects in the past, so please correct me if this isn't testing the right things haha
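For instance, a minimal sketch of the mocking idea using Go's net/http/httptest; the token endpoint response is made up:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

func main() {
	// Stand-in for the provider's token endpoint: always returns a
	// canned token, so the exchange logic can be tested offline.
	mock := httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			w.Header().Set("Content-Type", "application/json")
			fmt.Fprint(w, `{"access_token":"dummy","token_type":"bearer"}`)
		}))
	defer mock.Close()

	// The code under test would be pointed at mock.URL instead of
	// the real ACD endpoint.
	resp, err := http.Get(mock.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("mock status:", resp.StatusCode)
}
```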
Sorry for glossing over your reply, was typing mine a bit before I saw yours!
> For configuration, I can also imagine having a CLI-based process, where restic prints instructions to the user.
I imagine the process would probably be something like this, and this... again, is just rclone. I only reference it over and over because it's the only program that does _exactly_ what we're talking about quite well.
To start:
Then, setup ACD:
Using auto config (non-headless)
Very quickly, it opened a local webserver page in my browser, which immediately redirected to Amazon for login; once I signed in, it communicated with the CLI app, producing:
This, in my browser:
and this, in my command line
(60% of the credential is off to the right side of my screen, outside the screenshot, I don't think this is a security risk, haha)
Then I save it, and it's now usable in the program.
If I select the other option, then I am simply directed to do the following:
and that just does the same process on the machine I downloaded rclone on (opens a browser, gives me an authorization key), but instead gives it to me in the CLI to copy and paste to the headless machine.
Thanks for describing the process in such great detail, that is already very similar to what I had in mind.
I'm wondering: why is the webserver needed at all? This process works for a "workstation" type of machine, but not on a server (where there is no browser). The workflow used by rclone is described here: http://rclone.org/remote_setup/
I don't know why we need a webserver for this, but I haven't implemented an oauth-based login workflow yet.
We'll also need a config file to store the token configured for the remote in, that's also not yet done.
From my understanding, which is limited, the oauth data is provided to the user using the GET data in a redirect.
Have a look at the URL in my screenshot of the browser. That was put there by Amazon. After I hit sign-in on my amazon cloud drive, it redirected me, immediately, to that 127.0.0.1 URL.
Perhaps that is the only way to get this data. This is likely the case, because rclone implemented a webserver instead of picking another simpler solution. When I implemented oauth before, this seemed to be the implication.
If I am correct, then it follows that you must run your own webserver, with a page to redirect to Amazon and a page to handle the redirect back from Amazon, and this must be accessed through a web browser.
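For illustration, a minimal sketch of such a local callback server in Go; the port and query parameter name are assumptions (rclone appears to do something similar), not confirmed protocol details:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	codeCh := make(chan string, 1)

	// Minimal callback server: the provider redirects the browser to
	// http://127.0.0.1:53682/ with the authorization code in the query
	// string, and we pick it up here.
	srv := &http.Server{Addr: "127.0.0.1:53682"}
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "Authorization received, you can close this tab.")
		codeCh <- r.URL.Query().Get("code")
	})
	go func() {
		if err := srv.ListenAndServe(); err != http.ErrServerClosed {
			log.Println(err)
		}
	}()

	fmt.Println("waiting for the browser to be redirected back...")
	code := <-codeCh
	fmt.Println("got authorization code:", code)
	// This code would then be exchanged for an oauth token.
}
```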
As for the config file, I think all we need is a file in a default location (~/.restic.conf) that can be overridden via a flag or environment variable. I think this is a bit dirty, but it's the only viable solution that is transparent to the average user yet powerful for those who wish to do it "their way".
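A sketch of that lookup order; the RESTIC_CONFIG environment variable name is hypothetical, chosen only for this example:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// configPath resolves the config file location: a RESTIC_CONFIG
// environment variable wins, otherwise fall back to ~/.restic.conf.
func configPath() string {
	if p := os.Getenv("RESTIC_CONFIG"); p != "" {
		return p
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return ".restic.conf" // last resort: current directory
	}
	return filepath.Join(home, ".restic.conf")
}

func main() {
	fmt.Println("using config:", configPath())
}
```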
That sounds plausible. Let me think about a strategy here, this may take some time.
We'll need to:
- my_amazon_account, which is an ACD backend configured with a login token, so users can run restic --repo my_amazon_account:/foo/bar/dir ...

Anything else I'm missing here?
I think you got the big stuff outlined there.
Would you want to move all current backends into a single abstraction that supports this, or would this whole system become a "cloud" backend in the current sense of a backend (which itself is configured through special restic commands)?
Each instance of a cloud backend (Google Drive, OneDrive, Amazon Cloud Drive, S3?) has the following components:
and maybe some other stuff I'm missing
The current abstraction, from my quick read, only relates to the last thing. I think this is a pretty smart way of handling backends, if you're looking to revamp it a bit.
The other option is to simply, as I said, implement a "cloud" backend which does all of these things and rolls all the different providers together under its umbrella.
Here's some background in regards to embedding a client secret in open source applications: http://stackoverflow.com/a/28109307
As far as I understand the problem: You're not allowed to embed a client secret in an open source application. rclone employs some obfuscation to hide what they're embedding.
I doubt that embedding a static client id/secret in restic's source code is a good idea. On the other hand, having the user register an application themselves is complicated.
This article describes how to do oauth2 with Go: https://jacobmartins.com/2016/02/29/getting-started-with-oauth2-in-go/
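Following that article, the golang.org/x/oauth2 package covers most of the flow. A sketch; the endpoints, scopes, and credentials below are placeholders that would need checking against the ACD documentation:

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/oauth2"
)

func main() {
	// Endpoint URLs and client credentials are placeholders; the real
	// values come from registering an app with the Amazon Drive API.
	conf := &oauth2.Config{
		ClientID:     "CLIENT_ID",
		ClientSecret: "CLIENT_SECRET",
		Scopes:       []string{"clouddrive:read_all", "clouddrive:write"},
		RedirectURL:  "http://127.0.0.1:53682/",
		Endpoint: oauth2.Endpoint{
			AuthURL:  "https://www.amazon.com/ap/oa",
			TokenURL: "https://api.amazon.com/auth/o2/token",
		},
	}

	// 1. Send the user to the consent page.
	url := conf.AuthCodeURL("state-token", oauth2.AccessTypeOffline)
	fmt.Println("visit:", url)

	// 2. After the redirect delivers a code, exchange it for a token.
	var code string
	fmt.Print("enter authorization code: ")
	fmt.Scan(&code)

	tok, err := conf.Exchange(context.Background(), code)
	if err != nil {
		panic(err)
	}
	fmt.Println("token acquired, expires:", tok.Expiry)
}
```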
> I doubt that embedding a static client id/secret in restic's source code is a good idea.
There is no real solution; it is a broken concept to assume that any client can keep a secret.
However, if you consider what the client secret contains, it is not that important. The only real thing it allows is for Amazon (and others) to be able to identify a specific client, nothing more. It does not grant any special access - your tokens are used for that.
Sure, a publicly available "client secret" lets other applications identify themselves as restic, but other than the risk that "restic" will be banned (or, more likely, rate limited) as a client, there is not much risk in exposing the client "secret". It will never put any user data in jeopardy.
The problem here is that somebody needs to register the clientID, for example me. If I'm using my normal Amazon account (or even worse, my Google account), and "violate" the TOS for the service by publishing the client secret, they can terminate my account. That's not something I'm going to risk.
Another problem is that once the client secret changes (or is revoked), we're stuck with older versions of restic e.g. in Debian stable which are unable to communicate with the service because of a hardcoded (and now invalid) client secret. This is the case even if access to the service is restored shortly after, but the client secret has changed.
I've thought about possible solutions and found only two:
Currently, I'm in favor of the second option, we need a UI for the oauth token thing anyway. What do you think?
> If I'm using my normal Amazon account [...]
I know that Nick has had some correspondence with Amazon, since rclone was being rate limited due to its many users. It is however my impression (from memory) that they were quite forthcoming and encouraged open-source development, and have made exemptions for his client. So I guess my advice would be to contact them and see how things go from there. In the overall picture I don't think they would mind the business coming from restic users.
Interesting idea, do you have any hint on who to contact at Amazon?
For Microsoft OneDrive he said that he did not contact anyone: https://github.com/ncw/rclone/issues/372
I know that @breunigs had bad luck with his amazon cloud drive duplicity backend — they wouldn’t give him any rate limit exemptions AFAIK.
I have only read the last few comments, so please forgive me if this info is not needed:
Also, a final word of advice: read through rclone's workarounds for Amazon Drive. The API contains a lot of undocumented "eventual consistency" gotchas. It even goes out of its way to cache an outdated response it gave you, so that you need to wait even longer if you were too hasty to begin with. This is on top of it reporting errors when there are none, one just needs to wait.
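As a crude illustration of the kind of workaround involved: retry with backoff until the API catches up. This is a generic sketch, not rclone's actual logic:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retry runs fn until it succeeds or attempts are exhausted, doubling
// the wait each time, to paper over eventual-consistency delays.
func retry(attempts int, initial time.Duration, fn func() error) error {
	wait := initial
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		fmt.Println("transient error, retrying in", wait, "-", err)
		time.Sleep(wait)
		wait *= 2
	}
	return err
}

func main() {
	calls := 0
	err := retry(5, 500*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errors.New("file not visible yet")
		}
		return nil
	})
	fmt.Println("done after", calls, "calls, err =", err)
}
```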
HTH,
Stefan
Thanks for the information!
Just throwing something out there:
What if we remove all backends (but local and REST) from restic and stick them into restic/rest-server?
This allows restic to focus on doing backups properly and filesystem implementations are done in the rest-server.
This also leaves restic with just one backend API to maintain.
This doesn't solve the testing problem, but will certainly help keep the restic source clean/focussed and it is easier to make API changes inside restic.
Thanks for the suggestion. Unfortunately I don't like it at all, in my opinion this approach (adding an intermediate layer including a new transport via HTTP) will lead to even more problems.
The backend API interface was stable for a long time, then changed recently, and will be stable again. The interface is already rather small.
We should try to get backends into restic (including proper CI tests) as soon as possible, that's IMHO the only way to make sure they work.
In case of the Amazon ACD backend, we need to answer the outstanding questions first.
The Amazon Developer Guide for Amazon Drive (as it's called these days) states:

> What Not To Build
> [...]
> - Don't build apps that encrypt customer data
I feel that Amazon Drive is not the right platform for securely storing encrypted backups.
Interesting. This must be a new addition, as it definitely was not the case when ACD support was added to Arq.
Seems ACD is not a real storage option after all.
Indeed an addition within the last year; it wasn't listed one year ago: http://web.archive.org/web/20160322034250/https://developer.amazon.com/public/apis/experience/cloud-drive/content/developer-guide
Amazon has since clarified this in https://forums.developer.amazon.com/questions/54909/impact-of-dont-encrypt-customer-data-part-of-drive.html:
> What if the customer chooses to encrypt their data?
> They can do that, and that is fine.
So, restic and other apps should be good.
I think their intention is to protect users from having their data encrypted without a way to recover it.
Steffen
One other motivation which I find plausible is to increase interoperability — if each application encrypts their files, the user’s ability to switch between applications is severely hampered.
I asked Arq Backup support. They encrypt everything, and said that their app had been approved by Amazon, and to not worry.
I'm not sure what Amazon is trying to say, but it seems they are now evaluating each case as it comes in.
Not sure if anybody is aware of the recent ACD drama with acd_cli and rclone, but a TL;DR of the situation is that they have had their ACD API access revoked due to TOS violations. Their efforts to regain API access are apparently being hampered by the fact that Amazon has stopped accepting new third-party apps for ACD. I assume this latter revelation stops any Restic ACD support in its tracks, unless the project had already obtained ACD API access.
acd_cli's API access was revoked due to a security issue with their oauth app, not a TOS violation. The problem has been fixed and Amazon reinstated their key. Although this is off topic for this project.
New ACD API access is currently closed.
Thanks for posting this here, I wasn't aware of it. I had reservations about implementing ACD, and it seems that Amazon indeed did not like secrets in the code of an open-source program: rclone was banned for it: https://forum.rclone.org/t/rclone-has-been-banned-from-amazon-drive/2314
On the other hand, acd_cli implemented an OAuth auth service (not sure what the correct nomenclature is here). This handles authorization for all users, and there apparently was a bug that allowed people to access/modify other people's files.
Since Amazon isn't accepting new clients anyway I'm closing this issue for now. Thanks!