Restic: Storage Backend: Amazon Cloud Drive

Created on 4 Jul 2015 · 56 comments · Source: restic/restic

The "Unlimited Everything" plan of Amazon Cloud Drive is a quite affordable backup storage option. Amazon Cloud Drive has its own RESTful API.

backend discussion

All 56 comments

Just thought the same thing.

I may look at this once #21 is in place - uploading without compression seems like a waste of bandwidth.

That depends on your use case ;)

There’s some proof of concept code I found: http://sprunge.us/fdQF — it requires that an oauth token is in /tmp/token.json, but seems to work for me.

Motivated people could turn that into a clean backend for ACD :).

+1 from me :)
I compiled the backend @stapelberg found, and it indeed appears to work; it reuses code from rclone.

+1 too
I'm trying to use restic + acd_cli (a FUSE Python client for ACD), but it's very unreliable for now: some operations do not behave as expected (file truncate, rename) and restic randomly panics as a result.
A working ACD storage backend would do wonders.

I'm currently reworking the interface to the backends, this includes a radical simplification. This is basically done, but not yet merged. For the plan, see #383, the PR is #395.

Afterwards it will be much easier to implement new backends.

Before implementing many new backends, I'd like to have a list of rules that services we write backends for must fulfill; this may include requiring that a test instance of the service be available that we can run the integration tests against.

Do you by chance know whether there is a test service for ACD we can use for tests?

As far as I know, there is no test instance for ACD. There's no mention of such a thing here: https://developer.amazon.com/public/apis/experience/cloud-drive/content/restful-api

But the https://github.com/ncw/rclone project has already implemented an ACD backend in Go. It seems fairly reusable, as demonstrated by the proof of concept shared by @stapelberg.

Actually, looking at the revised interface, it would be reasonably easy to write a full wrapper for rclone filesystems. Maybe that way separate implementations aren't needed?

I don't know what @fd0's vision for restic's future is, but it would seem logical to focus the project on the backup intelligence instead of re-implementing a ton of remote filesystems one by one. Besides, both projects' licenses are compatible.
It would also solve the worries about how to test all those backends.

@klauspost, was your idea to create a wrapper around rclone/fs/fs.go? Is it doable without being tightly coupled to the internal logic of rclone?

Each backend implements the fs.Fs interface. Each file is represented as an fs.Object.

It should be fairly easy to create a restic backend that uses an rclone filessystem+folder, provided it is already set up in the rclone configuration.
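A hedged sketch of what such a wrapper could look like. The Object and Fs interfaces below are simplified stand-ins for rclone's fs.Object and fs.Fs (the real ones in rclone/fs/fs.go carry many more methods), and the in-memory filesystem exists only to make the example self-contained:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Simplified stand-ins for rclone's fs.Object / fs.Fs interfaces.
type Object interface {
	Open() (io.ReadCloser, error)
	Remove() error
}

type Fs interface {
	NewObject(remote string) (Object, error)
	Put(in io.Reader, remote string) (Object, error)
}

// rcloneBackend adapts an rclone-style filesystem to the small set of
// operations a restic backend needs: save and load blobs by name.
type rcloneBackend struct {
	fs Fs
}

func (b *rcloneBackend) Save(name string, data []byte) error {
	_, err := b.fs.Put(bytes.NewReader(data), name)
	return err
}

func (b *rcloneBackend) Load(name string) ([]byte, error) {
	obj, err := b.fs.NewObject(name)
	if err != nil {
		return nil, err
	}
	rc, err := obj.Open()
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return io.ReadAll(rc)
}

// memFs is an in-memory Fs used only to make this sketch runnable.
type memFs struct{ data map[string][]byte }

type memObject struct {
	fs   *memFs
	name string
}

func (o *memObject) Open() (io.ReadCloser, error) {
	return io.NopCloser(bytes.NewReader(o.fs.data[o.name])), nil
}

func (o *memObject) Remove() error {
	delete(o.fs.data, o.name)
	return nil
}

func (f *memFs) NewObject(remote string) (Object, error) {
	if _, ok := f.data[remote]; !ok {
		return nil, fmt.Errorf("object not found: %s", remote)
	}
	return &memObject{fs: f, name: remote}, nil
}

func (f *memFs) Put(in io.Reader, remote string) (Object, error) {
	buf, err := io.ReadAll(in)
	if err != nil {
		return nil, err
	}
	f.data[remote] = buf
	return &memObject{fs: f, name: remote}, nil
}

func main() {
	backend := &rcloneBackend{fs: &memFs{data: map[string][]byte{}}}
	if err := backend.Save("snapshots/abc", []byte("hello")); err != nil {
		panic(err)
	}
	out, err := backend.Load("snapshots/abc")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // hello
}
```

The point is that restic would only program against the small adapter, so the coupling to rclone stays confined to one file.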

Hm, interesting idea, I have to think about it. Not having to implement all the backends ourselves looks like a good idea; on the other hand (at least at the moment) I must admit that I don't like the thought of a tight coupling between restic and rclone, as this introduces a dependency that we can't control...

I envision that restic should be easy to configure and use with a variety of suitable backends. This includes (in my opinion) only one place for configuration, e.g. of the backends. Maybe that's possible with rclone or at least part of their code. The interface looks suitable to be used with restic.

I pledged a $5 bounty for this feature.

Some thoughts:

  • Amazon Cloud Drive is using AWS S3 / CloudFront as its backend. The GET requests are always redirected with a Location header to Cloudfront. So you could use the Range header to request only a portion of a pack file as it is required by the new backend API. See: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RangeGETs.html
  • You could start with integrating go-acd. That's the library which is used by rclone.
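The Range-based partial read described in the first bullet can be sketched in Go; the URL here is a placeholder, not a real ACD or CloudFront endpoint:

```go
package main

import (
	"fmt"
	"net/http"
)

// newRangeRequest builds a GET request for bytes [offset, offset+length)
// of a remote object. Since CloudFront honors the Range header, only the
// requested slice of the pack file would be transferred.
func newRangeRequest(url string, offset, length int64) (*http.Request, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", offset, offset+length-1))
	return req, nil
}

func main() {
	req, err := newRangeRequest("https://example.com/pack/1234", 512, 1024)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Range")) // bytes=512-1535
}
```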

in case the priority of this FR depends on the popular vote: +1

👍

👍

Yes please 👍

How about adding some more bounties to this feature?

See: https://www.bountysource.com/issues/23684796-storage-backend-amazon-cloud-drive

Hey, thanks for your interest in restic in general and this backend in particular.

Just to give you a heads-up on what the blocker is here: I'm not sure how to handle third-party web services. For the local and sftp backends we have extensive tests in place that are run for every push/PR as part of the CI tests on Travis. This is also true for the s3 backend, where we're using a local Minio s3 server instance; thanks to this I've found several bugs in the minio client library we're using in the s3 backend.

How can we run CI tests for backend implementations that require a third-party service? Is there e.g. a test service for ACD we could use? Or maybe just take well-tested code from other projects such as rclone?

One solution might be to register an account with Amazon, whitelist it for the Cloud Drive API and then use that for the CI tests? The downside is that such a test depends on Cloud Drive being available, but I guess we can wait for an hour or so occasionally before merging a PR? :)

That's the only solution I can imagine right now that allows us to run the tests against a live service (and that's desirable, in my opinion).

When we add more backends for other services the following will happen:

  • The number of dependent services for running the CI tests grows
  • For each backend we will need a test account to use in the tests
  • This test account must allow parallel connections, as the tests (e.g. for a PR) are run in parallel

Did I forget anything?

Your list looks good. There are of course more effects, but I’m not sure whether they are in scope for the question you’re trying to answer:

  • More backends means changes that touch the backends API become more involved (need to update more code, test more code).
  • People who want to run the tests locally need to create test accounts or use their own account.
  • More backends make restic more appealing to more people :).

This test account must allow parallel connections, as the tests (e.g. for a PR) are run in parallel

I think a simple way to take care of this requirement is to use different directories for each test invocation. Sending requests in parallel is usually not an issue with these services, and the different directories make sure the tests don’t clash.

I think a simple way to take care of this requirement is to use different directories for each test invocation. Sending requests in parallel is usually not an issue with these services, and the different directories make sure the tests don’t clash.

What I meant was more of a question how many parallel connections a service accepts. For most web-based services this won't be limited (at least concerning the number of connections we require), but this may not be the case for other, more obscure services.

When a service limits the number of connections so aggressively that our testing is impacted, we could ask the service owner for an exception or rate-limit on our end as well. As a last resort, we could disable the tests for the backend in question or remove that backend altogether.

But, I suggest we cross that bridge when we get there :).

I tried to use ACD with restic (through FUSE) and the system is still unreliable (same errors as kisscool). I tried rclone to check their backend and there weren't any problems.

But I do not like rclone (snapshots...).

Conclusion: +1 for a native ACD backend!

Is there a problem with trying to write this? The issue seems to imply it's basically done, but delayed due to other architecture goals. This was a while ago, though, so... should I write this backend from scratch and pull request, or is it still being done internally?

Writing the backend is not the hard part; figuring out a good user interface for configuring access to the service is what's still missing here. What's the workflow for acquiring an oauth token for ACD? What does a user need to do to access ACD via the API?

rclone does this using a local webserver, and the backend for duplicity provided a link to a solution hosted by the developer. The former is cumbersome for headless, and the latter makes this into a service that you'd want to continue to provide for free, although the resources would cost quite a bit. That makes the former the only viable option, in my opinion.

Are you opposed to a webserver, in the style of rclone, to accomplish this? Perhaps restic could ask you which interface to bind on, or give you the opportunity to use a different copy of the same program on a non-headless machine (this is what rclone does) and simply copy credentials from one host to the other (JSON printed in the console, or on the page).

I'll have to check how rclone automates this process for testing, if it does that at all.

Yeah, testing is another story. How do we run the CI tests for these backends? Please don't get me wrong, I'd love to add cloud-based backends, but we need a clear strategy for this.

For configuration, I can also imagine having a CLI-based process, where restic prints instructions to the user. I'm not super familiar with the process at the moment, is that even possible?

What do you think?

Idea 1 - Just use rclone

What about simply integrating with rclone? Kind of a "unix philosophy" idea: you continue to be great at doing backups, they continue to be great at cloud copy, and users get the best of both worlds.

I'd have to see what their API affords, if it exists, but since we'd be leaving configuration solely to rclone (users would use rclone to add their cloud accounts, then would configure restic to "use rclone") you'd add far less complexity to restic.

Idea 2 - How the tests might look if we went webserver route

I'll preface this by saying I've not read the docs for ACD, but I have implemented some basic oauth stuff in the past.

If we implemented a webserver, we could simply test the following thing:

  • Does the webserver respond on the right ports/addresses
  • When I GET the / route, does it contain what I'd expect it to (a redirect, the text we put there, etc)
  • If I mock the response that would come from amazon (more thought needed on how to do this correctly) how does it behave?
  • Still needs a lot more thought: static "testing" credentials of some sort, to test the API. I'll have to look at how you test other services first.

and a few others, but the point is, we mock what we have to, and test the rest. I don't know if that'd test _everything_ but it'd test as much as we possibly could. I've not written extensive tests for my projects in the past, so please correct me if this isn't testing the right things haha

Sorry for glossing over your reply, was typing mine a bit before I saw yours!

For configuration, I can also imagine having a CLI-based process, where restic prints instructions to the user.

I imagine the process would probably be something like this, and this... again, is just rclone. I only reference it over and over because it's the only program that does _exactly_ what we're talking about quite well.

To start:
[screenshot: Using config]

Then, set up ACD:
[screenshot: ACD]

Using auto config (non-headless):

It very quickly opened a local webserver in my browser, which immediately redirected to Amazon for login, and once I signed in, it communicated with the CLI app, producing:

This, in my browser:
[screenshot: browser]

and this, in my command line
(60% of the credential is off to the right side of my screen, outside the screenshot; I don't think this is a security risk, haha)
[screenshot: code]

Then I save it, and it's now usable in the program.

If I select the other option, then I am simply directed to do the following:
[screenshot: non-headless]

and that just does the same process on the machine I downloaded rclone on (opens a browser, gives me an authorization key), but instead gives it to me in the CLI to copy and paste to the headless machine.

Thanks for describing the process in such great detail, that is already very similar to what I had in mind.

I'm wondering: why is the webserver needed at all? This process works for a "workstation" type of machine, but not on a server (where there is no browser). The workflow used by rclone is described here: http://rclone.org/remote_setup/

I don't know why we need a webserver for this, but I haven't implemented an oauth-based login workflow yet.

We'll also need a config file to store the token configured for the remote in, that's also not yet done.

From my understanding, which is limited, the oauth data is provided to the user via GET parameters in a redirect.

Have a look at the URL in my screenshot of the browser. That was put there by Amazon. After I hit sign-in on my amazon cloud drive, it redirected me, immediately, to that 127.0.0.1 URL.

Perhaps that is the only way to get this data. This is likely the case, because rclone implemented a webserver instead of picking another, simpler solution. When I implemented oauth before, this seemed to be the implication.

If I am correct, then it follows that you must run your own webserver, with a page that redirects to Amazon and a page that handles the redirect back from Amazon, and this must be accessed through a web browser.


As for the config file, I think all we need is a file in a default location (~/.restic.conf) that can be overridden via a flag or environment variable. I think this is a bit dirty, but it's the only viable solution that is transparent to the average user yet powerful for those who wish to do it "their way".

That sounds plausible. Let me think about a strategy here, this may take some time.

We'll need to:

  • have a config file to store the authentication tokens
  • have "instances" of backends, e.g. something called my_amazon_account which is an ACD backend configured with a login token, so users can run restic --repo my_amazon_account:/foo/bar/dir ...
  • have a workflow to create these login tokens
  • register a client id and secret for use with restic, and hide it in the source code (similar to what rclone does)
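The my_amazon_account:/foo/bar/dir syntax from the list above could be parsed with a simple split; a purely hypothetical sketch of that syntax:

```go
package main

import (
	"fmt"
	"strings"
)

// splitRepo splits a repository spec like "my_amazon_account:/foo/bar/dir"
// into a backend instance name and a path within that backend.
func splitRepo(spec string) (instance, path string, err error) {
	parts := strings.SplitN(spec, ":", 2)
	if len(parts) != 2 || parts[0] == "" {
		return "", "", fmt.Errorf("invalid repository spec %q", spec)
	}
	return parts[0], parts[1], nil
}

func main() {
	inst, path, err := splitRepo("my_amazon_account:/foo/bar/dir")
	if err != nil {
		panic(err)
	}
	fmt.Println(inst, path) // my_amazon_account /foo/bar/dir
}
```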

Anything else I'm missing here?

I think you got the big stuff outlined there.

Would you want to move all current backends into a single abstraction that supports this, or would this whole system become a "cloud" backend in the current sense of a backend (which itself is configured through special restic commands)?

Each instance of a cloud backend (google drive, onedrive, amazon cloud drive, S3?) has the following components:

  • Optional: authorization mechanism
  • Optional: arbitrary persistent state, typically related to authorization, but should support other things. This would be written by restic to a file for each instance of the backend.
  • A protocol for communication (ie: the interface/API)

and maybe some other stuff I'm missing

The current abstraction, from my quick read, only relates to the last thing. I think this is a pretty smart way of handling backends, if you're looking to revamp it a bit.

The other option is to simply, as I said, implement a "cloud" backend which does all of these things and rolls all the different providers together under its umbrella.

  • Have a workflow to refresh auth tokens (if they expire and the provider supplies a refresh token, which should be stored next to the auth token in your point 1)

Here's some background in regards to embedding a client secret in open source applications: http://stackoverflow.com/a/28109307

As far as I understand the problem: You're not allowed to embed a client secret in an open source application. rclone employs some obfuscation to hide what they're embedding.

I doubt that embedding a static client id/secret in restic's source code is a good idea. On the other hand, having the user register an application themselves is complicated.

This article describes how to do oauth2 with Go: https://jacobmartins.com/2016/02/29/getting-started-with-oauth2-in-go/

I doubt that embedding a static client id/secret in restic's source code is a good idea.

There is no real solution, it is a broken concept to assume that any client can keep a secret.

However, if you consider what the client secret contains, it is not that important. The only real thing it allows is for Amazon (and others) to be able to identify a specific client, nothing more. It does not grant any special access - your tokens are used for that.

Sure, a publicly available "client secret" can let other applications identify themselves as restic, but other than the risk that "restic" will be banned (or more likely rate limited) as a client, there is not much risk in exposing the client "secret". It will never put any user data in jeopardy.

The problem here is that somebody needs to register the clientID, for example me. If I'm using my normal Amazon account (or even worse, my Google account), and "violate" the TOS for the service by publishing the client secret, they can terminate my account. That's not something I'm going to risk.

Another problem is that once the client secret changes (or is revoked), we're stuck with older versions of restic e.g. in Debian stable which are unable to communicate with the service because of a hardcoded (and now invalid) client secret. This is the case even if access to the service is restored shortly after, but the client secret has changed.

I've thought about possible solutions and found only two:

  • Live with the risk and just put the client secret into the source
  • Build restic in a way that users need to register their own client ID and client secret, via a nice UI that minimizes the hassle

Currently, I'm in favor of the second option, we need a UI for the oauth token thing anyway. What do you think?

If I'm using my normal Amazon account [...]

I know that Nick has had some correspondence with Amazon, since rclone was being rate limited due to its many users. It is however my impression (from memory) that they were quite forthcoming and encouraged open source development, and have made exemptions for his client. So I guess my advice would be to contact them and see how things go from there. In the overall picture I don't think they would mind the business coming from restic users.

Interesting idea, do you have any hint on who to contact at Amazon?

For Microsoft OneDrive he said that he did not contact anyone: https://github.com/ncw/rclone/issues/372

I know that @breunigs had bad luck with his amazon cloud drive duplicity backend — they wouldn’t give him any rate limit exemptions AFAIK.

I have only read the last few comments, so please forgive me if this info is not needed:

  • rclone implements the web server on top of it offering remote setup where you copy the URL. Having a local webserver is just more convenient
  • if you want to whitelist any redirect target in Amazon, it has to be on a https machine – linking to http URLs is not okay. Only exception is localhost. So, for remote setup you can either redirect the user to a blank page and hope they realize what they have to do, or host some page with instructions. I added https://breunig.xyz/duplicity/copy.html for duplicity, since it doesn't have https infrastructure yet. Amazon will add all details in the query string, so you can get away with making this a static page
  • You need an Amazon Developer account. You can use your existing credentials to log in I believe, but you can also create a new one
  • There is a process where you register your app and then at some later stage you can create a security profile for said app. This process is very confusing, because of horrible UX, but it should work without human interaction from Amazon. (Note: by App they usually refer to "mobile apps", but not always. Click around a bit)
  • What the limits are is unclear; Amazon doesn't say. It's clear there are multiple stages: per user, per API endpoint, per credentials
  • If you want production limits, you send an email with "details" to [email protected]. Use a big player mail server, or they will tag you as spam and it takes a month or two until some poor soul went through all their spam.

Also, a final word of advice: read through rclone's workarounds for Amazon Drive. The API contains a lot of undocumented "eventual consistency" gotchas. It even goes out of its way to cache an outdated response it gave you, so that you need to wait even longer if you were too hasty to begin with. This is on top of it reporting errors when there are none, one just needs to wait.

HTH,
Stefan

Thanks for the information!

Just throwing something out there:

What if we remove all backends (but local and REST) from restic and stick them into restic/rest-server?

This would allow restic to focus on doing backups properly, while filesystem implementations are done in the rest-server.
This also leaves restic with just one backend API to maintain.

This doesn't solve the testing problem, but it will certainly help keep the restic source clean/focused, and it makes API changes inside restic easier.

Thanks for the suggestion. Unfortunately I don't like it at all, in my opinion this approach (adding an intermediate layer including a new transport via HTTP) will lead to even more problems.

The backend API interface was stable for a long time, then changed recently, and will be stable again. The interface is already rather small.

We should try to get backends into restic (including proper CI tests) as soon as possible, that's IMHO the only way to make sure they work.

In case of the Amazon ACD backend, we need to answer the outstanding questions first.

The Amazon Developer Guide for Amazon Drive (as it's called these days) states that:

What Not To Build

[...]

  • Don’t build apps that encrypt customer data

I feel that Amazon Drive is not the right platform for securely storing encrypted backups.

Interesting. This must be a new addition, as it definitely was not the case when ACD support was added to Arq.

Seems ACD is not a real storage option after all.

Indeed an addition within the last year, wasn't listed one year ago: http://web.archive.org/web/20160322034250/https://developer.amazon.com/public/apis/experience/cloud-drive/content/developer-guide

Amazon has since clarified this in https://forums.developer.amazon.com/questions/54909/impact-of-dont-encrypt-customer-data-part-of-drive.html:

What if the customer choses to encrypt their data?
They can do that, and that is fine.

So, restic and other apps should be good.

I think their intention is to protect users from having their data encrypted without a way to recover it.

Steffen

One other motivation which I find plausible is to increase interoperability — if each application encrypts their files, the user’s ability to switch between applications is severely hampered.

I asked Arq Backup support. They encrypt everything, and said that their app had been approved by Amazon, and to not worry.

I'm not sure what Amazon is trying to say, but it seems they are now evaluating each case as it comes in.

Not sure if anybody is aware of the recent ACD drama with acd_cli and rclone, but a TL;DR of the situation is that they have had their ACD API access revoked due to TOS violations. Their efforts to regain API access are apparently being hampered by the fact that Amazon has stopped accepting new third-party apps for ACD. I assume this latter revelation stops any Restic ACD support in its tracks, unless the project had already obtained ACD API access.

acd_cli API access was revoked due to a security issue with their oauth app, not a TOS violation. The problem has been fixed and Amazon re-instated their key. Although this is off topic from this project.

New ACD API access is currently closed.

Thanks for posting this here, I wasn't aware of it. I had reservations implementing ACD, and it seems that Amazon indeed did not like secrets in the code of an Open Source program: rclone was banned for it: https://forum.rclone.org/t/rclone-has-been-banned-from-amazon-drive/2314

On the other hand, acd_cli implemented an OAUTH auth service (not sure what the correct nomenclature here is). This handles authorization for all users, and there apparently was a bug that allowed people to access/modify other people's files.

Since Amazon isn't accepting new clients anyway I'm closing this issue for now. Thanks!
