Etcd: Meta issue: Pluggable backend

Created on 13 Dec 2018 · 7 Comments · Source: etcd-io/etcd

etcd supports an on-disk backend via bbolt. However, there is growing interest in supporting other backends. For example, cloud providers might have a hosted database with appropriate semantics to act as the backend, and etcd could then run in a "serverless" environment.

As a concrete example: Microsoft announced a CosmosDB backend for etcd to support their AKS project. I think we should work to see if we can get that backend integrated into etcd which might prove out the model. @khenidak @mboersma WDYT?
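
To make "pluggable backend" concrete, here is a minimal sketch of what a storage interface with a swappable implementation might look like. The `Backend` interface, the `KV` type, and `memBackend` are hypothetical names invented for illustration; they are not etcd's actual backend API, and a real backend would also need transactions and batching.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// KV is a single key-value pair returned by a range read.
type KV struct{ Key, Value []byte }

// Backend is a hypothetical storage interface (not etcd's real one).
// Anything that satisfies it could stand in for the on-disk store.
type Backend interface {
	Put(key, value []byte) error
	Get(key []byte) (value []byte, found bool, err error)
	Range(start, end []byte) ([]KV, error) // keys in [start, end), sorted
	Delete(key []byte) error
}

// memBackend is an in-memory stand-in for an on-disk or hosted store.
type memBackend struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func newMemBackend() *memBackend { return &memBackend{data: map[string][]byte{}} }

func (m *memBackend) Put(key, value []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[string(key)] = append([]byte(nil), value...) // copy to avoid aliasing
	return nil
}

func (m *memBackend) Get(key []byte) ([]byte, bool, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	v, ok := m.data[string(key)]
	return v, ok, nil
}

func (m *memBackend) Range(start, end []byte) ([]KV, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	var out []KV
	for k, v := range m.data {
		if k >= string(start) && k < string(end) {
			out = append(out, KV{[]byte(k), v})
		}
	}
	sort.Slice(out, func(i, j int) bool { return string(out[i].Key) < string(out[j].Key) })
	return out, nil
}

func (m *memBackend) Delete(key []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.data, string(key))
	return nil
}

func main() {
	var b Backend = newMemBackend()
	b.Put([]byte("a/1"), []byte("x"))
	b.Put([]byte("a/2"), []byte("y"))
	b.Put([]byte("b/1"), []byte("z"))
	// "a0" is the smallest key after every key with the prefix "a/".
	kvs, _ := b.Range([]byte("a/"), []byte("a0"))
	fmt.Println(len(kvs)) // prints 2
}
```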

stale

All 7 comments

trying out badger as an alternative backend here: https://github.com/etcd-io/etcd/pull/10311

Post holiday bump @khenidak @mboersma.

🌲

I've tried this a few ways. While it's possible to replace only boltdb, the alternative is to reimplement the proto API to make something that is wire-compatible with etcd but doesn't share much/any code. Most backends that people are considering themselves support distributed consistency, so the raft implementation, which is the bulk of what's here, is not particularly applicable.

I've found that reimplementing the API produces a much better result, so I think we should prioritize efforts to make it easier for people to do that e.g. conformance tests, #10324, data export tools.
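
As a rough illustration of what conformance tests for a reimplementation could check, the sketch below runs the same behavioral checks against any implementation of a tiny interface. The `KV` interface, the `checks` list, and `memKV` are all hypothetical and far smaller than the real etcd API; an actual suite would exercise the gRPC API itself.

```go
package main

import "fmt"

// KV is a hypothetical minimal surface, not the real etcd API.
type KV interface {
	Put(key, value string) (rev int64)
	Get(key string) (value string, modRev int64, found bool)
}

// check describes one behavioral expectation of an implementation.
type check struct {
	name string
	run  func(kv KV) error
}

var checks = []check{
	{"revision increases on every put", func(kv KV) error {
		r1 := kv.Put("k", "1")
		r2 := kv.Put("k", "2")
		if r2 <= r1 {
			return fmt.Errorf("got revision %d then %d", r1, r2)
		}
		return nil
	}},
	{"get returns the last write and its revision", func(kv KV) error {
		r := kv.Put("k2", "v")
		v, mod, ok := kv.Get("k2")
		if !ok || v != "v" || mod != r {
			return fmt.Errorf("got (%q, %d, %v), want (\"v\", %d, true)", v, mod, ok, r)
		}
		return nil
	}},
	{"missing key is reported as not found", func(kv KV) error {
		if _, _, ok := kv.Get("no-such-key"); ok {
			return fmt.Errorf("unexpectedly found a missing key")
		}
		return nil
	}},
}

// runConformance returns one message per failed check.
func runConformance(kv KV) []string {
	var failures []string
	for _, c := range checks {
		if err := c.run(kv); err != nil {
			failures = append(failures, c.name+": "+err.Error())
		}
	}
	return failures
}

// memKV is a trivial in-memory implementation used to exercise the checks.
type memKV struct {
	rev  int64
	data map[string]entry
}

type entry struct {
	value  string
	modRev int64
}

func newMemKV() *memKV { return &memKV{data: map[string]entry{}} }

func (m *memKV) Put(key, value string) int64 {
	m.rev++
	m.data[key] = entry{value, m.rev}
	return m.rev
}

func (m *memKV) Get(key string) (string, int64, bool) {
	e, ok := m.data[key]
	return e.value, e.modRev, ok
}

func main() {
	fmt.Println(len(runConformance(newMemKV()))) // number of failed checks
}
```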

On Mon, Jan 14, 2019 at 9:59 PM Justin Santa Barbara wrote:

Most backends that people are considering themselves support distributed consistency, so the raft implementation which is the bulk of what's here is not particularly applicable

Do you have links to those or can you CC the people in?

Most backends that people are considering themselves support distributed consistency, so the raft implementation which is the bulk of what's here is not particularly applicable.

What can be reused is the mvcc package, which supports the etcd data model, and the gRPC layer, whose RPC implementation handles stream demultiplexing. etcdserver itself can be stateless, and raft will not even be involved.
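
For readers unfamiliar with the data model referred to here, the following is a simplified sketch of a multi-version store: every write bumps a global revision and old versions stay readable. The `store` type and its methods are hypothetical and are not the mvcc package's API; compaction, leases, and per-key version counters are left out, and unlike etcd this sketch bumps the revision even when deleting a key that does not exist.

```go
package main

import "fmt"

// version is one historical state of a key.
type version struct {
	rev   int64
	value string
	tomb  bool // true if this version records a delete
}

// store keeps the full history of every key under one global revision.
type store struct {
	rev  int64
	hist map[string][]version
}

func newStore() *store { return &store{hist: map[string][]version{}} }

// Put records a new version and returns the revision it was written at.
func (s *store) Put(key, value string) int64 {
	s.rev++
	s.hist[key] = append(s.hist[key], version{rev: s.rev, value: value})
	return s.rev
}

// Delete records a tombstone; earlier versions remain readable.
func (s *store) Delete(key string) int64 {
	s.rev++
	s.hist[key] = append(s.hist[key], version{rev: s.rev, tomb: true})
	return s.rev
}

// GetAt returns the value of key as of revision rev; rev <= 0 means latest.
func (s *store) GetAt(key string, rev int64) (string, bool) {
	if rev <= 0 {
		rev = s.rev
	}
	vs := s.hist[key]
	for i := len(vs) - 1; i >= 0; i-- {
		if vs[i].rev <= rev {
			if vs[i].tomb {
				return "", false
			}
			return vs[i].value, true
		}
	}
	return "", false
}

func main() {
	s := newStore()
	s.Put("foo", "a") // revision 1
	s.Put("foo", "b") // revision 2
	s.Delete("foo")   // revision 3

	v1, _ := s.GetAt("foo", 1)
	v2, _ := s.GetAt("foo", 2)
	_, found := s.GetAt("foo", 0)
	fmt.Println(v1, v2, found) // prints: a b false
}
```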

I've found that reimplementing the API produces a much better result,

Have you given it a try? Do you have a reference implementation that we can take a look at?

The backend efforts that I'm aware of are CosmosDB (existing), DynamoDB (stopped?) and FoundationDB (speculative). The DynamoDB efforts took place at the k8s storage layer, and I think it was in response to that that CosmosDB re-implemented the etcd protocol instead, as that is our recommended direction per https://github.com/kubernetes/kubernetes/issues/53162. I don't know of anyone specifically that I could cc, but of course the "implement the etcd protocol" strategy can be done without contributing to the OSS repos, so we don't have full visibility.

I've put together a few prototypes: one where I effectively integrated the storage layer into apiserver (big wins, but architecturally difficult), one where I just implemented the etcd API, and one where I replaced boltdb in the etcd source code with one of the cloud key-value stores. The storage layer is https://github.com/kubernetes/kubernetes/pull/37536, and I'll see if I can open source the other two. But I found there was a lot of overhead from reusing the existing etcd code (both code overhead and efficiency overhead), and I was surprised, in comparison, at how easy it was to just shim to an existing distributed storage system. It gets even easier if you are willing to leader-elect a single node to run the watches, which is likely less scalable than a fully scale-out implementation but is probably comparable to anything Raft-based.
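
The "single node runs the watches" idea can be sketched roughly as below: one process assigns revisions and fans committed writes out to prefix watchers. This is an illustrative guess at the shape of such a shim, not code from any of the prototypes mentioned; `watchHub`, `Event`, and `Publish` are hypothetical names, the leader election itself is assumed to happen elsewhere, and dropping slow watchers is just one possible policy.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Event is one committed write delivered to watchers.
type Event struct {
	Rev   int64
	Key   string
	Value string
}

type watcher struct {
	prefix string
	ch     chan Event
}

// watchHub runs on the single elected node and fans events out to watchers.
type watchHub struct {
	mu       sync.Mutex
	rev      int64
	nextID   int
	watchers map[int]watcher
}

func newWatchHub() *watchHub { return &watchHub{watchers: map[int]watcher{}} }

// Watch registers a prefix watcher with a buffered channel of size buf.
// The returned cancel function unregisters it and closes the channel.
func (h *watchHub) Watch(prefix string, buf int) (<-chan Event, func()) {
	h.mu.Lock()
	defer h.mu.Unlock()
	id := h.nextID
	h.nextID++
	ch := make(chan Event, buf)
	h.watchers[id] = watcher{prefix, ch}
	cancel := func() {
		h.mu.Lock()
		defer h.mu.Unlock()
		if w, ok := h.watchers[id]; ok {
			delete(h.watchers, id)
			close(w.ch)
		}
	}
	return ch, cancel
}

// Publish is called after a write commits in the underlying store.
// It assigns the next revision and notifies every matching watcher.
func (h *watchHub) Publish(key, value string) int64 {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.rev++
	ev := Event{h.rev, key, value}
	for id, w := range h.watchers {
		if !strings.HasPrefix(key, w.prefix) {
			continue
		}
		select {
		case w.ch <- ev:
		default:
			// Watcher's buffer is full: drop it rather than block all writes.
			delete(h.watchers, id)
			close(w.ch)
		}
	}
	return h.rev
}

func main() {
	h := newWatchHub()
	ch, cancel := h.Watch("/registry/pods/", 4)
	defer cancel()

	h.Publish("/registry/services/a", "x") // revision 1, not delivered
	h.Publish("/registry/pods/a", "y")     // revision 2, delivered

	ev := <-ch
	fmt.Println(ev.Rev, ev.Key) // prints: 2 /registry/pods/a
}
```

Because all revisions come from one node, watchers see a single total order of events, which is the property that makes this simpler than a scale-out design.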

My view was that it wasn't clear that my reimplementations were going to be that much better than etcd (as I was doing single-node anyway), and it wasn't clear what the effect would be on the etcd project. But none of these were more than a few hours' work, and I learned a lot about the tradeoffs by doing it, so I'd encourage people to give it a try!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
