Dgraph: Support Multi-Tenancy

Created on 25 Oct 2018 · 23 Comments · Source: dgraph-io/dgraph

What you wanted to do

Create multiple databases/schemas on the same server, as in traditional databases (Postgres/MySQL/SQL Server). Useful for personal VMs and Raspberry Pis, where I could replace a traditional database with Dgraph and run it on only one server.

What you actually did

Added a prefix to every predicate.

Why that wasn't great, with examples

It's cumbersome because I now need to create variables for the prefix. And if I accidentally drop the schema, I drop everything.

Any external references to support your case

CREATE DATABASE foodatabase or CREATE SCHEMA fooschema.
Would love to have something like localhost:8080/alter/foodatabase. If foodatabase is not provided it would default to existing behavior.
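A minimal sketch of how the requested path-based selection could behave, assuming the database name is an optional second path segment. The function `namespaceFromPath` and the constant `defaultNamespace` are hypothetical names for illustration and are not part of Dgraph's API.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultNamespace is a hypothetical name for the existing, un-namespaced data.
const defaultNamespace = "default"

// namespaceFromPath extracts an optional database name from a request path
// such as "/alter/foodatabase". When no name is given it returns
// defaultNamespace, which stands in for today's behavior.
func namespaceFromPath(path string) string {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) >= 2 && parts[1] != "" {
		return parts[1]
	}
	return defaultNamespace
}

func main() {
	fmt.Println(namespaceFromPath("/alter/foodatabase")) // foodatabase
	fmt.Println(namespaceFromPath("/alter"))             // default
}
```

Falling back to a default when the segment is missing would keep existing clients working unchanged.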

area/security area/tools exp/expert kind/feature popular priority/P1 status/accepted status/needs-specs

All 23 comments

The advantage of a graph DB is that multiple data sources can be combined together into one, and queried across. Given that benefit, having the division of a database is at best a low priority feature request.

There are many reasons to have multiple databases; for example, it's typical to have a dev, test, and prod environment with their respective databases. This makes it so the test database can be recreated before each test run. Right now, the only way to accomplish this is to either have multiple instances of Dgraph running, or to add a prefix to all predicates.

If I only want to clear test environment predicates, adding a prefix complicates queries like this: &api.Operation{DropAll: true}, which I run before any tests. It would also complicate Go structs when determining the right predicate values in JSON.

It's also typical to work on many microservices at a time, but these microservices should not have any chance of their data colliding; they should be completely isolated. It doesn't seem realistic to have 10+ instances of Dgraph running at the same time on a laptop (5+ microservices, each with a dev and test environment).

Lastly, having multiple database support will help people transitioning from an RDBMS world to have an easier time making the switch.
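To illustrate why the prefix workaround complicates Go code, here is a rough sketch. Go struct tags are fixed at compile time, so a per-environment prefix cannot live in a `json:"..."` tag; the payload has to be built dynamically instead. The helper names `withPrefix` and `personJSON` and the `env.predicate` naming scheme are assumptions made for this example only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withPrefix returns a predicate name scoped to an environment,
// for example "test.name". The "env.predicate" scheme is an assumption.
func withPrefix(env, predicate string) string {
	return env + "." + predicate
}

// personJSON builds a mutation payload whose keys carry the environment
// prefix. A plain struct with json tags cannot do this, because tags are
// compile-time constants.
func personJSON(env, name string, age int) ([]byte, error) {
	doc := map[string]interface{}{
		withPrefix(env, "name"): name,
		withPrefix(env, "age"):  age,
	}
	return json.Marshal(doc)
}

func main() {
	b, err := personJSON("test", "Alice", 30)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"test.age":30,"test.name":"Alice"}
}
```

Note also that with prefixes, clearing only the test data means dropping each prefixed predicate individually rather than issuing a single `DropAll`.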

A major reason multiple databases matter so much to us is multitenancy. We intend to implement multitenancy with a database schema per account. That makes data isolation a lot easier (which includes removing an account, for example) without provisioning and maintaining thousands of database servers. Implementing multitenancy in dgraph as a schema per account is an even stronger case in my mind, since dgraph lacks any mid-level namespace to segment the data (like tables in SQL/Cassandra or collections in MongoDB). That leaves very few options for segmenting the data effectively.

I agree with @brianbroderick - we implemented a microservices approach and one of the services is currently using dgraph. We avoid using dgraph for any other service since it would involve automation complexity which we find hard justifying. Had it been any easier to work with more than a single schema, dgraph usage would certainly proliferate in our case.

Support for multiple isolated databases on a single server would significantly increase my API tests' execution speed, which is currently over 226 seconds since all tests need to be executed serially! If I had the guarantee of isolated databases, I could set up a database instance for each test individually, allowing API tests to run in parallel. It'd theoretically be possible to go from 226s to under 10s (which is huge!)

I could do it myself with graph namespacing, but that'd be very error prone since there are no isolation guarantees: one test could start mutating another test's database, leading to a big mess.

I hope this feature will be implemented soon!
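The per-test isolation described above can be modeled with a toy in-memory store. This is only an illustration of the idea, not Dgraph code: `isolatedStore`, its methods, and the `test_N` database names are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// isolatedStore is a toy stand-in for one server hosting several databases.
// It is NOT Dgraph's API; it only models the isolation being requested.
type isolatedStore struct {
	mu   sync.Mutex
	data map[string]map[string]string // database name -> predicate -> value
}

func newIsolatedStore() *isolatedStore {
	return &isolatedStore{data: make(map[string]map[string]string)}
}

// Set writes a value into the named database only.
func (s *isolatedStore) Set(db, key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.data[db] == nil {
		s.data[db] = make(map[string]string)
	}
	s.data[db][key] = value
}

// Get reads a value from the named database only.
func (s *isolatedStore) Get(db, key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[db][key]
	return v, ok
}

// DropAll clears a single database and leaves the others untouched.
func (s *isolatedStore) DropAll(db string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.data, db)
}

// runParallel simulates n tests running concurrently, each resetting and
// filling its own database.
func runParallel(s *isolatedStore, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			db := fmt.Sprintf("test_%d", i)
			s.DropAll(db) // affects this test's database only
			s.Set(db, "name", fmt.Sprintf("fixture-%d", i))
		}(i)
	}
	wg.Wait()
}

func main() {
	s := newIsolatedStore()
	runParallel(s, 4)
	v, _ := s.Get("test_2", "name")
	fmt.Println(v) // fixture-2
}
```

Because each simulated test only ever touches its own database name, a reset in one cannot wipe another's fixtures, which is the property that would make parallel runs safe.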

This would also be extremely useful for my use case, which uses dgraph to support multiple workspaces. Also, the ability to reference nodes and create relationships across databases/workspaces would be useful as well.

@aoighost

Also, the ability to reference nodes and create relationships across databases/workspaces would be useful as well.

It would be the opposite of useful. If you have relationships across "databases" you have a single database. The multi-database feature is about isolation such that one database is physically isolated from another yet maintained by the same process for convenience.

@romshark good point

Whoa, this is a popular request!

OK, we'll be working on this and seeing whether it can be part of our next release v1.2 expected to be released end of September.

This might be a good place for the label field in n-quads.

Hi there @AgentZombie,

Could you explain what you mean by "the label field in n-quads"?

Sorry. I was speaking specifically about the graph label field in RDF n-quads. dgraph specifically reads RDF n-quads as a superset of n-triples but doesn't use the fourth value to specify a named graph.

From https://www.w3.org/TR/n-quads/#sec-intro

The simplest statement is a sequence of (subject, predicate, object) terms forming an RDF triple and an optional blank node label or IRI labeling what graph in a dataset the triple belongs to, all are separated by whitespace and terminated by '.' after each statement.

This was referenced here, #1143, and probably other places.
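As a rough illustration of the fourth field, the sketch below pulls the optional graph label out of an N-Quads statement. It is deliberately naive: it assumes every term is an IRI or blank node with no embedded whitespace, so it would mis-split literals such as `"hello world"`. The function name `graphLabel` and the example IRIs are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// graphLabel returns the optional fourth term of an N-Quads statement.
// Simplifying assumption: terms contain no whitespace (IRIs and blank
// nodes only), so splitting on whitespace is enough.
func graphLabel(statement string) (string, bool) {
	s := strings.TrimSpace(statement)
	s = strings.TrimSuffix(s, ".") // drop the statement terminator
	terms := strings.Fields(s)
	if len(terms) == 4 {
		return terms[3], true
	}
	return "", false // a plain triple belongs to the default graph
}

func main() {
	quad := "<http://example.org/alice> <http://example.org/knows> <http://example.org/bob> <http://example.org/tenant-a> ."
	label, ok := graphLabel(quad)
	fmt.Println(label, ok) // <http://example.org/tenant-a> true

	triple := "<http://example.org/alice> <http://example.org/knows> <http://example.org/bob> ."
	label, ok = graphLabel(triple)
	fmt.Println(label == "", ok) // true false
}
```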

I wasn't aware of that, and it does make sense to consider it as part of our support for multi-tenancy.

Thanks, @AgentZombie

I think it will be a great feature

I'd appreciate it if this were not an enterprise feature. I'm trying to build an app that uses dgraph on the backend as a graph store, and multi-tenancy would make it a lot easier to build without having to spin up a new docker instance for each workspace. Making it enterprise-only would kill the use of that feature for me. I should also note that multi-tenancy would make it a lot easier for app developers to use dgraph in general, as it would make it easier to have multiple apps on one PC using dgraph as a backend.

We're attempting to use one big dgraph instance to serve many discrete customers and need data isolation. This would be a great feature for us.

In the meantime we've been experimenting with putting a tenant predicate on every entity. However, I worry that this might have some performance drawbacks, since every query we send into dgraph has to be a tenant = x query, followed by a @filter of what the end user actually wanted.

As I understand dgraph's query planning, this means all my queries can only be as fast as that original tenant = x lookup (which hits millions of documents), right? (Since I always need to start at the tenant predicate and then filter.)
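The query shape described above might look roughly like the following sketch. It assumes a predicate literally named `tenant`, that Go's `%q` quoting is acceptable for simple tenant identifiers, and that the caller passes an already-formed filter expression; `tenantQuery` is a hypothetical helper, and the sketch says nothing about how Dgraph actually executes such a query.

```go
package main

import "fmt"

// tenantQuery wraps a caller-supplied filter expression in a query whose
// root function selects every node tagged with the given tenant.
// Assumptions: the predicate is named "tenant" and tenant IDs are simple
// strings for which Go-style quoting is sufficient.
func tenantQuery(tenant, userFilter string) string {
	return fmt.Sprintf(
		"{\n  q(func: eq(tenant, %q)) @filter(%s) {\n    uid\n  }\n}",
		tenant, userFilter,
	)
}

func main() {
	fmt.Println(tenantQuery("acme", `eq(name, "Alice")`))
}
```

Every query built this way starts from the tenant lookup and only then applies what the end user asked for, which is the cost concern raised in the comment.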

We are evaluating / prototyping further the use of dgraph.
For us, the minimal setup to evaluate Dgraph requires 3 environments, as per the regular dev pipeline: Development / Staging / Production.

Furthermore, the GDPR requires my company in Europe to partition the data.
We need security by design at the organization level.
A database without Multi-Tenancy feature is a No Go for most companies in Europe.

Even an academic project in Europe cannot use the community edition if they use some kind of personal data (As you cannot tell who can access the data precisely / easily).

As a core DB feature, I believe it should be part of the community edition.

Another upvote for this feature :D

Another upvote for this feature. It could also make it easier for Dgraph Labs to launch their own Dgraph as a Service.

This feature was marked for the 1.2 milestone, but I don't see it in the change log of the 1.2 release. Did this feature not make it?

Hi @dvaldivia,

Thanks for your question. Correct, it has not been included in v1.2, but it is on our roadmap and we are currently working on it.

Another upvote for this feature :D

To further the discussion on this issue, I would like to point out that, according to the W3C specification for RDF N-Quads, the field that was added to the original RDF triple is the _graphLabel_.

This RDF triple is added to the graph labeled by the production graphLabel, if no graphLabel is present the triple is added to the RDF datasets default graph.

This graph label seems like a very idiomatic way to ingest data for "different graphs", thus providing a functional separation layer here.

In fact, dgraph uses the NQUAD format during ingestion and even allows setting the graph Label on the NQUAD used for mutations in the public gRPC API.

I can see a backwards-compatible API change where a new parameter is added to the public API to specify graphLabel at query time, falling back to the default graph.

Obviously true multi-tenancy would require more strict ACL coordination with this graphLabel, as well as the Types belonging to individual 'graphs' - but as a first step, the application layer on top of dgraph could provide that validation before a query is sent to dgraph.
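A minimal sketch of the application-layer check suggested above, assuming the application keeps its own map of which caller may query which graph label. The names `graphACL`, `resolveGraph`, `authorize`, and the `"default"` label are hypothetical and not part of any Dgraph API.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultGraph is a hypothetical label for the dataset's default graph.
const defaultGraph = "default"

var errForbidden = errors.New("caller may not access this graph")

// graphACL maps a caller to the set of graph labels it may query.
type graphACL map[string]map[string]bool

// resolveGraph falls back to the default graph when no label is supplied,
// mirroring the backwards-compatible behavior proposed in the comment.
func resolveGraph(label string) string {
	if label == "" {
		return defaultGraph
	}
	return label
}

// authorize validates the caller before any query would be sent on and
// returns the graph label the query should be scoped to.
func (a graphACL) authorize(caller, label string) (string, error) {
	graph := resolveGraph(label)
	if !a[caller][graph] {
		return "", fmt.Errorf("%w: %s -> %s", errForbidden, caller, graph)
	}
	return graph, nil
}

func main() {
	acl := graphACL{"alice": {"tenant-a": true, defaultGraph: true}}

	graph, err := acl.authorize("alice", "")
	fmt.Println(graph, err) // default <nil>

	_, err = acl.authorize("alice", "tenant-b")
	fmt.Println(errors.Is(err, errForbidden)) // true
}
```

This only validates in the application; as the comment notes, real multi-tenancy would need the database itself to enforce the separation.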

Since the Q1 label on the roadmap is obviously not happening for this feature, can we get it updated to reflect the actual expected feature scheduling?

Thanks!

GitHub issues have been deprecated.
This issue has been moved to Discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.
