Dgraph: Support Multi-Tenancy

Created on 25 Oct 2018 · 23 Comments · Source: dgraph-io/dgraph

What you wanted to do

Create multiple databases/schemas on the same server, as in traditional databases (Postgres/MySQL/SQL Server). This is useful for personal VMs and Raspberry Pi, where I can replace a traditional database with Dgraph and run it on only one server.

What you actually did

Add prefix.

Why that wasn't great, with examples

It's cumbersome because now I need to create variables for the prefix. If I accidentally drop the schema, I drop everything.

Any external references to support your case

CREATE DATABASE foodatabase or CREATE SCHEMA fooschema.
Would love to have something like localhost:8080/alter/foodatabase. If foodatabase is not provided it would default to existing behavior.

area/security area/tools exp/expert kind/feature popular priority/P1 status/accepted status/needs-specs

Source

prabirshrestha

👍15 👀4

Most helpful comment

There are many reasons to have multiple databases; for example, it's typical to have a dev, test, and prod environment with their respective databases. This makes it so the test database can be recreated before each test run. Right now, the only way to accomplish this is to either have multiple instances of Dgraph running, or to add a prefix to all predicates.

If I only want to clear test environment predicates, adding a prefix complicates queries like this: &api.Operation{DropAll: true}, which I run before any tests. It would also complicate Go structs when determining the right predicate values in JSON.

It's also typical to work on many microservices at a time, but these microservices should not have any chance of their data colliding with each other; they should be completely isolated. It doesn't seem realistic to have 10+ instances of Dgraph running at the same time on a laptop (5+ microservices, each with a dev and test environment).

Lastly, having multiple database support will help people transitioning from an RDBMS world to have an easier time making the switch.

brianbroderick on 24 Dec 2018

👍22

All 23 comments

The advantage of a graph DB is that multiple data sources can be combined together into one, and queried across. Given that benefit, having the division of a database is at best a low priority feature request.

manishrjain on 6 Nov 2018

A major reason multiple databases are such an important factor for us is multitenancy. We intend to implement multitenancy with a database schema per account. That makes data isolation a lot easier (including, for example, removing an account) without provisioning and maintaining thousands of database servers. Implementing multitenancy in Dgraph as a schema per account is an even stronger case in my mind, since Dgraph lacks any mid-level namespace to segment the data (like tables in SQL/Cassandra or collections in MongoDB). That leaves very few options for segmenting the data effectively.

I agree with @brianbroderick. We implemented a microservices approach, and one of the services currently uses Dgraph. We avoid using Dgraph for any other service since it would involve automation complexity that we find hard to justify. Had it been any easier to work with more than a single schema, Dgraph usage would certainly proliferate in our case.

liqweed on 31 Dec 2018

👍18

Support for multiple isolated databases on a single server would significantly increase my API tests' execution speed: the suite currently takes over 226 seconds because all tests must be executed serially! If I had the guarantee of isolated databases, I could set up a database instance for each test individually, allowing the API tests to run in parallel. It would theoretically be possible to go from 226s to under 10s (which is huge!).

I could do it myself with graph namespacing, but that would be very error-prone since there are no isolation guarantees: one test could start mutating another test's database, leading to a big mess.

I hope this feature will be implemented soon!

romshark on 24 May 2019

👍5

This would also be extremely useful for my use case as well using dgraph to support multiple workspaces. Also, the ability to reference nodes and create relationships across databases/workspaces would be useful as well.

peter-clemenko on 19 Jun 2019

@aoighost

Also, the ability to reference nodes and create relationships across databases/workspaces would be useful as well.

It would be the opposite of useful. If you have relationships across "databases" you have a single database. The multi-database feature is about isolation such that one database is physically isolated from another yet maintained by the same process for convenience.

romshark on 20 Jun 2019

👍1

@romshark good point

peter-clemenko on 26 Jun 2019

Whoa, this is a popular request!

OK, we'll be working on this and seeing whether it can be part of our next release v1.2 expected to be released end of September.

campoy on 13 Jul 2019

👍14

This might be a good place for the label field in n-quads.

jasonmf on 2 Aug 2019

Hi there @AgentZombie,

Could you explain what you mean by "the label field in n-quads"?

campoy on 6 Aug 2019

Sorry. I was speaking specifically about the graph label field in RDF n-quads. dgraph specifically reads RDF n-quads as a superset of n-triples but doesn't use the fourth value to specify a named graph.

From https://www.w3.org/TR/n-quads/#sec-intro

The simplest statement is a sequence of (subject, predicate, object) terms forming an RDF triple and an optional blank node label or IRI labeling what graph in a dataset the triple belongs to, all are separated by whitespace and terminated by '.' after each statement.

This was referenced here, #1143, and probably other places.
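To make the quoted definition concrete, here is a minimal sketch of serializing a statement with an optional graph label. The `Quad` type is hypothetical, terms are assumed to be already-serialized N-Quads tokens, and this does not reflect how Dgraph parses or stores n-quads.

```go
package main

import "strings"

// Quad is a hypothetical, simplified N-Quads statement. Each term is
// assumed to be an already-serialized token such as "<alice>" or "_:b0".
type Quad struct {
	Subject, Predicate, Object string
	GraphLabel                 string // optional; empty means no graph label
}

// String writes the terms separated by whitespace and terminated by '.',
// emitting the graph label as the fourth term only when it is set.
func (q Quad) String() string {
	terms := []string{q.Subject, q.Predicate, q.Object}
	if q.GraphLabel != "" {
		terms = append(terms, q.GraphLabel)
	}
	return strings.Join(terms, " ") + " ."
}
```

A statement without a label serializes as a plain triple, while the same statement with a label such as `<tenantA>` carries it as a fourth term before the final dot.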

jasonmf on 6 Aug 2019

👍2

I wasn't aware of that, and it does make sense to consider it as part of our support for multi-tenancy.

Thanks, @AgentZombie

campoy on 17 Sep 2019

I think it will be a great feature

Willem520 on 24 Sep 2019

I'd appreciate it if this were not an enterprise feature. I'm trying to build an app that uses Dgraph on the backend as a graph store, and multi-tenancy would make it a lot easier to build without having to spin up a new Docker instance for each workspace. Making it enterprise-only would kill the use of that feature for me. I should also note that multi-tenancy would make it a lot easier for app developers to use Dgraph in general, as it would make it easier to have multiple apps on one PC using Dgraph as a backend.

peter-clemenko on 28 Oct 2019

👍1

We're attempting to use one big dgraph instance to serve many discrete customers and need data isolation. This would be a great feature for us.

In the meantime we've been experimenting with putting a tenant predicate on every entity. However, I worry that this might have some performance drawbacks, since every query we send to Dgraph has to be a tenant = x query, followed by a @filter for what the end user actually wanted.

As I understand Dgraph's query planning, I think this means all my queries can only be as fast as that original tenant = x lookup (which hits millions of documents), right? (Since I always need to start at the tenant predicate and then filter.)
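The pattern described here can be sketched as a small query builder. This is an illustration only: the predicate name `tenant`, the function `TenantQuery`, and the use of Go string quoting for the tenant value are assumptions, and a real client would more likely pass the value as a query variable.

```go
package main

import "fmt"

// TenantQuery is a hypothetical helper that wraps a caller-supplied filter
// expression in a query whose root function selects a single tenant.
// The predicate name "tenant" is an assumption and would need an index
// that supports eq.
func TenantQuery(tenant, userFilter string) string {
	// %q quotes the tenant value; the user filter is inserted verbatim,
	// so it must already be a trusted, well-formed filter expression.
	return fmt.Sprintf("{\n  q(func: eq(tenant, %q)) @filter(%s) {\n    uid\n  }\n}", tenant, userFilter)
}
```

Calling `TenantQuery("acme", "eq(name, \"Alice\")")` yields a query that starts from every node of tenant `acme` and only then applies the user's filter, which is the ordering the performance question above is about.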

seanlaff on 30 Oct 2019

We are evaluating and prototyping further use of Dgraph.
For us, the minimal setup to evaluate Dgraph requires 3 environments, as per the regular dev pipeline: Development / Staging / Production.

Furthermore, the GDPR requires my company in Europe to partition the data.
We need security by design at the organization level.
A database without Multi-Tenancy feature is a No Go for most companies in Europe.

Even an academic project in Europe cannot use the community edition if it handles any kind of personal data (as you cannot tell precisely or easily who can access the data).

As a core DB feature, I believe it should be part of the community edition.

hubyhuby on 9 Nov 2019

👍17 😕1

Another upvote for this feature :D

cosmotek on 4 Jan 2020

Another upvote for this feature. It could also help Dgraph Labs launch their own Dgraph as a Service more easily.

ChStark on 9 Jan 2020

This feature was marked for the 1.2 milestone, but I don't see it in the changelog of the 1.2 release. Did this feature not make it?

dvaldivia on 29 Jan 2020

👀3

Hi @dvaldivia,

Thanks for your question. Correct, it has not been included in v1.2, but it is on our roadmap and we are currently working on it.

sleto-it on 5 Feb 2020

Another upvote for this feature :D

sinnergarden on 20 May 2020

To further the discussion on this issue, I would like to point out that, per the RDF N-Quads specification, the field that was added to the original RDF triple is the _graphLabel_.

This RDF triple is added to the graph labeled by the production graphLabel, if no graphLabel is present the triple is added to the RDF datasets default graph.

This graph label seems like a very idiomatic way to ingest data for "different graphs", thus providing a functional separation layer here.

In fact, dgraph uses the NQUAD format during ingestion and even allows setting the graph Label on the NQUAD used for mutations in the public gRPC API.

I can see a backwards-compatible API change where a new parameter is added to the public API to specify graphLabel at query time, falling back to the default graph.

Obviously true multi-tenancy would require more strict ACL coordination with this graphLabel, as well as the Types belonging to individual 'graphs' - but as a first step, the application layer on top of dgraph could provide that validation before a query is sent to dgraph.
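The default-graph fallback proposed above could look roughly like the following in-memory sketch. `LabeledTriple`, `Dataset`, and `Query` are hypothetical names used for illustration. This is not Dgraph's API, and it performs none of the ACL or type validation mentioned above.

```go
package main

// LabeledTriple is a hypothetical in-memory form of one N-Quad.
type LabeledTriple struct {
	Subject, Predicate, Object string
	GraphLabel                 string // empty means the default graph
}

// Dataset groups triples by graph label; the key "" is the default graph.
type Dataset map[string][]LabeledTriple

// Add stores t in the graph named by its label, or in the default graph
// when no label is present.
func (d Dataset) Add(t LabeledTriple) {
	d[t.GraphLabel] = append(d[t.GraphLabel], t)
}

// Query returns the subjects in one graph that have the given predicate.
// An empty graphLabel falls back to the default graph, so callers that
// never pass a label keep seeing only unlabeled data.
func (d Dataset) Query(graphLabel, predicate string) []string {
	var subjects []string
	for _, t := range d[graphLabel] {
		if t.Predicate == predicate {
			subjects = append(subjects, t.Subject)
		}
	}
	return subjects
}
```

In this sketch a query for `tenantA` never touches triples labeled `tenantB`, and a query with no label sees only the default graph, which is the functional separation the comment describes.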

Since the Q1 label on the roadmap is obviously not happening for this feature, can we get it updated to reflect the actual expected feature scheduling?

Thanks!

iluminae on 3 Jun 2020

👀3 👍1

GitHub issues have been deprecated.
This issue has been moved to discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.
