Efcore: Dynamically Connect to Multiple Databases

Created on 15 Jul 2016  路  21Comments  路  Source: dotnet/efcore

From @GSPP' comment in #5626:

In the age of cloud it becomes more common to shard across databases based on tenant, geography and scaling needs. EF should support querying across databases better.

It's not enough to change the connection string because one query might need to pull from multiple databases. Example: Join customer-specific data with master data.

I think it's really required that EF allows us to specify the database name _per table reference_. A strawman:

from o in db.Orders.WithDatabase("NewYork-Tenant1234")
join c in db.Customers.WithDatabase("NewYork-master") on ...
select new { o, c }

This is supposed to be a scenario where the database name depends on a geographical sharding as well as on the tenant name. Both are dynamic and can change for every query. Connection strings cannot do this. Statically configured database names cannot do that either.

This query would translate to:

 select *
 from [NewYork-Tenant1234].dbo.Orders o
 join [NewYork-master].dbo.Customers c on ...

The same configuration mechanism would lend itself well to configuring hints such as NOLOCK or the common UPDLOCK, ROWLOCK, HOLDLOCK combination. This is unrelated to this issue but I'm mentioning this to justify with the WithDatabase syntax makes sense.

A simple extension to this would be to allow completely dynamic table names. WithTableName(database, schema, name) would map to database.schema.name on the SQL side. This is useful for example when sharding by date with one table per day. SQL Server Standard Edition does not have partitioning.

What is the team's stance on supporting such scenarios?

type-enhancement

Most helpful comment

Just like to vote up this feature; we are looking at providing a multi-tenant restful API which switches database context / schema based on URI / JWT user claims; it would be great in to have some way of resolving / dynamically creating a context / connection based on some predicate filter, storing / pooling the resulting instance so it can be reused.

All 21 comments

More notes from @GSPP

A few more points:

  1. Since my issue has been merged into this (I agree with that) I want to stress this point: The table location (db, schema, name) must be changeable _per table reference_. Not per query or per application run. Not even per query is enough because I might want to union over all existing tenants (e.g. db.Orders.WithDatabase("Tenant1").Concat(db.Orders.WithDatabase("Tenant2")).Concat(db.Orders.WithDatabase("Tenant3"))...; this expression would, of course, be built through a loop and not hard-coded like that).
  2. Navigation properties should be supported across databases. I am using this pattern right now with a large L2S application. The use case is that one database is small and on different storage. The other one is for "bulk data" (all kinds of logs) and is on different storage. It is super convenient to hide this split from the application. Of course there can't be any FKs spanning the databases.
  3. Navigation properties should be configurable as well. If not we'd be forced to use explicit joins a lot. Strawman syntax: from o in db.Orders.WithDatabase("Tenant") select WithDatabase(o.Customer, "Master"). Semantically, a WithDatabase wrapping call would be interpreted as returning the first argument unmodified but with location meta-data attached. This function call would only be valid inside of an expression tree. The same wrapping could be done for sequence navigation properties. If extension syntax is wanted this requires an extension method on System.Object alas. This smells, I admit. So it could be a static method QueryHelpers.WithDatabase.
  4. There's still a need to configure location statically. This is useful in the scenario described under (2). Sometimes, there's a fixed number of databases and each table resides in exactly one database. This should be conveniently supported.
  5. I think this is a relational concept that not every provider can support. It can be a runtime error to use this with a provider not liking this.

Since there are so many different use cases there should be a hierarchy of location configuation:

  1. An explicit WithDatabase inside of the query takes precedence over everything.
  2. There should be a per-query default that places every table in a specific database and schema (e.g. myQuery.WithDefaultTableLocation(db, schema).ToList()).
  3. There should be a per-context default.
  4. Per-table static configuration.
  5. Connection string.

For each of these levels I can present you a real-world use case that I have had... If that hierarchy is properly document I don't think it's hard to understand or hard to work with.

Is there not an issue tracking table hints such as NOLOCK? There should be. It's important. Surely, the team is tracking it internally.

I tackled this a different way based on my requirements around building modular applications (not specifically multi-tenant).

My approach was to define what I call logical groupings, e.g. "Core", "Commerce", "Media", etc.

Several modules can belong to the same logical group, i.e. my Security module (Users, Roles, etc.) belongs in Core, as does my Settings module.

I defined an attribute which can be used at the context level:

[LogicalGrouping("Core")]
public class SecurityDbContext : DbContext
{

}

This LogicalGrouping attribute is used when determining which connection string to use. The LogicalGrouping is also used through my entity type configurations, so I can explicitly say that an entity belongs to a specific database. This enables me to support cross-database navigation properties. I.e., I could define a User object in my Core logical group, but then include it in a different context:

[LogicalGrouping("Commerce")]
public class OrdersDbContext : DbContext
{
  public DbSet<User> Users { get; set; }
}

This is handled through overriding the SqlServerQuerySqlGenerator::VisitTable method:

/// <inheritdoc />
public override Expression VisitTable(TableExpression tableExpression)
{
    // MA - We need to resolve the database name.
    string database = _databaseLookupCache.GetOrAdd($"{tableExpression.Schema ?? "dbo"}.{tableExpression.Table}", k => GetDatabaseName(tableExpression.Schema ?? "dbo", tableExpression.Table));

    if (!string.IsNullOrWhiteSpace(database))
    {
        Sql.Append(SqlGenerator.DelimitIdentifier(database))
            .Append(".");
    }

    if (tableExpression.Schema != null)
    {
        Sql.Append(SqlGenerator.DelimitIdentifier(tableExpression.Schema))
            .Append(".");
    }

    Sql.Append(SqlGenerator.DelimitIdentifier(tableExpression.Table))
        .Append(" AS ")
        .Append(SqlGenerator.DelimitIdentifier(tableExpression.Alias));

    return tableExpression;
}

/// <summary>
/// Resolves the database name through the entity type builder for the given schema/table using connection string data.
/// </summary>
/// <param name="schema">The schema name.</param>
/// <param name="table">The table name.</param>
/// <returns>The database name.</returns>
private string GetDatabaseName(string schema, string table)
{
    var modelBuilder = _modelBuilderCache.GetEntityTypeBuilder(schema, table);
    if (modelBuilder == null)
    {
        return null;
    }

    // MA - The logical group can be used to identify connection string info for the target database.
    string logicalGroup = modelBuilder.LogicalGroup;

    // MA - Discover the connection string and return the database.
    return _connectionStringProvider.GetForLogicalGroup(logicalGroup)?.Database;
}

I know this is quite a specialised implementation, but I'd be interested to know what plumbing is changed to support the above request, should it affect my current implementation.

@rowanmiller my team are hard to look for this feature, yes, we need this feature too.
Our scenarios is:

  1. one table per month;
  2. one database per year;

Just like to vote up this feature; we are looking at providing a multi-tenant restful API which switches database context / schema based on URI / JWT user claims; it would be great in to have some way of resolving / dynamically creating a context / connection based on some predicate filter, storing / pooling the resulting instance so it can be reused.

@rowanmiller Any news on this topic?

Edit: I just found this link, and it seems this is the solution for the problem: http://benfoster.io/blog/aspnet-core-multi-tenancy-data-isolation-with-entity-framework

I also like to vote for this feature (or at least better documentation on the subject).

I want to have one database per customer (to make perfectly sure no data can be crossed between customers)

I can load a specific connection string based on some information coming from a JWT token (basically what Ben Foster proposed), I'm going to make a POC about it but I'm confident it'll work. I can also use what's explained in #9616. That means having one DbContext that can be different every time.

Now with EF Core 2.0, we have DbContext pooling ... which is a very good news for performance and all but I have absolutely no idea of how it'll work when every Context created can be totally different from each other.

On the other hand I can create 30~40 different DBContext (DBContextCustomerA, DBContextCustomerB, ...) and having a Factory choosing between everyone of them ...... But it's against every principle I have :(

I guess there should be a proper documentation of what is feasible with EF Core and what's the recommanded way of handling many usecase.

Did the POC for what @seblucas suggested get created? Specifically the "load a specific connection string based on some information coming from a JWT token" part? Would be interested in how to do this.

What's up with this ? This feature is important to us , we would use this to group some client together instead of using a 1 bucket database.. Our client need to be grouping
while maintaining their data integrity

@pilouk This issue is in the Backlog milestone. This means that it is not going to happen for the 3.0 release. We will re-assess the backlog following the 3.0 release and consider this item at that time. However, keep in mind that there are many other high priority features with which it will be competing for resources.

For us a way to specifyconstruct a different sql connection string would be enough.
Because of scaling issues we need to split our customers over separate databases AND physical database machines

Preferably WITH db context pooling support of course

Hi,
I am trying to implement Multi-Tenant -per Tenant separate Schema (Database single Instance).
https://gunnarpeipman.com/aspnet-core-database-per-tenant/ Taking help from this tutorial. Now I have one Situation That the Owner of the Application wants to have a dashboard which will show data from all the Tenant - Is this possible to connect with different DBContext and get data, the number of Tenant is not fixed it can increase at run time.
Apart from that -
Is this possible to create database at runtime - when the send metadata is added in DB - That time if we can have a button to initiate the the process.
Need some suggestion- which way should I go.

thanks

@pilouk My orga has this as a selling point: Database per customer

@pilouk My orga has this as a selling point: Database per customer

I understand, there are many benefits to database per customer.
However, cross query on multiple context /database is not really performant.
Maybe it's possible to use the elastic search on top ?

In my case its easy because the customers are not connected and every customer has its own instance of the .net app. I don't know about @santosh2812

In my case its easy because the customers are not connected and every customer has its own instance of the .net app. I don't know about @santosh2812

So you don't need to cross query on all context right?

No we decided against this idea because of the problems named in the issue. But it would be great if we do not have to run the exact same app 80 times.

@ajcvickers Possible to provide an update or workaround for supporting a query that accesses 2 databases?

@mrlife I'm not aware of any workarounds that would allow a single EF context to access multiple databases in the same query.

It's possible to have a single LINQ query that performs a client-side set operation (Concat/Union) or some sort of join over data coming from two different databases (and therefore from two different DbContext instances). Not sure how helpful that is though.

@mrlife I'm not aware of any workarounds that would allow a single EF context to access multiple databases in the same query.

@ajcvickers Thanks for letting me know. For implementing in EF Core source code, it must be more complicated than adding the name of the database onto beginning of a table reference, e.g. [db-name].[schema].[table-name]?

@roji Thanks, that is worth mentioning. The performance would be a consideration, but it could work for some situations.

Was this page helpful?
0 / 5 - 0 ratings