Efcore: Support for multiple databases in the same model

Created on 24 Mar 2016  路  20Comments  路  Source: dotnet/efcore

Issue based on #4019 and PR #4672
It nice to have a feature to access multiple database from same model context.
This feature pretty massive. We need to modify following areas:
1) Add database property to relational type annotations and follow this property from explicit mapping to default query sql generator (to create proper table alias including database name)
2) Enhance data model with new field for catalogue name
3) Implement sql generator and model validator for code first migrations from model with multiple databases (it means we should modify all generators and migration operators)
4) Implement additional annotations to support attribute database generation, not only explicit

There are still some open design questions that needs to be resolved:
1) There is no database/catalog propery in TableAttribute

So we could not implement data annotation mapping to different database. There is a possible solution by introducing additional attribute for this need (like DatabaseAttribute) with single proprte Name. In this case usage could be fairly simple

[Database("Pink")]
[Table("Unicorns", Schema="Fluffy")]
class DifferentDatabaseItem{ }

Also there is should be modelBuilder interface to setup database catalogue manually. Something like this:

 modelBuilder.Entity<MappingQueryTestBase.DifferentDatabaseItem>(e =>
            {
                e.Metadata.Relational().TableName = "Unicorns";
                e.Metadata.Relational().Schema = "Fluffy";
                e.Metadata.Relational().Database = "Pink";
            });

For such models we should generate additional catalogue prefix for FROM and JOIN statements.

2) There is issue with navigation property building. We could easily generate proper join statement cross catalogues, but there is no proper way to add foreign key for such properties.
There is couple possible solutions:

  • Forbid adding navigation properties between databases. This aproach is not really healthy. If we implement this - there is no particullar reason to keep multiple catalogues in single model context.
  • Allow adding navigation properties, but enhance migration to prevent generating foreighn key operations for this properties. In this case we could work with our entities without any penalties, but from data storage perspective we will not have any "real" foreighn keys between them.

3) There is open question regards to should this be a relational concept, or specific to SQL Server. As far as i know this behaivior supported in MySql and Oracle (if placed on same server). Not sure for postgre SQL and sql lite.

closed-duplicate type-investigation

Most helpful comment

@rowanmiller this ticket goes far beyond the other ticket. The other ticket is about statically specifying a database name. This ticket includes dynamic options. If you want to deduplicate I suggest closing the other way around.

All 20 comments

@anpete will help with guidance on this one

From @GSPP' comment in #5626:

In the age of cloud it becomes more common to shard across databases based on tenant, geography and scaling needs. EF should support querying across databases better.

It's not enough to change the connection string because one query might need to pull from multiple databases. Example: Join customer-specific data with master data.

I think it's really required that EF allows us to specify the database name _per table reference_. A strawman:

from o in db.Orders.WithDatabase("NewYork-Tenant1234")
join c in db.Customers.WithDatabase("NewYork-master") on ...
select new { o, c }

This is supposed to be a scenario where the database name depends on a geographical sharding as well as on the tenant name. Both are dynamic and can change for every query. Connection strings cannot do this. Statically configured database names cannot do that either.

This query would translate to:

 select *
 from [NewYork-Tenant1234].dbo.Orders o
 join [NewYork-master].dbo.Customers c on ...

The same configuration mechanism would lend itself well to configuring hints such as NOLOCK or the common UPDLOCK, ROWLOCK, HOLDLOCK combination. This is unrelated to this issue but I'm mentioning this to justify with the WithDatabase syntax makes sense.

A simple extension to this would be to allow completely dynamic table names. WithTableName(database, schema, name) would map to database.schema.name on the SQL side. This is useful for example when sharding by date with one table per day. SQL Server Standard Edition does not have partitioning.

What is the team's stance on supporting such scenarios?

This is very usefull. Small question about linq API. Will it looks like this?

var result = db.Order
               .WithDatabase("NewYork-Tenant1234")
               .Join(t => t.Customers.WithDatabase("NewYork-Tenant1234"))
               .Select(o => new { o.OrderLiner, o.Customer }

If so this information probably should be added as extension to DbSet. All these logic related only for relation concept. So we need to reorgonize current DbSet implementation to DbSet and RelationalDbSet (which extend base DbSet with relational concepts).

For the configuration hint (nolock etc) it probably should be separated in different issue.
And since this really hard coupling with MS SQL should be implemented as part of Microsoft.EntityFrameworkCore.SqlServer, so it require new implementation of RelationalDbSet like SqlServerDbSet which could contain any tweek you like (and maybe some api for extension).

Any thoughts on this?

A few more points:

  1. Since my issue has been merged into this (I agree with that) I want to stress this point: The table location (db, schema, name) must be changeable _per table reference_. Not per query or per application run. Not even per query is enough because I might want to union over all existing tenants (e.g. db.Orders.WithDatabase("Tenant1").Concat(db.Orders.WithDatabase("Tenant2")).Concat(db.Orders.WithDatabase("Tenant3"))...; this expression would, of course, be built through a loop and not hard-coded like that).
  2. Navigation properties should be supported across databases. I am using this pattern right now with a large L2S application. The use case is that one database is small and on different storage. The other one is for "bulk data" (all kinds of logs) and is on different storage. It is super convenient to hide this split from the application. Of course there can't be any FKs spanning the databases.
  3. Navigation properties should be configurable as well. If not we'd be forced to use explicit joins a lot. Strawman syntax: from o in db.Orders.WithDatabase("Tenant") select WithDatabase(o.Customer, "Master"). Semantically, a WithDatabase wrapping call would be interpreted as returning the first argument unmodified but with location meta-data attached. This function call would only be valid inside of an expression tree. The same wrapping could be done for sequence navigation properties. If extension syntax is wanted this requires an extension method on System.Object alas. This smells, I admit. So it could be a static method QueryHelpers.WithDatabase.
  4. There's still a need to configure location statically. This is useful in the scenario described under (2). Sometimes, there's a fixed number of databases and each table resides in exactly one database. This should be conveniently supported.
  5. I think this is a relational concept that not every provider can support. It can be a runtime error to use this with a provider not liking this.

Since there are so many different use cases there should be a hierarchy of location configuation:

  1. An explicit WithDatabase inside of the query takes precedence over everything.
  2. There should be a per-query default that places every table in a specific database and schema (e.g. myQuery.WithDefaultTableLocation(db, schema).ToList()).
  3. There should be a per-context default.
  4. Per-table static configuration.
  5. Connection string.

For each of these levels I can present you a real-world use case that I have had... If that hierarchy is properly document I don't think it's hard to understand or hard to work with.

Is there not an issue tracking table hints such as NOLOCK? There should be. It's important. Surely, the team is tracking it internally.

@GSPP the use case is pretty clear here.
But question about API still open. Static QueryHelper can solve issue for sure.
Migrations also pretty intresting question. We should determine where table locate and could we define a REAL foreighn key or just virtual one.

I'm not sure if it falls into this category (if not, I'll open a new issue - let me know), but the ability for a single database to have a different table per tenant for the same entity would be good, too. Using the example above, it'd be like

select *
from [Orders-Tenant1234] o
join [Products] p on ...

Master/shared table for some things, tenant-specific table for others.

You mean something like have different order table per tenant?

@laskoviymishka Exactly. Rather than partition by _database_ as noted in the earlier example, partition by _table_ in a single database. (I'd like EF to be able to support that. I have a situation like this right now.)

As far as I know there is limitation for tables count in azure. It may be not so scalable as it looks :)

Maybe I'm not hosting in Azure. Or using SQL Server. Point being, the ability to have dynamic table names for an entity seems just as interesting to me as having different databases. If this isn't the right place to note that, like I said, I'm happy to start a different issue. It just seemed like _similar_ concerns - the ability to dynamically change where an entity comes from.

It may have sense

@tillig this is absolutely part of this issue (at least the part that's by me and was merged into this one).

@rowanmiller this ticket goes far beyond the other ticket. The other ticket is about statically specifying a database name. This ticket includes dynamic options. If you want to deduplicate I suggest closing the other way around.

@GSPP fair point. The initial description does line up with #4019 but I agree the discussion in the comments talks about a lot more. Remarking for triage so that we can discuss this scenario.

Good call! I think this is important.

Given that this issue ended up being about a couple of features, we decided to track them separately:

  • Support specifying catalog for table mapping #4019
  • Dynamically Connect to Multiple Databases #6089

Neither of these are things we are planning to tackle right away, so they are both on the backlog for the moment.

@rowanmiller

Given that this issue ended up being about a couple of features, we decided to track them separately:

Support specifying catalog for table mapping #4019
Dynamically Connect to Multiple Databases #6089

Any update. How can one use one entity model to connect multiple database provider like mssql, mysql,oracle ,mongodb etc . Found this but not helpful https://www.codeproject.com/Articles/82017/Preparing-an-Entity-Framework-model-for-multi-prov https://www.codeproject.com/Tips/1062375/Entity-Framework-Multi-DB-Support
Please can you help on this.

@narendrasinghrathore This issue is about accessing multiple databases of the same kind--for example, multiple SQL Server databases. It's not about different types of databases. You seem to be wanting to access different types of database using the same model. There are several dimensions to this question, so it's hard to give an answer here. I would suggest either opening a new issue explaining clearly what it is you need to do, or perhaps doing the same thing but in a Stack Overflow question--you'll get more knowledgeable eyes on it on Stack Overflow.

Finally, keep in mind this GitHub repo is for _EF Core_. The articles you referenced are for earlier versions of EF, for which the best place to log issues is https://github.com/aspnet/EntityFramework6

@ajcvickers Thanks, https://github.com/aspnet/EntityFramework6/issues/269

Was this page helpful?
0 / 5 - 0 ratings