Efcore: TransactionScope with Multiple DbContext's

Created on 29 Jun 2018  路  4Comments  路  Source: dotnet/efcore

Title

Multiple DbContext's residing in the same TransactionScope are unaware of each other.

Functional impact

I am not sure if this behavior is expected. So please bear with me. I want to add/update several tables at once asynchronously. Since async calls on a DbContext is not permitted, I tried to do this with multiple DbContext objects all within same TransactionScope. However, changes in one DbContext can not be seen in others.

Minimal repro steps

Here is my context, along with some sample models.

```c#
public class Context : DbContext
{
public Context()
{

}

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder.UseNpgsql("UserID=postgres;Password=root;Host=localhost;Port=5432;Database=transactiondemo;Pooling=true;");
}

public DbSet<Master> Masters { get; set; }
public DbSet<Detail> Details { get; set; }
public DbSet<OtherDetail> OtherDetails { get; set; }

}

public class Master
{
public long Id { get; set; }
public string Name { get; set; }
public DateTime DateTime { get; set; }
}

public class Detail
{
public long Id { get; set; }
public string Name { get; set; }
public DateTime DateTime { get; set; }

public long MasterId { get; set; }
public Master Master { get; set; }

}

public class OtherDetail
{
public long Id { get; set; }
public string Name { get; set; }
public DateTime DateTime { get; set; }

public long MasterId { get; set; }
public Master Master { get; set; }

}


This is my business code. Here I try to first insert a master record to get its Id (PK), then use that in adding multiple different table records asynchronously at the same time. 

```c#
public async Task<int> MultipleAsyncContextsInline(Master masterOld, List<Detail> detailsOld, List<OtherDetail> otherDetailsOld)
{
    var master = new Master { DateTime = masterOld.DateTime, Name = masterOld.Name };
    var details = detailsOld.Select(x => new Detail() { DateTime = x.DateTime, Name = x.Name }).ToList();
    var otherDetails = otherDetailsOld.Select(x => new OtherDetail() { DateTime = x.DateTime, Name = x.Name }).ToList();

    long id;
    Task<int> task1;
    Task<int> task2;
    using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
    {
        using (var context1 = new Context())
        {
            using (var context2 = new Context())
            {
                using (var context3 = new Context())
                {
                    context1.Masters.Add(master);
                    await context1.SaveChangesAsync();

                    details.ForEach(x => x.MasterId = master.Id);
                    context2.Details.AddRange(details);
                    task1 = context2.SaveChangesAsync();

                    otherDetails.ForEach(x=> x.MasterId = master.Id);
                    context3.OtherDetails.AddRange(otherDetails);
                    task2 = context3.SaveChangesAsync();

                    await Task.WhenAll(task1, task2);
                }
            }
        }
        scope.Complete();
    }
    return task1.Result + task2.Result;
}

Expected result

Although this approach seems absurd to me, I have some collegues who insist that this "small async adds/updates" must perform better than modifying a single context and calling SaveChanges once for all detail(and otherDetail) operations. So I expect this operation to save all records without an exception.

Actual result

I am getting an exception for detail records.

c# An error occurred while updating the entries. See the inner exception for details. 23503: insert or update on table "Details" violates foreign key constraint "FK_Details_Masters_MasterId" Key (MasterId)=(1426) is not present in table "Masters".

Also, I would be grateful if I can get some feedback on the general approach here. Whether small asynchronous concurrent database write operations are good performance-wise or not. Thanks in advance.

Further technical details

EF Core version: 2.1
Database Provider: Npgsql.EntityFrameworkCore.PostgreSQL
Operating system: Windows 10 Enterprise
IDE: Visual Studio 2017 15.7.3

closed-question customer-reported

Most helpful comment

The exception you're seeing is quite explicit: you're trying to insert related entities into the database via different connections, so you're violating foreign key constraints. Each context has a different database connection, and each connection functions independently (even if you wrap them in a single transaction), so this simply cannot work. In theory, you can use the different contexts to insert unrelated entities, but keep on reading.

Regarding the idea that this is more efficient, here's my take...

First, using several connections in parallel to send updates within the same transaction is particularly bad idea. When you put multiple connections inside the same transaction (by enlisting them in a TransactionScope), you're creating a distributed transaction - all connections must successfully commit or all must fail. There is a considerable complexity cost to this mechanism, and especially a performance cost that is surely going to negate any perf improvements gained by the parallelism. Avoid distributed transactions for all cases except where you really do need to update multiple databases in a single transaction.

Even without performing the updates in a transaction the approach is a bad idea in most cases. When you call SaveChanges() (or SaveChangesAsync()), EF Core batches all your updates in one network roundtrip. Unless your individual updates are really big (e.g. you're pushing large blobs to the database), you're unlikely to see any sort of performance gains from parallelizing your updates this way - but as always, do a simple benchmark to see the effects yourself. In addition, your update will now tie down multiple connections instead of just one, making those connections unavailable to other needs in your application.

Terminology note: you say that you "want to add/update several tables at once asynchronously", but what you're actually asking is about updating several tables concurrently. Asynchronously only refers to your thread not blocking while the operation is in progress, and can (and should) be achieved simply be awaiting on SaveChangesAsync().

All 4 comments

The exception you're seeing is quite explicit: you're trying to insert related entities into the database via different connections, so you're violating foreign key constraints. Each context has a different database connection, and each connection functions independently (even if you wrap them in a single transaction), so this simply cannot work. In theory, you can use the different contexts to insert unrelated entities, but keep on reading.

Regarding the idea that this is more efficient, here's my take...

First, using several connections in parallel to send updates within the same transaction is particularly bad idea. When you put multiple connections inside the same transaction (by enlisting them in a TransactionScope), you're creating a distributed transaction - all connections must successfully commit or all must fail. There is a considerable complexity cost to this mechanism, and especially a performance cost that is surely going to negate any perf improvements gained by the parallelism. Avoid distributed transactions for all cases except where you really do need to update multiple databases in a single transaction.

Even without performing the updates in a transaction the approach is a bad idea in most cases. When you call SaveChanges() (or SaveChangesAsync()), EF Core batches all your updates in one network roundtrip. Unless your individual updates are really big (e.g. you're pushing large blobs to the database), you're unlikely to see any sort of performance gains from parallelizing your updates this way - but as always, do a simple benchmark to see the effects yourself. In addition, your update will now tie down multiple connections instead of just one, making those connections unavailable to other needs in your application.

Terminology note: you say that you "want to add/update several tables at once asynchronously", but what you're actually asking is about updating several tables concurrently. Asynchronously only refers to your thread not blocking while the operation is in progress, and can (and should) be achieved simply be awaiting on SaveChangesAsync().

One last note... If you're looking to insert large amount of data into your database and are really looking to optimize performance, take a look at bulk insert APIs (COPY in the PostgreSQL case). These have the disadvantage of operating completely outside EF Core, but they are extremely fast.

Thank you very much for your quick reply and detailed explanations.

Thanks @roji!

Was this page helpful?
0 / 5 - 0 ratings