Efcore: Speed up insert operation (BulkSaveChanges)

Created on 18 Mar 2019  路  11Comments  路  Source: dotnet/efcore

Currently inserting thousands (1000, 10000, 100000) of records is really slow. I think there is room for improvement, so I am creating feature request for bulk SaveChanges.

Inspiration came form (EntityFramework Extensions).

In my case, the factor between SaveChanges and BulkSaveChanges is even greater then:
chrome_2019-03-18_12-29-32

I think this is somehow related with #9270 and #795, but not duplicated. (I might be mistaken.)

Use case:

  • import entire table from another source (excel, web ...)
  • custom reports - query some data from multiple tables, process results and write it to (temp) table
area-perf customer-reported type-enhancement

Most helpful comment

Hi,
maybe just one small thing. There are scenarios where you don't need to track inserted entities. Not even get back Id (identity). Maybe some options to SaveChanges to skip tracking this entities, and not receiving back Id would be legitimate use for imports ...

I will update thread when I get back to the real life scenario, where I experience larger factor.

All 11 comments

You can do that with linq2db extension right now if you can't wait EF Core implementation.

With this extension you will be able to insert any query in any table, even in temporary and then reuse it in linq query. And for reporting you will quickly find that you need more SQL extensions which are provided by linq2db out of the box.

@MaklaCof When you say, "In my case, the factor between SaveChanges and BulkSaveChanges is even greater then," can you provide some details? What have you tested and what were the results?

@MaklaCof Also, can you post the version of EF Core that you are using, and the database provider that you are using?

Of course. I will need a day or two. I am on the tight schedule until Wednesday.

Out of my head, version was 2.1 (currently I am on 2.2), and provider was SQL Server express 2016 (currently I use SQL Server Developer 2017).

Hi,

I made a demo in which ratio is not so much, but the case is simple and repeatable.
I think that these numbers already indicate that there is room for improvement in performance.

The difference I mentioned occurs when the table has 20-50 columns, and it has several relational data and not just one as in a demo.

RC: Records count
SC: Save changes
BSC: Bulk save changes
F: Factor betwen SaveChanges / BulkSaveChanges
DSC: Save changes with one detail table
DBSC: Bulk save changes with one detail table
DF: Factor betwen SaveChanges / BulkSaveChanges with one detail table

RC           SC     BSC      F     DSC    DBSC    DF
      10    0.1s    0.1s    0.5    0.1s   0.1s   0.4
     100    0.0s    0.1s    0.3    1.0s   0.4s   2.9
    1000    0.5s    0.2s    3.2   10.8s   2.6s   4.2
   10000    5.2s    1.3s    4.1  106.8s  22.7s   4.7
  100000   51.5s   11.5s    4.5      /      /      /
 1000000  514.5s  129.2s    4.0      /      /      /

As we can see, with 10000 rows, BulkSaveChanges is 4-5 times faster.

Database is SQL Server Developer 2017 in local network on HP server with two 1Gb network card.

Here is source code.

This seems to be comparing EF Core's regular SaveChanges() to a 3rd-party product, which, apart from providing bulk CUD operations (#795), may be implementing inserts in another way (e.g. SqlBulkCopy?). If that's the case then note similarity with https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/issues/113, which would be about implementing SaveChanges() with PostgreSQL's optimized COPY protocol.

@MaklaCof Thanks for the additional information. A few comments:

  • These numbers are a lot closer than the numbers originally reported, which makes sense because those numbers look like they are for EF6 which does not do batch updates. EF Core on the other hand does do batching anyway.
  • It's not clear that the extensions are doing exactly the same thing as EF Core, especially with regard to things like propagating identity values back to the entity instances. It's also possible that the extensions are functionally equivalent with the default options. The point is that without further investigation it's hard to know if the comparison is really apples to apples.
  • Regardless, this doesn't mean that there isn't room for improvement in the EF Core batching implementation. Therefore, putting this on the backlog to look at perf improvements here.

Hi,
maybe just one small thing. There are scenarios where you don't need to track inserted entities. Not even get back Id (identity). Maybe some options to SaveChanges to skip tracking this entities, and not receiving back Id would be legitimate use for imports ...

I will update thread when I get back to the real life scenario, where I experience larger factor.

Are there any plans for implementing this? Inserting large numbers of rows in EFCore is painful.

@Mikilll94 Can you share the numbers you are seeing using the current EF Core batching implementation?

Was this page helpful?
0 / 5 - 0 ratings