Efcore: Update: Relational: Potential optimization in MERGE SQL batching path?

Created on 9 May 2017  路  16Comments  路  Source: dotnet/efcore

(Based on feedback from https://www.brentozar.com/archive/2017/05/case-entity-framework-cores-odd-sql)

Right now when using the MERGE batch update path, we perform a final ORDER BY on the server to retrieve the store gen'd values in the correct batch order. However, it could be that the values are already in the correct order, so we could perform the ordering on the client, lazily, by detecting out of order rows when they occur. NB. This would sometimes require client buffering of the reader.

area-perf closed-fixed type-enhancement

Most helpful comment

EDIT: Looks like @anpete beat me to the punch. The below confirms his findings.

My reading of #4080 is that non-deterministic output from the multi-row INSERT caused the client to incorrectly reconcile inserted identities with their respective entities. Is that correct?

If so, I think we'd be fine here, because the above construct still allows the explicit _Position column for deterministic ordering. Indeed, if client-side sorting were possible (as mentioned in the OP), then both variables could be eliminated, leaving only a single MERGE statement.

All 16 comments

cc @divega

Another comment from https://www.brentozar.com/archive/2017/05/case-entity-framework-cores-odd-sql/#comment-2398183:

Zac Torkelson May 9, 2017 4:42 pm
Were Table Value Constructors (https://docs.microsoft.com/en-us/sql/t-sql/queries/table-value-constructor-transact-sql) considered instead of the temporary toInsert table variable? I have had good results with this technique in the past and would be interested in your benchmarking results.

Something like (apologies for the formatting and any syntax errors as I am presently mobile):

  MERGE [Weapons] USING (VALUES
  (@p0, @p1, 0),
  (@p2, @p3, 1),
  (@p4, @p5, 2)
  ) AS i ([IsPointy], [Name], [_Position]) ON 1=0
  WHEN NOT MATCHED THEN
  INSERT ([IsPointy], [Name])
  VALUES (i.[IsPointy], i.[Name])
  OUTPUT INSERTED.[WeaponId], i._Position
  INTO @inserted0;

However this looks very similar to multi-row INSERT syntax, which turned out not to have deterministic order (https://github.com/aspnet/EntityFramework/issues/4080).

@divega The Table Value Constructor pattern still allows for an explicit order sentinel and so it doesn't matter whether the order is deterministic. It looks to be a slightly cleaner syntax by omitting one TV.

EDIT: Looks like @anpete beat me to the punch. The below confirms his findings.

My reading of #4080 is that non-deterministic output from the multi-row INSERT caused the client to incorrectly reconcile inserted identities with their respective entities. Is that correct?

If so, I think we'd be fine here, because the above construct still allows the explicit _Position column for deterministic ordering. Indeed, if client-side sorting were possible (as mentioned in the OP), then both variables could be eliminated, leaving only a single MERGE statement.

I think we'd be fine here, because the above construct still allows the explicit _Position column for deterministic ordering

@anpete @zac-torkelson you are right. BTW, thanks for the feedback!

@AndriySvyryd Thoughts on this for 2.0? Should it be punted for now? /cc @divega

I think it can go in. These aren't breaking changes.

I removed the ORDER BY and ran a microbenchmark inserting 100x10000 rows with identity and saw no measurable difference so this doesn't seem to be worth doing.

@AndriySvyryd Nice! Can you share the query plans for both?

The query plan shows that ordering takes up a big part of the execution time, but the actual duration is the same. I don't have enough expertise to interpret this.
noorder
order

Looks like the merge takes so long that the subsequent select duration is insignificant
merge

Interesting, thanks for looking at this. Great to understand what's going on here.

@anpete @AndriySvyryd should we close this issue as fixed then?

I wonder whether we should still track offloading ORDER BY from the database server. Perhaps some benefit would be observable under load, but we don't have time to test it now.

@divega Agree, it's hard for me to imagine that there is no scenario where this doesn't show an improvement. There is also some benefit in just having simpler SQL.

We should close this and track further optimization in separate issues when we find scenarios that need improvement.
While the SQL would be simpler client code would be much more complicated and could degrade overall performance in some scenarios.

Ok, created https://github.com/aspnet/EntityFramework/issues/8849 to track ORDER BY removal. Closing as fixed in preview2.

Was this page helpful?
0 / 5 - 0 ratings