Runtime: API for writing parameters without boxing

Created on 29 May 2016 · 31 comments · Source: dotnet/runtime

In the current ADO.NET API, writing a parameter to the database involves passing it through an object. This implies a boxing operation, which can create lots of garbage in a scenario where lots of value types (e.g. ints) are written to the database.

A generic subclass of DbParameter could solve this, if properly implemented by providers.
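
To make the boxing concrete, here is a minimal sketch. `ObjectParameter` and `TypedParameter<T>` are hypothetical stand-ins invented for this illustration (they are not ADO.NET types); the first mimics the `object Value` property of `DbParameter`, the second mimics what a generic parameter might expose. `GC.GetAllocatedBytesForCurrentThread()` is a real runtime API, but the numbers it reports depend on runtime and platform.

```csharp
using System;

// Hypothetical stand-ins for illustration only; these are not ADO.NET types.
public sealed class ObjectParameter { public object Value { get; set; } }
public sealed class TypedParameter<T> { public T TypedValue { get; set; } }

public static class BoxingDemo
{
    // Returns the managed bytes allocated on the current thread while running the action.
    public static long AllocatedBy(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void AssignBoxed(ObjectParameter p, int count)
    {
        for (int i = 0; i < count; i++)
            p.Value = i;        // int -> object: each assignment boxes into a new heap object
    }

    public static void AssignTyped(TypedParameter<int> p, int count)
    {
        for (int i = 0; i < count; i++)
            p.TypedValue = i;   // the value stays an int; nothing is boxed
    }
}
```

Wrapping each of the two `Assign*` calls in `BoxingDemo.AllocatedBy(...)` should show the object-typed path allocating roughly in proportion to `count`, while the typed path should not.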

api-needs-work area-System.Data enhancement

roji

👍10

Most helpful comment

OK, I've done this in Npgsql (https://github.com/npgsql/npgsql/issues/1639). The question is now how to best add this to ADO.NET as a whole, to allow this to be used in a database-independent way.

General Benefits

Offers a strongly-typed, generic API (TypedValue) alongside the existing weakly-typed object-based API. Promotes type safety, is a more modern API, etc.
Avoids needless boxing when writing value types. Writing a parameter value to the database in Npgsql is now a zero-allocation operation (and in 3.3 ExecuteNonQuery as a whole will probably be zero allocation).

Adding to ADO.NET

This would consist of adding either a new DbParameter<T> abstract base class, inheriting from DbParameter, or an IDbParameter<T> (see discussion below).

In addition, DbProviderFactory would need to be fitted with a new CreateParameter<T>() alongside the existing CreateParameter() (a CreateParameter<T>(PermissionState) may also be necessary). The default implementation of this method would return a shim wrapping the result of the provider's CreateParameter(); this would allow providers that do not supply a real generic parameter implementation to continue working seamlessly.
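
A rough sketch of what such a shim could look like, built on the existing `DbProviderFactory.CreateParameter()`. Everything named here is an assumption made for illustration: `DbParameter<T>`, `ShimParameter<T>` and the generic factory method do not exist in System.Data.Common. The factory method is written as an extension method only so the sketch compiles on its own; the proposal is for a virtual method on the factory itself.

```csharp
using System.Data;
using System.Data.Common;

// Hypothetical generic base class; not part of System.Data.Common.
public abstract class DbParameter<T> : DbParameter
{
    public abstract T TypedValue { get; set; }
}

// Hypothetical fallback: wraps the provider's ordinary parameter.
// TypedValue goes through the inner object Value, so this path still boxes.
public sealed class ShimParameter<T> : DbParameter<T>
{
    private readonly DbParameter _inner;
    public ShimParameter(DbParameter inner) => _inner = inner;

    public override T TypedValue
    {
        get => (T)_inner.Value;       // throws if the inner value is null and T is a value type
        set => _inner.Value = value;
    }

    // Everything else is forwarded to the wrapped parameter unchanged.
    public override object Value { get => _inner.Value; set => _inner.Value = value; }
    public override DbType DbType { get => _inner.DbType; set => _inner.DbType = value; }
    public override ParameterDirection Direction { get => _inner.Direction; set => _inner.Direction = value; }
    public override bool IsNullable { get => _inner.IsNullable; set => _inner.IsNullable = value; }
    public override string ParameterName { get => _inner.ParameterName; set => _inner.ParameterName = value; }
    public override int Size { get => _inner.Size; set => _inner.Size = value; }
    public override string SourceColumn { get => _inner.SourceColumn; set => _inner.SourceColumn = value; }
    public override bool SourceColumnNullMapping { get => _inner.SourceColumnNullMapping; set => _inner.SourceColumnNullMapping = value; }
    public override void ResetDbType() => _inner.ResetDbType();
}

public static class ProviderFactoryExtensions
{
    // Stand-in for the proposed default implementation on DbProviderFactory.
    public static DbParameter<T> CreateParameter<T>(this DbProviderFactory factory)
        => new ShimParameter<T>(factory.CreateParameter());
}
```

The sketch does not address how a provider's parameter collection would accept or unwrap the shim; a real design would have to settle that.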

Base classes vs. interfaces

As @divega mentioned above, ADO.NET APIs are based on base classes rather than interfaces. I worked in both directions for a while to explore what the API would look like; here are some points:

Via base class (`DbParameter<T>`)

Does not allow for easy code sharing between NpgsqlParameter<T> and NpgsqlParameter - (substantial) logic has to be either duplicated or refactored out somehow.
An internal interface must be introduced to capture the common API between NpgsqlParameter and NpgsqlParameter<T>; this is necessary to allow internal provider code to continue working. This adds considerable clutter to the codebase and is cumbersome.
If any user-facing APIs exist which accept/return an NpgsqlParameter, these also have to be changed (or counterparts added) to accept/return an interface capturing the two parameter types. This interface would need to be distinct from the internal one from the previous point, to avoid exposing internal functionality (so we end up with INpgsqlParameter and INpgsqlInternalParameter).
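
The shape of the base-class approach can be sketched as follows. To keep it short, `ParamBase` is a simplified, hypothetical stand-in for `DbParameter` (the real class has more abstract members), and `MyParameter`, `IInternalParameter` and `Provider.Bind` are invented names, not Npgsql code. The point is only that the two parameter classes cannot share an implementation and need an extra interface so internal code can treat them uniformly.

```csharp
using System;

// Simplified, hypothetical stand-in for System.Data.Common.DbParameter.
public abstract class ParamBase
{
    public abstract string Name { get; set; }
    public abstract object Value { get; set; }
}

// Hypothetical generic base class, standing in for the proposed DbParameter<T>.
public abstract class ParamBase<T> : ParamBase
{
    public abstract T TypedValue { get; set; }
}

// Hypothetical internal interface: what the provider's write path needs from either class.
internal interface IInternalParameter
{
    string Name { get; }
    Type ValueType { get; }
}

public sealed class MyParameter : ParamBase, IInternalParameter
{
    public override string Name { get; set; } = "";
    public override object Value { get; set; }
    public Type ValueType => Value?.GetType() ?? typeof(object);
}

// Must derive from ParamBase<T>, so it cannot also derive from MyParameter:
// Name (and, in a real provider, much more) has to be implemented a second time.
public sealed class MyParameter<T> : ParamBase<T>, IInternalParameter
{
    public override string Name { get; set; } = "";
    public override T TypedValue { get; set; }
    public override object Value { get => TypedValue; set => TypedValue = (T)value; }
    public Type ValueType => typeof(T);
}

internal static class Provider
{
    // Internal code works against the interface, not against either class.
    public static string Bind(IInternalParameter p) => $"{p.Name}:{p.ValueType.Name}";
}
```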

Via interface (`IDbParameter<T>`)

Allows the NpgsqlParameter<T> to inherit from NpgsqlParameter. Since both are parameter classes, there's likely to be a lot of shared code; if NpgsqlParameter<T> inherits NpgsqlParameter we only need to add the typed value property plus some minimal generic-specific handling.
In order for IDbParameter<T> to be useful, it must duplicate the API surface of DbParameter; otherwise the user can't manipulate things like Size and Precision.
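
A sketch of the interface approach, under the same caveats: `IDbParameter<T>` is hypothetical, and `MyParameter` is an invented stand-in for a provider class such as NpgsqlParameter. `DbParameter` and its members are the real System.Data.Common API. Note how the interface has to repeat members that `DbParameter` already declares.

```csharp
using System.Data;
using System.Data.Common;

// Hypothetical interface; nothing like it exists in System.Data.Common today.
public interface IDbParameter<T>
{
    T TypedValue { get; set; }

    // Duplicated from DbParameter so that code holding only the interface
    // can still configure the parameter.
    string ParameterName { get; set; }
    DbType DbType { get; set; }
    int Size { get; set; }
    byte Precision { get; set; }
}

// Hypothetical provider parameter class.
public class MyParameter : DbParameter
{
    public override DbType DbType { get; set; }
    public override ParameterDirection Direction { get; set; } = ParameterDirection.Input;
    public override bool IsNullable { get; set; }
    public override string ParameterName { get; set; } = "";
    public override int Size { get; set; }
    public override string SourceColumn { get; set; } = "";
    public override bool SourceColumnNullMapping { get; set; }
    public override object Value { get; set; }
    public override void ResetDbType() => DbType = DbType.String;
}

// The generic class inherits all shared logic and adds only the typed value.
// The duplicated interface members are satisfied by the inherited DbParameter members.
public class MyParameter<T> : MyParameter, IDbParameter<T>
{
    public T TypedValue { get; set; }

    // Boxing happens only when a caller goes through the weakly-typed property.
    public override object Value
    {
        get => TypedValue;
        set => TypedValue = (T)value;
    }
}
```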

Conclusions

For provider codebase maintenance and sanity, I'd really prefer it if NpgsqlParameter<T> could inherit from NpgsqlParameter. However, the fact that IDbParameter<T> needs to duplicate the DbParameter API is problematic: any member later added to DbParameter would also have to be added to the interface, and an interface member cannot currently have a default implementation, so existing implementers would break.

On the other hand, I hear that C# 8 will have default interface methods, so maybe it's not so bad :)

roji on 18 Jul 2017

👍5 🎉1

All 31 comments

Isn't the boxing overhead next to nothing compared to the fixed cost of making a SQL call? I cannot imagine this being an issue even for 1000 parameters.

GSPP on 2 Jun 2016

@GSPP I'm definitely not talking about the overhead of allocating the memory and copying the value - i.e. the cost of the boxing operation itself. The problem is that boxing allocates an object on the heap, producing potentially large amounts of garbage. This garbage creates pressure on the GC, which can be a problem for some applications. Basically it's a different kind of overhead compared to making an SQL call.

roji on 2 Jun 2016

@roji (or anyone) can you provide more details what is your plan here?

karelz on 11 Nov 2016

On the read side of things, there's DbDataReader.GetFieldValue<T>() allowing users to generically read values. This allows ADO.NET providers to provide an implementation that doesn't box value types - users can read ints without needless heap allocations.
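
For reference, this is how the read-side API is used. `DbDataReader.GetFieldValue<T>()` and `DataTable.CreateDataReader()` are real APIs; the in-memory `DataTable` is used here only so the sketch runs without a database, and `ReadSideDemo` is an invented name. Whether boxing is actually avoided depends on the provider overriding `GetFieldValue<T>()`; the in-memory reader is not claimed to do so.

```csharp
using System.Data;
using System.Data.Common;

public static class ReadSideDemo
{
    // Reads column 0 of every row as an int through the generic accessor.
    public static long SumFirstColumn(DbDataReader reader)
    {
        long sum = 0;
        while (reader.Read())
            sum += reader.GetFieldValue<int>(0);
        return sum;
    }

    // Builds a three-row in-memory table and sums it via a DbDataReader.
    public static long Demo()
    {
        var table = new DataTable();
        table.Columns.Add("n", typeof(int));
        table.Rows.Add(1);
        table.Rows.Add(2);
        table.Rows.Add(3);

        using DbDataReader reader = table.CreateDataReader();
        return SumFirstColumn(reader);
    }
}
```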

Unfortunately nothing like this exists on the write side: DbParameter has an object Value property, so writing ints via ADO.NET necessarily implies boxing. This could be resolved by having a generic SqlParameter<T>, whose Value would be of type T. This class could inherit from the non-generic SqlParameter for backwards compatibility. It would probably be a good idea to have an IDbParameter<T> interface which would be implemented by the provider-specific generic parameter classes (SqlParameter<T>, NpgsqlParameter<T>).

It would also be necessary to add CreateParameter<T>() to DbProviderFactory to allow portable creation of these new parameters.

Let me know if this makes sense or if you'd like more info.

roji on 12 Nov 2016

👍2

@saurabh500 @YoungGah is it sufficient info for you? Or do you need more details?
If we have enough direction and you agree with it, please remove the "needs more info" label and add the "up for grabs" label.

karelz on 12 Nov 2016

@saurabh500 @YoungGah thoughts?

danmosemsft on 27 Dec 2016

Can anybody take a look at this? It would be good to know if you guys see this somewhere on your roadmap etc.

roji on 2 Jul 2017

@saurabh500 @divega @corivera any opinion here? Can we at least set expectations / timeline when we will have time to look at it? Thanks!

karelz on 2 Jul 2017

@karelz @danmosemsft @saurabh500 @corivera I think we should remove the "needs-more-info" label and add "up-for-grabs". This sounds like a good idea to at least explore.

@roji it would be great if you could do some prototyping of this in Npgsql if you haven't already. I suspect it should be possible to do enough to assess the API and measure the impact without making the actual changes in System.Data.Common. If the change yields very positive results then we can take the next step.

I am not sure about the IDbParameter<T> interface. The extensibility model of ADO.NET has been consistently based on class inheritance since .NET Framework 2.0. The existing interfaces are there only for compatibility, so adding new interfaces would be strange. Unless there is a really good reason, I would try to stick to classes.

cc @ajcvickers

divega on 9 Jul 2017

@divega sounds good to me - feel free to make such label changes yourself, as area expert/owner :)

When you mark things "up for grabs", just please try to describe what is needed (next steps) & rough complexity / time investment - see triage rules for details. Thanks!

karelz on 9 Jul 2017

Marking as "up for grabs". In order to make progress on this issue we need to do some exploration to understand both the magnitude of the performance impact (e.g. how many allocations we can actually avoid and how that benefits performance) and how to best extend the API. See https://github.com/dotnet/corefx/issues/8955#issuecomment-260108275 and https://github.com/dotnet/corefx/issues/8955#issuecomment-313905218.

divega on 11 Jul 2017

FYI I'm working on implementing this within Npgsql, I'll be coming back with some info pretty soon.

roji on 15 Jul 2017