Runtime: API for writing parameters without boxing

Created on 29 May 2016  路  31Comments  路  Source: dotnet/runtime

In the current ADO.NET API, writing a parameter to the database involves passing it through an object. This implies a boxing operation, which can create lots of garbage in a scenario where lots of value types (e.g. ints) are written to the database.

A generic subclass of DbParameter could solve this, if properly implemented by providers.

api-needs-work area-System.Data enhancement

Most helpful comment

OK, I've done this in Npgsql (https://github.com/npgsql/npgsql/issues/1639). The question is now how to best add this to ADO.NET as a whole, to allow this to be used in a database-independent way.

General Benefits

  • Offers a strongly-typed, generic API (TypedValue) alongside the existing weakly-typed object-based API. Promotes type safety, is a more modern API, etc.
  • Avoids needless boxing when writing value types. Writing a parameter value to the database in Npgsql in now a zero-allocation operation (and in 3.3 ExecuteNonQuery as a whole will probably be zero allocation).

Adding to ADO.NET

This would consist of adding either a new DbParameter<T> abstract base class , inheriting from DbParameter, or an IDbParameter<T> (see discussion below).

In addition, DbProviderFactory would need to be fitted with a new GetParameter<T>() alongside the existing GetParameter() (a GetParameter<T>(PermissionState) may also be necessary). The default implementation of this method would return a shim wrapping the result of the provider's GetParameter(); this would allow providers not providing a real generic parameter implementation to continue working seamlessly.

Base classes vs. interfaces

As @divega mentioned above, ADO.NET APIs are based on base classes rather than interfaces. I worked in both directions for a while to explore what the API would look like, here are some points:

Via base class (DbParameter<T>)

  • Does not allow for easy code sharing between NpgsqlParameter<T> and NpgsqlParameter - (substantial) logic has to be either duplicated or refactored out somehow.
  • An internal interface must be introduced to capture the common API between NpgsqlParameter and NpgsqlParameter<T>; this is necessary to allow internal provider code to continue working. This adds considerable clutter to the codebase and is cumbersome.
  • If any user-facing APIs exist which accept/return an NpgsqlParameter, these also have to be changed (or counterparts added) to accept/return an interface capturing the two parameter types. This interface would need to be distinct from the internal one from the previous point, to avoid exposing internal functionality (so we end up with INpgsqlParameter and INpgsqlInternalParameter).

Via interface (IDbParameter<T>)

  • Allows the NpgsqlParameter<T> to inherit from NpgsqlParameter. Since both are parameter classes, there's likely to be a lot of shared code; if NpgsqlParameter<T> inherits NpgsqlParameter we only need to add the typed value property plus some minimal generic-specific handling.
  • In order for IDbParameter<T> to be useful, it must duplicate the API surface of DbParameter, otherwise the user can't manipulate things like Size, Precision.

Conclusions

For provider codebase maintenance and sanity, I'd really prefer it if NpgsqlParameter<T> could extend from NpgsqlParameter. However, the fact that IDbParameter<T> needs to duplicate the DbParameter API is problematic: it would make it impossible to add a method with a default implementation to DbParameter.

On the other hand, I hear that C# 8 will have default interface methods, so maybe it's not so bad :)

All 31 comments

Isn't the boxing overhead next to nothing compared to the fixed cost of making a SQL call? I cannot imagine this being an issue even for 1000 parameters.

@GSPP I'm definitely not talking about the overhead of allocating the memory and copying the value - i.e. the cost of the boxing operation itself. The problem is that boxing allocates an object on the heap, producing potentially large amounts of garbage. This garbage creates pressure on the GC, which can be a problem for some applications. Basically it's a different kind of overhead compared to making an SQL call.

@roji (or anyone) can you provide more details what is your plan here?

On the read side of things, there's DbDataReader.GetFieldValue<T>() allowing users to generically read values. This allows ADO.NET providers to provide an implementation that doesn't box value types - users can read ints without needless heap allocations.

Unfortunately nothing like this exists on the write side - DbParameter has a object Value property, so writing ints via ADO.NET necessarily implies boxing. This could be resolved by having a generic SqlParameter<T>, whose Value would be of type T. This class could extend could inherit the non-generic SqlParameter for backwards compatibility. It would probably be a good idea to have an IDbParameter<T> interface which would be implemented by the provider-specific generic parameter classes (SqlParameter<T>, NpgsqlParameter<T>).

It would also be necessary to add CreateParameter<T>() to DbProviderFactory to allow portable creation of these new parameters.

Let me know if this makes sense or if you'd like more info.

@saurabh500 @YoungGah is it sufficient info for you? Or do you need more details?
If we have enough of direction and you agree with it, please remove the "needs more info" and add "up for grabs" label.

@saurabh500 @YoungGah thoughts?

Can anybody take a look at this? It would be good to know if you guys see this somewhere on your roadmap etc.

@saurabh500 @divega @corivera any opinion here? Can we at least set expectations / timeline when we will have time to look at it? Thanks!

@karelz @danmosemsft @saurabh500 @corivera I think we should remove the "needs-more-info" label and add "up-for-grabs". This sounds like a good idea to at least explore.

@roji it would be great if you could do some prototyping of this in Npgsql if you haven't already. I suspect it should be possible to do enough to asses the API and make some measurements of the impact without making the actual changes on System.Data.Common. If the change turns very positive results then we can take the next step.

I am not sure about the IDbParameter<T> interface. The extensibility model of ADO.NET has been consistently based on class inheritance since .NET Framework 3.0. The existing interfaces are there only for compatibility so adding new interfaces would be strange. Unless there is really good reason I would try to stick to classes.

cc @ajcvickers

@divega sounds good to me - feel free to make such labels changes yourself, as area expert/owner :)

When you mark things "up for grabs", just please try to describe what is needed (next steps) & rough complexity / time investment - see triage rules for details. Thanks!

Marking as "up for grabs". In order to make progress on this issue we need to do some exploration to understand both the magnitude of the performance impact (e.g. how many allocations we can actually avoid and how that benefits performance) and how to best extend the API. See https://github.com/dotnet/corefx/issues/8955#issuecomment-260108275 and https://github.com/dotnet/corefx/issues/8955#issuecomment-313905218.

FYI I'm working on implementing this within Npgsql, I'll be coming back with some info pretty soon.

OK, I've done this in Npgsql (https://github.com/npgsql/npgsql/issues/1639). The question is now how to best add this to ADO.NET as a whole, to allow this to be used in a database-independent way.

General Benefits

  • Offers a strongly-typed, generic API (TypedValue) alongside the existing weakly-typed object-based API. Promotes type safety, is a more modern API, etc.
  • Avoids needless boxing when writing value types. Writing a parameter value to the database in Npgsql in now a zero-allocation operation (and in 3.3 ExecuteNonQuery as a whole will probably be zero allocation).

Adding to ADO.NET

This would consist of adding either a new DbParameter<T> abstract base class , inheriting from DbParameter, or an IDbParameter<T> (see discussion below).

In addition, DbProviderFactory would need to be fitted with a new GetParameter<T>() alongside the existing GetParameter() (a GetParameter<T>(PermissionState) may also be necessary). The default implementation of this method would return a shim wrapping the result of the provider's GetParameter(); this would allow providers not providing a real generic parameter implementation to continue working seamlessly.

Base classes vs. interfaces

As @divega mentioned above, ADO.NET APIs are based on base classes rather than interfaces. I worked in both directions for a while to explore what the API would look like, here are some points:

Via base class (DbParameter<T>)

  • Does not allow for easy code sharing between NpgsqlParameter<T> and NpgsqlParameter - (substantial) logic has to be either duplicated or refactored out somehow.
  • An internal interface must be introduced to capture the common API between NpgsqlParameter and NpgsqlParameter<T>; this is necessary to allow internal provider code to continue working. This adds considerable clutter to the codebase and is cumbersome.
  • If any user-facing APIs exist which accept/return an NpgsqlParameter, these also have to be changed (or counterparts added) to accept/return an interface capturing the two parameter types. This interface would need to be distinct from the internal one from the previous point, to avoid exposing internal functionality (so we end up with INpgsqlParameter and INpgsqlInternalParameter).

Via interface (IDbParameter<T>)

  • Allows the NpgsqlParameter<T> to inherit from NpgsqlParameter. Since both are parameter classes, there's likely to be a lot of shared code; if NpgsqlParameter<T> inherits NpgsqlParameter we only need to add the typed value property plus some minimal generic-specific handling.
  • In order for IDbParameter<T> to be useful, it must duplicate the API surface of DbParameter, otherwise the user can't manipulate things like Size, Precision.

Conclusions

For provider codebase maintenance and sanity, I'd really prefer it if NpgsqlParameter<T> could extend from NpgsqlParameter. However, the fact that IDbParameter<T> needs to duplicate the DbParameter API is problematic: it would make it impossible to add a method with a default implementation to DbParameter.

On the other hand, I hear that C# 8 will have default interface methods, so maybe it's not so bad :)

@roji Good stuff! /cc @anpete

I saw this mentioned in the issue review earlier and thought it might be useful to provide some feedback since it isn't dead.

I did some investigation on doing this with SqlClient because I wanted to try and remove the parameter box. It's messy and I'm not sure how you would achieve it without having the parameter instance write the data into the tds buffer..

At the moment the parameter is asked for it's value object (including any coercion) and that object is then written by the TdsParserStateObject which understands the layout of all the relevant types. Doing this without the object variable means you've either got to have the correct storage location generated at runtime by non-generic code or you delegate the writing of bytes to the generic object which can avoid the variable. Asking the parameter to do it exposes the internals of the tds layer to the user layer or will force multiple allocation and copying between buffers, neither of which is a great idea.

It's worth doing but it seems difficult to implement practically at the moment. I also think it will flow new members into System.Data which may cause compatibility problems without care.

@Wraith2 you're right that this isn't necessarily a trivial thing to do inside an ADO.NET provider, and can mean serious refactoring to actually avoid refactoring. The main idea here is to at least allow providers to do this API-wise - if they do it or not is a different question. The default implementation for this new generic API would in any case delegate to the existing non-generic API, in order to prevent breaking changes, so existing providers would simply continue to work.

I don't know anything about the SqlClient internal implementation... Full generic parameter handling indeed means that writing has to be generic "all the way down", without passing through a non-generic layer (such as TdsParserStateObject?) which switches on the specific type etc. Definitely not trivial.

Full generic parameter handling indeed means that writing has to be generi

Possibly, at the very least it means that the thing writing the value had to be generic though the caller may not need to be aware of the exact type. It's well worth doing but it will be a big very complicated job.

I don't know anything about the SqlClient internal implementation..

It's quite fun in there. Lots of history to learn 馃榿

Note: one non-trivial issue to be resolved is how nulls are written - with the current non-generic DbParameter, nulls are represented via DBNull, but that's not possible with a generic DbParameter<T>.

Unlike non-generic DbParameter, we could accept null values, but that would only work for reference types. We could introduce another property on DbParameter<T> to express this, and also have a DbParameter<T>.Null as a shortcut (although it should still be possible to mutate an existing parameter instance to make it express null).

[edit] stuff to do with readers which wasn't relevant to parameters (https://github.com/dotnet/corefx/issues/27682#issuecomment-526928918).

@Wraith2 these is a very important thing that I really hope we get to improve for 5.0, but you're discussing nullability when reading values from a reader, whereas this issue is about the introduction of a generic parameter API which would avoid boxing when sending values to the database.

The issue of GetFieldValue nullability has been discussed in https://github.com/dotnet/corefx/issues/27682#issuecomment-436736787 - for now that issue seems like a good place to continue that conversation. Do you mind moving this comment over there?

You're right. They're sort of linked in my thinking since they're about carrying two pieces for information, is it null and what is the value which is easy as a tuple on output but better modelled as two properties on input. Much better to separate the information out to IsNullable, IsDBNull and Value properties on DbParameter<T> which is what you were talking about and prompted my recollection.

They're sort of linked in my thinking since they're about carrying two pieces for information, is it null and what is the value which is easy as a tuple on output but better modelled as two properties on input.

They definitely are, and I can imagine a world where the same solution holds for both (i.e. https://github.com/dotnet/csharplang/issues/2194), but for now that doesn't seem like it would happen...

PS have edited your comment above to link to the other issue.

I'm not sure whether equating language null to DBNull is correct. I can't find a scenario where you need the distinction but the my instinct is to keep them separate.

There's also the possibility of using a separate type, so having DbParameter<T> which is explicitly not nullable and then DbNullableParameter<T> : MDbParameter<T>, INullable which allows them.

I'm not sure whether equating language null to DBNull is correct. I can't find a scenario where you need the distinction but the my instinct is to keep them separate.

I'm not sure I see why... C# null maps to other nulls in other contexts (e.g. JSON serialization?), and apart from the problematic intersection of generics, value types and null, I think it would work quite well. If you have any concrete argument here I'd love to hear it.

Somewhat ironically, with the current non-generic DbParameter, mapping language null to database null works even better - since the parameter simply holds a object, it's always possible to simply set it to null. I'd be curious to learn why historically DBNull was chosen to express null instead of language null; it's possibly useful in that it distinguishes an uninitialized parameter (which contains language null) from a parameter set to null (DBNull), but I'm not sure of the actual importance of that (again, the C# language doesn't have that distinction and things seem to work out fine).

There's also the possibility of using a separate type, so having DbParameter which is explicitly not nullable and then DbNullableParameter : MDbParameter, INullable which allows them.

Wouldn't you have to solve the same question with DbNullableParameter, i.e. how to represent null when T is a value type? Having separate nullable and non-nullable parameter types would then be orthogonal to that question. In any case, is there any reason we need a non-nullable parameter type and a nullable parameter type? ADO.NET currently has a single nullable parameter type (DbParameter, where null is expressed via DBNull) and it seems to be working well.

PS DbParameter actually has an IsNullable property, which may resemble your distinction. I'm not sure what it's actually used for though, it possibly only plays a role when using the command builder.

Wouldn't you have to solve the same question with DbNullableParameter, i.e. how to represent null when T is a value type?

Not really no. If IsDBNull the value is undefined so in storage terms it'd be default(T) or the last value, doesn't matter since attempting to read it would be an error.

For some reason I see JSON null as a language null and think that the javascript null and c# null are compatible. I feel that DBNull is a data null which is distinct from a language null. As you pointed out you can use the difference to explicitly see the difference between not setting a value and setting it to be null. Not exactly convincing I agree but it comes from my quite direct dealings with data, I don't use orms.

I have found a case where DBNull provides a distinction of cases: https://stackoverflow.com/questions/4488727/what-is-the-point-of-dbnull/4488758

I consider DBNull to be an API design mistake. This ExecuteScalar situation could have been solved differently, for example by returning an object (or a struct) describing the result, or by throwing if no result set was returned. DBNull makes handling ADO.NET considerably harder. If its role can be reduced without hurting consistency too much then that's great.

I have never seen a situation where the existence of DBNull was desirable.

@GSPP thanks for linking to that question. FWIW I agree with you and Marc's response, and also think that DBNull was a mistake. But it actually doesn't matter that much, since once we move to generic type parameters DBNull simply becomes impossible anyway, so a different way to represent null on parameters must be found in any case. The same may be true also of a new API which would be an alternative to DbDataReader.GetFieldValue, if we choose to go down that path.

That is a useful SO answer. We shouldn't add new uses of DBNull, i think we all agree on that.

public class DbParameter<T>
{
    public bool IsAssigned { get;set; }
    public bool IsNullable { get;set; }
    public bool IsNull { get; set { if (value && !IsNullable) throw new InvalidOperationException() } )
    public T Value { 
        get { if (IsNull) throw new InvalidOperationsException() }  
        set {
            IsAssigned=true;
            if (typeof(T).IsClass && value is null) // the only point of contention?
            {
                IsNull=true;
            }
            _value = value;
        } 
    }
}

(just to set expectations - I personally am not going to start working on this right away, although I definitely intend to do this for 5.0)

Regarding the discussion around the purpose of DbNull, see also this SO example

As someone who has written an abstraction layer over ADO.Net providers (and now looking to extend it to Postgres via the npgsql library) DbNull is not only important for interpreting what you get back from a result, but what you send in as parameter values. I am familiar with how this works in particular with Sql Server, and was researching how it might apply to an abstraction over NpgsqlParameter<T> (hence I was led to this thread ).

If you want to tell Sql Server to set a column to an explicit null, you use DbNull.Value for the parameter value.
But if you happen to use the syntax in the example where you are referencing one or more columns via parameters (instead of leaving the columns out completely) and want Sql Server to use its (pre-declared) DEFAULT value for one or more of the colums to which the parameter refers, then you must set the parameter value to null and not to DbNull.

So indeed if an abstraction layer wants to take advantage of the NpgsqlParameter<T> it would also have to account for null and Nullable<T> semantics as well.

Thanks for that info @mldisibio.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jzabroski picture jzabroski  路  3Comments

v0l picture v0l  路  3Comments

nalywa picture nalywa  路  3Comments

aggieben picture aggieben  路  3Comments

bencz picture bencz  路  3Comments