Almost all get methods on DbDataReader require a column index, which is a brittle way to program as the query SQL changes and resultset columns shift around. It's possible to use the pattern reader.GetInt32(reader.GetOrdinal("name"))
, but that's needlessly verbose. In practice this encourages people to use the reader's indexer (reader["name"]
), which is a weakly-typed API returning object
(requires casting, boxes value types...).
We can add string overloads to DbDataReader to address this - we would have GetInt32("name")
, GetFieldValue<T>("name")
etc. Internally these would simply call GetOrdinal()
, in effect providing a shortcut for the above syntax, and being 100% backwards-compatible. These would be non-virtual, forcing their behavior to be consistent.
After implementation and merge of the string overloads (see original proposal below), EF Core broke since it was looking up DbDataReader methods via reflection, and assuming only one version existed. To remove any chances of similar breakage in other code, it has been proposed to move these methods out as extensions - to be contained in the same assembly (i.e. no new nuget is being proposed).
```c#
public static class DbDataReaderExtensions
{
public static bool GetBoolean(this DbDataReader reader, string name);
public static byte GetByte(this DbDataReader reader, string name);
public static long GetBytes(this DbDataReader reader, string name, long dataOffset, byte[] buffer, int bufferOffset, int length);
public static char GetChar(this DbDataReader reader, string name);
public static long GetChars(this DbDataReader reader, string name, long dataOffset, char[] buffer, int bufferOffset, int length);
public static DbDataReader GetData(this DbDataReader reader, string name);
public static string GetDataTypeName(this DbDataReader reader, string name);
public static DateTime GetDateTime(this DbDataReader reader, string name);
public static decimal GetDecimal(this DbDataReader reader, string name);
public static double GetDouble(this DbDataReader reader, string name);
public static Type GetFieldType(this DbDataReader reader, string name);
public static T GetFieldValue
public static float GetFloat(this DbDataReader reader, string name);
public static Guid GetGuid(this DbDataReader reader, string name);
public static short GetInt16(this DbDataReader reader, string name);
public static int GetInt32(this DbDataReader reader, string name);
public static long GetInt64(this DbDataReader reader, string name);
public static Type GetProviderSpecificFieldType(this DbDataReader reader, string name);
public static object GetProviderSpecificValue(this DbDataReader reader, string name);
public static Stream GetStream(this DbDataReader reader, string name);
public static string GetString(this DbDataReader reader, string name);
public static TextReader GetTextReader(this DbDataReader reader, string name);
public static object GetValue(this DbDataReader reader, string name);
public static bool IsDBNull(this DbDataReader reader, string name);
// For the following see section on async overloads below
public static Task<T> GetFieldValueAsync<T>(this DbDataReader reader, string name, CancellationToken cancellationToken = default);
public static Task<bool> IsDBNullAsync(this DbDataReader reader, string name, CancellationToken cancellationToken = default);
}
# Async overloads and ValueTask
In the original review, two async string overloads were left out - `GetFieldValueAsync()` and `IsDBNullAsync()` - because of the following reasons:
* It is less commonly used API (only required for sequential access)
* We were hoping to come up with some new API pattern for handling nullability that takes advantage of the new support for nullable reference types in C#
* We had doubts on whether new fine-grained async API should return ValueTask<T> instead of Task<T>
After additional discussion we feel that these string overloads should be included. There hasn't been any progress on the story for nullability, and from an API consistency standpoint it would be problematic for all DbDataReader column getters to have string overloads, except for these two. The ordinal-accepting Task-returning methods are already there on DbDataReader, and we'd merely be complementing them with string overloads, as with all the other getters.
We do feel strongly that ValueTask-returning versions are necessary, but consider this to be an orthogonal API addition (tracked by dotnet/runtime#25297). Even if we had a clear idea of what we want today, it wouldn't be advisable to add a ValueTask-returning `GetFieldValueAsync(string)` alongside a Task-returning `GetFieldValueAsync(int)` - we'd likely introduce new method names anyway.
The section replaces dotnet/corefx#35611.
# Original proposal (without extensions)
```c#
public partial class DbDataReader
{
public bool GetBoolean(string name);
public byte GetByte(string name);
public long GetBytes(string name, long dataOffset, byte[] buffer, int bufferOffset, int length);
public char GetChar(string name);
public long GetChars(string name, long dataOffset, char[] buffer, int bufferOffset, int length);
public DbDataReader GetData(string name);
public string GetDataTypeName(string name);
public DateTime GetDateTime(string name);
// Remove from scope - this is a protected virtual method not implemented almost all providers, little interest.
// public DateTime GetDbDataReader(string name);
public decimal GetDecimal(string name);
public double GetDouble(string name);
public Type GetFieldType(string name);
public T GetFieldValue<T>(string name);
// Out of scope for now, pending work on ValueTask-returning overload
// public Task<T> GetFieldValueAsync<T>(string name); // Forwards to CancellationToken
// public Task<T> GetFieldValueAsync<T>(string name, CancellationToken cancellationToken);
public float GetFloat(string name);
public Guid GetGuid(string name);
public short GetInt16(string name);
public int GetInt32(string name);
public long GetInt64(string name);
public Type GetProviderSpecificFieldType(string name);
public object GetProviderSpecificValue(string name);
// public int GetProviderSpecificValues(object[] values); <- Inserted by accident, does not belong here
public Stream GetStream(string name);
public string GetString(string name);
public TextReader GetTextReader(string name);
public object GetValue(string name);
public bool IsDBNull(string name);
// Out of scope for now, pending work on ValueTask-returning overload, and possibly a better null-
// handling strategy
// public Task<bool> IsDBNullAsync(string name); // Forwards to CancellationToken
// public Task<bool> IsDBNullAsync(string name, CancellationToken cancellationToken);
}
| Date | Modification |
| ----- | ------------ |
| | API shape added |
| 30/1 | Updated based on review decisions (removed async overloads) |
| 5/3 | New proposal via extension methods, plus add back async overloads |
These would be non-virtual, forcing their behavior to be consistent.
Should be virtual so the implementation could cache column lookups etc?
public partial class DbDataReader
{
public Task<T> GetFieldValueAsync<T>(string name); // Forwards to CancellationToken
public virtual Task<T> GetFieldValueAsync<T>(string name, CancellationToken cancellationToken);
public Task<bool> IsDBNullAsync(string name); // Forwards to CancellationToken
public virtual Task<bool> IsDBNullAsync(string name, CancellationToken cancellationToken);
public virtual bool GetBoolean(string name);
public virtual byte GetByte(string name);
public virtual long GetBytes(string name, long dataOffset, byte[] buffer, int bufferOffset, int length);
public virtual char GetChar(string name);
public virtual long GetChars(string name, long dataOffset, char[] buffer, int bufferOffset, int length);
public virtual DbDataReader GetData(string name);
public virtual string GetDataTypeName(string name);
public virtual DateTime GetDateTime(string name);
public virtual decimal GetDecimal(string name);
public virtual double GetDouble(string name);
public virtual Type GetFieldType(string name);
public virtual T GetFieldValue<T>(string name);
public virtual float GetFloat(string name);
public virtual Guid GetGuid(string name);
public virtual short GetInt16(string name);
public virtual int GetInt32(string name);
public virtual long GetInt64(string name);
public virtual Type GetProviderSpecificFieldType(string name);
public virtual object GetProviderSpecificValue(string name);
public virtual int GetProviderSpecificValues(object[] values);
public virtual DataTable GetSchemaTable();
public virtual Stream GetStream(string name);
public virtual string GetString(string name);
public virtual TextReader GetTextReader(string name);
}
Should be virtual so the implementation could cache column lookups etc?
Hmm, shouldn't this be part of the existing GetOrdinal()
? It seems that any sort of caching should happen in there...
Better as extensions then?
public static class DbDataReaderExtensions
{
public static Task<T> GetFieldValueAsync<T>(this DbDataReader reader, string name)
=> reader.GetFieldValueAsync<T>(name, default);
public static Task<T> GetFieldValueAsync<T>(this DbDataReader reader, string name, CancellationToken cancellationToken)
=> reader.GetFieldValueAsync<T>(reader.GetOrdinal(name), cancellationToken);
public static Task<bool> IsDBNullAsync(this DbDataReader reader, string name)
=> reader.IsDBNullAsync(reader.GetOrdinal(name), default);
public static Task<bool> IsDBNullAsync(this DbDataReader reader, string name, CancellationToken cancellationToken)
=> reader.IsDBNullAsync(reader.GetOrdinal(name), cancellationToken);
public static bool IsDBNull(this DbDataReader reader, string name)
=> reader.IsDBNull(reader.GetOrdinal(name));
public static bool GetBoolean(this DbDataReader reader, string name)
=> reader.GetBoolean(reader.GetOrdinal(name));
public static byte GetByte(this DbDataReader reader, string name)
=> reader.GetByte(reader.GetOrdinal(name));
public static long GetBytes(this DbDataReader reader, string name, long dataOffset, byte[] buffer, int bufferOffset, int length)
=> reader.GetBytes(reader.GetOrdinal(name), dataOffset, buffer, bufferOffset, length);
public static char GetChar(this DbDataReader reader, string name)
=> reader.GetChar(reader.GetOrdinal(name));
public static long GetChars(this DbDataReader reader, string name, long dataOffset, char[] buffer, int bufferOffset, int length)
=> reader.GetChars(reader.GetOrdinal(name), dataOffset, buffer, bufferOffset, length);
public static DbDataReader GetData(this DbDataReader reader, string name)
=> reader.GetData(reader.GetOrdinal(name));
public static string GetDataTypeName(this DbDataReader reader, string name)
=> reader.GetDataTypeName(reader.GetOrdinal(name));
public static DateTime GetDateTime(this DbDataReader reader, string name)
=> reader.GetDateTime(reader.GetOrdinal(name));
public static decimal GetDecimal(this DbDataReader reader, string name)
=> reader.GetDecimal(reader.GetOrdinal(name));
public static double GetDouble(this DbDataReader reader, string name)
=> reader.GetDouble(reader.GetOrdinal(name));
public static Type GetFieldType(this DbDataReader reader, string name)
=> reader.GetFieldType(reader.GetOrdinal(name));
public static T GetFieldValue<T>(this DbDataReader reader, string name)
=> reader.GetFieldValue<T>(reader.GetOrdinal(name));
public static float GetFloat(this DbDataReader reader, string name)
=> reader.GetFloat(reader.GetOrdinal(name));
public static Guid GetGuid(this DbDataReader reader, string name)
=> reader.GetGuid(reader.GetOrdinal(name));
public static short GetInt16(this DbDataReader reader, string name)
=> reader.GetInt16(reader.GetOrdinal(name));
public static int GetInt32(this DbDataReader reader, string name)
=> reader.GetInt32(reader.GetOrdinal(name));
public static long GetInt64(this DbDataReader reader, string name)
=> reader.GetInt64(reader.GetOrdinal(name));
public static Stream GetStream(this DbDataReader reader, string name)
=> reader.GetStream(reader.GetOrdinal(name));
public static string GetString(this DbDataReader reader, string name)
=> reader.GetString(reader.GetOrdinal(name));
public static TextReader GetTextReader(this DbDataReader reader, string name)
=> reader.GetTextReader(reader.GetOrdinal(name));
}
It could definitely be extensions, but is there a good reason to have something as an extension when it could be built-in? It's a bit like how the async methods always have their non-virtual overload without a cancellation token, which delegates to the virtual overload that does...
is there a good reason to have something as an extension when it could be built-in?
The only get compiled on use (rather than whole class for instance level) and they can be added downlevel via nuget; whereas changing the base needs a rev to the runtime.
If they are only using the public method and non-virtual, is there any reason to be an instance method?
is there a good reason to have something as an extension when it could be built-in?
The only get compiled on use (rather than whole class for instance level)
Are you talking about JIT compilation? Doesn't that only happen on first use of the method?
and they can be added downlevel via nuget; whereas changing the base needs a rev to the runtime.
That's true, but which nuget would you put them in? Would you expect users to depend on a nuget just to get this kind of utility? The last could possibly be mitigated by having some sort of general "BCL extensions package"... I also don't particularly mind for this to only make it into the next .NET Core (and .NET Standard).
Also, it's a bit concerning to see core/fundamental APIs migrate out of the BCL and into extension nugets just because they can be implemented as extensions... It hurts discoverability (doesn't show up in IntelliSense unless you've referenced the nuget), the extensions don't show up on the DbDataReader docs (I think), etc.
If they are only using the public method and non-virtual, is there any reason to be an instance method?
Nothing beyond general object-oriented design principles... From the user's point of view both GetInt32(int)
and GetInt32(string)
are equivalent/similar, so it seems like they should be defined in the same place.
I guess it's a very general design discussion... It feels like I've seen many utility overloads in the BCL as instance methods that could be extension methods (non-virtual, using public APIs), the async overload without cancellation token is just one example. We also have cases of extension methods in external nugets, but that seems to be a way to add necessary APIs without revving the runtime version - but these proposed APIs aren't exactly necessary/urgent.
is there a good reason to have something as an extension when it could be built-in?
The only get compiled on use (rather than whole class for instance level)
Are you talking about JIT compilation? Doesn't that only happen on first use of the method?
I think its more of a problem for generic classes; and may depend it if its being distrubuted as part of the runtime (AoT/Crossgen/Ready-To-Run)? Not sure of the specifics here /cc @jkotas
That's true, but which nuget would you put them in?
System.Data.Extensions
It hurts discoverability
Span is probably a good example as most of its extra methods are in Extension Methods both listed in docs on the Span page and come up as methods in IntelliSense as they are in the same namespace.
The extensions can be inboxed for a runtime/standard to be discoverable by default (without reference); and available as a Nuget to immediately be available on every runtime for maximum reach (netcore, netfx, Mono, Unity, iOS, Android, etc).
Also, it's a bit concerning to see core/fundamental APIs migrate out of the BCL and into extension nugets just because they can be implemented as extensions...
but these proposed APIs aren't exactly necessary/urgent.
Are they core/fundamental or are they just convenience wrappers over the current public api?
If they were virtual; then they'd definitely need to be on the type.
If they are non-virtual and just calling public methods; other than a tooling issue (discovery), what's the advantage of forcing every runtime to copy paste them and forcing users to wait for that runtime to rev to a version that includes the apis? (e.g. .NET Core 3.0; some unknown future version of Mono, some unknown future version of .NET Standard) when it could be a nuget targeting current NET Standard and available everywhere on the first release?
I've seen many utility overloads in the BCL as instance methods that could be extension methods (non-virtual, using public APIs), the async overload without cancellation token is just one example.
If they are added at the same time; is there an advantage to creating an extension class with single methods? (Maybe so you know its just a call through, rather than its own implementation).
Whereas this is an additional 24 methods to a preexisting class, all public api wrappers, so has more bulk to the extension class?
Not sure if there is specific guidance on non-virtual base class instance vs extension somewhere? /cc @terrajobst
I think its more of a problem for generic classes; and may depend it if its being distrubuted as part of the runtime (AoT/Crossgen/Ready-To-Run)? Not sure of the specifics here /cc @jkotas
Right, this is a problem of the frequently used generic types like Dictionary
. I do not think this concern applies here. I think these can be instance methods just fine.
That's true, but which nuget would you put them in?
System.Data.Extensions
Right, so we're talking about a new nuget here... Do we really want to basically start having a nuget extensions package for each BCL namespace? More nugets for people to discover and reference, more potential for the dependency hell inherent in additional dependencies...
Span is probably a good example as most of its extra methods are in Extension Methods both listed in docs on the Span page and come up as methods in IntelliSense as they are in the same namespace.
The extensions can be inboxed for a runtime/standard to be discoverable by default (without reference); and available as a Nuget to immediately be available on every runtime for maximum reach (netcore, netfx, Mono, Unity, iOS, Android, etc).
That's indeed a good example. But with Span an important goal was to make the new methods available backwards, for all existing platforms, and that can obviously be done only via extensions. Here I don't think we have this goal at all - it's a convenience wrapper that can be added going forward.
but these proposed APIs aren't exactly necessary/urgent.
Are they core/fundamental or are they just convenience wrappers over the current public api?
I don't think that convenience wrappers can't be core/fundamental. The point is that the class in question, DbDataReader, is definitely core/fundamental, and getting a column value by name seems to be completely basic functionality that shouldn't involve another nuget. The case of the async cancellation token overloads comes to mind again (I know I'm repeating myself...).
I've seen many utility overloads in the BCL as instance methods that could be extension methods (non-virtual, using public APIs), the async overload without cancellation token is just one example.
If they are added at the same time; is there an advantage to creating an extension class with single methods? (Maybe so you know its just a call through, rather than its own implementation).
I don't see the need for users to know whether that something is just a call through or not... From the users' perspective that seems like an unimportant implementation detail.
To summarize, it seems that perf isn't an argument in either direction. Adding this as extension methods via an external nuget is problematic in that there's an extra dependency and in that discoverability is hurt, whereas adding this as instance methods only makes it available going forward (and not to older versions).
Seems like it would indeed be good to have some input from someone like @terrajobst...
Do we really want to basically start having a nuget extensions package for each BCL namespace?
We have been on this plan (adding NuGets with extensions) for a while in the past. It did not work well. We are not doing that anymore.
Adding this as instance methods only makes it available going forward
Correct. We are fine with this constrain.
Instance methods it is then :)
@roji do you want to specify the exact api shape (new methods) in the summary (first item)?
Sure, you basically did all the work for me :)
@jkotas out of curiosity, what were the specific issues with the extension NuGet approach that made you drop it? Just interested if there's an additional con I haven't thought of.
We typically needed to fold the addition back into the platform, e.g. because of it is too valuable API to not have by default; or to optimize the implementation in a way that is not possible to when it is an extension. The end result was a lot of small .dlls (you cannot get rid of the extra NuGet/assembly even once it is folded into the platform), and version mixes that were impossible to reason about during roll-forward.
@jkotas & @karelz, I'm curious, why this one is added to Future milestone? This is an easy implementation that does not break people and adds value. There is no project I recently worked on without adding my own Get*(string column)
extensions :)
I can implement this if you're interested.
https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/api-review-process.md describes the process to add new APIs.
This one is on System.Data area owners to move this forward.
cc @divega @ajcvickers @keeratsingh @afsanehr @david-engel
OK, thanks. I'll wait for a comment from the owners then :)
This looks like a good addition to me. I don't see any reason we wouldn't take a well-written PR for it. /cc @roji @divega
@ajcvickers If you like the proposal, you should mark it as api-ready-for-review
and get it approved first (see https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/api-review-process.md).
I was intending to submit a PR on this in a few weeks, but if someone else has time that's great.
@ajcvickers @David-Engel @keeratsingh @AfsanehR can you please agree on the API shape from System.Data ownership point of view and if you do, mark it as api-ready-for-review as suggested above?
We want to avoid surprise API changes after community invested time into PR submission. Thanks!
@karelz I would like to hear back from @divega since while I own from an engineering side, he is the PM owner for .NET data.
I'll defer API suggestions in System.Data to @divega also. We service the code, but he should review API surface area suggestions.
@roji @ajcvickers @David-Engel, Sorry I didn't respond before. I am a bit surprised I didn't see all the mentions :smile:
I looked at the signatures, they seem correct, and I am confident that if there is anything missing or incorrect @roji will find it while his coding it. So @roji sure, you can send a PR :smile:
BTW, I think this is a small improvement, but also low hanging fruit. I only have a couple of comments I would like to share on the proposal, and if anyone has feedback on these, I am interested in learning what you think:
Even after this improvemnt, this type still feels old odd. If I am writing an application, I am personally not going to want to use this API _type_ directly. And if I am writing and O/RM, I would probably avoid the proposed string methods in most cases for performance (I'd rather build a materializer using GetOrdinal and then avoid repeating the lookup for each field in each row). I wonder what kind of data reader abstraction we could design if we started again in 2019, given that we have now figured out generics, IEnumerable
, Nullable<T>
, streaming with reusable ValueTask<T>
(similar to the IAsyncEnumerable<T>
proposal), and even the TryGet
pattern?
It also feels odd that we are extending this API over the ordinal vs. field name pivot, when we have recently avoided doing it in other possible dimensions. For example, if you want to get a field value asynchronously, your only option is to use the generic GetFieldValueAsync
. We added IsDBNullAsync
because it is necessary (you have to call it defensively before GetFieldValueAsync
, yet another usability issue in the API), but we didn't add GetBooleanAsync
, or GetByteAsync
, we haven't updated the equivalent APIs in DbDataRecord for several years, etc. Though I totally get that needing to get a field value asynchronously is less common than doing it synchronously (you should only ever do it if you are reading with SequentialAccess), it is weird that everything is so inconsistent.
@divega if you are OK with the API, please get formal API approval (mark it as api-ready-for-review & join us in Tue meeting - cc Immo)
Once approved, it will be good time to put up PR (just to avoid wasting @roji's time in case the API approval asks for adjustments ...)
If I am writing an application, I am personally not going to want to use this API directly.
For single row data this is a bit of clunky usage pattern
int field0 = reader.GetInt32(reader.GetOrdinal("field0"));
// or
int field1Index = reader.GetOrdinal("field1");
int field1 = reader.GetOrdinal(field1Index);
vs the proposed
int field0 = reader.GetInt32("field0");
The current named column option as it goes via boxing gives unhelpful errors when casting:
long field0 = (long)reader["field0"];
InvalidCastException: Specified cast is not valid.
(as field is int
and can't be cast boxed in object to long
)
I wonder what kind of data reader abstraction we could design if ...
Interesting 😄
We added
IsDBNullAsync
because it is necessary (you have to call it defensively beforeGetFieldValueAsync
Would using nullables with the generic be valid for this? (Although not helpful for string
necessarily)
int field0 = await reader.GetFieldValueAsync<int?>(0);
For single row data this is a bit of clunky usage pattern
Yes, I am not disagreeing that it is an improvement for some scenario. I meant I would not want to use DbDataReader without an O/RM or micro-O/RM in an application at all :smile:
Would using nullables with the generic be valid for this?
Maybe. I am honestly not sure what new trade offs we would make. Perhaps some things would remain the same. Are non-generic methods for common (even if not universally supported) types easier to use anyway? Is the distinction between null and DBNull a necessary evil?
@karelz done, thanks!
@divega and @roji, can I give this one a shot?
@divega,
Even after this improvement, this type still feels odd. If I am writting an application, I am personaly not going to want to use this API type directly. And if I am writting and O/RM, I would probably avoid the proposed string methods in most cases for perf (I'd rather build a materializer using GetOrdinal and then avoid repeating the lookup for each field in each row). I wonder what kind of data reader abstraction we could design if we started now that we have figured out generics, IEnumerable, Nullable
, reusable ValueTask (as the IAsyncEnumerable proposal), and even the TryGet pattern?
I definitely think it's worth thinking about a better/more modern data reader abstraction (we've talked about this from time to time). At the same time, I've done some coding directly against DbDataReader and IMHO it isn't that bad for low-level access :) It's true that every time I do it I add some extensions (like the proposed string-based overloads) to improve the code, but the general concept still seems OK/valid (and corresponds to patterns in other languages). In any case, it seems like DbDataReader is here to stay, and it would be good for it to become a more usable API, especially as this is such a trivial low hanging fruit.
Regarding the perf note, I'd expect responsible ADO.NET drivers to cache the string-to-ordinal mapping (inside GetOrdinal()
), making this argument a bit less powerful (but not invalid).
[...] but we didn't add GetBooleanAsync, or GetByteAsync, etc. I totally get that needing to get a field value asynchronously is less common (you should only ever do it if you are reading with SequentialAccess), though.
I'd guess this could be because once the generic GetFieldValue<T>()
was introduced, the specialized getters (GetBoolean()
) became much less useful, and it wasn't worth adding all the async overloads (especially since, as you say, they only make sense for SequentialAccess, which is rare). I guess the first argument is valid for the string-based overloads too (in which case we'd really consider the specialized getters deprecated, and only introduce string overloads for GetFieldValue{,Async}()
, IsDBNull{,Async}()
). However, the specialized getters seem vastly more useful then async field access, so this would potentially simplify a lot more user code.
@benaadams,
Would using nullables with the generic be valid for this? (Although not helpful for string necessarily)
int field0 = await reader.GetFieldValueAsync
(0);
We actually implemented exactly this feature in Npgsql (see https://github.com/npgsql/npgsql/issues/1728). To me this does seem to be an improvement over forcing users to call IsDBNull()
for value types (maybe in C# 8 also for reference types?). However, it's not something that is represented/enforced in the API shape, so it's up to each ADO provider to either support this or not, so if you use this your code is less portable. In any case this seems orthogonal to the string-based overload discussion...
@TheBlueSky, I'm not sure but I think that now that this has the api-ready-for-review
label it will be discussed in API review, and it's best to wait until then before spending any time on this (maybe this will get rejected or modified).
I've done some coding directly against DbDataReader and IMHO it isn't that bad for low-level access...
@roji ok, if you say so :smile:
@TheBlueSky what @roji and @karelz said, let’s wait until the API gets reviewed.
Apparently we had missed moving this to the 3.0 milestone for consideration in API review. Moved it now.
FWIW I already had to implement all these overloads in MySqlConnector for API compatibility with Oracle's Connector/NET (which had independently decided that they would be useful overloads): mysql-net/MySqlConnector#435.
I'd guess this could be because once the generic GetFieldValue
() was introduced, the specialized getters (GetBoolean()) became much less useful, and it wasn't worth adding all the async overloads (especially since, as you say, they only make sense for SequentialAccess, which is rare).
SequentialAccess is used by ORMs. Since they tightly control both the SQL generation and the data reading, they can afford the extra effort of ensuring the columns are read in order.
However, I don't like using async methods for reading specific fields because of the Task overhead. If we were to offer async version of GetBoolean() and friends, I would want them to be ValueTask instead of Task.
@Grauenwolf you're right that SequentialAccess is important (for ORMs and otherwise) - the above comment was only to hypothesize on why no type-specific async getters (e.g. GetBooleanAsync()
) exist in the current API. I don't think it's very important - IMHO since the generic GetFieldValue<T>()
and GetFieldValueAsync<T>()
already exist, I don't see any reason to introduce any new type-specific getters.
~Regarding Task vs. ValueTask, it's true that a method similar to GetFieldValueAsync<T>()
returning a VauleTask (as opposed to Task) could be useful. It's worth noting that this would provide a perf improvement only when actual I/O needs to be done to read the column - if it's already in memory (as it many times is), the returned Task can already be cached and does not involve any perf hit.~
It's worth noting that this would provide a perf improvement only when actual I/O needs to be done to read the column - if it's already in memory (as it many times is), the returned Task can already be cached and does not involve any perf hit.
Surely the inverse? A cached Task<T>
would only be useful if you are reading the column a second time; for some reason, or the data is not changing much per row.
Whereas ValueTask<T>
has the benefit when you already have the data; so don't need to allocate a Task<T>
to return it.
@roji @benaadams Yeah, it is too easy to get things flipped in your mental model when we are talking about so many variables :smile: Caching of Task<T>
only applies when there is a small number of possible values of T
, e.g. the Task<book>
returned by ReadAsync
.
Issue https://github.com/dotnet/corefx/issues/15011 is about having ValueTask<T>
versions of some of the field-level methods.
Note that it was written at a time the common practice was to create ValueTask<T>
backed by instances of either T
or a Task<T>
.
Now based on the work for done in 2.2, we have patterns that allow minimizing allocations when using ValueTask<T>
, which we can leverage for this.
(edit: the following paragraph was the result of misunderstanding how things work)
I haven’t thought it trough, but it is interesting to think that you could get all the ValueTask instances from the reader upfront, and then then just advance the sliding window to get the values for each row. The ValueTask then becomes an async value accessor that you only need to get one for each field in the reader. ML.NET uses a similar pattern in its data stack (only it is not async and the accessors as just Funcs).
@benaadams yes of course, I was mixing in the conversation about ReadAsync(), apologies. A method analogous to GetFieldValueAsync<T>()
returning a ValueTask would indeed be valuable regardless of whether the value is already present in memory. It's worth noting again that this is all relevant only when SequentialAccess is specified, either the user can simply use the non-async GetFieldValue()
and assume that no I/O will happen.
We concluded that we want to leave these ones off for now. The reason being that ADO.NET really wants to expose a ValueTask<T>
as opposed to Task<T>
but we'll need to revisit how we'd plumb this through:
```C#
public Task
public Task
public Task
public Task
This one is a mistake:
```C#
public int GetProviderSpecificValues(object[] values);
Which leaves us with this:
C#
public partial class DbDataReader
{
public bool GetBoolean(string name);
public byte GetByte(string name);
public long GetBytes(string name, long dataOffset, byte[] buffer, int bufferOffset, int length);
public char GetChar(string name);
public long GetChars(string name, long dataOffset, char[] buffer, int bufferOffset, int length);
public DbDataReader GetData(string name);
public string GetDataTypeName(string name);
public DateTime GetDateTime(string name);
public DateTime GetDbDataReader(string name);
public decimal GetDecimal(string name);
public double GetDouble(string name);
public Type GetFieldType(string name);
public T GetFieldValue<T>(string name);
public float GetFloat(string name);
public Guid GetGuid(string name);
public short GetInt16(string name);
public int GetInt32(string name);
public long GetInt64(string name);
public Type GetProviderSpecificFieldType(string name);
public object GetProviderSpecificValue(string name);
public Stream GetStream(string name);
public string GetString(string name);
public TextReader GetTextReader(string name);
public object GetValue(string name);
public bool IsDBNull(string name);
}
IMHO since the generic
GetFieldValue<T>()
andGetFieldValueAsync<T>()
already exist, I don't see any reason to introduce any new type-specific getters.
Those currently require boxing internally (at least the SqlClient impl). The type-specific getters avoid that.
IMHO since the generic GetFieldValue
() and GetFieldValueAsync () already exist, I don't see any reason to introduce any new type-specific getters. Those currently require boxing internally (at least the SqlClient impl). The type-specific getters avoid that.
That seems like an SqlClient implementation detail, no? Npgsql doesn't box in these implementations...
The implementation uses an SqlBuffer
which contains a discriminated union of common value types rather than throwing everything in an object. If you request an int and it's typed as an int in the metadata it should not box to return it. There's a ton of other stuff that allocates when you read data but that one doesn't as far as I can see.
That seems like an SqlClient implementation detail, no? Npgsql doesn't box in these implementations...
@roji I'd be curious to see the npgsql implementation of those. Is it a big switch on the typeof(T)
, or something more elegant? I remember when I saw those generic field accessors I was curious how they were implemented in SqlClient, because doing it without boxing didn't seem straightforward to me. Internally it just calls GetValueFromSqlBufferInternal which returns "object", the generic method calls that and casts to (T).
@Wraith2 No boxing happens if you access the int
via GetInt32. If you access it via GetFieldValue
I see it now, GetFieldValueFromSqlBufferInternal
. Since that's generic though it should be possible to use jit branch elimination to prevent that. I'll investigate when I get chance.
@roji And the answer to my question is: "something more elegant". The column metadata has a handler that implements a concrete interface to read the specific type of that column. This concrete implementation is used in a generic way to get the T without boxing. @Wraith2 I don't know anything about "jit branch elimination", but the npgsql pattern was what I was expecting to find in the SqlClient implementation.
Can you point me at the pattern you were expecting?
@MarkPflug the internal Npgsql architecture seems like it's a bit simpler... There's a type handler hierarchy, with a handler being a generic class that knows how to read and write a certain type. When you call GetFieldValue()
, a type handler has already been resolved for that column based on the type OID sent from PostgreSQL, so we simply call into the resolved generic handler, which proceeds to read the data from the network buffer, parse it and return the requested type. It can get a bit elaborate at times, but overall it's pretty straightforward.
Long story short, there simply isn't any sort of non-generic SqlBuffer
which contains an arbitrarily-typed value - (almost) everything is generic.
@Wraith2 This line: https://github.com/npgsql/npgsql/blob/2b417165e69bd5187b1b067b6981f233f06824b6/src/Npgsql/NpgsqlDefaultDataReader.cs#L95.
Which calls into a abstract Read
https://github.com/npgsql/npgsql/blob/2b417165e69bd5187b1b067b6981f233f06824b6/src/Npgsql/TypeHandling/NpgsqlSimpleTypeHandler.cs#L102
Each of the concrete "handlers" implements INpgsqlSimpleTypeHandler
@roji Yeah, it isn't that complicated, but it definitely is an elegant application of generics.
That pattern should be able to be applied to the SqlClient code without any public API change to remove the boxing overhead from GetFieldValue
Now I remember why I was looking at that code path in SqlClient: we deal with Guid as binary(16) in our database (don't ask), and I was trying to figure out how sql "swizzles" the byte order on Guids. I noticed that SqlGuid is stored as an "object" in the SqlBuffer:
https://github.com/dotnet/corefx/blob/9df66573b5b80362b21103d44bc87c40042e9662/src/System.Data.SqlClient/src/System/Data/SqlClient/SqlBuffer.cs#L112
So Guids will always be boxed when read from a SqlDataReader. Can't the Storage
struct (union) be expanded to accommodate SqlGuid? It looks like it should already fit in the existing struct, since NumericInfo is pretty sizable (19 bytes or is it padded to 20?).
@roji, I'll give the implementation a go, is that okay?
public DateTime GetDbDataReader(string name);
This looks like a typo (the return type is wrong?).
I also think it should not be public because its sibling method GetDbDataReader(int ordinal)
is currently protected
. (Or should be dropped entirely, and the functionality provided by GetData(string name)
.)
@bgrainger agreed, apart from the return type typo, this seems like a very obscure method only implemented in OleDbDataReader, and delegating to GetData()
even there. I think we should leave it out.
@TheBlueSky sure, feel free to submit a PR - I can give it a first review.
@roji @divega
Is the plan to wait for a community member to pick this up? If we complete this in time for .NET Core 3.0 we could also expose it from .NET Standard 2.1 but this window is also closing.
@terrajobst @TheBlueSky has picked it up and has a PR pending https://github.com/dotnet/corefx/pull/35211 which needs some assistance to work out why the build is failing.
@terrajobst, I created a pull request that implements this; however, I have some difficulty in figuring out why certain targets cannot see the new overloads.
Could you assist on this? I tagged couple of people but got no response :(
PR doesn't work on NETFX and UWP so feels like a packaging issue?
Looking at the output dll it doesn't have PreferInBox assembly level attribute, It could well be, since it isn't in the standard version that netfx implements inbox would cause the error. Does UAP work the same way? what actual runtime/ref is the target there?
PR merged for .NET Core 3.0. Leaving issue open for @divega to track remaining work.
The primitive boxing issue @MarkPflug and I were discussing earlier in the thread has been addressed in SqlClient with https://github.com/dotnet/corefx/pull/34999 , it just needs reviewing and integrating. I've got a number of PR's open on SqlClient that would be good to get into 3.0 some of which have been open for several months and just need attention.
@TheBlueSky thanks for your work on the PR.
Unfortunately, once this was merged Entity Framework Core broke, since it was using reflection to look up the methods, and assuming that only one method exists with the given name (e.g. DbDataReader.GetInt32()
). While this has been fixed in EF Core (https://github.com/aspnet/EntityFrameworkCore/pull/14853) it has raised the question of the backwards compatibility bar, and if EF Core broke over this change, other O/RMs and data access layer could as well.
The proposed solution is simply to implement the new methods as extension methods rather than adding them to the type (as originally proposed by @benaadams in this issue :)). Note that these extension methods would be part of the System.Data.Common assembly.
@TheBlueSky are you interested in making this additional change?
@roji, I am interested in making this change.
I have a question related to EF Core issue. What makes this "special"? Is it because we got to know about it, or is there a rule on when and where we can add new APIs? I mean any additional API in any random Core FX class can break someone's library.
@TheBlueSky please hold off for a day or two before actually implementing anything, we're still discussing the level of backwards compatibility we want to provide.
I have a question related to EF Core issue. What makes this "special"? Is it because we got to know about it, or is there a rule on when and where we can add new APIs? I mean any additional API in any random Core FX class can break someone's library
You're right, and I think there aren't any completely clear-cut rules about this. It's more of a pragmatic choice trying to consider the risk of breakage - and in this case we do have the indicator that a major O/RM was broken by it. It's definitely not an easy decision.
Discussed again with @divega and we feel that this proposal should indeed include the string overloads for GetFieldValueAsync()
and IsDBNullAsync()
, returning Task. The idea here would be to maintain API consistency - if all other column getters have these overloads, we don't see why this one shouldn't. We think that other overloads returning ValueTask (and possibly behaving better with regards to null) can and should be introduced, but see this as a separate question - those overloads would be introduced with new names in any case.
Otherwise there's still the question of implementing via methods on DbDataReader or extensions.
@divega can you please mark ready-for-review?
@roji I have added the label before I forget, but could you please make sure before Tuesday we have the right set of APIs for review on the first comment? E.g. the extension methods, including the two existing async methods? I can close the issue I created for the async ones afterwards.
I was also re-reading the initial comments from @benaadams and now that we have to do extension methods anyway, I realize the benefits he mentioned will be cool :smile:
@roji let me know when we can review and I will ping @terrajobst to get a slot to re-review this one.
@divega sorry this took so long. I have updated the proposal above (both for extensions and for adding back the two async overloads), this should be ready for review again.
FWIW I'm still not entirely convinced about the extensions, but it's not very important :) When the option of extensions was originally discussed, I think one of the main points was bringing this functionality to older TFMs as well by putting the extensions in an external nuget. We're not going to do that, so I'm not sure there's any advantage to extensions beyond preventing breakage. But I admit there's no big downside either :)
The extensions could always be put into a source nuget for older TFMs if needed. I know i've written those extensions several times so people may not need them in older projects but if we prevent them having to find and pull in those snippets of code in new projects i think it's a success.
@divega removing api-approved as you apparently need another round of API review ...
@roji @Wraith2 I believe since DbDataReaderExtensions already exists and ships in the System.Data.Common package as a real implementation (as opposed to type-forwarding), these extension methods will probably be available everywhere.
Anyway, we can go over the details in the API review.
Sure, sounds like a good place to put them.
@TheBlueSky assuming you're still interested, can you please submit a PR removing the new string overloads out of DbDataReader and reintroducing them as extension methods?
@roji, yeah, I'll do that.
For the record, I don't see the benefit of having them as extensions if they are going to be baked in and not distributed as a NuGet package. I always thought of extension methods as a way to provide additional functionality to a class where there is a limitation or to provide functionalities that are specific to the extender. In this case, it's just to prevent breaking a hypothetical 3rd party library.
I hope this case won't set a precedent for how future Core FX APIs are decided.
@TheBlueSky if you read above in the thread, there are possible advantages of extension methods. But the real reason we switch to extension methods is compatibility with reflection code that assumes there is a single instance method with each name.
@TheBlueSky we got approval for changing this as described in the proposal - will you be able to submit a PR for that early next week? This includes both the switch to extension methods and the additional async methods.
This really needs to happen soon as it's breaking various users, if you're too busy please let me know and I'll take care of it.
Thanks!
@roji, will do that.
@roji, I created a PR; however, although builds and tests pass, some checks fail at "Send to Helix" step. Could you have a look.
the returned Task can already be cached and does not involve any perf hit.
Are you sure about that? Wouldn't you have to cache a separate Task object
for every possible value?
On Sat, Jan 12, 2019 at 9:52 AM Shay Rojansky notifications@github.com
wrote:
@Grauenwolf https://github.com/Grauenwolf you're right that
SequentialAccess is important (for ORMs and otherwise) - the above comment
was only to hypothesize on why no type-specific getters (e.g. GetBoolean())
exist in the current API. I don't think it's very important - IMHO since
the generic GetFieldValue() and GetFieldValueAsync () already exist,
I don't see any reason to introduce any new type-specific getters.Regarding Task vs. ValueTask, it's true that a method similar to
GetFieldValueAsync() returning a VauleTask (as opposed to Task) could
be useful. It's worth noting that this would provide a perf improvement
only when actual I/O needs to be done to read the column - if it's already
in memory (as it many times is), the returned Task can already be cached
and does not involve any perf hit.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/dotnet/corefx/issues/31595#issuecomment-453767528,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AKIqTcpuKnC_KKPl6mgRtwa7rw3OYKqBks5vCiDhgaJpZM4Vu275
.
@Grauenwolf I assume you're reacting to https://github.com/dotnet/corefx/issues/31595#issuecomment-453767528? If so yeah, I mixed up two discussions and my comment there wasn't relevant (I've already added strike-through).
@roji I think this issue was addressed. Can we close it now?
Yes, finally... :)
Most helpful comment
@roji, yeah, I'll do that.
For the record, I don't see the benefit of having them as extensions if they are going to be baked in and not distributed as a NuGet package. I always thought of extension methods as a way to provide additional functionality to a class where there is a limitation or to provide functionalities that are specific to the extender. In this case, it's just to prevent breaking a hypothetical 3rd party library.
I hope this case won't set a precedent for how future Core FX APIs are decided.