We already have the following "convert to .NET type" APIs on Utf8JsonReader:
```C#
public ref partial struct Utf8JsonReader
{
// These throw InvalidOperationException if there is a token type mismatch
public string GetStringValue();
public bool GetBooleanValue();
// These throw InvalidOperationException if there is a token type mismatch (i.e. not JsonTokenType.Number).
public bool TryGetInt32Value(out int value);
public bool TryGetInt64Value(out long value);
public bool TryGetSingleValue(out float value);
public bool TryGetDoubleValue(out double value);
public bool TryGetDecimalValue(out decimal value);
}
**Add the following:**
```C#
public ref partial struct Utf8JsonReader
{
// These throw InvalidOperationException if there is a token type mismatch.
// These throw OverflowException if the number value doesn't fit into the .NET primitive.
public int GetInt32Value();
public long GetInt64Value();
public float GetSingleValue();
public double GetDoubleValue();
public decimal GetDecimalValue();
}
Questions:
Value from the names?GetValue(out X value) and TryGetValue(out X value)GetInt16, GetUInt64, etc.?cc @JeremyKuhne, @GrabYourPitchforks, @KrzysztofCwalina, @stephentoub, @terrajobst , @bartonjs
Do you have a customer for JSON+BigInteger? If not, I'd hold off, just to keep the layering more flexible.
public int GetInt32() looks like it has more precedent than public int GetInt32Value().
So, maybe
```C#
public ref partial struct Utf8JsonReader
{
public bool TryGetValue(out BigInteger value);
// Similar for the others
public int GetInt32();
public long GetInt64();
public float GetSingle();
public double GetDouble();
public decimal GetDecimal();
public BigInteger GetBigInteger();
}
```
Do you have a customer for JSON+BigInteger? If not, I'd hold off, just to keep the layering more flexible.
No, not yet. How would adding it reduce the flexibility of this layer?
In any case, I split off just the BigInteger API into a separate issue: https://github.com/dotnet/corefx/issues/33458
What should the semantics of these APIs be (particularly the TryGet APIs)? Should we throw on token type mismatch or return false?
For context: https://github.com/dotnet/corefx/pull/33216#discussion_r231570155
From @stephentoub:
I'm confused by the
Tryprefix now. In what situations do we returnfalsefrom theseTrymethods?
From @ahsonkhan:
Yes, agreed. Throwing muddied the pattern. We return false if the underlying
Utf8Parserreturned false (for example if precision too high) OR if we didn't consume the entire ValueSpan (for example if we callTryParse(..., out int, ...)on a double which contains a decimal).
From @stephentoub:
It seems strange to have two failure modes, when both failure modes represent different forms of "I was unable to parse the data as the thing you wanted". Seems like we should either return false for both cases, or we should throw for both cases and make it a Get instead of TryGet method.
From @ahsonkhan :
I followed @bartonjs's suggestion to make this change: https://github.com/dotnet/corefx/pull/33216#discussion_r231328722
If we decide to revert this, I would go back to returning false in all cases and following the TryGet pattern.
From @stephentoub:
which means I don't know what to do as a caller
The caller can disambiguate if they need to by checking the token type first. I don't see this as a case of the
Trybool being used to mean two different things, e.g. I ask it to parse "9876543210" as an Int32, and I ask it to parse "987654321d" as an Int32... I don't see why I should expect false for the first case and an exception for the latter. int.TryParse, for example, doesn't throw if it hits something non-numeric, it returns false. The bool just means "could I parse the data as the requested format or not".
From @bartonjs:
Framework Design Guidelines (2ed), 7.5.2 (Try-Parse Pattern)
When using this pattern, it is important to define the try functionality in strict terms. If the member fails for any reason other than the well-defined try, the member must still throw a corresponding exception.
(Emphasis mine)
@KrzysztofCwalina said that what this means is something like "every return false should imply the same recovery action". In the case of Span-filling methods it means "there's not enough space". My suggestion here is that false means "this is a JSON number, but not one that fits in Int32" (so try Int64 or double), but that isn't the same recovery that you'd take if the current token type was PropertyName.
From @stephentoub:
My suggestion here is that false means "this is a JSON number, but not one that fits in Int32"
And mine is: "this cannot be parsed as an Int32" :) I'm not sure why it's valuable to distinguish whether it's too big to be an Int32 or contains non-digits by returning false for one and throwing an exception for the other.
From @bartonjs:
Non-digits shouldn't be an issue here (unless the data was modified after the call to Next), since the reader verified that the number was a legal BigDecimal before saying it found a number.
{ "value": 1234d }would have been a JsonReaderException during Next().
- Not in the state table here
{ "value": "1234" }just completely isn't a number, why are you asking if it's an int?
- (I'm suggesting this should throw)
{ "value": 2147483648 }is too big for int32 (but not int64 or double)
- return false, caller throws because it's a schema violation, or calls TryGetDouble/etc.
{ "value": 123.4 }is non-integral
- same as above
{ "value": (a 600 byte long series of digits) }
- TryGetInt32 returns false, TryGetInt64 returns false, TryGetDouble returns false, caller now decides what to do with a value that's too big (BigInteger would be valid for this case)
Essentially, that false is a data problem, throw is a state problem (indicating the question made no sense)
From @stephentoub:
That just adds conceptual overhead that doesn't buy me anything.
The reader has a TryGetValueAsInt32 method. Conceptually why should that behave any differently from e.g.
int.TryParse(ValueSpan)? I'm asking it to try to parseValueas an Int32.
From @KrzysztofCwalina:
The excerpt from design guidelines should be interpreted as follows: normally APIs should throw exceptions when the API errors. Such APIs should throw a different exception type for each error that can be handled independently and the same exception for all errors that are handled in the same way. TryXxx pattern allows to replace one and only one of these independently handled error conditions with false return value, i.e. do not replace what otherwise would be more than one exception type with [single] false return.
From @stephentoub:
Right. I think our disagreement here stems from what does it actually mean for there to be "different" errors. From my perspective, we've already decided what that categorization means, with APIs like int.TryParse, BinaryPrimitives.TryRead, etc., returning true if the data could be parsed as the target type and otherwise false. I'm not sure why we'd make a different categorization here.
From @bartonjs:
To me the difference is the reader state. My model is the state of "I think I'm looking at a property" means calling GetValueAsInt32() would be
InvalidOperationException, and "this is too big" would beFormatException(let's pretend for a moment that int.Parse unified OverflowException and FormatException).
My feeling is that the caller has a program error if they're calling TryGetValueAsInt32 when the reader is on PropertyName, hence exception.
I'm okay with either way, and really it comes down to how we think someone would use it... which we might not actually know at this juncture.
From @KrzysztofCwalina:
I think our disagreement here stems from what does it actually mean for there to be "different" errors.
Whoever thinks (@bartonjs?) there should be two separate errors should write a code sample showing how these two errors could be handled differently/independently in some real scenario. If such sample cannot be produced, we should treat these two conditions as one error.
From @bartonjs:
```C#
private static double GetValueAsPercentage(this Utf8JsonReader reader, int max)
{
if (reader.TryGetValueAsInt32(out int intVal))
return 100.0 * intVal / max;
if (reader.TryGetValueAsInt64(out long longVal))
return 100.0 * longVal / max;
if (reader.TryGetValueAsDouble(out double doubleVal))
return 100.0 * doubleVal / max;
// Vaporware extension method
if (reader.TryGetValueAsBigInteger(out BigInteger bigVal))
{
// Some BigInteger math goes here.
}
// Remaining cases:
// * The number is too precise (trim it, we're doing division)
// * The number is too close to zero (return +/- Epsilon)
// * The number is unfathomably large (1e999999999999999999999999)
throw new OverflowException("Value doesn't fit");
}
```
This code either needs to start with a "am I a number" (and throw), end with "am I an number?" (and throw something different), or just count on the framework to throw; or it's throwing the wrong exception.
If the expectation is that this code should have checked the TokenType first, then really that means that we don't expect anyone to call TryGetValueAsInt32 without checking that it's a number; so why _wouldn't_ it throw in that state? (At which point they don't need to, so I go back in time, so I don't, et cetera)
But I'm having trouble envisioning code that does the other thing... gets false because it's a string and has a sensible answer for moving on to TryGetValueAsString.
From @ahsonkhan :
Talked with @bartonjs offline and here is the conclusion:
- Rename
TryGetValueAsXtoTryGetXValuesince 'As' implies a quick re-interpretation, while we are actually transforming (and it isn't quick - requires encoding/parsing/etc.). This makes the .NET type conversion more explicit so it is clearer to the caller when we throw on mismatched token type.- Change
TryGetStringValueandTryGetBooleanValueto justGetStringValueandGetBooleanValuewhich throw and return the .NET type since these will never return false (no more out parameter).- Keep the "number"-based APIs as TryGetXValue returning bool, but these throw InvalidOperationException for mismatched token type.
- To support round-tripping (in the sense that someone can create a JSON payload from reading all the tokens of an existing valid JSON using these APIs and that JSON payload should remain valid. We want to make sure that when the new JSON payload is passed in again to the Utf8JsonReader, it shouldn't throw a JsonReaderException for invalid JSON), if Utf8Parser.TryParse(double) returns true with a
double.PositiveInfinityordouble.NegativeInfinityfor really large numbers (based on IEEE compliance work that @tannergooding is doing), we do not return that value, but instead return false.
In the interest of time, and to unblock this PR, I am going to go with this approach for now.
We could have a follow up discussion and change the semantics based on new API additions (GetXValue version of the TryGetXValue APIs and TryGetBigIntegerValue) during an API review: https://github.com/dotnet/corefx/issues/33320.
From @stephentoub:
but these throw InvalidOperationException for mismatched token type
I don't agree with the decision. I've heard the arguments, and I still do not believe this should diverge from an expectation that these methods are any different
Try-wise thanint.TryParseandBinaryPrimitives.TryReadon theValueSpan; I simply do not see a good enough reason to deviate from that mental model, which is the most easily explainable thing. But I'm not going to stand in the way of the PR.
How would adding it reduce the flexibility of this layer?
If there were need to move the JSON reader into System.Private.CoreLib it would require taking BigInteger with it. (I hope we don't need to move it down, but I'm just suggesting that there's a cost to adding dependencies)
Do we really need methods for Decimal parsing on Utf8JsonReader?
Decimal is specialized type just like BigInteger (and many other types one can thing of). We should enable parsing of all specialized types via composition: Get the Span with the value and pass it to the parse method of the specialized type.
Looks good. We should strip the Value suffix from both the Try- as well as the Get- methods -- it doesn't add anything.
We should also expose UInt32, UInt64. We shouldn't remove decimal, as usage seems common enough. BigInteger is a fine one to leave out.
We shouldn't remove decimal, as usage seems common enough.
What kind of data we have looked at? Is decimal common enough with Json?
@jkotas GitHub finds 9k hits: https://github.com/search?q=ReadAsDecimal+jsonreader&type=Code
Looking through the first several pages of https://github.com/search?q=ReadAsDecimal+jsonreader&type=Code, I have not found anybody who actually wanted to read decimal. The common patterns are:
ReadAsDecimal() and immediate cast to float or double. Reading it as float or double directly would be better.ReadAsDecimal() used to skip a value. There is better way to do that.ReadAsDecimal() wrapped by some auto generated tracing library.ReadAsDateTimeOffset has 9k hits as well: https://github.com/search?q=ReadAsDateTimeOffset+jsonreader&type=Code . So if we were to include Decimal based on github hits, we should include ReadAsDateTimeOffset as well...
I have not found anybody who actually wanted to read decimal.
Immo brought up the use of decimal is common when dealing with money:
https://github.com/LykkeCity/Lykke.Service.ExchangeConnector/blob/2129ce9fd00b8dcdb91e1b305720e551c3ce8fbd/src/Lykke.Service.ExchangeConnector/Exchanges/Concrete/Kraken/Entities/TradeData.cs#L78
ReadAsDateTimeOffset has 9k hits as well
Almost all of that is coming from TraceJsonReader (first 50+ out of 100 pages) or from tests.
May I pick this up? To me, it appeared that the discussion is not concluded or then the APIs haven't been updated in Description -> "Add the following". Appreciate inputs. Thanks!
As for the expected overflow exception, let's say in case of GetInt32, Utf8JsonReader.TryGetInt32Value method (here) deep down calls Utf8Parser.TryParseInt32D (here). The overflow information detected in this method (here) is not bubbled up from this method.
out bool overflow in these methods to send out that information?Utf8Parser.TryParse (here). But that being public API would require a new private (?) Utf8Parser.TryParseInternal to be called in existing Utf8Parser.TryParse and new Utf8Parser.GetInt32?Appreciate inputs if I am thinking in right direction.
Update: (In addition to previous comment) Utf8Parser.TryParse is part of System.Private.Corelib and thence it cannot have TryParseInternal (the one that mentioned in previous comment) as that too would have to be public to be usable in new method Utf8Parser.GetInt32 in CoreFx.
I am wondering how could the overflow detection logic inside Utf8Parser.TryParseInt32D an private in corelib be leveraged and the decision bubbled out.
You would need to test the result of Utf8Parser.TryParse(...) and if it returns false, throw the exception yourself within the Utf8JsonReader.GetX() APIs. You don't need to bubble up the overflow test explicitly or call internal methods.
You would need to test the result of
Utf8Parser.TryParse(...)and if it returns false, throw the exception yourself within theUtf8JsonReader.GetX()APIs. You don't need to bubble up the overflow test explicitly or call internal methods.
Yes, I actually considered doing that but there are exits (here, here and here) that return false but are not overflow conditions. Essentially there is information loss when false is returned with some conditions being genuine parsing issues while others being overflow conditions. But would it be proper to throw overflow exception regardless?
But would it be proper to throw overflow exception regardless?
In this case, yes. The checks you mentioned would never be hit since Utf8JsonReader.GetInt32() is only ever called on valid JSON (the JsonReader would throw on invalid input), i.e. the JSON number exists and is correctly formatted (i.e. not empty, not just a '-'/'+', etc.).
Thanks @ahsonkhan. I now get it! Will start on this!
Also are we final on the names of the APIs and which all types are to be supported, as mentioned in description? Because I do see some discussions in the thread about dropping Value suffix and also need to have APIs for unsigned types.
@ahsonkhan I don't think the correct names are in the issue... I feel like the consensus was
C#
int GetInt32();
bool TryGetInt32(out int value);
(repeat for Int64, Decimal, Single, and Double)
As well as the simple GetString() and GetBoolean(), of course.
Are there any special steps for generating the string resources for exceptions etc.? I added few new strings to Strings.resx but I am not able to figure out how to have them in SR.cs file. Appreciate inputs.
@WinCPP SR.cs is autogenerated from Strings.resx. If you build the library after editing Strings.resx it should regen the SR file and Intellisense gets happy. (If you care capable of ignoring Intellisense telling you it doesn't exist you can just add it + use it, then rebuild to confirm happy).
I am getting errors when I try to run specific test cases as per the link here. Please help if my local code is missing something.
D:\WinCPP\corefx\src\System.Text.Json\tests>dotnet msbuild /t:RebuildAndTest /p:XunitMethodName=System.Text.Json.Tests.Utf8JsonReaderTests.TryGetInvalidConversion
Microsoft (R) Build Engine version 15.7.179.6572 for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.
D:\WinCPP\corefx\src\Directory.Build.props(24,26): error MSB4184: The expression "[MSBuild]::NormalizePath(D:\WinCPP\corefx\src\, Common, src, System, SR.*)" cannot be evaluated. Illegal characters in path. Parameter name: path [D:\WinCPP\corefx\src\System.Text.Json\tests\System.Text.Json.Tests.csproj]
But would it be proper to throw overflow exception regardless?In this case, yes. The checks you mentioned would never be hit since Utf8JsonReader.GetInt32() is only ever called on valid JSON (the JsonReader would throw on invalid input), i.e. the JSON number exists and is correctly formatted (i.e. not empty, not just a '-'/'+', etc.).
@ahsonkhan I encountered a condition which returns false (here) which is hit if the condition here evaluates to false. This can happen when the json text is a number but is not an integer e.g. 10123.834, etc. I hit this while writing test cases for GetInt32 on lines of this and the new implementation threw OverflowException, which was unexpected. Essentially the JsonReader evaluates such non-integers as Numbers but later on the parsing fails while trying to get an integer value when the string doesn't represent one, although it is a number. And a false returned in this case is really not an overflow condition.
Appreciate your inputs.
For JsonDocument I went with "success or FormatException"; because it felt more generally applicable than OverflowException.
https://github.com/dotnet/corefx/pull/34485/files#diff-e5aaecd3e4431bca87f81fa80370ded2R158
I encountered a condition which returns
@WinCPP, do you have a branch with your changes that I can look at?
TryGetInt32 should return false if the number was not an integer (i.e. when Utf8Formatter returned false). This API wouldn't throw an OverflowException.
As @bartonjs mentioned, the non-try GetInt32 API should throw FormatException when Utf8Formatter returned false.
I encountered a condition which returns
@WinCPP, do you have a branch with your changes that I can look at?
@ahsonkhan Please refer to the above WIP PR.
TryGetInt32 should return false if the number was not an integer (i.e. when Utf8Formatter returned false). This API wouldn't throw an OverflowException.
As @bartonjs mentioned, the non-try GetInt32 API should throw FormatException when Utf8Formatter returned false.
I was not expecing TryGetInt32 to throw OverflowException but was just saying that false cannot be mapped solely to OverflowException. So essentially I will change code to throw FormatException instead of OverflowException as @bartonjs suggested. Thanks!
Most helpful comment
Do we really need methods for Decimal parsing on
Utf8JsonReader?Decimal is specialized type just like BigInteger (and many other types one can thing of). We should enable parsing of all specialized types via composition: Get the Span with the value and pass it to the parse method of the specialized type.