Currently Json.NET parses in a more lenient manner than RFC7159. For example:
{
"foo": Infinity
}
... will be parsed by Json.NET, but isn't valid under RFC7159.
I'm sure that's generally useful, but it makes it harder to build on Json.NET when strict parsing is required. Other features such as date detection can be disabled already, but I don't _think_ there's currently any way of disabling this leniency.
This is only one example - there may well be others that James is aware of, given that he's rather more deeply "in" this world than I am :) (Comments, possibly the quote characters used, etc.)
What you could do is inherit from JsonTextReader, override Read and check the incoming data after each base.Read(). If QuoteChar is ' then throw an error, if TokenType is a Float and Value is double.Infinity or double.NaN then throw an error, if TokenType is Comment then throw an error, etc.
Thanks, that's definitely a practical workaround. I think this would be generally useful functionality to be in Json.NET itself at some point, but I'll give that a try for now. (I'll try to identify the ways in which Json.NET is more lenient than the RFC, and note what I find here for future readers...)
(I'll try to identify the ways in which Json.NET is more lenient than the RFC, and note what I find here for future readers...
@jskeet I'm adding JSON editor support for Roslyn over here: https://github.com/dotnet/roslyn/pull/24110
I've been using json.net to base how parsing will work (as that's likely the json parser that most of them wil be using in practice). However, i think it would be nice to allow a strict mode as well. There are very obvious ways that json.net deviates from standard JSON. But i'm concerned i may miss other cases. Do you happen to have the list that identifies all the ways Json.Net is more lenient? That would be very helpful for ensuring i don't miss anything.
Off the top of my head:
Probably the most commonly used "invalid" JSON would be properties with single quotes or no quotes, single and multi-line comments, new lines in strings, and trailing commas.
@JamesNK Thanks for the list. Fortunately, reading the actual json spec was simple enough that i was able to find all that as well. Other things i can think of are leniency around numbers and whitespace. Also, there are stricter rules around what is allowed in a string and what needs to be escaped.
Interestingly enough, i haven't found an actual .net json parser that tries to match the spec precisely. I've found deviations in several (including MS's own DataContractJsonSerializer and JavaScriptSerializer).
@JamesNK
Note: i'm a big fan of json.net and how it's addressed this space (including accepting a superset of json). Do you think there's a place in json.net to have an option for more strict processing of json? If so, i'd be happy to contribute. But i don't know if such an option fits your vision for the project.
Thanks!
Hexadecimal numbers (I think...)
Yup. Not allowed in json. Octal isn't allowed either. Leading zeros are also verboten. Fun stuff.
Do you think there's a place in json.net to have an option for more strict processing of json?
I'm not sure. I get the feeling that if I add a strict mode as just a bool flag on JsonTextReader then people will start asking for more options. "I want to error on comments but still accept trailing commas", etc. I don't want to add lots of fields to JsonTextReader to configure how strict it should be.
+1 This is needed. The reason case on my end is that I use a "Repository" for json files that the public can upload and download. PHP uses the proper RFC implementation so if the user makes a JSON that works fine with the C# tester application, they think there wont be any issues, but in fact, there will be when its trying to be parsed and used on my PHP upload script.
So a strict mode here, would be insanely useful to tell the user on my C# end that its not proper syntax, and they can google or lint check to figure out there issue. Otherwise they just say my PHP script is broken plz fix m8 bla bla annoying shit when in fact, its there JSON thats broken.
I get the feeling that if I add a strict mode as just a bool flag on JsonTextReader then people will start asking for more options.
I think there's a different dimension of change between "valid by the standard" and "valid by the standard except for in the following cases…" that makes this request different to a request like that.
What I could see happening though is a request for validity according to some RFC 8259bis that replaces RFC 8259 (indeed, the relevant RFC was 7159 when this issue was opened, though I think the only change in validity was that UTF-8 is required for interchange which is at a different level than parsing anyway) and/or a later version of ECMA 404 / ISO/IEC 21778 (presumably the three standards will continue to track each other, but the timings of versions could differ). It might be wise future-proofing to define validity through an enum that could have later not-fully-compatible standard versions added to it, but not to offer such validation for anything other than a published standard.
But a validating reader could also be built as a separate package.
Just add a bool flag, honestly if people ask for more options, we can discuss if it will be required or not.
You could add a flag enum to represent the allowable deviations. If you make it int based it would support up to 32 different options but 64 different options if you make it long based.
Personally, i'd have no problem just adding a strict mode without any flexibility. If you want any form of looseness, you don't ask for strict. If you want strict, you get it. But i'm not the project owner :)
@cyrusnajmabadi Manatee.Json only supports strict mode (I believe).
I'm not sure. I get the feeling that if I add a strict mode as just a bool flag on JsonTextReader then people will start asking for more options. "I want to error on comments but still accept trailing commas", etc. I don't want to add lots of fields to JsonTextReader to configure how strict it should be.
You don't need to add lots of fields just because someone is asking. ;)
Introducing a option for enabling strict mode would help many users. It makes sense, as it allows users to read and validate RFC-conform Json data without the hassle of building and maintaining unwieldy JsonTextReader.Read() overrides.
For anyone else wanting an individual blend of a little strictness here and a little looseness there, just refer them to your comment from Sept. 2015:
What you could do is inherit from JsonTextReader, override Read and check the incoming data after each base.Read().
What about deserializing and then serializing and comparing to original string? This validates that properties are surrounded by ".." for example. Useful to validate if json is valid if it is sent to client side to be parsed by javascript.
JsonConvert.SerializeObject(JsonConvert.DeserializeObject(str)) == str
@guylando: That would be a very inefficient way of validating, IMO.
For what it's worth, I am running into exactly this issue. I need to determine whether a document is valid JSON. A strict mode would be perfect for that.
The whole purpose of a spec is to ensure interoperability. If different implementations deviate from the spec, they are no longer interoperable. In my case interoperability is paramount.
I think DateParseHandling.None should also be a part of strict mode. Strict mode would make the road from Json.Net to System.Text.Json less bumpy. Edit: STJ follow spec and restrict date format to only support ISO 8601-1:2019 (i think), in case they are more strict than spec. These two will interop a lot as SJT become more popular, it make sense that Json.Net start getting closer to STJ (opt-in).
Most helpful comment
You don't need to add lots of fields just because someone is asking. ;)
Introducing a option for enabling strict mode would help many users. It makes sense, as it allows users to read and validate RFC-conform Json data without the hassle of building and maintaining unwieldy JsonTextReader.Read() overrides.
For anyone else wanting an individual blend of a little strictness here and a little looseness there, just refer them to your comment from Sept. 2015: