Runtime: Should JsonSerializer.ToString output be minimally escaped JSON by default?

Created on 8 Jun 2019  Â·  13Comments  Â·  Source: dotnet/runtime

We have the following JsonSerializer "write/serialize" APIs:
C# public static string ToString(object value, System.Type type, System.Text.Json.JsonSerializerOptions options = null) { throw null; } public static string ToString<TValue>(TValue value, System.Text.Json.JsonSerializerOptions options = null) { throw null; } public static byte[] ToUtf8Bytes(object value, System.Type type, System.Text.Json.JsonSerializerOptions options = null) { throw null; } public static byte[] ToUtf8Bytes<TValue>(TValue value, System.Text.Json.JsonSerializerOptions options = null) { throw null; } public static System.Threading.Tasks.Task WriteAsync(System.IO.Stream utf8Json, object value, System.Type type, System.Text.Json.JsonSerializerOptions options = null, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; } public static System.Threading.Tasks.Task WriteAsync<TValue>(System.IO.Stream utf8Json, TValue value, System.Text.Json.JsonSerializerOptions options = null, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }

All these APIs (along with the Utf8JsonWriter), currently follow the escaping behavior of JavascriptEncoder.Default:
https://github.com/dotnet/corefx/blob/c63e5ebecc5449a360b6293ecdbf151bfd11a1a6/src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/JavaScriptEncoder.cs#L52-L79

Should the ToString() overloads escape only what is minimally required by the JSON RFC, by default (maybe with a few exceptions, like HTML characters, +, etc.) rather than all non-ASCII characters?

Note: This would mean the other serialize methods (that write UTF-8 bytes or to a stream) would have different escaping behaviors than the string-returning ones.

For the other write methods (including Utf8JsonWriter APIs), a custom JavascriptEncoder would have to be passed-in as an option to avoid the over-escaping in the non-ascii range. One such encoder can be provided as a static (maybe JavascriptEncoder.Utf8JsonMinimal) to make the opt-in easier.

cc @rynowak, @GrabYourPitchforks, @bartonjs, @BrennanConroy, @joshfree, @stephentoub, @steveharter, @JamesNK

area-System.Text.Json enhancement

Most helpful comment

https://github.com/dotnet/core-setup/issues/7137 was created to make the .deps.json compatible with existing xunit releases.

I suspect this only affects corefx because you're using the unsupported xunit.console.dll runner.

It affects xunit projects started with dotnet test.

DependencyContextJsonReader throws in IsXunitPackageReferenced which then returns false, which means: this isn't an xunit test assembly.
From VsTestRunner.cs:

        static bool IsXunitPackageReferenced(string assemblyFileName)
        {
            var depsFile = assemblyFileName.Replace(".dll", ".deps.json");
            if (!File.Exists(depsFile))
                return false;

            try
            {
                using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(File.ReadAllText(depsFile))))
                {
                    var context = new DependencyContextJsonReader().Read(stream);
                    var xunitLibrary = context.RuntimeLibraries.Where(lib => lib.Name.Equals("xunit") || lib.Name.Equals("xunit.core")).FirstOrDefault();
                    return xunitLibrary != null;
                }
            }
            catch
            {
                return false;
            }
        }

All 13 comments

The specific concern here is about the default escaping behavior. Which will escape:

  • + (used in dates)
  • non-ASCII letter characters such as Cyrillic or Chinese characters

JSON is supposed to be a human-readable format, but if you're not using another alphabet, our JSON serializer will produce non-human-readable output.

We've gotten feedback from users repeatedly about the HTML encoding behavior on ASP.NET Core which is based on similar rules.


Personally, I think the idea that we'd have different escaping defaults based on what type of output you're producing (string vs Stream) is really vexing as well. I'm not in favor of this change, but it's one of the ideas we came up up with while discussing how to mitigate the human-readable problem while keeping the same level of security.

I asked @ahsonkhan to at least file the issue for consideration in case others have feedback (or better ideas).

Over the past few weeks I've been following up with some of the other security folks regarding "safe" escaping behavior. The default behaviors were chosen to help protect against _real_ attacks that we actually saw in the wild. For example, the + character is escaped to help avoid situations where an attacker can take advantage of a server that forgets to send the charset=... part of the Content-Type response header. Non-ASCII characters are escaped to avoid situations where an application escapes then round-trips the data through a database that uses best-fit mapping (this could lead to XSS).

You're correct in that these defenses aren't strictly necessary per the JSON spec, but they absolutely and unquestionably do provide defense against application-level bugs. We're not trying to provide defense against every possible bug with these behaviors since that would be a losing proposition, but we have identified some common anti-patterns that have security impact. Forgetting to set the response _charset_ (as mentioned earlier) is one such common anti-pattern. And yes, it's still possible - even in 2019 - to exploit information disclosure vulns in this manner. Ping me offline and I can show you our proofs of concept if you're curious.

One potential workaround here is that for servers like ASP.NET which we know will send the appropriate response headers, they can via their DI system inject a more permissive encoder. That way things like Razor pages will have the output that people expect, minimally (but safely) escaped, where non-ASCII characters and '+' are allowed. But that's because in these systems, we can reason about the entire stack all the way up and down. If there's a standalone call to JsonSerializer or similar API, we don't really know where that output is destined, and we can't make the same assumptions.

There's potentially a conversation to be had regarding whether .NET should be attempting to guard against these bugs. Perhaps we need to shift our attitude and place more responsibility on the developer to do the right thing. This particular work item isn't the right forum to have that conversation, as the folks who are truly knowledgeable in this area aren't present here. But I will note that it's an interesting (and depressing!) exercise to go to the web sites of several top U.S. banks, use a network sniffer to look at the application/json responses, and compare the actual response headers against industry recommended practices.

Another interesting nit from some earlier testing I performed: Twitter's JSON API doesn't escape '+' but does escape UTF-16 surrogates and '{' and '}'. Google's JSON API escapes '=' and non-letter best-fit sequences like '‹', '›', '•'. Instagram's JSON API escapes non-ASCII. No high-profile web service performs minimal escaping in practice; it's simply too dangerous given the landscape. Even if we do relax our escaping rules it still won't be "minimal".

+ (used in dates)

We've hit an issue on the source-build .NET Core 3.0 preview 6 sdk. The generated .deps.json file now contains escaped + in the sha265 properties. The xunit json parser had an issue parsing escape sequences (https://github.com/xunit/xunit/issues/1980) which causes the xunit tests to no longer work (https://github.com/microsoft/vstest/issues/2079).

So, users will either need to upgrade xunit to a version that has the fix, or .NET Core should workaround the issue by generating a .deps.json file that doesn't escape + characters.

CC @bradwilson @onovotny @dagood @omajid

I suspect this only affects corefx because you're using the unsupported xunit.console.dll runner. Nothing else in 2.4.1 should be trying to parse the .deps.json file as this is holdover code from dotnet xunit days.

I should amend that to say nothing in the _core framework_ should be parsing the .deps.json file. It's likely the VSTest adapter is still doing some parsing of that file. @onovotny

https://github.com/dotnet/core-setup/issues/7137 was created to make the .deps.json compatible with existing xunit releases.

I suspect this only affects corefx because you're using the unsupported xunit.console.dll runner.

It affects xunit projects started with dotnet test.

DependencyContextJsonReader throws in IsXunitPackageReferenced which then returns false, which means: this isn't an xunit test assembly.
From VsTestRunner.cs:

        static bool IsXunitPackageReferenced(string assemblyFileName)
        {
            var depsFile = assemblyFileName.Replace(".dll", ".deps.json");
            if (!File.Exists(depsFile))
                return false;

            try
            {
                using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(File.ReadAllText(depsFile))))
                {
                    var context = new DependencyContextJsonReader().Read(stream);
                    var xunitLibrary = context.RuntimeLibraries.Where(lib => lib.Name.Equals("xunit") || lib.Name.Equals("xunit.core")).FirstOrDefault();
                    return xunitLibrary != null;
                }
            }
            catch
            {
                return false;
            }
        }

That catch { return false; } is sloppy. Thanks to that, nobody will be able to find the problem if we can't get https://github.com/dotnet/core-setup/issues/7137 fixed. Saying that there are no tests discovered when there was actually a parsing bug in the code is most unfortunate.

The error message hints there is some issue:

Make sure that test discoverer & executors are registered and platform & framework version settings are appropriate and try again.

What's also unfortunate is dotnet test returns a zero exit code. So in CI, these tests looks like they are passing.

@steveharter how much of your escaping PR addresses this issue?

@GrabYourPitchforks do you have an update on this?

I am not using JsonSerializer in a web-scenario and it is really unsuitable that I get JSON which is littered with `\u00df" and the like.

I understand the situation with certain escape characters and their possible security implications if left unescaped, but I don't understand why normal letters are escaped. Suppose you are Russian, Chinese, Japanese (I tried only the first of these) and you are serializing "Hello", the garbage you get back makes you quickly and permanently loose this API from your code.

@ahsonkhan this is still in the 3.0 milestone. and 3.0 is done. Would this be a breaking change at this point?

Yes, it would be a breaking change, but I don't see why that would preclude us from taking it in 5.0. This seems like the kind of breaking change we'd allow to go through in a future release if we were so inclined to do this work. It doesn't change the forward-compatibility or backward-compatibility of any generated payloads.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

yahorsi picture yahorsi  Â·  3Comments

omajid picture omajid  Â·  3Comments

sahithreddyk picture sahithreddyk  Â·  3Comments

omariom picture omariom  Â·  3Comments

Timovzl picture Timovzl  Â·  3Comments