Runtime: String.Format escaped closing brace treated as part of format string

Created on 8 Feb 2019  路  33Comments  路  Source: dotnet/runtime

I just spent way too long debugging an issue related to this:

https://blogs.msdn.microsoft.com/brada/2003/12/26/string-format-gottach/

It's completely counterintuitive and makes no sense whatsoever to treat these braces differently:
```c#
string.Format("{{{0}}}", 5);
string.Format("{{{0:G}}}", 5);

In the second example a closing brace gets fed into the format specifier. At least there should be consistent behavior here - if the current behavior for the second line is "correct" then the first one should throw an error because `0}` is not a valid token. 

That would be silly of course - format specifiers don't use closing braces so the behavior of the second line should be fixed instead. The current behavior makes it impossible to wrap a value with a format specifier in curly braces when grabbing the format string from a resource file unless a closing brace is passed in as an extra parameter, but for obvious reasons we're not going to do that everywhere we use formatted strings from resource files just in case one of them might need a closing brace. Not to mention, third-party libraries won't do that either.

I know this is technically a "breaking change", but I would be shocked if anyone relied on closing curly braces being fed into their custom format specifiers. I think this case warrants the breaking change. For consistency's sake, this should probably just throw a `FormatException` instead of feeding the opening brace in - curly braces just shouldn't be allowed as part of format specifiers:

```c#
string.Format("{{{0:G{{}", 5);

That said, I'm fairly indifferent about that part...I don't know if consistency is a good enough reason to make another breaking change but it makes more sense to me to just disallow curly braces entirely in the format specifier instead of only allowing one brace.

area-System.Runtime

Most helpful comment

Existing:

``` C#
if (ch == ':')
{
pos++;
int startPos = pos;

while (true)
{
    // If reached end of text then error. (Unexpected end of text)
    if (pos == len) FormatError();
    ch = format[pos];
    pos++;

    // Is character a opening or closing brace?
    if (ch == '}' || ch == '{')
    {
        if (ch == '{')
        {
            // Yes, is next character also a opening brace, then treat as escaped. eg {{
            if (pos < len && format[pos] == '{')
                pos++;
            else
                // Error Argument Holes can not be nested.
                FormatError();
        }
        else
        {
            // Yes, is next character also a closing brace, then treat as escaped. eg }}
            if (pos < len && format[pos] == '}')
                pos++;
            else
            {
                // No, then treat it as the closing brace of an Arg Hole.
                pos--;
                break;
            }
        }

        // Reaching here means the brace has been escaped
        // so we need to build up the format string in segments
        if (unescapedItemFormat == null)
        {
            unescapedItemFormat = new StringBuilder();
        }
        unescapedItemFormat.Append(format, startPos, pos - startPos - 1);
        startPos = pos;
    }
}

}

Updated:

``` C#
if (ch == ':')
{
    pos++;

    while (true)
    {
        // If reached end of text then error. (Unexpected end of text)
        if (pos == len) FormatError();
        ch = format[pos];
        pos++;

        if (ch == '}')
        {
            // Argument hole is closed
            pos--;
            break;
        }
        else if (ch == '{')
        {
            // We don't support curly brackets in format specifier strings
            FormatError();
        }
    }
}

All 33 comments

@JeremyKuhne was looking at this code.

I'm personally fine with killing the ability to use curly braces in format specifier strings. It would simplify the code and it does seem like it is too terribly useful to have curly brackets in your specifiers. I'd change the code to fail when hitting { when parsing the specifier and assume that a } finishes the argument hole. @vancem What do you think?

@vancem What do you think?

I don't actually understand the rules for parsing a format specifier. Today, how do we decide when it ends? We seem to have rather 'ad hoc' logic. Certainly if you were to ask someone what the rule would be certainly they would say that the } end it (which is what everyone is suggesting here).

From a design perspective, it is clearly the right thing to do as it is simpler to code and it is what people expect it really has no down sides (reserving } as a terminator is more than fine.

So the only negative is compatibility risk. Here things are trickier because while I agree that specifiers with } are not interesting, you can't totally rule out weird scenarios where we rejected the formatter (thus printed something lame, yet that is what was 'captured' in a 'baseline' and any change (even if it is obvious that it is more 'fixed' then before), can lead to a break.

So you can argue about the compat risk. However I would also argue that it is exactly these kinds of 'breaks' that we should be allowing. The bar is not 'does ANYONE break', but rather that breaks are rare, and when those rare cases are investigated, reasonable people would agree that the new behavior was worth the break (it is 'better'). I think this hits that bar, and .NET Cores side-by-side capabilities allow us to absorb breaks like that (and thus allow the platform to 'fix' past mistakes).

So I am on board as well with making the change.

I know that you are think to disallow both { and }. While it is not a big deal I would argue AGAINST that (just make } special). The rationale here is that you are disallowing { for not STRONG reason (it is esthetics), and in fact you are making the code MORE COMPLEX (you have to have a if that is ONLY there because you want to make { illegal. It also makes it that much more breaking (it may be OK to break for a GOOD reason, but it is hard to argue that esthetics is one of them). If there were version 1 of this code outlawing { may make sense because you are 'reserving' that character for future use, but that is really not applicable here (if people used it it is already out there (and disallowing it at this point is a DEFINATLY a bad idea (since we got no value but we did cause breaks)).

So I would basically add the requirement that } not be in format specifiers, and make the logic what we all expected it was in the first place.

@vancem In terms of what it does now for deciding when the format specifier ends - it ends on }, but it treats }} as an escaped brace so it keeps going until it finds a single brace.

@vancem Hmm, allowing opening curly braces creates another problem that is going to need special logic anyway so I'm leaning towards not allowing either. If you decided to allow opening curly braces then how would you treat this:

c# string.Format("{{{0:G{{}", 5); // format specifier="G{" string.Format("{{{0:G{}", 5); // format specifier=???

You need logic to throw a FormatException for the second line anway, and I imagine it's easier to just check that the next brace is an closing brace and doing away with the escape logic as opposed to keeping escape logic and still making sure you don't have an unescaped brace in there.

Ahh, thanks for that insight, I had not really understood this wierd corner case.

I don't think it changes my recommendation above. Users only expect } to be escaped 'outside' the format specifier.

@vancem Yeah that's the crux of the issue. Should { be escapable inside the format specifier though? Seems just as odd to me. Like you said, if this was v1 then I think it would be a no brainer to just make both reserved characters in format specifiers, but now it's a more difficult choice.

Should { be escapable inside the format specifier though?
I am arguing above that

  1. Don't break things unless you can articulate value. It have not seen articulation for changing the behavior for {.
  2. Don't make code more complicated without articulated value. Thus adding special logic to make { illegal in a format specifier has added new code, so it needs value to justify it. I have not seen that yet.

If you were to tell me that the new behavior for { is the result of SIMPLIFYING (e.g. removing) the code. I would be more inclined to say yes, but (1) above still holds, and so unless the simplification helps fix bugs etc, I don't think we have satisfied requirement (1).

So my advice is to leave it...

  1. It have not seen articulation for changing the behavior for {.

It will simplify the existing code- we'll never have to consider escaping in the format specifier which will avoid having to keep an additional buffer for the format specifier. We already check for the {. I'll add the code diff in a sec.

Existing:

``` C#
if (ch == ':')
{
pos++;
int startPos = pos;

while (true)
{
    // If reached end of text then error. (Unexpected end of text)
    if (pos == len) FormatError();
    ch = format[pos];
    pos++;

    // Is character a opening or closing brace?
    if (ch == '}' || ch == '{')
    {
        if (ch == '{')
        {
            // Yes, is next character also a opening brace, then treat as escaped. eg {{
            if (pos < len && format[pos] == '{')
                pos++;
            else
                // Error Argument Holes can not be nested.
                FormatError();
        }
        else
        {
            // Yes, is next character also a closing brace, then treat as escaped. eg }}
            if (pos < len && format[pos] == '}')
                pos++;
            else
            {
                // No, then treat it as the closing brace of an Arg Hole.
                pos--;
                break;
            }
        }

        // Reaching here means the brace has been escaped
        // so we need to build up the format string in segments
        if (unescapedItemFormat == null)
        {
            unescapedItemFormat = new StringBuilder();
        }
        unescapedItemFormat.Append(format, startPos, pos - startPos - 1);
        startPos = pos;
    }
}

}

Updated:

``` C#
if (ch == ':')
{
    pos++;

    while (true)
    {
        // If reached end of text then error. (Unexpected end of text)
        if (pos == len) FormatError();
        ch = format[pos];
        pos++;

        if (ch == '}')
        {
            // Argument hole is closed
            pos--;
            break;
        }
        else if (ch == '{')
        {
            // We don't support curly brackets in format specifier strings
            FormatError();
        }
    }
}

This code is performance sensitive too, so simplification may have a perf benefit.

I want to get this change in for 3.0 as it is technically breaking. I think we've satisfied @vancem feedback for the bar on this. @mikernet, do you have an interest in creating a PR for this?

This simplification will make my non allocating proposal easier to pull off as this logic is hard to replicate. https://github.com/dotnet/corefxlab/pull/2595

My two cents is to leave this code behavior alone -- the string formatting has been this way since ~2001 with .NET Framework 1.0 beta 2 - the benefit of tweaking this for .NET Core-only doesn't seem high to me.

My two cents is to leave this code behavior alone -- the string formatting has been this way since ~2001 with .NET Framework 1.0 beta 2 - the benefit of tweaking this for .NET Core-only doesn't seem high to me.

I understand that it has been there forever and I don't take this lightly. I'm very much for this given that it:

  1. Fixes a real-world usage problem
  2. Simplifies a perf-critical code path
  3. Facilitates other formatting work in progress
  4. Is a dot zero release
  5. Is expected to break close to, if not, zero customers

@JeremyKuhne Yeah I can do a PR on this if you can leave it with me for a couple days.

@JeremyKuhne - while I am not against the code you put above (it is simple and straightforward enough), what bad thing happens if we simply drop the code for '{'

        else if (ch == '{')
        {
            // We don't support curly brackets in format specifier strings
            FormatError();
        }

It is arguably simpler (less lines of code), it is every-so-slightly more compatible (we don't fail in cases with {), and achieves the original goal.

Are their secondary effects that I am overlooking?

For what it is worth, I don't really have a problem with your suggested code. Mostly I am just wondering...

Are their secondary effects that I am overlooking?

We used to fail with single { so I'm not sure it would be any better to have them work now. It certainly would be easier to document that brackets aren't supported at all as opposed to left brackets are and they aren't escaped any more. :)

@vancem and I spoke at length today about this and we're both on the same page on this- we want to fix the issue here. @mikernet I've assigned you as an owner, if you have any issues getting to this please let me know. :)

The tests I added into StringBuilderTests.cs are building and running properly but changes made to StringBuilder.cs don't seem to be building or having any effect on the tests. Even if I just write a bunch of nonsense in StringBuilder.cs to try to get it to fail a build, nothing happens...I can still build from corefx root or from System.Runtime just fine.

Any suggestions?

What's the actual solution I should be opening when editing StringBuilder.cs?

What's the actual solution I should be opening when editing StringBuilder.cs?

You need to make the change in CoreCLR as that is where the code is actually compiled into. It is a little contorted for code that lives in System.Private.Corelib.dll. I'll dig up the links for you in a minute.

Here's the link on how to use your local build of CoreCLR: https://github.com/dotnet/corefx/blob/master/Documentation/building/advanced-inner-loop-testing.md

Generally what we do is create a PR in both repos- code in CoreCLR and tests in CoreFX. The test PR is marked as blocked on the CoreCLR change being merged.

We keep the tests aligned with the assembly where the type is exposed, not where it is actually implemented to allow other runtimes to share the same tests without requiring "internal" implementation details to be identical.

Since my tests are testing the new behavior where escaped strings don't make it into the custom format specifier, should they be conditionally executed based on which runtime it's running on?

Last question - do I leave StringBuilder.cs that's inside corefx alone then, only change it in coreclr or both?

Anything in the 'shared' directory (which includes StringBuilder.cs) in coreclr is automatically cloned into core-fx at a fast cadence. I believe you should be modifying it in coreclr is the correct procedure, it will be propagated for you.

should they be conditionally executed based on which runtime it's running on?

Yes, and we do that in a variety of ways. If there is new API surface we use a .netcoreapp file where the tests are only compiled for runs on .NET Core. Otherwise we use this attribute to test old (on 4.x) and new behavior:

C# [SkipOnTargetFramework(~TargetFrameworkMonikers.NetFramework)] // <- Only run on .NET Framework (i.e. 4.x) [SkipOnTargetFramework(TargetFrameworkMonikers.NetFramework)] // <- Run on everything but .NET Framework (i.e. .NET Core)

Moving this back to 3.0, as I believe the thought is either we do it in 3.0 or we don't do it, due to behavioral changes incurred.

This code is performance sensitive too, so simplification may have a perf benefit.

What was the perf impact of this change from a corefx perspective?

@stephentoub was this not fixed by https://github.com/dotnet/coreclr/pull/23062 ?

@danmosemsft, yes, it was. Are you asking me because of my moving it to 3.0 back in March? The fix was only merged two weeks ago.

Are you asking me because of my moving it to 3.0 back in March? The fix was only merged two weeks ago.

Ah of course, the PR is linked when created not when merged. Thanks.

Was this page helpful?
0 / 5 - 0 ratings