Roslyn: String interpolation can be optimized

Created on 9 Oct 2017  路  20Comments  路  Source: dotnet/roslyn

String interpolation has proven to be a very successful feature. Some people even replace efficient concatenation by interpolation, assuming the new syntax is just as efficient. However, that is not the case.

The LDM of May 21, 2014 noted:

The compiler would be free to not call String.Format, if it knows how to do things more optimally. This would typically be the case when there are no format specifiers in the string.

So far, nothing has been done to optimize interpolated strings in this way.

I believe there are five scenarios:

  1. The interpolated string has no fill-ins. This case is handled already.
  2. None of the fill-ins have alignment or format specifiers, and they are all strings.
  3. None of the fill-ins have alignment or format specifiers, and they are all of reference type, some of which are not string (and may or may not implement IFormattable).
  4. None of the fill-ins have alignment or format specifiers, and some of them are value types.
  5. Some of the fill-ins have alignment or format specifiers.

Case 2 is surprisingly common. Last time I counted, Roslyn.sln had 1355 cases. It is also the easiest one to optimize. Basically, the call to string.Format can be replaced with string concatenation. That string concatenation is itself then lowered, often to a call to string.Concat. That lowering will perform constant folding, so in some cases no call is needed at all. Performance differences can be significant.

I have prepared pull request #22595 to handle this case.

Area-Compilers Feature Request Resolution-Fixed

All 20 comments

Alignment can optimised too, if expression's type is string.
```c#
$"{HelloWorldString,-20}"
/* can transformed into */
HelloWorldString.PadLeft(20," "c)

```c#
 $"{HelloWorldString,20}"
 /* can transformed into */
HelloWorldString.PadRight(20," "c) 

@AdamSpeight2008 I know, but that will significantly increase the size of the generated code. Also, in my experience, this happens far less frequently in real world code.

More importantly, I think that is a different case. I wanted to tackle this case first. Alignment can be handled later if deemed useful.

This sounds like a variation of all the other (closed) issues, like https://github.com/dotnet/roslyn/issues/9212 (see the web of linked issues as well)

@khellang It is a variation on #9212, but the approach taken is fundamentally different. #9212 was about replacing the lowering to string.Format completely. This is about leaving the existing lowering for all but one specific (but very common) situation, that can be easily optimized. Also, this suggestion does not require the presence of any method, except for string.Concat which is already required by the compiler anyway.

So https://github.com/dotnet/roslyn/issues/6669? Hence why I said "see the web of linked issues as well" 馃槈

The key difference between this suggestion and similar suggestions in the past (including #6669), is that this modifies only the case where all fill-ins are strings. Those do not have the complexity of culture-sensitive formatting.

My only real concern with such an optimization is that it encodes assumptions regarding the implementation of String.Format into the compiler, namely that string.Format("foo{0}baz", "bar") will always be functionally equivalent to string.Concat("foo", "bar", "baz"). Who's to say that some future version of the framework won't make System.String implement IFormattable and offer support for formatting based on the current culture just to satisfy the needs of some Fortune 100 with a penchant for Pig Latin?

Beyond that I think it's a good idea, though.

Or would this also fall under the post-compilation optimization step that has been suggested for inlining LINQ queries?

@HaloFour

My only real concern with such an optimization is that it encodes assumptions regarding the implementation of String.Format into the compiler, namely that string.Format("foo{0}baz", "bar") will always be functionally equivalent to string.Concat("foo", "bar", "baz").

To be precise, the way I implemented my PR, it encodes the assumption that string.Format("foo{0}baz", "bar") will always be functionally equivalent to "foo" + "bar" + "baz".

The assumption that "foo" + s + "baz" (where s is a non-constant string) is equivalent to string.Concat("foo", s, "baz") was already encoded in the compiler, as well as the assumptions that "foo" + ("bar" + "baz") is equivalent to "foo" + "bar" + "baz" and that "foo" + "bar" + "baz" is equivalent to "foobarbaz". The same is true for the assumption that "ba" + 'r' is equivalent to "ba" + "r", and others.

More broadly speaking, every constant folding optimization depends on the assumption that a calculation done at compile time is equivalent to doing the same (or an equivalent) calculation at runtime. That also applies to the assumption that 1 + 1 is equivalent to 2.

Who's to say that some future version of the framework won't [...] offer support for formatting based on the current culture just to satisfy the needs of some Fortune 100 with a penchant for Pig Latin?

If any of the above assumptions are broken, it would be a massive breaking change, impacting virtually all programs running on .NET. That is very unlikely to happen, to say the least. And when it does, the compiler is in trouble anyway.

Beyond that I think it's a good idea, though.

I agree :-)

Or would this also fall under the post-compilation optimization step that has been suggested for inlining LINQ queries?

No. The way I implemented this in #22595 is in the lowering of interpolated strings, similarly to how the optimizations of string concatenation are implemented.

I took a deep dive into the question whether this optimization is correct a couple of years ago. It looks to me like the specification for string formatting allows for the possibility of a custom culture that provides its own custom formatting for a string value when used as an interpolation. Based on that I concluded that it would not be correct to do this optimization. See https://github.com/dotnet/roslyn/issues/9212#issuecomment-189170257

However. I tried constructing such a custom culture and found that the implementation never would call it for a value of type string. Technically, this is a bug in the implementation of string formatting, but it is probably something that will never change. If anything, I would expect the spec (or my understanding of it) to be changed. Therefore the optimization is probably OK.

This was previously considered and rejected in #9212, #6669, #4678, #194, and #76. But now @KrisVandermotten has used up his accumulated Karma and forced us to reconsider. Karma reset to zero. :wink:

@gafter

Might that open the door to potentially using interpolation for string constants as long as the arguments are also string constants?

@HaloFour Seems plausible but possibly complex (the formatting would have to be specified in the language spec, and the constant is still convertible to FormattableString with the fill-ins separately preserved). I suggest opening a new issue for it. I'm not sure what would motivate us to want to do that (over other things we could do with our time).

@gafter

https://github.com/dotnet/csharplang/issues/384

There are a couple of issues on this repo too although they seem to focus on the performance aspect rather than possibly opening up interpolation up to other scenarios, such as use with attributes. Granted, interpolation in this limited form isn't appreciably different from just using + to concat. The argument has always revolved around the aspect of interpolation being an expression that must be evaluated at runtime in the context of the current culture.

@gafter

I'm not sure what would motivate us to want to do that (over other things we could do with our time).

I wish you could throw these "nice to have" features to X.X milestone to get feedback and prioritize, at least it shows that the proposal is accepted in principle. It seems like the discussions are more constructive and helpful if the issue has the word "champion" in the title. The fact that there's an official thread for a potential language feature helps to keep the discussion in one place instead of multiple issues scattered all over the repo.

@gafter

However. I tried constructing such a custom culture and found that the implementation never would call it for a value of type string. Technically, this is a bug in the implementation of string formatting, but it is probably something that will never change.

Indeed. The way I understand it is that the ToString methods on String simply don't allow for culture dependent formatting:

    public override String ToString() {
        Contract.Ensures(Contract.Result<String>() != null);
        Contract.EndContractBlock();
        return this;
    }

    public String ToString(IFormatProvider provider) {
        Contract.Ensures(Contract.Result<String>() != null);
        Contract.EndContractBlock();
        return this;
    }

That is what makes this optimization safe. And honestly, I cannot imagine this to ever change.

Furthermore, as I said before, an optimization with a similar assumption already exists in string concatenation today: the assumption that s + 'a' == s + "a".

See LocalRewriter_StringConcat.cs line 370:

    // Is the expression a literal char?  If so, we can 
    // simply make it a literal string instead and avoid any  
    // allocations for converting the char to a string at run time. 
    if (operand.Kind == BoundKind.Literal) 
    { 
        ConstantValue cv = ((BoundLiteral)operand).ConstantValue; 
        if (cv != null && cv.SpecialType == SpecialType.System_Char) 
        { 
            return _factory.StringLiteral(cv.CharValue.ToString()); 
        } 
    }

A similar reasoning applies: for both char and string, the result of .ToString() at compile time or at runtime is the same.

That is what makes this optimization safe. And honestly, I cannot imagine this to ever change.

Perhaps if we can get the Core people to guarantee that (maybe with a doc change), then this would be more palatable? It seems like something they'd be willing to say and doc.

@CyrusNajmabadi

Not sure what kind of _guarantee_ you're looking for. The behavior is _documented_:

Returns this instance of String; no actual conversion is performed.

I would consider that a good guarantee :)

@KrisVandermotten Should this issue remain open for further optimizations? Or can it be closed (fixed by #26612)?

@jcouv Further optimizations are certainly not impossible, but what was described here has been done. New work, if any, can be done through new issues and/or pull requests.

Thanks again.
The fix will be in 15.8 preview 3.

Was this page helpful?
0 / 5 - 0 ratings