Roslyn: Code generation involving some literals does not work correctly

Created on 2 May 2019 · 58Comments · Source: dotnet/roslyn

For instance, I want to generate something like:

```C#
class A
{
public A() : this(2.0d)
{
}

public A(double value)
{
}

}

What we do with ``value`` is really beside the point. The point is the ``double`` literal ``d`` is dropped from the ``Literal`` specification. As is the ``f`` if I wanted to specify a ``float`` literal.

There are ``double`` and ``float`` versions of ``Microsoft.CodeAnalysis.CSharp.SyntaxFactory.Literal``, right? i.e.

```C#
public static SyntaxToken Literal(float value);
public static SyntaxToken Literal(double value);

By contrast, the long version of the same does respect the literal specification, i.e. 2L,

C# public static SyntaxToken Literal(long value);

Area-Compilers

Source

mwpowellhtx

All 58 comments

can you supply a fully repro of your code?

CyrusNajmabadi on 2 May 2019

I can't repro this. The behavior seems to be working as intended. If a literal has the right type by default, then no suffix is added. i.e. if you generate a double or a normal int, then no suffix is added. If you want a literal whose natural type doesn't match the type you passed in, then a suffix is added. i.e. Floats will have F added (since the natural type would be double) and longs will have L added (since the natural type will be int).

If you want to control exactly waht text is emitted, you use the Literal(string text, PrimitiveType value) overload for you need. In this case, you'd use the Literal(string text, double d) overload, passing in exactly what you want the final text to be in the output.

CyrusNajmabadi on 2 May 2019

@CyrusNajmabadi No, but I can show you the generated code:

```C#
namespace Kingdom.OrTools.Sat.Parameters
{
public class RandomPolarityRatioParameter : Parameter
{
public RandomPolarityRatioParameter(): this(0)
// ^ original value parsed as an IVariant (abstraction my own, fully vetted)
{
}

    public RandomPolarityRatioParameter(double value): base(value, 45L)
    //                                                             ^^^ generated based on a long value
    {
    }
}

}

Based on the [Protocol Buffers v2](https://developers.google.com/protocol-buffers/docs/reference/proto2-spec):

optional double random_polarity_ratio = 45 [default = 0.0];
// ^^^^^^^^^^^^^ informs the Literal(double) invocation
```

mwpowellhtx on 2 May 2019

The generated code isn't useful to me here :) I need to see waht you're actually doing to generate it :D We can take this to gitter if you want.

However, based on what you've said, you can address this the appropriate way by calling Literal(value.ToString("R", CultureInfo.InvariantCulture) + "D", value)

CyrusNajmabadi on 2 May 2019

👎1

Literally generated by:

C# ... LiteralExpression(NumericLiteralExpression, Literal(x.Value)) ...

Where x.Value is double from IVariant<double>.

mwpowellhtx on 2 May 2019

@CyrusNajmabadi I disagree, it is useful. It shows you that the result is not generating given the expected double literal outcome.

So, to be clear, this is a stop-gap workaround:

Literal(value.ToString("R", CultureInfo.InvariantCulture) + "d", (double) value)

mwpowellhtx on 2 May 2019

The behavior looks correct to me. There are two things you can do here. The first, as mentioned above, is to call the appropriate overload of Literal that allows you to specify exactly what you want for the text.

The second is to use SyntaxGenerator which is the nicer user-facing (i.e. "less-raw") approach to syntax generation. It will always place a 'D' after a double for example.

CyrusNajmabadi on 2 May 2019

👎1

I read that to mean there is a potential refactor from SyntaxGenerator to the "raw" bits that should potentially happen there. I'll have a look at SyntaxGenerator in the mean time, thank you.

mwpowellhtx on 2 May 2019

@CyrusNajmabadi I disagree, it is useful. It shows you that the result is not generating given the expected double literal outcome.

It looks like it generated exactly the right thing. 0 is the representation for the double-value 0 to .net.

As i mentioned above, the two expected ways to handle this are to either explicitly provide the string in the format you want (since there are tons of ways it could be formatted). Or to use SyntaxGenerator, which attempts to be less strict and makes these sorts of decisions on your behalf.

You're using the raw API which doesn't interpret (which i mentioned yesterday). When you use the raw API, you get almost no special handling on your behalf. It gives you a huge amount of control, but you're taking a lot of the effort into your own hands. If you want that effort reduced, you can use the existing APIs that attempt to do that (however, you may lose some control you want at some points).

However, SyntaxFactory is not going to become SyntaxGenerator. Its purpose is to be the low-level raw api that everyone else can build on top of.

CyrusNajmabadi on 2 May 2019

👎1

The only thing with SyntaxGenerator is that it requires, Microsoft.CodeAnalysis.CSharp.Workspace, which is a bit heavier of a lift than we are aiming for for code generation purposes. Probably a non-starter to go that route. The right answer is to fix the Literal(double), or Literal(float) to do the right thing.

mwpowellhtx on 2 May 2019

The only thing with SyntaxGenerator is that it requires, Microsoft.CodeAnalysis.CSharp.Workspace, which is a bit heavier of a lift than we are aiming for for code generation purposes.

Why? It's just an added dll...

Probably a non-starter to go that route.

In that case, you will have to use the appropriate overloads where you specify the text you want.

The right answer is to fix the Literal(double), or Literal(float) to do the right thing.

They are doing the right thing. Literal(float) should already be adding F (like Literal(long) adds L), and Literal(double) should not be adding anything (just like Literal(int)).

CyrusNajmabadi on 2 May 2019

👎1

You're using the raw API which doesn't interpret.

This is an absurd statement. It clearly does interpret, i.e. the Literal(long) yields $"{value}L". Therefore I suggest Literal(double) should yield $"{value}D". Similarly Literal(float) should yield $"{value}F" (or equivalent).

Why? It's just an added dll...

The answer there is "no". That's not going to work for us.

mwpowellhtx on 2 May 2019

We'll go with the workaround for now.

mwpowellhtx on 2 May 2019

The answer there is "no". That's not going to work for us.

I literally (no pun intended) don't understand. I've mentioned how this will work on gitter if you'd like to discuss it. I think you'd just need to add hte reference (almost no cost there) and then just call generator.LiteralExpression(yourValue). Can you explain why this is just flat out "no". That's not going to work for us.?

CyrusNajmabadi on 2 May 2019

Similarly Literal(float) should yield $"{value}F"

It already does.

CyrusNajmabadi on 2 May 2019

Note: i wrote this up in about 20 seconds :)

```c#
var workspace = new AdhocWorkspace();
var proj = workspace.AddProject("temp", LanguageNames.CSharp);
var doc = workspace.AddDocument(proj.Id, "temp.cs", SourceText.From(""));

    var generator = SyntaxGenerator.GetGenerator(doc);
    double d = 0;
    var literal = generator.LiteralExpression(d);
    Console.WriteLine(literal.ToFullString());

```

As you want, it produces the value with the D:

CyrusNajmabadi on 2 May 2019

👎1

Note: using the above, you can also easily access the real formatting engine that powers VS, sidestepping the issues you were running into with NormalizeWhitespace. You'd just use the Formatter.FormatAsync method and you'd pass in your document as appropriate.

CyrusNajmabadi on 2 May 2019

This is the best workaround I could find:

```C#
IdentifierName($"{x.Value:R}d")

When I do this, however:

```C#
var z = LiteralExpression(NumericLiteralExpression, Literal(1d));
// then evaluate z.GetText() in debugger, etc: {1}
//                                             ^^^ result from debugger Watch Value

When this is evaluated in the debugger Watch Value:

```C#
IdentifierName($"{1d}d").GetText()
// ^^^^^^^^^ evaluates to {1d} as expected

@CyrusNajmabadi Which version(s) of ``Microsoft.CodeAnalysis.CSharp`` are you on?

Just to be clear, the result from a ``long`` evaluation:

```C#
var z = LiteralExpression(NumericLiteralExpression, Literal(1L));
// z.GetText() evaluates to {1L} as expected.

Even the ulong version of Literal(ulong) is digested correctly:

C# var z = LiteralExpression(NumericLiteralExpression, Literal(123ul)); // z.GetText() evaluates to {123UL} as expected.

mwpowellhtx on 2 May 2019

@CyrusNajmabadi I told you once already, Workspace is too heavy of a lift for the code generation factoring that we are doing. Focus on the issue at hand would you please? We continually have this communication problem, you not listening to the issue for what it is.

mwpowellhtx on 2 May 2019

@CyrusNajmabadi I told you once already, Workspace is too heavy of a lift for the code generation factoring that we are doing.

Happy to take this to gitter to discuss more.

CyrusNajmabadi on 2 May 2019

var z = LiteralExpression(NumericLiteralExpression, Literal(1d));

As i mentioned, you'll need to use the Literal overload that takes in the text to provide.

CyrusNajmabadi on 2 May 2019

This is the best workaround I could find:

IdentifierName($"{x.Value:R}d")

You don't need to use IdentifierName. Literal is fine as it already supports you providing exactly the text you want that evaluates to the value stored in the token. I showed how you can do this here: https://github.com/dotnet/roslyn/issues/35446#issuecomment-488766325

'IdentifierName' isn't really appropraite because anything later that is trying to figure out what is going on will think this is an identifier, and not an actual literal.

CyrusNajmabadi on 2 May 2019

@CyrusNajmabadi You still have not answered which version(s) of Microsoft.CodeAnalysis.CSharp you are using?

mwpowellhtx on 2 May 2019

For StyleCop Analyzers, we use a helper method WithLiteralSuffix.

Usage: https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/5456aef470eec0d7f3519ead4a04399f745b967d/StyleCop.Analyzers/StyleCop.Analyzers.CodeFixes/ReadabilityRules/SA1139CodeFixProvider.cs#L69

Definition: https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/57821a1c4d014e8d80b3e236ff444e0e30d8a112/StyleCop.Analyzers/StyleCop.Analyzers/Helpers/LiteralExpressionHelpers.cs#L50-L87

sharwell on 2 May 2019

@CyrusNajmabadi You still have not answered which version(s) of Microsoft.CodeAnalysis.CSharp you are using?

Yes i have. I directly messaged you. It's 3.0.0.0

CyrusNajmabadi on 2 May 2019

For the record, we are on Assembly Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35. Not quite ready to make the leap to 3.x version(s).

mwpowellhtx on 2 May 2019

@sharwell Interesting, thanks, I'll have a look.

mwpowellhtx on 2 May 2019

@mwpowellhtx A small mistake occurred in using the workaround in https://github.com/dotnet/roslyn/issues/35446#issuecomment-488786249. You are on the right track with directly converting the text you want to a SyntaxToken, but ended up with the wrong type of token. To produce a token with the correct token type, you'll want to use SyntaxFactory.ParseToken (detects token type from the text) instead of IdentifierName (forces the token to be a specific type, even if the text does not match). Alternately, I believe you can use SyntaxFactory.Token and assign all the values yourself.

sharwell on 2 May 2019

@sharwell Still in draft, so I'll take it under advisement.

In our framework, we are mapping expected types to ExpressionSyntax using the delegate:

```C#
delegate ExpressionSyntax ConvertVariantToExpression(IVariant variant);

Where, in this case, we are given an ``IVariant<double>``:

```C#
interface IVariant { }
interface IVariant<T> : IVariant { }

This is ultimately rolled up into IEnumerable<SyntaxNodeOrToken> and subsequently .CommaSeparated() (our own extension method) ConstructorInitializer(...) arguments.

This is TMI, really, for the issue at hand, in my opinion, except to expound on the issue a little.

mwpowellhtx on 2 May 2019

@sharwell As I said earlier, Literal(...) does the right thing with long and with ulong, just not with double or even float.

mwpowellhtx on 2 May 2019

I cannot repro your 'float' concern. As shown in gitter and with the code i sent you. 0F is generated. Can you supply a repro (and information about your config) so i can try to repro this?

CyrusNajmabadi on 2 May 2019

For the record, we are on Assembly Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35. Not quite ready to make the leap to 3.x version(s).

Using that version, and this repro code:

c# var literal2 = SyntaxFactory.Literal(0.0f); var literal2String = literal2.ToFullString();

I get this:

CyrusNajmabadi on 2 May 2019

@mwpowellhtx It sounds like the D suffix is omitted for doubles. This suffix is optional whenever there is a decimal point in the literal text. Can you confirm that the text you get for Literal(0.0d) is 0.0, which is one of several allowed forms for this token?

sharwell on 2 May 2019

👍1

@CyrusNajmabadi Already have, using a debugger Watch Value, invoking the GetText() members.

mwpowellhtx on 2 May 2019

@sharwell That, as well as float, yes. You can verify this in the debugger watch window.

mwpowellhtx on 2 May 2019

As I said earlier, Literal(...) does the right thing

Ultimately, "the right thing" is user interpretable. For example, some users may want literals to be hex. Some may want them in binary. Some may want significant digits. Some may want digit separators. Some may want e notation for doubles etc. etc. There are many input forms that can lead to a specific 'double' value at runtime. So the reverse generation picks an appropraite strategy given that there are literally near infinite forms it could produce.

The escape hatch for this, if you want a different output format, is to supply the actual format yourself :)

CyrusNajmabadi on 2 May 2019

@CyrusNajmabadi Already have, using a debugger Watch Value, invoking the GetText() members.

I checked that link, it doesn't show any usage of float literals.

CyrusNajmabadi on 2 May 2019

@sharwell That, as well as float, yes. You can verify this in the debugger watch window.

I've tried reproing this. Even in the debugger watch window I see:

That's for this code:

c# var literal2 = SyntaxFactory.Literal(0.0f); var literal2String = literal2.ToFullString();

This is on Roslyn 2.10.0:

CyrusNajmabadi on 2 May 2019

@CyrusNajmabadi Okay, fair enough, I'll retract that suspicion of the float use case. The issue seems isolated to double.

C# LiteralExpression(NumericLiteralExpression, Literal(123f)).GetText() // yields: {123F} as expected

mwpowellhtx on 2 May 2019

In that case, double is behaving consistently with 'int'. Both are naturally represented at the language level as not needing a suffix to signify what they are. As such, both avoid the type suffix since, for the most common case for users, this extra information would be redundant and undesirable (same as using binary or hex encoding, or digit separators). The goal of the text-less overloads is to produce a reasonably expected output given no additional user provided signals as to what they want**. If you want to adjust that, the solution is to use the explicit overloads that take in the text so you have total control.

** You'll note that this is how many parts of the entire Roslyn stack work. For example, if you look at a 'double' in the watch window you'll see:

There is no 'D' added, because the most common and natural form for doubles for the vast majority of users is to not suffix them.

CyrusNajmabadi on 2 May 2019

A similar concern exists for things like string-literals. If we emit a string literal, how should a tab be encoded? Should it just be the raw (that's an actual tab). Or should it emit as \t? Should it emit as if an @ string was being used? or a normal string?

Without any knowledge about what the user's end usage will be, and given many possible outputs (possibly infinite) we picked a default that matches the general intuition for most users about how they would write one of these. To emit in a different way, you can just supply the text you want.

CyrusNajmabadi on 2 May 2019

@CyrusNajmabadi string is a bit OT here, but I would think it would be the difference between seeing a sequence of characters new[]{'\\', 't'} or new []{'\t'}, depending on whether the @ was used. We do not have any cases for that, but I would still expect the emitter to do the right thing. Never mind conveying string interpolation, right? i.e. $"...".

Understanding the Roslyn goal is to deliver, what was it? Lexer tokens, syntactic nodes, etc? It makes these kind of reductions in key places, thus making life a bit more difficult for the code generation author.

mwpowellhtx on 2 May 2019

@CyrusNajmabadi So in summary, Roslyn makes the nominal assumption where double is concerned, the Xd is almost arbitrary/optional at that point, along similar lines as with int, as you said. That's a fair assessment.

mwpowellhtx on 2 May 2019

We do not have any cases for that, but I would still expect the emitter to do the right thing.

As mentioned: the right thing is in the eye of the beholder. If i have a unicode character, is the right thing to embed the character, or to use \u01234? It's going to depend on who is using the API. There's no clear answer as it's caller specific. Taht's why there are two overloads. One that says: i'll make a general assumption about what is best, and what that says "i will make no assumption, you can tell me exactly how you like it" :)

Understanding the Roslyn goal is to deliver, what was it? Lexer tokens, syntactic nodes, etc? It makes these kind of reductions in key places, thus making life a bit more difficult for the code generation author.

As mentioned above, it already provides two different options for two different types of authors. Most authors won't care about this. And the behavior for them will be correct. Some authors will care, and the escape hatch is there for them. It's unclear what you would prefer the API be instead? If, for example, it always added the suffix, then we'd have similar complaints from people going "i just want to generate 0, why is roslyn adding the D at the end?!?"

CyrusNajmabadi on 2 May 2019

@CyrusNajmabadi Do we not agree it is not a lexer's role, essentially, to make parser related decisions? But yet that's exactly what Roslyn appears to be doing. What I expect is that Roslyn honors the tokens as-is, however absurd they may seem to the tool author. As you said before, when the compiler/parser sees the code, the decision is made whether the tree passed/failed, basically, plus any diagnostic reports.

mwpowellhtx on 2 May 2019

It makes these kind of reductions in key places,

There is a 1->infinite number of ways to represent some of these values. Could you let me know what your preference would be on how the API should be expressed given that? Allowing the user to provide arbitrary text ensures that all those infinite forms can be represented. And providing a convenience entrypoint to the most common forms taht people use also seems highly beneficial and desirable. So, IMO, this very much helps "the code generation author."

Finally as mentioned already (but rejected by you), there are higher level APIs to make life even easier for the code gen author. They specifically try to even be more convenient and allow you to leverage all the same functionality and smarts that all the VS analyzers/code-generators use.

I get that you don't want to use that. But it's difficult for me to be sympathetic to complaints of "this is difficult for the author" when the solutions to make it easier are summarily rejected.

CyrusNajmabadi on 2 May 2019

@CyrusNajmabadi

There is a 1->infinite number of ways to represent some of these values...

Sure, but if you are not working from a sane lexer/parser grammar, language specification, etc, we're all in trouble, my friend. As we've established, "a token is a token is a token", with a little help from the code generation API, of course, to help direct more appropriate traffic.

mwpowellhtx on 2 May 2019

@CyrusNajmabadi Do we not agree it is not a lexer's role, essentially, to make parser related decisions?

The lexers job is to produce tokens and trivia.

But yet that's exactly what Roslyn appears to be doing.

What do you mean? You're not using the lexer here at all.

What I expect is that Roslyn honors the tokens as-is,

You have no token. You're generating a token given a random value in memory. There are literally near infinite numbers of tokens that could generate that random value.

however absurd they may seem to the tool author.

Why is this generation strategy absurd? It's literally how people write doubles in the language....

As you said before, when the compiler/parser sees the code, the decision is made whether the tree passed/failed, basically, plus any diagnostic reports.

yes... i'm not seeing your point here though. can you clarify?

CyrusNajmabadi on 2 May 2019

@CyrusNajmabadi

Here again, this is what I'm talking about. I'm talking about a specific set of tokens: literally number + "d". That's it. Which is, I grant you, one double representation.

mwpowellhtx on 2 May 2019

Sure, but if you are not working from a sane lexer/parser grammar, language specification, etc, we're all in trouble, my friend.

I have no idea what that means. In nearly every mainstream language this holds true. The lexer/grammar of almost every langauge would allow a near infinite number of token representations here.

So, again, what would your strategy be here in terms of the factory? We've provided two entrypoints. One that is simple and serves the needs of most consumers of the API. And one that is more complex, but gives you the escape hatch if you want to be very explicit about the token representation you want.

CyrusNajmabadi on 2 May 2019

Here again, this is what I'm talking about. I'm talking about a specific set of tokens: literally number + "d". That's it. Which is, I grant you, one double representation.

Ok. So are you basically just saying: give me SyntaxFactory.LiteralWithSuffix(double) entrypoint?

CyrusNajmabadi on 2 May 2019

@CyrusNajmabadi I know how to express that using a stack such as ANTLR. What's the problem representing the same via Roslyn?

mwpowellhtx on 2 May 2019

@CyrusNajmabadi I know how to express that using a stack such as ANTLR. What's the problem representing the same via Roslyn?

I don't understand your question. Do you want to take this to gitter where you might be able to provide more context and clarify your position a bit?

CyrusNajmabadi on 2 May 2019

I've sent you a private DM on gitter.

CyrusNajmabadi on 2 May 2019

@CyrusNajmabadi

Do you want to take this to gitter where you might be able to provide more context and clarify your position a bit?

No, in fact, I've been plenty clear on the issue at hand. I've spent way too much time arguing over it to begin with and I've got to move forward in this code generation project.

mwpowellhtx on 2 May 2019

No, in fact, I've been plenty clear on the issue at hand.

Ok, then i would close out htis issue since the existing APIs seem to be working as intended, and there is no proposal or request for any new API here.

CyrusNajmabadi on 2 May 2019

Note: i don't know what the antlr bit was about. It seems like a non-sequitor here. As do the comments about nodes/tokens/etc. As this is a code generation problem, and the code generation is working as expected, with appropriate mechanisms for all code-gen authors to get the behavior they want, it seems like this was not an API problem but stemmed from a misunderstanding from the user on how these APIs worked.

It might be worthwhile to improve the docs here in the future if there is more confusion.

CyrusNajmabadi on 2 May 2019

👍1

The API is working as intended. It produces a double literal when asked to. It only adds a suffix when it needs to. For example, 0.0 is already a double literal and so no suffix is added by default.