Roslyn: Differences between SourceGenerators and EC# macros

Created on 30 Nov 2016 · 27 comments · Source: dotnet/roslyn

To focus the conversation with @CyrusNajmabadi about SourceGenerators and macros...

Finally, i guess i'm just not seeing what purpose macros actually serve over the SourceGenerator proposal. As you've mentioned, they cannot introduce new syntax. So all they can do is take existing syntax and manipulate it, to produce new syntax. But that's what SourceGenerators did. That's something Roslyn is optimized for, as it allows very extensible transformation of Syntax.

I feel like there are multiple important differences between them, but let me first clarify:

  • Where do SourceGenerators fit in the compilation pipeline? Somewhere in the "Binder" area?
  • Is it true that the original source code can directly use any newly generated member?
  • Macros can be provided with "invalid" code to "fix" (i.e. the code makes sense to the developer and to the macro but could not be understood by the underlying compiler until the macro transforms it. e.g. one might write a macro to change const int Foo = "Foo".Length; to const int Foo = 3). In contrast, it looks to me like the SourceGenerators system currently requires the source code to be not just syntactically valid, but semantically valid as well. Is that correct?
  • The linked document doesn't say anything about using attributes to "invoke" a SourceGenerator, but earlier you said "Or, alternatively, we can use the approach we took with SourceGenerators. Namely that we used an existing piece of syntax (i.e. '[attributes]'), to mark where we wanted generators to run." What did you mean? How does that work?
  • Does a SourceGenerator have a full ability to introspect the entire compilation? If so, what prevents you from having to re-run every SourceGenerator on every keystroke?

Thanks!

Area-Compilers · New Language Feature - Replace/Original · Question

Most helpful comment

These limitations of macros are an advantage to IDE performance

But probably too limiting for use-cases that many customers are asking for. Those use cases depend on having semantic information available. i.e. because the generator wants to generate different sorts of code depending on which types you use, etc. etc.

All 27 comments

Tagging @cston

Where do SourceGenerators fit in the compilation pipeline?

It's my understanding that SourceGenerators can run on SyntaxTrees to produce new SyntaxTrees. They have access to the Compilation though, so they can get at any semantic data they need. There is no concept of 'pipeline' as that's not something we expose to the world. Our APIs present a model where everything is always up to date. We just compute things lazily to make that efficient.
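For concreteness, here is a rough sketch of the generator shape this implies, using the Execute/AddCompilationUnit names that come up later in this thread; the base type, context members, and signatures are assumptions drawn from the 2016 proposal, not a shipped API:

~~~csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical generator: reads the (complete, immutable) Compilation and
// contributes a new SyntaxTree to the final compilation.
public class MyGenerator : SourceGenerator // base type name is an assumption
{
    public override void Execute(SourceGeneratorContext context)
    {
        Compilation compilation = context.Compilation; // full semantic info available

        SyntaxTree generated = CSharpSyntaxTree.ParseText(
            "partial class C { public int Generated() => 42; }");
        context.AddCompilationUnit("C.Generated", generated); // add, don't rewrite
    }
}
~~~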

Is it true that the original source code can directly use any newly generated member?

Yes. Because a new tree will be generated, and that's what will be finally compiled, any original code could reference any new code. It will be an error in the initial compilation, but will be correct (or not) in the final compilation depending on what was generated.
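A hypothetical illustration (the types and members here are invented): the hand-written file below is an error against the initial compilation, but compiles cleanly once the generated file is added.

~~~csharp
// Hand-written file - LengthSquared() doesn't exist yet, so this is an error
// in the initial compilation:
partial class Vector
{
    public double X, Y;
    public double Length() => System.Math.Sqrt(LengthSquared());
}

// File contributed by a generator (e.g. derived from Vector's fields) - once this
// tree is part of the final compilation, the call above binds successfully:
partial class Vector
{
    public double LengthSquared() => X * X + Y * Y;
}
~~~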

In contrast, it looks to me like the SourceGenerators system currently requires the source code to be not just syntactically valid, but semantically valid as well. Is that correct?

I don't believe there's a semantically valid requirement at all. The syntax requirement could likely be lifted... but it's probably for the best that we don't. Because then we'd likely have to maintain invariants around what trees we produce in invalid code scenarios. And i don't think we want that. It also means people could do things like add support for new syntax constructs, and we'd end up breaking them if we ended up adding those same constructs in the future. So we'll likely keep the syntactically valid requirement.

The linked document doesn't say anything about using attributes to "invoke" a SourceGenerator, but earlier you said "Or, alternatively, we can use the approach we took with SourceGenerators. Namely that we used an existing piece of syntax (i.e. '[attributes]'), to mark where we wanted generators to run." What did you mean? How does that work?

Attributes just serve as a really nice way to identify locations where source generation could occur. But i don't believe they'd be a requirement. They'd also be a potential optimization point. i.e. generators could say "run me on this attribute type" and we could limit the amount of work necessary when trying to decide on what to run.

Does a SourceGenerator have a full ability to introspect the entire compilation?

I believe so, yes.

If so, what prevents you from having to re-run every SourceGenerator on every keystroke?

For some generators, nothing. We've considered approaches where SourceGenerators could define scopes on which they ran. For example, a SourceGenerator could say that it only needed to be rerun on a file if that file actually changed.

But, for some source generators, you would have to rerun them on any change.

I don't believe there's a semantically valid requirement at all.

The documentation says "Source can be added to the compilation but not replaced or rewritten." If existing code cannot be rewritten, or at least hidden from later stages of the compiler, then ultimately it must be semantically valid or there will be a compiler error.

Attributes just serve as a really nice way to identify locations where source generation could occur. But i don't believe they'd be a requirement.

Is Roslyn somehow optimized to find attributes so that Source Generators triggered by attributes can efficiently locate code they care about? Is it optimized to find anything else, like, say, types that implement a particular interface?

There is no concept of 'pipeline' as that's not something we expose to the world. Our APIs present a model where everything is always up to date. We just compute things lazily to make that efficient.

I ask because later stages of the pipeline would have more information available than earlier stages, which might lead to paradoxes.

For example, given this method:

~~~csharp
static int Caller() => (int)Called(3);
static long Called(long x) => x;
~~~

If the SourceGenerator is, in effect, located somewhere near the "end" of the pipeline, then I expect it would be able to not only get the syntax tree of Caller, but also learn that Called(3) refers to Called(long). It could then generate this new method in the same (partial) class:

~~~csharp
static int Called(int x) { return x*x; }
~~~

But this, in turn, should change the interpretation of Caller(). I noticed an analogous paradox while studying D; in that language, the compiler analyzes each method only once, so Caller() would call Called(long) even though Called(int) exists, if Caller() was called (even indirectly) before the creation of Called(int). So, I am wondering if there are constraints placed on SourceGenerators to avoid potential problems like that.

I ask because later stages of the pipeline

I don't know what pipeline means here :)

If the SourceGenerator is, in effect, located somewhere near the "end" of the pipeline, then I expect it would be able to not only get the syntax tree of Caller, but also learn that Called(3) refers to Called(long)

SourceGenerators can get this. Because this information is always available. Because Roslyn represents a complete view of all your code, you can always get at any information at any time.

But this, in turn, should change the interpretation of Caller()

Indeed. This is one of the reasons why 'macros' are hard. But that's hard regardless of whether macros are purely syntactic or not. Even if macros were just syntax transformations, you may still end up with this situation. Indeed, people want semantics so that they can detect and avoid causing these sorts of problems.

How'd the anniversary go!? :)

How'd the anniversary go!? :)

Good! Mainly we just spent a day at a mall... eating our favorite foods. Then I got distracted for a couple of days catching up on emails and news. Then I noticed the Roslyn job was gone and kicked myself in the butt (a handy skill if you want to look silly at parties). Are you married?

Indeed. This is one of the reasons why 'macros' are hard. But that's hard regardless of whether macros are purely syntactic or not. Even if macros were just syntax transformations, you may still end up with this situation. Indeed, people want semantics so that they can detect and avoid causing these sorts of problems.

Since you can't look up semantic info in lexical macros, they don't suffer from the same paradox. I am wondering, if such paradoxes are possible, how would Roslyn handle them? Would Roslyn re-analyze the semantics of Caller after the creation of Called? How does it know that it should?

An important thing to recognize about Roslyn is that it is a system where everything is immutable. So there is no paradox, because at any point in time, code only means one thing. After any transformations happen you get a new system. And things will be correct or not in that new system. You can then do any sort of analysis you want between the two systems (for example, to tell if something bad happened).

This is, for example, how Rename works. We start with the immutable state of the world when rename starts. And rename produces a new immutable snapshot post updating all the appropriate locations. We can then compare the old snapshot to the new snapshot to see if things changed unexpectedly.

I guess the implication here is that all analysis that was previously done on method bodies and initializer expressions is thrown out.

Yes. Though we can be 'smart' if all the edits are done within method bodies and don't change any visible declarations. In a language like C# (unlike TS) the contents of a method body cannot affect any signatures of symbols. So we can reuse the majority of state from the previous compilation, and just recompute the 'bodies' of method-like entities that change.

We make heavy use of this for features like 'IntelliSense'. If you are editing inside a method body, we attempt to reuse as much as we can from previous compilations to avoid work that will produce the same results.
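In public-API terms, the reuse being described looks roughly like this (the actual sharing of bound state is internal to Roslyn; the snippet only shows the immutable replace-a-tree surface involved):

~~~csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

static Compilation AfterMethodBodyEdit(CSharpCompilation compilation, SyntaxTree oldTree, string newText)
{
    // Produce a new immutable compilation; when only method bodies differ,
    // declaration-level work from the old compilation can largely be reused.
    SyntaxTree newTree = CSharpSyntaxTree.ParseText(newText, path: oldTree.FilePath);
    return compilation.ReplaceSyntaxTree(oldTree, newTree);
}
~~~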

--

Note: we can only do this for features that do not depend on things like location information. That's because, necessarily, the location information for previous snapshots will not be correct in the new snapshot. As such, the model we present is one where you really have the previous snapshot, and you ask it to analyze a method body as if it was present in the previous snapshot. So all the semantic information you get when analyzing the new method body is in terms of the old snapshot, with all the old locations and whatnot.
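The "analyze the new body in terms of the old snapshot" model corresponds to Roslyn's speculative-analysis APIs. A minimal sketch, assuming you already hold the old compilation, the old tree, and a position inside the old method body (the statement text is illustrative):

~~~csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static TypeInfo AnalyzeNewBodyAgainstOldSnapshot(
    CSharpCompilation oldCompilation, SyntaxTree oldTree, int positionInsideOldBody)
{
    SemanticModel oldModel = oldCompilation.GetSemanticModel(oldTree);
    StatementSyntax newStatement = SyntaxFactory.ParseStatement("var y = x.ToString();");

    SemanticModel speculativeModel;
    if (oldModel.TryGetSpeculativeSemanticModel(positionInsideOldBody, newStatement, out speculativeModel))
    {
        // All symbol/type info comes back in terms of the old snapshot (old locations, etc.)
        var decl = (LocalDeclarationStatementSyntax)newStatement;
        return speculativeModel.GetTypeInfo(decl.Declaration.Variables[0].Initializer.Value);
    }
    return default(TypeInfo);
}
~~~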

Before I start listing out differences, I would recommend that for either SourceGenerators or macros, new syntax be added to C# to make the feature more useful. Several changes have been important for EC#, but I would say the two most valuable have been these:

  • The "block call" identifier { statements* } and identifier (arguments) { statements* } are a natural way to represent user-defined constructs to be processed by either macros or SourceGenerators. IMO these should be available both inside and outside methods, and should come in "statement" and "expression" flavors (the statement form wouldn't require a semicolon after the closing brace).
  • The $ operator has proven extremely valuable for metaprogramming.
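A rough sketch of what these two pieces of syntax could look like (EC#/LeMP-flavored pseudo-code, not accepted by today's C# compiler; the unroll macro and quote form are used purely for illustration):

~~~csharp
// Statement-form block call: identifier (arguments) { statements }
// Here a hypothetical macro named 'unroll' stamps out one method per listed type.
unroll (T in (int, long, double)) {
    public T Add(T a, T b) { return a + b; }
}

// The $ operator splices a captured syntax node into quoted code.
LNode message = LNode.Literal("hello");
LNode tree = quote { Console.WriteLine($message); };
~~~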

If a SourceGenerator is not allowed to remove existing members, then the need for new syntax seems especially strong, because any code in the original source file must be semantically valid. This requirement is sometimes unwanted, so I think a mechanism would be needed to give users a location where they can write semantically invalid (but syntactically valid) code, and block calls would provide such a location.

A block call inside a class could simply be ignored by the compiler; but it seems like Roslyn would need to be told when a SourceGenerator uses the block call, so that it can issue a warning/error if it was never used. A block call inside a method could cause the entire method to be ignored, but it seems better if the compiler simply declines to do semantic analysis on that method and then requires a SourceGenerator to supply a replacement method that does not call the original method.

In the context of SourceGenerators, block calls would probably disable or downgrade intellisense inside themselves, since intellisense cannot know how the code therein relates to the output of any SourceGenerator. (I suspect a macro system would not usually require any downgrading of intellisense; I hope I'll be able to make a proof-of-concept that demonstrates this.)

Here's a list of differences between SourceGenerators and lexical macros... along with some ideas for overcoming challenges with the latter (with [LM] or [SG] in brackets according to which is "better" in that respect - or [NCW] if there is no clear winner). I have bolded some points that seem the most important. There's a summary at the end.

IDE Performance

[NCW] SourceGenerators and macros must both run before all IntelliSense functionality works. Both can support partial functionality before all user-defined code has finished. Both could suffer from "bad" user-defined transforms that run very slowly or use excessive memory, so both would need mitigations like we have talked about earlier.

[LM] If lexical macros are required to be "pure" and not access the outside world - which I recommend - the only things that can affect their output are

  1. code in the same file
  2. contents of files that are included/referenced by the file (caveat: new references could potentially appear during macro expansion)
  3. project settings that macros could see, such as predefined symbols.
  4. (optional) there would be value in providing metadata about referenced assemblies.

These limitations of macros are an advantage to IDE performance. A macro cannot scan an entire compilation, and if a particular file uses a slow macro, some frequently-used IDE features are basically unaffected while editing other files.

Also, occasionally the knowledge that a macro can't look much beyond the current file may help the user to understand or troubleshoot a macro's behavior.

[LM] Lexical macros are parallelizable per source file, and may be parallelizable on smaller units than a source file. It seems you can't run SourceGenerators in parallel on different compilation units.

[LM?] Compared to SourceGenerators, control is inverted in a macro system; a macro system scans the syntax tree once, using a dictionary to quickly locate all relevant macros for each node in the tree. My question wasn't answered about whether mechanisms are provided for SourceGenerators to efficiently locate nodes they care about. Without such mechanisms, macros are clearly more efficient, especially as the use of source transformers increases.
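A hedged sketch of that inverted-control loop follows; the real LeMP implementation differs, LNode and IMacroContext are Loyc.Syntax types, and the member names used here are approximations:

~~~csharp
using System.Collections.Generic;
using Loyc.Syntax; // assumed: LNode, IMacroContext from LeMP/Loyc

delegate LNode MacroFn(LNode node, IMacroContext context);

class MacroDispatcher
{
    // Macros are indexed by the identifier that triggers them, e.g. "xml" or "#foreach".
    readonly Dictionary<string, List<MacroFn>> _macros = new Dictionary<string, List<MacroFn>>();
    readonly IMacroContext _context;

    public MacroDispatcher(IMacroContext context) { _context = context; }

    // One pass over the tree; each node costs a dictionary lookup plus any macros
    // registered for its name.
    public LNode Process(LNode node)
    {
        List<MacroFn> candidates;
        if (_macros.TryGetValue(node.Name.Name, out candidates))
        {
            foreach (var macro in candidates)
            {
                LNode replacement = macro(node, _context);
                if (replacement != null)
                    return Process(replacement); // re-scan the macro's output
            }
        }
        // No macro claimed this node: recurse into its arguments.
        for (int i = 0; i < node.ArgCount; i++)
            node = node.WithArgChanged(i, Process(node.Args[i]));
        return node;
    }
}
~~~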

[LM] By design, SourceGenerators want to produce output that calls the original method:

~~~csharp
// Input
void Foo([NotNull] string x) {
  /* big method */
}

// SourceGenerator produces
replace void Foo(string x) {
  Contract.Requires(x != null);
  original(x);
}
~~~

A macro would simply insert Contract.Requires in the front of the method. This makes the macro-based program more efficient, does it not? I wouldn't expect the JIT to inline the original Foo into its replacement (unless it somehow detects that the original Foo is only called by a single method). And while the metadata from defining two methods is small, it could add up in a project where most methods do contract checks. I hate to sound like a worry wart, but the potential efficiency losses of SG-based contracts really concerns me.
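For contrast, here is what a lexical macro might emit for the same input (illustrative; no second method and no call through original):

~~~csharp
// Macro output - the contract check is spliced into the front of the existing body:
void Foo(string x) {
    Contract.Requires(x != null);
    /* big method */
}
~~~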

Capability, flexibility, and ease-of-use

[SG] SourceGenerators can semantically analyze the compilation, while macros cannot. This is the biggest limitation of macros, but it is not crippling, as working with macros has shown that a lot of interesting features are still possible. @Jonathanvdc has further been exploring how adding other compiler features can bolster the capabilities of lexical macros.

Nemerle shows how a macro system could come in multiple flavors that increase analysis capability at the expense of code generation capability. Since each flavor has its own strengths and weaknesses, Nemerle supports three flavors of macro. I would not propose supporting multiple flavors initially, but it's nice to know you have options.

The limitations of macros here aren't _all_ bad. Obviously, not being able to do arbitrarily complex analysis discourages users from attempting to do complex analysis, which in turn encourages macro writers to use macros for relatively simple and fast tasks instead. No surprise then that out of the many macros I have written, I can only think of one that potentially runs slowly.

[LM] When multiple source transformers can produce output from the same input, a lexical macro system can have mechanisms for mediating and prioritizing, so that

  • a given macro can only be used if its namespace was imported (EC# accepts fully-qualified macro names, too)
  • it's clear when two macros are incompatible because they "claimed" the same node for transformation
  • one transformer can override another in ways that are input-specific, i.e. "claim" a node in some circumstances and not others.

I didn't quite "bold" this issue because I suppose mechanisms could be designed for SourceGenerators too.

[LM] It looks like SourceGenerators have a single global execution order. Macro execution order, in contrast, depends mostly on the source code: "outer" nodes are processed before "inner" nodes. Thus macro1(macro2(expr)) + macro2(macro1(expr)) uses two different execution orders in a single expression. I strongly suspect that having such fine-grained control will be important in some cases, though nothing is actually coming to mind at the moment.

[LM] Macros are usually easier to write because they can declaratively target just the thing they are interested in. For instance, if I want a xml(@"<stuff></stuff>") macro that translates itself into new XElement("stuff"), the boilerplate for doing that in LeMP is

~~~csharp
[ContainsMacros]
public static class XMLMacro
{
    [LexicalMacro(@"xml(""string literal"")",
        "Converts an XML string to an XElement object at compile time")]
    public static LNode xml(LNode node, IMacroContext context)
    {
        // TODO: parse the xml and return the replacement code
    }
}
~~~
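For completeness, a hedged sketch of what the TODO body might look like using Loyc's LNode factory methods (the exact node shape LeMP expects for a new-expression may differ; error handling and nested elements are omitted):

~~~csharp
public static LNode xml(LNode node, IMacroContext context)
{
    // Expect exactly one argument that is a string literal containing the XML.
    string text = node.ArgCount == 1 ? node.Args[0].Value as string : null;
    if (text == null)
        return null; // decline; some other macro (or an error) can handle it

    var element = System.Xml.Linq.XElement.Parse(text); // parse at compile time

    // Emit: new XElement("stuff")
    return LNode.Call(CodeSymbols.New, LNode.List(
        LNode.Call(LNode.Id("XElement"), LNode.List(
            LNode.Literal(element.Name.LocalName)))));
}
~~~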

With SourceGenerators the user's task seems much harder, since you have to somehow find all xml() nodes, then construct a set of complete replacement methods for every method that calls xml(). It's also inconvenient that, in order to be able to use xml(), the containing class would have to be marked partial.

[LM] It seems like SourceGenerators are Roslyn-specific due to the massive API surface area they have access to. One could argue that thanks to Roslyn there is no longer any need for other C# compilers. Still, a macro system could be designed to use a smaller, or much smaller API surface area, which would make it easier for other compilers to support macros compared to SourceGenerators. Consider Nemerle where many macros can be written using few or no compiler APIs (although, I think, a macro developer is _allowed_ to use most facilities of the compiler), by relying mainly on quoting and pattern matching on syntax trees - features that a non-Roslyn compiler could support more easily than the entire Roslyn API surface for C#.

Implementation & refactoring challenges

On the whole, SourceGenerators win in this section. Perhaps that's why they were developed instead of macros in the first place?

[SG] "Go to Definition" is straightforward with SourceGenerators.

I thought I talked about how to do "Go to Definition" with macros but I can't find my description. My idea was:

  • When you press F12 (Go To Definition), you're taken to the node corresponding to the name in the original source code.
  • If the definition is synthetic (exists only in the generated code) you're taken to the generated code instead (which is converted to text on-demand.)
  • If you press F12 a second time, you're taken to the definition in the generated code. For this to work though, the F12 command would have to have a memory of the place where the caret was _originally_, since the first F12 could take the user to an identifier that expands to multiple output definitions. To avoid this challenge we would have to define a separate key for "Go To Expanded Definition".
  • Aside: why is it that when I press F12 on a symbol that is defined in multiple locations (e.g. a partial class or an incorrectly duplicated method), I'm taken arbitrarily to one of them? I should like a list of choices in that case!
  • We could define a key to jump back and forth between the original and expanded code. Attempting to edit the expanded code could cause an automatic jump back to the original code.

[SG] SourceGenerators, as originally envisioned, don't need any changes to support code completion (Ctrl+Space, Ctrl+Shift+Space).

Statement completion could be a challenge in a file with macros, but hopefully still practical. For the most part it should be possible to map the caret's location to the output syntax tree, and from there do statement completion as usual.

Sometimes, one block of source code corresponds to multiple blocks of output code, so the caret could map to multiple places in the output tree. I just had the funny idea to gather the completion list separately from _all_ these locations and show the _union_ of those lists, but only show the _intersection_ in black (other items could be shown in gray).
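A small sketch of that merge, with completion items reduced to plain names (nothing Roslyn-specific assumed):

~~~csharp
using System.Collections.Generic;

// Black items = valid at every mapped location; gray items = valid at only some.
static void MergeCompletions(List<HashSet<string>> listsPerLocation,
                             out HashSet<string> blackItems, out HashSet<string> grayItems)
{
    blackItems = null;
    grayItems = new HashSet<string>();
    foreach (var list in listsPerLocation)
    {
        grayItems.UnionWith(list);                           // running union
        if (blackItems == null) blackItems = new HashSet<string>(list);
        else blackItems.IntersectWith(list);                 // running intersection
    }
    if (blackItems == null) blackItems = new HashSet<string>();
    grayItems.ExceptWith(blackItems);                        // gray = union minus intersection
}
~~~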

For example, if Roslyn supported a replace macro like EC#, I might write

~~~
replace ImplementSquare($T $x) {
    /// <summary>Squares a number</summary>
    $T Square($T $x) {

    }
}
ImplementSquare(int x);
ImplementSquare(double x);
// Parsing note: in EC#, writing an unassigned variable declaration like this
// is a parse error (it needn't be, but I thought it _should_ be an error
// to reduce the chance that users will overlook the ambiguity in cases like
// Dictionary < K, V > cx.) Since the empty attribute list [] exists in
// EC# only, it is used as a signal to force the parser to change mode and
// see this as a variable declaration.
ImplementSquare([] BigInteger x);
~~~

Now if I go into the Square method and write $x., perhaps intellisense can't handle that: it probably would have had time to run macros in the time it took me to move the caret into position, so it knows that x exists, but it still doesn't understand $x. Still, in principle $x. could produce a completion list, after a short asynchronous delay for macros to execute... assuming it's not a problem to pass the invalid expression $x. through the macro system!

Anyway, the caret should map to three locations in the output, and a completion list can be gathered for each (up to some reasonable time limit). The completion list could show in black items like GetType that are available in all locations, and in gray items like IsPowerOfTwo that only exist in BigInteger.

[NCW] SGs produce new source files, so there's a clear place to go see error messages caused by their output. If each SG gets one or more separate source files then it's obvious where a problematic piece of code came from. However, this doesn't scale well up to a large number of SGs; a project that uses SGs extensively could end up with more generated code than original code. With layer upon layer of methods replacing other methods, it could be hard for a user to keep track of which methods are the "final" methods used at runtime.

Macros _could_ similarly produce new source files - one per source file that uses macros - but this has scale problems, too. Macros fare relatively worse in a project that uses macros moderately, because just a single macro in each source file would cause there to be an output file for every original source file. It seems to me that LMs would be more suitable for heavy usage than SGs, though.

However, compilation would be more efficient if separate output files are never produced on disk, so that's how I think LMs should work by default. The generated code could be kept in memory as syntax trees and converted to text on-demand. I expect this approach to require more engineering work, though.

[SG] SGs have a clear debugging story, since they produce ordinary source files that can be stepped through like any other. A macro system could do the same, but producing an expanded file for every source file is not as efficient as we'd like, and Edit & Continue becomes less convenient as the user would have to switch back to the original file before editing.

The alternative is to try to map each statement in the output tree back to the original source file. Usually this is not difficult, but

  • some statements will be synthetic so there is no corresponding statement in the original source file. This could be handled by stepping through such statements automatically and not supporting breakpoints on such statements (like what happens in methods whose source is not available); often this is fine since the generated code is highly predictable and of little interest.
  • sometimes one statement in the original file corresponds to multiple statements in the output. The debugger can handle this already, though not always elegantly.
  • rarely, one statement in the output corresponds to multiple statements or expressions in the input. I'm not sure if the debugger can handle that; if not, I guess we just have to pick one arbitrarily.

[NCW?] For Edit and Continue, presumably one is allowed to edit the original source files freely and not the generated files. Such edits could imply changes to the generated files, though, suggesting that transformers must run even during debugging. In this respect they are the same as macros. In case Edit and Continue fails, perhaps SGs have a slight advantage since there is an on-disk source file in which to show the purple squigglies. But a macro system already requires some way to show the expanded source file anyway, and that same mechanism can be used in this situation.

[NCW] The "Rename" refactor is tricky for both SourceGenerators and macros. Under SGs, Rename could fail immediately if the member being renamed is defined _only_ in a generated file. But apart from that, the same difficulties apply to both, unless I missed something. One nice thing about macros is that they can be "expanded" to enable renames that otherwise wouldn't work. It's not clear to me how you would do the analogous task for an SG.

[SG] For the most part, SourceGenerators are so well-suited to the Extract Method refactor that the refactoring operation itself can stay the same. However, using SGs with block-call expressions, Extract Method probably wouldn't work at all inside them.

Similarly, it seems like various other refactors would work more naturally with SGs, though not always.

However, perhaps there might be an algorithm out there to automatically try to map changes from the expanded tree back to the original tree or source code. This is something I would like to investigate further, since _every_ refactoring operation could take advantage of it.

Summary

SourceGenerators and lexical macros solve many of the same problems and suffer from many of the same drawbacks. The major differences are:

  • A macro system is parallelizable.
  • A macro system uses a naturally efficient one-pass algorithm.
  • Macros run separately per-file, limiting the "damage" of a single keystroke.
  • Macros are typically easier to write than SourceGenerators, especially for "small" features that operate below the method level.
  • The order of execution of unrelated macros can be controlled on a per-usage-site basis.
  • The EC# macro system has mechanisms to help users import macros and "arbitrate" among them. SGs don't seem to have any of that, but perhaps such features could be designed and added.
  • SourceGenerators have access to much more information.
  • SourceGenerators have a clearer debugging story.
  • SourceGenerators seem better suited to refactoring operations, but I'd like to investigate whether it's possible to close the gap.
  • SourceGenerators naturally support existing navigation features (Go to Definition, Find all References) but with additional work I think a macro system will work just as well.
  • Likewise, code completion is doable with a macro system, but harder.

So, macros win on IDE performance, ease-of-use and composability, while SGs win on debugging, refactoring, the depth of analysis that SGs can do, and on the lower amount of engineering required to implement them in Visual Studio.

@CyrusNajmabadi, I hope we're getting closer to the same page.

Next steps

I will keep an eye out for any ideas that might either reduce the engineering or performance cost of a macro system, or increase its power.

In time I would like to investigate ways of "closing the gap" so that macros are as good or better in every way, except for their lack of access to semantic information.

Finally, I'm interested in improving the EC# Visual Studio extension and prototyping some of the ideas outlined here.

These limitations of macros are an advantage to IDE performance

But probably too limiting for use-cases that many customers are asking for. Those use cases depend on having semantic information available. i.e. because the generator wants to generate different sorts of code depending on which types you use, etc. etc.

[LM] Lexical macros are parallelizable per source file, and may be parallelizable on smaller units than a source file. It seems you can't run SourceGenerators in parallel on different compilation units.

You would definitely be able to run SourceGenerators in parallel across files. i.e. you could be running them on multiple files simultaneously. Per file, you would have to come up with some ordering for SourceGenerators, as some generators might want to see the results of previous generator changes.

As part of source generators we did consider ways for implementations to state what types of changes they cared about. For example, a source generator could state that it only wanted to be rerun on a file if that specific file changed.

[LM?] Compared to SourceGenerators, control is inverted in a macro system; a macro system scans the syntax tree once, using a dictionary to quickly locate all relevant macros for each node in the tree. My question wasn't answered about whether mechanisms are provided for SourceGenerators to efficiently locate nodes they care about. Without such mechanisms, macros are clearly more efficient, especially as the use of source transformers increases.

When we were designing SourceGenerators we looked to analyzers as a good model for how tools could register to hear about things they care about. So it would be expected that generators could listen for things like SyntaxTrees, Compilations, specific syntax kinds, symbols, etc.
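For reference, this is the analyzer registration model being pointed at. The DiagnosticAnalyzer API below is the real, shipped API; a generator that registers the same way remained a design idea at the time:

~~~csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class InvocationWatcher : DiagnosticAnalyzer
{
    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray<DiagnosticDescriptor>.Empty;

    public override void Initialize(AnalysisContext context)
    {
        // Only called back for the syntax kinds and symbol kinds registered here,
        // which is what lets the host limit the work it re-runs.
        context.RegisterSyntaxNodeAction(ctx => { /* inspect ctx.Node */ },
                                         SyntaxKind.InvocationExpression);
        context.RegisterSymbolAction(ctx => { /* inspect ctx.Symbol */ },
                                     SymbolKind.NamedType);
    }
}
~~~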

A macro would simply insert Contract.Requires in the front of the method. This makes the macro-based program more efficient, does it not? I wouldn't expect the JIT to inline the original Foo into its replacement (unless it somehow detects that the original Foo is only called by a single method). And while the metadata from defining two methods is small, it could add up in a project where most methods do contract checks. I hate to sound like a worry wart, but the potential efficiency losses of SG-based contracts really concerns me.

The replace/original pattern is there for when perf overhead isn't a critical concern, or if the intent is to be able to call it in multiple locations. For example:

~~~csharp
replace void Whatever() {
    ...
    if (..) {
        original...
    } else {
        original...
    }
}
~~~

If perf-overhead is critical then you simply replace the original method outright, and you never call 'original'. If 'original' is never called, there is no need for the compiler to emit it.
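In other words, the perf-sensitive shape would be something like this (illustrative; the generator reproduces or regenerates the body rather than delegating to it):

~~~csharp
// No call to 'original', so the compiler has no reason to emit the original Foo:
replace void Foo(string x) {
    Contract.Requires(x != null);
    /* big method, reproduced by the generator */
}
~~~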

[LM] Macros are usually easier to write because they can declaratively target just the thing they are interested in.

This does not seem easier to me. This seems to mean you'd need some sort of DSL to declaratively express the sorts of patterns you want to match. That means learning a language for pattern matching trees. SGs allow for arbitrarily complex matching, with any amount of computation desired for the recognizer to determine if it should apply.

Consider the canonical case that people always bring up (translating linq into efficient imperative code). To do this correctly, you're going to need to do an enormous amount of semantic analysis, coupled with lots of syntactic checks on how the query is actually written (not to mention what sort of context it is in). Trying to write declarative matches that actually do that correctly is going to be very difficult.

It's also inconvenient that, in order to be able to use xml(), the containing class would have to be marked partial.

Note that that requirement is not set in stone. Indeed, in one of the prototypes it was the case that you did not need to mark the containing class as partial.

On the whole, SourceGenerators win in this section. Perhaps that's why they were developed instead of macros in the first place?

I think it definitely ties in. When we create features we are usually opposed to just saying "and if you use this, then all these other features will just become terrible". If we were to do anything in this space, we would want an excellent IDE experience around them.

This was similar to how we did the Repl initially. We didn't just create a bog standard text repl. We wanted one that actually tied into all the features that people expect of a good C# editing experience. And there's much more we want to do there. But we weren't going to enter into this space unless we felt we could do a great job here across the board.

why is it that when I press F12 on a symbol that is defined in multiple locations (e.g. a partial class or an incorrectly duplicated method), I'm taken arbitrarily to one of them? I should like a list of choices in that case!

Go-to def has, for many versions, popped open a tool window listing all the locations of the symbol if there are many of them. i.e.

(screenshot: the tool window listing each location of the symbol)

It is a reasonable request to have go-to-def show you the duplicates when there is an error. Feel free to file that bug :)

SGs produce new source files, so there's a clear place to go see error messages caused by their output. If each SG gets one or more separate source files then it's obvious where a problematic piece of code came from. However, this doesn't scale well up to a large number of SGs; a project that uses SGs extensively could end up with more generated code than original code. With layer upon layer of methods replacing other methods, it could be hard for a user to keep track of which methods are the "final" methods used at runtime.

That's not how SGs are intended to work. With SGs you always have the original user files. Then all SGs run on those user files, producing the final file. You don't take a single file, then run it through a generator, producing new files, then run those through more generators, ad nauseam.

[LM] Macros are usually easier to write because they can declaratively target just the thing they are interested in.

This does not seem easier to me. This seems to mean you'd need some sort of DSL to declaratively express the sorts of patterns you want to match.

Uh... the macro simply registers to be triggered by one or more kind(s) of nodes, remember? That doesn't involve a DSL. Remember the xml macro? No DSL there.

Inside the macro you then do further checks to see if the syntax tree has the form you want. You need not use a DSL for that, but I'm baffled why you'd think it's easier NOT to use a DSL.

Consider how the syntax matching DSL looks in EC#:

~~~csharp
[LexicalMacro(@"", "", "#foreach")] // watch for foreach loops
public static LNode foreachThingie(LNode node, IMacroContext context)
{
    matchCode (node) {
    case { foreach ($Type $item in $list) $body; }:
        // Get list of statements in body and make an index variable
        var bodyList = body.AsList(CodeSymbols.Braces);
        var index = LNode.Id("i" + context.IncrementTempCounter());

        // Create a for-loop replacement
        return quote {
            for (int $index = 0; $index < $list.Count; $index++) {
                $Type $item = ($Type) $list[$index];
                $(..bodyList);
            }
        };
    default:
        return null;
    }
}
~~~

You just write what you're looking for literally with placeholders like $item, and generate output the same way. How is that not easy?

You would definitely be able to run SourceGenerators in parallel across files. i.e. you could be running them on multiple files simultaneously.

So it would be expected that generators could listen for things like SyntaxTrees, Compilations, specific syntax kinds, symbols, etc.

Then all SGs run on those user files, producing the final file. You don't take a single file, then run it through a generator, producing new files, then run those through more generators, ad nauseam.

Sigh. None of this is stated, suggested or implied by the documentation. I've been blindsided.

Nothing in there would make one think Execute is called more than once per compilation, let alone concurrently. Nothing there makes me think there are features to easily and efficiently locate nodes of interest. And the document not only implies you have to call AddCompilationUnit to create new code, it also says "Generated source is persisted to a GeneratedFiles/{GeneratorAssemblyName} subfolder", implying each SourceGenerator - or at minimum, each generator that is in a different assembly - gets one or more separate output files.

Would someone please update it?

Obviously, if what you say is true then many of the relative performance benefits of macros disappear.

Sorry, i'm not understanding this part then: "xml(""string literal"")" What code interprets what that means?

You mean in the [LexicalMacro] attribute? The first two strings just serve as documentation (syntax and description).

We are now taking language feature discussion in other repositories:

Features that are under active design or development, or which are "championed" by someone on the language design team, have already been moved either as issues or as checked-in design documents. For example, the proposal in this repo "Proposal: Partial interface implementation a.k.a. Traits" (issue 16139 and a few other issues that request the same thing) is now tracked by the language team at issue 52 in https://github.com/dotnet/csharplang/issues, and there is a draft spec at https://github.com/dotnet/csharplang/blob/master/proposals/default-interface-methods.md and further discussion at issue 288 in https://github.com/dotnet/csharplang/issues. Prototyping of the compiler portion of language features is still tracked here; see, for example, https://github.com/dotnet/roslyn/tree/features/DefaultInterfaceImplementation and issue 17952.

In order to facilitate that transition, we have started closing language design discussions from the roslyn repo with a note briefly explaining why. When we are aware of an existing discussion for the feature already in the new repo, we are adding a link to that. But we're not adding new issues to the new repos for existing discussions in this repo that the language design team does not currently envision taking on. Our intent is to eventually close the language design issues in the Roslyn repo and encourage discussion in one of the new repos instead.

Our intent is not to shut down discussion on language design - you can still continue discussion on the closed issues if you want - but rather we would like to encourage people to move discussion to where we are more likely to be paying attention (the new repo), or to abandon discussions that are no longer of interest to you.

If you happen to notice that one of the closed issues has a relevant issue in the new repo, and we have not added a link to the new issue, we would appreciate you providing a link from the old to the new discussion. That way people who are still interested in the discussion can start paying attention to the new issue.

Also, we'd welcome any ideas you might have on how we could better manage the transition. Comments and discussion about closing and/or moving issues should be directed to https://github.com/dotnet/roslyn/issues/18002. Comments and discussion about this issue can take place here or on an issue in the relevant repo.

I am closing this issue because discussion appears to have died down. You are welcome to open a new issue in the csharplang repo if you would like to kick-start discussion again.
