Roslyn: Enhanced C#: a friendly hello

Created on 16 May 2016 · 92 Comments · Source: dotnet/roslyn

I'm terribly embarrassed.

For the last few months I've been working on a tool called LeMP that adds new features to C#. I recently published its "macro" reference manual. This month I was going to start publicizing my "Enhanced C#" project when I discovered that the design of C# 7 had already started well before C# 6 was officially released - and even more shocking, that this design work was being done "in public" right on GitHub!

It kills me that I didn't realize I could have participated in this process, and that "my" C# was drifting apart from C# 7 for over a year. Oh well - it is what it is, and I hope that something useful can still be salvaged out of my work.

So, this post is to inform you about Enhanced C# - where it came from, and what it offers that C# 7 does not.

A brief history

As a class project in my final year of university, I extended a compiler with a new feature (unit type inference with implicit polymorphism), but (to make a short story shorter) the authors of the language weren't interested in adding that feature to their language. This got me thinking about our "benevolent dictatorship" model of language development and how it stopped me, as a developer, from making improvements to the languages I relied on. Since I had already been coding for 15 years by that time, I was getting quite annoyed about writing boilerplate, and finding bugs at runtime that a "sufficiently smart compiler" could have found given a better type system.

So in 2007 I thought of a concept for a compiler called "Loyc" - Language of your choice - in which I wanted to create the magical ability to compile different languages with a single compiler, and also allow users to add syntax and semantics to existing languages. This system would democratize language design, by allowing third parties to add features to existing languages, and allowing language prototypes and DSLs to seamlessly interoperate with "grown up" languages like C#. But my ideas proved too hard to flesh out. I wanted to be able to combine unrelated language extensions written by different people and have them "just work together", but that's easier said than done.

After a couple of years I got discouraged and gave up for a while (instead I worked on data structures (alt link), among other things), but in 2012 I changed course with a project that I thought would be easier and more fun: enhancing C# with all the features I thought it ought to have. I simply called it Enhanced C#. It started as a simple and very, very long wish list, with a quick design sketch of each new feature. Having done that, I reviewed all the feature requests on UserVoice and noticed a big gaping hole: I hadn't satisfied one of the most popular requests, "INotifyPropertyChanged". So at that point I finally went out and spent three weeks learning about LISP (as I should have done years ago), and some time learning about Nemerle macros. At that point (Oct. 2012) I quickly refocused my plans around a macro processor and called it EC# 2.0, even though 1.0 was never written. I realized that many of the features I wanted in C# could be accomplished with macros (and that a macro processor doesn't require a full compiler, which was nice since I didn't have one), so the macro processor became my first priority.

So "Loyc", I eventually decided, would not be a compiler anymore, but just a loose collection of concepts and libraries related to (i) interoperability, (ii) conversions between programming languages, (iii) parsing and other compiler technology, which I now call the "Loyc initiative"; I've had trouble articulating the theme of it... today I'll say the theme of Loyc is "code that applies to multiple languages", because I want to (1) write tools that are embedded in compilers for multiple langauges, and (2) enable people, especially library authors, to write one piece of code that cross-compiles into many langauges. One guy wants to call it acmeism but that doesn't seem like the right name - I'd call it, I dunno, multiglotism or simply, well, loyc.

EC# and Roslyn

Roslyn's timing didn't work out for me. When I conceived EC#, Roslyn was closed source. I researched it a bit and found that it would only be useful for analysis tasks - not to change C# in any way. That wasn't so bad; but I wanted to explore "radical" ideas, which might be difficult if I had to do things the "Roslyn way". That said, I was _inspired_ by Roslyn; for instance the original implementation of "Loyc trees" - the AST of EC# - was a home-grown Red-Green tree, although I found my mutable syntax trees to be inconvenient in practice (probably I didn't design them right the first time) and rewrote them as green-trees-only (immutable - I thought I might rewrite the "red" part later, but I got used to working with immutable trees and now I don't feel a strong need for mutable ones.)

By the time MS announced they were open-sourcing Roslyn (April 2014), I had been working on Enhanced C# and related projects (LLLPG, Loyc trees and LES) for well over a year, and by that point I felt I had gone too far down my own path to consider trying to build on top of Roslyn (today I wish I could have Roslyn as a back-end, but I don't think I have time, nor a volunteer willing to work on it).

LeMP

EC# is still not a "compiler" in the traditional sense, but it's useful and usable as-is thanks to its key feature, the Lexical Macro Processor, or LeMP for short. It is typically used as a Visual Studio extension, but is also available as a command-line tool and a Linux-compatible GUI.

Through macros, I implemented (in the past few months) several of the features that you guys have been discussing for more than a year:

  • Creating ‘out’ variables in-situ, e.g. int? Parse(string s) => int.TryParse(s, out int x) ? (int?)x : null;
  • Code Contracts via annotations on the method signature
  • Tuples (positional only) with deconstruction
  • Algebraic data types
  • Pattern matching

(They aren't as polished as the C# 7 features will be, because of technical limitations of lexical macros and because I'm just one guy.)

It also has numerous other features:

  • A maximally flexible alternative to "primary constructors"
  • Code quotations and code pattern matching (comparable to LISP and Nemerle), which is useful for writing macros, for code analysis, code generation and even (potentially) reading/writing JSON and LES files (a use case I haven't written about yet).
  • Method forwarding, for doing the decorator pattern more easily.
  • Declaring variables and writing code sequences in expressions (like the ; operator that was sadly not added to C# 6)
  • The "quick binding" operator
  • on_finally, which works like Swift's defer, and related macros (on_return, on_throw); a sketch of an assumed expansion follows this list
  • replace and unroll for generating boilerplate (although after reading about Nim, I think there's a better way to do the unroll feature)
  • A with statement based on the With statement in Visual Basic
  • An LL(k) parser generator called LLLPG (massive chicken-and-egg problem there: you write LLLPG grammars in EC#, while the EC# grammar is written in LLLPG)
  • And last but not least, users can write their own macros.
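
To give a flavor of these macros, here is a hedged sketch of on_finally from the list above. Both the usage syntax and the expansion shown are my assumptions based only on the comparison to Swift's defer; they are not copied from the LeMP documentation.

```c#
// Plausible plain-C# expansion (assumption): the rest of the enclosing block is
// wrapped in try/finally so the registered code runs on every exit path.
class OnFinallyExample
{
    // Assumed EC# input, guessed from "works like Swift's defer":
    //
    //     var src = File.OpenRead(from);
    //     on_finally { src.Dispose(); }
    //     // ...copy the data...
    //
    static void CopyFile(string from, string to)
    {
        var src = System.IO.File.OpenRead(from);
        try
        {
            // ...copy the data to 'to'...
        }
        finally
        {
            src.Dispose();
        }
    }
}
```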

The other parts of EC# that exist - the parser and "pretty printer" - support some interesting additional features such as symbols, triple-quoted string literals, attributes on any expression, etc. However, the majority of the syntactic differences between EC# and C# 6 are designed to support the macro processor.

An important theoretical innovation of Enhanced C# is the use of simple syntax trees internally, vaguely like LISP. This is intended to make it easier to (1) convert code between programming languages and (2) to communicate syntax trees compactly.

What now?

Well, I'm not 100% decided about what to do now, knowing that the C# open design process exists and that C# 7 is shaping up to be really nice.

I don't intend to throw the whole thing away, especially since there are major use cases for EC# that C# 7 doesn't address. So in the coming weeks I will change the pattern matching syntax to that planned for C# 7, implement the new syntax for tuple types (minus named parameters, which cannot be well-supported in a _lexical_ macro), and add those "record class" thingies (even though I don't think the C# team has taken the right approach on those.)

But in the long run, is it worthwhile to continue working on EC#, or should I instead devote my time to lobbying the C# team to do the features I want? (beware, I can talk a lot...)

In fact, those are far from my only options - I've been closely following the development of WebAssembly and I'd like to do something related to interoperability and WebAssembly, mainly because .NET has not turned out to be the cross-language interoperability panacea that the world needs. And I'd love to make the world's most widely useful programming language (which is not EC#, because we all know how hard it is to improve a language given the backward compatibility constraint). The main reasons to keep going with EC# are (1) that I have a large codebase already written, and (2) that after 8 years alone I finally have a volunteer that wants to help build it (hi @jonathanvdc!)

I do suspect (hope) there are some developers that would find value in EC# as a "desugaring" compiler that converts much of C# 7 to C# 5. Plus, LeMP is a neat tool for reducing boilerplate, "code find and replace" operations, and metaprogramming, so I really want to polish it up enough that I finally win some users.

There is so much more I could say, would have liked to say, and would still like to say to the C# design team... but in case this is the first you've heard of Enhanced C# or LeMP, you might find this to be a lot to take in - just like for me, C# 7 was a lot to take in! So I'll avoid rambling much longer. I hope that, in time, I can win your respect and that you will not "write me off" in a sentence or two, or without saying a word, an eventuality I have learned to emotionally brace for. I definitely have some opinions that would be opposed by the usual commentators here - but on the other hand, I think the new C# 7 features are mostly really nice and I'll be glad to have them.

So if this wasn't TLDR enough for you, I hope you'll enjoy learning about EC# - think of it as how C# 7 might have looked in a parallel universe.


Most helpful comment

Agreed. I'm getting high level ideas and concepts. But when i try to dive deeper, i'm seeing contradictions and not-fully-fleshed-out ideas.

Many of your ideas also seem predicated on a whole host of assumptions. i.e. "we could do X, (with the implication that Y and Z are also done). And to do Y and Z, we'd need these other things as well." I can't wrap my head around a clear set of concepts and work items that you're actually proposing, and how each one of them would work.

Most of this feels like you have grand ideas in your head, and you're giving quick sketches based on assumptions that are scattered around in a whole host of places :)

Condensing and focusing would make this conversation much simpler.

All 92 comments

You did all this stuff by yourself? That's pretty darn impressive! I for one hope you stick around these repos, sounds like you have some good insights!

You might be interested in https://github.com/JetBrains/Nitra . I think it will eventually allow "extending" C# in an IDEA-grade IDE (https://www.jetbrains.com/rider ?).

...I've been closely following the development of WebAssembly and I'd like to do something related to interoperability and WebAssembly, mainly because .NET has not turned out to be the cross-language interoperability panacea that the world needs.

Sadly, Microsoft has not yet described the future toolchain for C# - WebAssembly development. Saying "we have LLILC" is not really an answer. Hopefully they understand that TypeScript is just a temporary work-around.

But in the long run, is it worthwhile to continue working on EC#, or should I instead devote my time to lobbying the C# team to do the features I want? (beware, I can talk a lot...)

I think that depends a lot on what you want.

For acceptance in the mainstream I'd think that you'd have more impact with Roslyn, both lobbying and participating. While I believe that any feature must be championed by an LDM member to be considered for acceptance, having someone with experience in proving out the feature and who can actually develop it would reduce their burden and likely lower the barrier a bit which may allow for a faster evolution of the language.

But you would have to endure the politics of the committee, and for someone who has gone their own way for so long that might not be ideal for you. If you wanted to keep it on your terms it might be worthwhile to consider forking Roslyn. You'd have a lot to relearn but at least in theory you can keep your changes up to date with the evolution of C#.

Note that several features that you've mentioned (pattern matching, records) got punted to beyond C# 7.0, and they are very likely to change. So rather than adopting what has already been proposed here I'd suggest using EC# as a proof of concept for an existing syntax which can have an impact on how the feature will shape up for potentially C# 8.0.

@aL3891 Thanks very much! Though I did it all myself, I'd stress that I didn't _want_ to do it alone (I mean, think about your colleagues, have you learned anything from them? I've missed that by not having any).

@dsaf Thanks for the information! Nitra is an impressive project that maybe I ought to learn about (though I guess it could be hard to fit it in with the work I've already done). I wonder what Rider offers that, say, Xamarin Studio doesn't (because competing directly with VS Community seems ... impractical)

P.S. I don't really get how LLILC is different from the AOT compilation that Mono had already.

@HaloFour I'm definitely looking to have some kind of real-world impact, but I'm not sure if the C# team would be interested in replicating the main feature of EC#: a macro system or a compiler plug-in system. Plus, the design of EC#/LeMP would probably be difficult to adapt to Roslyn, so ... I'm not sure how to actually get a real-world impact. :confused:

I suggest you open issues for the individual features of EC# that you'd like to see in C# and reference the work you've done in each area, and then the discussions can go from there :) It may not always be possible to adapt your implementation directly but I'm sure the team will find it interesting nonetheless. As @MadsTorgersen (I think it was) said on Channel 9 one time, there aren't a whole lot of people out there designing languages, so it's nice to stay together!

@qwertie

...wonder what Rider offers that, say, Xamarin Studio doesn't...

Built-in ReSharper obviously :).

It did not escape my notice that no one from Microsoft was interested. I took my leave, tail between legs... progress on EC# since then has been minimal, but it's not cancelled, I'm still working on it.

i'm interested :)

But, as Halo pointed out, the entirety of what's going on in this issue is enormous. It's simply too large to do anything with in its current state. Extracting out useful pieces and working toward getting them implemented is likely the best path forward.

--

Note that this bit concerns me:

They aren't as polished as the C# 7 features will be, because of technical limitations of lexical macros

We've looked into areas like this before, and a large issue is that things often work well for more 'toy' scenarios, but fall over when you need to really deal with the full complexity of the language. For us to do anything it really needs to be designed so that it will work well in that context.

Thanks!

@CyrusNajmabadi first of all, thank you very much for saying something (and also thanks to aL3891, dsaf & HaloFour - I appreciated your replies; it's just that I really had my heart set on some kind of response from an 'insider'.)

I am curious what you mean that things "fall over when you need to really deal with the full complexity of the language"? I have found that macros work well for much more than just 'toy' scenarios. Let's see...

  • I _have_ noticed some difficulty in composability and conflict resolution of macros written by different people that operate on the same construct (e.g. two macros modify a method - what order should they run in?), but at least a set of "standard" macros can be designed together and compose in the right way.
  • I'm also aware of the challenge of integrating macros with refactoring, but it seems solvable. Some operations could fail when using macros that do fancy things, though, and renames probably shouldn't be done in realtime like VS2015 does.
  • I expect that a macro system would be much harder to implement in Roslyn than in Enhanced C# due to the complexity of syntax trees in the former. [EDIT: Hmm... there's a good chance I'm wrong about that.] An alternative to actually implementing a macro system would be some sort of change to the compiler to allow alternate front-ends. This could allow interested parties to use my existing macro system by switching the extension on a source file to 'ecs'. I bet someone would also write a VB front-end that converts to a C# syntax tree so that you could mix languages in one project (albeit not seamlessly - if the front end can only deal with syntax, the VB code would end up being case-sensitive).

@qwertie

...I really had my heart set on some kind of response from an 'insider'.)

You have actually received a response from Gafter straight away - marking something as "Discussion" means that a suggestion is being rejected on the spot.

My opinion on this topic:

  1. EC# cannot be widely popular because C# doesn't suck. The situation with TypeScript vs. JavaScript, for example, is entirely different, and even then TypeScript is kind of "meh" unless a front end is predicted to be quite complex.

  2. It's important to point out that C# is open-source but not community-driven. The only viable way of directly contributing to C# design is reduced to this:

https://github.com/dotnet/roslyn/issues?q=is%3Aopen+is%3Aissue+label%3A%22Up+for+Grabs%22+label%3A%22Feature+Request%22+label%3A%22Area-Language+Design%22


  3. Alternatively consider this (not sure if this one is still alive):

https://careers.microsoft.com/jobdetails.aspx?ss=&pg=0&so=&rw=3&jid=208941&jlang=EN&pp=SS


How do you define an 'insider'?

I am curious what you mean that things "fall over when you need to really deal with the full complexity of the language"

i mean things like properly working in complex constructs like async/await or 'yield'. Or in constructs where variables are captured into display classes. Or with constructs that need to understand the intricacies of reference/value types, especially across complex generic constraints. etc. etc.

After this, you also have to figure out how this impacts the IDE/editing-cycle. We put many months into exploring a system that would do nothing but just allow tree transforms, along with generating the results of those into files that could be introspected, and the problem space was still enormous. How does debugging work? How do IDE features (like 'rename') work? How do safe transformations of code work?

Think about it this way:

We want intellisense to be extremely accurate and very fast. How do you accomplish that in systems that allow arbitrary transformation without bounds on transformation cost?

Finally, arbitrary extensibility is also a major concern for us in terms of being able to rev the language ourselves. Now, anything we do in the language has the potential to stomp on someone's arbitrary extensibility plugin. What if some company internally created their own 'async/await' plugin. What happens now when the next version of C# comes out?

How do you define an 'insider'?

Someone on one of the Roslyn teams. But I would have been happy with any Microsoftie.

constructs that need to understand the intricacies of reference/value types, especially across complex generic constraints. etc. etc.

Well, the beauty of user-defined stuff is that it doesn't have to be perfect because MS isn't responsible for supporting it. Also, many macros do something simple enough that there's little that could go wrong and few feature interactions to consider. Plus, a lot of macros would be one-off things made by one user for one project; those things need not work beyond that one little context they were made for.

We put many months into exploring a system that would do nothing but just allow tree transforms

Interesting. Are discussions about it available to read?

We want intellisense to be extremely accurate and very fast. How do you accomplish that in systems that allow arbitrary transformation without bounds on transformation cost?

In general you can't, but note that we technically have this problem already with WinForms controls. In theory they can misbehave on the design surface; in practice most people are happy, and happier than they would be if the design surface didn't run custom code. There are mitigations:

  • Decouple updating the program tree (the directory of classes, methods, etc.) from most user-facing operations (this is done already, I think)
  • Provide a hint to macros (or other units of custom transformation) that they are running in an IntelliSense context, to help slow macros avoid expensive parts (I'm thinking of my parser generator, which could skip grammar analysis and generate methods without bodies in that case.)
  • Measure the running time of all macros: per-macro aggregate time and slowest single invocation. If there's a performance problem, the IDE can put up tips like "FooMacro is slowing down Intellisense" so VS doesn't take the blame. And of course you'd need to inject a thread abort if a macro enters an infinite loop. You'd want to watch their memory usage too (is there a mechanism in the CLR for that?) The build process would also need some way of informing users about performance problems. (A rough sketch of this per-macro bookkeeping follows this list.)
  • Have a dialog box for "intellisense performance" which, in addition to a profile of built-in intellisense, would summarize macro performance and allow users to disable badly-behaved macros at design time.
  • Typically a slow macro would only be used in one or two files, so the IDE could learn to process those files last for the purpose of passive look-up (e.g. dot-completion). Refactoring does require full processing though.
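
As a rough illustration of the bookkeeping implied by the "measure the running time of all macros" bullet above, something along these lines would suffice; the class and method names here are my own sketch, not an existing Visual Studio or LeMP API.

```c#
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical sketch: track aggregate and worst-case running time per macro so the
// IDE could report, e.g., "FooMacro is slowing down IntelliSense" instead of taking
// the blame itself.
class MacroTimings
{
    private readonly Dictionary<string, (TimeSpan Total, TimeSpan Slowest)> _stats
        = new Dictionary<string, (TimeSpan Total, TimeSpan Slowest)>();

    public void Record(string macroName, Action invokeMacro)
    {
        var sw = Stopwatch.StartNew();
        invokeMacro();                    // run one expansion of the macro
        sw.Stop();

        _stats.TryGetValue(macroName, out var s);
        _stats[macroName] = (s.Total + sw.Elapsed,
                             sw.Elapsed > s.Slowest ? sw.Elapsed : s.Slowest);
    }

    public IReadOnlyDictionary<string, (TimeSpan Total, TimeSpan Slowest)> Stats => _stats;
}
```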

Roslyn doesn't do incremental parsing, does it? I wouldn't know how to mix that with a macro system.

Refactoring is the biggest challenge I know of. Perhaps some refactorings like "extract method" should just operate on the original source, macros be damned. Others (renames and parameter reorder) could update the final output, then map those changes to the original source code. It seems doable, but it requires the transformation be performed again _immediately_ in order to find side effects (things that changed other than the requested thing) and failures (where the requested refactoring didn't work properly), and those problems would have to be brought to the user's attention.

What if some company internally created their own 'async/await' plugin. What happens now when the next version of C# comes out?

Then that company would have two ways to do async, I guess? Sorry for being naïve, but so far I'm not seeing a major practical problem. To me it's like the problem of "what if we allow users to define their own classes, and then we add a new class to the BCL with the same name? Hello ambiguity errors!" I knew that was a risk back when I defined my own WeakReference<T>, but I did it anyway. It seems to me it should be the _user's_ decision whether to take that risk. (BTW my macro system has a prioritization feature for some scenarios like this.)

Someone on one of the Roslyn teams.

That would be me :)

Well, the beauty of user-defined stuff is that it doesn't have to be perfect because MS isn't responsible for supporting it.

One of the arguments i thought you were making was that by implementing this, we could then provide many of the features we've been working on for C# 7 and onwards by layering on this system. That's only true if this subsystem is capable enough to handle all the complexity that we'd need to manage with all our features.

Roslyn doesn't do incremental parsing, does it? I wouldn't know how to mix that with a macro system.

Yes, Roslyn does fairly extreme incremental parsing. It tries to reuse, down to the token level, all the data it can :)

Refactoring is the biggest challenge I know of. Perhaps some refactorings like "extract method" should just operate on the original source, macros be damned. Others (renames and parameter reorder) could update the final output, then map those changes to the original source code. It seems doable, but it requires the transformation be performed again immediately in order to find side effects (things that changed other than the requested thing) and failures (where the requested refactoring didn't work properly), and those problems would have to be brought to the user's attention.

Yes. And you've now taken a system that should take a few seconds max, and made it potentially take minutes (depending on how many transformations are being done, and how costly they all are). :)

Then that company would have two ways to do async, I guess? Sorry for being naïve, but so far I'm not seeing a major practical problem.

We've now released a new version of C# that they can't use. Or which may break their code.

"what if we allow users to define their own classes, and then we add a new class to the BCL with the same name? Hello ambiguity errors!"

We've actually implemented language features to help avoid that. Both through things like namespaces, as well as aliasing (::) (which people do use to ensure that names won't collide).

--

Allowing for arbitrary new syntax to be introduced is problematic. Consider that you introduced something like "out-vars" before we did. But perhaps you did it with different semantics than the ones we're putting in the language. Now, what happens when someone upgrades? Does the core language take precedence? Could we subtly change code without anything catching it?

Perhaps some refactorings like "extract method" should just operate on the original source, macros be damned.

The problem with this is that features themselves are complex. Extract method, for example, needs a fine-grained understanding of data flow and control flow to make appropriate decisions. How does it do this over code that may change arbitrarily because of macros?

Consider just something simple:

```c#
void Foo()
{
    // ... ordinary C# code ...
    var result = /* ... */;

    <SomeMacro2>
}
```

The user wants to extract out the code in the middle. But maybe <SomeMacro2> ends up using 'result'. Normally extract method would see that 'result' was unused after the extracted region, and it would pull it entirely into the new method. Now, it would need to know that the value was actually used by <SomeMacro2> in order to make sure the value got passed out.

And that's just a simple case :)

@dsaf Thanks. What's the importance of the 'up for grabs' tag?

consider this (not sure if this one is still alive)

Thanks for the heads up; too bad it has no date on it. If it's more than a few months old, I probably applied for it already.

There are mitigations:

For certain. But now the problem space has gotten much larger.

--

This is a primary concern here: the value produced by this has to warrant the enormous amount of work that needs to happen here. And it has to justify all that work and not have major downsides that we would have to absorb.

Or, in other words, there are limited people resources to be able to do all of this. A suggestion like this would take a massive amount of effort to thread through the compiler (just the infrastructure), and then would have all that additional work to get working properly in the IDE. Just the testing would be massively difficult as each feature would now have to deal with not only arbitrary code, but arbitrary macros.

--

To give some examples. We did something like 'Analyzers', and that was vastly smaller than what you're discussing in scope. Analyzers themselves took several devs an entire product cycle to fit into Roslyn. And it's still getting tons of work because of the deep impact it has on the system, and all the perf issues we need to address.

--

In order for us to take on this work, we'd need a clear understanding of exactly what value we'd be getting once we finished. Right now that value isn't clear. For example, as mentioned earlier, we likely would not be able to use this system for our own language features. That would mean we'd be investing in something with very little payoff for ourselves. It also means we wouldn't be directly utilizing (dogfooding) our own features. Which means ensuring a high enough level of quality would be quite difficult. etc. etc.

What's the importance of the 'up for grabs' tag?

It means we're happy with anyone taking it on and providing a solution.

Technically, anything is 'up for grabs', but the ones with that particular label are things we think non-full-time developers could reasonably take on.

Yes, Roslyn does fairly extreme incremental parsing.

Wow! Somehow I overlooked the incrementalness of the parser when I looked at its code.

And you've now taken a [refactoring] system that should take a few seconds max, and made it potentially take minutes.

Hmm. If the solution is big enough and the macros are slow enough, yes. But no one _has_ to use macros, and if the user is informed of what's slowing down the process, they will be encouraged to do something about slow macros.

One of the arguments i _thought_ you were making was that by implementing this, we could then provide many of the features we've been working on for C# 7 and onwards by layering on this system.

Ah, I see why you would think that, since I had done exactly that with my system. And if C# were a _new_ language then yes, you'd want to design it so that core features would be part of some grand extensibility scheme. But in the case of EC#, part of the reason I did so many features as macros was so that I'd have a payoff without the trouble of writing an actual compiler! Plus I wanted to explore just how much can be accomplished with lexical macros (= syntax-tree-processor macros) alone. And it's a lot.

While some built-in features of C# could be done as macros, I see a macro system more as

  • an incubator for ideas - just to see what power users do with it
  • as a way of reducing pressure to add new features to the language - you prioritize the features that are served least well by macros
  • a way to give developers features that will _never_ meet the team's famous threshold for adding features to C#, or that don't have a single best solution. Classic examples: things that auto-implement INotifyPropertyChanged; parser generators; and since you mentioned dogfooding, macros for code analysis and generation, which should be handy in Roslyn itself.
  • a replacement for T4 templates that is far more convenient to use.

We've actually implemented language features to help avoid that. Both through things like namespaces, as well as aliasing (::) (which people do use to ensure that names won't collide).

My macro system uses namespaces pretty much the same way (if it had more users, I'd add support for :: too.)

Allowing for arbitrary new syntax to be introduced is problematic.

I agree; Enhanced C# does not allow new syntax. I edited C#'s grammar to make it flexible enough that new syntax wouldn't be needed in most cases. For example, there are several macros now that have the syntax of a method definition, like replace Square($x) => $x * $x;.
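
As a worked example (the expansion shown is my own illustration of how such a substitution would behave, not output captured from LeMP):

```c#
class ReplaceExample
{
    // EC# input would be, per the example above:
    //     replace Square($x) => $x * $x;
    //     int Demo(int a, int b) => Square(a + b);
    //
    // My guess at the expanded plain C#: because the macro substitutes syntax trees
    // rather than raw text, the argument stays grouped when printed back out:
    int Demo(int a, int b) => (a + b) * (a + b);
}
```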

Consider that you introduced something like "out-vars" before we did. But perhaps you did it with different semantics than the ones we're putting the language. Now, what happens when someone upgrades? Does the core language take precedence? Could we subtly change code without anything catching it?

Yeah... I recognize the tension. Probably it's better to show an "ambiguity" error rather than risk subtly changing the meaning of existing code. If the macro author knows the new feature is coming (and has the same semantics) he could mark it as having a low priority so that the new feature takes priority when it becomes available; and for end-users there could be another mechanism to prioritize, or at least import selectively.

Now, it would need to know that the value was actually used by <SomeMacro2> in order to make sure the value got passed out.

True, there would be cases where 'extract method' might do the wrong thing... although in this example, if SomeMacro2 does something with result without the variable having been passed to it explicitly, it's probably either a badly designed macro (because why would it do that?) or one for which the dev doesn't need/want the refactoring engine to care, because the change in behavior is expected, like some debug/logging/profiling macro that doesn't affect user-facing behavior.

I understand MS has high standards... but I think if a feature provides a lot of value, it should be done even if interactions with other features are imperfect. I suspect you're looking at this as "if the UX is not 100% rock-solid, we can't do it." Whereas I'm looking at it more like "few things have a _worse_ user experience than generating C# with T4 templates. Let's make something akin to T4 that's pleasant, if not quite perfect, see what people do with it, and learn from that experience when we make our next new language in 10 years." To me, as a 'power user', I hate how repetitive my code often is, and wonder if I'd be happier switching to Rust (though Rust drops OOP and GC, both of which I'd rather have than not have) or Nemerle (which, er, I can't recall why I didn't. Maybe because I wanted so much to write a self-hosting compiler!)

So to me, it would be enough to put up a warning. It could detect if any macros are used within the body of a method and say "Caution: this method uses user-defined macro(s). In the presence of certain macros, 'extract method' could produce code that is invalid, or that behaves differently. You may need to verify manually that the refactored code is correct."

Having said all that, point taken, any kind of compile-time metaprogramming is a big, difficult feature.

I just thought of something that I never think about, because I don't use ASP.NET. You know how you can write blocks of C# code in <% %> in an aspx file and intellisense works in there? How do they do that? Is the solution necessarily tied to the Roslyn C# parser, or could I somehow write a VS plugin that would work like aspx, but use my EC# parser instead? And if so, who out there has the knowledge of how to do that - and may be willing to share it with me?

Before I speak bluntly, LeMP is jaw-droppingly impressive. It has features that pull me in, from method forwarding to the accessible implementation of the build-your-own-language philosophy. Even though it's clearly not possible for Roslyn to adopt the same methodology as EC#, I absolutely think that it's worth examining all the concepts that Roslyn can take away from the project. The work you've done is cool and highly intelligent.

One thing does bother me. As a consumer of C# and Visual Studio, who dreams of the ability to add my own pet language features like await? in a similar way to writing Roslyn analyzers, I have always imagined hooking into the parser and then transforming an already-parsed syntax tree. The thought of having to implement the language extension as a text preprocessor horrifies me. Text processing is full of edge cases that are factorially hard to foresee and harder to get right in a maintainable way. I want to deal with the purest semantic level possible.
I was similarly frustrated every time I tried to use ReSharper's Custom Pattern search or code analysis. I don't want to operate on text, which I have experienced to be brittle and dangerous and at best a workaround, but rather on a semantic model of the C# language which goes straight to the IL compiler.

And you've now taken a [refactoring] system that should take a few seconds max, and made it potentially take minutes.
Hmm. If the solution is big enough and the macros are slow enough, yes. But no one has to use macros, and if the user is informed of what's slowing down the process, they will be encouraged to do something about slow macros.

I think this is oversimplifying things. One doesn't need a slow macro for this. Just a macro that may have a wide effect.

they will be encouraged to do something about slow macros.

This necessitates two things. First, we need a system to be presenting this to the user. That has to be designed and built into the entire product. Second, you are adding features now that can take away from the experience and force users into unpleasant choices. Say, for example, a team takes a dependency on some macro that they find really useful. They're using it for months as they grow their codebase up. Then, at some point they find that things have gotten slower and slower and it's the fault of this macro. What do they do now? Removing the macro is devastating for them, as they'll have to go and change all of that code in their system that depends on it. And giving up on all these features they care about is equally disappointing for them.

I see a macro system more as
an incubator for ideas - just to see what power users do with it

In that regard, people can and do just use Roslyn. You don't need a macro system when you literally can fork things and just implement the new feature you want. I mean, that's literally how we incubate all our own ideas :)

a way to give developers features that will never meet the team's famous threshold for adding features to C#, or that don't have a single best solution. Classic examples: things that auto-implement INotifyPropertyChanged; parser generators; and since you mentioned dogfooding, macros for code analysis and generation, which should be handy in Roslyn itself.
a replacement for T4 templates that is far more convenient to use.

Note: this is precisely what we were working on. And even trying to do basic stuff here turned out to be massively complex once you took the whole experience into account. Take, for example, simple features like:

  1. Navigation. Say a macro introduces symbols. How can the user introspect and understand the symbols? Navigation would likely only take them to the invocation of the macro, leaving them to have to try to decipher what had actually happened. We'd need something to help out here, and that would spike costs.

  2. Debugging. How do you debug these sorts of things? We've gotten an enormous amount of feedback over the years that people find it massively difficult to debug things like type generators. Now we'd be taking that same problem and pushing it to the normal coding cycle. In order to do this sort of feature we would have to have some sort of realistic Debugging story. And that means another large spike in costs.

  3. Refactorings. Already mentioned. But this would cause numerous problems and could easily lead to 'safe' refactorings (like 'rename') now breaking your code. That's both a big issue with our goals for refactorings and can definitely erode user trust.

These are just three of many areas we found impacted when we started investigating this space. And each of those three area breaks up into many other areas we'd have to look at and consider.

This is not me saying the idea is bad. This is me saying: the costs are huge. Ergo, the rewards must warrant it.

I agree; Enhanced C# does not allow new syntax. I edited C#'s grammar to make it flexible enough that new syntax wouldn't be needed in most cases.

I'm a little confused as to what you're proposing C# adopt in future releases. I went through your enormous gist and it's got a lot of ideas and sketches, but is somewhat scattershot.

Could you clarify which parts, precisely, of ec# you'd like us to add? And could you give core examples of that addition so we can direct the discussion around them? Thanks!

a way to give developers features that will never meet the team's famous threshold for adding features to C#, or that don't have a single best solution.

So this is tricky. If this doesn't meet our own bar, then we'd be very hesitant to add it into the language. After all, if we weren't living and breathing this feature every day, then the chance for it to have major issues would be quite high. As we've discovered, it's only through day-to-day dogfooding that we really can shake down a feature effectively. This has been the case with everything we've produced. Someone, like me, will create a feature and test the heck out of it. Then, a month later when the team starts really using it in our day-to-day development, i'll get a wave of subtle issues reported that i missed originally.

We absolutely must have that in order to ship a high quality enough feature. If people aren't using this as part of their core cycle, and seeing how their debugging experience is impacted, or how their LiveUnitTesting experience is impacted, or how their intellisense experience is impacted, or how their refactoring experience is impacted, or how this impacted customers who use "Open Folder" and open 100MB of source :)) then we won't get the critical mass we need to ensure that the feature is going to effectively solve problems for customers in the real world.

Whereas I'm looking at it more like "few things have a worse user experience than generating C# with T4 templates. Let's make something akin to T4 that's pleasant, if not quite perfect, see what people do with it, and learn from that experience when we make our next new language in 10 years."

So, to be clear, this was an exact scenario that our investigation was attempting to make better. :)

And, i want to also be clear: We did not scrap this idea. The idea is still there, and we are still interested in it. It's just that as we started doing work here, we quickly realized the enormity of the scope this would have, and that this would involve many developers over many months. That was simply too high a cost for our schedule, and we deprioritized it against the other work we're doing.

I'm seriously hoping we pick that up again for C# 8.0. But i also think that what we'd deliver would be a lot less 'ambitious' than what i think i'm seeing you desire. We're going to go with scenarios and schemes we think we can nail across many of the axes that i outlined above. If we can succeed on that, we'll ship, and then use the feedback we get to judiciously improve things moving forward. i.e. very similar to what we did with analyzers. We started with a core kernel that had a design we could believe in. And, over time, we continually enhance based on our own needs and the needs of the community.

If we start this back up again during C# 8, it would be great to have your input!

I just thought of something that I never think about, because I don't use ASP.NET. You know how you can write blocks of C# code in <% %> in an aspx file and intellisense works in there? How do they do that?

At a high level, it's a rather simple system (though the devil is in the details). Here are the broad strokes on how it works. First, ASP opens the HTML file and parses out all the HTML structure. During this it identifies all the <%%> regions. It then spits out a second 'code-behind' file that is a normal C# file with #line / #line default regions where it spits in scaffolding code, as well as the code in the <%%>. This is the code-file that the Roslyn system interacts with.
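
To make that concrete, here is a hypothetical sketch of the shape such a generated code-behind file could take. The file, class, and member names are invented for illustration; the #line / #line default directives are the only real C# mechanism being relied on.

```c#
// Hypothetical generated code-behind (names invented for illustration).
public partial class DefaultPage
{
    private void RenderPage(System.IO.TextWriter writer)
    {
        writer.Write("<html><body>");   // scaffolding emitted for the literal HTML

#line 12 "Default.aspx"                 // map diagnostics and the IDE back to the <% %> block
        foreach (var item in GetItems())
            writer.Write(item);
#line default

        writer.Write("</body></html>");
    }

    private System.Collections.Generic.IEnumerable<string> GetItems() => new[] { "hi" };
}
```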

Now, in the Editor lots of amazing work happens. An ITextBuffer is created for that crazy C# file. Roslyn is powering the experience for that file. An ITextBuffer is created for the HTML file. ASP powers the experiences outside the <%%> blocks, and it leaves the <%%> blocks alone. Then we use IProjectionBuffers (https://msdn.microsoft.com/en-us/library/microsoft.visualstudio.text.projection.iprojectionbuffer.aspx) to grab portions of each buffer which we stitch together into one final buffer which gets presented to the user. This 'projected' buffer should be character identical to the original file. But it's actually a projection of other files, which have IDE experiences driven by different components and different teams.

Overall this works really well, but there's a lot of complexity at some points. For example, the new Razor syntax which just uses "@" to move into the embedded language, and which has no 'end' delimiter :)

Is the solution necessarily tied to the Roslyn C# parser, or could I somehow write a VS plugin that would work like aspx, but use my EC# parser instead? And if so, who out there has the knowledge of how to do that - and may be willing to share it with me?

Technically, you could 'host' C# yourself. Using these interfaces:

https://msdn.microsoft.com/en-us/library/microsoft.visualstudio.textmanager.interop.ivscontainedlanguage.aspx

But, these interfaces are ANCIENT and PAINFUL. :(

I'd love for them to be rev'ed in the future to be more modern (i.e. using the modern editor concepts that came with the WPF editor), and more debuggable. Right now we basically know that only ASP hosts us so the code (on both sides) makes huge amounts of assumptions. Assumptions that would almost certainly break if you tried this yourself :(

I want to deal with the purest semantic level possible.
I don't want to operate on text, I have experienced that to be brittle and dangerous and at best a workaround, but rather on a semantic model of the C# language which goes straight to the IL compiler.

There are a few problems with that.

--

First: Note that you talk about the information flowing one direction. From semantics to IL. However, in a system like ours, information needs to flow in the reverse direction as well. i.e. to the toolchain that is sitting on top of the compiler. Take, for example, a feature like FindAllReferences. It would somehow need to be aware of how you've impacted the semantics so that it could find references to symbols that you were now using. How would we do this? We don't want FindReferences to have to call into every macro as that could be staggeringly slow. So now we need a system where you can inject, and yet flow information out in a performant (and indexable) fashion.

This would impact things like CodeLens. You wouldn't want it saying "0 references" to some symbol when it was actually referenced by some code you injected. That would make people believe they could remove code, when really that might break things, or subtly change semantics.

This could impact things like 'rename'. Today rename can check to ensure that semantics did not change across runs. The moment you have arbitrary semantic injection, how can we tell if your rename was safe?

@qwertie mentioned "So to me, it would be enough to put up a warning. It could detect if any macros are used "

But that would essentially impact all features. Would every feature basically have a warning saying "sorry, you're using Macros... so all bets are off"? That would be a terrible experience and many people would be complaining that we were not providing a suitable feature with the level of integration they expected.

--

Second: This presumes that Roslyn even has a 'pure' clean semantic layer that you can plug into. Trust me when i say that right now, it doesn't. Indeed, this lack of a clean semantic layer is one of the reasons that IOperation got delayed. This was our attempt to expose the semantic layer in a very clean way only for querying (i.e. definitely not for mutation). Even just exposing that layer for clean querying turned out to be problematic and we discovered that we're going to need to invest a lot there before we can expose that.

Once we get to that point, then we can start considering what it might be like to allow for some sort of mutation ability to be provided. But note that providing such an ability is also staggeringly difficult when we just barely start poking at the surface of things. For example, say you have someone who says "hey, if i see pattern XXX in my method body then i want to run some special code that changes semantics". However, that code that runs then wants to inject a Class into the system. however, when it injects that class, it changes the semantics of everything (including the semantics that the current generator cares about).

The introduction of that Class changes all type binding and means that any reference to any type needs to be recomputed. Most of the interesting sorts of code generation use cases that customers have asked for end up doing this.

And this is just the case where you have one generator. Say you have many generators (i.e. one for contract validation, one for logging, one for ensuring certain patterns and company practices in your code). How do these all run? Are they ordered? What if you need them to loop? Do you somehow have to keep running them all until they reach a fixed point?

Auto-mutation is an area where things get complex SUPER fast. :-/

I'll respond to @jnm2 first because it's easier :)

I have always imagined hooking into the parser and then transforming an already-parsed syntax tree. The thought of having to implement the language extension as a text preprocessor horrifies me.

Me too. The D language has a lot of great features, but the way some of them work makes me wince a little. Like when you do compile-time codegen, it has to be done by generating strings of source code and I very much disliked that design. In EC#/LeMP you only deal with syntax trees; you can do some custom syntax with "token literals" which are trees of EC# tokens.

I was similarly frustrated every time I tried to use ReSharper's Custom Pattern search

I haven't used that, but that reminds me that I've always wanted a non-regex text search option that would implicitly insert a whitespace regex [ \t\n]* at every apparent word boundary so that searching for void Foo can find void   Foo... I think I would leave the option on ALL the time for all file types. Sometimes I marvel that MS does these impressive massive features, but misses some of the little things.

But it occurs to me that doing a syntax search - like class $name : $(.._), IFoo, $(.._) { $(..body) } to find any class derived from IFoo - is very easy in Loyc and perhaps a more limited form of that would be straightforward in Roslyn too.
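
As a point of comparison, a minimal purely syntactic version of that search on the Roslyn side (no semantic binding, just inspecting base lists; assumes the Microsoft.CodeAnalysis.CSharp package) might look roughly like this:

```c#
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

class FindIFooClasses
{
    static void Main()
    {
        var tree = CSharpSyntaxTree.ParseText(
            "interface IFoo {}  class A : object, IFoo {}  class B {}");

        // Purely syntactic match: any class whose base list mentions 'IFoo'.
        var matches = tree.GetRoot()
            .DescendantNodes()
            .OfType<ClassDeclarationSyntax>()
            .Where(c => c.BaseList != null &&
                        c.BaseList.Types.Any(t => t.Type.ToString() == "IFoo"));

        foreach (var c in matches)
            Console.WriteLine(c.Identifier.Text);   // prints: A
    }
}
```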

[@CyrusNajmabadi] But that would essentially impact all features. Would every feature basically have a warning saying "sorry, you're using Macros... so all bets are off"?

I'm pretty sure rename - the world's most important refactor IMO - can be done well much of the time despite macros. Perhaps I could make a prototype to explore the idea... but if the symbol appears within the output of a macro, you do have to run macros again and see if there are side-effects or failures. Now, such effects can't really be avoided, I mean, given the algebraic data type macro:

```c#
public abstract alt class BinaryTree<T> where T : IComparable<T>
{
    alt Leaf(T Value);
    alt Node(T Value, BinaryTree<T> Left, BinaryTree<T> Right);
}
```

it produces "withers" methods like Node<T>.WithLeft(). Renaming the Left property should succeed but with the side effect of changing the "wither", and renaming WithLeft should fail outright since it doesn't exist in the original code.

Would every feature basically have a warning saying "sorry, you're using Macros... so all bets are off"?

That reminds me of a third option for 'extract method' beyond (1) ignoring macros, (2) ignoring macros but going to the work of analyzing the result to figure out whether the refactor produced a semantically identical result. Option (3) is to offer to expand macros before doing the extract method operation. I suspect the same could be done in general for any refactor involving macros. We could call option (4) "do the extract method on the result of macro expansion, then magically reverse macro expansion" - that may not be possible in general [edit: but maybe some candidates for reversal can be detected and attempted...].

The introduction of that Class changes all type binding and means that any reference to any type needs to be recomputed. Most of the interesting sorts of code generation use cases that customers have asked for end up doing this.

Yes, letting a program analyze/use itself, using that information to expand itself arbitrarily is a case of "there be dragons!", at least when declaration order is not supposed to matter. I noticed this when I was learning D, and constructed the following example to illustrate one of the subtle problems that can occur:

```d
const int C1 = Overloaded(3);

int CallFunction(int x) { return (int)Overloaded(x); }
long Overloaded(long i) { return i*i; }

static if (CallFunction(3) != 3) // compile-time if
{
    int Overloaded(int j) { return j; }
}

const int C2 = CallFunction(3);
```

In this case, C1 is 3, but C2 is 9, which is weird because C1 and CallFunction both call Overloaded(3). I assumed this was not the only paradox D could have lurking in it, so I kept the problem in mind when I designed the "original" EC# - a series of design sketches that made C# more like D. I solved the paradox (the equivalent EC# code would have produced two compiler errors) but my system was a bit limited in the metaprogramming department. Without something like macros, my language wouldn't solve a variety of problems that I thought it should, such as the INotifyPropertyChanged problem, so I switched gears and really focused on learning about Lisp macros for awhile - putting the rest of my ideas on the back burner.

Lexical macros don't produce paradoxes like you see in D, since they have no access to semantic information and cannot affect the syntax tree outside themselves. Nemerle allows macros that can look up semantic info; earlier I had some trouble finding info about such macros, but I just found this and I'll maybe read that now.

I think this is oversimplifying things. One doesn't need a slow macro for this. Just a macro that may have a wide effect.

What do you have in mind?

Say, for example, a team takes a dependency on some macro that they find really useful. They're using it for months as they grow their codebase up. Then, at some point they find that things have gotten slower and slower and it's the fault of this macro. What do they do now? Removing the macro is devastating for them, as they'll have to go and change all of that code in their system that depends on it. And giving up on all these features they care about is equally disappointing for them.

Well, in most cases, hiding it from IntelliSense might be sufficient. But if the macro (or other language extension) makes a lot of changes, losing visibility of those changes would be annoying. If it's open source, they could submit optimizations or fork the code. If it's both complex and closed source (and I'd be wary of using closed-source macros), they could write their own macro whose job is to run a fast (and simple) approximation of what the original macro does. The macro would be set up to override the original macro in an IntelliSense context only.

In that regard, people can and do just use Roslyn. You don't need a macro system when you literally can fork things and just implement the new feature you want. I mean, that's literally how we incubate all our own ideas :)

I'm surprised you say that, because I see two massive barriers:

  1. It sounds difficult - I wouldn't have a clue how to isolate my fork from the original Roslyn in VS such that both versions remain usable; how to set up my project to use a modified Roslyn; or how best to distribute my forked version of Roslyn to others. (And wouldn't it be a big download?) And since C#'s parser isn't currently designed with any macro-friendly features, there's a good chance one would want to modify the parser... that's not easy even for me.
  2. Forked versions by different people aren't easily combined.

Most ideas are too small and don't provide enough benefit to try overcoming these barriers.

I think this is oversimplifying things. One doesn't need a slow macro for this. Just a macro that may have a wide effect.

Let's see, here are some macros in LeMP that might work the way you're thinking:

  • namespace Foo;: all it does is wrap the file in a namespace decl. The macro itself is fast, but I guess it could disrupt any 'incremental updating' that Roslyn does. Since I'm not familiar with Roslyn's incremental update process, I didn't think about this at first.
  • #useSequenceExpressions - EC# supports arbitrary executable statements inside expressions. This macro does its best to translate such funky code into plain C#, which is needed by one of my favorite features, quick-binding variables. If C# 8 supported macros, I'm certain it would also have a feature that eliminates the need for this macro.
  • #useSymbols - helps translate @@symbol literals to plain C#. This macro must process the entire source file. If C# 8 supported macros but not symbols, this is one macro you could safely shut off. It doesn't affect any top-level declarations and could perhaps be shadowed by a using static method to keep intellisense working.
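
For a rough idea of what the #useSymbols translation amounts to, here is a hand-written approximation in plain C# (the Symbol type below is a stand-in invented for illustration; the actual macro output and the real Loyc Symbol API may differ):

~~~csharp
// Hand-written approximation of the idea behind #useSymbols (illustrative only).
// A stand-in Symbol type; the real Loyc Symbol is an interned name.
sealed class Symbol
{
    public readonly string Name;
    public Symbol(string name) { Name = name; }
}

class Example
{
    // The macro hoists each @@symbol literal into a cached static field...
    static readonly Symbol sy_Hello = new Symbol("Hello");

    // ...and rewrites uses of @@Hello to refer to that field.
    Symbol Greet() => sy_Hello;
}
~~~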

The using System(.Collections, .Text, .Linq) macro comes to mind - it only has a local effect, but in turn it affects how the rest of the file is interpreted. However, I suppose this is not the kind of "wide effect" you were thinking of.

What do you have in mind?

Anything that introduces a top level declaration. It will mean reanalysis of everything in that compilation and any downstream compilations.

Well, in most cases, hiding it from IntelliSense might be sufficient.

This doesn't actually solve the problem. Indeed, it just makes the user think that something is broken.

If it's open source, they could submit optimizations or fork the code.

That's a lot of presumptions. Even if it's their own code, they might be unable to optimize things in the manner you're specifying. We have to consider these situations and we have to have an answer that is acceptable :)

I'm surprised you say that, because I see two massive barriers:
It sounds difficult

Precisely the issue with Macros as well :D

all it does is wrap the file in a namespace decl.

That means the semantics of everything in that file change. It means the semantics of everything in that compilation need reanalysis. It means every downstream project needs reanalysis :)

here are some macros in LeMP that might work the way you're thinking:
namespace Foo;

I'm confused. You mentioned that your system added no new syntax. But this is new syntax that you're saying is a LeMP macro. I'm not sure how to reconcile this.

If this is new syntax, that's very problematic. What happens if we end up adding that syntax in the future? Now this code is broken for the user. If it's not new syntax, then how is this working?

it only has a local effect, but in turn it affects how the rest of the file is interpreted.

This is, by definition, not a local effect... If a file can be reinterpreted, then that can affect that compilation and all downstream compilations.

Precisely the issue with Macros as well :D

Huh? Macros are easy. In many languages you can write one in a couple of lines of code. Even in the EC# and Nemerle models, where you have to create a separate assembly to hold your macros, one might argue that the added difficulty is a good thing, since macros should be considered a last-resort solution when other mechanisms won't solve a given problem.

This is, by definition, not a local effect... If a file can be reinterpreted, then that can affect that compilation and all downstream compilations.

Right, but it's a known quantity and the work is done: Roslyn is already designed to deal with precisely this kind of cascading effect.

Renaming the Left property should succeed

Yes, it should. But how do you verify that it actually has? Let's use the simple example you mentioned (Withers). If we have the following code:

```c#
class BinaryExpression {
    Expression Left, Right;
}

class Whatever {
    void Foo() {
        BinaryExpression e = null;
        var v = e.WithLeft(...);
    }
}
```

If the user renames 'Left', then 'WithLeft' needs to update. Otherwise, their code will be broken post-rename.
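
To make the hazard concrete, the macro-generated wither would presumably look something like the sketch below (hand-written and hypothetical; not output from any actual tool):

```c#
// Hypothetical sketch of the generated 'wither' (not real macro output):
class Expression { }

class BinaryExpression
{
    public Expression Left, Right;

    // Generated member whose name is derived from 'Left':
    public BinaryExpression WithLeft(Expression left) =>
        new BinaryExpression { Left = left, Right = this.Right };
}
// If 'Left' is renamed to 'Operand', this generated method presumably becomes
// 'WithOperand', so every existing call to WithLeft(...) stops compiling unless
// the rename operation also rewrites those call sites.
```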

Huh? Macros are easy.

Clearly not. As I mentioned, a whole host of areas becomes majorly problematic. Again, cases like: Debugging, Navigation, Refactoring.

Right, but it's a known quantity

How can it be a known quantity? We have no idea what the macro will produce.

and the work is done: Roslyn is already designed to deal with precisely this kind of cascading effect.

Roslyn is designed to deal with precisely the cases we know about for C#. Indeed, Roslyn was designed, at every level, to deal with the problem space given the constraints of the language. We take great advantage of our knowledge of what can/can't happen in the language. Macros throw most (if not all) of our optimization opportunities out the window.

They also introduce areas that we have no known solution or design for. Or they would require completely redoing certain areas. Take, for example, 'FindReferences' (as I mentioned before). How does FindReferences work in a world with Macros? How do we know if the Macro-generated code ends up referencing the variable in question? The only way to know is to actually run and analyze all macros before doing the FAR operation. As any edit might impact any macro, we have to do this. That means that every FAR now takes a hit, as we have to reanalyze the entire solution.

Note: this is the issue LiveUnitTesting faces. Almost any edit can have an effect on tests. So they always have to re-execute all tests. This is fine in a world where that's providing background/ambient information. It's not ok when the user expects FindReferences to return in seconds, and it takes minutes as all macros are reexecuted and reanalyzed.

How can it be a known quantity? We have no idea what the macro will produce.

Yes, but we know _where_ a macro produces its output: right in the same spot. So processing a change to using System(.Collections, .Text, .Linq) has no more complexity than if the user selected the line and pasted in

using System.Collections;
using System.Text;
using System.Linq;

As any edit might impact any macro, we have to do this.

Hold on. If you restrict your view to EC#-style lexical macros (as I have been doing implicitly, as I haven't thought much about semantic-level macros) this is not true: lexical macros can't look at the contents of other files, and incidentally, macro expansion is highly parallelizable as a result (I defined exactly one macro that looks at another file - includeFile - and it seems fair to make that a built-in macro for IntelliSense purposes. Plus if you make it illegal for user macros to access the outside world, you could potentially run them in some kind of security sandbox).

Roslyn is designed to deal with precisely the cases we know about for C#. Indeed, Roslyn was designed, at every level, to deal with the problem space given the constraints of the language. We take great advantage of our knowledge of what can/can't happen in the language. Macros throw most (if not all) of our optimization opportunities out the window.

I see. Supposing you change Roslyn to expand macros as a matter of course, can you mention an example of a particular optimization that might be lost as a result?

If the user renames 'Left', then 'WithLeft' needs to update. Otherwise, their code will be broken post-rename.

Ahh, right! My thinking was flawed; I didn't actually think of that. I mean, I realized that of course the name of WithLeft would change as a side effect, and that this change could be detected. But somehow I didn't think about the fact that any code calling WithLeft would be broken unless the rename operation also figures out how to rename all uses of WithLeft. And even if we can figure out how to do that, I'm not certain it's a good idea. Can the rename operation know for certain that the new "WithLeftRenamed" method is really "the same method" as the original "WithLeft" method? Maybe. But if it tries to rename WithLeft too, there's the potential for cascading effects on other macro expansions. I haven't thought it through, but it's a little scary. A system that gives up and says "side effect detected: renaming Left caused the WithLeft method to disappear from a macro expansion" would at least avoid such complications...

...and yes, irritate the user a little. I would say "but at least they're getting some benefit from a language feature that wouldn't have otherwise existed." Then I guess you would say "well, if we hadn't spent all this time writing a macro system, we might have spent the time instead adding a new ADT-like language feature with withers built right into the core of C#, and our version of the feature wouldn't have this problem". And, well, that's true. But then again, you might have decided not to do the feature after all. With macros, devs can get lots of features _very_ quickly that the Roslyn team either won't ever do, or won't do _right now_, which is when they want it done. Plus, the macro feature can be seen as a data-gathering exercise, as you'll see which ideas become the most popular. If a popular macro is great as-is, you can just let people keep using it; whereas if the macro system imposes annoying shortcomings, you can prioritize making it a built-in feature without those annoyances. (Meanwhile, some shops will disagree with the whole philosophy of macros and outlaw them... you won't see me working at one of those places. I couldn't believe all the people complaining about var back in the day!)

I'm confused. You mentioned that your system added no new syntax. But this is new syntax that you're saying is a LeMP macro. I'm not sure how to reconcile this.

Ahh. Now you mentioned you read through my "gist" - that is, the EC# 1.0 design sketches. I guess you focused less on "EC# for PL nerds" which, though out-of-date, explains my thinking for the macro system.

There are basically three categories of syntactic changes in EC#. All categories are hard-coded in the parser:

  • First, there are changes designed to make possible the features of EC# 1.0. So right now the EC# parser accepts things like alias Foo = Bar; or trait TFoo { } that I sketched out but didn't implement. It turned out, though, that some of these features could be implemented as macros! So that's what I did, e.g. in the case of method forwarding (==>) and @@symbol literals (which, in the sketch document, has the syntax $symbol). Doing this made sense for me, but of course, it would be a bit silly for Roslyn to do that.
  • Second, there are syntactic changes designed specifically for macros. Notable items in this category are block-call expressions like match (expr) {...} and quote { code }, and token literals.
  • Third, there are a series of "regularizations" to C# which make the syntax more regular and uniform. I basically "squashed" different kinds of syntax together into "one thing". For instance, there is no separate syntax for "declaration space" (inside a class) and "executable space" (inside a method) and "property space" (immediately inside a property); all those contexts accept the same kind of syntax, which is simply called a "statement" (the biggest challenge with this was constructors, which look a lot like method calls). Also, a method's formal argument list has the same grammar as the syntax you would use to call a method. The purpose of all this regularization is to (1) simplify the parser (at the cost of adding more complexity in validation later on) and (2) give macros freedom to accept interesting syntax with no changes to the parser. Most notably, if you write foo { stuff } there is no way for the parser to know if stuff should be parsed like a class body or like a method body. Good thing, then, that there's no difference between the two! (No doubt C# Interactive had to grapple with this same problem - kudos to the team on that fantastic work, btw. I LOVE IT SO MUCH!)

namespace Foo; is a bit special since it kind-of fits in both categories 1 and 3. (1) It wasn't in the EC# 1 design sketches, but I think of it as something that could reasonably be built into the language, and (3) namespace Foo is a regularized construct in the sense that you can also write class Foo : Bar; or enum Foo; (or even namespace Foo : Bar, Baz). Edit: just to be clear, the syntax is regularized but the macro is not. class Foo; can be parsed, but has no meaning, as no macro has been written to give it a meaning.

For more information, please read PL nerds part 3. EDIT: actually don't, it's too out of date.

Could you break down simply the following:

here are some macros in LeMP that might work the way you're thinking:
namespace Foo;
I'm confused. You mentioned that your system added no new syntax. But this is new syntax that you're saying is a LeMP macro. I'm not sure how to reconcile this.

Do macros add new syntax or not? If they don't add new syntax, can you give a very simple explanation of what the grammar of your macros are?

Finally, can you state specifically what limitations there are on what macros can use as input to their work, and exactly what they can produce as the results of their operation?

--

For example, you mentioned INotifyPropertyChanged. How would your macro system help out here?

I haven't thought it through, but it's a little scary. A system that gives up and says "side effect detected: renaming Left caused the WithLeft method to disappear from a macro expansion" would at least avoid such complications...

I would be very loath to add such a feature with such a limitation. It goes against a core principle we have in terms of what the user experience should be for our language.

If we did a feature like this it would be precisely because we would want the entire experience around it to be great. And that means that it should work great at a minimum for the scenarios like Debugging, Navigation, Refactoring, etc.

Do macros add new syntax or not?

They do not. Was my previous answer on that topic helpful?

Finally, can you state specifically what limitations there are on what macros can use as input to their work, and exactly what they can produce as the results of their operation?

EC# macros take the Lisp concept of macros and apply it to C#. So, the macro processor proceeds independently on each source file, top-to-bottom and outside-in (I think more parallelism could be squeezed out some of the time, but conceptually it's top-to-bottom, outside-in.) Each macro invocation is replaced with its result; so typically a macro takes one syntax tree as input and produces one tree as output.

My system also has some extra features: mechanisms for avoiding and dealing with conflicts between macros; the ability for a macro to produce multiple (or zero) output nodes rather than just one ("splicing"); the ability to scan the code below it (not just its children) and optionally "drop" that code; and the ability to process child nodes first, in violation of the usual outside-in ordering.

Specifically: for each node, the macro processor looks up all macros that are registered to process nodes of that name (if this were implemented in Roslyn, I guess Roslyn would look for macros associated with the current type of SyntaxNode, and macros could also be limited to a particular name, e.g., a call to Foo() but not a call to Bar()). If any are found, the list of macros is grouped by priority (although most macros have the same priority, PriorityNormal) and the highest-priority macros are executed first.

Each macro can "accept" by returning a syntax tree or "decline" by returning null; macros are also passed an IMacroContext which, among other things, tells them about the ancestor nodes of the current node and allows them to write warning and error messages.

From the macro processor's perspective, a macro invocation "succeeds" if exactly one macro does not return null. If two or more macros return a result, a message is normally printed that the invocation was ambiguous (normally an error, but this is downgraded to a warning if the two macros produced the same output, and a macro can further request that the warning be suppressed.) If all macros return null, errors and warnings produced by all macros are delivered to the user. (I'll skip other subtleties about warnings/errors).

If there are multiple priority groups, lower priorities execute only if higher priority macros all return null. If all macros returned null and there are no warnings/errors, a generic warning is produced ("2 macro(s) saw the input and declined to process it: namespace1.macro1, namespace2.macro2") unless those macros use Passive mode (which means "it's normal for this macro to produce no output").
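
Here is a minimal, self-contained sketch of the dispatch rules just described (accept/decline, priority groups, ambiguity detection). All names in it are invented stand-ins for illustration, not LeMP's actual API, and the ambiguity handling is simplified to an exception rather than the warning downgrade described above.

~~~csharp
// Invented stand-in types, for illustration only (not LeMP's real API).
using System;
using System.Collections.Generic;
using System.Linq;

enum MacroPriority { Low = 0, Normal = 1, High = 2 }

record MiniNode(string Name, IReadOnlyList<MiniNode> Args);

// A macro "accepts" by returning a replacement node, or "declines" by returning null.
delegate MiniNode MacroFn(MiniNode node, List<string> messages);

static class MacroDispatch
{
    public static MiniNode Expand(
        MiniNode node, ILookup<string, (MacroPriority Priority, MacroFn Macro)> registry)
    {
        var messages = new List<string>();
        // Higher-priority groups run first; a lower group runs only if every
        // macro in the higher groups declined.
        foreach (var group in registry[node.Name]
                     .GroupBy(m => m.Priority)
                     .OrderByDescending(g => g.Key))
        {
            var accepted = group.Select(m => m.Macro(node, messages))
                                .Where(result => result != null)
                                .ToList();
            if (accepted.Count == 1)
                return accepted[0]; // exactly one macro accepted: success
            if (accepted.Count > 1)
                throw new InvalidOperationException(
                    "Ambiguous macro call: " + node.Name); // simplified handling
        }
        return node; // every macro declined; report 'messages' to the user
    }
}
~~~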

A macro can request that children be processed first. (This is optimized to happen once if competing macros ask for the same thing.) Sometimes this is necessary, but there's a performance risk; if a macro processes children first but then returns null, some other macro that didn't process children first might accept and then the work must be repeated (my system also wastes the effort if all macros decline, but atm I can't think of any reason it must be that way.)

Finally, a macro can "drop" all nodes after itself and incorporate them into its own results; that's how the macro for namespace Foo; works, and other macros like on_finally and LLLPG.
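
As a simple illustration of that "drop" behavior, the namespace Foo; macro consumes everything after it in the file and produces the conventional form (hand-written before/after, not actual tool output):

~~~csharp
// Before expansion (EC#):
//     namespace Foo;
//     class A { }
//     class B { }
//
// After expansion (hand-written approximation of the result):
namespace Foo
{
    class A { }
    class B { }
}
~~~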

Oh, and I just added a feature where macros can define other macros in the current scope (i.e. new macros disappear at }). Just one macro uses the feature so far, which I will demonstrate below.

What is your syntax for macros?

For example, you mentioned INotifyPropertyChanged. How would your macro system help out here?

(Sorry for taking so long - I hit a couple of bugs in LeMP while writing this.)

While certainly we could envision a macro _specifically_ designed to implement INotifyPropertyChanged, I think it would be slightly niftier to show how EC#'s replace macro can do the job. We start with something like this and would like to factor out the repetitive code, namely ChangeProperty<T> (potentially repeated once per class) and the repetition between the properties.

~~~csharp
public class DemoCustomer : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    /// Common code shared between all the properties
    protected bool ChangeProperty<T>(ref T field, T newValue, 
        string propertyName, IEqualityComparer<T> comparer = null)
    {
        comparer ??= EqualityComparer<T>.Default;
        if (!comparer.Equals(field, newValue))
        {
            field = newValue;
            if (PropertyChanged != null)
                PropertyChanged(this, new PropertyChangedEventArgs(propertyName));
            return true;
        }
        return false;
    }

    private string _customerName = "";
    public  string CustomerName
    {
        get { return _customerName; }
        set { ChangeProperty(ref _customerName, value, "CustomerName"); }
    }

    private object _additionalData = null;
    public  object AdditionalData
    {
        get { return _additionalData; }
        set { ChangeProperty(ref _additionalData, value, "AdditionalData"); }
    }

    private string _companyName = "";
    public  string CompanyName
    {
        get { return _companyName; }
        set { ChangeProperty(ref _companyName, value, "CompanyName"); }
    }

    private string _phoneNumber = "";
    public  string PhoneNumber
    {
        get { return _phoneNumber; }
        set { ChangeProperty(ref _phoneNumber, value, "PhoneNumber"); }
    }
}
~~~

We can factor out the common stuff like this:

~~~csharp
replace ImplementNotifyPropertyChanged({ $(..properties); })
{
    // ************************************************
    // Generated by ImplementNotifyPropertyChanged
    // ************************************************
    public event PropertyChangedEventHandler PropertyChanged;

    protected bool ChangeProperty<T>(ref T field, T newValue, 
        string propertyName, IEqualityComparer<T> comparer = null)
    {
        comparer ??= EqualityComparer<T>.Default;
        if (!comparer.Equals(field, newValue))
        {
            field = newValue;
            if (PropertyChanged != null)
                PropertyChanged(this, new PropertyChangedEventArgs(propertyName));
            return true;
        }
        return false;
    }

    // BTW: This is a different `replace` macro that can pattern-match any syntax tree.
    // The [$(..attrs)] part of this example is supposed to put all attributes into a list
    // called `attrs`, but it doesn't actually work because pattern matching on attributes
    // isn't implemented yet. This is relevant because in EC#, modifiers like "public" are
    // considered to be attributes.
    replace ({
        [$(..attrs)] $Type $PropName { get; set; }
    } => {
        replace (FieldName => concatId(_, $PropName));
        private $Type FieldName;
        [$attrs]
        $Type $PropName {
            get { return FieldName; }
            set { ChangeProperty(ref FieldName, value, nameof($PropName)); }
        }
    });

    $properties;
}
~~~

This defines a macro called ImplementNotifyPropertyChanged that accepts a list of properties within a braced block. Although LeMP itself doesn't let users define macros on-the-fly, it does allow macros to define other macros, which is the technique used here. replace is a standard macro that creates a new macro, scoped to the current block, that outputs a specified syntax tree and performs replacements. (If the "current block" is at the top level of a source file, the macro is available at any lower point in the file.)

You can use the macro like this:

~~~csharp
public class DemoCustomer : INotifyPropertyChanged
{
    public DemoCustomer(string n)
    {
        CustomerName = n;
    }

    ImplementNotifyPropertyChanged
    {
        public string CustomerName { get; set; }
        public object AdditionalData { get; set; }
        public string CompanyName { get; set; }
        public string PhoneNumber { get; set; }
    }
}
~~~
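
For illustration, here is roughly what the macro should expand one of those properties into, hand-written from the replace definition above (not actual LeMP output; the generated PropertyChanged event and ChangeProperty<T> helper land in the same class):

~~~csharp
// Hand-written approximation of the expansion of CustomerName (illustrative only):
private string _CustomerName;
public string CustomerName {
    get { return _CustomerName; }
    set { ChangeProperty(ref _CustomerName, value, nameof(CustomerName)); }
}
~~~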

Naturally people would want to use ImplementNotifyPropertyChanged in all source files that implement INotifyPropertyChanged, so they could put its definition in a common file, say, ImplementNotifyPropertyChanged.ecs, and then use includeFile("ImplementNotifyPropertyChanged.ecs") to import it in a given file.

I'm trying to wrap my head around what the actual syntax for your macros are. How does the parser find them? What does it do with them? In the code you presented, you have:

```c#
ImplementNotifyPropertyChanged
{
    public string CustomerName { get; set; }
    public object AdditionalData { get; set; }
    public string CompanyName { get; set; }
    public string PhoneNumber { get; set; }
}
```

So presumably a macro usage is, what:

```
Macro:
    Identifier { MacroElement_list }
```

?

If that's the case, how does the parser know what it can parse within the braces?

I still don't see how debugging, navigation, refactoring, etc, work with any of this... For example, any code that referenced "FieldName" would be broken in any sort of refactoring scenario. Can you help clarify how this would work?

Also, you've given an example of your macros for declaration-level constructs; can you show it for statement/expression-level constructs? For example, you mentioned:

Creating ‘out’ variables in-situ, e.g. int? Parse(string s) => int.Parse(s, out int x) ? (int?)x : null;

Can you show how you did that?

I'm trying to wrap my head around what the actual syntax for your macros are.

Sorry if I wasn't clear. In my system, syntax is completely orthogonal to the macro system; the parser knows nothing about macros, and the macro processor knows nothing about syntax (in fact, it knows nothing about C#).

The parser produces a programming-language-independent tree called a Loyc tree, and the macro processor is looking at the target of every "call" in that tree ("calls" include both method calls and _everything_ else except identifiers and literals.) A macro can target any call to an identifier (or any plain identifier). So a macro can target methods, classes, properties, calls, constructors, variables, multiplications, or almost anything else. The only thing macros can't target is literals, or "everything".

Obviously a macro system implemented on Roslyn would have to be somewhat different.

For example, you mentioned:

Creating ‘out’ variables in-situ, e.g. int? Parse(string s) => int.Parse(s, out int x) ? (int?)x : null;

Once you understand the orthogonality of macros to the programming language in which they are used, you can see that there is no difference between declaration-level macros, statement-level macros and expression-level macros.

Supporting "out variables in-situ" was very hard to do, by the way, because C# doesn't let you write sequences like int x; int.Parse(s, out x) _as an expression_.

Therefore, creating ‘out’ variables in-situ required what is essentially an entire compiler pass, implemented as a 605-line macro written in EC#, that eliminates 'sequence expressions' and variable declarations in expressions. In this case the input

int? Parse(string s) => int.Parse(s, out int x) ? (int?)x : null;

becomes the output

int? Parse(string s) { int x; return int.Parse(s, out x) ? (int?)x : null; }

Note that writing out var x doesn't work, since var x; becomes a compiler error.

To invoke this "compiler pass" one can invoke #useSequenceExpressions at the top of the source file; but as a shortcut one can write #ecs; which enables _all_ EC# features that require a macro at the top of the file (there are currently two such macros).

@qwertie Well, if macros really know nothing about the language, how would a macro processor distinguish between a code line and a comment? Isn't it forced to use just text replacement with all its awkwardness (akin to C preprocessor)?

@vladd the macro processor is given a language-independent syntax tree. That tree may have come out of the EC# parser, or some other parser.

@CyrusNajmabadi

I still don't see how debugging, navigation, refactoring, etc, work with any of this... For example, any code that referenced "FieldName" would be broken in any sort of refactoring scenario. Can you help clarify how this would work?

We've talked about this. Renaming the field would have to either fail or force expansion of the macro that generated it... I just considered the latter briefly, and the problem of _limiting_ how much expansion occurs (avoiding expanding all or most macros in the file) is a bit vexing too.

Let's take stock of my current thinking about "what if EC#-style macros were redesigned for Roslyn?", keeping in mind that I don't know much about Roslyn internals. I don't see how I can give firm answers on some of these issues without building some kind of prototype.

  • Speed: Macros _must_ be executed to get accurate and reliable intellisense, so we'd need something that monitors performance and assigns blame for performance problems, plus some mechanism to bypass badly-behaved macros in intellisense (despite the resulting breakage). Macros must never run on the UI thread. If a macro effectively surrounds a large code region (like the namespace Foo macro I mentioned, but not the using macro), perhaps some incremental-update opportunities would be lost, but perhaps the parser could always remain incremental?
  • Mitigations: There could be two expansion modes, "IDE mode" for speed and "build mode" for the "real" expansion. (When running a macro, you could check whether the _macro_ checks if it's in IDE mode, to unlock an optimization where macros that _don't_ check don't have to be run twice to get the two different kinds of expansion.) I'm thinking the IDE will never show the IDE-mode expansion to the user; showing the expansion forces build mode. Also, the IDE could learn to deprioritize slow files for intellisense operations that can work before analysis is completed.
  • Memory: Keeping expanded versions of all code could hog memory if we're not careful, but often there's lots of overlap between the original code and the expanded code. For instance, given namespace Foo; followed by a class, I certainly hope the _original_ class node (and the original namespace ident) can be included in the macro's output. (If the expansion is later shown on the screen, e.g. in a pop-out drawer, it may be necessary to generate a slew of extra objects to keep track of the locations of nodes in the textual expansion being displayed.) As for synthetic output of macros, that code would often have been hand-written in C# before macros came along, so it might shrink the source code as often as it raises the memory footprint.
  • Navigation: no doubt I haven't considered all the issues in this area.... anyway, IDE mode implies slow macros leave out some of their output, so "Find All References" could miss references appearing in expanded code. The only truly slow macro I can think of is LLLPG; at the cost of parsing the user's grammar, it could regurgitate enough data in its _fake_ output to keep FAR working perfectly (the output doesn't need to make sense, it just needs to include the same references that will exist in the real output). It must be noted that when a macro includes nodes from the source file in its output, those nodes still point to the original source code, so mapping from expanded code to original source is easy, except for synthetic nodes.
  • Refactor 'rename': performed on expanded code - (I suspect IDE-mode expansion is enough, if macro authors know what they're doing). Rename would also map its changes to the original code. If a change can't be mapped it's not always fatal, but a change that _can_ be mapped may cause unacceptable side effects. In principle it should be possible to "force" a successful rename by expanding one or more macro(s), but limiting the "damage" (expansions) looks tricky. Renames would be slower in the presence of macros.
  • Refactor 'extract method': could be performed on original or expanded code. Must be performed on expanded code (build-mode) to preserve semantics reliably, but the changes won't always map back to the original code. Again, I think the refactor can be forced by expanding macros. Slowness should be limited since only a single file is affected.
  • Debugging: I don't know, lots of issues to consider, but I'm optimistic that a fairly good experience debugging the original code is possible much of the time. If debugging proves difficult, the user should have a way to debug expanded code - I'm thinking, not on the fly, but as a build option. Presumably it would disable edit-and-continue since we can't map arbitrary changes back to the original code.
  • Syntax: A pleasant macro system would require a number of syntax changes similar to those in EC#. I'd suggest especially: merging the syntax of top-level statements, declaration statements, executable statements and property statements; adding expressions of the forms foo {...} and foo (...) {...}; the substitution operator $, which can appear whenever types or names are expected, and whose precedence in normal expressions is above all others; attributes on arbitrary statements (if not expressions); token literals; and this as an alternate name for a constructor, so that constructors can be written that are not lexically children of a class or struct. Macros also need "sequence expressions" (the ; operator that I heard was considered for C# but not added).

The EC# macro processor doesn't currently have a concept of attribute macros, which is a performance issue (e.g. contract macros have to scan every method signature for contracts even though perhaps no contracts exist anywhere) and also a composability issue (two macros that recognize two different attributes on methods pretty much have to be specifically designed to get along or else it's not possible to combine them on a single method.) I originally planned (but did not implement) a syntax [[foo]] which resembles an attribute but is meant to call macros; however, such a feature doesn't solve a certain problem with contract attributes that I won't get into, because this is getting long.

My first wedding anniversary just started, so TTYL :)

Btw there are some things that would greatly improve the experience of using EC# in Visual Studio. Where should I go for help?

  • Right now it's a single-file generator using an ancient COM interface. It is NOT a vsix because SFGs require registry settings to be written and I couldn't figure out how to write registry settings with a vsix. So before I can do anything interesting, I have to escape this madness.
  • Assuming I can't replace Roslyn's front end with EC# and have to keep using the Single-File Generator approach... I really need the generated code to be read-only. So many times I have accidentally modified it.
  • It seems possible to construct a bidirectional mapping to help users navigate between the generated code and the original code. I'd like to project Roslyn error messages back to the .ecs file; define a command for navigating between the two files; somehow invoke Roslyn dot-completion in the .ecs file; and maybe, just maybe, attempt to map renames back to the ecs file.

Sorry if I wasn't clear. In my system, syntax is completely orthogonal to the macro system; the parser knows nothing about macros, and the macro processor knows nothing about syntax (in fact, it knows nothing about C#). ... The parser produces a programming-language-independent tree called a Loyc tree, and the macro processor is looking at the target of every "call" in that tree

Sorry, this is getting more confusing. Again, how would the parser actually parse:

```c#
ImplementNotifyPropertyChanged
{
    public string CustomerName { get; set; }
    public object AdditionalData { get; set; }
    public string CompanyName { get; set; }
    public string PhoneNumber { get; set; }
}
```

If the parser knows nothing about macros, then the above code would end up with very broken syntax. It certainly wouldn't create 'call' nodes that a macro processor could handle.

Supporting "out variables in-situ" was very hard to do, by the way, because C# doesn't let you write sequences like int x; int.Parse(s, out x) as an expression.

I still don't know how you even got past the parsing phase. How did you parse the actual code that contains an 'out var'? I don't care how you transformed it; I just care how you actually parsed it. You said that your system required adding no new syntax. So I'm trying to wrap my head around how you handled a construct which is, by its very nature, new syntax :)

Mitigations: There could be two expansion modes, "IDE mode" for speed and "build mode" for the "real" expansion.

This violates a core design goal of Roslyn: that the experience you get in the IDE is the same as what you get when you build. We want to ensure that you never experience a situation where the IDE tells you one thing, but the build tells you another. If Macros make that infeasible, then they would have to provide an absolutely enormous amount of value to be worthwhile.

Btw there are some things that would greatly improve the experience of using EC# in Visual Studio. Where should I go for help?

You're treading entirely new ground. You can ask here for help. I'll try the best I can, but what you're asking for may take an enormous amount of work.**

--

** Which is what I was saying before. As we tried going down this path ourselves, we realized it would take many devs spread over many teams to accomplish this. This is not a small amount of work. It needs design and resources spread over the entire product. And it needs a huge amount of buy-in in order to get the minimal viable product developed.

Debugging: I don't know, lots of issues to consider, but I'm optimistic that a fairly good experience debugging the original code is possible much of the time. If debugging proves difficult, the user should have a way to debug expanded code

You've now made the design space much larger. Saying things like "should have a way" effectively means we need to design and cost precisely that solution. If that solution is necessary for a "minimum viable product" then that has to be factored in. If that solution requires work from other teams (like the debugger team), then that has to be established up front so we can know if we can get all work approved before starting anything.

That tree may have come out of the EC# parser, or some other parser.

Presumably for Roslyn it would come from the Roslyn parser. So i ask again, what is the syntax for Macros that Roslyn would have to recognize and parse in C# code (we'll get to VB later)?

Navigation: no doubt I haven't considered all the issues in this area.... anyway, IDE mode implies slow macros leave out some of their output, so "Find All References" could miss references appearing in expanded code.

This would be very concerning. A core value proposition of Roslyn is that it enables the types of accurate features that people can depend on. If that value prop goes away when people use Macros, then it very much undermines core principles and values that we're trying to deliver and that people expect to have.

We do not introduce new language features without strongly considering the experience they will have in the IDE. Indeed, that's exactly the role I serve on the language design team. My primary purpose there is to ensure that we can introduce new features in a manner whereby they serve both language goals and IDE goals. Right now the Macros, as you've described them, come with an enormous number of 'take backs' in terms of the bar we've set for C#/VB. We would have to either resolve those issues, or decide that macros were worth lowering our bar. Both of these seem difficult :)

A pleasant macro system would require a number of syntax changes similar to those in EC#. I'd suggest especially: merging the syntax of top-level statements, declaration statements, executable statements and property statements

I remain very confused. You previously said that your system required no syntax changes, i.e. "Enhanced C# does not allow new syntax." If you do not allow new syntax... why are you now saying that you would require a number of syntax changes? And if you don't allow new syntax, how did you accomplish things like supporting out-var?

I really can't reconcile many of these statements that seem contradictory. For now, I'm going to assume you do require new syntax (as indicated in the first line). If so, please indicate what your syntax additions actually are. For example, what is your syntax addition that enables the INotifyPropertyChanged code that you mentioned already? What is your syntax addition that enables a user to provide 'out-var's through macros? etc. etc.

@qwertie Currently, i find this 'proposal' to be far too massive, disorganized, and unclear. I think a way forward would be to start over with new proposals that have very small scope. i.e. "I would propose these specific syntax changes to the language. Here are the grammar changes for it, and what purpose it would serve."

We could then discuss each individual piece fully, ensuring that we'd thought through all the issues and concerns of each one. Right now the enormity of everything you're discussing here, and the jumping around between ideas and issues, is clouding any progress. (I mean... I still don't actually know what you're actually proposing, let alone how to deal with all the issues that could arise.)

Starting just with syntax will be helpful as it will ground things and will help us understand what sort of code the user could create and then what later processing systems could do with it.

Final note: It's unclear to me what value these macros have over our original SourceGenerator proposals. The benefit of the SourceGenerator approach was that you could take in C# code, manipulate it (using normal Roslyn APIs) and just produce new trees that the rest of the pipeline would operate on. There was no need for a new 'macro language' for manipulating trees. The macro language was just any .net code that wanted to operate on Roslyn's object model.

Such an approach was possible without adding any new syntax to C# at all. Your proposal seems to indicate that you would be able to do things that would traditionally require new syntax (like primary-constructors, or out-vars), but it's still unclear to me how that would work. And, if your approach does not allow for new syntax, it's unclear to me what value your system would have over what we were looking at.

How did you parse the actual code that contains an 'out var'? I don't care how you transformed it; I just care how you actually parsed it. You said that your system required adding no new syntax.

I feel like you must have missed the message in which I talked about the fact that I added _lots_ of new syntax to EC#. Some of that syntax would make sense without a macro system; some of it would not.

I'm definitely quite confused (as several messages seem contradictory)**. But, for now, I'm going to go with the explicit claim that new syntax is required and that you introduced new syntax to support these features.

If that's the case, and you required syntax changes to be able to support 'out-var', then why would I need macros in order to support out-var? What do macros buy me? Since I had to introduce the new syntax for out-var in the first place... why would I then use macros to implement out-var?

--

** (Again, this is why I'd like a new thread that starts with precisely the set of syntactic changes you want in the language to support your proposal.)

I probably confused you by saying "the parser knows nothing about macros". Sorry about that. In my own mind the syntax is independent, because the parser can do whatever, it's just making a tree, and whether there's a macro system running after it or some other system doesn't matter to the parser. But understandably you don't think about it the same way - you think of C# as a single integrated thing, where certain changes to the parser were designed for the macro system and therefore the parser "knows" about macros. So, sorry for that. Still, note that in principle the macro processor could work (but not support things like "out var") without changes to the parser. Edit: e.g. one of the things I'd like to do someday is take various other parsers - Python, C++ - and hook them up to the macro processor.

Still, note that in principle the macro processor could work (but not support things like "out var") without changes to the parser.

HOW? If the parser does not change, then how do you handle things like your INotifyPropertyChanged example?

The syntax you presented would be rejected by the C# parser. And if it was rejected any sort of 'processor' would have a heck of a time trying to do anything with the tree we produced.

In my own mind the syntax is independent

How can the syntax be independent? If Macros run on the tree the parser produces, then the parser has to understand some sort of Macro syntax so it can generate the right sort of nodes that the Macro processor will run on. If it doesn't, then the tree is going to be massively broken, and it will be enormously painful for any sort of processor to have to work on that tree.

Without changes to the parser, you'd have to make do and write it with syntax that already exists, maybe something like

class ImplementNotifyPropertyChanged {
    public string CustomerName { get; set; }
    public object AdditionalData { get; set; }
    public string CompanyName { get; set; }
    public string PhoneNumber { get; set; }
}

The replace macro would similarly have to be designed to "make do". It would be pretty ugly, but doable.

To make an analogy, List<T> can hold objects of type Foo without having any awareness of Foo. The macro processor doesn't have generic type parameters, but it does process LNode objects, which are language-independent. So in that sense it processes C# without knowing anything about C#.

What are LNode objects? What information do they contain? How does one get one?

LNode is the .NET implementation of Loyc trees. The API is described here.

@qwertie I tried to read this post a few times myself, and while I understand most of what you're saying I really, strongly recommend you start fresh and create a new issue, explaining things in the following manner:

  1. This is the problem.

  2. This is the solution.

  3. This is the syntax.

  4. This is an example of a macro.

  5. This is how it's used at the _callsite_.

  6. This is the generated code.

In my opinion you shouldn't even think about EC# when describing this at all; this would make it a lot easier to understand and allow @CyrusNajmabadi and others to see how this fits within Roslyn, if ever.

I'm confused again. Are you saying Roslyn would be translating nodes into some other API and calling into that to do work? That sounds quite expensive. Trees can be huge, and we already do a tremendous amount of work to not realize them, and to be able to throw large parts of them away when possible.

Agreed. I'm getting high level ideas and concepts. But when i try to dive deeper, i'm seeing contradictions and not-fully-fleshed-out ideas.

Many of your ideas also seem predicated on a whole host of assumptions. i.e. "we could do X, (with the implication that Y and Z are also done). And to do Y and Z, we'd need these other things as well." I can't wrap my head around a clear set of concepts and work items that you're actually proposing, and how each one of them would work.

Most of this feels like you have grand ideas in your head, and you're giving quick sketches based on assumptions that are scattered around in a whole host of places :)

Condensing and focusing would make this conversation much simpler.

I've been switching back and forth between two tasks - if I thought you were asking me about how EC#/LeMP works then I described EC#/LeMP. But you've also been asking about the IDE experience and things like that, so for those questions I've switched gears and tried to figure out (mostly on the fly) how one would, in broad strokes, translate concepts from LeMP to Roslyn. So this conversation is sort-of two conversations interleaved, which would be bewildering if you're not mentally distinguishing the two or if you haven't understood the EC#/LeMP side of things. Probably at certain points I didn't explain some things well enough, and I'm sorry about that. This got pretty long, so I think we should start a new thread, but right now I need to go on an anniversary trip with my wife.

I've been switching back and forth between two tasks

I think that switch was not clear enough for me :D And it would be better to just discuss specifically what we would want to do with Roslyn and C# here.

but right now I need to go on an anniversary trip with my wife.

Congrats! I look forward to hearing from you once you get back!

Hi everyone. I'm a small-time EC# contributor, and I'm currently working on ecsc, a command-line EC# compiler. I'm not as knowledgeable about EC# and LeMP as @qwertie, but I thought I'd try and shed some light on how macros work in EC# – perhaps a different perspective can be helpful. I'll try to explain what LNodes are, what the parser does, and what the macro processor (LeMP) does.

LNodes

EC#'s syntax trees are represented as LNode instances. An LNode can be one of the following:

  • An Id node, which represents an identifier. An identifier can be any string (technically, identifiers are encoded as Symbol instances, but that's not very relevant here). x and foo are valid Id nodes, but so are things like #class, #interface and #import. Identifiers that are prefixed by hashtags are called _special_ identifiers. They don't get special treatment per se, but they are used (by convention) to encode language constructs as call nodes. More on that in the next bullet.
  • A call node. Call nodes are conceptually just a simple call. They consist of a call target LNode and a list of argument LNodes. For example, f(x) is a valid call node. But call nodes only get really interesting when a special identifier is used as the call target; they are used to represent all C# language constructs other than identifiers and literals. For example, using System; is represented as #import(System): a call to the #import Id node with the System Id node as its argument.
  • A literal node. These are simple literals, such as 1.0, 0, '\n' and "Hello, world!".

Every LNode also has a list of attributes, which are also encoded as LNode instances. Attribute lists are empty most of the time, though.

It is worth noting at this point that there is no such thing as an "invalid" LNode. For example, #if(f(x)) makes no sense – it's an if statement with neither a 'then' nor an 'else' clause – but it's a perfectly legal LNode, because an LNode is just a data structure. It does _not_ have some implicit meaning.

In ecsc, nonsensical syntax trees like #if(f(x)) are only caught by the semantic analysis/IRgen phase. This differs from how C# traditionally operates, where every statement has well-defined semantics from the get-go.
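
To make those three node kinds concrete, here is a tiny hypothetical model in plain C# (invented for illustration; the real Loyc.Syntax.LNode API is different), showing how using System; would be encoded as #import(System):

```csharp
// Hypothetical mini-model of the three LNode kinds (illustration only;
// not the real Loyc.Syntax API).
using System.Collections.Generic;

abstract record MiniLNode;
record IdNode(string Name) : MiniLNode;       // e.g. x, foo, #class, #import
record LiteralNode(object Value) : MiniLNode; // e.g. 1.0, 0, '\n', "Hello, world!"
record CallNode(MiniLNode Target, IReadOnlyList<MiniLNode> Args) : MiniLNode; // e.g. f(x)

static class LNodeExample
{
    // `using System;` encoded as a call to the special identifier #import:
    public static readonly MiniLNode UsingSystem =
        new CallNode(new IdNode("#import"), new MiniLNode[] { new IdNode("System") });
}
```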

The parser

Let me get this out of the way first: you seem to be under the impression that the EC# parser is aware of which macros have been defined. That is not the case; there is no such magic.

The EC# parser is a relatively simple tool. It takes source code as input, and produces a list of LNodes as output. It does this according to a number of rules. These make the statement below legal (though they don't assign any semantics to it).

ImplementNotifyPropertyChanged
{
    public string CustomerName { get; set; }
    public object AdditionalData { get; set; }
    public string CompanyName { get; set; }
    public string PhoneNumber { get; set; }
}

ecsc has a pair of options (-E -syntax-format=les) that can be used to coerce it to print the syntax tree. Technically speaking -E will expand macros first, and then print the syntax tree. But I haven't defined ImplementNotifyPropertyChanged in this context, so it won't get expanded.

$ ecsc ImplementNotifyPropertyChanged.ecs -platform clr -E -syntax-format=les -fsyntax-only
'ImplementNotifyPropertyChanged.ecs' after macro expansion: 
ImplementNotifyPropertyChanged({
    @[#public] #property(#string, CustomerName, @``, {
        get;
        set;
    });
    @[#public] #property(#object, AdditionalData, @``, {
        get;
        set;
    });
    @[#public] #property(#string, CompanyName, @``, {
        get;
        set;
    });
    @[#public] #property(#string, PhoneNumber, @``, {
        get;
        set;
    });
});


ImplementNotifyPropertyChanged.ecs:1:1: error: unknown node: syntax node 'ImplementNotifyPropertyChanged' cannot be analyzed because its node type is unknown. (in this context)

    ImplementNotifyPropertyChanged
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Again, let me stress that ImplementNotifyPropertyChanged gets parsed fine. The compiler only flags it as an error when it notices that all macros have been expanded and it doesn't know what an ImplementNotifyPropertyChanged node's semantics are.

The macro processor, LeMP

LeMP takes a list of LNodes as input, and produces a list of LNodes as output. Macros are used to do this transformation, but it might as well be a black box from a compiler pipeline perspective – it's not tied to any other component in the compiler.

Anyway, the basic idea is that LeMP's input contains nodes which the semantic analysis pass doesn't understand, and macros then transform those nodes. LeMP's output (hopefully) consists of nodes that semantic analysis understands completely. So the way it works is: the parser produces a syntax tree which need not have fixed semantics, and macro expansion is that syntax tree's one and only chance to get its act together before the semantic analysis pass converts it into compiler IR.

I'd love to show you the expanded version of @qwertie's ImplementNotifyPropertyChanged example, but I can't do that at the moment because ecsc relies on the Loyc NuGet package instead of the EC# master branch; replace inline macro definitions are a relatively new feature in LeMP. Sorry about that.

I can show you how an ADT is expanded though. Consider the following example:

public abstract alt class Option<T>
{
    public alt None<T>();
    public alt Some<T>(T Value);
}

Without macro expansion, this gets parsed as:

@[#public, #abstract, @[#trivia_wordAttribute] #alt] #class(#of(Option, T), #(), {
    @[#public] #fn(alt, #of(None, T), #());
    @[#public] #fn(alt, #of(Some, T), #(#var(T, Value)));
});

We can force macro expansion by adding using LeMP; to the top of the file. That'll make LeMP import its standard macros. The resulting syntax tree is

#import(LeMP);
@[#public, #abstract] #class(#of(Option, T), #(), {
    @[#public] #cons(@``, Option, #(), {
        });
});
@[#public] #class(#of(None, T), #(#of(Option, T)), {
    @[#public] #cons(@``, None, #(), {
        });
});
@[#public] #class(#of(Some, T), #(#of(Option, T)), {
    @[#public] #cons(@``, Some, #(#var(T, Value)), {
        #this.Value = Value;
    });
    @[#public] #property(T, Value, @``, {
        get;
        @[#private] set;
    });
    @[#public] #fn(#of(Some, T), WithValue, #(#var(T, newValue)), {
        #return(#new(#of(Some, T)(newValue)));
    });
    @[System.ComponentModel.EditorBrowsable(System.ComponentModel.EditorBrowsableState.Never), #public] #property(T, Item1, @``, {
        get({
            #return(Value);
        });
    });
});
@[#public, #static, @[#trivia_wordAttribute] #partial] #class(Some, #(), {
    @[#public, #static] #fn(#of(Some, T), #of(New, T), #(#var(T, Value)), {
        #return(#new(#of(Some, T)(Value)));
    });
});

How is this in any way relevant to Roslyn?

¯\_(ツ)_/¯

I just thought I'd give you some background. That's all. :)

Let me get this out of the way first: you seem to be under the impression that the EC# parser is aware of which macros have been defined. That is not the case; there is no such magic.

I can't reconcile this with the code examples given. If the parser is unaware of 'macros', how could it successfully parse:

```c#
ImplementNotifyPropertyChanged
{
    public string CustomerName { get; set; }
    public object AdditionalData { get; set; }
    public string CompanyName { get; set; }
    public string PhoneNumber { get; set; }
}
```

This is not legal C#. If you tried to parse this today then the parser would 'go off the rails', creating tons of skipped tokens and missing tokens. If that's the case, then the transformation step would have a heck of a time trying to figure out what happened. If you want a good tree, then the parser is going to need to know about macros.

Or, alternatively, we can use the approach we took with SourceGenerators. Namely, we used an existing piece of syntax (i.e. '[attributes]') to mark where we wanted generators to run. But if it isn't an existing piece of syntax, then I'm not sure how the system can work without the parser having to know about the syntax of these guys.

Right. So the thing is that the EC# parser doesn't think about what it's parsing in the same way a traditional parser – like Roslyn's C# parser – does.

IIRC, the EC# grammar defines something called block-calls, and what you're seeing is really just an example of that. Basically, anything that looks like identifier { ... } gets parsed as a call node: identifier({ ... }). The parser doesn't stop and consider if the syntax tree is _meaningful:_ only macros and semantic analysis can define a syntax tree's semantics.

Macros don't define new syntax. They merely transform the parse tree in a way that assigns semantics to constructs that don't have semantics yet. The EC# parser was designed with macros in mind – which is exactly why it successfully parses source code that is meaningless without a macro processor – but it doesn't interact with the macros. It just builds a syntax tree, and leaves the task of transforming said tree to the macros.

So the EC# parser will parse the example you listed as exactly this.

ImplementNotifyPropertyChanged({
    @[#public] #property(#string, CustomerName, @``, {
        get;
        set;
    });
    @[#public] #property(#object, AdditionalData, @``, {
        get;
        set;
    });
    @[#public] #property(#string, CompanyName, @``, {
        get;
        set;
    });
    @[#public] #property(#string, PhoneNumber, @``, {
        get;
        set;
    });
});

And that will work even if no macro called ImplementNotifyPropertyChanged is defined – in fact, the parser isn't even aware of which macros are defined when it is parsing away at the source code.

I understand that this can be hard to wrap your head around. But you should really try to think of the EC# parser as something that parses _data_ rather than _code,_ akin to an XML parser. An XML parser will happily parse <CompilerOption key="out" value="bin/Program.exe" />, despite the fact that it has no idea of what a CompilerOption node's semantics are. It's entirely up to the program that runs the XML parser to make sense of what a CompilerOption node is.

Similarly, the EC# grammar defines legal constructs whose semantics are to be defined by the user, in macro form. The parser mindlessly parses its input according to the grammar, and then hands the syntax tree off to the macro processing phase. That's all there is to it, really. Conceptually, it's a pretty dumb system, but it works beautifully.

IIRC, the EC# grammar defines something called block-calls

...

Ok. So there is new syntax defined, and the parser does need to be aware of this :)


Macros don't define new syntax.

I don't understand. You just said the syntax for macros was: identifier { ... }. That's new syntax. C# doesn't have that syntax today.

The parser doesn't stop and consider if the syntax tree is meaningful:

By and large, neither does Roslyn's parser**. But the parser still needs to know what syntax is valid or not. It needs to know what language constructs are in the language. And so it needs to know what the syntax is for macros. Otherwise, it will be completely thrown off when it sees these constructs. I mean, you don't have to take my word for it. Just toss the above syntax into a file and you'll get errors like:

Severity    Code    Description Project File    Line    Suppression State
Error   CS1022  Type or namespace definition, or end-of-file expected   
Error   CS1022  Type or namespace definition, or end-of-file expected   
Error   CS0116  A namespace cannot directly contain members such as fields or methods   
Error   CS0116  A namespace cannot directly contain members such as fields or methods   
Error   CS0116  A namespace cannot directly contain members such as fields or methods   

--

** Technically not true. But all the cases where Roslyn's parser does this should be moved out to higher layers. This is what I did when I wrote the TS parser. There's no need for that stuff to live in the parser. It's just there for legacy reasons.

Ok. So there is new syntax defined, and the parser does need to be aware of this :)

Yes, absolutely. EC# defines new syntax. But that has nothing to do with the ImplementNotifyPropertyChanged macro in particular.


I don't understand. You just said the syntax for macros was: identifier { ... }. That's new syntax. C# doesn't have that syntax today.

Yeah. As far as I can tell, @qwertie crafted the identifier { ... } syntax specifically for macros. But macros can operate on _any syntax node._ Heck, a macro can even transform syntax nodes that already have well-defined semantics today. In fact, ecsc implements foreach as a macro.

So I'd much rather say that identifier { ... } is _a_ syntax to make using macros easier, but it's not _the_ syntax, because EC# macros can operate on any syntax.
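To make that concrete, here is roughly the enumerator-based expansion a foreach macro could emit. This is a hand-written sketch of the standard lowering (the method and parameter names are invented), not ecsc's actual output:

```c#
using System;
using System.Collections.Generic;

static class ForeachLoweringSketch
{
    // What `foreach (var item in items) Console.WriteLine(item);` roughly becomes:
    static void PrintAll(IEnumerable<string> items)
    {
        var enumerator = items.GetEnumerator();
        try
        {
            while (enumerator.MoveNext())
            {
                var item = enumerator.Current;
                Console.WriteLine(item);
            }
        }
        finally
        {
            enumerator.Dispose();   // disposal details vary with the collection type
        }
    }
}
```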

Does that clarify things a little? :)

Macros don't define new syntax.

I don't understand. You just said the syntax for macros was:

Well, "macros" is not what defined new syntax. It was "Enhanced C#" (a.k.a. me) that defined the new syntax.

As far as I can tell, @qwertie crafted the identifier { ... } syntax specifically for macros.

That's basically true, but indirectly. So here's the whole story.

I decided that, unlike some existing languages with LISP-style macro systems, I wanted a macro system in which macros would _not_ add new syntax, because I believed parsers should be able to succeed without awareness of macros. Also, as a C++ programmer I was well aware that the C++ parser was linked to the symbol table - in general, C++ is ambiguous and requires a symbol table to resolve those ambiguities. Even if C++ didn't have #define macros, the situation would be analogous to languages where macros define syntax. For example, the statement X * Y; may be a multiplication or a pointer declaration depending on whether X is a type. This has at least two disadvantages:

  • An inefficient linear parsing system, where all #include files must be parsed before the compiler can parse the main file. Moreover, if one source file says #include "X" and another says #include "W" followed by #include "X", the parser must parse "X" twice, since the contents of "W" can affect the interpretation of "X". (cf. EC#'s includeFile macro, where the included file is parsed _after_ the main file)
  • If the included files are not available, parsing can't be done properly. In practice IDEs will try to "fake it", but parsing must be repeated if the included files are discovered later, and I was concerned that if macros became an important core feature (unlike in C++, where macros are of limited use and the ambiguity I mentioned only happens occasionally), their use of custom syntax would be a serious problem for the IDE.

Also, if macros can define new syntax then their meaning can be slightly harder to guess. By analogy, we could view unknown macros the way we view foreign languages. Consider trying to read Spanish vs Tagalog. You don't _really_ understand either language, but Spanish has both words and grammar that are more similar to English, so you can glean more information from a Spanish text than a Tagalog text - perhaps you can even guess the meaning correctly. If macros can add arbitrary syntax, then when you look at an unknown macro you don't even know _to what extent_ custom syntax has been added. So if you see something like "myMacro foo + bar;" then _probably_ the macro accepts an expression, but you can't be sure; it's really just a list of tokens, and usually in these systems, you can't even know whether the semicolon marks the end of the macro or if it keeps going after that.

So instead I decided to preserve C#'s tradition of "context-free" parsing by ensuring every source file can be parsed without knowledge of macros. However, if macros weren't allowed to add syntax, then the language itself would need changes so that the existing syntax would usually be sufficient for them. This new syntax should be useful for multiple unforeseen purposes, and consistent with the existing flavor of C#.

My main strategy was to "generalize" C#. Part of this generalization was taking the existing syntactic ideas of C# and extending their patterns in a logical way. Here are some examples:

  • Numerous statements can begin with "modifiers" like public, abstract, readonly (you can also think of ref and out as being in this category). I noticed that, upon seeing a modifier, the parser cannot know what kind of statement it is attached to. So it is forced to skip past all modifiers, examine whatever comes after, and _only then_ decide whether the modifiers are valid on the construct to which they are applied. Simply by _not_ checking "is this modifier valid on this construct?" (i.e. waiting until semantic analysis) the parser can accept any modifier on any construct (and since my syntax tree has an attribute list on every node, the parser always has a place to save the modifiers.)
  • Properties have get {...} and set {...} while events have add {...} and remove {...}. Generalizing this pattern, we get any_identifier {...}
  • Consider the built-in constructs if (...) {...}, lock (...) {...}, while (...) {...}. Generalizing, we get any_identifier (...) {...}
  • Given contextual keywords like partial and yield, I observed that really _any_ identifier could act as a contextual keyword, so EC# _does_ treat any identifier as a contextual keyword if possible. (However, I actually made a mistake - I didn't notice that, for example, partial public class Foo was illegal. I incorrectly thought of contextual keywords as a kind of modifier rather than as something that comes after modifiers; thus my parser currently accepts partial public class Foo.)

As I designed this, I had very few actual macros in mind. For instance, remember alt class BinaryTree<T>? I generalized "contextual keywords" long before I thought of creating alt class. The historical precedent seemed compelling enough by itself, e.g. partial, yield, async (not to mention add, remove, etc.) demonstrate the value of contextual keywords. And obviously, the C# team would always design new syntax in a way that is consistent with old syntax, so it made sense to "entrench" any obvious patterns that were developing - making them available both to future features in the compiler itself, and macro authors as well.
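Put together, the generalizations above admit forms like the following (illustrative EC#, not legal plain C#; unroll is shown only as an example of a macro that benefits from the parenthesized form, and the details may differ from LeMP's actual macros):

```c#
ImplementNotifyPropertyChanged {             // any_identifier {...} (a block call)
    public string CustomerName { get; set; }
}

unroll (T in (float, double)) {              // any_identifier (...) {...}
    public static T Half(T x) { return x / 2; }
}

alt class BinaryTree<T> { /* ... */ }        // "alt" acting as a contextual keyword
```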

Another part of "generalizing C#" was "squashing" multiple grammar productions together. In part this was to give macros flexibility, but I also wanted to make the EC# parser _simpler_, or at least no more complex, than the C# parser. (Currently it totals 2500 lines including about 500 lines of comments - or 5600 lines including 800 comments after LeMP expands it. Roslyn's C# parser is about 10,000 lines with 800 comments, though it's not fair to directly compare since, for example, my parser still lacks LINQ, while Roslyn has more blank lines and is more paranoid due to its use in an IDE.)

  • The existing C# grammar defines "top level of a file", "contents of a class", "contents of a property" and "contents of a method" as separate contexts, each with their own grammar. I observed that I could squash those contexts together so that you could write a class with if statements in it, a property with a statement directly inside it (not inside get { }), or a method with another method inside it.
  • All the "space" constructs - namespace, class, struct, interface, enum - have similar syntax, so I combined them.
  • I figured that macros might want unusual syntax inside the "formal" argument list of a method. I also thought it would be neat if you could define variables in expressions, like in C++ where you can write if (Foo x = y) - note that this is completely unrelated to macros, it's a separate thing that seemed useful in its own right, except that in C# it would have to be if ((Foo x = y) != null) instead. At first I thought this feature wasn't possible in C# because Foo(Dictionary<K,V> x) would be ambiguous (is it a variable declaration or two separate arguments?) But then I realized that if the variable is assigned a value, like Foo(Dictionary<K,V> x = null), it's not "really" ambiguous since the expression V> x = null could never compile. So, I decided to squash the "expression" syntax together with the "formal parameter list" syntax. There would still _formally_ be two different kinds of expressions - one that requires variables to be assigned with =, and another that does not, but a single expression parser can handle both situations. This gave me four birds with one stone:

    • arbitrary expressions in formal argument lists (potentially useful for macros)

    • attributes on any expression (potentially useful for macros)

    • variable declarations in expressions

    • out-variable-declarations like TryParse(s, out int x) (these last two are meant to be directly implemented in a compiler, but are implemented as macros since there's no complete EC# compiler; both are sketched below.)
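A sketch of those last two bullets as user-facing code (the first line is EC#'s proposed form rather than plain C#; Foo, GetFoo, and s are invented names):

```c#
if ((Foo x = GetFoo()) != null)       // a variable declaration inside an expression (EC#)
    Console.WriteLine(x);

if (int.TryParse(s, out int n))       // an out-variable declaration (later real C# 7 syntax)
    Console.WriteLine(n);
```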

Finally, I realized that "generalized C#" by itself isn't sufficient for all macros, so I added a few more things:

  • The substitution operator $, which is crucial for macros like replace
  • Token literals: @{ tree of tokens (parens, [square brackets] and {braces} must be balanced.) }. This is not the same as letting macros create syntax, since the whole file is parsed before any macros get involved. It's a literal, like a string.
  • Custom unary and binary operators in `backticks` (e.g. x `<=>` y or perhaps x `X` y for cross products). I saw the "backquote operator" both as an extension point for macros and as a generalization of the existing concept of overloaded operators (thus, the parser also accepts static int operator`plus`(int x, int y) { return x+y; }). These additions are illustrated below.
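Rough illustrations of those three additions (EC#/LeMP-flavoured, not plain C#; the replace example is only meant to convey how $ is used, and the exact macro behaviour may differ):

```c#
// 1. The substitution operator $ marks syntax to capture and substitute, e.g. with `replace`:
replace (Math.Pow($x, 2) => $x * $x) {
    double y = Math.Pow(a + b, 2);    // becomes roughly: double y = (a + b) * (a + b);
}

// 2. A token literal stores a balanced token tree as data; nothing interprets it at parse time:
var tokens = @{ foo(1, [2], { 3 }) };

// 3. Custom operators in backticks, plus the generalized operator declaration:
var ordered = x `<=>` y;
static int operator`plus`(int a, int b) { return a + b; }
```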

Macros don't define new syntax.
I don't understand. You just said the syntax for macros was:
Well, "macros" is not what defined new syntax. It was "Enhanced C#" (a.k.a. me) that defined the new syntax.

...

If you defined new syntax for macros... then macros did indeed define new syntax.

C# does not contain this grammar production. In order for the C# parser to parse out macros, it would need to understand this new syntax. I do not see how we can do macros (like you do them) without defining new syntax here.

All the "space" constructs - namespace, class, struct, interface, enum - have similar syntax, so I combined them.

There is a tension here. We've avoided overlapping things when there are significant deviations between the forms. For example, namespaces can have dotted names; the rest can't. If the merged node supports dotted names, then every downstream consumer needs to figure out what to do when it encounters a dotted name on one of these other entities. Alternatively, the parser might never accept dotted names for the rest, but now everyone needs to know that they should assume the name is never dotted. Either way, the node is no longer a source of confident information about what you might get.
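A minimal illustration of that deviation:

```c#
namespace Company.Product { }    // legal: namespace names may be dotted
class Company.Product { }        // not legal C#: type names may not be dotted,
                                 // so a merged "space" node must model both cases
```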

There's also the question of 'when does this end?' After all, methods and properties create 'spaces' too (i.e. places where locals and whatnot live). Should we merge methods into the above list? You could just take the above list and allow an optional parameter list before the braces...

At the end of the day, you could try to merge everything into one type (I've seen systems that do this). The pro is that you only ever deal with one type; the con is the amount of information every consumer needs to handle.

Finally, I guess I'm just not seeing what purpose macros actually serve over the SourceGenerator proposal. As you've mentioned, they cannot introduce new syntax. So all they can do is take existing syntax and manipulate it to produce other syntax. But that's what SourceGenerators did. That's something Roslyn is optimized for, as it allows very extensible transformation of syntax.

The problem was not in making it possible for people to manipulate syntax (we have plenty of experience and features that do that today). The problems stemmed from how you make a cohesive, fast, and trustworthy set of tools when this is a fundamental building block of your system.

Because source-transformation is now a core primitive, we have to assume it will be used pervasively by many. And that means every single feature we build into the product needs to work well with these features.

We are now taking language feature discussion in other repositories:

Features that are under active design or development, or which are "championed" by someone on the language design team, have already been moved either as issues or as checked-in design documents. For example, the proposal in this repo "Proposal: Partial interface implementation a.k.a. Traits" (issue 16139 and a few other issues that request the same thing) is now tracked by the language team at issue 52 in https://github.com/dotnet/csharplang/issues, and there is a draft spec at https://github.com/dotnet/csharplang/blob/master/proposals/default-interface-methods.md and further discussion at issue 288 in https://github.com/dotnet/csharplang/issues. Prototyping of the compiler portion of language features is still tracked here; see, for example, https://github.com/dotnet/roslyn/tree/features/DefaultInterfaceImplementation and issue 17952.

In order to facilitate that transition, we have started closing language design discussions from the roslyn repo with a note briefly explaining why. When we are aware of an existing discussion for the feature already in the new repo, we are adding a link to that. But we're not adding new issues to the new repos for existing discussions in this repo that the language design team does not currently envision taking on. Our intent is to eventually close the language design issues in the Roslyn repo and encourage discussion in one of the new repos instead.

Our intent is not to shut down discussion on language design - you can still continue discussion on the closed issues if you want - but rather we would like to encourage people to move discussion to where we are more likely to be paying attention (the new repo), or to abandon discussions that are no longer of interest to you.

If you happen to notice that one of the closed issues has a relevant issue in the new repo, and we have not added a link to the new issue, we would appreciate you providing a link from the old to the new discussion. That way people who are still interested in the discussion can start paying attention to the new issue.

Also, we'd welcome any ideas you might have on how we could better manage the transition. Comments and discussion about closing and/or moving issues should be directed to https://github.com/dotnet/roslyn/issues/18002. Comments and discussion about this issue can take place here or on an issue in the relevant repo.

You may find that the original/replace code generation feature tracked at https://github.com/dotnet/csharplang/issues/107 is related to this proposal.
