Roslyn: Reboot build chain codegen

Created on 15 May 2017  ·  119 Comments  ·  Source: dotnet/roslyn

There have been lots of thoughts about compilation extensibility points for .NET, including "dnx compile modules", "roslyn generators", etc. The reality is: due to various factors including complexity, changing needs (IDE support? just the compiler chain?), and time pressures (getting vWhatever shipped), nothing has changed. There's a csharplang feature that is going nowhere, and the build-time options are all closed. It is acknowledged that the IDE side of things (i.e. the current hypothetical approach) is probably prohibitively expensive, so: that almost certainly won't happen.

It is also perceived as a niche feature, which subtracts from the perceived need.

My aim, here, is to challenge the line above. While I fully agree that it is a minority feature in terms of counting the implementors, I propose that it is not a niche feature in terms of reach. Quite the opposite, especially when we consider that netstandard now happily reaches into a lot of platforms that don't have active meta-programming support (no JIT, AOT-only).

There are any number of scenarios that can benefit here, but due to my projects, the ones that leap most immediately into my mind are things like:

  • serializers (Json.Net, protobuf-net, etc)
  • ORM-like tools (EF, dapper, etc)
  • RPC tools for implementing stubs
  • UI templating tools (razor, etc)

Additionally, deferring meta-programming to runtime even on those platforms that support it means there is unnecessary overhead at runtime, doing something every startup that could be done once and forgotten. In some cases, this reflection/emit work can be quite considerable. For platforms that don't support it, you have to use reflection workarounds, so you have an entire additional codebase / implementation to support and maintain, plus poor performance.
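To make that overhead concrete, here is a minimal hypothetical sketch (not any library's actual output) contrasting the reflection path a serializer must take at runtime with the direct code a build-time generator could have emitted once:

```csharp
using System;
using System.Reflection;

public class Person { public string Name { get; set; } }

public static class Demo
{
    // What many serializers do today: resolve members via reflection at runtime
    // (or cache delegates, paying the cost again at every startup).
    public static object ReadViaReflection(Person p) =>
        typeof(Person).GetProperty(nameof(Person.Name)).GetValue(p);

    // What a compile-time generator could emit once, at build time: plain direct
    // access - no runtime metadata lookups, and it works on AOT-only platforms.
    public static string ReadViaGenerated(Person p) => p.Name;
}
```

Both paths produce the same value; the difference is where the cost is paid, and whether the platform needs runtime reflection/emit support at all.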

The point is: just about every single app is going to need either data, or UI - probably both. And every app wants to be performant. The number of implementors of these tools is indeed minimal, but the reach is immense.

The current status-quo leaves poor options. Additionally, with the ever-increasing trend towards async, even on platforms that support runtime emit, it is less and less desirable to use the emit API, as it is tortuous to correctly implement async code in this way. A lot of work has been done in the language to solve the complex problems of async, and it makes perfect sense to use the roslyn chain to do that injection.

How can we pragmatically move forward these needs, without drowning in the big goal complexity that has stalled "generators"? Would rebooting the "compile modules" work be viable?

(also: paging @jaredpar and @davidfowl)

Area-Compilers · Discussion · Feature Request · New Language Feature - Replace/Original

Most helpful comment

Oh. You don't need to worry about that. The experience isn't great. But people put up with it.
If the existing deficiencies aren't a problem, then why does it need to be part of Roslyn?

Using quotes for something we never said is a bit troubling... We did say that the problem exists, but we didn't have any choice other than to live with it. That's different; we wouldn't be having this discussion in the first place if we weren't already in the wild west looking for something better.

Let me reiterate why this plugin architecture is important to be part of Roslyn:

  • Compilation time would be greatly improved
  • All code modifier/generators would use the same unified infrastructure (no more custom msbuild tasks, no more hazardous custom IL patching tools)
  • Using clean SyntaxTree instead of whatever dirty IL rewriter
  • As simple as using DiagnosticAnalyzer today, by installing a NuGet package
  • We could have a good debugging experience (in cases where it makes sense)

So, overall, a significantly better experience compared to what we have today.

From this discussion, it appears that scenarios that modify existing code will never get approved by the Roslyn team, while generators that add code could be. It is better than nothing, but it excludes quite a few scenarios out there (e.g. AOP). I'm probably gonna try to release a lightweight fork of Roslyn (easily upgradeable to any new version of Roslyn) with NuGet packages that allow this kind of compiler plugin, if it can help to federate our wild west a bit.

All 119 comments

Ideal scenario for me as a library author:

  1. consumer adds nuget packages and knows nothing about the voodoo
  2. then a miracle occurs
  3. assembly with code injected / mutated by the tool is ejected

then a miracle occurs

The "compile modules" work is presumably a good starting point for a minimum viable product of step 2. Obviously the same miracle should happen for all of dotnet build, msbuild and IDE build. I can see that there might be security issues if step 1 is completely silent, but by the same token: installing a library that does runtime meta-programming has similar issues and nobody blinks an eye. If that is a concern, I wonder whether it would be reasonable to do something like a warning:

CS{NUMBER} Compiler module {name} discovered; to execute modules, add <compilerModules>true</compilerModules> to the project file {path}

(i.e. kinda similar to "CS0227 Unsafe code may only appear if compiling with /unsafe")
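As a sketch of what that opt-in could look like in a project file - stressing that both the `<compilerModules>` property and the warning number are hypothetical; nothing like this exists today:

```xml
<!-- Hypothetical: <compilerModules> is NOT a real MSBuild property -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
    <!-- Explicit opt-in: silences the hypothetical CS{NUMBER} warning
         and allows discovered compiler modules to execute -->
    <compilerModules>true</compilerModules>
  </PropertyGroup>
</Project>
```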

Another for the list of uses for codegen is RPC proxies & stubs, like the GrainReference & IGrainMethodInvoker implementations we have in dotnet/orleans.

Currently, we use a mix of IL and Roslyn, but Roslyn code is much easier to maintain and debug. IL code is a necessary last resort, used for generating serializers/copiers for inaccessible types and fields & optimizing the results of reflection (eg, generating code to call some constructor).

Performing Roslyn-based code generation at runtime adds a considerable amount of time to startup, as we need to scan types and perform deep accessibility checks (if we generate code for this type, will it even compile, or will the compiler whinge about accessibility?). That's before we even get to generating syntax and invoking the compiler.

In order to generate code at compile time, we have a NuGet package which builds the target assembly, loads it and generates code for it, and then builds it again with that generated code included. That slows down builds and it's not pretty, but the metadata model exposed by reflection is nicer to work with than the one exposed by Roslyn during compilation time (at least the APIs I could find when I wrote those code generators in mid-2015). If it's possible for us to eventually move to a supported mechanism, like compiler modules, then that's very attractive.

Examples of codegen/tooling in Orleans:
Roslyn-based: GrainReferenceGenerator.cs, GrainMethodInvokerGenerator.cs, SerializerGenerator.cs
IL-Based: ILSerializerGenerator.cs, GrainCasterFactory.cs
Tooling: Orleans.SDK.targets

Code generation is hugely important to the .NET ecosystem: it lets us augment the features provided by the language/runtime. As Marc was indicating, features which support codegen will never be touched by the vast majority of users, but nearly all of those users directly benefit from those features via the libraries which they consume.


EDIT: this issue is quite old, and we've evolved our code generation approach since then. We use Roslyn both for consumption (analysis) and production of source.
Code generator library: https://github.com/dotnet/orleans/tree/master/src/Orleans.CodeGenerator
MSBuild integration: https://github.com/dotnet/orleans/tree/master/src/Orleans.CodeGenerator.MSBuild

@ReubenBond great example; and indeed your IL code looks very familiar - very comparable to protobuf-net's emit stage. Just a tip for anyone who might need to fight Roslyn (Reuben makes the point that the API is hard at times): roslynquoter is great if you want to figure out the Roslyn code needed for a particular scenario.

I used Roslyn quoter in the beginning, and now LINQPad or VS with their syntax tree displays if I ever need it. It's a verbose API, but it's fine. Consuming syntax trees is harder than generating them, I feel, but maybe I just don't have enough experience with that aspect of roslyn yet. It feels stringly compared to using reflection on compiled code.

For me, (source) code generation tools can be grouped into two categories:

  1. Those that can represent the generated code strictly as a set of new (generated) files added to the build, without altering the effective content of any non-generated source files in the project
  2. Those that alter code defined in non-generated code, e.g. Code Contracts' handling of Contract.Requires<TException> (this was actually an IL generation tool, but it should still serve as an example)

It's hard for me to tell which of these is more important for the people blocked on this issue.

For tools falling into the first category, it seems a reasonable option is a build-time tool which runs the code generation step prior to invoking the compiler. It aligns closely with the way other code generation tools currently operate, and it's always seemed to work well for ANTLR from both a tooling and an IDE experience perspective (though that wasn't a Roslyn-based generator). I'd be interested to see details on the reasons why this approach doesn't produce the expected results when Roslyn comes into the mix.

Tools falling into the second category can be further broken into two sub-categories: those that can represent the output of the code generation transformation as pure C# code (i.e. code that can be passed to an unaltered compiler such that the compiled binary executes as expected from the code generation tool output), and those that cannot. I have substantial but somewhat different concerns for these sub-categories, but if tools aren't actually falling into these groups then it doesn't seem necessary to go down this path right now.

I think a good example of working Codegen with IDE support in the current .Net environment is F# type providers.

They cover only 1. of @sharwell's classification, but they already exist today and work relatively well in the IDEs (one of their currently missing features is accessing types from the same compilation, but there's an RFC).

Writing them is a niche feature for sure, but using them is pretty common and they already cover most database access, and serialization/deserialization needs.

@sharwell I strongly suspect that "1" would fulfil the 80%+ case - and quite possibly the 98% case if we consider partial class etc. Frankly, that would be a great place to start. It is conceptually similar to what CodeGeneration.Roslyn does - but the point here would be to formalize something in this area, rather than workarounds, hacks, etc.

@mgravell Suppose Antlr4.net40.targets worked on both MSBuild for desktop and MSBuild for .NET Core (cross platform). The overall behavior of the file could be modified for a different tool, e.g. a Roslyn-based tool, which gets plugged in where Antlr4ClassGenerationTask currently sits. This file already supports incremental builds, debugging, and cleanly integrates with both Visual Studio and ReSharper IntelliSense. Would you perhaps be unblocked with this file as-is (modified to provide the same feature set for a different code generator)? Or perhaps with a modification to work the same way on additional platforms?

@sharwell I don't really know enough about that tool to offer an opinion. If it would work in a x-plat, x-target, x-tooling way that allows me to get extra code into the build based on analysis of the current code, then I guess "probably"?

@mgravell It should be possible. You could cover several scenarios from the start by writing your MSBuild task against the new .NET Standard builds of the MSBuild APIs. You could then expand on that support by multi-targeting the build task (to .NET Standard plus MSBuild 14 to cover Visual Studio 2015), and updating the targets file to reference the correct task. The build task is just a wrapper that invokes your code generation tool with the correct input files and options, and feeds information about the outputs back for the rest of the build.
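A minimal sketch of the wrapper task being described, assuming references to the Microsoft.Build.Framework / Microsoft.Build.Utilities packages; the task and property names (MyCodeGenTask, etc.) are purely illustrative:

```csharp
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

// Hypothetical build task: wraps a code generation tool so MSBuild can invoke it.
public class MyCodeGenTask : Task
{
    // The source files to analyze (typically @(Compile) from the .targets file).
    [Required]
    public ITaskItem[] SourceFiles { get; set; }

    // Where to write generated files (typically under $(IntermediateOutputPath)).
    public string OutputDirectory { get; set; }

    // Fed back to the build so generated files can be added to @(Compile).
    [Output]
    public ITaskItem[] GeneratedFiles { get; set; }

    public override bool Execute()
    {
        // Invoke the actual code generation tool over SourceFiles here,
        // writing results under OutputDirectory and populating GeneratedFiles.
        Log.LogMessage(MessageImportance.Low,
            "Generated {0} file(s)", GeneratedFiles?.Length ?? 0);
        return !Log.HasLoggedErrors;
    }
}
```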

oh god, I'm going to have to learn how to "do" msbuild, aren't I...

@mgravell You were already working on IL code generation without API assistance, how bad could it be? :trollface:

In all seriousness though, start with the .targets file I linked and the .props file next to it.

  1. Replace everything Antlr4 with some other name
  2. Remove properties not related to your use, such as Java vendor
  3. Remove the AvailableItemName lines and the ItemDefinitionGroup
  4. Instead of passing @(Antlr4) to the task (all items with the Antlr4 build action), you'll want to pass @(Compile) (all the source code files)

It's probably not a one sitting thing, but it's not horrible either. This build definition has been hardened by many, many years of use in commercial settings.
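Following those four steps, the stripped-down shape of such a .targets file might look roughly like this (task, item, and path names are placeholders, not the actual Antlr4 definitions; `Inputs`/`Outputs` is what gives you incremental builds):

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <!-- Hypothetical task assembly shipped alongside this .targets file -->
  <UsingTask TaskName="MyCodeGenTask"
             AssemblyFile="$(MSBuildThisFileDirectory)MyCodeGen.Build.dll" />

  <Target Name="MyCodeGen"
          BeforeTargets="CoreCompile"
          Inputs="@(Compile)"
          Outputs="$(IntermediateOutputPath)MyCodeGen\generated.marker">
    <!-- Step 4: pass the real source files rather than a custom item type -->
    <MyCodeGenTask SourceFiles="@(Compile)"
                   OutputDirectory="$(IntermediateOutputPath)MyCodeGen\">
      <Output TaskParameter="GeneratedFiles" ItemName="MyGeneratedFiles" />
    </MyCodeGenTask>
    <ItemGroup>
      <!-- Feed the generated files into the main compilation -->
      <Compile Include="@(MyGeneratedFiles)" />
    </ItemGroup>
    <Touch Files="$(IntermediateOutputPath)MyCodeGen\generated.marker"
           AlwaysCreate="true" />
  </Target>
</Project>
```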

@sharwell that requires a "double compile", no? Which is pretty sucky. What you want is to be handed a roslyn Compilation, so that tools can just look at that vs. rewriting that boilerplate in every build task that needs the same compilation.

@davidfowl It does (require two "core compilations"), but maybe it's not a problem. The core C# compilation is often only a minority of the overall build time, and the code generation component could reuse much of the same information that is already being computed for the main build (e.g. the location of references).

@sharwell That seems unfortunate (I know emit takes up more time than making the compilation itself). We do 99% of the work with analyzers already, that's why this feels like such a PITA.

My aim, here, is to challenge the line above. While I fully agree that it is a minority feature in terms of counting the implementors, I propose that it is not a niche feature in terms of reach.

Don't think this needs to be challenged. Everyone involved with the generators features agrees on this. Else we wouldn't have spent so much time on it.

How can we pragmatically move forward these needs, without drowning in the big goal complexity that has stalled "generators"?

One of the bigger issues is scenarios. There is only a limited amount of time we can spend on language features. Even features with known advantages and well thought out designs get put on the back burner because of more immediate needs.

Take for example ref returns and span<T>. These are features that are well understood, have known / measured benefits, mature designs and available implementations to reference. That's been the case for at least 5+ years now, much longer for ref returns. Yet we're now just getting to them in the core language because we had motivating scenarios to push them to the top: kestrel / pipelines + unity performance.

Generators still have a number of open design issues but we're quite certain it's a costly feature. The actual compiler / language changes are moderate but the developer experience cost is very high. In order to get this going we need some very compelling scenarios.

Also the scenarios need to essentially show why we can't just use existing options: T4 templates, single file generators, etc ... Can a moderate investment there close the gap that full language generators are not needed?

Would rebooting the "compile modules" work be viable?

Compiler modules are harder to create a developer experience around because they could change literally anything about a Compilation (at least in the versions I looked at). The generators feature we worked on for C# 7.0 was very carefully designed to be an augmenting generator to limit the churn a generator could have to the compilation.

Finally, I do want to take a second to talk about the state of the feature. We put a lot of work into this for C# 7.0 across a number of teams. Eventually we cut it from 7.0 for pragmatic reasons and decided to revisit it during the next major release. Hence the inactivity here isn't a reflection of "we've given up" but more a reflection of "shipping C# 7.0 - 7.2 is a ton of work and we need our brain power over there". Once we start ramping up again on C# 8.0, this will get revisited and we'll look at it again compared to all the other work we want to do.

@davidfowl Note that I wouldn't say this is optimal, and small overhead definitely adds up over time and in combination with other things. My real hope from this is it unblocks users who need this functionality in the short term by providing a very reliable and tolerably-performing path to a desired outcome, and then use those results to start being very specific about the goals of compiler-integrated transforms.

@sharwell How does it affect incremental builds? Do you have any experience running multiple code generators in the same project?

How does it affect incremental builds?

The approach I'm using for ANTLR 4 works seamlessly with incremental builds, including incremental clean (a change which causes output files to be renamed will delete previous outputs during incremental build).

Do you have any experience running multiple code generators in the same project?

No, but the order can be defined in MSBuild. By default, all generators run before the compilation step but are otherwise unordered relative to each other. If you have a code generator which depends on another code generator, it can be accounted for on an as-needed basis.

@mgravell What is the experience you want for:

  1. Build/Compilation
  2. Debugging
  3. Browsing Generated Source

With codegen? There are many other problems to solve, but I don't understand which position you are taking. Do you want something similar to roslyn generators, or do you have something else in mind?

FYI @cston

As another take on this, Refit (which does compile-time code generation), uses a Mustache file for its template language. Sure, it only works per-language, but that's ok for us.

It's really easy to update the template. The generation is called in a task before CoreCompile.
https://github.com/paulcbetts/refit/tree/master/InterfaceStubGenerator.Core

Still, I'd love to see "the right way" to do this in a more standard way.

I will definitely be looking into the GenerateStubsTask in refit and that .targets file, thanks for the pointer, @onovotny.

Call it a personal weakness:

  • I can write code to parse a 3rd-party DSL; no problem
  • I can emit C# accordingly: fine!
  • or IL: that's just dandy
  • but msbuild files? that scares and confuses me...

Out of curiosity; does that .targets file and the tool assembly deploy into consumers via nuget?

In Refit, the tools get packed in a build\tools folder in the nuget package:
https://github.com/paulcbetts/refit/blob/master/Refit/Refit.csproj#L28-L46

The only thing that gets deployed to the client itself is the main Refit library, that has the implementation called by the generated code.

One thing I will mention is that writing non-trivial Build Tasks is harder than it should be for .NET Core due to needing to deal with publish steps to get the right output and then needing a custom AssemblyLoadContext to load your task's dependencies alongside the dll: https://github.com/paulcbetts/refit/blob/master/InterfaceStubGenerator.BuildTasks/ContextAwareTask.cs (thanks to @AArnott for that one).
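For reference, the general shape of packing a build task plus targets into a package's build folder looks roughly like this (paths and names are illustrative; NuGet automatically imports `build\<PackageId>.targets` from consuming projects):

```xml
<!-- In the main library's .csproj: pack the targets file and the
     published task output into the package's build folder. -->
<ItemGroup>
  <None Include="build\MyLibrary.targets"
        Pack="true" PackagePath="build\" />
  <None Include="..\MyCodeGen.Build\bin\$(Configuration)\netstandard2.0\publish\**\*"
        Pack="true" PackagePath="build\tools\" />
</ItemGroup>
```

Packing the *publish* output (rather than the plain bin output) is what pulls in the task's dependencies, which is the pain point mentioned above.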

In the list of scenarios that @mgravell listed above:

serializers (Json.Net, protobuf-net, etc)
ORM-like tools (EF, dapper, etc)
RPC tools for implementing stubs
UI templating tools (razor, etc)

I will add to the UI templating tools: UI frameworks (e.g. XAML) where we need to generate bindings/binary representations of the UI tree. Same goes for grammar languages, etc. I will also add IL patching - very narrow, but again, I have had to use it in multiple projects, and IL post-processing hurts compilation times... The list goes on...

As @jaredpar asked, I'm wondering why this list of scenarios is not compelling enough?...

In my experience, I have already had to work on two similar problems: generating custom serializers, and codegen from a scripting language. The fact that we don't have a simple hook in the compiler to generate everything as part of the compilation process is very annoying, because it induces a post-compilation step (with complications when you have to IL-merge back), which causes much longer compilation times. Typically, I remember some people in our team complaining that compilation time was bad for C#, but it was because of our extra passes, which we couldn't internalize as part of the regular Roslyn compilation process - I even had to make a server running in the background so that we could avoid NGEN steps or the JIT slowing down the whole process by several seconds. For that scenario alone, we wouldn't even care about having access to Intellisense for these serializers, as they are completely internal...

The same would go for generating DLLImport interop for CoreRT/LLILC...

XAML alone is quite a big typical scenario, given the compilation steps currently involved in its compilation pipeline (msbuild + VS experience)...

So, obviously, there are quite a few big scenarios that we have to work around today, and they are significantly hurting our development process (and the overall feeling that people may have about the .NET build process).

I completely understand the complexity of getting something full-featured covering all the cases, but couldn't we work step by step on this? Like:

1) provide a plugin API for running pre/post processing compilation steps - without Intellisense, similar to how analyzers are plugged. Get the feedback from the community. This would be a tech preview, added to the official compiler, but not ready for prime-time/broad usages. Breaking changes possible later.
2) take into account Intellisense more seriously and add support for it

For 1), it could help already many projects that don't require Intellisense. It would help the compiler team to get more feedback.
For 2), it could likely introduce breaking changes to the compilation plugin API, but that would be fine, as long as 1) is internal and something you have to explicitly activate.

Thoughts?

@xoofx

As @jaredpar asked, I'm wondering why this list of scenarios are not enough compelling?...

The key here is enough. The scenarios are compelling, but the work involved in making code generators work as a first-class language + IDE feature is extremely high. It would likely take at minimum 3 developers the majority of a release cycle to complete. There are a lot of other features you could do with that manpower.

Also we have to weigh existing solutions here. How much better could these scenarios get if we invested a fraction of the time in the experience around single file generators / T4 templates?

couldn't we work step by step on this?

I don't think the first step is going to give us a lot of actionable feedback. Definitely we'd like feedback in that area as it's new and has design holes. But it also represents the smallest amount of work. The second step, Intellisense, is basically where all of the crazy comes into play. Very likely the problems associated with making that function would force us to redesign the compiler layer.

The key here is enough. The scenarios are compelling, but the work involved in making code generators work as a first-class language + IDE feature is extremely high. It would likely take at minimum 3 developers the majority of a release cycle to complete. There are a lot of other features you could do with that manpower.

I understand the challenges of the IDE part. But, _many scenarios don't require IDE/Intellisense experiences_ because everything that is generated is internal/unknown to the developer. At least for the following cases:

  • serializers
  • mappers
  • rpc stubs
  • IL/calli for custom dllimport

The first 3 could generate additional files (in obj) that would be only required for debugging experience (and I don't expect this case to be that complicated to integrate).

The compiler would allow generating code as part of the compilation process (i.e. as part of the Roslyn process - exactly like analyzers); it wouldn't need complicated scenarios like triggering recompilation on every user keystroke, navigating to generated code, etc.

The only scenario that may require a bit more work in Roslyn is the last one, where we would need pluggable access at the IL generator level (and not only at the syntax level, or at least a way at the syntax level to pass through a custom IL generator).

I don't think the first step is going to give us a lot of actionable feedback. Definitely we'd like feedback in that area as it's new and has design holes. But it also represents the smallest amount of work.

Considering that this feature alone would cover many of the existing scenarios, by far the most popular (in terms of impact on end-users), and assuming that it would not require any IDE modifications (except maybe displaying code generator assemblies alongside analyzers in assembly references), it sounds very reasonable that it should be possible to add this without a tremendous amount of manpower (I quote you here _"the smallest amount of work."_ 😉 )

Would you agree with this or are there any side effects of this feature that I'm missing?

Also we have to weigh existing solutions here. How much better could these scenarios get if we invested a fraction of the time in the experience around single file generators / T4 templates?

For the scenarios listed above, single file generators/T4 templates are not the common case. T4 templates work fine today for the few cases where you don't need to inject new code based on your existing code.

@jaredpar could you confirm my question above?

Also, @mgravell , @ReubenBond, from your experience, can you confirm that codegen for the cases above (serializers/mappers/rpc stubs...etc.) usually don't require any navigation/intellisense/dynamic recompilation on the fly and that the generated code doesn't need to be accessed from "manual code"?

@xoofx I would say that for those cases, and I include Refit, then no, it doesn't need IntelliSense support.

That said, in another case, like @AArnott's NerdBank.GitVersioning, it generates a static ThisAssembly class that contains useful members. This is currently done in a pre-compile step but could easily be turned into a generator. The data from the ThisAssembly class should be available to IntelliSense.

(quietly nods in agreement with the sage words of @onovotny)

@xoofx in the Orleans case, codegen never requires any intellisense. It's all for behind-the-scenes support classes.

@xoofx

I understand the challenges of the IDE part. But, many scenarios don't require IDE/Intellisense experiences because everything that is generated is internal/unknown to the developer.

If there is no need for an IDE experience then a single file generator via MSBuild is a very attractive solution. Why not invest a small amount of effort there to make the experience more toolable?

Would you agree with this or are there any side effects of this feature that I'm missing?

There are two experiences that aren't accounted for here though: debugging and ENC. Imagine the generated code has an exception, or as the developer I simply want to step through it.

What will be the experience when I step into that file? Will intellisense be available, syntax highlighting, etc ... Getting that to function is not a trivial task. Not fixing it though will make the experience seem rather broken.

ENC is a whole other bag. What should the experience be when the developer edits the generated code? The compiler can't re-run the generator, otherwise it would destroy the user's edit. It also can't skip the generator, or it wouldn't be able to account for other edits the developer made to the normal code.

Imagine the generated code has an exception, or as the developer I simply want to step through it.

Isn't that a similar issue to Linq.Expressions?

If there is no need for an IDE experience then a single file generator via MSBuild is a very attractive solution. Why not invest a small amount of effort there to make the experience more toolable?

A single file generator means that you need to commit the generated files as part of your repo, which is something that you don't want (the size of generated files and the merge conflicts alone would be super annoying), moreover if you don't want to allow any changes to the code. Also, what is the IDE support story for them elsewhere (VSCode, Rider...)? The single file generator would have to be triggered automatically on every single change... not sure VS is well equipped for this.
There are also things that you absolutely can't do with a single file generator, like replacing an empty method body (for the DllImport scenario typically), or a code generator that needs to generate a nested serializer inside a type in order for it to be able to access private fields (by introducing a "partial" behind the scenes, even if the original class/struct doesn't have it).
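To illustrate the private-field case: a nested serializer is only legal C# if the generator can treat the user's class as partial behind the scenes. Since `Order` below is not declared partial, the second declaration does not compile today - which is exactly the gap (all names hypothetical; this is deliberately a non-compiling sketch):

```csharp
// User code, unchanged - note: NOT declared partial.
public class Order
{
    private int _id; // invisible to any external serializer code
}

// What a compiler-integrated generator would want to emit.
// This does NOT compile today, because Order above is not partial.
public partial class Order
{
    internal static class GeneratedSerializer
    {
        public static void Write(Order value, System.IO.BinaryWriter writer)
            => writer.Write(value._id); // private access: only legal from a nested type
    }
}
```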

So a single file generator is not an option for the scenarios above, at least for me, as I have already used them in the past, precisely for a serializer scenario, and it was an awful experience for developers... what do you think @mgravell @ReubenBond @onovotny ?

There are two experiences that aren't accounted for here though: debugging and ENC. Imagine the generated code has an exception, or as the developer I simply want to step through it.
What will be the experience when I step into that file? Will intellisense be available, syntax highlighting, etc ... Getting that to function is not a trivial task. Not fixing it though will make the experience seem rather broken.

If we want a debugging experience, it would require generating files on disk, so there is definitely some work to do here. I'm also concerned that the generator would need access to the Roslyn syntax tree (with semantic information ready); it would then generate some files, and these files would have to be added to the current assembly compilation unit - would that re-trigger a whole recompilation? (Or is it possible to do this with Roslyn without recompiling everything? I don't know.)

So this is definitely something important, but would this feature alone take several man-years? I fail to see exactly what would make this so difficult to add to Roslyn...

ENC is a whole other bag. What should the experience be when the deveolper edits the generated code? The compiler can't re-run the generator otherwise it would destroy the user edit. Also can't not run the generator or it wouldn't be able to account for other edits the developer made to the normal code.

ENC should not be allowed for them (if there would be any generated files)

I agree with @xoofx's concerns around SingleFileGenerators, namely that they're tied to an IDE and require the artifact to be checked in (for the same reasons).

The tool we have should work from pure notepad + CLI builds. We have this today with a code-generating pre-compile Task injected with targets, but that's hard to maintain and is limited to C# since we hard-code the template.

Being able to step-in to the generated code is definitely important, however I don't see any need to support editing it. In fact, given that it is re-generated "constantly," the editor shouldn't allow any changes to the generated file.

@xoofx

A single file generator means that you need to commit the generated files as part of your repo, which is something that you don't want

Why do you need to commit the file? Again, imagine we made moderate investments in the scenario. For example making the single file generator run as a part of the build.

Also, whether you check in generated files is matter of preference. In Roslyn we check in our generated files because we found that it has a number of benefits: simplifies our development steps (restore, open solution), lets us SourceLink in 100% of our source code, and allows simple stepping / debugging.

There are also things that you absolutely can't do with a single file generator, like replacing an empty method body

Why not? Or rather why do you think the code generators feature would allow this but a single file generator would not?

The design we settled on for code generators did not allow generators to modify developer code. Instead it added a couple of small language features (think partial methods on steroids) that allowed generated code to more cleanly replace those methods.

So this is definitely something important, but would this feature alone take several man-years? I fail to see exactly what would make this so difficult to add to Roslyn...

Where did I say this feature (Debugging + ENC) would take several man years?

ENC should not be allowed for them (if there are any generated files)

This is your opinion and I can guarantee you it's not shared by a significant number of our customers.

@jaredpar

Why do you need to commit the file? Again, imagine we made moderate investments in the scenario. For example making the single file generator run as a part of the build.

Can you explain how you would do this exactly? How could this run as part of a single pass/process within the compilation process (Roslyn)? Again, if you are proposing what we are already doing, by customizing a special MSBuild target, compiling the assembly first, reading metadata from it, generating files from the code, and recompiling the assembly with the generated files (or merging back a separately generated assembly)... we have explained that, in addition to adding significant complexity to our compilation process, the compilation time hurts the whole experience

Why not? Or rather why do you think the code generators feature would allow this but a single file generator would not?

Afaik, a single file generator cannot modify existing code (adding an attribute to a class, for example), unless you have prepared your code to do so (adding partial, even if you don't know whether the generator is going to add that attribute...).

How would you code the P/Invoke generator to replace DllImport with proper dll loading, calli and so on? Unless this is again going to use the slow route of generating the whole DllImport outside of the main compilation process (as .NET Native is doing)... or use some custom internals of Roslyn that we can't use?

Where did I say this feature (Debugging + ENC) would take several man years?

You said above _"take at minimum 3 developers the majority of a release cycle to complete"_ (for the whole thing, not the reduced scenario we are focusing on right now), so I just extrapolated the numbers here to make it more... dramatic. I'm glad that it will be much less 😋

This is your opinion and I can guarantee you it's not shared by a significant number of our customers.

My opinion? In this discussion, we are a couple of customers, promoters of .NET in our companies, MVPs, OSS contributors to some major projects (btw, you know, the kind of projects "your customers" are most likely using, sometimes without telling us or even thanking us), and we have been here to confirm some major use cases, requirements, etc., and we are really glad to help the whole .NET OSS platform by discussing with you in public. It would certainly help if these customers could raise their voices directly here so that we could get the full picture...

But fair enough, this discussion seems to indicate that beyond analyzers, there is currently zero chance of adding pluggable codegen extensions to Roslyn in the short/medium term.

At least, we have tried. 😉

I get that the outcome we want is "codegen features sooner", but what subset and when? If we are going to do this in a multi-step fashion, what is the acceptable minimum feature set?

My reading of it is:
Step 0: Better single file generator experience that is more integrated with the compiler, giving better performance
Step 1: Add more APIs so that NuGet and the rest of the toolchain know about these generators
Step 2: Better APIs so that dynamic compilation of serializers is possible
Step 3: Have an analyzer-like/type-provider-like API that allows generated types to be consumed in the IDE (IntelliSense etc.)
Step 4: Have generated code be emitted to the PDB so debugging works
Step 5: Have ENC work with generated code

Where am I wrong in terms of what everyone wants? What would the ideal ordering of feature delivery be?

@xoofx agreed: we also do not want users to have to commit the code we generate. In order to reduce the chance of this happening, we've taken to emitting the code into the obj directory, whereas it was previously in Properties.

This scenario is too important to give up on. Code generation in C# today is restricted to very few developers with few scenarios because it's so difficult (mostly because tooling/language support is lacking) and there's no blessed path. Look at Android, though, where a large chunk of the most popular libraries make use of Java's Annotation Processing Tool (APT) for codegen. Eg: Butterknife, Dagger, Retrofit, Robolectric, AndroidAnnotations, Parceler, IcePick. Those libraries make the ecosystem vastly better and they make the language more powerful and friendlier for application developers. They extend the scenarios we've discussed in this thread to also include UI binding & customization, application lifecycle, threading, testing, and dependency injection. We don't need all of those things in .NET since we have superior reflection support, but the scenarios are interesting nonetheless.

The outlined steps look good to me. My preference is: 0, 4, 1, 2, with very little desire for 3 & 5.

Step 0:
I understand this as "better APIs for processing and generating code at build time"

Step 1:
I understand this as "code generators can be installed via packages and are exposed to tooling other than the compiler (for diagnostic purposes / transparency / management?)"

Step 2:
Perhaps we could attach our own deserialization & copy constructors to a type, as well as other methods & properties. That should remove the desire to have some way to access private types/fields from generated code (see #11149). Currently we jump through hoops to make things "just work" in cases where the user has a private type or private/readonly fields. Eg, the generated C# code calls into methods which generate IL at runtime so that we can sidestep accessibility rules.
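To illustrate the kind of hoop-jumping described here, the following is a minimal sketch (the `Person` type and its field are hypothetical, not Orleans code) of reading a private field via runtime-emitted IL with `skipVisibility` — the very technique that AOT platforms cannot run:

```csharp
// Sketch: bypassing accessibility rules with System.Reflection.Emit.
// skipVisibility: true is what lets the emitted IL read private state;
// there is no AOT-friendly equivalent, hence the desire for codegen.
using System;
using System.Reflection;
using System.Reflection.Emit;

public class Person
{
    private readonly string name = "Ada";
}

public static class PrivateFieldReader
{
    public static Func<Person, string> Build()
    {
        FieldInfo field = typeof(Person).GetField(
            "name", BindingFlags.Instance | BindingFlags.NonPublic);

        // The DynamicMethod is logically owned by Person, and
        // skipVisibility lets its IL ignore accessibility checks.
        var dm = new DynamicMethod(
            "ReadName", typeof(string), new[] { typeof(Person) },
            typeof(Person), skipVisibility: true);

        ILGenerator il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldfld, field);
        il.Emit(OpCodes.Ret);
        return (Func<Person, string>)dm.CreateDelegate(typeof(Func<Person, string>));
    }

    public static void Main()
    {
        Console.WriteLine(Build()(new Person())); // prints "Ada"
    }
}
```

A compile-time generator with the member-injection capability described above would make this delegate-building machinery unnecessary.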

Step 3:
For our scenarios, this is not necessary. When Orleans was originally released, we did have some user-exposed generated code: we would generate static classes for users to consume based upon their interfaces. The experience was not ideal, and we've since replaced it with non-static classes with generic methods. I'm not discounting the value of this in general; I'm sure it's very useful, just not for our particular scenarios (RPCs & serialization).

Step 4:
This should be step 1 if it's not included in step 0. Without this we would end up instrumenting generated code (printf) so that we have some way of debugging it. Ultimately, we can live without it.

Step 5:
I don't see the need. A developer should not expect ENC edits to persist after the debugging session has ended. It would be more surprising if they did persist. It's not mutable code & conceptually does not need to ever exist on disk. When we perform runtime code generation, we just pass syntax trees directly to the compiler, no textual C# code exists and certainly not on disk.
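For context, passing in-memory trees straight to the compiler looks roughly like this with the public Roslyn APIs (a sketch assuming the Microsoft.CodeAnalysis.CSharp NuGet package; the class and assembly names are illustrative):

```csharp
// Sketch: compiling a syntax tree that never exists as a .cs file.
// Requires the Microsoft.CodeAnalysis.CSharp package.
using System;
using System.IO;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class InMemoryCompile
{
    public static bool EmitToMemory()
    {
        // The source exists only as an in-memory syntax tree.
        SyntaxTree tree = CSharpSyntaxTree.ParseText(
            "public static class Generated { public static int Answer() => 42; }");

        CSharpCompilation compilation = CSharpCompilation.Create(
            assemblyName: "GeneratedAssembly",
            syntaxTrees: new[] { tree },
            references: new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        // Emit straight to a stream: no textual C# and no .dll on disk.
        using var ms = new MemoryStream();
        return compilation.Emit(ms).Success;
    }

    public static void Main() => Console.WriteLine(EmitToMemory());
}
```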

@xoofx

Afaik, a single file generator cannot modify existing code

I covered this in my earlier comment. The generator feature we were designing didn't allow for code modification either. Hence single file generators are just as powerful here as the language feature.

Again, if you are proposing what we are already doing, by customizing a special MSBuild target, compiling the assembly first, reading metadata from it, generating files from the code, and recompiling the assembly with the generated files (or merging back a separately generated assembly)... we have explained that, in addition to adding significant complexity to our compilation process, the compilation time hurts the whole experience

Not what I'm suggesting. I'm trying to dig into why single file generators don't work for the scenarios. Is it the development experience, the way in which they execute in MSBuild, the lack of access to Compilation objects, etc ...

You said above "take at minimum 3 developers the majority of a release cycle to complete" (for the whole thing, not the reduced scenario we are focusing on right now), so I just extrapolated the numbers here to make it more... dramatic.

How is creating unnecessary drama helping to move this conversation forward?

I'm glad that it will be much less

When did I say it would be much less?

It would certainly help if these customers could raise their voice directly here so that we could get the full picture...

The information I provided pretty much sums up their position. The ENC experience should work for the entirety of the C# code that comprises their assemblies. Whenever there is a gap we get pretty direct feedback about it.

I feel as passionately about this issue as the people on this thread (DNX compile modules were awesome 😄). @jaredpar would it make sense to convert a few real projects to what your proposal would be (better single file generators)? That way we could see how much pain there actually is (and there's a lot today), and how it scales across the various IDEs we need to support now (VS, VS Code, VS for Mac, Rider, etc.).

@jaredpar

I covered this in my earlier comment. The generator feature we were designing didn't allow for code modification either. Hence single file generators are just as powerful here as the language feature.

I understand, but it seems that the previous generator feature didn't have scenarios like rewriting DllImport in mind. Compared to serializers, this is less critical, but when we one day have to work with CoreRT/LLILC to make DllImport code generation fast as part of the build process, it will require integrating the generator into the compilation process; otherwise it will slow down the whole compilation.

Something to keep in mind, but let's not take this scenario into account for now.

I'm trying to dig into why single file generators don't work for the scenarios. Is it the development experience, the way in which they execute in MSBuild, the lack of access to Compilation objects, etc ...

Assuming that we are talking about the way IVsSingleFileGenerator is currently working:

1) It is VS specific, design time only, running only from the IDE
2) It runs against a "trigger" file on which you assign a custom tool. It is inadequate when changes can come from any code change in your project (the case for serializers, RPC, etc.), as it is triggered only if you change the trigger file.
3) It implies generating a file on disk and adding the generated file to the source control repo

For serializers/rpc/db mapping, these are laborious constraints. I fail to see how to work around these without integrating the generator as part of the Roslyn compilation process.

How is creating unnecessary drama helping to move this conversation forward?
When did I say it would be much less?

My usage of rhetoric and humor doesn't get through here, so my apologies for the interference.

The ENC experience should work for the entirety of the C# code

If generated code is on disk (like in obj\...), I don't see why it would not work (although, as we said, for generated code we don't want to persist changes, as they don't make sense)... and if generated code is not on disk, the user couldn't see it, so there should be no problem either.

Note that for IVsSingleFileGenerator, in addition to the points above, this interface is also completely lacking any context of compilation objects.

What we are looking for is something very similar to Roslyn analyzers (in terms of distribution, discoverability, message reporting...), but that would run just before analyzers. I actually wrote a hack/proof of concept a few months ago in this branch at compilation-rewriter. This is not how it should be done, but it gives a rough idea of where this generator could be called in the current Roslyn compilation process (dedicated code would extract the pre-processing outside of AnalyzerDriver, remove the inheritance from DiagnosticAnalyzer, try to share a common base class, and provide a default base class CompilationRewriter, etc.)

I totally agree with @xoofx that analyzers provide a compelling basis for code generators. There's already a nice dev-time experience for analyzers that could be at least partially leveraged and developers are starting to understand how they work, what they're capable of, and how to use them. Adapting the analyzer pattern to automatic compile-time application seems like it would be easier for developers to grok than a whole new code generation paradigm.

I just started work on a task that uses Roslyn-based code generation to provide a feature:
https://twitter.com/samharwell/status/883767229896159232

I have no idea how it's going to turn out. Definitely intended to be a learning experience.

How can we help to make progress here?

We can use Roslyn's code analysis APIs to build better experiences for users and improve the ecosystem if we have support from the toolchain. In my case, I want to be able to take a project as it's being built, analyze its syntax, and emit additional syntax before the build continues.

I suspect that mine and Reuben's use cases are virtually identical.

Also emphasis: this kind of support would be a major boost to UWP (and unity / xamarin) users who currently get a third-rate story from any tools that are meta-programming heavy. I've had quite a lot of frustrated conversations with those users (especially of late).

Think of the users! :)

On 15 Dec 2017 9:42 p.m., "Reuben Bond" notifications@github.com wrote:

How can we help to make progress here?

Tagging @KathleenDollard

I also have two scenarios that would benefit from a proper compile time codegen solution.

  1. An ORM-like API that currently uses double compilation (csc, Mono.Cecil, csc) to generate support functions that are invisible to the end user.
  2. A localization API that could benefit from compile time codegen, but currently does not because there isn't a good way to make it work seamlessly (except for WebForms and Razor, where the compilation units can be augmented and the result is quite nice but needs two separate codegen implementations).

In both cases I would like to modify the CompilationUnit before the assembly is emitted, but there is no need for any IDE/IntelliSense support (in fact, I don't want the end users to see any generated/replaced code, I just want them to use normal API calls that have a basic fallback implementation).
IntelliSense, debugging and ENC are not needed at all. For the ORM support functions I actually use #line hidden to hide the generated code from the PDB.
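As a small illustration of that last point, here is a sketch of how a generated member can be hidden from the PDB with `#line hidden` (the `Order` type and `WriteTo` helper are hypothetical):

```csharp
using System;
using System.IO;

public partial class Order
{
    // Hand-written member: gets normal sequence points in the PDB.
    public decimal Total { get; set; }

#line hidden
    // Generated member: #line hidden suppresses sequence points here,
    // so the debugger steps over this body and the PDB hides it.
    public void WriteTo(TextWriter w) => w.Write(Total);
#line default
}

public static class Demo
{
    public static void Main()
    {
        var order = new Order { Total = 42m };
        var sw = new StringWriter();
        order.WriteTo(sw);
        Console.WriteLine(sw.ToString()); // prints "42"
    }
}
```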

I would love to see life in this topic again. I think our modern view would result in a good outcome, specifically, the notion of micro generation/translations _rather than_ uber generations to solve all the problems of app development at one go.

There is also an interesting feature proposed by Anthony D Green: https://github.com/dotnet/vblang/issues/282. This is an interesting finesse around what I have come to believe is a core requirement: the code you are going to emit needs to be checked syntactically while you are building the template. Since this approach doesn't actually use a template, there is innate syntax checking.

Thank you, @KathleenDollard, that gives me hope that we can make progress on this.

My interim solution was to create an MSBuild target which runs before CoreCompile and passes some context from the current build to my code generator assembly which then creates an AdhocWorkspace, calls GetCompilation, walks the tree, and outputs supplementary syntax to a temporary file which is included in the final build. This is then bundled into a NuGet package which inserts the target. That says nothing about ergonomics, just about how it can be done today.

This is cheaper than what we're doing in dotnet/orleans (full double compilation: each assembly is emitted twice), but more expensive than it needs to be (adds 2.5s to build for a small library). This solution might also be fragile.
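For concreteness, the shape of such a pre-CoreCompile target might look roughly like this; the `mycodegen` tool and file names are purely illustrative, not the actual Orleans target:

```xml
<Target Name="RunMyCodeGen" BeforeTargets="CoreCompile"
        Inputs="@(Compile)" Outputs="$(IntermediateOutputPath)MyCodeGen.g.cs">
  <!-- 'mycodegen' is a hypothetical tool that loads the project into an
       AdhocWorkspace, walks the syntax trees, and writes extra source. -->
  <Exec Command="dotnet mycodegen --project &quot;$(MSBuildProjectFullPath)&quot; --output &quot;$(IntermediateOutputPath)MyCodeGen.g.cs&quot;" />
  <ItemGroup>
    <!-- Include the generated file in the compilation that follows. -->
    <Compile Include="$(IntermediateOutputPath)MyCodeGen.g.cs" />
  </ItemGroup>
</Target>
```

The Inputs/Outputs pair gives MSBuild incremental-build behavior, so the generator is skipped when no source has changed.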

I believe that the approach from dotnet/vblang#282 wouldn't be suitable for our needs in Orleans since it seems to only apply to properties. It looks very AOP to me (which is fine). In our case, we generate RPC proxies & stubs, serializers, and assembly-level attributes which help us to quickly locate interesting types in an assembly.

Perhaps we could start with a general purpose code generation feature and introduce friendlier but more restricted features (like dotnet/vblang#282) later?

I would love to see life in this topic again. I think our modern view would result in a good outcome, specifically, the notion of micro generation/translations rather than uber generations to solve all the problems of app development at one go.

@KathleenDollard the posts in this thread should give a pretty condensed view of years of usage/experience feedback from many major libraries or internal company use cases. We have been through many options in the past already (IL postprocessing + IL merging, IVsSingleFileGenerator generating files, Roslyn pre-analysis+generate files+actual compilation...etc.) and the bottom line is that without an integrated story into the compiler, we are making the compilation story horrible for our users.

The example of @ReubenBond above, which adds 2.5s, is very similar to the problem I had to fight with IL postprocessing, where I ended up creating a server running on the user's machine to make sure we didn't have to reload all the processing assemblies... still, the experience was terrible compared to regular .NET projects (not counting the months, if not years, of trouble when Cecil was not correctly writing debugging information for async code, making the debugging experience impossible after an IL postprocess)

In the discussions above, Edit & Continue seemed to be one of the blocking points, but we said that we don't want Edit & Continue for code that is generated by a tool (btw, I would love to hear usage metrics for Edit & Continue, because I have never used it, nor has anyone in my teams). I believe that generating these files in the intermediate obj/ directory would still allow a good debugging experience (in case we want to debug generated code). I would go further and say that ENC should be disabled if a Roslyn codegen plugin is present in the project. Users would know about it ("Can't ENC, because XXX requires a Roslyn codegen plugin") and would understand the benefits. If they don't want this, they would choose a different RPC/UI/whatever library (and would realize what they are losing)

I have also given an example above of where and how this codegen plugin story could be developed in the Roslyn toolchain by leveraging the existing analyzer infrastructure...

So I concur with you, I would love to see life in this topic again! 😉

FYI, at Unity we have been developing an incremental compiler using Roslyn. In the coming months/years, we are going to move to a model where many parts will be code generated (serializers, etc.), and this code generation will be part of our compiler. We are not forking Roslyn but reusing its different bits to allow this. Still, I would really prefer this kind of work to be integrated/standardized into the Roslyn compiler (as was done for diagnostic analyzers), as it would allow efficient codegen scenarios for a broader audience.

So I have been in a lengthy discussion on a related issue https://github.com/dotnet/csharplang/issues/107#issuecomment-398663257

TL;DR: I would like to know if the folks on the Roslyn/Microsoft teams (@jaredpar @agocke @CyrusNajmabadi @KathleenDollard @jcouv and all the people I have missed, sorry) plus the people asking for this feature who have been developing similar scenarios (@mgravell @ReubenBond @xoofx; add your name if you have developed something related to heavy IL patching/codegen scenarios) would be interested in having a meeting together to try to sort out our requirements and constraints, and whether we can proceed further?

I've only been lurking on the relevant GitHub issues, but I'd be interested in participating. We have a lot of generated code using a variety of methods (T4 templates, custom tools using Roslyn, custom tools not using Roslyn, and Fody) for both private and public codegen.

@SimonCropp and the rest of the @Fody team would probably have some valuable input too.

Another use case from me: my PCLMock library uses codegen to take the busy work out of creating mocks. It uses Roslyn to achieve this, but .NET Core/SDK-style csproj came along and broke some things.

Writing the code to perform the codegen was one thing. The bigger challenge for me was deciding how to make that tooling available to developers (T4 template, console app, scripty...?). The concept of codegen should just be a formal part of the ecosystem and build tooling. As an added bonus, if it was formal, I suspect that the kinds of regressions I've struggled with would be less likely.

As it stands, PCLMock has remained unable to move to .NET Standard for many months. 😢

@xoofx

I would like to know if the folks in the Roslyn/Microsoft teams ... would be interested to have a meeting together to try to sort out our requirements, the constraints and if we can proceed further?

I'm always up for a discussion. At the same time, though, most of what I'm interested in is digging into how to make a sensible IDE story around generators. The compiler side of code generators is fairly straightforward in pretty much every design we've discussed. The IDE is the challenging part and what ultimately caused us not to do this feature.

The constraints we came up with for code generators and IDE were roughly the following:

  • The IDE must be able to understand, as changes happen to the code, whether or not generators are out of date.
  • The IDE must have correct semantic information without running generators on every single key stroke.
  • Code generators cannot directly modify code the developer has authored.

(I feel this is a more relevant issue for discussing this request than https://github.com/dotnet/csharplang/issues/107)

@KathleenDollard, you mentioned out-of-proc source generation. This could satisfy most of my scenarios.

It could let us consume & emit syntax trees which are included in the compilation, and it would be re-invoked whenever the source changes. I assume there would be some simple notifications we could subscribe to so we're not running on each key press or file modification. The tooling would have to ensure all generators are satisfied before the final compilation.

Two scenarios which additive source generation doesn't currently suffice for are:

  • Serializers usually need/want to access private fields and potentially add a custom serialization constructor so that we can initialize readonly fields and (more importantly these days) get-only autoprops. Currently we use unverifiable IL to read/write private/readonly type members, but that is an issue for AOT platforms.
  • AOP scenarios need to be able to modify code more generally. I cannot comment on this, since I haven't needed it. I'd be fine with a solution which didn't satisfy AOP's needs immediately.

For my uses:

  • Using Roslyn APIs to analyze source and to construct syntax trees is fine.
  • Users do not need to be able to see generated code in intellisense or otherwise access it in their own code. We would have hand-written interfaces for scenarios which might tempt this and generate against the interfaces.
  • Users should be able to inspect generated code for diagnostic purposes. Eg, today we write it to obj dir.
  • Users should be able to step through generated code with the debugger.
  • Generated code should be able to add new members (eg serialization ctor) and access private members of another type specified in a compilation, but removing or modifying members is not a requirement. We have partial today, but I believe requiring users to add partial to everything because of codegen implementation details is an undesirable end-user experience.
  • Code generators should be easily shippable as NuGet packages.
  • Code generation should not significantly slow down the developer's inner loop (eg, by significantly slowing the build process)

@jaredpar wrote:

The constraints we came up with for code generators and IDE were roughly the following:

  • The IDE must be able to understand, as changes happen to the code, whether or not generators are out of date.
  • The IDE must have correct semantic information without running generators on every single key stroke.
  • Code generators cannot directly modify code the developer has authored.

I believe these wants align with those constraints. The scary/difficult one is regarding adding new members and accessing privates - that probably needs the most design.

@ReubenBond all your use cases can be done with a post-compilation step without IDE integration; the plugin would decide to dump to the obj folder the relevant files it wants to be debuggable. The access to private fields, the fact that it can change an existing class to access private fields, the fact that you don't want to have to explicitly tag classes partial: I agree with all your points. That's what we have with IL patching today (except the debugging experience, unless you go through the more complicated route of IL analyzing + generating C# + compiling it + ILMerging it back), and post-compilation can handle all of this in a breeze.

Concerning @jaredpar requirements:

The IDE must be able to understand, as changes happen to the code, whether or not generators are out of date.
The IDE must have correct semantic information without running generators on every single key stroke.

These requirements hurt most post-compilation scenarios (and de facto exclude them), and I don't understand why you would want to generate serializers/ORM mappers/RPC code on every keystroke, or whenever a single change in the code base could affect a codegen-aware trigger. This is a waste of IDE time; this codegen should only happen when an actual compilation to disk is happening. We want a treatment similar to async/await here: the user doesn't have to suffer from the internal machinery at IDE time. The difference with async/await is that we could output debuggable code to the obj folder, just in case you still want to debug the internals.

Code generators cannot directly modify code the developer has authored.

Post-compilation would output files to obj folders at compilation time, not at IDE typing time. Outputting files would not even be mandatory, as for some AOP scenarios it might be irrelevant (pre/post code changing the body of a method). The IDE doesn't need to know anything about them, as it would access them automatically when debugging via the PDB information generated at compilation time. The files are "readonly" by essence there (post-compilation), as they are not part of your project.

At the same time though most of what I'm interested in digging into how to make a sensible IDE story around generators.

This is where the requirement for a strong IDE story around generators puzzles me. Post-compilation allows a good balance: zero IDE integration work while fitting a large chunk of the use cases we have with codegen scenarios today (including debugging). Making strong IDE integration a requirement (almost a pre-condition to even discuss it in a meeting), while it mismatches our use cases on several points, prevents any progress on the subject, and the stale state of this issue, one year later, sadly attests to this blocking and to our mutual misunderstanding.

@jaredpar, as @xoofx says, please re-consider post-compilation.

I think what a lot of people here want is to be able to optimize existing calls (probably with changes directly at the call site, hoisting stuff, etc.). All of that code already works without any post-compilation, by doing things at run time (e.g. serializers, ORMs). I like to think of it as comparing Debug and Release builds: you don't expect to be able to step through every C# statement when debugging an optimized release build. I think most of the optimized/generated code would be [DebuggerHidden] / [DebuggerNonUserCode] / [DebuggerStepThrough].

The major pain point here is that #line pragmas must be applied at the call site, to line up the parameters when the call target gets swapped out, or in case it changes (and it would have to in some cases, e.g. inlining, or converting anonymous type members into parameters in dapper calls).

So maybe we actually want an "optimizers" instead of "generators" feature, which would:

  • work on method calls decorated with an attribute
  • allow changing call sites to those methods, allow the methods to be expanded arbitrarily, and allow introducing new members to classes
  • require a run-time implementation, so they can work without post-compilation steps
  • don't need any IDE support

As you can see, this looks very similar to what generators do, but without any IDE support. That's why I think everybody wanting to get the "optimizers" feature was so excited about "generators", but are now bummed that the IDE support requirement is blocking everything.

I think what a lot of people here want is to be able to optimize existing calls

Optimizing calls almost always ends up violating the semantics of the original code. Asking for this is essentially asking for generators to change the meaning of the code the developers typed. That will lead to less predictability and many, many subtle issues. I'm highly skeptical of generators that attempt to do this.

As you can see, this looks very similar to what generators do, but without any IDE support.

Even this design requires IDE support. It must be possible to debug the code after the generators have run: F5, step into, ENC, etc ...

Optimizing calls almost always ends up violating the semantics of the original code. Asking for this is essentially asking for generators to change the meaning of the code the developers typed. That will lead to less predictability and many, many subtle issues. I'm highly skeptical of generators that attempt to do this.

We are not asking Roslyn to do this. But this is something that is sometimes used in the products we deliver to our customers, and even so, I have never been in a situation where the original code's semantics were violated.

Think for example of providing a rewriter for LINQ queries to expand them into foreach loops instead. That's a super useful optimization.
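To make the LINQ-to-foreach example concrete, here is an illustrative before/after of what such a rewriter could produce (hand-written for illustration, not the output of an actual plugin):

```csharp
using System;
using System.Linq;

public static class LinqRewriteExample
{
    // What the developer writes: allocates iterators and a delegate.
    public static int SumOfEvenSquares(int[] xs) =>
        xs.Where(x => x % 2 == 0).Select(x => x * x).Sum();

    // What an optimizer plugin could rewrite it into: a plain loop
    // with the same observable result and no allocations.
    public static int SumOfEvenSquaresRewritten(int[] xs)
    {
        int sum = 0;
        foreach (int x in xs)
            if (x % 2 == 0)
                sum += x * x;
        return sum;
    }

    public static void Main()
    {
        var xs = new[] { 1, 2, 3, 4 };
        Console.WriteLine(LinqRewriteExample.SumOfEvenSquares(xs));          // prints 20
        Console.WriteLine(LinqRewriteExample.SumOfEvenSquaresRewritten(xs)); // prints 20
    }
}
```

The observable behavior is identical for pure predicates and selectors; the subtle-semantics concern raised below applies when the lambdas have side effects or the rewrite changes evaluation order.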

Sure, someone with a compiler plugin could do something wrong, but many will do it right and on purpose. And nobody is forcing you to use a plugin that is going wild or is wrong.

Even this design requires IDE support. It must be possible to debug the code after the generators have run: F5, step into, ENC, etc ...

As I explained, post-compilation does support debugging, and you don't need IDE support (because the PDB already provides the infrastructure to jump around files): just output the files to the obj/ folders at compilation time.

But for AOP scenarios, you usually don't want to debug the modified call site. Look at the PostSharp debugging experience: it doesn't give you the opportunity to step into the modified call site (and sure, they could violate the code they change, but I'm sure they are careful to design their product not to), yet you are still able to step into the callbacks.

But even for AOP and Roslyn post-compilation, with an opt-in configuration, we could still dump the changed files, with the inserted code, to the obj folder; with #line pragmas the user should still be able to debug his code, step into the generated code, and step back to the unmodified original code. (Usually, though, you don't want to debug a LINQ-to-foreach rewrite; you assume it has been battle-tested as much as async/await has been.) I'm not sure many people would ask for this, though.
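As a sketch of the #line mechanism being described (file names, paths, and the injected call are all illustrative), a rewritten file dumped to obj/ could map each region back to the user's original source like this:

``` c#
// obj/generated/MyClass.g.cs -- emitted by a hypothetical post-compilation rewriter
#line 1 "C:\src\MyApp\MyClass.cs"   // map back to the user's file
public partial class MyClass
{
    public void DoWork()
    {
#line hidden                        // injected code: the debugger steps over it
        Log.Enter(nameof(DoWork));
#line 7 "C:\src\MyApp\MyClass.cs"   // resume mapping to the original body
        // ... original method body ...
    }
}
```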

For the ENC case, I disagree: post-compilation does not have to comply with this, and it should be fine to disallow ENC when a post-compilation plugin is present in the project, because users of such a plugin value the features the library brings far above an ENC requirement. I have never seen a complaint about this with our product, even while we were doing pretty heavy IL patching, and I don't think PostSharp users complain either. ENC doesn't make sense here if we rewrite the body of a method, or change a class to be partial and add some methods to it, etc.

Again, we have been working with customers for years providing this kind of workflow, and it has worked fine (except that the compilation time is just horrible, and IL patching, unlike pre-generating to .cs and ILMerging back, did not provide debugging; that can actually be done with Roslyn post-compilation plugins).

I'm not saying I don't want pre-compilation (generated code that is accessible to user code); that one will of course require deep IDE integration. But pre-compilation is a lot more problematic, and it doesn't cover the larger set of scenarios served by the post-compilation option.

So we are looking for something where we can have, first, post-compilation with debugging in Roslyn (which requires zero IDE integration and is very easy to add), and pre-compilation later, once all the teams involved have been able to find a proper solution to that problem.

And I would still like to have that meeting, because I feel we need to talk in person to clear things up more fluently 😉

It must be possible to debug the code after the generators have run: F5, step into, ENC, etc ...

@jaredpar that's why I'm saying let's call the feature "optimizers". People are not expecting this of optimized code:

[Screenshot: "Step Into" in a Debug build]

[Screenshot: "Step Into" in a Release (optimized) build]

@xoofx

Think for example of providing a rewriter for Linq to expand them to use foreach instead. That's a super useful optimization.

That's a prime example though: it's an optimization that changes the underlying semantics of the code. I don't think C# should be creating features that allow the stated semantics of the language to be violated. If that is the goal then why have generators? Wouldn't the language be better off just saying "we can rewrite LINQ to be faster if we please"?
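To illustrate the hazard with a concrete (and deliberately simple) case: LINQ operators are lazily evaluated, so even a well-intentioned eager rewrite to foreach observably changes behavior once side effects are involved:

``` c#
var log = new List<string>();

// With real LINQ semantics, this line executes nothing yet;
// the Select runs lazily, element by element, on enumeration.
var query = items.Select(x => { log.Add($"saw {x}"); return x * 2; });

Console.WriteLine(log.Count);   // 0: nothing has run so far

// A rewriter that eagerly expanded the Select into a foreach at the
// declaration site would have already filled the log here, changing
// observable behavior even though the rewrite "worked as designed".
```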

As I explained, post-compilation does support debugging, and you don't need IDE support (the PDB already provides the infrastructure to jump between files): just output the rewritten files to the obj/ folder at compile time.

That can help with general debugging. A lot more work would be needed to address features like ENC.

For the ENC case, I disagree: post-compilation does not have to comply with this, and it should be fine to disallow ENC when a post-compilation plugin is present in the project,

That is your opinion though and it simply doesn't reflect the feedback we get from users. They want ENC to just work.

And I would still like to have that meeting, because I feel we need to talk in person to clear things up more fluently

Agree. In part because I'm better at talking than writing 😄 Seriously though, I've had better success discussing the "violating C# semantics" point in person with people, mostly because there are very subtle issues that pop up during optimizations that pretty much always violate C# semantics. It's easier to detail this, and the C# team's philosophy around it, in a back-and-forth setting.

That is your opinion though and it simply doesn't reflect the feedback we get from users. They want ENC to just work.

You are referring to users in general (as a Roslyn team member, I can understand that), but on our side we are talking about users of apps that require generating additional/modified code at compile time. In this category of person, I haven't seen anyone ask for ENC, or say "I won't use your product if you don't have ENC". So that's not really an opinion I have, but the reality we are living today: we are generating code through a non standardized setup (IL patching+custom build tasks) that is hurting the experience of our customers, but we don't have any choice, because there are no other solutions to this problem today (hence why we are here, trying to get this solution into Roslyn)...

That's a prime example though: it's an optimization that changes the underlying semantics of the code. I don't think C# should be creating features that allow the stated semantics of the language to be violated. If that is the goal then why have generators? Wouldn't the language be better off just saying "we can rewrite LINQ to be faster if we please"?

Sure, but can Roslyn oversee all the potential optimizations that all the C# products out there are looking for? That's the fantastic opportunity of a true compiler plugin story (the post-compiler plugin story being the easiest one): to allow people to extend the compiler (which today we are already extending through IL patching). It would allow people to prototype new ideas more easily (ideas that could well make it into Roslyn one day), or to distribute breakthroughs to their customers directly.

I can feel from the different discussions with Roslyn team members that you are implicitly worried that it would open a Pandora's box, and that the whole integrity of Roslyn is at stake here; that if a plugin started to output something different from a stock Roslyn compiler, it would be a kind of treason, or a huge burden for Roslyn... while I'm convinced it would open up more fantastic opportunities for the community than the occasional dark side of a few rogue compiler plugins 😉

@jaredpar

it's an optimization that changes the underlying semantics of the code.

Doesn't every useful code generator/codegen tool do that too? The whole point is that the user writes one thing, but the code that actually executes is different.

I don't think C# should be creating features that allow the stated semantics of the language to be violated. If that is the goal then why have generators? Wouldn't the language be better off just saying "we can rewrite LINQ to be faster if we please"?

The justification I've heard for why the C# compiler won't do that is that it can't see the implementation of LINQ. But a user who decides to install a "make LINQ faster" optimizer/code generator does know whether it's appropriate for them. So I don't see the issue here.

Doesn't every useful code generator/codegen tool do that too? The whole point is that the user writes one thing, but the code that actually executes is different.

No, most code generators are things like serializers that take in a spec (in code or an external file like .proto) and generate additional code in the project to implement that specification. Since this is all C# code that's being generated and no user-written code is modified, by definition this can't violate any C# semantics.
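A minimal sketch of that additive pattern; the attribute, file name, and generated method here are hypothetical illustrations, not any real tool's API:

``` c#
// User-written code -- never modified by the generator:
[GenerateSerializer]
public partial class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// Person.Serialization.g.cs -- emitted as an additional file:
public partial class Person
{
    public string ToJson() =>
        $"{{\"Name\":\"{Name}\",\"Age\":{Age}}}";
}
```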

Happy to hop on a call and hash things out

No ENC. Emitted code is easy to understand, but writing Emit code is a terrible experience. We need C# scripting and AOP at runtime. I think it is a very important enhancement for .NET Core. I could even make a living with it. :)

@xoofx

I can feel from the different discussions with Roslyn team members that you are implicitly worried that it would open a Pandora's box, and that the whole integrity of Roslyn is at stake here

No. I'm more worried about the integrity of C#. There is a reason that C# has a spec and that the compiler adheres to that spec with the exception of compat issues. The language provides strong guarantees about how the code will be interpreted and executed. Once plugins can arbitrarily rewrite the code then all those guarantees go away. It becomes impossible for the developer, and the compiler team who gets the bug report, to understand exactly what a foreach or select clause are doing. It's no longer defined by the spec but instead by the whims of an optimization engine.

(IL patching+custom build tasks) that is hurting the experience of our customers

I agree that IL patching as a build step is both:

  1. hard to maintain as an implementer
  2. difficult to reason about as a customer

Correct me if I am wrong @xoofx, but I'm going to work under the assumption that a solution to our problem would be able to recreate LinqOptimizer using Roslyn APIs.

So what do we need?

  1. Ability to load compilations (implicitly stating that we only need the context of a csproj file, not the entire solution)
  2. Ability to examine existing code patterns (similar to analyzer infrastructure)
  3. Ability to modify code (similar to code fix infrastructure)

That last one is really tricky. Suppose we have two "optimizers" installed. Which order do they run in? What if they each modify the same section of code? If there are conflicts when the optimizers run, how are they reported to the user?

Today code fixes are user-initiated actions. Most of these ambiguities are resolved by showing the user the set of code changes that are going to be made, letting the user decide that these changes are correct, and showing additional error dialogs if application fails.

A general-purpose optimizer API will need to solve at least these problems to be viable.

  1. order of application for optimizers
  2. error model for failed optimizer application
  3. some strategy for parallelization / merging (so N optimizers don't require N builds)

In addition, you've identified several concerns that the compiler team has about codegen, so I'll spell them out:

Being able to completely change language semantics means a user could install an optimizer and then be unable to reason about their code. All arithmetic operations could be reversed (+ to -, for example) and the user wouldn't know until runtime. This sounds like an odd thing to be concerned about ("who would ever write such a thing?", you may ask), but if this were introduced in the public compiler API there would be no half measures. I can personally attest that with a userbase as large as C#'s, anything that can be done will be done, and C# dialects are not a thing the compiler team wants to introduce. C# as a language has a lot of explicitness in its design, so you can look at a snippet of C# code and know what it is going to do. Losing that is not a tradeoff the compiler team is willing to make.

So where does that leave us? I still think that your scenario is a reasonable one. If you have a library with a nice LINQ-like API, it would be great if you could get the performance your users need without making the API harder to work with. My current thinking is that CodeGeneration.Roslyn is the best place to look for the optimizer case.

CodeGeneration.Roslyn does all three of these:

  1. Ability to load compilations
  2. Ability to examine existing code patterns
  3. Ability to modify code

It has some design limitations that make it unsuitable for solving all of Unity's problems out of the box, but I still think that modifying this solution will be easier because:

  1. It uses an existing extensibility mechanism (msbuild instead of the compiler)
  2. It has an API that is better than IL patching
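For a flavor of what the "examine and modify" pieces look like with Roslyn's public syntax APIs, here is a toy, stand-alone sketch (this is not CodeGeneration.Roslyn's actual API; the transform is deliberately trivial):

``` c#
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// A syntax rewriter that could run as an MSBuild step, outside the
// compiler. It upper-cases every string literal -- a toy transform,
// but it exercises both "examine" and "modify".
class UpperCaseStringRewriter : CSharpSyntaxRewriter
{
    public override SyntaxNode VisitLiteralExpression(LiteralExpressionSyntax node)
    {
        if (node.IsKind(SyntaxKind.StringLiteralExpression))
        {
            var text = node.Token.ValueText.ToUpperInvariant();
            return SyntaxFactory.LiteralExpression(
                SyntaxKind.StringLiteralExpression,
                SyntaxFactory.Literal(text));
        }
        return base.VisitLiteralExpression(node);
    }
}

// Usage sketch:
//   var tree = CSharpSyntaxTree.ParseText(source);
//   var newRoot = new UpperCaseStringRewriter().Visit(tree.GetRoot());
```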

@jaredpar

Once plugins can arbitrarily rewrite the code then all those guarantees go away. It becomes impossible for the developer, and the compiler team who gets the bug report, to understand exactly what a foreach or select clause are doing. It's no longer defined by the spec but instead by the whims of an optimization engine.

It's a bit of an exaggeration to say the whole integrity of C# would be at stake because a plugin could misbehave. We could perfectly well ask users to reproduce reports to Roslyn with only the safe-mode compiler "on" (plugins off). This could be a property to set up in the project.

No. I'm more worried about the integrity of C#. There is a reason that C# has a spec and that the compiler adheres to that spec with the exception of compat issues. The language provides strong guarantees about how the code will be interpreted and executed.

So today, there are thousands of projects using IL patching solutions (either commercial or things like Fody) that are well integrated enough into msbuild and post-compilation tasks that a user can't tell whether they are integrated into Roslyn or not (apart from the build being slightly/significantly slower). So are these solutions, which have been around for years (even before Roslyn was released), in a position to break the whole integrity of C# or put the Roslyn team at high risk?... Have you been through any recurrent trouble (or even a single trouble?) reported back to Roslyn with a fake compiler bug introduced by these solutions?

Have you been through any recurrent trouble (or even a single trouble?) reported back to Roslyn with a fake compiler bug introduced by these solutions?

Yes we have had several compiler bugs where IL weaving caused crashes and it took us a long time to determine that it was not a compiler bug, but an IL weaver bug.

Yes we have had several compiler bugs where IL weaving caused crashes and it took us a long time to determine that it was not a compiler bug, but an IL weaver bug.

So yep, that confirms you get into trouble anyway, even without compiler plugins in Roslyn... so the integrity of C# is not doomed 😉

@xoofx

It's a bit of an exaggeration to say the whole integrity of C# would be at stake because a plugin could misbehave.

Disagree. This is not about misbehaving plugins. It's about correctly functioning plugins. The only reason for rewriting code is to meaningfully change the way in which the code executes.

So today, there are thousands of projects using IL patching solutions (either commercial or things like Fody) that are well integrated enough into msbuild and post-compilation tasks

Agree these exist and that they should exist. But they should not exist as part of the compiler.

Have you been through any recurrent trouble (or even a single trouble?) reported back to Roslyn with a fake compiler bug introduced by these solutions?

This is a frequent problem for the compiler team. This is true for all the different ways in which developers can manipulate compiled IL. They are often the highest hit crash count the compiler team deals with.

@xoofx This actually presents an ongoing problem for the compiler team. We already spend a fair amount of time chasing down bugs in other products, especially IL rewriters and obfuscators. Increasing the probability that people use these mechanisms has a direct negative cost to our productivity.

This is especially bad for enterprise customers with support contracts, as we are often required to fix their bugs, but they often can't even show us the binaries. We've had to send engineers to do actual off-site support requests for these bugs, wasting huge amounts of resources.

This actually presents an ongoing problem for the compiler team. We already spend a fair amount of time chasing down bugs in other products, especially IL rewriters and obfuscators. Increasing the probability that people use these mechanisms has a direct negative cost to our productivity.

This is a frequent problem for the compiler team. This is true for all the different ways in which developers can manipulate compiled IL. They are often the highest hit crash count the compiler team deals with.

So this problem is unavoidable whether plugins exist or not. But at least with a plugin infrastructure right in the compiler, we would be able to track, through assembly attributes, which plugins have been used. That would at least streamline identifying what has modified the code.

But fair enough, I finally get the reasoning behind the resistance to the post-compilation plugin idea. That's unfortunate, because I believe not having this as part of Roslyn actually hurts Roslyn more than having it would, because today's solutions are dirtier and more error prone (working at the IL level, msbuild tasks in the middle, etc.).

I think Jared and I have basically different arguments here, fwiw. I don't like post-compilation modification because, by and large, the results are often buggy and we inevitably have to debug them. Jared's a manager and isn't weighed down by such pedestrian concerns 😉. I believe his position is more that the compiler shouldn't be in the business of code rewriting, because our job is to produce a translation engine from C# to IL, not an arbitrary code generation platform.

Edit: Changed the language; "buggy crap" was too strong and I was being a bit tongue-in-cheek. I also have no idea of the proportion, because by definition we don't get reports from people whose post-compilation rewriters work. However, we do see a lot of these bugs.

@agocke

We already spend a fair amount of time chasing down bugs in other products, especially IL rewriters and obfuscators. Increasing the probability that people use these mechanisms has a direct negative cost to our productivity.

I don't like post-compilation modification because, by and large, the results are buggy crap that we inevitably have to debug

This seems unnecessarily antagonistic to many of the people we want to involve in this conversation, i.e. the people who have experience in IL rewriting.

Also, I am certain that many of the owners of those tools that have bugs (as all software does) have spent a non-trivial amount of time helping users of their tools (who are also MS customers) debug problems that resolve down to bugs in MS software, and sometimes specifically Roslyn. I know this is true for Fody (which I maintain).

From my perspective, this issue gives us an opportunity for a more formalised approach that allows 3rd parties to provide the business value that is best delivered by codegen. Ideally this would result in less, as you say, "buggy crap". One specific example of how this could be achieved: if there were a plugin-based codegen model, an IL verification (like peverify) could run after each codegen plugin. We could also provide testing helpers that run the same verification when someone unit-tests their codegen plugin.

Now you're describing an arbitrary codegen platform. Helping users generate IL, making sure the IL is legal, maybe making sure the IL has some subset of the semantics provided by the initial program -- these are just codegen tools. I don't think inserting this stuff into the compiler is good software design and I don't think the goals of the compiler team are well served by us owning this process.

Got a Roslyn compiler plugin working on my compilation-rewriter branch, using the existing diagnostic analyzer infrastructure... Now it is so tempting to proceed further...

Btw, I didn't mean Fody or any other tool is buggy. There's a huge amount of ildasm/ilasm processing going on that is very buggy, a lot of it from within Microsoft.

@agocke

This actually presents an ongoing problem for the compiler team. We already spend a fair amount of time chasing down bugs in other products, especially IL rewriters and obfuscators. Increasing the probability that people use these mechanisms has a direct negative cost to our productivity.

To me, this sounds like an issue that new code rewriting infrastructure could help with, if it's designed with that goal in mind. For example, it could mandate how rewriting of a piece of code is enabled (e.g. every C# file to be rewritten has to have some marker in it or have a special extension like .csr), how it's presented in VS, how it's logged or where the rewritten source files are located.

@agocke (explaining @jaredpar's position)

the compiler shouldn't be in the job of code rewriting, because our job is to produce a translation engine from C# to IL, not an arbitrary code generation platform

I don't think it matters much from a user's perspective whether this would be part of the Roslyn project/team or a separate project/team. I (and I assume others here, but I can't speak for them) think this is something that's missing from the .NET ecosystem. And since the community didn't manage to create such a tool, I am asking Microsoft to do it. And I think the Roslyn team is the obvious point of contact and discussion about this, even if making this work would ultimately require creating a separate team (for example).

On the other hand, if the definition of the job of the C# compiler was that narrow, it would still be a black-box csc.exe, not a library for analyzing C# source code or a platform for analyzers and code fixes.

On the other hand, if the definition of the job of the C# compiler was that narrow

I don't think the job of the Roslyn platform is narrow, I just don't think it's an arbitrary code generation platform. I'm sorry if my statements came across as insulting @SimonCropp, I meant them the exact opposite way -- Fody and friends are existing code generation platforms that are good. I just don't think they belong in the compiler pipeline. Code rewriting is an important piece, but it should be one taken deliberately and carefully, with full knowledge of what's going on. By allowing arbitrary codegen in the compiler we have removed our ability to know what code we're generating, which will result in buggy programs and very unclear chains of responsibility. Sometimes it will be a plugin's bug. But sometimes it will be a Roslyn bug. Sometimes it will even be bugs from earlier codegen plugins interacting with the output of another codegen plugin. None of this looks like a robust pipeline to me, and it puts the compiler in the job of intermediating between all of the concerns, which is very much not the compiler's responsibility.

By allowing arbitrary codegen in the compiler we have removed our ability to know what code we're generating, which will result in buggy programs and very unclear chains of responsibility. Sometimes it will be a plugin's bug. But sometimes it will be a Roslyn bug. Sometimes it will even be bugs from earlier codegen plugins interacting with the output of another codegen plugin. None of this looks like a robust pipeline to me, and it puts the compiler in the job of intermediating between all of the concerns, which is very much not the compiler's responsibility.

Replace "compile time" with "runtime", and you have exactly described what most programmers spend much of their time on: debugging interactions between various components. Don't get me wrong, this is not ideal, but it is the nature of delivering business value. As long as the business value outweighs the friction, we are ahead in the long run.

I don't see why the compiler should be immune from this equation. It cannot be so black and white, as in "there is no amount of possible value we could deliver to users of Roslyn that would convince us to allow people to perform even the slightest amount of codegen". Based on that logic, I would like to see the discussion focus on "what can we expose that will deliver significant impact while mitigating the possible negative side effects".

By allowing arbitrary codegen in the compiler we have removed our ability to know what code we're generating, which will result in buggy programs and very unclear chains of responsibility

This would seem to already be the case. As you have already asserted, people are using codegen in the wild. The problem, from my perspective, is that it is currently the wild west. There is no guidance, little tooling, no agreed APIs, no standard conventions. I suspect many of the problems caused by codegen are due to this lack.

Ideally I would like to see most of the Fody addins (the ones that add real value, not just me wondering if something is possible) re-targeted against a supported API from MS, with the end goal of me doing a scorched-earth rewrite of the Fody core codebase.

This would seem to already be the case. As you have already asserted, people are using codegen in the wild. The problem, from my perspective, is that it is currently the wild west. There is no guidance, little tooling, no agreed APIs, no standard conventions. I suspect many of the problems caused by codegen are due to this lack.

Here's my concern about this, though. In many of the conversations that have come up on this topic, I've often seen the following back and forth:

Ok. If we were to provide something in this area, we'd need to really ensure the tooling was top notch. We'd have to enforce X, Y and Z. We'd need to make sure that certain experiences had an airtight story (like 'debugging').

To which, the response has been something like:

Oh. You don't need to worry about that. The experience isn't great. But people put up with it.

So, it's very hard for me to gauge what is and isn't actually important, or why it is/isn't important to be part of the actual Roslyn compiler platform.

If the existing deficiencies aren't a problem, then why does it need to be part of Roslyn? If it's going to be part of Roslyn, I think the team's general position is "it can't be the wild west". But that means actually putting in the time and effort to solve those problems, which means actually investing in all those expensive bits.

Oh. You don't need to worry about that. The experience isn't great. But people put up with it.
If the existing deficiencies aren't a problem, then why does it need to be part of Roslyn?

Using quotes for something we never said is a bit troubling... We did say the problem exists, but that we didn't have any choice other than to live with it. That's different; we wouldn't be having this discussion in the first place if we were not already in the wild west looking for something better.

Let me reiterate why this plugin architecture is important to be part of Roslyn:

  • Compilation time would be largely improved
  • All code modifiers/generators would use the same unified infrastructure (no more custom msbuild tasks, no more hazardous custom IL patching tools)
  • Using clean SyntaxTrees instead of whatever dirty IL rewriting
  • As simple as using a DiagnosticAnalyzer today: just install a NuGet package
  • We could have a good debugging experience (in cases where it makes sense)

So, overall a significantly better experience compared to what we have today.

From this discussion, it appears that scenarios that modify existing code will never get approved by the Roslyn team, while generators that add stuff could be. That is better than nothing, but it excludes quite a few chunks of scenarios out there (e.g. AOP). I'm probably going to try to release a lightweight fork of Roslyn (easily upgradeable to any new version of Roslyn) with NuGet packages that allow these kinds of compiler plugins, if it can help federate our wild west a bit.

Thank you for this list of goals. I think this helps clarify a lot.

I would love to see a similar list of the reasons you think new files cannot solve the problem of editing existing code. Part of the work done prior to the feature cut in 7 was to work to increase the scenarios where separate file generation would solve the problem.

@xoofx

Compilation time would be largely improved

I disagree. Build throughput would be improved here, but I think compilation time would be slower. This would be moving code that today executes outside the compiler inside it, hence it won't get better, only worse. But yes, build throughput is likely to improve.

We could have a good debugging experience (in cases where it makes sense)

Most designs can achieve this, I think, but there is significant work involved. I am going to push back on the "when it makes sense" part though. This is essentially "return true" for us: the push back we get from customers when there are gaps in our debugging experience is virtually guaranteed. I'm not basing this on personal whims, but on my real experiences over the years on this team.

So, overall a significantly better experience compared to what we have today.

You've excluded the biggest con of the design though: developers can no longer reason about how the C# code they wrote executes by thinking of C# semantics. It's now up to the decisions of the code generators (even in the cases where the code generators execute exactly as designed). That is likely to be a non-starter for most of the developers involved.

I don't think this is necessary to have a successful code generator story. There are many ways to do non-modifying generators that provide loads of developer productivity and performance enhancements.

My push back on "code is better written like X" is always the same: then write a code fixer. If the code is better expressed as a different C# pattern then encourage developers to just author it that way.

I disagree. Build throughput would be improved here, but I think compilation time would be slower. This would be moving code that today executes outside the compiler inside it, hence it won't get better, only worse. But yes, build throughput is likely to improve.

Yes, I was implicitly talking about compilation time as a whole (build time, in this context). From a user's perspective today, you can't separate the compilation time from the IL patcher time (they are part of the build as a whole), but strictly speaking that's build time, I agree. My early experiments show that build time is a big win with a rewriter integrated into Roslyn.

You've excluded the biggest con of the design though: developers can no longer reason about how the C# code they wrote executes by thinking of C# semantics. It's now up to the decisions of the code generators (even in the cases where the code generators execute exactly as designed). That is likely to be a non-starter for most of the developers involved.

I disagree (but you know that 😉 ). I don't know which "most of the developers involved" you are referring to. From the beginning, I have been talking about users of products that already do codegen/IL patching on their assemblies (AOP users, serializers done by a product like Xenko, etc.). Developers using these products never once came to us saying "Oh, you are modifying the output of Roslyn, it's a huge deal breaker, I won't use your product".

Anyway, I understand the Roslyn team's position. So I'm currently working on a lightweight fork of Roslyn that will provide this compiler plugin infrastructure. People will have to explicitly reference this new compiler in order to use it (through a NuGet package). I believe it will serve as a great playground and provide practical feedback on the subject, without putting Roslyn at risk.

@KathleenDollard

I would love to see a similar list of the reasons you think new files cannot solve the problem of editing existing code. Part of the work done prior to the feature cut in 7 was to work to increase the scenarios where separate file generation would solve the problem.

If I understand your question correctly, you are referring to the approach of partials (classes and methods) as sibling files for codegen?

Some restrictions/scenarios I can think of are:

OK, we're talking past each other on this point; my fault for not being clear about what the feature work cut in 7 was. I don't think anyone will argue that the current partials are the right answer.

I just went back and found a design idea on how to have generated code replace code. Here's the code:

```c#
partial class C
{
    public void M()
    {
        // do stuff
    }

    [INPC] public int X {
        get { ... }
        set { ... }
    }
}

// generated
partial class C
{
    public supersedes void M()
    {
        // do something
        superseded(); // calls original method
        // do more
    }

    public supersedes int X
    {
        get { return superseded; }
        set
        {
            superseded = value;
            RaiseNPC(nameof(X));
        }
    }
}
```

I believe this works for non-private, property changed, logging and Disposable injection. I am not sure what method timing is (if it's perf instrumentation, then yes, it would work).

So, the most important of the ones you listed is the last. The supersedes proposal will not help that case. Thank you, that was what I was interested in.

I'll have to noodle on that and think of similar scenarios, because I would like to see that particular problem permanently solved with analyzers; I remain in the camp that the code itself should be correct and understandable by anyone reading it, with generation being an enhancement programmers can easily find. But this is an opinion.

@jaredpar

My push back on "code is better written like X" is always the same: then write a code fixer. If the code is better expressed as a different C# pattern then encourage developers to just author it that way.

Except there is more than one way in which code can be "better".

For the example of the LINQ-to-foreach optimization, the result is code that performs better but is harder to understand and maintain. Which is why I would prefer to have LINQ in the source code I'm editing and foreach in the code that's executing.

A code fix is a half-measure: I can use a LINQ query when I first write the code, but then I have to convert it to foreach and, from then on, always read and maintain it as foreach. That's better than nothing, but not good enough.
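To make that trade-off concrete, here is a minimal sketch (my own illustration; the method names are made up) of the kind of pair being discussed: the LINQ form is what I want to read and maintain, and the foreach form is what a transformation would emit.

```csharp
using System;
using System.Linq;

// Readable form: what I'd keep in source.
static int SumOfSquaresLinq(int[] values) =>
    values.Where(v => v > 0).Select(v => v * v).Sum();

// Hand-lowered equivalent: no iterator allocations, no delegate invocations.
static int SumOfSquaresForeach(int[] values)
{
    int sum = 0;
    foreach (int v in values)
    {
        if (v > 0)
            sum += v * v;
    }
    return sum;
}

int[] data = { -2, 1, 3, 4 };
Console.WriteLine(SumOfSquaresLinq(data));    // 26
Console.WriteLine(SumOfSquaresForeach(data)); // 26
```

Both produce the same result; the point of a transform would be that only the first form ever appears in source control, while the second is what actually runs.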

FYI, I just released the Conan compiler, a lightweight fork of the .NET Compiler Platform ("Roslyn") that adds a compiler plugin infrastructure (described in this discussion as the "post-compilation" solution).

Since we're mentioning PoCs, I thought I'd mention Cometary, which also adds compiler plugins to Roslyn. Instead of being a fork of the compiler, however, it's a simple analyzer that hooks into the inner workings of Roslyn when it is loaded by it, and then rewrites things to load other plugins in memory.

Citing from the "Roslyn Overview" wiki page (with emphasis added by me):

This is the core mission of Roslyn: opening up the black boxes and allowing tools and end users to share in the wealth of information compilers have about our code. Instead of being opaque source-code-in and object-code-out translators, through Roslyn, compilers become platforms—APIs that you can use for code related tasks in your tools and applications.

The transition to compilers as platforms dramatically lowers the barrier to entry for creating code focused tools and applications. It creates many opportunities for innovation in areas such as meta-programming, code generation and transformation, [...].

Having read this issue, I cannot help thinking that all of this innovation could happen a lot more easily if Roslyn provided suitable extensibility points, rather than forcing everyone to come up with their own workaround solutions (inevitable bugs included).

Roslyn's "core mission" was to open up the compiler and enable new scenarios (and simplify existing ones)... why stop now?

Absolutely. Not counting Cometary, which is more of a hack, I know of two different projects that attempt to bring something close to metaprogramming and/or compiler plugins by forking Roslyn: Conan and StackExchange.Precompilation.

Furthermore, the most popular tool right now for extending the compilation process is Fody, which has the advantage of working on all of .NET, but the disadvantage of only allowing the IL to be modified.

Having read this issue, I cannot help thinking that all of this innovation could happen a lot more easily if Roslyn provided suitable extensibility points, rather than forcing everyone to come up with their own workaround solutions

This issue is about finding the right extensibility points. As has been said several times already: we want to add generators to the language. In fact we spent a considerable amount of time in the Dev15 time frame doing exactly that. At the same time we have to find a solution that meets the expectations of our users.

It's a shame to see this stalled again at the moment.

This issue is about finding the right extensibility points. As has been said several times already: we want to add generators to the language. In fact we spent a considerable amount of time in the Dev15 time frame doing exactly that. At the same time we have to find a solution that meets the expectations of our users.

The issue does seem to be a bit more than that. People are asking for a powerful low level code transformation and generation extensibility point in the compiler. As @stakx said, this could bring uniformity and ease of use to the existing set of post-processing tools, and it does seem to fall within Roslyn's stated goals of being a true CaaS.

The resistance to that request does not actually seem to come from the design challenges on the IDE side, but from some fear that this feature might somehow be too powerful and that people will use it badly. As somebody who would like to be able to write these types of transformations, that feels like it's babying me a bit - any sufficiently powerful tool will inevitably allow me to break things if I use it wrong. I don't understand the distinction between it being okay that this is available through third-party post-processors but not okay when using an official API.

I can actually see one reason why this could be an issue, and I haven't seen it mentioned here so I'll lay it out. In a typical IL rewriter scenario, the person using the tool is expected to be very aware that they're doing so - it's always a conscious and considered decision. However, if rewriters could ship through nuget alongside libraries, we would have a lot more situations where somebody is using a transformation inadvertently. This particularly applies to the LINQ-like API optimisations discussed.

It's a shame to see this stalled again at the moment.

It's not stalled. It's being actively looked at for C# 9.

People are asking for a powerful low level code transformation and generation extensibility point in the compiler.

That specifically is not under consideration. The code generation capabilities we are considering are just that: code generation, not code mutation. There is no design under consideration that would allow generators to alter the code that developers have written.

For developers who want to mutate developer code there are two avenues to consider:

  1. Author a code fixer in Roslyn. If you truly believe the code can be written more efficiently, then write an analyzer/fixer pair that automates the code transformation for the customer.
  2. There are a number of build post-processing pipelines available to rewrite the built binary that you could look into.

I'm very happy to hear that some work in this area is being considered, sorry for making a false assumption because this issue had gone dry. Generation without mutation is still a powerful feature which I'm excited about.

That said, I still believe that it's a shame that mutation is off the table. That binary post-processing route is being used today for some very useful and powerful things, and I would very much like to be able to make that a more standardised and easier process. Roslyn, being a tool that's designed for analysing and modifying C# compilations, provides a beautiful set of APIs with which to do this - except that it doesn't.

All I want is to have a low level hook directly before Emit, where I'm given the compilation, do whatever I want to it, and give it back. It's fragile if used poorly, but a much improved experience vs using an IL rewriter. It's not meant to be a common feature, it's just about having the open CaaS actually be open to those of us who want to do crazy things. We're already given the power to customise the build pipeline with MSBuild, why not just have one earlier hook point?

Do you see mutation as off the table forever, or just off the table for C# 9 and you might come back to it?

All I want is to have a low level hook directly before Emit, where I'm given the compilation, do whatever I want to it, and give it back

Understood and that's exactly what we don't want to provide. Providing such a hook means that developers can no longer have confidence the code they are writing is what is being executed. Any hooked in generator could be rewriting it under the hood to have subtly, or largely, different semantics than what they typed out.

One of the reasons C# goes to great lengths to specify the behavior of our features is so that developers can reason about what the code is doing. Once silent mutation is in play, that's all out the window. The generators define the semantics of the language, not the compiler.

Do you see mutation as off the table forever, or just off the table for C# 9 and you might come back to it?

Forever for the reasons stated above.

As always I give my usual push back on this. If the mutation is so important why can't it be rewritten as a code fixer? Rather than silently rewriting code for users provide a fixer that just changes C# to be preferred pattern?

If the mutation is so important why can't it be rewritten as a code fixer?

Because typically this is about writing simple code and having complex code come out and you don't wanna maintain that complex code.

Because typically this is about writing simple code and having complex code come out and you don't wanna maintain that complex code.

Yes, but how often does the transform preserve the original semantics? In my experience, very rarely.

If it preserves the original semantics and is "better", then why not just fix it in the compiler? Why force a complex post-processing step?

For some scenarios (AOP) it's desirable to change semantics; for others, like LINQ optimizers, I'd argue most people don't care and consider it an acceptable trade-off to change semantics slightly as long as the program keeps the same observable behavior.

Now, I get why Roslyn insists on staying true to the language spec, but people who want to shoot themselves in the foot will achieve this with or without the help of Roslyn, and will or will not report the issue first on Roslyn, where someone will need to figure out whether a post-processing step or a profiler is to blame. I guess this is just about drawing the line: this is not a scenario you want to support, but you can do it if you insist.

Understood and that's exactly what we don't want to provide. Providing such a hook means that developers can no longer have confidence the code they are writing is what is being executed.

What I fundamentally don't understand is how that's a change from the current state of affairs. People are already doing mutation with IL rewriting. We're not asking Roslyn to encourage us to do it everywhere, just to provide better tooling for those of us who do. It feels wrong that this open CaaS is actually being very restrictive in allowing us to customise those things. MSBuild is already official tooling which supports it, it just supports it poorly. You're not preventing us from doing it, you're just making it harder, uglier, slower, and more bug prone.

As always I give my usual push back on this. If the mutation is so important why can't it be rewritten as a code fixer? Rather than silently rewriting code for users provide a fixer that just changes C# to be preferred pattern?

Yes, but how often does the transform preserve the original semantics? In my experience, very rarely.

If it preserves the original semantics and is "better", then why not just fix it in the compiler? Why force a complex post-processing step?

Good code and fast code are not always the same thing. As a library author targeting high performance environments, I often find myself torn between performance and a nice API.

For example, sometimes it's much nicer to have a fluid builder than a messy constructor, but that introduces additional memory copies and potentially boxing. That's basically the LINQ scenario. By spec, my builder chain is meant to be exactly equivalent to a fiddly struct I could put together manually. I'd love to be able to tell Roslyn about that equivalence so that I can write nice maintainable typesafe code and have high performance code come out.

I don't want a code fix, because I don't want the ugly code in my project.

Semantically, this is a bit like introducing my own JITer rules. It can't just be about fixing the compiler, because the compiler doesn't know about the semantics of my library.
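To sketch the builder-vs-struct equivalence being described (a hedged illustration using a value tuple standing in for a hypothetical options struct; the `Retries`/`Timeout` names are invented, and `with` on tuples needs C# 10 or later): each step of the readable chain copies the whole value, while the direct construction it is semantically equivalent to initializes once.

```csharp
using System;

// "Fluent" form: readable, but each `with` step copies the entire value,
// just as each call in a fluent builder chain returns a fresh copy.
var empty = (Retries: 0, Timeout: TimeSpan.Zero);
var fluent = (empty with { Retries = 3 }) with { Timeout = TimeSpan.FromSeconds(5) };

// Direct form: what a transformation could lower the chain to,
// with a single initialization and no intermediate copies.
var direct = (Retries: 3, Timeout: TimeSpan.FromSeconds(5));

Console.WriteLine(fluent == direct); // True
```

By spec the two forms yield equal values; the transformation being asked for would let the readable form stay in source while the direct form is what gets compiled.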

Good code and fast code are not always the same thing. As a library author targeting high performance environments, I often find myself torn between performance and a nice API.

+1 . This is very painful to me.

I desperately need this AOP feature, and I've been waiting for it for three years. I currently use Roslyn instead of Emit and expression trees, which is very high performance and easy to code. But it can't do what AOP does.

For some scenarios (AOP) it's desirable to change semantics,

This is a primary reason why supporting mutators in code generators is an explicit non-goal. As stated before C# goes to great lengths to specify the semantics of the language constructs we offer. Including generators in the pipeline that specifically aim to change those semantics defeats the purpose of having such a precisely specified language.

People are already doing mutation with IL rewriting.

Correct and they can continue to do so. It's appropriate for non-C# compiler tools to offer non-C# semantics. Changing the C# compiler to explicitly allow non-C# semantics for C# code defeats the purpose of having a specified language.

I understand the point of view taken by several of the responders here. However, we've listened to the arguments and considered them, and it has not changed our stance.

Correct and they can continue to do so. It's appropriate for non-C# compiler tools to offer non-C# semantics. Changing the C# compiler to explicitly allow non-C# semantics for C# code defeats the purpose of having a specified language.

That is a completely philosophical standpoint. Eventually I would get "non-C# semantics" either way.
Basically: as long as you live under my roof (csc.exe), you obey my rules.

But fair enough; if the decision has already been made, there's no point in discussing it.

Correct and they can continue to do so. It's appropriate for non-C# compiler tools to offer non-C# semantics.

The way I view it, these plugins would be non-C# compiler tools. This distinction seems quite arbitrary. I still don't understand what the meaningful difference is between letting us plug things into MSBuild, and letting us plug them in slightly earlier.

Semantically, I don't want this to become a C# compiler feature, I just want the non-C# compiler tools to be better.

I recognise that you're firm in your position on this, but I think that's a shame. It's complicating and holding back the quality of some of my projects. I think it should be like unsafe - let me do it, just put it behind a big warning that I'm stepping into danger town.

Just to throw out a completely separate example, I've been thinking for a while that it might be fun to try to fully re-implement a version of Code Contracts using Roslyn. The design time static analysis is probably doable with analyzers, but the lack of mutation means the runtime functionality is currently out of reach. It's very frustrating to have access to an awesome powerful tool like Roslyn and have this be so close to within reach, but impossible.

Could the situation be improved if the IDE could give an indication of the change that would happen? That would mean that "silent mutation" is no longer in play.

Imagine that transformations are written as normal code fixes, but with some tag that indicates that they should auto apply on build. The IDE could use the normal tooling to display diagnostics alongside them, highlight the code that will be changed, and allow you to preview the transformation in a tooltip. #pragma transform enable|disable could exist to control them directly inline. Pragmas may also be a good way to resolve ordering conflicts, which would be build errors until resolved. A tool could allow you to see a list of all of the transformations in the project to preview them all.

As a variation on this, #pragma codefix #CS12345 applyonbuild could even be allowed for arbitrary code fixes.

This would put developers fully in control of any transformations that happen, and would make it much harder for weird bugs to slip in. It might even make it relatively simple to apply intellisense and analyzers on top of transformed code, as the IDE would know about it and be able to display it... I'm not sure on that point.

This wouldn't help things at all for people developing in notepad, but would eliminate the "silent mutation" concern in the common case of an IDE.

Because typically this is about writing simple code and having complex code come out and you don't wanna maintain that complex code.

Yes, but how often does the transform preserve the original semantics? In my experience, very rarely.

If it preserves the original semantics and is "better", then why not just fix it in the compiler? Why force a complex post-processing step?

Because it isn't "better" for everyone. It might be better for me + my team/project.
One simple example that always pops back up is logging, where simplicity during development is at odds with runtime effects.
I think I can present the "case" for it by describing a source-level rewrite we are actually doing:
We have a very low latency internal logging library (which we might even open source in the future if we can dedicate resources to supporting it).
Fun fact: it's based on a tracing library that comes with a clang compile-time plugin that takes an AST and mutates it.

What this sort of approach allows is for developers to write simple code like this:

```c#
_log.Debug($"Wow {such} nice logging: {very} success");
```

But it relies on being able to rewrite these calls during a pre-compilation process where we scan the code,
turn the format string "Wow {such} nice logging: {very} success" into a static entry in a big array of strings, and instead call:

```c#
_log.DebugInternal(123, such, very);
```

Which, unlike how the code was originally written, is orders of magnitude more efficient, because it pushes those values into a shared-memory-based queue with a 20ns average latency for 2-3 parameters (I forget the exact numbers) and has another process (which we launch) read from that shared memory to render the string and push a structured log.

We do this because interpolated strings are such a nice hook to take advantage of: they add so much rich context while keeping the code very clean and succinct. It would be impossible to do the same thing from a pure binary IL mutator, given that Roslyn has already lowered the interpolated strings by the time we see them in the binary.

This was a relatively low effort source-processor to develop, but I can't even imagine pulling this off in binary.

We are able to generate huge and rich tracing logs this way from C# backends while keeping developers largely oblivious to its mechanics and writing very clean and readable code.
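For readers who want the shape of this rewrite spelled out, here is a self-contained simulation of the before/after described above. The `Intern`, `DebugInternal`, and in-process `sink` names are stand-ins of my own; the real system interns at build time and writes to a shared-memory queue drained by a separate process.

```csharp
using System;
using System.Collections.Generic;

// Before the rewrite, the developer writes:
//     _log.Debug($"Wow {such} nice logging: {very} success");
// After the rewrite, the format string lives in a static table and
// only the hole values cross the hot path.

var formatTable = new List<string>();
var sink = new List<(int FormatId, object[] Args)>();

// Simulates the build-time step that assigns each format string an id.
int Intern(string format)
{
    int id = formatTable.Count;
    formatTable.Add(format);
    return id;
}

// Hot path: enqueue the id and raw values, no formatting whatsoever.
void DebugInternal(int formatId, params object[] args) =>
    sink.Add((formatId, args));

// What the generated call site amounts to:
int fmtId = Intern("Wow {0} nice logging: {1} success");
string such = "much", very = "so";
DebugInternal(fmtId, such, very);

// A separate consumer renders the string off the hot path:
var (id, args) = sink[0];
Console.WriteLine(string.Format(formatTable[id], args));
// Wow much nice logging: so success
```

The hot path never touches `string.Format`; only the consumer side, which in the real system is another process entirely, pays for rendering.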

I would never in my life think about even proposing any of this as an extension to the compiler.
I'd like to think that I would be put up against the wall and shot, and rightfully so.
Nor would I desire a code-fix for this.

I can and do have a way out: a code mutator that does not change the semantics. But for that, code mutation is necessary.

I get that you don't want to support this. But there are valid use cases for this.

@damageboy, how would you feel about doing this with the very rough auto apply on build code fixes idea I suggested above? I'm growing to like that idea, but nobody has reacted to it and I'm slightly scared that it's dumb for an obvious reason that I missed. Would you be happy with that as a workflow in your case?

I would love anything that is standard and works.

If I can turn on and control these transformations at the project level, I would be more than happy.

I have something that works now, but just as you would expect from a 2-day hack, it is not standard, not chainable, and needs resources to maintain.

What we seem to be lacking is acceptance from the people steering Roslyn that there is some validity in source code mutation.

I wanted to add the same code mutation to ZeroLog, either automatically through a Fody addin or manually through an analyzer (which is a viable but arguably less nice option in the ZeroLog case), but never got around to implementing it. I probably should do that someday.

If I had an easy way to apply a code mutation automatically at compile time, I would have used it ages ago, and ZeroLog would only need a standard logging API instead of all that .Append(...) stuff.

@damageboy I'm wondering: how do you apply your transformation? Do you use a compiler fork (such as Conan or maybe one you wrote yourself), or is it a custom MSBuild target that runs before the compiler? Just curious.

We have a very hacky solution, which is why we don't want to publish it.

I based it off of https://github.com/roji/AsyncRewriter/tree/master/src/AsyncRewriter

I didn't know about Conan, very nice...

More interesting projects to keep an eye on:

CodeGeneration.Roslyn
Uno.SourceGeneration

I'm surprised no one has linked to the Source Generators doc here yet. I wouldn't have found out about it if not for Twitter.

Source Generators look like they will fit Orleans' needs, so please let us know if we can provide early feedback.

We are using Roslyn for consumption (analysis) and production of source now (trees or text, depending on whether runtime or build time codegen is being used) - which was not true when this issue was opened. The MSBuild/AdHocWorkspace integration code is something I would happily delete and try to forget.

For reference, Orleans' code generator library: https://github.com/dotnet/orleans/tree/master/src/Orleans.CodeGenerator
And the MSBuild integration: https://github.com/dotnet/orleans/tree/master/src/Orleans.CodeGenerator.MSBuild
