Roslyn-analyzers: Add analyzer to warn users when they reference a Workspace API from an Analyzer

Created on 17 Feb 2018  路  8Comments  路  Source: dotnet/roslyn-analyzers

Most helpful comment

  1. Split the templates to have 2 separate projects
  2. Work in the NuGet area to ensure the packaging happens correctly

This work isn't hard and is something I would be happy to do. @mavasani I get that image loads are a real perf problem. I want to do some analysis to figure out what the perf impact is here so we can accurately articulate it to users of the roslyn API. I think analyzers shipped by Microsoft should always be in a single assembly, but I really feel that the templates as they exist today are not leading people into the pit-of-success鈩笍

Ideally we should:

  1. Document this problem and make sure the documentation is discover-able
  2. Write an analyzer to tell the developer when they hit this
  3. Author the templates so that this issue is difficult to encounter in the first place

However since there is no design discussion to be had for 1 and 2 lets just do them.

To accomplish 3 there are all sorts of options other than packaging two assemblies. We could, for example, use the .NET linker in the templates so only one assembly is ever loaded even if there are separate projects.

All 8 comments

I would heavily prefer refactoring dotnet/roslyn-sdk templates to place analyzers and code fixes in separate assemblies. I believe getting this analyzer correct will be time consuming for both implementation and execution time, while separating the assemblies already produces exactly correct results. If an analyzer is involved, I would want to see it restricted to simply reporting a warning if an assembly containing analyzers has a reference to the workspace package during builds (enforced separation at the assembly level).

Edit: I believe it is bad practice to mix incompatible references in the same assembly, so on top of the technical challenges of creating this analyzer I believe it would be a disservice to users to suggest that the mixed approach is considered a good practice.

I believe it is bad practice to mix incompatible references in the same assembly

I feel it is even more bad practice to recommend an approach to our users, which we ourselves do not follow (and will not follow in future), primarily due to huge performance hit associated with this design. Separation of assemblies for analyzers and fixers mean very likely going from 3 assemblies (core, C# and VB) to 6 assemblies. None of the analyzer/fixer pairs that Roslyn team owns across all our repos lie in different assemblies for primarily this reason. When we eventually move our style analyzers and fixers out from roslyn repo to roslyn-analyzers, we are not going to split them into separate assemblies.

Writing an analyzer to detect such a mistake is trivial - a compilation analyzer that determines the transitive type reference chain can easily flag an error if a DiagnosticAnalyzer type transitively references something from Workspaces.

I am strongly in favor of not enforcing a performance hit for a scenario that majority of analyzer authors will never encounter, and can be easily avoided via a simple analyzer.

@mavasani I have not observed the performance issues you describe. For the time being, we've been proceeding in the direction of splitting up the analyzers/code fixes during the transition to packages that support NuGet-based distribution. You can see the initial work here:
https://github.com/dotnet/roslyn/tree/master/src/CodeStyle

I would certainly be willing to look into alternatives if the performance overhead becomes observable, but currently it's at least 3 orders of magnitude down from the known problems at hand. Based on measurements from AnalyzerRunner (dotnet/roslyn#23582, dotnet/roslyn#23583, and several feedback tickets), I believe our limited engineering resources are better directed towards other areas where they will have more impact on performance for end users, and at least for now allow assembly separation to solve this problem for us.

I don't buy the argument about going from 3 to 6 assemblies being a perf hit argument. How many customers would hit this? They would have to have both vb and c# projects in their solution for one thing. Otherwise we're just talking about one more assembly load. Is one more assembly load to load code fixes really going to result in a user perceived perf hit?

Also note, the CLR folks consider the fact that an assembly can contain references to other assemblies that may not be loadable and yet work properly in other areas an implementation detail of the JIT, and in fact ngen can't take it very well. Mono can't take it at all last I tried. So a plan that encourages that seems like it's going against guidance from multiple runtimes.

I believe our limited engineering resources are better directed towards other areas where they will have more impact on performance for end users, and at least for now allow assembly separation to solve this problem for us.

If resources is the primary factor, then I vote for either doing nothing here or writing this trivial analyzer (at most couple of hours of work). The design you propose needs bunch of work:

  1. Split the templates to have 2 separate projects
  2. Work in the NuGet area to ensure the packaging happens correctly (explained here
  3. Split all the existing analyzers/fixers that we own into separate assemblies - these includes all the different analyzer packages in the roslyn-analyzers repo as well as IDE analyzers. Additionally, it also needs some infrastructure work in the repo to ensure the NuGet packages are generated correctly with fixer and analyzer assemblies.

Even after these things are implemented, note that the performance can only get worse with the new design, it cannot improve. I do not agree with your claim that is fine to double the number of assembly/image loads for this specific corner scenario, especially given that image loads is one of the most closely tracked metric in RPS.

we're just talking about one more assembly load. Is one more assembly load to load code fixes really going to result in a user perceived perf hit?

@AArnott - extra single assembly load is not an issue for a single analyzer, but it magnifies quickly for an end user when we are taking about 10s-100s of them. For example, we want to eventually move FxCop analyzers into the box enabled by default. We currently have 5 sub-packages with 15 assemblies in total (which is already too many), and this design doubles it to 30 assemblies.

We also need to understand the root cause why the analyzer authors are running into this issue:

  1. They _inadvertently_ referenced something from Workspaces. This was the case for our roslyn-analyzers analyzer, which was a very special case as the analyzer was checking the implementation of CodeFixProvider and used types from Workspaces to get fully qualified names. Users in this category quickly figure out this issue when they try to test their NuGet package in a command line build - the reason why we have not done anything for this scenario in the past.
  2. They are _intentionally_ referencing something from Workspaces as they want to write VS-only analyzer and/or find some useful API/service that is missing in CodeAnalysis. For VS-only analyzer authors, even assembly split is not of any help as they will add a reference to Workspaces assembly in their analyzer project to achieve their goal. For the latter, we should move such missing functionality down from Workspaces to CodeAnalysis.

Except for @sharwell in the past and @AArnott now, we haven't got any external feedback about this issue. We may want to consider waiting for more feedback before we do anything here. If we strongly feel something needs to be done here, I feel analyzer should suffice.

I would let @jinujoseph make a call here.

I think an analyzer is far better than the situation we're in now, and actually still applies in a split assembly world if we end up going that way, since folks who don't understand may add a workspaces reference to the analyzer assembly.
If it is as cheap as you say, that's fantastic. When can it be done?

I didn't know we expected so many analyzer assemblies. I also expected they would be loaded out of proc. As both of these you're refuting, I submit to your proposal.

  1. Split the templates to have 2 separate projects
  2. Work in the NuGet area to ensure the packaging happens correctly

This work isn't hard and is something I would be happy to do. @mavasani I get that image loads are a real perf problem. I want to do some analysis to figure out what the perf impact is here so we can accurately articulate it to users of the roslyn API. I think analyzers shipped by Microsoft should always be in a single assembly, but I really feel that the templates as they exist today are not leading people into the pit-of-success鈩笍

Ideally we should:

  1. Document this problem and make sure the documentation is discover-able
  2. Write an analyzer to tell the developer when they hit this
  3. Author the templates so that this issue is difficult to encounter in the first place

However since there is no design discussion to be had for 1 and 2 lets just do them.

To accomplish 3 there are all sorts of options other than packaging two assemblies. We could, for example, use the .NET linker in the templates so only one assembly is ever loaded even if there are separate projects.

@jmarolf - That was perfectly stated! Agree with you on all points.

If it is as cheap as you say, that's fantastic. When can it be done?

@AArnott I have a PR out here: https://github.com/dotnet/roslyn-analyzers/pull/1589. Please look at the tests and confirm if this seems sufficient for the case you hit.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

stephentoub picture stephentoub  路  3Comments

paulomorgado picture paulomorgado  路  3Comments

paulomorgado picture paulomorgado  路  3Comments

OmarTawfik picture OmarTawfik  路  3Comments

lgolding picture lgolding  路  4Comments