Roslyn: Decompilation should put new lines between blocks

Created on 7 Mar 2018  路  16Comments  路  Source: dotnet/roslyn

We should layout methods a little better by introducing new lines to make a little more readable (Reflector did this well)

  1. Create Console App (.NET Framework)
  2. In the following, CTRL+Click int

``` C#
class Program
{
static void Main(string[] args)
{
int foo;
}
}


3. When prompted say Yes to decompile
4. Scroll down to CompareTo:

**Expected:**

``` C#
        public int CompareTo(object value)
        {
            if (value == null)
            {
                return 1;
            }

            if (value is int)
            {
                int num = (int)value;
                if (this < num)
                {
                    return -1;
                }

                if (this > num)
                {
                    return 1;
                }

                return 0;
            }

            throw new ArgumentException(Environment.GetResourceString("Arg_MustBeInt32"));
        }

Actual:

C# public int CompareTo(object value) { if (value == null) { return 1; } if (value is int) { int num = (int)value; if (this < num) { return -1; } if (this > num) { return 1; } return 0; } throw new ArgumentException(Environment.GetResourceString("Arg_MustBeInt32")); }

Area-IDE Feature Request help wanted

Most helpful comment

Honestly, what might make a lot of sense would be to have Roslyn attempt to parse the decompiled source, and if that does not produce parse errors then light up normal roslyn services on the file (like formatting it and simplifying it and whatnot). That way the decompiler doesn't have to go try to match everything in Roslyn.

The reason we didn't do that initially was because it was unclear if we could get good trees all the time out of the decompiler. It's clear there are cases where we don't, but it seems like we will much of the time. In that case, it would be rather simple to just get this improvement in Roslyn itself.

All 16 comments

@davkean Hello, this is my first time trying to contribute to open source and I'd like to give this a shot. I believe this is actually a modification that needs to be done to ICSharpCode.Decompiler. I've narrowed it down to a change that's needed in the CSharpOutputVisitor but I'm having trouble getting a test case written so that I can figure out where exactly the change is needed. Both the Roslyn and ICSharpCode projects have a lot of dependencies where the call is to decompile.

Thanks for the offer @Candace-Williford! Hey @sharwell, can you help @Candace-Williford out here?

Changing the code in ILSpy itself would be the shortest path to a solution here. I didn't close it as an outright duplicate because it's also possible to resolve this by using the Roslyn code formatter on the decompiled source code, and updating the formatter to add the newlines.

@siegfriedpammer Can you point @Candace-Williford to the responsible code in ILSpy?

Changing the code in ILSpy itself would be the shortest path to a solution here.

@sharwell @Candace-Williford currently we do not implement many of the formatting options in CSharpFormattingOptions because many of them were implemented at a different level in NRefactory. We only took the parts needed for a decompiler (i.e. type system, semantic analysis, AST and output visitor). The CSharpFormattingOptions were applied in the FormattingVisitor which changed the locations of the tokens in the AST.

However, this solution does not work anymore, because we insert the AST locations in the final step, when actually printing the AST to the output. Before the output step there are no tokens in the AST that could be modified, therefore the code that handles the formatting options would have to be moved to/implemented in CSharpOutputVisitor. This is a bigger task. At the end of this we could add a new formatting option that allows you to add new lines between block statements.

If you are interested in contributing this, I will be happy to help you. I think the easiest way to discuss this would be either opening an issue in our repo and/or contacting us on gitter.

Just one thought: Why not use the Roslyn Formatter/Simplifier? This would solve #25251 as well.

@sharwell I have done some changes to the ILSpy integration in Roslyn, I will soon provide a PR.

@siegfriedpammer @sharwell Which option would be the most ideal solution, making the change described in the second paragraph or using the Roslyn Formatter?

@Candace-Williford The Roslyn formatter does not currently insert these lines. I believe a new formatting option in Roslyn to insert these would be desirable, but I'm not sure how easy it would be. The ILSpy formatting options also don't seem to have a corresponding option.

(This comment is a fancy "I don't know".)

believe a new formatting option in Roslyn to insert these would be desirable, but I'm not sure how easy it would be

It would be fairly easy IMO. Note: this could be done by having a full option for this (which the engine would respect and follow), or just providing a one-off rule to the engine when it formats here. My preference would be the former as i think this is a reasonable style to have a true user-visible option for (and it's something the Roslyn IDE codebase would likely want to enable for itself).

AFAICT, this would be an IFormattingRule that implemented GetAdjustNewLinesOperation. The way that woudl work is that every time it was called on a } it would get the containing construct for that block*. It would make sure that that } was the last token of that construct. It would then see if the containing construct was in a list and was *not the last item of the list. If so, it would try to force lines so that there was a blank line between it and the next token.

Note: open questions exists around what to do if there are more than blank line already. Should multiple blank lines be collapsed to one blank line?

--

Getting the containing construct is fairly simple. Just get the parent of the } And only accept if it is a statement, type, or namespace. In the case that the parent is a block, check if the block is the embeddedstatement of another statement or child of a member. If so, walk up one more.

@Candace-Williford @sharwell I have created an issue in our repository for further discussion of the changes that would have to be made to the decompiler, in case you're interested in discussing this/helping out.

Thanks @siegfriedpammer I will keep an eye on these and contribute as I can. I'm going to hold out on working on it for now until I get a new system because my crappy old laptop apparently can't handle the size of both of these projects very well. Hopefully I will have one in the next week or so. If anybody else wants to jump in and work on it before then, certainly feel free.

@coder-candace Are you still interested in contributing / helping us out with this?

Honestly, what might make a lot of sense would be to have Roslyn attempt to parse the decompiled source, and if that does not produce parse errors then light up normal roslyn services on the file (like formatting it and simplifying it and whatnot). That way the decompiler doesn't have to go try to match everything in Roslyn.

The reason we didn't do that initially was because it was unclear if we could get good trees all the time out of the decompiler. It's clear there are cases where we don't, but it seems like we will much of the time. In that case, it would be rather simple to just get this improvement in Roslyn itself.

@CyrusNajmabadi What is the main reason why Roslyn wouldn't be able to parse the decompiled source? If it's unspeakable identifiers (they do seem to cause a pretty big mess), then maybe the parser could have a heuristic for parsing them? (Possibly hidden behind an option.)

Another approach would be some special syntax with which the decompiler would tell the parser "I know this is not a valid identifier, but parse it as one anyway". The syntax could be something like `identifier`.

Though I'm not sure if modifying the parser just for this would be worth it. (On the other hand, unspeakable identifiers are really common in decompiled sources.)

@svick Actually, I don't know yet what's wrong with VS not being able to collapse the comments, but I recently added the EscapeInvalidIdentifiers transform to the decompiler pipeline used in VS. That should take care of unparsable identifiers. So actually parsing the source code should be possible in most cases.

Some language features produce uncompilable (although maybe parsable?) code if not detected properly by the decompiler, e. g., tuple member names, dynamic... We will have to improve the decompiler to support these properly, but that will take some time. One thing that might cause problems with the Roslyn language services are parameterized properties (see https://github.com/icsharpcode/ILSpy/issues/1089). If anyone has a good idea, how we could solve this, please chime in...

@CyrusNajmabadi What is the main reason why Roslyn wouldn't be able to parse the decompiled source? If it's unspeakable identifiers (they do seem to cause a pretty big mess), then maybe the parser could have a heuristic for parsing them? (Possibly hidden behind an option.)

The parser is generally pretty loathe to have special knowledge of anything outside of the language. That said, if there was a good definition of what tehse looked like, we could potentially try to strip/replace them out beforehand.

However, it's unclear if that would make the code parseable without errors. And without a good parse tree, things "go off the rails". Formatting, in particular, doesn't even bother trying to format things with errors, because it will most likely make things worse.

Some language features produce uncompilable (although maybe parsable?) code if not detected properly by the decompiler,

That's fine. Most LS features just like having a good tree. It would also allow us to make thigns much closer to the current metadata-as-source experience without forcing all that work on ilspy. This would make me happy given these other bugs i filed:

https://github.com/dotnet/roslyn/issues/26436
https://github.com/dotnet/roslyn/issues/26426

Was this page helpful?
0 / 5 - 0 ratings