After enabling the mono linker for our .NETCoreApp binaries it has identified a number of places where we have dead code.
These fall into 3 categories.
The reports were generated by enabling ILLink (mono linker build from codegen team /cc @erozenfeld) and diffing the output.
The diffs are from a Windows build and constrained to only assemblies that are part of NETCore.App.
When the linker is enabled it will produce a "PreTrim" folder in the object folder of an assembly, for example <corefx>\bin\obj\AnyOS.AnyCPU.Debug\Microsoft.CSharp\netstandard\PreTrim\Microsoft.CSharp.dll
.
To enable a central copy of the trimmed assemblies you can set the property BinPlaceILLinkTrimAssembly=true
, This will copy all pretrimmed and trimmed assemblies to a folder under bin: <corefx>\bin\ILLinkTrimAssembly\netcoreapp-Windows_NT-Debug-x64
.
To generate the reports linked in this issue you must have a copy of the AsmDiff.exe tool. This tool isn't yet available for core, see (https://github.com/dotnet/buildtools/issues/1420). If you have the desktop version of the tool you can enable the reporting by setting AsmDiffCmd=<pathToAsmDiff.exe>
The linker can be enabled for any project (not just those in NETCore.App) by building with ILLinkTrimAssembly=true
after https://github.com/dotnet/corefx/pull/17825 is merged.
Pick a library and make a note you're going to work on it.
Open the html report above for the library. Look for lines in red. These are theoretically dead. Search for the code in the src
folder of this repo and delete it. Continue through the report for the library and delete all the dead code like that.
BUT there ere are some special cases:
src\<library>\src\Resources\strings.resx
. The build will regenerate SR after you do this. Ignore anything else in âclass SRâ -- there are several methods that may show up as dead.#if uap
or having a condition in the .csproj file like eg '$(TargetGroup)' == 'uap'"
After you have removed the dead code from a library, make sure it builds (run âmsbuildâ in the âsrcâ folder for the library). If it does not, go backwards. If it does build, next check the tests build and pass (run âmsbuild /t:buildandtestâ in the âtestsâ folder). Again if they do not, retrace your steps.
If that all looks good, you can put up a PR for the library. When the PR goes in, we can check it off here.
Here's a table listing the diff in bytes, in descending order. XML has an issue already
| | | |
|-------------------------------------------------------|---------------|------------|
| Library | Bytes removed | % decrease |
| TOTAL | 1221632 | 7.62% |
| Microsoft.XmlSerializer.Generator.dll | 392704 | 46.29% |
| System.Private.Xml.dll - dotnet/runtime#20506 | 93696 | 2.54% |
| System.Net.Http.dll | 42496 | 12.89% |
| System.Private.DataContractSerialization.dll | 39936 | 4.40% |
| System.ComponentModel.TypeConverter.dll | 36352 | 10.71% |
| System.Data.Common.dll | 36352 | 2.97% |
| System.IO.FileSystem.dll | 32768 | 26.56% |
| System.Net.HttpListener.dll | 32256 | 9.39% |
| System.Runtime.Extensions.dll | 31744 | 12.20% |
| System.Net.Mail.dll | 28672 | 10.81% |
| System.Net.Security.dll | 24064 | 9.73% |
| System.Security.Cryptography.X509Certificates.dll | 23552 | 15.13% |
| System.Net.WebSockets.Client.dll | 22016 | 27.22% |
| System.Net.Primitives.dll | 20992 | 21.58% |
| System.Net.Requests.dll | 20480 | 11.56% |
| System.IO.FileSystem.DriveInfo.dll | 18944 | 47.44% |
| System.Transactions.Local.dll | 17920 | 9.70% |
| System.Net.NetworkInformation.dll | 17408 | 15.45% |
| System.Net.NameResolution.dll | 15360 | 28.30% |
| System.Net.WebHeaderCollection.dll | 14336 | 34.57% |
| System.Net.Sockets.dll | 12288 | 5.35% |
| System.IO.FileSystem.Watcher.dll | 11776 | 28.40% |
| System.IO.Compression.dll | 11264 | 7.83% |
| System.Security.Cryptography.Algorithms.dll | 10240 | 7.04% |
| System.Threading.dll | 10240 | 15.15% |
| System.IO.Pipes.dll | 9728 | 14.18% |
| System.Diagnostics.Process.dll | 9216 | 7.83% |
| System.Security.Cryptography.Encoding.dll | 9216 | 22.22% |
| System.IO.MemoryMappedFiles.dll | 8704 | 19.77% |
| System.Security.Cryptography.Csp.dll | 8704 | 9.83% |
| System.Security.AccessControl.dll | 8192 | 7.69% |
| System.Security.Cryptography.Cng.dll | 7680 | 7.04% |
| System.Collections.dll | 7168 | 5.88% |
| System.Net.Ping.dll | 7168 | 15.05% |
| System.Console.dll | 6656 | 7.51% |
| System.Linq.Expressions.dll | 6656 | 1.03% |
| System.Security.Principal.Windows.dll | 6656 | 9.29% |
| Microsoft.Win32.Registry.dll | 6144 | 12.77% |
| System.Private.Uri.dll | 6144 | 5.41% |
| System.Net.WebClient.dll | 5632 | 8.27% |
| System.Private.Xml.Linq.dll | 5632 | 3.37% |
| Microsoft.CSharp.dll | 4096 | 0.78% |
| System.ComponentModel.Primitives.dll | 4096 | 16.00% |
| System.Reflection.Metadata.dll | 4096 | 0.81% |
| System.Threading.Tasks.Dataflow.dll | 4096 | 1.86% |
| System.Diagnostics.StackTrace.dll | 3584 | 20.59% |
| System.IO.FileSystem.AccessControl.dll | 3584 | 12.96% |
| System.Linq.Parallel.dll | 3584 | 1.41% |
| System.Text.RegularExpressions.dll | 3584 | 2.59% |
| System.IO.IsolatedStorage.dll | 3072 | 8.00% |
| System.Threading.Overlapped.dll | 3072 | 28.57% |
| System.Security.Claims.dll | 2560 | 5.32% |
| Microsoft.VisualBasic.dll | 2048 | 1.05% |
| System.Collections.Concurrent.dll | 2048 | 2.03% |
| System.Collections.NonGeneric.dll | 2048 | 4.44% |
| System.Diagnostics.TraceSource.dll | 2048 | 4.08% |
| System.Diagnostics.Tracing.dll | 2048 | 11.11% |
| System.Runtime.InteropServices.RuntimeInformation.dll | 2048 | 16.00% |
| System.Threading.Tasks.Parallel.dll | 2048 | 4.17% |
| Microsoft.Win32.Primitives.dll | 1536 | 17.65% |
| System.Collections.Specialized.dll | 1536 | 3.66% |
| System.ComponentModel.Annotations.dll | 1536 | 1.94% |
| System.Diagnostics.FileVersionInfo.dll | 1536 | 10.71% |
| System.Reflection.DispatchProxy.dll | 1536 | 5.36% |
| System.Resources.Writer.dll | 1536 | 7.69% |
| System.Runtime.Serialization.Formatters.dll | 1536 | 1.10% |
| System.Threading.Thread.dll | 1536 | 6.82% |
| System.ComponentModel.EventBasedAsync.dll | 1024 | 5.71% |
| System.Drawing.Primitives.dll | 1024 | 2.30% |
| System.IO.Compression.ZipFile.dll | 1024 | 6.90% |
| System.Linq.dll | 1024 | 0.69% |
| System.Linq.Queryable.dll | 1024 | 1.60% |
| System.Net.ServicePoint.dll | 1024 | 6.25% |
| System.ObjectModel.dll | 1024 | 2.63% |
| System.Runtime.InteropServices.dll | 1024 | 4.17% |
| System.Runtime.Numerics.dll | 1024 | 1.41% |
| System.Runtime.Serialization.Primitives.dll | 1024 | 7.69% |
| System.Security.Cryptography.Primitives.dll | 1024 | 2.41% |
| System.AppContext.dll | 512 | 8.33% |
| System.Collections.Immutable.dll | 512 | 0.26% |
| System.Diagnostics.DiagnosticSource.dll | 512 | 1.69% |
| System.Diagnostics.TextWriterTraceListener.dll | 512 | 4.35% |
| System.Globalization.Extensions.dll | 512 | 7.14% |
| System.Net.WebProxy.dll | 512 | 4.55% |
| System.Net.WebSockets.dll | 512 | 2.50% |
| System.Numerics.Vectors.dll | 512 | 0.32% |
| System.Reflection.Primitives.dll | 512 | 7.14% |
| System.Reflection.TypeExtensions.dll | 512 | 3.57% |
| System.Runtime.dll | 512 | 1.43% |
| System.Security.Cryptography.OpenSsl.dll | 512 | 6.25% |
| System.Xml.XmlSerializer.dll | 512 | 5.56% |
| System.Xml.XPath.XDocument.dll | 512 | 6.67% |
This is a dead easy issue for a new contributor to grab. Just open up one of the htm files, and go delete the sources marked in red.
An obvious thing that jumps out is that the generated SR
class almost always has several members that are not used, and it shows up in all (or at least most) of our assemblies. That could give us some easy global improvements if we figured out how to fix that.
I also see that private const
fields show up in some cases. They are used by that class's implementation, but are obviously not visible or used outside. It's probably not worth trying to "fix" those problems.
If the class implementation that uses the private const
fields is reachable via an entry point visible outside of the assembly the fields are not removed.
Presumably if any library has internals-visible-to (eg for unit tests) we would see false positives, right?
If assembly has InternalsVisibleTo, then all internals are considered roots as well.
OK. I think there are a few cases where tests use reflection to get internals. But those will be quickly discovered when we run tests.
@mellinoe
An obvious thing that jumps out is that the generated SR class almost always has several members that are not used, and it shows up in all (or at least most) of our assemblies. That could give us some easy global improvements if we figured out how to fix that.
I already made a pass in Feb deleting 100's of dead strings. There will be very few left. I think these are cases where the method that references the string is itself trimmed. In which case of course the resx needs to be trimmed.
@danmosemsft I'm referring to this block of members that is always emitted in the generated file:
C#
internal static string Format(string resourceFormat, object p1);
internal static string Format(string resourceFormat, object p1, object p2);
internal static string Format(string resourceFormat, object p1, object p2, object p3);
internal static string Format(string resourceFormat, params object[] args);
Oftentimes (at least from my brief skimming đ ), only the first is used, and occasionally the second. In some of the ones I clicked, this was the largest chunk of the diff.
OK. I think there are a few cases where tests use reflection to get internals. But those will be quickly discovered when we run tests.
I've already covered most of those: https://github.com/dotnet/corefx/pull/17825/commits/2277db969616a5ad4fd9f4a8118a3b4030ff62d3
We're already inner and outerloop clean.
The report for Dataflow says that types in the System.Threading.Tasks.Dataflow.Internal.Threading
namespace are dead. That's not true, they are dead in the default configuration, but there are other configurations that use them. See https://github.com/dotnet/corefx/pull/17912 for my proposed change.
Is it likely there are other types like that? Is there something that should be done about them?
This report is only for NETCoreApp on Windows. It's possible that another configuration still uses them and that would be caught by the -allConfigurations build if someone tried to remove them completely. For cases like this you could improve the NETCoreApp configuration by if-def'ing/splitting the source as I mentioned in item 2 above. This looks like what you're doing in dotnet/corefx#17912
If you're interested in identifying the diff for other configurations where we aren't running the linker you can turn it on using the details I listed in the background section. This will be easier once the PR is merged so you may just want to wait.
@erozenfeld We won't want to remove any consts simply because they're inlined by the compiler, because that loses code clarity. But some of the consts really are dead, and those we probably do want to remove.
For example in http://tempcoverage.blob.core.windows.net/report2/System.ComponentModel.TypeConverter.diff.html the whole of VSStandardCommands
is marked dead. Some are visible (via public class StandardCommands
), and just inlined. But some are truly dead eg cmdidThisWindow
It would be nice to be able to see which are in the latter category.
Right, we can't remove consts inlined by the compiler from the source. ILLink operates on msil and can't distinguish a truly dead constant from a constant that became dead since it has been inlined everywhere.
It also flags as dead any private default constructors, which we don't want to remove generally as they'r ethere to prevent any accidental parameterless construction. I assume that's unavoidable, for the same reason. They don't have any purpose after compile time as far as I can think.
Marking as 'easy' per discussion above. I plan to point a few first-time contributors to this issue.
If anyone picks a file, please say so on this issue, to avoid duplication of work.
@danmosemsft do you want to add your how-to you sent to vendors here as well? Maybe adding it to the top-post would be best, with all exceptions / things to avoid ...
@karelz done
I have taken a cursory look at the diffs. There is way too much noise in them - it was hard for me to find anything easy and actionable. Can we filter the noise from the diffs (consts, private parameterless constructors, ...) so that it has just the easy actionable items?
You can open up the CSVs and apply whatever filter you like.
It would be nice to fiddle with the master CSV to see what code in /src/common is actually dead.
@erozenfeld I may be missing something here. it says in http://tempcoverage.blob.core.windows.net/report2/System.Reflection.DispatchProxy.diff.html that EventAccessorInfo.InterfaceRaiseMethod
is dead, but it is set in the constructor, which is itself not dead.
https://github.com/dotnet/corefx/blob/master/src/System.Reflection.DispatchProxy/src/System/Reflection/DispatchProxyGenerator.cs#L919
removing up-for-grabs and easy, I don't think it is ready (feel free to change if you think I am wrong)
I don't see how it's not ready, the instructions are pretty clear... i think it is a good introduction of a sort. Anyway, vendor folk will start on it. I already did some and checked off many that are already clean.
Something else it incorrectly marks dead: strings that are loaded at runtime via SR.GetResourceString("..."). Unfortunately there are still some eg return SR.GetResourceString("PropertyCategory" + value, null);
That has to get caught at code review.
From skimming the thread it looked like there is concern of noise in the results. Again, I don't feel strongly - feel free to flip it back ;-)
It's fine :)
In System.Transactions.Local, there is code that is currently not accessed, but some amount of it will eventually be accessed once we do the work for supporting distributed transactions. I am reticent to simply remove this code.
I've already removed some of the dead code identified here in Microsoft.CSharp and some more (the analysis didn't spot that e.g. ExpressionBinder.BindPtrToArray
is dead as per dotnet/corefx#17948) and the rest is in my sights as part of my ongoing refactoring of that assembly.
@danmosemsft Regarding EventAccessorInfo.InterfaceRaiseMethod: it's saying that get_InterfaceRaiseMethod is dead and since there is no set_InterfaceRaiseMethod, the property is dead. The code in the constructor sets the compiler-generated backing field
@jimcarley that makes sense, I checked of S.Transactions.Local above as done. I guess if there are odds and ends that actually are dead, you might want to remove them at some point.
@JonHanna if you believe that ILLink should have been capable of spotting that code was dead, I would let @erozenfeld know.
@danmosemsft I'd be extremely impressed if it could, I just meant that the bits mentioned here are already a subset of removals I intend to make, so that item in the list is going to be covered.
@danmosemsft @JonHanna ILLink doesn't attempt to find unreachable code within methods. If a method is reachable it assumes that all il instructions within the method are reachable.
For the benefit of others: another case came up offline, where an int field was getting assigned by a constructor that was flagged as dead. That field was getting read elsewhere however, and once that constructor was removed, the compiler warned that the field was never getting set. The reader was reading the default value. In such a case then the field needs to be explicitly initialized when removing the dead setter.
@erozenfeld this code appears dead: https://github.com/dotnet/corefx/compare/master...danmosemsft:dead.xml?expand=1
but was not flagged as dead. It is in the IL (fields in struct). does ILLink specifically avoid removing fields from structs in case eg they are used in interop?
In this particular case, the fields are initialize in the implicit static constructor so they do appear to be used in the IL. To detect them as dead code, ILLink would have to:
Ah - of course.
I noticed these as they were in https://github.com/dotnet/corefx/pull/18395/files .. other finds in there are various constants that are dead (which of course ILLink can't tell as it shows all constants as dead) and some things like initializations that only a human could tell have no side effects. N othing that ILLink should have found.
@huanwu the SGEN tool has 400KB of dead code.
Is all this reasonable to remove or is it a work in progress, you're going to use this code?
See http://tempcoverage.blob.core.windows.net/report2/Microsoft.XmlSerializer.Generator.diff.html
In System.Linq and System.Linq.Queryable the only code flagged is either used constants or methods in common files where other methods of the same class are used in that project, so they require no further work.
I've just identified some dead code using a tool I've written - see https://github.com/dotnet/corefx/pull/18395. I'm using your code as input data to validate my code.
In the past I've done a massive delete on Roslyn (18.5 kloc) that has been good at identifying code which shouldn't be dead - see the comments on https://github.com/dotnet/roslyn/pull/17630. i.e. Only used in the debugger, due to a regression, incomplete implementation of tests, should be conditionally compiled.
@JonHanna thanks! Top post updated with your info.
dotnet/corefx#18414 removed most of the relevant dead code in S.Linq.Expressions. Aside from constants etc. there's still some that might become relevant if currently unsupported features are ported (including one with an issue requesting just that) so it would probably be best not to do further now.
Sounds good, it's checked off in the list. Most of the rest only need minimal cleanup. SGEN is the major exception (@huanwu please see question above)
@zhenlan question above about Microsoft.XmlSerializer.Generator.dll -- it's got 400KB of dead code (!) ... is this code temporarily dead, or are you done with this work and someone can delete it? I can ask the vendors to do much of the work, most likely. Just let me know.
@danmosemsft These are not dead code. Most of code are shared with System.Private.Xml. If you check the project file https://github.com/dotnet/corefx/blob/master/src/Microsoft.XmlSerializer.Generator/src/Microsoft.XmlSerializer.Generator.csproj, you can find most of code files are located under $(SourceDir)SystemXmlSerialization
@huanwu even if we can't delete the code from the repo (because it's used by others) if the code is not used within this assembly we ideally would not compile it into the assembly.
ReflectionAwareCodeGen
XmlSerializationWriterCodeGen
XmlSerializationCodeGen
all seem dead classes. Ideally they could be moved into their own code files so they're only compiled in where they're needed.
ReflectionAwareCodeGen is actually within #if XMLSERIALIZERGENERATOR
so maybe it is totally dead, if it's dead in this assembly.
Also the great majority of the strings are unused in this binary.
If this code is dead in this assembly that would save 400KB so it's well worth doing.
Hi @danmosemsft, I may not fully understand how the dead code is determined, but something feels off.
For example, the diff file Microsoft.XmlSerializer.Generator marked ReflectionAwareCodeGen
as red, which I interpreted as dead code from the tools point of view. However, it is clearly used in Microsoft.XmlSerializer.Generator
here. Can you please help to clarify?
OK, I see the tool actually marked the whole chain of *CodeGen as dead code. Please ignore my previous question and let me explain. The Microsoft.XmlSerializer.Generator project just had its initial check-in. Some code is only dead temporarily. They will be used when we have more work done. Maybe we can run the tool again when we get closer to code complete?
@zhenlan that's exactly what I assumed. Code complete was last week actually :) Perhaps this is a DCR. Is there much more to do?
@danmosemsft yes what you assumed is right :)
Microsoft.XmlSerializer.Generator (Sgen) is a CLI tool that does not ship as part of .NET Core. For the part of code Sgen shares with S.Private.Xml, I don't think there are any big gaps except bug fixes and we are targeting 5/10 for ZBB. For the part of the code that is solely under Sgen, there is more feature work to do but we should have more leeway due to different release vehicles. Hope this clarifies.
@erozenfeld much of the remaining deletable code is likely parts of common shared files that nobody uses. Without analyzing each by hand or doing some tiresome joining of the CSV it's not possible to clean that code up. Is there some straightforward way to run ILLink over the whole set of assemblies, coalescing files that are used in multiple assemblies, so we can get one of these HTML reports showing what's dead in the common code?
We would also want to merge in the test assemblies as some tests compile in the product code (eg PathInternal.cs). Also the Unix ones because of course they use common code.
@danmosemsft My tool can do that. i.e Run it on the whole code base, automatically deleting lines including false positives, build and revert repeatedly then test. Finally push it up to my Github account and create a pull request. Probably quite a bit of work involved though so not to be attempted unless it'll be used. See https://github.com/dotnet/corefx/pull/18395 for results on a subset
@danmosemsft
Is there some straightforward way to run ILLink over the whole set of assemblies, coalescing files that are used in multiple assemblies, so we can get one of these HTML reports showing what's dead in the common code?
No, there isn't a straightforward way to do that. ILLink operates on assemblies, not source files used to generate them.
@erozenfeld I don't think that was what I meant. I don't expect ILLink to operate over sources. But right now any shared sources that it shows as dead in binary X can't be easily deleted because they may be used by binary Y or Z. If notionally all our libraries were compiiled into in one big dll like ILMerge or suchlike had run, with pdb's fixed appropriately, then we could delete any common code that showed up as dead. My question was whether ILMerge could merge the assemblies into one, then run closure analysis over that one.
@danmosemsft No, ILLink can't merge assemblies into one like ILMerge does. I don't know how ILMerge handles clashing names across assemblies. If you have class N.C in assembly A and class N.C in assembly B, in general you need to keep both (and rename one or both) when merging unless you can prove that they are identical in a deep sense.
Gotcha. Well maybe an approach will emerge in the future. It's not a big deal.
It also occurs to me that it could be done with some processing of the CSVs...
All this data is likely out of date now. We should probably rerun the reports if you want to do more work on this. You can find instructions on doing that in the background section above.
In System.Collections.Immutable and System.Collections.NonGeneric the only code flagged is either used constants or methods in common files where other methods of the same class are used in that project, so they require no further work.
Thanks @YoupHulsebos, top post updated.
In System.IO.FileSystem.Watcher and System.Security.Cryptography.Primitives the only code flagged is either used constants or methods in common files where other methods of the same class are used in that project, so they require no further work.
@yaelkeemink thanks updated checkboxes.
Hello,
As a first time contributor, I would like to tackle the System.Net.Mail
library if that is okay !
Sure, thanks! Go ahead. If you hit any road blocks ping us here on file new issues (and tag me for routing).
I have created a pull request for System.Net.Mail and signed the agreement.
Thanks @Ermiar! Top-issue updated with link to the PR.
First-timer here - I will take the following:
System.IO.FileSystem.DriveInfo
System.IO.FileSystem.Watcher
đ
Awesome, welcome onboard @garfbradaz!
Let us know when you submit PR, we will update the top post (@ViktorHofer can you please help today?)
Sure.
@garfbradaz please make sure after removing possible dead code to run all test in both projects for Desktop and Core.
You can do this by first running:
build.cmd
and build.cmd -framework:netfx
from corefx root and after that running the individual test projects like this:
msbuild /t:RebuildAndTest
and msbuild /t:RebuildAndTest /p:TargetGroup=netfx
.
Don't hesitate to contact me if you need help. Thanks for your help!
Thanks @ViktorHofer / @karelz - slowly ploughing through
There's been a false positive: dotnet/corefx#19826 is implemented with code removed by commit 81506698 on this issue. Although the method is 'dead' for now, it's essential to be there to bring back DbProviderFactories back from .NET full. So it will be resurrected if dotnet/corefx#19826's PR is merged.
First time contributor, I'd like to take a swing at System.Linq.Expressions.
Great @mccbraxton! If you need any help, ping me.
@mccbraxton It seems System.Linq.Expressions is already cleaned up. Do you wanna take another piece?
@ViktorHofer You're right - must have read wrong. How about System.Security.Cryptography.Algorithms?
Looks good! I added you in the table. Please make sure to not delete code which is used by any target. Thanks for your help!
dotnet/corefx#18162 took care of what remained in the report for Microsoft.CSharp that was truly dead.
Hello new contrib here , I can take System.ComponentModel.*
Thanks @norek go for it.
Hi below summary of System.ComponentModel.
I was reviewing list and I found dead code in System.Diagnostics - StackTraceSymbols
is marked as dead but name of this class in solution exlorer is StackTraceSymbols.CoreCLR.cs
. I found in closed issues: dotnet/corefx#19368
and its dead or not?
@mikem8361 do you own S.D.StackTraceSymbols?
I do own it and it isn't dead code. Reflection is used by System.Private.CoreLib to load this assembly and reference this class.
So coreclr can print source/line number info on unhandled exception stack traces.
@mikem8361 thanks for the information. It looks like it is protected by illinktrim.xml:
<linker>
<assembly fullname="System.Diagnostics.StackTrace">
<!-- used by System.Private.CoreLib StackTrace code to load portable pdbs for the runtime diagnostic stack trace to get source/line info -->
<type fullname="System.Diagnostics.StackTraceSymbols" required="true" />
</assembly>
</linker>
@ericstj @erozenfeld can we fix the CSV that ILLink emits so it doesn't show code as dead when we've forced it to retain the code?
The CSVs that we have are merely just the diff after trimming, so they don't have stuff explicitly rooted. The data linked in this issue isn't live. It's from a one-off run when I opened the issue. I suspect if you looked at the live info it'd no longer show as a diff.
Hello, I'm a first time contributor. I'd like to try System.Net.Http and System.Net.HttpListener.
@soapdogg go for it..
@AlexGhiondea as mentioned
Hello. Can I take Microsoft.VisulaBasic
?
@satano for sure, go ahead. It looks like there's only a few bits to remove (all the const fields and parameterless ctor need ot stay, per my writeup above). Perhaps you could do CSharp as well?
There are also some libraries that are new since the above analysis was done...
OK, I will do CSharp and VisualBasic.
PR for Microsoft.VisualBasic
is ready.
Regarding Microsoft.CSharp
- everything red in diff file is already removed. So just mark it in the issue.
Are we going to leave this issue to first-timers, or can I continue?
@satano can you please find the change that removed the red diff?
It is ok to continue, we don't save it only for first-timers.
I do not know if it was all, but something was removed here: https://github.com/dotnet/corefx/commit/3eb339702e2fcdf924b50c2e32d7e9e02395e52f
@JonHanna even said it here https://github.com/dotnet/corefx/issues/17905#issuecomment-291924301
I will continue from the top. So next I will take System.IO.Compression
, System.IO.FileSystem.AccessControl
, and System.IO.FileSystem
.
@satano go for it -- no need to ask permission, if someone else was looking, they would post here.
One interesting "meta problem" for someone is whether there is a way to use the same linker tool to find common code that nobody is using. If a class pulled from srccommon is only partially used by a library, it will show up dead in this analysis, but it cna't be removed because another library might be pulling it in and using it. What I was thinking is if there's a way to merge the assemblies, then run the linker on them. @erozenfeld could we use the linker to merge, then find dead code on the result?
ILLink currently can't merge assemblies. This is something we are considering adding in the future. There are other tools that can do that: https://github.com/Microsoft/ILMerge and https://github.com/gluck/il-repack although I'm not sure if they handle this scenario the way we want. They may or may not realize that the types/methods/fields with clashing names represent identical entities.
I am continueing with System.Net.Http
, System.Net.HttpListener
and System.Net.NetworkInformation
.
I will take whole System.Net.*
one by one.
Continuing with the rest: System.Private.*
, System.Runtime.*
and System.Security.*
.
@karelz
⊠can you please find the change that removed the red diff?
dotnet/corefx#18162
Well... Probably it's time to review this issue and maybe close it?
Dead code from this projects was removed and merged:
I looked also at the rest of the projects and there is nothing to do in them:
So the only project which remains is Microsoft.XmlSerializer.Generator
. It has the most of the dead code, but as mentioned in some comments here, there is a work in progress there - as I understood it.
Is this issue completed? If not could I take System.Console
?
@Iroca88 sure. Note the dumps linked above may be a little out of date now.
Thanks @danmosemsft, any suggestion on getting new dumps? Otherwise I'll start with the dumps provided above.
BTW: is @lroca88, I capitalized the first letter (L) in order to avoid this type of confusion :)
@Lroca88 the top post explains how to access the trimmed and pretrimmed assemblies. To diff them we used a tool that doesn't seem to be public. You could use a decompiler like Ilspy On both and do a text diff on the results. That would probably work.
I'll run a recent diff and share it. Stay tuned.
Here's the latest:
Cool, I was struggling with the old list! đš
@Lroca88 be sure to read the top post hints about how to do this pass, save some confusion..
Oops. I missed two newly orphaned bits in Microsoft.CSharp in dotnet/corefx#26491. I'll get that soon.
From the new list, System.Runtime, System.Web.HttpUtility, System.Linq and System.Linq.Queryable only have constants that aren't really dead and some shared code. System.Linq.Expressions has those and a few items used in debug views and resources used with particular compiler constants and so should be removed, so those five can all be checked off.
I've merged the libraries reviewed by @JonHanna and and those that I reviewed into a list. I'll keep updating the list as my time allows, if someone wants to add items to this list, feel free to ping me!
Library | Reviewed By | Status
-- | -- | --
Microsoft.CSharp (csv) | @JonHanna | dotnet/corefx#27104
Microsoft.VisualBasic (csv) | @Lroca88 | Nothing to remove
Microsoft.Win32.Primitives (csv) | @Lroca88 | Nothing to remove
Microsoft.Win32.Registry (csv) | @Lroca88 | Nothing to remove
System.Collections.Concurrent (csv) | @Lroca88 | Nothing to remove
System.Collections (csv) | @Lroca88 | Nothing to remove
System.Runtime (csv) | @JonHanna | Nothing to remove
System.Web.HttpUtility (csv) | @JonHanna | Nothing to remove
System.Linq (csv) | @JonHanna | Nothing to remove
System.Linq.Queryable (csv) | @JonHanna | Nothing to remove
Thanks @Lroca88 !
It doesn't surprise me that there isn't a lot of dead code left. However as noted at the top, this process cannot determine whether code in srccommon is dead (as it may show up dead in one assembly but not another). If you are interested to find a way to solve that you would likely find more dead code. Eg., maybe it is possible to ILMerge the assemblies and then run the dead code analysis on them. Or, it may be easier to post process the CSV's to find common code that is dead in all of them. @ericstj did you have ideas?
My pleasure to help @danmosemsft,
I agree with you, about improving this process, from my reviews the most flagged code is not dead, is either consts or often used in another assembly as you previously said. Looking at those false-positives makes the hunt a little tedious in my opinion.
Unfortunately I don't know how to perform the ILMerge and the dead code analysis or post process the CSV's. I'm willing to learn/collaborate though, if you guys want to spend some time tutoring me :)
Where are we with this? Is the OP being kept up to date with done classes or should we be going off this comment now?
@MisinformedDNA -- use https://github.com/dotnet/corefx/issues/17905#issuecomment-365349091 please
I don't know how to perform the ILMerge and the dead code analysis or post process the CSV's. I'm willing to learn/collaborate though, if you guys want to spend some time tutoring me :)
I don't know either and would have to think/experiment. Right now I'm fully engaged closing down 2.1 release so this will have to wait a bit đș
I wrote a script to load all the CSVs and then filtered down to just the most actionable ones. I came up with 3,942 issues/opportunities.
I now intend to take a look at each of the issues and take appropriate actions. You can follow my progress and leave feedback, if you'd like, at https://github.com/MisinformedDNA/corefx/tree/clean-dead-code
Nice, looking forward to seeing this @MisinformedDNA.
I found an issue with the dead code analysis, that either shows a problem or deficiency with the tool.
The report says that the following can be removed:
Friend Enum vbErrors
ObjNotSet = 91
IllegalFor = 92
End Enum
But these Enums are referenced elsewhere, so let's look at one possible branch:
' Enum cannot be accessed, so Function cannot be hit
' But the dead code analysis did not show this as removable!!
Friend Shared Function VbMakeIllegalForException() As System.Exception
Return VbMakeExceptionEx(vbErrors.IllegalFor, GetResourceString(SR.ID92))
End Function
' The chain continues:
' VbMakeIllegalForException() cannot be hit, so first "If" cannot be true, but the rest of the Function *could* still be hit
Public Shared Function ForNextCheckObj(ByVal counter As Object, ByVal loopObj As Object, ByRef counterResult As Object) As Boolean
Dim loopFor As ForLoopControl
If loopObj Is Nothing Then
Throw VbMakeIllegalForException()
End If
If counter Is Nothing Then
Throw New NullReferenceException(GetResourceString(SR.Argument_InvalidNullValue1, "Counter"))
End If
loopFor = CType(loopObj, ForLoopControl)
... cut for brevity
End Function
Logic has taken me to the place where the first "If" statement is never true. But that "If" statement is a null
check, so we'd have to assume that someone would pass in null
and that there is a missing case for determining dead code. And if there is an issue (or more) with determining dead code, then one or more of the Enums may not be dead code after all.
tldr: Summary
Question 1: Why is VbMakeIllegalForException not marked for removal, even though it couldn't possibly call the enum which was marked for removal.
Question 2: Why are the test cases in determining dead code, not passing in null values for all applicable parameters?
@ericstj
The enum values were probably inlined in msil so the tool doesn't see them. The same is true for constants. Nothing should be done for source code in these cases.
I just got done removing dead code for System.Data.Common
with build.cmd src\System.Data.Common
completing successfully. Now up until now, I was under the impression that I only needed to work in the library that I was in and if the build and tests passed, then I was good to go. But I decided to do a full build anyway, and I'm glad I did, cause I got a slew of errors.
Here are some example errors:
SystemDataSqlClientSqlCommandBuilder.cs(277,17): error CS0117: 'ADP' does not contain a definition for 'RemoveStringQuotes' [D:ReposcorefxsrcSystem.Data.SqlClientsrcSystem.Data.SqlClient.csproj]
SystemDataProviderBaseDbConnectionPool.cs(984,33): error CS0117: 'ADP' does not contain a definition for 'SetCurrentTransaction' [D:ReposcorefxsrcSystem.Data.SqlClientsrcSystem.Data.SqlClient.csproj]
D:ReposcorefxsrcCommonsrcSystemDataProviderBaseDbMetaDataFactory.cs(409,21): error CS0117: 'ADP' does not contain a definition for 'IsEmptyArray' [D:ReposcorefxsrcSystem.Data.SqlClientsrcSystem.Data.SqlClient.csproj]
SystemDataCommonDbConnectionStringCommon.cs(180,27): error CS0117: 'ADP' does not contain a definition for 'InvalidConnectionOptionValue' [D:ReposcorefxsrcSystem.Data.SqlClientsrcSystem.Data.SqlClient.csproj]
The missing members were all clearly specified to be deleted in System.Data.Common.diff.html, but not only do they cause errors in child classes, but there are no diff files for these child classes or there associated assemblies like System.Data.SqlClient
.
The first error shown above is linked to a public method, SqlCommandBuilder.UnquoteIdentifier
. So I don't see how ADP.RemoveStringQuotes
can be removed from System.Data.Common
without a breaking change in other classes.
How should I handle these? Any ideas why they are being marked for removal?
@anipik can you help @misinformeddna here?
@MisinformedDNA You will always have to build the whole repo and run the tests for all the libraries because its possible that one library could be dependent on other librarary directly(A using some type from B) or indirectly. (A using some type from B which using some type from C)
There is nothing in the function ADP.removeStringQuotes
, isemptyarray
, setTransaction
that is just specific to ApaterUtil or System.Data.Common so we can just move it to sqlCommandBuilder.cs
as a private static method. They are just manipulating the arguments. If you could point me to to the branch having this failure, i can help you with other failures.
All the methods that you mentioned above are internal static methods
which are no longer getting used in the assembly they are defined.
Internal members are generally not meant to be used outside of the assembly they are defined. but in some case we do that by using ``Internals Visible To``` attribute.
@danmosemsft is moving these internal functions to the assemblies where they are used optimal ?
Here's the fork I'm working on: https://github.com/MisinformedDNA/corefx/tree/clean-dead-code
As previously stated, the dead code analysis says ADP.RemoveStringQuotes
for System.Data.Common
could be removed. However, since ADP
was marked internal
, the analyzer may have assumed that it wasn't being used outside of the assembly. I'm not sure if it looks for the InternalsVisibleTo
attribute, but it wouldn't have mattered if it did, because it's not being used here. Instead, it is being shared via linked files.
<Compile Include="$(CommonPath)\System\Data\Common\AdapterUtil.cs">
<Link>System\Data\Common\AdapterUtil.cs</Link>
</Compile>
I assume the analyzer is seeing the different ADP
classes as separate, instead of shared, types, since they essentially are being duplicated. So while removing RemoveStringQuotes
is the correct thing to do for System.Data.Common
, it's the incorrect thing to do when considering other projects to which it is linked.
Unless the analyzer can determine that all those classes are coming from the same file, we will perpetually think we have dead code in some of these files.
Unless, someone has a better suggestion, I'll start adding these methods back to the code.
@MisinformedDNA i talked abouth this with @danmosemsft offline . Another possible approach is to make ADP class partial. And then you can move this functions to these libraries.
I filed my first PR, but 2 of the builds failed. Both build failures seem to indicate a timeout. What action should I take to resolve this?
I will look at the System.Private.Uri
Here's an app(s) I wrote to download all the files, combine them and then filter out all those that are likely false positives. Hopefully, it will be helpful to others.
@ericstj does it make sense to regenerate the data so that we have a clearer picture where we are with this?
Started on the following namespaces:
I've regenerated the data. Here it is as a new list, I didn't want to overwrite what was there in case folks find it relevant.
Thanks Eric!
Checked the following:
Cleaning up dead code for:
System.Diagnostics.DiagnosticSource
System.Diagnostics.FileVersionInfo
System.Diagnostics.Process
System.Diagnostics.StackTrace
System.Diagnostics.TextWriterTraceListener
System.Diagnostics.Tools
System.Diagnostics.TraceSource
System.Diagnostics.Tracing
System.Drawing.Primitives
dotnet/corefx#33095
I'd like to take on cleaning System.IO.Compression as my first contribution.
In start post System.Console
marked as done, but in regenerated data it's not marked. I found that it's only some SR
's strings are needed to be cleared.
So, @ericstj, is it kind of mistake or I can take a look on other libraries and fix such small dead codes?
I created a small tool which remove some useless stuff from this data.
This is a very common one. Anything with âconstâ, please ignore. These are all fields inlined by the compiler, we want to keep them in the sources. Typically ints and strings.
SR
's strings which can be removed. But SR
's methods aren't removed too, be aware.Anything that is âstatic stringâ in âclass SRâ is special. Instead of editing a C# file you must find the matching entry in the .resx file and remove that. The .resx file for a library is in src
srcResourcesstrings.resx. The build will regenerate SR after you do this. Ignore anything else in âclass SRâ -- there are several methods that may show up as dead.
Ignore any private parameterless constructors. We have these in the code to prevent accidentally instantiating the class, we want to keep them in the code.
Ignore any public parameterless constructors if there are no other constructors. These are not real.
You can find cleared files here. Mention me after updating data and I will update that.
@ericstj is it safety to remove internal enum
?
After using that tool I've checked all assemblies. Some of them doesn't have any dead code.
Updated list looks something like below. I change links to htmlpreview for cleared files from tool's repository. Hope nobody won't mind.
So, @ericstj, is it kind of mistake or I can take a look on other libraries and fix such small dead codes?
When regenerating the diffs I started with everything cleared again. The mechanism for doing the dead-code-diffs is too manual. It's using an internal tool for the assembly diffs.
@ericstj is it safety to remove internal enum?
It depends on what is done with that internal enum. Enums will often show up as dead in the binary since the compiler turns their source reference into integer literals. You'll likely find that the source may still need it.
Just some thoughts on how we can make this dead code scrubbing a better experience that's more steady-state moving forward.
Today the mechanism we use for tree-shaking dead-code is the linker which operates on the binaries. So to map that back to source we diff its output with input and emit the ids. Another thing I've discussed with the linker folks in the past is to emit more logging that could more directly map back to things our developers care about. It's also operating on PDBs so it technically has all the source code and line information.
@ericstj is it technically possible to run on corelib? I noticed a dead method in corelib, and no doubt there are more. Certainly it would need care given all the callbacks from the VM
@danmosemsft It's already running on System.Private.Corelib.dll as part of coreclr build. What's the dead method you noticed that's not getting removed?
@ericstj I guess we can close this?
I think this is done for now. Many thanks to all who contributed to make our codebase better!