Currently ilasm /DEBUG* is ignored on CoreCLR.
WPF recently went down a path of trying to do some IL modification and we just ran into this issue. There are other avenues for us to explore, but this was blocking the path of least resistance.
It seems like there are no plans to support /DEBUG in 3.0, is this accurate?
@vatsan-madhavan
@davidwr @jkotas @briansul @janvorli - Do we have anyone on the team who is an official owner for ilasm?
I'm trying to find out if there would be any pushback adding a portable PDB writer to ilasm. An initial thought would be hosting CoreCLR so that we could invoke the writer that is in System.Reflection.Metadata.
Adding others who may be interested in this: @MichalStrehovsky @tmat @AaronRobinsonMSFT
@noahfalk From my conversations with @RussKeldorph there is no official owner of ILAsm.
I think this is a reasonable request for support. My only concern is the additional assets that would be needed to make this "just work". If I recall, ILAsm is now xcopy-able and as mentioned this work would fail gracefully if additional assets weren't present (i.e. missing Runtime). That however doesn't fully capture what would be required to make this scenario actually reliable. At a minimum the following would be needed to have ILAsm create PDBs:
1) Targeted framework installed on the system.
1) nethost.dll which will find the installed Runtime to load [Aside to @noahfalk we have an import lib, not a static lib - sorry for the confusion].
1) A *.runtimeconfig.json to indicate framework requirements.
Based on the above, we now have 3 artifacts that must be present _and_ a system dependency. This is going to necessitate an update to deployment instructions and to nuget packages, not to mention the alluded to error message that indicates ILAsm isn't going to be able to perform PDB creation.
Not saying this is impossible or that we shouldn't do this work since it would be beneficial for me as well. I just want to make sure the above costs and expectations are known.
Correction: @dotnet/jit-contrib officially owns ILAsm (and ILDasm).
dotnet/coreclr#23840 and dotnet/coreclr#25144 might be tangentially related to requirements in case you weren't aware of them.
/cc @BruceForstall
@RussKeldorph @AaronRobinsonMSFT - thanks for the info and feedback : )
This is going to necessitate an update to deployment instructions
Does anyone know where instructions can be found? If they exist they weren't easily discoverable at least : )
… and to nuget packages
I'm not certain what expectations people have on that package, but tentatively I could see at least a few options:
a) Add nothing to the NuGet packages. For use cases that need PDBs to be emitted the onus lies with the package user to compose the artifacts together.
b) Add all the artifacts except the framework to the existing NuGet package. This clearly makes the package a bit larger, but hopefully isn't overly burdensome on the use cases that didn't care about PDBs. Onus remains on the package user to furnish a runtime.
c) For completeness, add all artifacts + the framework either in the package or as NuGet dependencies. I'm assuming this option is not desirable because of the greatly increased size and previous efforts to expressly avoid a runtime dependency.
I wonder if it wouldn't be better to invert the approach. Make ILasm.exe a managed, self-contained .NET Core app and put all the C++ code into a native dll that gets loaded by the ILasm.exe app.
Agree with @tmat, but let's take it a step further and use CoreRT and link-in the native bits so you still have a single exe.
@tmat - @MichalStrehovsky and @jaredpar appeared to have strong feelings that ilasm should not have a full .Net Core dependency and there has been work to explicitly remove those dependencies. I'm sure they can represent their positions better than I can, but it appeared to be a combination of wanting single-file xcopy deploy, wanting a small size on disk, and difficulties dealing with .Net Core versioning. I don't have a horse in this particular race, if you can convince them its fine by me : )
@msabby - I'm not up-to-date with what CoreRT can do these days, but initially I'd be wary for a few reasons:
a) Platform reach - its not clear that CoreRT has the same breadth at the same quality bar as .Net Core.
b) Supportability - Last I knew CoreRT doesn't have any support for debugging and it doesn't seem reasonable to ship code we won't be able to debug
IIRC the issues we had with .NET Core dependencies would not be present with self-contained .NET Core App.
I guess it could be a good use case for /p:PublishSingleFile=true then.
I think not having PDBs is a pain for everyone who uses ILAsm right now. If we add support for it, everyone will want to use it. It would be sad if we make it an option that inflicts setup pain whenever someone wants to use it.
While I'm one of the biggest proponents of not writing C++ for things that can/should be written in C#, CoreCLR has a really bad startup time. It's not suitable for a tool like ILAsm that gets potentially invoked thousands of times. The startup delay would add up to minutes in places like the CoreCLR test tree. If we rely on CoreCLR for this, we'll end up having to design an ILAsm compiler server (like Roslyn did, to solve this exact problem). The Windows build lab compiles their managed build tools with .NET Native and they shave off minutes of CPU time per build by doing that.
After dotnet/coreclr#25144, ILAsm carries its own copy of IMetaDataEmit. We could extend that to generate the portable PDB structures. Portable PDB is based upon the ECMA-335 format which is exactly what IMetaDataEmit generates - it should have the necessary facilities.
I do not think that the CoreCLR startup would be a problem for ilasm. Note that ilasm has been paying for CoreCLR startup until the very recent David's change. Roslyn is running a few orders of magnitude more code than ilasm would ever run (even if it was completely rewritten in C#).
However, I agree that writing the portable PDB writer in C++ sounds much easier than dealing with all the managed/unmanaged interop. (Rewriting ilasm/ildasm in C# would avoid the interop as well, but that is more work....)
Note that ilasm has been paying for CoreCLR startup until the very recent David's change
Did CoreCLR have to do the full startup dance when MetaDataGetDispenser was called? I would expect MetaDataGetDispenser to be pretty lightweight (metadata APIs sit on a pretty low level within the runtime). MetaDataGetDispenser is the only thing ILAsm used - I would expect the startup cost of the VM to be orders of magnitude higher than that.
ilasm/ildasm used to call ‘coreclr_initialize’: https://github.com/dotnet/coreclr/blob/release/3.0/src/ildasm/unixcoreclrloader/coreclrloader.cpp#L53 . You are right that initializing whole VM is not necessary for unmanaged metadata reader/writer. I was just pointing out that it is not that expensive to drive the design decision.
... writing the portable PDB writer in C++ ...
This would be wonderful for the larger community and solve at least one big issue facing an internal partner team.
... writing the portable PDB writer in C++ …
Interesting, I hadn't initially considered this but especially for the limited subset of pdb data that the IL assembler currently writes it does seem reasonable. Thanks for raising this suggestion! To add some additional context this issue started because a customer was asking how they could unblock themselves. For the path where we did hosting and reused the existing C# writer I had some small samples showing code similar to what they would need to write. I am a little concerned that a new C++ writer will feel like a more daunting task. I'd guess the amount of code needed will be less, but there will be more effort involved for someone with minimal knowledge of the format or any of the pre-existing code to learn what needs to be written. I'll do some investigation to evaluate better what that might look like, but if anyone has suggestions of particular patterns/examples we should look at do let me know. Thanks!
I wrote a Portable PDB reader in native a while ago. I pushed it up to my github (https://github.com/AaronRobinsonMSFT/PPDB). A writer would be significantly more work, but it could serve as a start to help someone understand the format.
After getting another opportunity to chat with our potential contributor on email my concerns were unfounded, they sounded happy with the approach of writing a portable PDB writer in C++. Does anyone have any objections to this plan?
@dotnet/jit-contrib - since Russ says you guys own ilasm could someone confirm this plan has your thumbs up?
@MichalStrehovsky - Above you were suggesting that we might use the implementation of IMetaDataEmit as a starting point for the portable PDB writer? Do you have a recommendation for what elements of the metadata implementation would be worth re-using and how we'd tie the overall solution together? I'll admit my first reaction was that the metadata codebase is a complex layering of many abstractions and caters to numerous other goals aside from emitting files so it may be more challenging than necessary. For example compared to something like @AaronRobinsonMSFT 's C++ portable PDB reader Aaron's code feels far easier to quickly understand and modify. Nonetheless I don't want to succumb to NIH syndrome and my first impressions may not be very accurate. I'd very much appreciate getting your thoughts on it. Thanks!
I'll admit my first reaction was that the metadata codebase is a complex layering of many abstractions and caters to numerous other goals aside from emitting files so it may be more challenging than necessary
I think it will have to be tied into the metadata emitter if we want to support embedded portable PDBs. Otherwise it can be just duplicated on the side. Duplication tends to be a better implementation strategy in a legacy codebase, as you point out (I don't know if anyone who's still on the team _really_ did any work in /src/md besides Karel and Mei-Chin many years go). The writer has many oddities to support EnC, metadata merging used by link.exe, and stuff like that. Avoiding duplication is better for long-term maintainability though.
Portable PDB format basically just defines a set of extra tables on top of the existing type system tables in the ECMA-335 file format. The IMetadataEmit interface is basically a set of methods that add entries into the individual tables ("add new type", "add new method", "add new field", etc.). AFAIK the result of the Save() method __is__ what a standalone Portable PDB is (a "naked" metadata stream).
ILAsm already uses IMetadataEmit to generate the metadata. It then wraps the "naked" metadata stream in a PE file with an ICeeFileGen.
I would look into just extending IMetadataEmit (adding IMetadataEmit3?) with methods that add entries into the new tables ("add new document", "add new method debug information", etc.).
ILAsm can then use it to emit the debug info: for the embedded PDB case, we would just add debug table entries into the existing dispenser and the saving of that would naturally fall out. For the sidecar PDB case, ILAsm would grab a vanilla dispenser, use it, and save the naked stream by calling Save() on the dispenser.
for the embedded PDB case, we would just add debug table entries into the existing dispenser and the saving of that would naturally fall out
As far as I know the embedded PDB is discrete from any metadata the IL assembly happens to have (ie the resulting PE file has two metadata headers inside somewhere). I was assuming that we'd need a second instance of an emit interface if we went this route, one for the standard assembly metadata and a 2nd one for the portable PDB metadata. Is that what you were suggesting and I just misunderstood, or were you envisioning a scheme where a single emitter accumulates data for all tables and it understands which tables go into which blob? Thanks!
I spoke with @noahfalk offline, and I'm ok with whichever native code plan we end up with, as the trade-offs get worked out.
As far as I know the embedded PDB is discrete from any metadata the IL assembly happens to have
I only know what's in the spec. I assumed that since it says "metadata can (but doesn’t need to) be stored in the same metadata section of the PE/COFF file as the type system metadata", and the spec doesn't specify how we would find it in a PE file, it would be in the same streams as the type system tables. Looking at things in a hex editor, it doesn't appear to be the case (the contents of the #~ stream where the tables are stored is identical between -debug:portable and -debug:embedded). I can't find the other #~ stream (there's only one occurrence of this literal in the file).
I guess I'll defer to @tmat on that.
Portable PDB metadata tables are designed so that they can be stored in the same metadata stream as type system metadata. But it turned out that it's better to store them separately, so Embedded PDBs don't use that capability. Instead, Embedded PDBs are stored compressed in debug directory. https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/specs/PE-COFF.md#embedded-portable-pdb-debug-directory-entry-type-17
But it turned out that it's better to store them separately
If we don't anticipate they'll be stored into the type system tables in the future, maybe creating IMetaDataEmit3 on the existing emitter is not needed and this can be in a new class that looks a lot like the existing metadata emitter. I guess it can be a judgement call. I don't have strong opinions.
Ilasm can now generate PDB in portable PDB format #37702
This feature is available through command line arguments: -debug -pdbfmt=portable
@ivanbasov awesome work! would it make sense to use -debug:portable instead of a separate argument to be consistent with csc/roslyn?
@akoeplinger thank you.
I agree it would be nice to have consistency across the tools, but I have introduced a separate argument in order to avoid ambiguity with existing ilasm -debug options:
-debug:opt
-debug:impl
So the question is not about whether -pdbfmt and -debug options can be merged but whether it will be clear enough for ilasm users.
@noahfalk what do you think?
So the question is not about whether -pdbfmt and -debug options can be merged but whether it will be clear enough for ilasm users.
@ivanpovazan I would agree that taking inspiration from the csc options is appropriate here.
-debug[+|-] Emit debugging information
-debug:{full|pdbonly|portable|embedded}
Specify debugging type ('full' is default,
'portable' is a cross-platform format,
'embedded' is a cross-platform format embedded into
the target .dll or .exe)
Since the debug flag appears to have a predefined set of options for ilasm and I personally wouldn't want to deal with the matrix of multiple argument, I propose slightly modifying your suggestion to align with existing ilasm syntax and acknowledge csc convention.
-debugfmt={full|portable}
At this point we can support full and portable. In the future we can add embedded as the need arises.
I'm fine with the approach @AaronRobinsonMSFT suggested above. The thing I care about most is that we don't break any CLI syntax for scripts/build projects already succesfully using ilasm. Having new syntax that is understandable/consistent with csc is certainly nice as well. I don't fret about it too much because I assume there are a relatively small set of people in the world who will need to format ad-hoc ilasm commands.
Most helpful comment
@ivanpovazan I would agree that taking inspiration from the
cscoptions is appropriate here.Since the
debugflag appears to have a predefined set of options forilasmand I personally wouldn't want to deal with the matrix of multiple argument, I propose slightly modifying your suggestion to align with existingilasmsyntax and acknowledgecscconvention.-debugfmt={full|portable}At this point we can support
fullandportable. In the future we can addembeddedas the need arises.