Hi! I'm developing a programming language of my own using the expression tree API and want to debug programs written in Expresso (which is the name of the programming language) in VS Code.
To do that, I first need to generate PDB/MDB files, and I have read and heard that one can generate Portable PDB files by using APIs from the System.Reflection.Metadata namespace. But I just can't find any documentation about it, nor source code that apparently uses them.
So, is here the right place to ask it? If so, would you mind if I ask how to use the APIs?
I found the documentation at https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/specs/PortablePdb-Metadata.md to be extremely helpful. The APIs in System.Reflection.Metadata map pretty directly to that low-level structure.
As far as finding real-world code that uses these APIs, this was a useful reference at times since I think it's the main driver for System.Reflection.Metadata: http://source.roslyn.io/#System.Reflection.Metadata/System/Reflection/Metadata/Ecma335/MetadataBuilder.cs,91bf369f98df83a7,references
I found the documentation at https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/specs/PortablePdb-Metadata.md to be extremely helpful.
I also found the documentation about the format of the portable PDB, which will indeed be helpful if I decide to implement the feature by myself.
As far as finding real-world code that uses these APIs, this was a useful reference at times since I think it's the main driver for System.Reflection.Metadata:
Hmm, when I said how to use the APIs I meant public-facing APIs that users could call and didn't need to know about how it does something. So I was expecting you were telling me about the public-facing APIs. Sorry about that.
Are there any APIs that I can call even though I don't know how they go about generating portable PDB files?
Anyway, thank you for telling me Roslyn uses System.Reflection.Metadata, because I didn't notice that. I will look through it for other places that use it.
Hmm, when I said how to use the APIs I meant public-facing APIs that users could call and didn't need to know about how it does something. So I was expecting you were telling me about the public-facing APIs. Sorry about that.
These are public-facing APIs and critically useful ones at that. I've used them in a number of projects. Personally, I'm scratching my head at the suggestion that APIs could exist that would allow you to build PDBs for your custom language without knowing what PDBs are and how they work—some kind of abstraction over kinds of PDB?—but I don't speak for the corefx team, so let's wait to hear from them.
These are public-facing APIs and critically useful ones at that. I've used them in a number of projects.
Oh, I see. I'll try that out!
Personally, I'm scratching my head at the suggestion that APIs could exist that would allow you to build PDBs for your custom language without knowing what PDBs are and how they work—some kind of abstraction over kinds of PDB?
Ah, I'm feeling that I misunderstand your point. Did you mean "Use the MetadataBuilder class. It's the public-facing API"? I thought it contained public-facing APIs.
By the way, I just thought it would be wonderful to have API like the ones in Mono.Cecil with which you write something like the following to read and write back PDB files.
var writer_provider = new Mono.Cecil.Cil.PortablePdbWriterProvider();
var asm_resolver = new Mono.Cecil.DefaultAssemblyResolver();
asm_resolver.AddSearchDirectory(options.OutputPath);
var asm = Mono.Cecil.AssemblyDefinition.ReadAssembly(
    file_path, new Mono.Cecil.ReaderParameters{
        AssemblyResolver = asm_resolver
    }
);
asm.Write(GetAssemblyFilePath(ast), new Mono.Cecil.WriterParameters{
    WriteSymbols = true,
    SymbolWriterProvider = writer_provider
});
And I expected System.Reflection.Metadata has similar APIs as well.
Ah, I'm feeling that I misunderstand your point. Did you mean "Use the MetadataBuilder class. It's the public-facing API"? I thought it contained public-facing APIs.
I'm sorry I didn't understand you right away! Any public (or protected) surface area is externally visible to other assemblies, and thus this surface area is considered an API since it performs all the functions of an API. External assemblies bind to it as the interface (i.e. protocol, gateway) to specific functionality.
Therefore, copying MetadataBuilder.cs into your source code would be an example of using the implementation while managing to avoid using the API. Coding directly against the MetadataBuilder class in System.Reflection.Metadata.dll is an example of using the API.
If Mono.Cecil works for you, great! It parses everything eagerly into a domain model in memory, so for my projects I have a strong preference for S.R.Metadata which is low-level. I think of it like the difference between XmlDocument and XmlReader/Writer. If I'm writing a tool that handles IL, performance has typically happened to be at a premium or else I'm introducing lag into the inner-loop debugging experience. Your mileage may certainly vary!
If Mono.Cecil works for you, great!
Unfortunately, it doesn't work for me because it reads an existing PDB file, modifies it and writes it back. I need some library that actually creates PDB files. To tell the truth, I can already debug my programs written in my original programming language in VS on Windows, but I want to debug them in VS Code and I want to do that on Mac as well, so I'm here to ask you.
Therefore, copying MetadataBuilder.cs into your source code would be an example of using the implementation while managing to avoid using the API. Coding directly against the MetadataBuilder class in System.Reflection.Metadata.dll is an example of using the API.
So I need to use the MetadataBuilder class? But how? Because the methods you just showed me are internal and private, I'm afraid I can't use them.
And I want to know how to generate portable PDB files using System.Reflection.Metadata. The MetadataBuilder class is for building metadata which is part of portable PDB, isn't it? So would you mind if I ask about that?
It's almost 1 AM in Japan, so I'm going to bed, and therefore it will take a while to reply back next.
So I need to use the MetadataBuilder class? But how? Because the methods you just showed me are internal and private, I'm afraid I can't use them.
I'm sorry! I assumed you would have looked at MetadataBuilder in intellisense and seen all the public methods. Here's its public API: https://docs.microsoft.com/dotnet/api/system.reflection.metadata.ecma335.metadatabuilder
My original link lists internal Roslyn methods in the sidebar which demonstrate the use of MetadataBuilder, such as FullMetadataWriter's methods. FullMetadataWriter is specific to the internals of the Roslyn compiler though, so it's only an example. You'd still use the public members of MetadataBuilder directly.
Besides this link, here's another one demonstrating another way MetadataBuilder can be used: https://github.com/dotnet/symreader-converter/blob/420f4bfaaf7e423c537d88db2b5f65ab4f3c7b24/src/Microsoft.DiaSymReader.Converter/PdbConverterWindowsToPortable.cs#L65
As a community member I'm happy to answer questions about MetadataBuilder, but like I said, https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/specs/PortablePdb-Metadata.md is excellent. It answered all my questions.
Sleep is good. I say never feel apologetic about checking out of an issue for as long as you need. It's expected. Asynchronous communication for the win!
I assumed you would have looked at MetadataBuilder in intellisense and seen all the public methods. Here's its public API:
OK, I'll try that out! Thank you very much!
My original link lists internal Roslyn methods in the sidebar which demonstrate the use of MetadataBuilder, such as FullMetadataWriter's methods.
Besides this link, here's another one demonstrating another way MetadataBuilder can be used:
As a community member I'm happy to answer questions about MetadataBuilder, but like I said, https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/specs/PortablePdb-Metadata.md is excellent.
OK, then I'll use them as references.
Asynchronous communication for the win!
Then we must be sure to be synchronized like we are doing now, haha ;- )
By the way, there is a PortablePdbBuilder class in the System.Reflection.Metadata namespace and I'm just wondering isn't it the one I'm supposed to use when I generate a portable PDB file?
Oh, also I'm wondering whether it will get along with (maybe bad English?) or fit with the expression tree. Because it's tedious to repeat same things like the expression tree generates metadata and then I also have to generate the same metadata to generate portable PDB files using System.Reflection.Metadata. So I hope it will fit into it.
PortablePdbBuilder is a final step after you finish populating the MetadataBuilder. If you look at what you do with it, PortablePdbBuilder is just acting as a container formatter. You won't get very far unless you use PortablePdbBuilder and MetadataBuilder together.
Here's where API documentation for S.R.Metadata would save time for you and me. I can infer how to use it based on source.roslyn.io, but that can take time and doesn't guarantee a full picture. @karelz Has the team discussed putting out reference docs and guides for libraries like S.R.Metadata?
get along with
Perfectly understandable, not bad English. We typically use this phrase when talking about people, so it anthropomorphizes the subject if the subject is not a person. It's unusual enough that it might communicate a shade of playfulness. (I'm not a particular authority on English, so take this with a grain of salt.) If I wanted to stand out a bit less, I might have said something similar to, "whether it works well in conjunction with expression trees."
Because it's tedious to repeat same things like the expression tree generates metadata and then I also have to generate the same metadata to generate portable PDB files using System.Reflection.Metadata. So I hope it will fit into it.
I'm not familiar with generating metadata from expression trees. Is this the way your compiler generates IL? It sounds like you want to merge or post-process the metadata generated by compiling expression trees?
I might recommend writing IL directly in a single pass instead of merging or post-processing, but then you'd probably have to stop relying on expression trees to generate the metadata for you. I'm interested in seeing a simple demonstration of this, just to make sure I know what I'm talking about here.
PortablePdbBuilder is a final step after you finish populating the MetadataBuilder. If you look at what you do with it, PortablePdbBuilder is just acting as a container formatter. You won't get very far unless you use PortablePdbBuilder and MetadataBuilder together.
I'll learn how to use the classes from source codes you just showed me.
Thank you so much!
Perfectly understandable, not bad English.
That eased me.
If I wanted to stand out a bit less, I might have said something similar to, "whether it works well in conjunction with expression trees."
I didn't come up with the phrase! I'll use that one!
I'm not familiar with generating metadata from expression trees. Is this the way your compiler generates IL? It sounds like you want to merge or post-process the metadata generated by compiling expression trees?
Currently they don't generate metadata, because I chose to do so.
I might recommend writing IL directly in a single pass instead of merging or post-processing, but then you'd probably have to stop relying on expression trees to generate the metadata for you.
Hmm, well, I wouldn't like to emit IL codes by myself because expression trees do save a lot of time and effort for me! But because I can write a source code that generates the same IL codes as the current implementation emits, it wouldn't be a big problem.
I'm interested in seeing a simple demonstration of this, just to make sure I know what I'm talking about here.
Then this would be a simple example that generates a PDB file(thus generates metadata) along with an assembly file using expression trees. (I think) it only works on Windows because the DebugInfoGenerator.CreatePdbGenerator method complains that "It's not available on this platform" on Mac, though. And it apparently generates a Windows-only PDB file so I can't use it.
@jnm2 Has the team discussed putting out reference docs and guides for libraries like S.R.Metadata?
I'll let area owners & experts comment on plans in the space - @tmat @nguerrera
@hazama-yuinyan I was not aware of DebugInfoGenerator.CreatePdbGenerator or Expression.DebugInfo, so you're ahead of me there! Looks like it's Windows PDBs only, and .NET Framework only. If you want to write portable PDBs you'll have to have some way of mapping the offsets into the generated IL back to your source text. Can you determine those offsets if you rely on compiling expressions to generate IL?
Has the team discussed putting out reference docs and guides for libraries like S.R.Metadata?
Unfortunately we do not have resources to do so at this point.
@tmat Seems like we can probably figure, if you're manipulating IL, you're smart enough to do without guides anyway. 😊
@jnm2 Something like that :) The source references you mentioned above are good places to look for samples.
If you want to write portable PDBs you'll have to have some way of mapping the offsets into the generated IL back to your source text. Can you determine those offsets if you rely on compiling expressions to generate IL?
@jnm2 I already placed DebugInfoExpressions which represent Sequence Points in the expression tree(for now only for variable declarations to make sure they are placed in right places), and I'm thinking that I could implement an ExpressionVisitor class which visits the expression tree and generates Sequence Points from the DebugInfoExpressions. I'm hoping that would work for me ;-)
@hazama-yuinyan Ultimately, will DebugInfoExpression lock you into Windows-only PDBs?
@jnm2 I won't know until I give it a try. But I guess (and am only guessing) DebugInfoGenerator is the class responsible for actually generating PDBs, so it will generate a portable PDB instead if I pass a DebugInfoGenerator that converts DebugInfoExpressions to Sequence Points and generates portable PDBs. This is what I meant in the previous reply. I'll look at the source code to figure out what it does.
@jnm2 Could I ask a question about the portable PDB format?
Skip below if you refuse to answer it.
I think I'm almost there, the first step, which means a portable PDB file that only contains a Document table and MethodDebugInformation table.
I'm done with serializing sequence points. But I have no idea what the typeSystemRowCounts parameter of the PortablePdbBuilder class's constructor, that is, "TypeSystemTableRows" in the portable PDB format, is. Yes, the specification says something about it, but I can't understand it. What is a type system metadata table? Why does it claim that the size of the array must be 64? After reviewing the specification again, I realized that 64 comes from the size of the ReferencedTypeSystemTables field, but the former question remains unanswered.
When I saw the field and the explanation, I first thought that it must represent the number of types defined in the assembly that I emit, so I passed an immutable array of {1} (an array with 1 item, which is 1). However, as I said, it complains about it being a 1-item array.
So what does typeSystemRowCounts represent?
Certainly!
These are the only two places where Roslyn specifies a typeSystemRowCounts for the PortablePdbBuilder constructor:
http://source.roslyn.io/#Microsoft.CodeAnalysis/PEWriter/MetadataWriter.cs,1746
http://source.roslyn.io/#Microsoft.CodeAnalysis/PEWriter/PeWriter.cs,212
Just passing metadataBuilder.GetRowCounts() has worked for me in the past.
So what does typeSystemRowCounts represent?
The size of each table in the program debug database. The serializer needs to know ahead of time exactly how much space to allocate.
It works for me too!
Thanks a lot!
But the resulting PDB doesn't. Still needs some trial and error, sigh ;(
What are you doing to test the resulting PDB to see if it works?
I'm trying to actually debug my custom program(I mean the one written in my own programming language) in VS Code because it's the goal and I think it's the easiest way to see if it works. Although I realized that I can debug it in Visual Studio on Windows using a Windows-only PDB, I don't know whether it works on Mac as well.
I found that the method I expected to be called wasn't actually called.
Finally I realized that Mono's LambdaExpression.CompileToMethod didn't accept the second parameter and it just ignored a DebugInfoGenerator. It's very disappointing ;-(
And I asked if they support it and they answered not soon. That leaves me the options to implement it by myself or to customize Mono and distribute it instead, sigh ;=(
Oh, I forgot to mention that metadataBuilder.GetRowCounts doesn't work on Windows for me. It complains that the row count for #49 must be zero. If I pass a 64-item array whose items are all 0 to it, that part works, but the resulting PDB doesn't work.
Hi! It's been a while! I've finally rewritten code generation using Reflection.Emit.* APIs, so now I can emit SequencePoints by myself. During that process, however, I'm lost in using MetadataBuilder.
It complains that the row count for #49 must be zero.
It seems that the error is caused because I don't add MethodDefinitions, TypeDefinitions, Modules and Assemblies to MetadataBuilder. So I tried to add them, but I don't know what I should pass to the APIs.
For example, the signature of MetadataBuilder.AddMethodDefinition looks like (MethodAttributes attributes, MethodImplAttributes implAttributes, StringHandle name, BlobHandle signature, int bodyOffset, ParameterHandle parameterList) but I have no idea of what I should pass for signature. As the parameter name says, it should be something like generic type parameters plus the return type but what format should it be in?
And the source codes you showed me before are about reading existing PDB files and converting them to another format and therefore they don't use APIs like MetadataBuilder.AddMethodDefinition.
I found that the MetadataBuilder.AddMethodDefinition method definition contains XML comments, but they say almost nothing about what signature is.
So the question is "is there any documentation about them, or source code that uses them?"
It's really really hard to figure out how to use them from what I have now, so please help me =-(
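For readers with the same question: as far as I can tell, signature is a handle to a blob holding the method signature in the ECMA-335 MethodDefSig encoding (calling convention, parameter count, return type, then the parameter types). Below is a minimal sketch, assuming BlobEncoder from System.Reflection.Metadata.Ecma335 is used to produce that blob. The method shape int Add(int, string) and the class name SignatureSketch are made up for illustration; nobody in this thread posted this code.

```csharp
using System;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;

public static class SignatureSketch
{
    // Encodes the signature of a hypothetical static method `int Add(int, string)`.
    public static byte[] EncodeAddSignature()
    {
        var blob = new BlobBuilder();
        new BlobEncoder(blob)
            .MethodSignature() // defaults: default calling convention, non-generic, static
            .Parameters(
                2,
                returnType => returnType.Type().Int32(),
                parameters =>
                {
                    parameters.AddParameter().Type().Int32();
                    parameters.AddParameter().Type().String();
                });
        return blob.ToArray();
    }

    public static void Main()
    {
        var metadata = new MetadataBuilder();
        byte[] signatureBytes = EncodeAddSignature();

        // This handle is what would be passed as the `signature` argument
        // of MetadataBuilder.AddMethodDefinition.
        BlobHandle signature = metadata.GetOrAddBlob(signatureBytes);

        Console.WriteLine(BitConverter.ToString(signatureBytes)); // 00-02-08-08-0E
        Console.WriteLine(signature.IsNil);                       // False
    }
}
```

The five bytes are the calling convention (0x00), the parameter count (2), and the element types for int, int and string.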
@hazama-yuinyan Can you refresh my memory—are you rewriting an existing PDB stream or generating one from scratch?
@jnm2 The latter. My compiler generates and creates programs from scratch.
It would appear that Roslyn does not use MetadataBuilder.AddMethodDefinition at all, unless something's gone wrong with the source browser: http://source.roslyn.io/#System.Reflection.Metadata/System/Reflection/Metadata/Ecma335/MetadataBuilder.Tables.cs,d5f891e17bb58177,references
It complains that the row count for #49 must be zero.
It seems that the error is caused because I don't add MethodDefinitions, TypeDefinitions, Modules and Assemblies to MetadataBuilder.
Which table is #49 and why would adding method definitions etc cause the row count to become zero since it is not zero currently?
Edit: Ah, here it is, TableIndex.MethodDebugInformation. The message is saying that the MethodDebugInformation table must be empty.
@hazama-yuinyan What's the actual line that's throwing the exception? I want to see it at source.roslyn.io.
@jnm2 Now the error message is slightly different. It says "Row count must be zero for table #48." and #48 is for Document table. Here is the stack trace when the exception is thrown.
at System.Reflection.Metadata.Ecma335.PortablePdbBuilder.ValidateTypeSystemRowCounts (System.Collections.Immutable.ImmutableArray`1[T] typeSystemRowCounts) [0x0009c] in <4bd432d0a09845d3867347e567b864a4>:0
at System.Reflection.Metadata.Ecma335.PortablePdbBuilder..ctor (System.Reflection.Metadata.Ecma335.MetadataBuilder tablesAndHeaps, System.Collections.Immutable.ImmutableArray`1[T] typeSystemRowCounts, System.Reflection.Metadata.MethodDefinitionHandle entryPoint, System.Func`2[T,TResult] idProvider) [0x00013] in <4bd432d0a09845d3867347e567b864a4>:0
at Expresso.CodeGen.PortablePDBGenerator.WriteToFile (System.String filePath) [0x0005e] in Documents/MonodevelopSolutions/Expresso/Expresso/CodeGen/PortablePdbGenerator.cs:70
at Expresso.CodeGen.CodeGenerator.VisitAst (Expresso.Ast.ExpressoAst ast, Expresso.CodeGen.CSharpEmitterContext context) [0x006eb] in Documents/MonodevelopSolutions/Expresso/Expresso/CodeGen/CodeGenerator.cs:386
at Expresso.Ast.ExpressoAst.AcceptWalker[TResult,TData] (Expresso.Ast.IAstWalker`2[TData,TResult] walker, TData data) [0x00001] in Documents/MonodevelopSolutions/Expresso/Expresso/Ast/ExpressoAst.cs:100
at Expresso.Test.EmitterTests.SimpleLiterals () [0x0006a] in Documents/MonodevelopSolutions/Expresso/ExpressoTest/EmitterTests.cs:36
at (wrapper managed-to-native) System.Reflection.MonoMethod.InternalInvoke(System.Reflection.MonoMethod,object,object[],System.Exception&)
at System.Reflection.MonoMethod.Invoke (System.Object obj, System.Reflection.BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) [0x00032] in /Users/builder/jenkins/workspace/build-package-osx-mono/2017-12/external/bockbuild/builds/mono-x64/mcs/class/corlib/System.Reflection/MonoMethod.cs:305
I called MetadataBuilder.AddDocument. I think that's why the compiler complains, but I'm just following what my reference site(which, by the way, I guess you showed me before) says.
If not calling MetadataBuilder.AddDocument is the right way for me to take, then I just wonder what I should do to emit PDB files for my programs. Hmm
To make stack traces readable on GitHub, I usually use code fences:
````markdown
Stack trace lines
````
You're running into this: http://source.roslyn.io/#System.Reflection.Metadata/System/Reflection/Metadata/Ecma335/PortablePdbBuilder.cs,70d6e0b339ef60ca
TableMask.ValidPortablePdbExternalTables = TypeSystemTables & ~PtrTables & ~EncTables
This is what Roslyn is doing:
Something's broken with the source browser because this does indeed call into the same code you're using:
PortablePdbBuilder constructor which calls ValidateTypeSystemRowCounts which throws if you fill in any tables except for TableMask.ValidPortablePdbExternalTables.
@hazama-yuinyan Here's a question: are you trying to put the debug tables in the same metadata blob as the IL? I think you're supposed to either keep it in a separate PDB file or else serialize all the debug metadata into a blob and store it in the PE debug directory like this.
To make stack traces readable on GitHub, I usually use code fences:
Thank you.
You're running into this:
Does that mean that I should not have emitted DebugTables even though the spec says you will populate DebugTables?
Hmm, could it be that if I will populate TypeSystemTables then PortablePdbBuilder will populate DebugTables?
@jnm2 I'm trying to put the debug tables in a separate PDB file because I couldn't take the former path. I'm using System.Reflection.Emit.ILGenerator for emitting IL codes. I think that the ILGenerator doesn't support the feature to emit IL codes and the debug tables all in one file.
Okay, cool. I have no clues besides reverse-engineering https://github.com/dotnet/roslyn/blob/c4189de08b943f87d9f03d45ef01e615800e359d/src/Compilers/Core/Portable/PEWriter/PeWriter.cs#L211-L236. Maybe clone the Roslyn repository and put breakpoints there and see how it builds up the PDB content to be written to the PDB stream?
Oh, hey, I'm blind. That parameter is named typeSystemRowCounts, so of course it's expecting to have only type system info (no debug info). So you can build the PDB debug info, but first, you must have the non-debug typeSystemRowCounts calculated so that it can be referenced while adding debug info.
That error message is saying, "Fundamentally, what you're doing doesn't make sense." (I.e., referencing already-built debug info while trying to build debug info.)
I'm embarrassed because I just remembered having this flash of inspiration about the name typeSystemRowCounts last year.
I'm embarrassed because I just remembered having this flash of inspiration about the name typeSystemRowCounts last year.
Why embarrassed? It makes sense and it's great, isn't it?
OK, build the type infos first and then the debug info.
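The order described above (type-system metadata first, debug metadata second) can be sketched roughly as follows. This is only an illustration under assumptions: it uses two separate MetadataBuilder instances so that the row counts passed to PortablePdbBuilder never include Document or MethodDebugInformation rows, and the names PdbSketch, "main" and "main.exs" are made up. A real compiler would take the row counts from whatever actually wrote the assembly, and would pass real sequence point blobs.

```csharp
using System;
using System.Collections.Immutable;
using System.Reflection;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;

public static class PdbSketch
{
    public static byte[] BuildPdb()
    {
        // 1. Type-system metadata: stands in for what ends up in the DLL/EXE.
        var typeSystem = new MetadataBuilder();
        var signature = new BlobBuilder();
        new BlobEncoder(signature).MethodSignature()
            .Parameters(0, returnType => returnType.Void(), parameters => { });
        typeSystem.AddMethodDefinition(
            MethodAttributes.Public | MethodAttributes.Static,
            MethodImplAttributes.IL,
            typeSystem.GetOrAddString("main"),
            typeSystem.GetOrAddBlob(signature),
            bodyOffset: -1, // no body in this sketch
            parameterList: MetadataTokens.ParameterHandle(1));

        // Taken from the builder that holds no debug tables, so the
        // Document (#48) and MethodDebugInformation (#49) counts are zero.
        ImmutableArray<int> typeSystemRowCounts = typeSystem.GetRowCounts();

        // 2. Debug metadata lives in its own builder.
        var debug = new MetadataBuilder();
        DocumentHandle document = debug.AddDocument(
            debug.GetOrAddDocumentName("main.exs"),
            default(GuidHandle),  // hash algorithm omitted
            default(BlobHandle),  // hash omitted
            default(GuidHandle)); // language omitted
        // One MethodDebugInformation row per MethodDef row; a nil blob
        // stands in for the encoded sequence points.
        debug.AddMethodDebugInformation(document, default(BlobHandle));

        // 3. PortablePdbBuilder only formats the container.
        var pdb = new PortablePdbBuilder(
            debug, typeSystemRowCounts, default(MethodDefinitionHandle));
        var output = new BlobBuilder();
        pdb.Serialize(output);
        return output.ToArray();
    }

    public static void Main()
    {
        byte[] bytes = BuildPdb();
        Console.WriteLine(bytes.Length);
    }
}
```

Calling GetRowCounts() on the debug builder instead would reproduce the "Row count must be zero" exception discussed above.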
It would appear that Roslyn does not use MetadataBuilder.AddMethodDefinition at all, unless something's gone wrong with the source browser:
Hmm, I have no other options except to look into the source codes for MetadataBuilder and figure out how to use it?
Embarrassed because I forgot and it took me a while both times to notice the simple parameter name that gives it all away. :D I'm fine.
Hmm, I have no other options except to look into the source codes for MetadataBuilder and figure out how to use it?
That's pretty much what I understood @tmat to say: https://github.com/dotnet/corefx/issues/29122#issuecomment-382171969 It's not documented, so we have to rely on looking at other projects that use it. Its development is mainly driven by Roslyn, I'd imagine.
Embarrassed because I forgot and it took me a while both times to notice the simple parameter name that gives it all away. :D I'm fine.
That relieves me ;-)
That's pretty much what I understood @tmat to say: dotnet/corefx#29122 (comment) It's not documented, so we have to rely on looking at other projects that use it. Its development is mainly driven by Roslyn, I'd imagine.
OK, I see. I'll try that out. Would you help me out if I get into big walls while I'm figuring out how to use it?
@hazama-yuinyan Yes, I am happy to help!
@jnm2 Thank you very much! It's really heartening!
Hi! I've successfully emitted debug information but it doesn't work. I mean, I'm using VS for Windows to test if I can debug the resulting program, because it can debug any program that comes with either a portable PDB file or a Windows-only PDB file, but I can't with the portable PDB file that my compiler emits.
The file size is almost the same as the one that is converted from a Windows-only PDB file, so I guess I have emitted every information in order for you to debug it. But it just doesn't work. I have no idea.
Do you have any suggestions?
Does the PDB ID in the Portable PDB match the entry in the debug directory of the DLL?
You can see the IDs using MDV tool on both the DLL and the PDB:
https://dotnet.myget.org/feed/metadata-tools/package/nuget/mdv
@tmat I tried the tool and found that the one that is converted from Windows-only PDB has a section called Debug Directory, which I think is key to debugging. But I can't tell how I can emit it on my executable.
I'm using the standard AssemblyBuilder, ModuleBuilder and TypeBuilder classes to emit the executable and I'm wondering whether I can combine it with System.Reflection.Metadata. Am I on the right path? If so, how can I emit the debugging information on the executable?
Ah, I searched for PDB ID and Roslyn says MetadataReader, MetadataReader... He must be shy, I'm sure ;-)
@hazama-yuinyan Have you tried creating the PE debug directory using something like ModuleBuilder.DefineInitializedData?
Also see https://stackoverflow.com/questions/17995945/how-to-debug-dynamically-generated-method.
@jnm2 No. I haven't even heard of it.
I can emit the debug directory section with this method? But how?
Also see https://stackoverflow.com/questions/17995945/how-to-debug-dynamically-generated-method.
Already have read and done it.
@hazama-yuinyan I don't think Reflection.Emit lets you customize the PE debug directory.
@tmat @jnm2 You're right, tmat. I tried ModuleBuilder.DefineInitializedData with both "DebugDirectory" and "Debug Directory" names, but it does nothing on the debug directory section.
I'll look into Pdb2Pdb's source code that I used to convert a Windows-only PDB file to a Portable PDB file.
The Pdb2Pdb's source code also uses PEReader. Maybe I should stop using Reflection.Emit or maybe I should use classes like PEReader to emit only the debug directory section.
Hmm, I'd imagine it would literally read metadata from executable files so it's not exactly what I want to do, but then what should I do?
I don't know much about it besides mimicking Roslyn (which doesn't use Reflection.Emit). Either you'll have to find a way to manipulate Reflection.Emit to emit the debug directory in the first pass, or post-process the saved PE either manually or via System.Reflection.Metadata, or convert all the Reflection.Emit code to SR.Metadata. In any of these three options, you'll want to compare your output to what Roslyn and SR.Metadata write. Here's the format of the PE debug directory: https://docs.microsoft.com/windows/desktop/debug/pe-format#debug-directory-image-only
@jnm2 I think there is no way to manipulate Reflection.Emit to emit the debug directory, so I choose post-processing for now. If there is any problem, then I'll consider taking other paths.
So could you tell me how to emit the debug directory with SR.Metadata? Which class will I use?
I don't know much about it besides mimicking Roslyn (which doesn't use Reflection.Emit)
You said this, so even you don't know how to do that? Then I'll try to figure out...
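For anyone taking the "convert everything to SR.Metadata" route mentioned above, the class that seems relevant is DebugDirectoryBuilder in System.Reflection.PortableExecutable. The sketch below is hedged: it only shows how a CodeView entry carrying the PDB ID could be created. The ID here is random, whereas in practice it would be the BlobContentId returned by PortablePdbBuilder.Serialize, and 0x0100 is assumed to be the portable PDB format version. It does not apply to assemblies saved by Reflection.Emit.

```csharp
using System;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;

public static class DebugDirectorySketch
{
    // Builds a debug directory with one CodeView entry pointing at the PDB.
    public static DebugDirectoryBuilder Build(string pdbPath, BlobContentId pdbId)
    {
        var debugDirectory = new DebugDirectoryBuilder();
        // Assumption: 0x0100 is the portable PDB format version to record.
        debugDirectory.AddCodeViewEntry(pdbPath, pdbId, portablePdbVersion: 0x0100);
        return debugDirectory;
    }

    public static void Main()
    {
        // Placeholder ID; a real one comes from PortablePdbBuilder.Serialize
        // so that the DLL and the PDB agree on it.
        var pdbId = new BlobContentId(Guid.NewGuid(), 0x12345678);
        DebugDirectoryBuilder debugDirectory = Build("program.pdb", pdbId);
        Console.WriteLine(pdbId.Guid);
    }
}
```

The resulting builder would then be handed to the debugDirectoryBuilder parameter of the ManagedPEBuilder constructor when writing the PE file. This is the ID that the mdv tool shows on both the DLL and the PDB.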
I think I've successfully rewritten the compiler and the executable now has the debug directory section. But when executing the resulting program, it throws an exception saying "BadImageFormatException: Index not found". I think that it results from the target platform being x86 and executing the program on x64 machines, but I don't understand why it's happening because I don't change the target platform from the one that AssemblyBuilder emitted.
How do you think I can fix it?
OK, I got over this wall, but have gotten into another one. I can't somehow get a parameter handle correctly set. I think I did what the spec says, I mean, setting the first parameter handle if the method declares parameters and setting the next method's first parameter handle otherwise.
Maybe I'm misinterpreting setting the next method's first parameter? What does it mean in the first place? Isn't setting the default handle if the next method doesn't have parameters enough?
Sorry I've been tied up with other stuff.
I can't find the document, but IIRC, each row in the method table points to a range in the parameters table. If a method has parameters, the method row points to the first parameter in the parameters table. If it does not have parameters, it points to where its parameters would have started, but since it has none, it is in fact pointing to the parameter of the next method that has parameters.
It makes sense because in order to find out how many parameters a method has, you look at the next method row and see which ending row it's pointing at. It still needs to be pointing at the correct place or else it's not possible to determine how many parameters the previous method has (without potentially walking through the entire rest of the method table looking for a method that actually has parameters).
Sorry I've been tied up with other stuff.
Never mind! I also can do other stuff.
If it does not have parameters, it points to where its parameters would have started, but since it has none, it is in fact pointing to the parameter of the next method that has parameters.
I still don't get it.
What if the rest of the methods all don't have parameters? In other words, let's say we have the following methods.
method A: no parameters
method B: has parameters
method C: no parameters
method D: no parameters
What should the row of the method C point at? I thought it should point at what method B does, but it didn't work. I'm confused.
@hazama-yuinyan I think it should point past the end of the list then.
Using this pseudocode to show the general rule for decoding the table:
methodParamCount[i] = methods[i + 1].StartParameterIndex - methods[i].StartParameterIndex;
Therefore:
methods[i + 1].StartParameterIndex = methods[i].StartParameterIndex + methodParamCount[i];
Method table:
// Rule: (next row's StartParameterIndex) - (this row's StartParameterIndex) = (this row's parameter count)
M0: Name=A, StartParameterIndex=0 // 0 - 0 = 0, therefore M0 has 0 parameters
M1: Name=B, StartParameterIndex=0 // 1 - 0 = 1, therefore M1 has 1 parameter
M2: Name=C, StartParameterIndex=1 // 1 - 1 = 0, therefore M2 has 0 parameters
M3: Name=D, StartParameterIndex=1 // (parameter table count) - 1 = 0, therefore M3 has 0 parameters
Parameter table:
P1: Parameter for method B
Or think of maintaining a nextParameterIndex pointer which starts at 0. More pseudocode:
var nextParameterIndex = 0;
foreach (var method in methods)
{
    AddRowToMethodsTable(name: method.Name, startParameterIndex: nextParameterIndex);
    foreach (var parameter in method.Parameters)
    {
        AddRowToParametersTable(parameter);
        nextParameterIndex++;
    }
}
Hmm, I tried that approach, but mdv.exe reports that the last method has the parameter, not method B. I also tried getting the first parameter handle each time I hit a method definition, and that failed too. I'm completely lost...
My understanding may be wrong. Here's the XML docs:
/// <param name="parameterList">
/// If the method declares parameters in Params table the handle of the first one, otherwise the handle of the first parameter declared by the next method definition.
/// If no parameters are declared in the module, <see cref="MetadataTokens.ParameterHandle(int)"/>(1).
/// </param>
Does that help at all?
I'm not finding any documentation on this table. https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf is huge but maybe contains what we need.
@tmat You can probably spot what's wrong off the top of your head?
Does that help at all?
Unfortunately, no :(
I read that and understood that way. I'll take a look at the PDF.
@hazama-yuinyan Have you played with the indexes a bit to see if you can move around that last parameter?
For example, are the indexes 0- or 1-based? The sequence 0, 0, 1, 1 doesn't work, but what about 1, 1, 2, 2? or other permutations of 0, 1, and 2? You have to hit it eventually.
OK, I'll try that!
Gotcha! I made it!
But got into another problem again.
This time mdv.exe says "<bad metadata>" on the first method. But other methods seem fine. What's wrong with the first method?
What's the sequence you came up with?
What's mdv.exe and what does it mean when mdv.exe says "" on a method?
You can see the IDs using MDV tool on both the DLL and the PDB:
https://dotnet.myget.org/feed/metadata-tools/package/nuget/mdv
This is mdv.exe. It's a viewer of the metadata. And GitHub accidentally stripped off what it says.
Oh, got it. So what was the parameter list pointer value for each method in that scenario?
Actually the previous example is taken from the real program, so it should be 1, 1, 2, 2, 2, 2 (there are 6 methods in the real program).
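For anyone following along, here is a minimal sketch of why a 1-based sequence like this falls out of the XML docs quoted earlier. It is plain C# with no dependency on System.Reflection.Metadata; `ComputeStartRows` and `DecodeCounts` are hypothetical helper names, and the four methods A to D with one parameter on B are the example from this thread. In real code each start row would presumably be wrapped with `MetadataTokens.ParameterHandle(int)` before being passed as `parameterList`.

```csharp
using System;
using System.Collections.Generic;

public static class ParamListSketch
{
    // For each method, returns the 1-based Param-table row where its parameters start.
    // A method without parameters gets the row the NEXT declared parameter would occupy.
    public static int[] ComputeStartRows(IReadOnlyList<int> parameterCounts)
    {
        var startRows = new int[parameterCounts.Count];
        int nextRow = 1; // rows are 1-based; the docs also use 1 when no parameters exist at all
        for (int i = 0; i < parameterCounts.Count; i++)
        {
            startRows[i] = nextRow;
            nextRow += parameterCounts[i];
        }
        return startRows;
    }

    // Inverse direction: a reader recovers each count from neighbouring start rows.
    // The last method is bounded by (Param table row count + 1).
    public static int[] DecodeCounts(IReadOnlyList<int> startRows, int paramTableRowCount)
    {
        var counts = new int[startRows.Count];
        for (int i = 0; i < startRows.Count; i++)
        {
            int end = i + 1 < startRows.Count ? startRows[i + 1] : paramTableRowCount + 1;
            counts[i] = end - startRows[i];
        }
        return counts;
    }

    public static void Main()
    {
        // A: 0 parameters, B: 1 parameter, C: 0, D: 0
        int[] rows = ComputeStartRows(new[] { 0, 1, 0, 0 });
        Console.WriteLine(string.Join(", ", rows));                  // 1, 1, 2, 2
        Console.WriteLine(string.Join(", ", DecodeCounts(rows, 1))); // 0, 1, 0, 0
    }
}
```

Note that C and D point one past the end of the Param table (row 2 when the table has a single row), which matches the "point past the end of the list" suggestion above.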
What happens if you generate only the first method by itself and use 1? What about the first two methods and 1, 1? Still the bad metadata error?
Well, I can't remove any of the methods because they are all necessary, so I removed all the parameters instead, but it still says <bad metadata>. I took a look at mdv.exe's source code and realized that a BadImageFormatException causes it.
The exception says "Index not found", but I think it results from the number of MethodDebugInformation rows not matching the number of MethodDefinition rows. I've already emitted enough rows for the MethodDebugInformation table but I still get the same error. Why is it happening? Am I still missing something?
I have no idea. I've never used mdv.exe. I guess it's possible it has a bug; either way, debugging the source code of mdv might be illuminating.
I investigated mdv.exe and found out that in mdv.exe the exception says "Invalid relative virtual address" on ".ctor". I'm setting them as the reader reads so I'm wondering why it says so, but I will try to modify it. Do you know why?
Looking at https://github.com/dotnet/metadata-tools/blob/master/src/mdv/Mdv.cs, RelativeVirtualAddress comes up twice. It looks like 0 means the method definition has no method body, and any other number is a relative virtual address of the method body (IL). https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf has a lot of info on this.
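A small sketch of that convention, in case it helps: it walks the method definitions of an assembly and only asks for a method body when the RVA is non-zero. The `Shape` class and the `CountBodies` helper are made up for illustration; the reader calls (`GetMetadataReader`, `GetMethodDefinition`, `RelativeVirtualAddress`, `GetMethodBody`) are the System.Reflection.Metadata ones. It assumes the running assembly exists as a file on disk, which is not the case for single-file publishes.

```csharp
using System;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;

public abstract class Shape
{
    public abstract double Area();       // abstract: no IL body, so its RVA is 0
    public double Twice() => Area() * 2; // has IL, so its RVA is non-zero
}

public static class RvaSketch
{
    // Counts methods with and without an IL body in the given assembly file.
    public static (int Bodies, int NoBodies) CountBodies(string assemblyPath)
    {
        using var stream = File.OpenRead(assemblyPath);
        using var peReader = new PEReader(stream);
        MetadataReader md = peReader.GetMetadataReader();

        int bodies = 0, noBodies = 0;
        foreach (MethodDefinitionHandle handle in md.MethodDefinitions)
        {
            MethodDefinition method = md.GetMethodDefinition(handle);
            int rva = method.RelativeVirtualAddress;
            if (rva == 0)
            {
                noBodies++; // abstract, extern, etc.: nothing to read
                continue;
            }

            // Throws BadImageFormatException if the RVA does not point at a valid method header.
            MethodBodyBlock body = peReader.GetMethodBody(rva);
            Console.WriteLine($"{md.GetString(method.Name)}: RVA=0x{rva:X}, IL bytes={body.GetILReader().Length}");
            bodies++;
        }
        return (bodies, noBodies);
    }

    public static void Main()
    {
        var (bodies, noBodies) = CountBodies(typeof(RvaSketch).Assembly.Location);
        Console.WriteLine($"{bodies} with a body, {noBodies} without");
    }
}
```

Running the same loop over the rebuilt assembly should show which method's RVA no longer lands on a valid header.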
Oh, I guess I should have written which method throws the exception. It's generation.PEReaderOpt.GetMethodBody at line 316 in mdv.cs. And as I edited the previous post, I'm setting the RVAs to the values the reader reads (so they are the same as what AssemblyBuilder emitted), but this exception still happens. Because the assembly AssemblyBuilder emitted executes without any problems, I wonder why it doesn't work when it's created by ManagedPEBuilder.
I guess I'm not setting the IL stream correctly. I mean, it's not set at the relative address. Can't you find how to set it correctly somewhere?
I'm sorry, I don't know. What's the minimal Ref.Emit code you can come up with that produces a binary with which mdv.exe has this issue?
Er, I got over this problem (sorry for not reporting it) but have run into yet another one.
Actually, it's not the assembly which was emitted with Ref.Emit that is the problem but the assembly which was emitted with SR.Metadata. So I assume I'm just missing some steps to produce an assembly when I emit the assembly again with SR.Metadata.
For the new problem, I'm investigating it. mdv.exe this time says "<bad metadata>" in the IL stream. And the BadImageFormatException says "Invalid method header: 0xAB 0x04".
Ah, can you tell why?
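Not an answer, but a sketch of how those two bytes can be read, based on the method header layout in ECMA-335 (II.25.4) as I understand it: the low two bits of the first byte select a tiny (0x2) or fat (0x3) header, and for a fat header the top four bits of the second byte hold the header size in 4-byte units, which is expected to be 3. `Classify` is a hypothetical helper written for illustration, not part of System.Reflection.Metadata.

```csharp
using System;

public static class MethodHeaderSketch
{
    // Interprets the first two bytes found at a method body's RVA.
    public static string Classify(byte first, byte second)
    {
        int format = first & 0x03; // low two bits: 0x2 = tiny, 0x3 = fat
        if (format == 0x02)
            return $"tiny header, code size {first >> 2}";
        if (format != 0x03)
            return "invalid: unknown header format";

        int sizeInDwords = second >> 4; // top 4 bits of the 16-bit flags/size word
        if (sizeInDwords != 3)
            return $"invalid: fat header size {sizeInDwords}, expected 3";
        return "fat header";
    }

    public static void Main()
    {
        Console.WriteLine(Classify(0xAB, 0x04)); // invalid: fat header size 0, expected 3
        Console.WriteLine(Classify(0x0A, 0x00)); // tiny header, code size 2
        Console.WriteLine(Classify(0x13, 0x30)); // fat header
    }
}
```

Under that reading, 0xAB 0x04 looks like a fat header whose size field is 0, which would suggest the RVA is pointing at bytes that are not the start of a method body.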
Fyi with Markdown you have three options to keep it from interpreting < as an HTML tag.
Idiomatic markdown:
`\<bad metadata>` → <bad metadata>
Code (when appropriate):
`<bad metadata>` → <bad metadata>
HTML:
`&lt;bad metadata>` → <bad metadata>
Ah, can you tell why?
I don't have experience here either. I'm probably not going to be much help at this point unless there's repro code I can poke at.
Fyi with Markdown you have three options to keep it from interpreting < as an HTML tag.
OK, I forgot to escape it.
I'm probably not going to be much help at this point unless there's repro code I can poke at.
Then, I can push my repository to GitHub with the current code, but it's a little bit complex to clone it and get it to work. Would you mind that? If no, then I'll consider that.
That might help, though I was thinking more along the lines of: how small can you strip down the code and still reproduce the problem?
Let me see...
Only my PEBuilder wrapper class might reproduce the problem but that still needs some other classes.
So I thought it would be easier for you to clone the whole repository and test it than for me and you to figure out which classes are needed and include it in the reproduction code.
Well sure, that's easier for you 😆 but I've always seen it to be useful for one's own sake to create a reproduction with as little code as possible, too. Using a bare PEBuilder if possible. Half the time you find the issue while doing so.
Oops, it's a driver class rather than a wrapper. The class drives ManagedPEBuilder, MetadataBuilder and similar classes and emits a PE with the debug directory section.
OK, I'll figure out which classes are needed to reproduce the problem ;-)
I worked on it and found out that nested classes appear twice in the resulting assembly (one is nested and the other lives in the global namespace). I also came across this XML comment:
/// <remarks>
/// Entries must be added in the same order as the corresponding nested type definitions.
/// </remarks>
So I tried to emit nested classes first but it failed because of the relative virtual address. Because mdv.exe doesn't say "
Plus, I noticed that the constructor of the nested class is trimmed. Its method body is empty after it rebuilds the assembly.
Same for another global class. The constructor becomes empty after it rebuilds it.
Do you need the source code? Then I can copy and paste the (possibly) minimal reproduction code.
OK, I've resolved the constructor problem. And because it's not happening in the real compiler, I could ignore the nested class problem, too. Sorry to disturb you.
You are not disturbing me, don't worry. I'm replying when I have time and knowledge.
Sorry to disturb you again. Although the assembly works now, ASCII strings become empty after it rebuilds the assembly and it segfaults because of UTF-8 strings. After debugging it, I found that correct strings and empty strings are added as user strings. Everything else seems to work fine.
Do you have any suggestions?
You are not disturbing me, don't worry. I'm replying when I have time and knowledge.
Thanks!
Did Reflection.Emit generate the UTF-8 strings? I didn't realize they were considered legal by the CLR.
I guess I made a mistake. It's encoded in UTF-16, though I didn't make sure I was right.
I found out that the offsets for user strings are somehow moved and therefore the correspondence between ids and the real strings will be broken and that user strings seem to be automatically added. I don't understand how that happens.
Maybe I was wrong. User strings aren't automatically added.
Should I inspect the method bodies and add the user strings?
I don't know. Are you translating method bodies or just treating them as blobs?
Just treating them as blobs.
I found Roslyn does that, so maybe I should too.
I think the method bodies contain handles to table rows, so they need to be translated unless you can guarantee that the row indexes in those tables don't change.
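To make that concrete for user strings, here is a rough sketch of what translating a single `ldstr` operand could look like. It assumes the usual token layout (high byte 0x70 for the #US heap, low 24 bits for the heap offset) and the `ldstr` opcode 0x72. `RemapLdstr` and the offset map are hypothetical; a real translator would also have to walk every instruction with its correct operand size and remap the other token kinds (methods, fields, types, and so on).

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

public static class UserStringTokenSketch
{
    const byte Ldstr = 0x72;           // opcode of ldstr
    const uint UserStringTable = 0x70; // high byte of a user-string token

    // Rewrites the operand of the ldstr at ilOffset so that it points at the
    // string's offset in the NEW #US heap.
    public static void RemapLdstr(Span<byte> il, int ilOffset, IReadOnlyDictionary<int, int> oldToNewHeapOffset)
    {
        if (il[ilOffset] != Ldstr)
            throw new ArgumentException("Not an ldstr instruction.", nameof(ilOffset));

        Span<byte> operand = il.Slice(ilOffset + 1, 4);
        uint token = BinaryPrimitives.ReadUInt32LittleEndian(operand);
        if (token >> 24 != UserStringTable)
            throw new InvalidOperationException("Operand is not a user-string token.");

        int oldOffset = (int)(token & 0x00FFFFFF);
        int newOffset = oldToNewHeapOffset[oldOffset];
        BinaryPrimitives.WriteUInt32LittleEndian(operand, (UserStringTable << 24) | (uint)newOffset);
    }

    public static void Main()
    {
        // ldstr 0x70000001; ret
        byte[] il = { 0x72, 0x01, 0x00, 0x00, 0x70, 0x2A };
        var map = new Dictionary<int, int> { [1] = 0x11 }; // old heap offset -> new heap offset
        RemapLdstr(il, 0, map);
        Console.WriteLine(BitConverter.ToString(il)); // 72-11-00-00-70-2A
    }
}
```

If the method bodies are copied as opaque blobs while the user strings are re-added in a different order, the old offsets baked into the IL would point at the wrong place in the new heap, which would fit the symptoms described above.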
OK, because I can't guarantee that (it somehow changes the row ids), I have to translate them.
I've implemented the Roslyn solution, but Windows refuses to execute the resulting assembly with a BadImageFormatException saying "Index not found". I found that trying to load a 64-bit DLL in an x86 process or vice versa causes it, so I tried switching the target platform to x86 or x64, but that failed too.
Do I have to configure AssemblyBuilder to target a specific platform? How can I do that?
mdv.exe is now satisfied. I have lost the direction. What should I do? On Mac and Mono, it runs without any problems. How can I satisfy you, Mr. Windows...
I tried with my minimal reproduction code, and Windows denied the access.
@hazama-yuinyan How are you constructing your PEHeader?
@jnm2 Just as it reads out from the assembly that AssemblyBuilder built. Concrete values needed?
Well, that's the point at which you'd switch from AssemblyBuilder's x64 or x86 to AnyCPU.
Oh, OK. I'll look at it.
The Machine enum doesn't have AnyCPU or anything similar, and Unknown didn't work. How do I specify AnyCPU? I found the CreateExecutableHeader static method on PEHeaderBuilder, but it didn't work either.
An MSDN page says that this happens when you are running CLR version 2.x and trying to load an assembly produced for 4.x or later. But of course it says it's running the latest .NET Framework. So I think it's because some values follow an old specification, since they came from the AssemblyBuilder.
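For what it's worth, here is a sketch of how AnyCPU is usually expressed, as far as I understand it: there is no dedicated machine value, so the PE header says `Machine.I386` and the CLI header flags are IL-only without the 32-bit-required bit. The `Create` helper is hypothetical; the `PEHeaderBuilder` constructor and the `CorFlags` enum are from System.Reflection.PortableExecutable. The flags value would be passed as the `flags` argument when constructing `ManagedPEBuilder`.

```csharp
using System;
using System.Reflection.PortableExecutable;

public static class AnyCpuHeaderSketch
{
    // Builds header settings that describe an AnyCPU executable (assumption: .exe, not .dll).
    public static (PEHeaderBuilder Header, CorFlags Flags) Create()
    {
        var header = new PEHeaderBuilder(
            machine: Machine.I386, // AnyCPU images use the I386 machine value
            imageCharacteristics: Characteristics.ExecutableImage);

        // IL-only, and deliberately NOT CorFlags.Requires32Bit (that would mean x86).
        CorFlags flags = CorFlags.ILOnly;
        return (header, flags);
    }

    public static void Main()
    {
        var (header, flags) = Create();
        Console.WriteLine($"{header.Machine}, {flags}");
    }
}
```

This only covers the header side; it does not rule out other causes of the exception.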
I guess I need to follow Roslyn. Thanks for pointing out the direction!
Followed Roslyn but Windows is harsh on me...
I don't know what I should do...
Yeeeeeees! It runs! And as I expected, the breakpoints are hit.
It was because the field row id and the method row id were out of range. Windows, you were right. An index was definitely missing.
This ends my journey. There are many more tasks left, but I would like to say thank you. I wouldn't have gotten here without you, @jnm2 and @tmat.
Next time, maybe I will create another issue.
@hazama-yuinyan That's fantastic! Props to you for doing the hard work and not giving up on it! 🎉
Issues are best when you have a single specific question. https://gitter.im/dotnet/roslyn would be glad to have you for general questions, though!
@jnm2 Thank you so much!
Issues are best when you have a single specific question. https://gitter.im/dotnet/roslyn would be glad to have you for general questions, though!
OK, I'll use that if I have multiple questions. Thank you again!