We’re currently working on .NET support in our CPU profiler (www.superluminal.eu). I’d like to add support for NuGet’s symbol server so that profiling of NuGet packages is entirely transparent to the user.
We’re currently using dbghelp for symbol server support (SymFindFileInPath) and since NuGet’s symbol server seems to follow the regular symbol server format, I’d expected it to just work.
However, I’m getting a 403 error for any PDBs I try to get. I know that Visual Studio does support NuGet, so I reverse engineered what that’s doing and it seems NuGet requires a SymbolChecksum header to be present for symbol server GET requests. If I manually send GET requests with this header it indeed works.
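For illustration, here is a minimal sketch of such a request. The endpoint URL, the `<name>/<guid>FFFFFFFF/<name>` key layout for Portable PDBs, and the `ALGORITHM:hex` header value format are assumptions based on what the request appeared to look like, not something confirmed in this thread. `build_symbol_request` is a hypothetical helper name.

```python
from typing import Dict, Tuple

# Assumed endpoint of NuGet's symbol server; verify before relying on it.
NUGET_SYMBOL_SERVER = "https://symbols.nuget.org/download/symbols"


def build_symbol_request(pdb_name: str, pdb_guid_hex: str, checksum_hex: str,
                         algorithm: str = "SHA256") -> Tuple[str, Dict[str, str]]:
    """Return the URL and headers for a symbol server GET request.

    pdb_guid_hex is the 32-character hex PDB GUID from the CodeView entry.
    checksum_hex is the hex-encoded checksum from the PDB checksum entry.
    """
    # Assumption: Portable PDBs use FFFFFFFF in place of the age field.
    key = f"{pdb_name}/{pdb_guid_hex.lower()}FFFFFFFF/{pdb_name}"
    url = f"{NUGET_SYMBOL_SERVER}/{key}"
    # Assumption: the header value is "<algorithm>:<lowercase hex checksum>".
    headers = {"SymbolChecksum": f"{algorithm}:{checksum_hex.lower()}"}
    return url, headers
```

The returned URL and headers can be passed to any HTTP client; nothing is sent over the network by the helper itself.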
I’m aware I can retrieve the checksum from the PE header, but that’s pretty inconvenient as it requires the image file to be present in order to read the checksum, which is not always the case (for example, when examining an ETW trace made on another machine). The checksum also seems redundant, since the PDB signature should already be enough to retrieve the PDB (and indeed that’s how it works in the native case).
So, I was wondering what the reasoning behind this requirement is, and if it’s possible to somehow opt out of this behaviour?
Not having the requirement for this header would allow client tooling to treat NuGet’s symbol server as “just another symbol server”, rather than requiring special case handling to add the header, which seems desirable to me.
Any input is appreciated!
From my recollection, this was meant to address some security concern. Both the server and client are supposed to validate that the downloaded PDB matches the checksum. You cannot opt out of this behavior on nuget.org.
/cc @cristinamanum @tmat @diverdan92, as they designed this feature and may remember additional details.
Here are some resources that may help:
Thanks for the response and the links!
I can definitely see the security concerns with downloading random PDBs from a potentially untrusted source, but I'm not sure that validation on the server-side really gains you anything there? For clients that care about that security, checksum validation on the client-side seems like it should be enough.
On the other hand, the current approach breaks compatibility with existing symbol server tooling. Some practical examples we're running into:
I understand that the checksum header requirement is unlikely to change, but I thought it might be worth it to at least start a discussion about it. Breaking compatibility with all existing symbol server clients seems like a pretty steep price to pay :-)
@rovarma As @loic-sharma mentioned, this header is required for security reasons. We intentionally did not want existing clients to work with the new symbol server, as that would potentially allow attackers to hack these clients via malicious PDBs. The PDBs available from NuGet are only Portable PDBs. Existing clients/libraries wouldn't support those anyway. Note that the NuGet symbol server is the only symbol server that allows anyone (untrusted parties) to publish symbols to it. That's different from other symbol servers, which trust their publishers. Hence this extra security measure.
The checksum is stored in Debug Directory entry: https://github.com/dotnet/runtime/blob/master/docs/design/specs/PE-COFF.md#pdb-checksum-debug-directory-entry-type-19 next to the CodeView entry that has the other PDB information. Does ETW not include the entire Debug Directory?
@tmat thanks for additional clarification!
> The PDBs available from NuGet are only Portable PDBs. Existing clients/libraries wouldn't support those anyways.
I understand that existing clients wouldn't be able to consume Portable PDBs out of the box, but they seem like two unrelated concerns to me: there's teaching particular tools to understand Portable PDB (like ours does now), and there's retrieving the correct PDB from a symbol server (which our tool already did perfectly fine). The retrieval part is independent of the 'understanding' part; as long as the store follows the regular symbol server conventions (and NuGet does, for the most part) existing retrieval code (including dbghelp) can just function as-is without any changes.
> We intentionally did not want existing clients to work with the new symbol server as that would potentially allow attackers to hack these clients via malicious PDBs. [...] Note that NuGet symbol server is the only symbol server that allows anyone (untrusted parties) to publish symbols to. That's different from other symbol servers who trust the publishers. Hence this extra security measure.
While I definitely understand the concerns relating to untrusted publishers, I'm not sure I understand what server-side validation wins you wrt security. As I see it, the client needs to give NuGet the checksum of the PDB that it's looking for so that it can verify on the server-side that the PDB still matches that checksum (and I guess consider it an error if it doesn't). But, since the client is the one who needs to provide the checksum, it needs to retrieve it from somewhere, which in this case is the PE, which was published to NuGet in the first place. So, I think there can only be two reasons why the checksum wouldn't match on the server-side:
I think the only way clients can be reasonably sure the PDB hasn't been tampered with, is if they perform the checksum validation on the client-side since that's the only place that trusted code is running at that point. This also guards against the PDB being modified in transit. Even then it's not a guarantee, since an attacker could have modified both the PE and the PDB on NuGet's side.
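As a rough sketch of what client-side validation could look like: my understanding of the PE-COFF spec is that the checksum is computed over the entire PDB content with the 20-byte PDB ID zeroed out, so a client can recompute it after download. Locating the PDB ID inside the Portable PDB (it lives in the `#Pdb` stream) is not shown here; `pdb_id_offset` is assumed to be supplied by the caller, and `pdb_matches_checksum` is a hypothetical helper name.

```python
import hashlib
import hmac

PDB_ID_SIZE = 20  # 16-byte GUID followed by a 4-byte stamp


def pdb_matches_checksum(pdb: bytes, pdb_id_offset: int,
                         algorithm: str, expected: bytes) -> bool:
    """Recompute the PDB checksum and compare it to the one stored in the PE.

    Assumption: the hash covers the whole file with the PDB ID set to zero.
    """
    if pdb_id_offset < 0 or pdb_id_offset + PDB_ID_SIZE > len(pdb):
        raise ValueError("PDB ID offset is outside the file")
    # Zero the PDB ID before hashing, leaving all other bytes untouched.
    zeroed = (pdb[:pdb_id_offset]
              + bytes(PDB_ID_SIZE)
              + pdb[pdb_id_offset + PDB_ID_SIZE:])
    # hashlib.new raises ValueError for algorithm names it does not support.
    digest = hashlib.new(algorithm.lower(), zeroed).digest()
    return hmac.compare_digest(digest, expected)
```

A client would call this with the algorithm name and checksum bytes read from the image's debug directory, and refuse to open the PDB if it returns `False`.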
Of course, it's possible I'm missing some attack vector here that you're trying to guard against; if I am, I'd love to know, if only for curiosity's sake :-)
> The checksum is stored in Debug Directory entry: https://github.com/dotnet/runtime/blob/master/docs/design/specs/PE-COFF.md#pdb-checksum-debug-directory-entry-type-19 next to the CodeView entry that has the other PDB information.
Yes, I'm aware of the checksum entry -- we already had code to retrieve it when I made the original post, but as I mentioned earlier, it does require that the PE is available at the time you need to retrieve the matching PDB from NuGet.
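For anyone else reading along, decoding that entry is roughly the following. This is a sketch that assumes the entry data layout described in the linked spec (a null-terminated UTF-8 algorithm name followed by the raw checksum bytes) and that the caller has already located the type 19 entry and read its data. `parse_pdb_checksum_entry` is a hypothetical name, not our actual code.

```python
from typing import Tuple


def parse_pdb_checksum_entry(data: bytes) -> Tuple[str, bytes]:
    """Split the data of a PDB checksum debug directory entry (type 19).

    Assumed layout: null-terminated UTF-8 algorithm name, then checksum bytes.
    """
    terminator = data.find(b"\x00")
    if terminator < 0:
        raise ValueError("missing null terminator after algorithm name")
    algorithm = data[:terminator].decode("utf-8")
    # Everything after the terminator is the checksum itself.
    checksum = data[terminator + 1:]
    return algorithm, checksum
```

The hex encoding of the returned checksum is what would go into the request header, which is why the PE has to be on hand at retrieval time.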
> Does ETW not include the entire Debug Directory?
Unfortunately not. The info needed to retrieve (and match) images & PDBs from symbol servers is injected into the .etl file as a result of a call to CreateMergedTraceFile with the EVENT_TRACE_MERGE_EXTENDED_DATA_IMAGEID flag. This is what happens as part of xperf -merge. The way it works is it goes through all modules (both native & managed) in the trace, opens each module, extracts some of the debug information from the PE header and outputs that into the .etl file. The debug information it currently outputs is:
(See here to get an idea of the information ETW outputs)
This is all the information needed to uniquely identify & retrieve both images and PDBs from symbol stores. But it currently does not output the checksum info, so if an .etl trace is transferred to another machine for analysis (which is quite common), the checksum info won't be available and NuGet's symbol store is unusable.
I noticed in the roadmap document that @loic-sharma linked above that the PDB checksum was added by the compiler team at NuGet's request. It might make sense to ask whatever team is in charge of the ETW part to output the checksum info as well; that would allow this to work for all ETW scenarios (though clients would unfortunately still need custom code to deal with NuGet's symbol server).
The attack vector would be that someone publishes a package with a DLL but does not publish a PDB for it. The attacker notices this and publishes a PDB with the right signature (the one that's written in the DLL's debug directory) but malicious content. Existing clients that do not validate the checksum would receive and open this PDB.
Ah, thanks for the info! That makes sense.
Consider this "resolved" then :-)