On CI, we'd like to start capturing binary logs by default. There is some concern about this, however, because this will capture the machine environment which may include secrets. Is there already a way to prune a binary log of certain strings or to mask the value of a given list of properties?
@KirillOsenkov any ideas?
No, unfortunately MSBuild just logs the entire environment as a list of strings, and we dutifully write that to the log... see related https://github.com/Microsoft/msbuild/issues/3432
How hard would it be to post-process a binlog file to mask known strings? I haven't looked into the format of the file closely enough to estimate how difficult this is.
Technically it should be doable - there's an API to read the raw records and to play back the event args:
http://msbuildlog.com/#api
Here's another idea - you can play back the .binlog into another BinaryLogger, and filter the secrets from the event args.
Should be fun and doable - please let me know if I can help with this!
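To make the replay-and-filter idea concrete, here is a minimal sketch of the filtering step only. It does not use the StructuredLogger or MSBuild APIs: `Event`, `replay_filtered`, and the `redact` callback are hypothetical names standing in for "event args read from one log and written to another".

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a replayed build event (not a real MSBuild type)."""
    kind: str
    message: str


def replay_filtered(events: Iterable[Event],
                    redact: Callable[[str], str]) -> Iterator[Event]:
    """Yield a copy of each event with its message passed through `redact`."""
    for event in events:
        # Events are copied rather than mutated, so the source stays untouched.
        yield replace(event, message=redact(event.message))


if __name__ == "__main__":
    source = [
        Event("message", "Building project"),
        Event("environment", "API_TOKEN=hunter2"),
    ]
    for event in replay_filtered(source, lambda s: s.replace("hunter2", "***")):
        print(event.kind, event.message)
```

In a real tool the source would be the replayed .binlog and the sink a second BinaryLogger; only the shape of the loop is illustrated here.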
Another thing: take the .dll from the Releases tab:
https://github.com/KirillOsenkov/MSBuildStructuredLog/releases/download/v1.2.25/StructuredLogger.dll
The NuGet package is unfortunately out-of-date and I don't have time to update it...
Thanks @KirillOsenkov. Just wanted to know if this was possible. We're currently deciding whether or not to always produce a binlog in CI builds. If we decide to do that, we'll probably implement a tool to strip secrets from the binlog. I'll ping this thread again if I run into issues using StructuredLogger to modify a binlog file.
👋 @natemcmaster @KirillOsenkov
I'm very interested in masking or filtering .binlog files so a user can share them with another party while still maintaining _Good Feelings™_. 😄
I have an OSS project/proof-of-concept that integrates .binlog files with the GitHub Checks API. It reads build warnings and errors for the files being compiled and writes that data back to GitHub, which lets me surface build warnings/errors, code analysis warnings/errors, and Roslyn analyzer results in a GitHub pull request.

Please check it out: https://github.com/justaprogrammer/MSBuildLogOctokitChecker
Unfortunately, I did not understand the security risks in binlog files before I started. @KirillOsenkov thanks for making my project possible. I would definitely be interested in hearing your recommendations on making this secure.
@StanleyGoldman very nice! Thanks for sharing! If you tweet about it I'll retweet (although I already stole your thunder, sorry!)
I'm now convinced that this feature is necessary and so I've filed a bug here to track this work:
https://github.com/KirillOsenkov/MSBuildStructuredLog/issues/191
Thanks @KirillOsenkov I appreciate the support. Although I'm probably going to start tweeting about it after we come up with a better name for the project. 🤣
I'm looking for other interesting data points that could be inferred from the binlog files: things that would be informative to a person reviewing a pull request. Some ideas that come to mind (provided I have a history of binlog files) are spikes in build time or NuGet packages that have changed.
The GitHub Checks API also provides for a callback. I would have to learn more about Roslyn, but it would be possible to have Roslyn analyzer code fixes applied automatically through the GitHub user interface.
I'm sure you might have some creative ideas of your own. If you come up with any, drop us a line here: https://github.com/justaprogrammer/MSBuildLogOctokitChecker/issues/87
This came up again internally, and I had a thought that I don't love but that could make this possible:
Could we just build up a terrible regex in memory
(?:secret1|secret2|secret3)
and then replace the string that gets serialized into the binlog?
That would require a "feed me all your secrets" configuration (on the command line? another, unlogged environment variable? something else?), which is definitely not ideal. It would also generate a fair amount of garbage if the strings are common, but that feels like pay-for-play to me.
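For illustration, here is a rough sketch of that alternation-regex idea, written in Python rather than inside MSBuild. The name `build_masker` and the `***` placeholder are assumptions made for the example; how the secret list is supplied is left open, as above.

```python
import re
from typing import Callable, Iterable


def build_masker(secrets: Iterable[str], mask: str = "***") -> Callable[[str], str]:
    """Return a function that replaces every known secret in a string with `mask`."""
    # Drop empty strings: an empty alternative would match at every position.
    unique = {s for s in secrets if s}
    if not unique:
        return lambda text: text
    # Longest first, so a secret that contains another secret is masked whole.
    ordered = sorted(unique, key=len, reverse=True)
    # re.escape keeps characters such as '.' or '|' in a secret literal.
    pattern = re.compile("|".join(re.escape(s) for s in ordered))
    # A callable replacement avoids backslash interpretation in `mask`.
    return lambda text: pattern.sub(lambda _match: mask, text)


if __name__ == "__main__":
    mask_text = build_masker(["hunter2", "p@ss.word"])
    print(mask_text("API_TOKEN=hunter2;DB=p@ss.word"))
```

The pattern is compiled once and reused for every string, so the per-string cost is a single scan; a new string is only allocated when something actually matches, which is the pay-for-play trade-off mentioned above.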