Arcade: Arcade builds failing due to file signed with expired certificate

Created on 22 Apr 2020 · 22 Comments · Source: dotnet/arcade

First Responder

All 22 comments

Some more relevant information:

  • The file in question is in the tools\netcoreapp2.1\Microsoft.Rest.ClientRuntime.dll from the Microsoft.DotNet.Build.Tasks.Feed package.

  • If I download the Tasks.Feed package of the failing build and inspect the relevant DLL I see that it was signed with an expired certificate. Today, SignCheck correctly flags that.

  • If I download the Tasks.Feed package of a succeeding build from a few days ago and inspect the relevant DLL, I see that it was already signed with an expired certificate. However, SignCheck.exe at that time was INCORRECTLY reporting that the file was correctly signed.

My conclusion is that this change in SignCheck affected it in a way that now it captures the problematic DLL: https://github.com/dotnet/arcade/pull/5319

/cc @chcosta @joeloff as they might have insight about the recent changes and SignCheck behavior.

No changes to SignCheck from my side. Having an expired certificate is not a problem as long as the signing occurred during the validity period of the certificate. For example, if the certificate expired on 2018-09-07 and the file was signed after that date, then the signature cannot be trusted.
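To make the rule above concrete, here is a minimal sketch of the validity-window check. It is not how SignCheck implements it; the function name and the `not_before` date are hypothetical, and treating 2018-09-07 as the certificate's expiry date is an assumption based on the example above. It also assumes the signing time comes from a trusted timestamp.

```python
from datetime import datetime, timezone


def signed_within_validity(signed_at, not_before, not_after):
    """Return True if the signing time falls inside the certificate's validity window."""
    return not_before <= signed_at <= not_after


# Hypothetical validity window: the start date is invented for illustration,
# and 2018-09-07 is assumed to be the expiry date.
not_before = datetime(2017, 9, 7, tzinfo=timezone.utc)
not_after = datetime(2018, 9, 7, tzinfo=timezone.utc)

# Signed while the certificate was valid: still trusted after the cert expires.
signed_early = datetime(2018, 6, 1, tzinfo=timezone.utc)
# Signed after expiry: cannot be trusted.
signed_late = datetime(2020, 4, 22, tzinfo=timezone.utc)

early_ok = signed_within_validity(signed_early, not_before, not_after)
late_ok = signed_within_validity(signed_late, not_before, not_after)
```

The point is that the comparison is against the signing time, not against today's date, which is why an expired certificate alone should not fail validation.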

The error does indicate that the certificate was revoked.

[File] 5a73739ed6565525a4bb2bf43f147bad.dll, Signed: False, Full Name: tools/netcoreapp2.1/Microsoft.Rest.ClientRuntime.dll [Error] HRESULT: 800b010c (A certificate was explicitly revoked by its issuer)

That might indicate that there is an underlying problem with the certificate itself. Normally, expired certs for files that were already signed won't be an issue.
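For anyone triaging similar logs, a rough sketch of pulling the HRESULT out of a line like the one above and telling "revoked" apart from "expired". The regular expression is an assumption based only on the single log line quoted here, not on a documented SignCheck format; `parse_signcheck_line` is a hypothetical helper. The two HRESULT names are the standard Windows codes.

```python
import re

# Pattern inferred from the one log line quoted in this thread (an assumption).
LINE_RE = re.compile(
    r"\[File\]\s+(?P<file>\S+),\s+Signed:\s+(?P<signed>True|False),\s+"
    r"Full Name:\s+(?P<full_name>\S+)"
    r"(?:\s+\[Error\]\s+HRESULT:\s+(?P<hresult>[0-9a-fA-F]{8}))?"
)

# Standard Windows certificate error codes relevant to this discussion.
KNOWN_HRESULTS = {
    0x800B0101: "CERT_E_EXPIRED",
    0x800B010C: "CERT_E_REVOKED",
}


def parse_signcheck_line(line):
    """Extract file name, signed flag, full name, and HRESULT from a log line."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    code = int(m.group("hresult"), 16) if m.group("hresult") else None
    return {
        "file": m.group("file"),
        "signed": m.group("signed") == "True",
        "full_name": m.group("full_name"),
        "hresult": code,
        "hresult_name": KNOWN_HRESULTS.get(code),
    }


sample = (
    "[File] 5a73739ed6565525a4bb2bf43f147bad.dll, Signed: False, "
    "Full Name: tools/netcoreapp2.1/Microsoft.Rest.ClientRuntime.dll "
    "[Error] HRESULT: 800b010c (A certificate was explicitly revoked by its issuer)"
)
result = parse_signcheck_line(sample)
```

For the line in this issue, the code maps to `CERT_E_REVOKED` rather than `CERT_E_EXPIRED`, which matches the observation that expiry is not what the validation is complaining about.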

Can we check the certificate store on the machine that did the validation? If someone manually revoked the certificate then this can occur. I ran SignCheck locally against the same file and it's reporting it as valid.

@joeloff - After our talk I checked, and the validation is executed on AzDO hosted agents. I don't think we have access to those. Moreover, the check failed on two different agents:

Agent name: 'Azure Pipelines 103'
Agent machine name: 'fv-az689'

Agent name: 'Azure Pipelines 32'
Agent machine name: 'fv-az677'

I retried two failing builds to see if they execute in different agents and what results we get. I can try and patch the build definition YAML so we can test this in one of our build pool machines and see what happens.

Thanks. One machine failing consistently I can understand, but multiple machines seems odd. Another option (don't know if we can do it) is to force the validation to run on a machine other than the two where it failed.

The retries ran in one other agent:

Agent name: 'Azure Pipelines 17'
Agent machine name: 'fv-az674'

I'll patch the YAML that has the validation to see if I can make it run on one of our machines.

I agree with Jacque's theory. This is odd. Is it possible to try a non Azure pipeline machine and see if it repros or not? Like, a machine from one of the MB pools? Per Jacque's theory, it's quite possible the Azure pipeline machines are getting into a bad state.

Yeah, I'm looking into that.

Is there a chance the DevOps machines are created from a bad image perhaps?

It's quite possible, yes. And/or a Windows update is causing an issue, or something like that. If we get confirmation that an MB machine works, then we'll need to loop in the ADO folks and let them know there is an issue with their image.

I started the two builds below: one executes signing validation on a hosted agent and the other on one of our build queues.

Agent name: 'Azure Pipelines 35'
Agent machine name: 'fv-az776'
Current agent version: '2.166.3'
Current image version: '20200416.1'

Agent name: 'NetCoreInternal-Pool 49'
Agent machine name: 'a0000T3'
Current agent version: '2.166.3'

The validation passed when executed on one of our internal agents. I'll try to get the AzDO folks in the loop.

If you do, can we ask them to check the certificate store on the agents, just so we can rule out manual revocation? Especially if it was done on an image that's being used to stand up new agents.

Another thing that might help narrow this down: would it be possible to stand up a brand-new clean machine with all security patches and see if it repros? I'm wondering whether newer or clean machines also hit the issue, since our "dev" machines could be dirty and may not repro it.
Do you have a binary I could look at as well?

AzDO support identified the issue on their side and suggested a workaround. Using the workaround the validation passes:

https://dnceng.visualstudio.com/internal/_build/results?buildId=614082&view=results

I'm double checking and if everything works I'll create a PR.

Hey Cesar. That is awesome!!! Thanks for reaching out to AzDO on this. I was also pinged by someone on another thread having the same issue. So, I was wondering if you might be able to give some more info on what the problem is that AzDO found? And, what the workaround looks like?

Hey Trevor, you can see what the workaround is in this PR: https://github.com/dotnet/arcade/pull/5338

They provided a PowerShell script that should be executed before the validation.

Any idea when AzDO is fixing this?

They said a fix would be rolled out this week.

Waiting for PR to be approved.
