As introduced here: https://github.com/dotnet/arcade/pull/4625 , this change turns on SDL checks like Guardian by default (possibly intentionally) but we didn't have the service connection for Guardian defined on devdiv, so Roslyn's arcade-based builds broke.
I'm putting this up to be discussed in Arcade triage but basically we need to have some way when changes occur to make sure that any Service connections or variable groups (or literally any other shared resource I'm failing to list here) referenced in the eng\common folder need to exist everywhere Arcade bits flow.
Is this issue blocking (no)
Why no? AFAIK this is blocking all our signed builds.
Is this issue blocking (no)
Why no? AFAIK this is blocking all our signed builds.
you may change that a yes above if you like; I considered it non-blocking because it had a few-minutes, no-code workaround.
As introduced here: #4625 , this change turns on SDL checks like Guardian by default (possibly intentionally) but we didn't have the service connection for Guardian defined on devdiv, so Roslyn's arcade-based builds broke.
I don't think this was what introduced the problem. The Build Promotion Pipeline didn't require new service connections to work and the only related change in execute-sdl.yml was to make Run_SDL disabled by default. I think the issue might be that we didn't ran the SDL tools on devdiv yet? @sunandabalu might know better.
@MattGal told me that he created the missing service connection in DevDiv so I went ahead and queued a new build here: https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=3438980&view=results @MattGal had already done that :-)
As introduced here: #4625 , this change turns on SDL checks like Guardian by default (possibly intentionally) but we didn't have the service connection for Guardian defined on devdiv, so Roslyn's arcade-based builds broke.
I don't think this was what introduced the problem. The
Build Promotion Pipelinedidn't require new service connections to work and the only related change in execute-sdl.yml was to make Run_SDL disabled by default. I think the issue might be that we didn't ran the SDL tools on devdiv yet? @sunandabalu might know better.~@MattGal told me that he created the missing service connection in DevDiv so I went ahead and queued a new build here: https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=3438980&view=results~ @MattGal had already done that :-)
I don't think enabled or disabled is the deal here, in the case of the build I queued after adding the service connection the usage of it was indeed disabled. I still think it was this refactoring that brought the reference to the connection along and caused the issue... am I missing something?
@MattGal now I understand what happened here.
execute-sdl.ymlexecute-sdl.yml to always be evaluated by ADO parser.Now I got what happened :-)
So even though the SDL stuff wasn't actually running, AzDO is requires that the service connections be there regardless?
cc/ @mmitche @sunandabalu @jaredpar
Yes, this makes sense too since the conditions that disable a step from usage have to be evaluated at the last possible second (as, "previous step succeeded" is a very common check)
@markwilkie - That's correct.
The thing is that the type of the condition also change. Before we had a condition that was evaluated at template expansion time and then it got changed to a runtime condition..
Two questions then:
1) What are the next steps here to fix the root cause
2) Was there a breaking change from AzDO? (e.g. template expansion vs. runtime)
This was not a breaking change in AzDO.
I chatted with @riarenas about this a bit, and @jaredpar expressed frustration that Arcade is breaking signed builds a lot of the time. We have the arcade-validation repo builds specifically for this purpose, but we're still missing things. This comes down to a few things:
The second bullet should be one that we should be able to remedy in arcade-validation over time, but gaining that test coverage, given the complexity of official builds, will be time consuming and involve breaks in repos in the process.
Here is what I think we should do:
Mechanically, what it means is that arcade-validation will branch off the repos+branches we need to validate, update to the candidate version of arcade, and queue the build. The build must pass for arcade to move forward.
One additional notes: It's important that this not be a license for repos to use arcade in non-standard ways and get support for it.
/cc @shawnro @markwilkie @Chrisboh @tkapin
I'm sure we are, but I want us to make sure we're communicating directly with the people who are feeling this pain to let them know that we prioritizing and actively working on preventative measures so that these types of things don't keep happening.
This will add a considerable amount of time from the moment the arcade build finishes to the moment it starts flowing to the repos, I guess is a price to pay to warranty that the version which is flowing is clean enough but I think it would be good to let partners know
This will add a considerable amount of time from the moment the arcade build finishes to the moment it starts flowing to the repos, I guess is a price to pay to warranty that the version which is flowing is clean enough but I think it would be good to let partners know
We still have an option of promoting builds manually if we need. Most partners are taking daily updates, so they will not see a tremendous amount of difference. This is about quality first. We should then be working on our arcade-validation coverage, so that we can have a fast validation that we trust.
I added the details captured here to #4719
dnceng is different than DevDiv
Part of the way we can help is getting off of DevDiv. That's a goal we've been working towards and we're close to removing the biggest obstacle in our way of moving: OptProf training.
Unfortunately though in order to fix our OptProf issues we need consistent signed builds to insert into Visual Studio. The recent arcade publish issues have essentially blocked this work because it interrupts our signed build cadence.
arcade-validation does not have enough coverage for the breadth of scenarios that show up in official builds of real repos.
A point I've pushed several people on in person is whether we can actually truly close this gap. Publish validation is hard because there are just too many variables involved:
The question I've been asking myself are:
One idea we've been discussing is changing how we build. For repositories maybe official builds should focus on build only and not publish. The job of the official build would be to produce signed binaries in the form of build and pipeline artifacts. There would be a separate job, which was central to arcade, which was responsible for taking the artifacts and pushing them to all of our publish locations:
This would make our official builds significantly more reliable, reduce the validation burden of arcade and let us handle publishing issues centrally. It also has the benefit that when publishing goes down for a period of time no information is lost. The official build output is still waiting around in artifact form for the publishing process to pick up and continue working with.
This would make our official builds significantly more reliable, reduce the validation burden of arcade and let us handle publishing issues centrally. It also has the benefit that when publishing goes down for a period of time no information is lost. The official build output is still waiting around in artifact form for the publishing process to pick up and continue working with.
The build output is still waiting in the current case. Anyways, @JohnTortugo opened up an issue to implement this. I also think this is a good step forward, but except in cases where a repo cares less about the arcade publishing vs. VS insertion, and they are separate, this is likely to be similar to today:
Anyways, I think we should take both approaches. Separate out the publishing pipeline and add more repos to what we use for validation prior to promotion.
Another thing that we should think about is reducing the amount of dependencies in the build pipeline.
This particular issues was caused by not having a service connection for Guardian. Is the service connection for Guardian actually needed? Can it be eliminated?
Another thing that we should think about is reducing the amount of dependencies in the build pipeline.
This particular issues was caused by not having a service connection for Guardian. Is the service connection for Guardian actually needed? Can it be eliminated?
That's specifically what this is about:
Also, we are trying to move things out of the publishing stages altogether and into the build (meaning full .NET Core build) promotion pipelines, to avoid a lot of these issues altogether. We increased complexity in those stages because we had nowhere else to put validation, but that resulted in some major pain (e.g. symbol servers have problems)
What can we do in the immediate term to ensure we're not breaking partners in DevDiv that are actively trying to make progress towards dnceng? Or, is there some small price that partners in DevDiv can pay to achieve higher quality? @mmitche, how far away is inter-branch dependency flow? ie, Roslyn could take updates from Arcade into Roslyn@arcade-updates which publishes to some channel that, if successful, flows to Roslyn@master.
What can we do in the immediate term to ensure we're not breaking partners in DevDiv that are actively trying to make progress towards dnceng? Or, is there some small price that partners in DevDiv can pay to achieve higher quality? @mmitche, how far away is inter-branch dependency flow? ie, Roslyn could take updates from Arcade into Roslyn@arcade-updates which publishes to some channel that, if successful, flows to Roslyn@master.
That would be possible today, no extra features required besides some extra channels. I am reticent to say that it's on roslyn to validate this stuff though. I think the problem is most likely we won't convince everyone to move off of DevDiv. Nuget comes to mind. As I see it, we're supporting people in DevDiv today. If we want to make the statement that we don't support them there, then they could take the steps to get off of DevDiv, or add an extra valdiation step themselves. But AFAIK that's not the statement we have made, and thus putting the burden on our partners to validate is not a good approach.
Is this a lot of work or a large tax if they wanted to get more stable official builds?
To add an arcade-validation build in DevDiv? I don't think it's a lot of work. Especially since I think the other validation builds are potentially also worth doing. The statement being made is that arcade-validation is not good enough to not break builds in DevDiv OR dnceng.
Any input @jcagme ? Maybe this has been sorted based on the recent work in this area?
SDL related features in the release pipeline only include BinSkim. Source code checks need to be done during builds. Given this and assuming nothing else has been done around SDL in builds, I guess things should be the same as of Feb 4th.
Triage: Adding to the release epic as SDL is going to execute there.
Required infrastructure changes/update so SDL runs in validation and release pipelines are complete. Repo owners need to onboard to get the benefits but ATM we are not pushing on them to make the change and rather have them opt in when they have time
Most helpful comment
This was not a breaking change in AzDO.
I chatted with @riarenas about this a bit, and @jaredpar expressed frustration that Arcade is breaking signed builds a lot of the time. We have the arcade-validation repo builds specifically for this purpose, but we're still missing things. This comes down to a few things:
The second bullet should be one that we should be able to remedy in arcade-validation over time, but gaining that test coverage, given the complexity of official builds, will be time consuming and involve breaks in repos in the process.
Here is what I think we should do:
Mechanically, what it means is that arcade-validation will branch off the repos+branches we need to validate, update to the candidate version of arcade, and queue the build. The build must pass for arcade to move forward.
One additional notes: It's important that this not be a license for repos to use arcade in non-standard ways and get support for it.
/cc @shawnro @markwilkie @Chrisboh @tkapin