Version of your agent? 2.161.1
OS of the machine running the agent? Ubuntu 16.04.6 LTS
dev.azure.com
It appears that version 2.161.1 (or 2.161.0) broke the System.DefaultWorkingDirectory variable, at least in bash steps of container jobs. When looking at its value inside a script, it is set to /data/azure/azure-agent/_work/10/s/, which is the path on the host machine. I believe it should be set to /__w/10/s, which is the path inside the container. This used to work on 2.160.1; our agents were updated overnight and multiple jobs stopped working.
We have a workaround: the SYSTEM_DEFAULTWORKINGDIRECTORY environment variable appears to be set correctly, so we'll update our scripts to read it instead.
I'm also surprised that our agents were automatically updated by dev.azure.com despite that version being tagged as pre-release.
The broken step looked like:
node $(System.DefaultWorkingDirectory)/azure/update-versions.js
We were able to work around the bug by replacing it with:
node ${SYSTEM_DEFAULTWORKINGDIRECTORY}/azure/update-versions.js
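For context, this is roughly how the step sits in our pipeline YAML (a minimal sketch; the job name, container image, and script path are placeholders for our actual setup). As far as I understand it, the $(...) macro is expanded by the agent before the step runs, while ${SYSTEM_DEFAULTWORKINGDIRECTORY} is read by bash at runtime inside the container, which is why the second form resolves to the correct path:

```yaml
jobs:
- job: update_versions   # placeholder name
  container: node:12     # container job; this is where the mapping breaks on 2.161.1
  steps:
  # Broken: the macro is expanded by the agent and yields the host path
  # /data/azure/azure-agent/_work/10/s instead of the container path.
  # - bash: node $(System.DefaultWorkingDirectory)/azure/update-versions.js

  # Workaround: the environment variable is read at runtime inside the
  # container and resolves to /__w/10/s as expected.
  - bash: node ${SYSTEM_DEFAULTWORKINGDIRECTORY}/azure/update-versions.js
```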
If there is a specific section of the log you are interested in, let me know and I'll post it, but the log contains a lot of sensitive information, including our build steps, so I'm not willing to post it in its entirety.
The workaround doesn't seem to work for steps like the publish step (sketched below): https://docs.microsoft.com/en-us/azure/devops/pipelines/artifacts/pipeline-artifacts?view=azure-devops&tabs=yaml#publishing-artifacts
We're still looking into solutions; this is a major issue for us, as we are unable to release to production.
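For reference, our publish step follows the documented YAML shortcut, roughly like this (a sketch; the artifact name and path are placeholders). Since task inputs only understand the $(...) macro syntax and not bash's ${...} expansion, I don't see how to apply the same workaround there:

```yaml
steps:
# The path here is a task input (the publish keyword maps to the
# PublishPipelineArtifact task), so only the $(...) macro form is available,
# and on 2.161.1 it expands to the host path instead of the container path.
- publish: $(System.DefaultWorkingDirectory)/dist
  artifact: pi-server-build   # placeholder artifact name
```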
We're affected by this as well. There's a community thread: https://developercommunity.visualstudio.com/content/problem/844025/container-build-jobs-paths-are-incorrectly-mapped.html
Our builds were working fine on 2.160.1, so it looks like 2.161.1 is the culprit.
Our agents are automatically upgrading to a release that is marked as a pre-release - I'm not sure if that's the expected behavior.
Also, I'm not sure whether there is a way to disable automatic upgrades; that would allow us to downgrade.
Some updates: we managed to downgrade the agent to 2.160.1 and hack our way around the auto-updating by messing with file permissions (see the sketch below), but it appears that the minimum agent version required by our pipelines is now set to 2.161.1, so we can't find agents with the right capabilities anymore. Is there a way to override this requirement?
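In case it helps anyone else stuck on this, here is roughly what the downgrade looked like on our side (a sketch of our own steps; the organization URL, PAT, pool name, and paths are placeholders, and the chmod at the end is the permissions hack mentioned above, not something I'd recommend long term):

```bash
# Stop and unregister the 2.161.1 agent (it runs as a systemd service for us).
cd /data/azure/azure-agent
sudo ./svc.sh stop && sudo ./svc.sh uninstall
./config.sh remove --unattended --auth pat --token <PAT>

# Unpack the 2.160.1 release (vsts-agent-linux-x64-2.160.1.tar.gz from the
# agent's GitHub releases page) into a clean directory and reconfigure.
mkdir -p /data/azure/azure-agent-2.160.1 && cd /data/azure/azure-agent-2.160.1
tar xzf ~/vsts-agent-linux-x64-2.160.1.tar.gz
./config.sh --unattended --url https://dev.azure.com/<org> \
  --auth pat --token <PAT> --pool <pool> --agent "$(hostname)"
sudo ./svc.sh install && sudo ./svc.sh start

# Crude auto-update block: make the agent's own files read-only so the
# self-update can't overwrite them.
chmod -R a-w bin externals
```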
I cannot stress enough how bad this problem is for us. Our whole CI/CD infrastructure is down, and we haven't heard a word here in 4 hours, nor in the community thread linked above.
Same thing here. This is a major issue!
We rolled back the agent version on the server. But since the version went backwards, self-hosted agents will not be automatically updated. We will root cause the issue and get out a real fix for it ASAP. Any additional information you can provide would be helpful.
@jpricket thanks for that, we were able to make it work with your change yesterday.
Once that fire is extinguished, I think there's a larger conversation to have about giving users control over the version upgrades. It's not only that the new version broke something. It's that it broke something at a very bad moment for us, and that we had no way of rolling back the changes.
We have identified the bug that caused this issue and are working to get a new release out ASAP.
To comment on the issue of 2.161.1 being marked as pre-release: as we roll out new versions of the agent, a version gets marked as 'release' once it reaches a certain threshold of customers. We are updating our process to mark versions as 'release' sooner.
In parallel, we are working on two things to help us respond faster to these types of issues in the future:
One is a better rollback mechanism for when we detect an issue with the agent during rollout.
The second is improved controls for self-hosted agents to manage their agent version. This is a complicated issue and will take some time to get right. The specification for this is happening here: https://github.com/microsoft/azure-pipelines-yaml/pull/399
@jtpetty thanks for your feedback. I commented on that PR since I believe we are neither customer "N" nor "K". Hope this helps.
@jtpetty @jpricket Builds just broke again with version 2.162.0. Please fix ASAP.
@jbblanchet - can you point me to the pipeline having this issue?
Organization ompnt, pipeline pi-server.
@jbblanchet - I do not appear to have access. Can you send me a log?
Exact same message as last time about the directory not found.
I can send you a log in private, or I can give you temporary access to our org. Is your email your GitHub user name @microsoft.com?
my email is tommy.[email protected]
Sent. I appreciate the quick feedback.
Thanks for the logs. We are going to roll back 2.162.0 right now and try to debug this failure case.
This may actually be a separate issue from the one we addressed above and may be specific to publishing artifacts.
I'll be honest: whether the root cause is the same or not doesn't matter much to us. The effect on the build is the same, and this is the second time in 7 days that an update has broken our builds, which points to issues with QC and availability.
Do we need to downgrade the agent versions on our server manually? Do we need to block auto-update?
@jbblanchet - Agreed on all of this. I want to get you back up and running as fast as possible. We have initiated a rollback of agent 2.162.0. Once that is complete, you will be able to manually downgrade the agent on your servers. I will post here when that process is complete.
@jtpetty thanks for the help.
@jbblanchet - Just as an update, we are in the process of creating agent version 2.163.0, which will be byte-for-byte identical to 2.160.1 and will auto-update on your side.
@jbblanchet - 2.163.0 is now available on GitHub. You can manually upgrade to that version, or you can wait for our rollout and it will update automatically.
Also, I have triaged your failing case and am working on a fix. When I have something ready, I will message you and see how best to verify that this will no longer impact your team.