Openjdk-infrastructure: build-azure-win2008r2-x64 / build-softlayer-win2012r2-x64 nightly curl failures

Created on 24 Feb 2020 · 23 Comments · Source: AdoptOpenJDK/openjdk-infrastructure

origin  https://github.com/adoptopenjdk/openjdk-jdk8u.git (fetch)
origin  https://github.com/adoptopenjdk/openjdk-jdk8u.git (push)
jdk8
origin  https://github.com/adoptopenjdk/openjdk-jdk8u.git (fetch)
Resetting the git openjdk source repository at /tmp/openjdk-jdk8u-windows-x86-32-hotspot/workspace/build/src in 10 seconds...
Pulling latest changes from git openjdk source repository
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

This happened on build-azure-win2008r2-x64-2 and build-azure-win2008r2-x64-1.

bug windows


All 23 comments

Also fails for jdk8/windows-x64/hotspot builds

This appears to be a git issue : https://stackoverflow.com/questions/38618885/error-rpc-failed-curl-transfer-closed-with-outstanding-read-data-remaining

It may be due to a slow internet connection on the machine overrunning a default git timeout, which would explain the intermittent nature of it.

Thanks @Willsparker. In that case, increasing the git timeout settings on the machine might resolve the issue (the default timeout is around 5 mins): https://www.git-scm.com/docs/git-config/1.7.8#git-config-httplowSpeedLimithttplowSpeedTime

@sxa555, do you have access to this machine to try increasing the timeout to 10 mins?

Alternatively, if we can reproduce the error outside of the build script (e.g. with git clone https://github.com/adoptopenjdk/openjdk-jdk8u.git), you can set these environment variables to get a better idea of what's breaking:

GIT_TRACE=1 GIT_CURL_VERBOSE=1
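A minimal sketch of doing that. The trace_git function below is a hypothetical helper, not something from the build scripts. It runs any git command with both variables set and redirects stderr to a log file. git writes its trace and curl's verbose output to stderr, so the log also captures errors such as "fatal: early EOF".

```shell
# Hypothetical helper: run a git command with GIT_TRACE and GIT_CURL_VERBOSE
# enabled, writing everything git prints on stderr to the given log file.
trace_git() {
  log_file="$1"
  shift
  GIT_TRACE=1 GIT_CURL_VERBOSE=1 git "$@" 2>"$log_file"
}

# Example usage from inside the failing source checkout:
#   trace_git pull-trace.log pull
#   trace_git clone-trace.log clone https://github.com/adoptopenjdk/openjdk-jdk8u.git
```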

Right, I've set http.lowSpeedTime to 600 seconds (using git config --global http.lowSpeedTime 600). I timed git clone https://github.com/adoptopenjdk/openjdk-jdk8u.git and it took ~4 mins 15 secs, which could end up over 5 minutes with the overhead of running the whole scripts and Jenkins. We'll see if this has fixed the issue; unfortunately, I was unable to reproduce it on the machine.
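A minimal sketch of the timeout change. git's documentation describes the two keys as a pair: a transfer is aborted if its speed stays below http.lowSpeedLimit bytes per second for longer than http.lowSpeedTime seconds. That makes the setting a bound on a sustained slow period rather than on the total clone time, and both keys live under the http. prefix. The 1000 bytes/s limit is an illustrative assumption, not a value from this thread.

```shell
# Abort an HTTP transfer only if it stays below 1000 bytes/s (assumed value)
# for more than 600 seconds (10 minutes).
git config --global http.lowSpeedLimit 1000
git config --global http.lowSpeedTime 600

# Read the values back to confirm git picked them up under the http. prefix.
git config --global --get http.lowSpeedLimit
git config --global --get http.lowSpeedTime
```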

Last night's build failed on -1 as well... great. I'd like to confirm that the fix on -2 resolved the issue before applying it to -1 as well.

On the bright side, I've managed to recreate the issue by going to /tmp/openjdk-jdk8u-windows-x64-hotspot/workspace/src and running git pull.

The -v option on git pull doesn't add any additional info.
I've found that it only errors on git pull, not when cloning the repo, so it may be worth removing the repos and checking whether a fresh clone shows the git pull error as well.

Okay, I've followed the suggestion that many online resources make and set both https.postBuffer = 524288000 and http.postBuffer = 524288000, and git fetch now works on both of them locally. As this is an intermittent issue, I don't know whether this has actually fixed it.
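A minimal sketch of setting and then reverting that buffer (524288000 bytes is 500 MiB). git's documentation describes http.postBuffer as the maximum size of the buffer that smart HTTP transports use when POSTing data to the remote. I'm not aware of an https.postBuffer key, so only the http. form is likely to take effect. A fetch mostly receives data rather than POSTing it, which may be why the setting made no lasting difference.

```shell
# Set the buffer size suggested by the online resources (500 MiB).
git config --global http.postBuffer 524288000
git config --global --get http.postBuffer

# Revert to git's default if the change doesn't help.
git config --global --unset http.postBuffer
```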

If the issue persists, I'll remove those settings as well, as I'd rather not change default values if I can help it.

Unfortunately, these settings haven't helped, so I've unset the above variables.

Ref : https://github.com/AdoptOpenJDK/openjdk-build/issues/1236

I've renamed /tmp/openjdk-jdk8u-windows-x64-hotspot/workspace/build/src to ../src-1169 and cloned a fresh copy of the openjdk-jdk8u repo on both machines. If that works, I'll delete the old copies of the repos later on.

@Willsparker FYI, build-azure-win2008r2-x64-1 needs the same fix as -2. It failed last night with the same issue https://ci.adoptopenjdk.net/view/Failing%20Builds/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-hotspot/509/

I renamed the directories on both machines, so I suppose that didn't fix it.

Thanks @Willsparker ❤️

@M-Davies Sorry... the issue's recurring. Here's what I can gather:

  • build-azure-win2008r2-x64-1 appears to only be failing when running the jdk8u-windows-x86-32-hotspot job, not the jdk8u-windows-x64-hotspot job. I've already removed the x86_32 repo.

  • build-azure-win2008r2-x64-2 seems to not be connecting to the Jenkins agent.

> build-azure-win2008r2-x64-1 appears to only be failing when running the jdk8u-windows-x86-32-hotspot job, not the jdk8u-windows-x64-hotspot job. I've already removed the x86_32 repo.

It appears to pull the repository fine within grinders: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/2465/console. Since you removed the x86_32 repo, I take it the job was fetching rather than pulling?

> build-azure-win2008r2-x64-2 seems to not be connecting to the Jenkins agent.

That must be a recent problem. It connected fine last week https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x86-32-hotspot/498/console

Connection issue is fixed - the Jenkins service didn't want to restart. Disabling and enabling the machine on ci.adoptopenjdk.net and then starting the service got it going again.

The build scripts say it's fetching, but both git fetch and git pull within a failing repository will error with the same issue.

@Willsparker - so this can be closed now?

Sadly not. I've been looking at using git-for-windows instead of Cygwin's Git on these machines; however, that causes its own problems:

ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from https://github.com/AdoptOpenJDK/openjdk-build.git
    at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:909)
    at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1131)
    at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1167)

https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-hotspot/527/
https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-hotspot/528/

It seems the hotspot-x64 repository has worked a couple of times in a row; I'll keep an eye on it. The hotspot-x86_32 repository, however, is still failing consistently.

@Willsparker, can you summarise the current situation on this one, please? We need this to be reliable for the quarterly release in a couple of weeks. Can it be replicated easily, or is it still random enough that it can't be progressed? And do we have a solution or any other ideas to progress this?

Okay:
The build-azure-win2008r2-x64 machines are currently affected; however, they're soon to be replaced, so I'm not too concerned about those. The concerning case was build-softlayer-win2012r2-x64, which was showing this issue intermittently; I just checked and that no longer seems to be the case. I wasn't able to recreate it locally on that machine, but on the 2008 machines it occurred when running git fetch / git fetch --tags or git pull on the source repos at /tmp/openjdk-jdk8u-windows-*/workspace/build/src.
So whilst nothing has been done specifically to fix it, it hasn't affected us in the last few days since https://github.com/AdoptOpenJDK/openjdk-build/pull/1638 was merged. I'm going to close this issue on that basis; however, if this affects us in the future, I can see two easy-ish workarounds:
1) Change the build scripts to use ssh to run the builds (ref: https://adoptopenjdk.slack.com/archives/C53GHCXL4/p1585572400036400?thread_ts=1585568882.031500&cid=C53GHCXL4), and change the build machines (and playbooks) to accommodate this.
2) Change the build scripts to re-clone the source repositories at the beginning of each build (ref: https://github.com/AdoptOpenJDK/openjdk-build/issues/1641). In my experience, the repos in question didn't fail when being re-cloned, though I couldn't tell you why.
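Workaround 2 could look roughly like the sketch below. reclone_source is a hypothetical helper, not a function in the openjdk-build scripts. It discards whatever checkout is already at the target path and clones afresh, so a build never runs git fetch/pull against a stale repository.

```shell
# Hypothetical helper: replace any existing checkout with a fresh clone.
reclone_source() {
  repo_url="$1"
  target_dir="$2"
  # Refuse to run with an empty path so rm -rf can't target the wrong place.
  [ -n "$repo_url" ] && [ -n "$target_dir" ] || return 1
  rm -rf "$target_dir"
  git clone --quiet "$repo_url" "$target_dir"
}

# Example usage with the paths from this thread:
#   reclone_source https://github.com/adoptopenjdk/openjdk-jdk8u.git \
#     /tmp/openjdk-jdk8u-windows-x64-hotspot/workspace/build/src
```

The trade-off is a full clone (around 4 minutes on these machines, per the timing earlier in the thread) at the start of every build.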

Neither of these will fix the underlying problem, which I believe is a networking issue with the boxes, based on the Stack Overflow entries mentioned earlier.
