Origin: Build pod not running on cluster created using oc cluster up

Created on 16 Aug 2018 · 22 Comments · Source: openshift/origin

I am currently unable to get a range of different applications built and deployed successfully on OpenShift 3.10 clusters created via oc cluster up.
An example application is: https://github.com/snowdrop/spring-boot-http-booster.

The application builds and deploys correctly on all OpenShift 3.9 clusters, as well as on "real" OpenShift 3.10 clusters (ones created via openshift-ansible).

I have tried several different oc cluster up scenarios (plain oc cluster up on a Linux machine, Minishift, a clean CentOS 7 VM) and they all fail.

Version

v3.10.0

Steps To Reproduce
  1. Create cluster via oc cluster up
  2. Clone application: https://github.com/snowdrop/spring-boot-http-booster
  3. ./mvnw fabric8:deploy -Popenshift -DskipTest

Current Result

It seems like the git-clone init container is stuck. The only logs it has are: Receiving source from STDIN as archive

Expected Result

The application should be deployed normally, as it is on all OpenShift 3.9 clusters and on "real" OpenShift 3.10 clusters.

Additional Information

Starting the build with an increased log level (done manually via
oc start-build spring-boot-rest-http-s2i --from-archive=/home/gandrian/projects/redhat/spring-boot-http-booster/target/docker/spring-boot-rest-http/latest/tmp/docker-build.tar --loglevel=5 --build-loglevel=5 -F) doesn't yield any results.
The build does start, but there is nothing in the logs.

Only when I do oc logs spring-boot-rest-http-s2i-5-build do I see Error from server (BadRequest): container "sti-build" in pod "spring-boot-rest-http-s2i-5-build" is waiting to start: PodInitializing.
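Since the report points at the git-clone init container, its logs can be requested directly instead of those of the sti-build container. A minimal sketch, assuming the init container really is named git-clone as stated above; the helper name init_logs is made up for illustration:

```shell
# Hypothetical helper: print the logs of one init container of a build pod.
# $1 = pod name, $2 = container name (defaults to git-clone, the name
# mentioned in this report; that default is an assumption).
init_logs() {
  oc logs "$1" -c "${2:-git-clone}"
}

# Example (needs a logged-in oc session):
#   init_logs spring-boot-rest-http-s2i-5-build
```

This only wraps oc logs with the -c (container) option, so it shows whatever the init container has written so far and nothing more.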

After some time, the oc start-build responds with:

I0816 19:11:43.745383   20084 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "unable to wait for build spring-boot-rest-http-s2i-7 to run: timed out waiting for the condition",
  "reason": "BadRequest",
  "code": 400
}]

All 22 comments

cc @ladicek

@openshift/sig-master

@geoand can we see the oc describe output of that pod, and the YAML version of the pod (oc get pods)?
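The output requested here could be collected roughly as follows. This is only a sketch: the function name dump_pod is hypothetical, and the pod name in the usage comment is the one from the report above.

```shell
# Hypothetical helper: print the describe output and the YAML of one pod.
# $1 = pod name
dump_pod() {
  # Human-readable summary, including events and init container states
  oc describe pod "$1"
  # Full pod object as stored by the API server
  oc get pod "$1" -o yaml
}

# Example (needs a logged-in oc session):
#   dump_pod spring-boot-rest-http-s2i-5-build > build-pod.txt
```

Because the build pod disappears once the build fails, this would have to be run while the pod is still in PodInitializing.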

@mfojtik Here is the describe output and here is the json output of the build pod.

@geoand the build output suggests there was a build failure:

                        "message": "==================================================================\nStarting S2I Java Build .....\nS2I source build with plain binaries detected\nCopying binaries from /tmp/src to /deployments ...\ncp: cannot stat '/tmp/src/*': No such file or directory\nAborting due to error code 1 for copying /tmp/src to /deployments\nerror: build error: non-zero (13) exit code from registry.access.redhat.com/redhat-openjdk-18/openjdk18-openshift:1.3\n",

/assign @bparees

@mfojtik Let me check again, because I tried it multiple times and that might be the wrong output - meaning that it might be the output of one of my other attempts. Sorry for the inconvenience :(

@mfojtik Got the correct logs this time. Here is the correct describe output and here is the JSON output.
Note that this output was captured a few seconds before the build failed. Once the build failed, the build pod disappeared so I couldn't retrieve the final output.

Also, just to triple-check, I ran the same start-build command against a real OpenShift 3.10 cluster and the build worked as expected.

I see that the PR has been merged, :+1:

Do you have any idea when we can expect a new release?

Thanks!

We don't generally do bug backports to origin (now OKD). So the fix will be in the next 3.10 OCP errata release, and it will be in both the OKD 3.11 and OCP 3.11 releases. Generally the major version releases happen about once a quarter.

@bparees This is a blocking issue for us and for users relying on the Fabric8 Maven Plugin or S2I builds, as they will be blocked until you release 3.10. This is not acceptable. Why don't you backport the fix and release 3.10.1?

Not my call. But historically we have not backported fixes to origin releases except in extremely rare circumstances (generally only security issues).

@jwforres @smarterclayton

(Even if I backport the fix to origin 3.10, there is no plan to create a 3.10.1 release to ship the fix.)

@bparees Is there a way that tools that rely on binary builds can get around the problem when deploying to OKD 3.10?

The issue seems to be triggered when running builds that contain either a large number of files or a set of files with a large total size (not sure which). So if your repo is large enough to trip the issue, I am not aware of any way to work around it.

@bparees Thanks for the insight!

@bparees Just for the record, the issue always occurs for us with small repos that produce java uber-jars.

Yeah, in your case the repo size is irrelevant. What matters for hitting this is the jar that you're uploading to the build (the size of the content you upload into the binary build).
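To get a feel for what is being uploaded, the size and entry count of the archive passed to --from-archive can be checked before starting the build. A rough sketch; archive_stats is a made-up helper name, and no threshold is implied, since none is given in this thread:

```shell
# Hypothetical helper: report byte size and entry count of a tar archive.
# $1 = path to the archive that would be passed to --from-archive
archive_stats() {
  # Size in bytes (tr strips padding that some wc implementations add)
  size=$(wc -c < "$1" | tr -d ' ')
  # Number of entries (files and directories) listed in the archive
  count=$(tar -tf "$1" | wc -l | tr -d ' ')
  echo "$1: $size bytes, $count entries"
}

# Example:
#   archive_stats path/to/docker-build.tar
```

Comparing these numbers between a build that hangs and one that succeeds might help narrow down whether file count or total size is the trigger, which the comment above leaves open.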

I am going to backport the fix, but I don't know when we can expect a new origin 3.10.x release that would include it.

@bparees Thanks for your help!

FTR, at least the 3.7 stream had two .z releases including significant bugfixes. I'd personally say this issue would also qualify.
