We're running up against the max build time allowed. Ideas for shortening so the tests can run to completion again:
AFAICT you'll forever be snookered by this issue without breaking the build steps out of the make files, to take advantage of the parallel build functionality.
The extra benefit of doing this is that the same parallel structure can be used in Shippable, which would give you some redundancy in the build infrastructure.
Dealing with the partial dependency tree has made me shy away from exploring parallelization options and favor tricks to shorten the build where I can.
For example, when someone makes a change to base-notebook, we need to build/test minimal-notebook next before any other image. After minimal is built, we can parallelize r-notebook and scipy-notebook. Once scipy is done, we can then parallelize datascience-notebook, tensorflow-notebook, and pyspark-notebook. Finally, once pyspark is built, we can build all-spark-notebook.
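To make that ordering concrete, here is a rough sketch (not anything that exists in the repo) that groups the images into stages, where everything in a stage could be built in parallel once the previous stage has finished. The `PARENTS` mapping just restates the example above, and `build_stages` is a hypothetical helper name. It uses `graphlib` from the standard library, so it assumes Python 3.9+.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each image mapped to the image(s) it is built FROM, per the example above.
PARENTS = {
    "base-notebook": [],
    "minimal-notebook": ["base-notebook"],
    "r-notebook": ["minimal-notebook"],
    "scipy-notebook": ["minimal-notebook"],
    "datascience-notebook": ["scipy-notebook"],
    "tensorflow-notebook": ["scipy-notebook"],
    "pyspark-notebook": ["scipy-notebook"],
    "all-spark-notebook": ["pyspark-notebook"],
}


def build_stages(parents):
    """Group images into stages; images within a stage have no
    dependencies on each other and could be built in parallel."""
    sorter = TopologicalSorter(parents)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        # Everything whose parents have already been marked done.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(build_stages(PARENTS), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

This yields five stages for the current set of images. Grouping by stage is slightly more conservative than strictly necessary (all-spark-notebook waits for the whole scipy-derived stage, not just pyspark-notebook), but it maps directly onto a CI system that runs stages sequentially and the jobs inside each stage concurrently.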
Do you have experience setting up such a flow in Travis (or elsewhere)? Have an estimate of what it would take to do so here? I'd love to hear about it if you do!
@parente I'm currently building a port of the jupyter docker containers to rkt over here. The .travis.yml isn't the most elegant but you'll see the idea in matrix.
Travis runs 5 jobs at the same time.
The first 'job' builds the base and minimal notebooks.
The rest of the jobs build only one container.
Apologies for the red build state: I'm in the middle of switching from rkt fetch + rkt image export to skopeo + docker2aci, which should halve the current image sizes, bringing them back closer to what you have with Docker.
This is all temporary since - time permitting - I'd like to build the OCI containers with Buildah and Alpine Linux. At the moment I have the base-notebook built this way and just need to push the code - let me know if you're interested in seeing the wet paint.
@taqtiqa-mark thanks for sharing that! I've given it a glance and will read it in more detail later outside work. Based on your description here, and what I see in the yml, we'd have to figure out how to wait on the base-notebook build in the matrix before proceeding with the next (and so on) in order to fully test the impact of a change in one of the upstream images (e.g., base-notebook) on a downstream one (e.g., all-spark-notebook).
Yup, you're starting to feel the pain of docker ;) "It seemed like a good idea at the time"
Anyway, acbuild is being replaced by Buildah, which is maturing. That's the way out of docker land and into the world of containers ;)
Oh, in case you stay within Dockerfiles, you probably want to shift the CI to Shippable and their workflows. Jobs should get you there, http://docs.shippable.com/platform/workflow/job/overview/
With the Docker image dependency graph in mind - if each upstream image (starting with base-notebook) could be built, tested, and pushed to Docker Hub (but maybe not tagged yet) - would that be an acceptable build strategy, even on Travis? That might allow build stages to be used.
I wonder if the recent CNCF Buildpacks project could be useful here... It lets you cache different languages separately and compose the layers without invalidating the whole image: https://github.com/buildpack/spec/blob/master/buildpack.md#phase-4-export
FWIW, we haven't had a build failure due to timeout here in a while. Since opening the ticket, we've also documented how people can set up community stacks and started pushing for more use of recipes and separate images in other repos, rather than continually adding more and more to these already large images.
Maybe an unpopular opinion, but we might want to call this issue moot and just continue to maintain what exists.
@parente Let's close this for now. If it becomes problematic again, we can try adding Azure Pipelines for testing too.
Of course, right after I said it, the daily Travis job failed for the first time in months due to a build timeout. Still, it's a rare occurrence and Travis is only used to test PRs. Docker Hub, which builds releases off master, has support for image caching and has not, to my knowledge, suffered from any timeouts.
Closing this out as @willingc suggests for the time being.