Travis' memory issues have become a bit too much, and our build there now takes more than 3 hours.
Might also look at BuildKite if we can get people to donate hardware. It's easy enough for me to set up, which means it's easy. An agent running on @larsrh's 3874629847653-core machine would be 🔥
@tpolecat That won't work, unfortunately. That machine is university property.
(Reply by email to Rob Norris's comment of 9 July 2018, 19:37:21 CEST: https://github.com/typelevel/cats/issues/2319#issuecomment-403559680)
From https://gitter.im/sbt/sbt-contrib?at=5b71c115988005174ed40110 :
This plugin _may_ be of use/worth a try https://github.com/dwickern/sbt-classloader-leak-prevention
@kailuowang I'm a bit out of the loop, I think. The build takes 3 hours? How long does it take locally? What is it doing? Thanks to working at SlamData, I have a remarkably vast swath of experience debugging slow Travis builds. I'd be happy to take a look if you want.
The 2-3 hours is the combined total across all jobs; each job typically takes 20-30 minutes: https://travis-ci.org/typelevel/cats/builds/415542808
And a lot of that time is coverage testing, tut testing, doc testing, site building and so on.
OK, taking a quick look at things, here are literally the first things that occur to me:
- Why is sudo: required? I'm relatively certain those VMs are slower. Is it just for codecov? See below.
- The travis-publish.sh script goes to great lengths to push everything into a single SBT instance. In my experience, this is exactly the opposite of what you want when you have a slow build. Separate SBT processes, invoked sequentially, give you better memory characteristics and are better understood by Travis (especially if you don't split the build script out of .travis.yml).
- .jvmopts uses -Xmx6g. This is problematic because Travis doesn't have that much memory! You should strongly consider dropping that option altogether and letting the heap size default (ditto for -Xms), in which case it will be scaled off the reported system memory.
- secure variables so we know which one is which.

I didn't look at SBT itself. It looks like a lot of the logic is in tasks, so that may also contribute.
The build script does actually invoke sbt multiple times, but for the JVM build we could split it further, as the JS build already does. That said, the JVM issue usually happens relatively early in the build.
As for sudo: it does mean a slower startup, but you get 7.5 GB of memory. We could try a lower setting. Ref: https://docs.travis-ci.com/user/reference/overview/
> Why is sudo: required? I'm relatively certain those VMs are slower.
sudo: required gets 7.5 GB as opposed to 4 GB. http4s adopted it because the IO was untenable on the container builds, but that should be far less of a factor in cats.
We should have a discussion about whether or not code coverage is actually worth anything. Frankly, I've never seen it provide any value whatsoever, and it doubles the duration of the JVM build.
:+1:
My main concern _before_ moving would be to make sure it really isn't our build that's at fault! One simple option is to add parallelExecution := false to the JVM settings; the JS settings already have it.
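In sbt 1.x slash syntax, the suggested change might look roughly like the fragment below. It is only a sketch: commonJvmSettings is a hypothetical name for whatever settings sequence the JVM modules share, and builds on older sbt syntax would write parallelExecution in Test := false instead.

```scala
// build.sbt (sketch). `commonJvmSettings` is a hypothetical name for the
// settings shared by the JVM modules; adapt it to the real build definition.
lazy val commonJvmSettings = Seq(
  // Ask sbt to run test classes sequentially rather than in parallel,
  // trading some wall-clock time for (likely) lower peak memory during `test`.
  Test / parallelExecution := false
)
```

The JVM projects would then need to include commonJvmSettings in their .settings(...) call for the flag to take effect.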
Re scoverage times: be careful here. The scoverage run _also_ executes the ScalaCheck tests, but with larger parameters than on JS. And after a successful coverage run, the code is just rebuilt, not re-tested.
So whilst coverage will always be slower, I doubt it's causing any issues. What we might want to do is run scoverage with very low ScalaCheck parameters (just to get coverage) and then run the full ScalaCheck tests without scoverage.
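One way to sketch that split is a small settings object that picks smaller property-test sizes for the instrumented run. Everything here is hypothetical: the COVERAGE_RUN environment variable, the PropParams names, and the numbers, which are placeholders rather than cats' actual values.

```scala
// Hypothetical helper: choose property-test sizes depending on the run type.
final case class PropParams(minSuccessful: Int, maxSize: Int)

object PropParams {
  // Small values: just enough to exercise each law so coverage is recorded.
  val coverage: PropParams = PropParams(minSuccessful = 5, maxSize = 5)

  // Larger placeholder values for the real, uninstrumented test run.
  val full: PropParams = PropParams(minSuccessful = 50, maxSize = 10)

  // COVERAGE_RUN is a hypothetical variable the CI script would set
  // only for the scoverage invocation of sbt.
  def fromEnv(env: Map[String, String]): PropParams =
    if (env.get("COVERAGE_RUN").contains("true")) coverage else full
}
```

A test base class could pass PropParams.fromEnv(sys.env) into its property-check configuration (with plain ScalaCheck, for example, via Test.Parameters.default.withMinSuccessfulTests(p.minSuccessful).withMaxSize(p.maxSize)), and CI would then run sbt twice: once as COVERAGE_RUN=true sbt coverage test coverageReport, and once as a plain sbt test.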
IMHO, keeping/ditching coverage is best discussed as a separate issue
@djspiewak thanks so much for helping. And @BennyHill thanks for answering some of the questions.
To answer your questions above:
Re the parallelExecution := false idea, this came up the other day on the Scala Native channel: https://gitter.im/scala-native/scala-native?at=5b6d631fa6af14730b170260
Finally, re the "separate build scripts": this was originally done as per the CI docs.
But of course, that was a while back, so perhaps we can revisit it.
And finally, finally... one small advantage of a separate build script is that it's far easier to "run" from the command line without a local Travis: see https://github.com/typelevel/cats/blob/master/scripts/travis-publish.sh#L17-L18
If you drop sudo: required, it would be a good idea to add -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap so that the heap size is set according to the container's memory limit.
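To check that flags like these (or dropping -Xmx from .jvmopts) actually produce a sensible heap on a given CI worker, one option is to print what the running JVM chose. A minimal sketch; HeapCheck is a hypothetical helper, not part of the cats build. Called from sbt console, it should report on the sbt JVM itself, since the console typically runs in-process rather than forked.

```scala
// HeapCheck.scala: hypothetical helper, not part of the cats build.
// Runtime.maxMemory roughly reflects -Xmx when given, and otherwise the
// JVM's ergonomic default (or the cgroup-derived limit, if that is enabled).
object HeapCheck {
  private val BytesPerMiB: Long = 1024L * 1024L

  // Convert a byte count to whole mebibytes, rounding down.
  def toMiB(bytes: Long): Long = bytes / BytesPerMiB

  def main(args: Array[String]): Unit = {
    val runtime = Runtime.getRuntime
    println(s"max heap:   ${toMiB(runtime.maxMemory)} MiB")
    println(s"total heap: ${toMiB(runtime.totalMemory)} MiB (currently committed)")
    println(s"processors: ${runtime.availableProcessors}")
  }
}
```

In the REPL, HeapCheck.main(Array.empty) prints the three values. On a 4 GB container, a max heap well above 4096 MiB would suggest the JVM is sizing itself from the host's memory rather than the container's.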
If a decision is made to move to CircleCI, let me know, as I would gladly help. I've used CircleCI exclusively for the last few years.
Are there any other alternatives being considered?
I have also heard really good things about Semaphore and BuildKite, although BuildKite requires its own infrastructure (I can highly recommend packet.com for that), and Semaphore's OSS policy seems to have mysteriously become a "Please email us if you are an OSS project" policy.
I had a look at this a few days ago, and my overwhelming impression was that it's hard to define a build matrix nicely in every one of the hosted services other than Travis. It's possible in CircleCI, but it relies on YAML dictionary operations rather than being a construct in its own right.
Whether that's a problem or not depends on how much faster (if at all) the builds run on those services IMO 😀
Thanks, guys. We haven't seriously looked at any of the alternatives yet, but we probably should soon, given the increased uncertainty about Travis's future and its suboptimal reliability lately. An easy migration from Travis is a nice-to-have: if we have to switch yet again, we're slightly more likely to find another service that roughly conforms to the Travis way. How easy is it to set up a trial on CircleCI?
I'd be glad to give a few different services a go and report back, @kailuowang?
@DavidGregory084 that would be amazing. Thanks!
There's something interesting about testing new CI systems that brings out all the weird bugs 😄
Guys, I've opened a few PRs which demonstrate the config required to use different services.
I evaluated CircleCI too, but I found that its container memory limit of 4 GB was just not enough to run the cats builds reliably. I found the configuration quite verbose, and I also had issues where the config validation in the CircleCI CLI disagreed with the service itself, so my build didn't run even after passing validation locally.
These services do experience intermittent failures with builds, but they all seem to be caused by a single flaky test (ApplicativeTests.monoid.combineAll).
I think we should focus on fixing that whatever we decide to do about CI in the future.
So far my instinct is that Drone.io is probably the best option as it is free for open source, easy to configure and super fast.
Semaphore has a very unclear open source policy and although Buildkite is very nice, I think that managing hardware in addition to the build itself could become a bit of a chore.
Thanks, @DavidGregory084, that's a lot of work. I will check out the configs in your PRs and take a stab at ApplicativeTests.monoid.combineAll.
@DavidGregory084 Out of curiosity, what specific memory-related issues did you hit with CircleCI? Were you leveraging any of Circle's parallel processing features?
@softinio you can see the config I used here. I tried using the cgroup memory limit detection (-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap), which didn't work correctly on CircleCI and resulted in the JVM allocating way too much memory. I also tried reducing the JVM memory allocation to 3.5 GB, but I was still getting multiple jobs on each build killed by the CircleCI infrastructure (Exited with code 137). You can see some example runs here.
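For anyone decoding failures like these: shells such as bash report a process terminated by signal N as exit status 128 + N, so 137 corresponds to signal 9 (SIGKILL), which is how an out-of-memory kill in a container typically surfaces. A tiny sketch of that arithmetic; ExitStatus is a hypothetical helper.

```scala
// Hypothetical helper: recover the signal number from a shell exit status.
object ExitStatus {
  // Shells such as bash report "killed by signal N" as status 128 + N.
  def killedBySignal(status: Int): Option[Int] =
    if (status > 128 && status < 256) Some(status - 128) else None

  def main(args: Array[String]): Unit =
    println(killedBySignal(137)) // prints Some(9), i.e. SIGKILL
}
```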
@softinio it seems that exceeding 4 GB of available memory requires a paid plan; as an open-source project, we could probably use resource_class: large if we contacted CircleCI support.
Update on this: Semaphore would like to donate 8 bare-metal performance agents for Cats CI. In my tests, they cut Cats' build time in half. I think we should consider migrating to Semaphore, mainly because we have so many TL projects on Travis all sharing 6 slow agents; it would be nice to have some more powerful CI resources.
Since nobody has worked on this for quite a while, I'm closing all old CI-related PRs.