Hi,
I built a FreeBSD CI: https://julia.iblis.cnmc.tw
I need someone to help me with the repo:status scope, and then the check results will be available like this: https://github.com/iblis17/julia/pull/3
Does it pass?
@iblis17 Thanks for working on this! It would be great to get this working. What sort of resources is this running on?
@tkelman No... tons of test cases failed. Please check out this build: https://julia.iblis.cnmc.tw/#/builders/1/builds/3
@JeffBezanson It is an old rack server, a Dell R710, in my campus lab... I can use it until I finish my master's degree (about one year from now).
I can't connect to the URLs you've posted. Not sure why. Is that happening for anyone else?
@iblis17 What options have you enabled for building? Is this FreeBSD 11.0-RELEASE?
I can't connect to the URLs you've posted. Not sure why. Is that happening for anyone else?
Hmm, a DNS issue? (I use this domain for the GitHub webhook and it works fine.) Or you can try this: https://140.113.31.207/
I use a Let's Encrypt SSL certificate. Could it cause the connection problem?
What options have you enabled for building? Is this FreeBSD 11.0-RELEASE?
Make.user : https://gist.github.com/iblis17/b4dca8221dcb96efcccde24f9cc2fa8d
OS: FreeBSD ionic 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r313193: Sat Feb 4 17:46:46 CST 2017 root@ionic:/usr/obj/usr/src/sys/GENERIC amd64
Are the build logs publicly accessible as they are for Travis and AppVeyor, our other CI services?
Yes, it's public.
The web ui of buildbot 0.9 requires websocket. Any error message on your browser's js console?
freebsd has an ilp64 system openblas? that's surprising and likely to cause problems unless its internal symbols have been renamed
Any error message on your browser's js console?
Nope, just doesn't connect.
freebsd has an ilp64 system openblas?
Where did you see that? OpenBLAS has to be installed from a port, and it doesn't look like the OpenBLAS build flag INTERFACE64 (which enables ILP64) is enabled by default.
On this system, the openblas INTERFACE64 flag is enabled.
$ pkg info openblas-0.2.19,1
openblas-0.2.19,1
Name : openblas
Version : 0.2.19,1
Installed on : Mon Feb 13 10:43:43 2017 CST
Origin : math/openblas
Architecture : freebsd:12:x86:64
Prefix : /usr/local
Categories : math
Licenses : BSD3CLAUSE
Maintainer : [email protected]
WWW : https://github.com/xianyi/OpenBLAS
Comment : Optimized BLAS library based on GotoBLAS2
Options :
AVX : on
AVX2 : on
DYNAMIC_ARCH : off
INTERFACE64 : on
OPENMP : off
Shared Libs required:
libquadmath.so.0
libgfortran.so.3
Shared Libs provided:
libopenblas.so.0
libopenblasp.so.0
Annotations :
Flat size : 78.5MiB
Description :
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
OpenBLAS is an open source project supported by
Lab of Parallel Software and Computational Science, ISCAS.
NOTE: If you want to specify your CPU microarchitecture manually,
please use TARGET_CPU_ARCH knob, e.g., "make TARGET_CPU_ARCH=NEHALEM".
This value is set TARGET build flag.
WWW: https://github.com/xianyi/OpenBLAS
Any error message on your browser's js console?
Nope, just doesn't connect.
@ararslan Hmm... might traceroute julia.iblis.cnmc.tw give you an idea?
If you do that without renaming the symbols, numpy and most other programs that use blas aren't going to work properly.
It seems the openblas port does not rename the symbols?
I will disable INTERFACE64 and build Julia again.
Julia can handle it, but a lot of other software can't. Just a warning about that option.
I added the missing arpack-ng and rebuilt: https://julia.iblis.cnmc.tw/#/builders/1/builds/9
One of the test failures related to file still exists: #8078
traceroute shows that the connection gets to TWGate, gets forwarded around a bit, then stalls for a long time before quitting.
Hmm, TWGate is my ISP's ISP...
I built a reverse proxy: https://julia1.iblis.cnmc.tw/
Please give it a try.
The owner of julia1.iblis.cnmc.tw has configured their website improperly. To protect your information from being stolen, Firefox has not connected to this website.
@ararslan (Y) I use my self-signed SSL certificate on julia1.iblis.cnmc.tw, so... it seems it does connect? I will migrate the nginx settings from julia.iblis.cnmc.tw to julia1.iblis.cnmc.tw and change the DNS record later.
@ararslan The DNS record has been changed. Please try again: https://julia.iblis.cnmc.tw
Thanks, now it works for me.
Looks like https://github.com/JuliaLang/julia/issues/20798 and one of the recently added libgit2 tests are problematic?
@tkelman It seems the libgit2 tests have been fine since build 510.
Ah okay, maybe I was looking at something from right before https://github.com/JuliaLang/julia/pull/21220
Oops! The libgit2 error popped up again in build 514 (6659b59fee30232c77cefbba1848b958041f5c61).
The CI worker is using the system libgit2: libgit2-0.24.0
@tkelman I guess I hit this problem: https://github.com/libgit2/libgit2/pull/4169
My personal ~/.gitconfig is a symbolic link...
I replaced it with a regular file, and gmake test-libgit2 passes now.
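For anyone who suspects the same cause on their own worker, here is a minimal sketch of the check. It only tests whether the config path is a symbolic link; the helper name `gitconfig_is_symlink` is made up for illustration and is not part of Julia, libgit2, or Buildbot.

```python
import os


def gitconfig_is_symlink(path="~/.gitconfig"):
    """Return True if the git config file at `path` is a symbolic link.

    Hypothetical helper: it does not inspect the file contents and does not
    prove that the libgit2 issue linked above is actually being hit.
    """
    return os.path.islink(os.path.expanduser(path))
```

Calling `gitconfig_is_symlink()` with no argument checks the current user's `~/.gitconfig`; pass another path to check a different account's file.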
Now the build passes, thanks to @ararslan's fabulous work. :)
Is it time to consider turning on a webhook?
There are two workers for the build task. The faster one hosts both the buildbot master and a worker.
The other one is an old IBM x3650 machine (10 years old :p); it takes about 3 hours to finish a build. Is 3 hours acceptable for a PR build?
Are they running FreeBSD 11.0-RELEASE on x86-64? No, 3 hours is way too long for a PR build; for reference, the Travis and AppVeyor builds typically take roughly between 45 minutes and 1 hour 15 minutes. Is it possible to cache the dependencies as Travis and AppVeyor do? That is, ensure the deps/ folder persists across builds.
@ararslan It's on an old -CURRENT (https://julia.iblis.cnmc.tw/#/builders/1/builds/1479/steps/0/logs/stdio)
No, 3 hours is way too long for a PR build
OK, I will try to keep deps/ (not sure how to do that yet).
Would it be possible to use RELEASE rather than CURRENT? That ensures that any OS development changes don't affect the build. (Plus it should make maintenance of the machines easier, I'd think, since you can just use freebsd-update to install system patches.)
You're using Buildbot for this, right? You might want to check the Buildbot documentation. It _should_ be possible, but I don't know enough to say for sure.
Would it be possible to use RELEASE rather than CURRENT? That ensures that any OS development changes don't affect the build. (Plus it should make maintenance of the machines easier, I'd think, since you can just use freebsd-update to install system patches.)
I think I can set up an 11.0-RELEASE jail as a buildbot worker. (Most of my machines are -CURRENT :p)
It seems there is no simple way to cache a directory.
Can I rely on gmake clean and set method=clean in buildbot? (http://docs.buildbot.net/latest/manual/cfg-buildsteps.html#git)
gmake clean won't touch the dependencies, though. Unless you want to verify that the from-scratch build keeps working (which isn't necessary on every single commit or PR, just the ones that touch the build system in non-trivial ways), you can usually leave the dependencies alone. gmake cleanall will make the dependencies at least re-run their individual gmake install, which can be necessary once in a while but isn't a totally from-scratch build. gmake distcleanall will delete all dependencies and rebuild them from nothing.
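As a rough summary of the three targets, the sketch below maps a rebuild level to the corresponding gmake command. Only the target names come from the comment above; the level names and the `clean_command` helper are invented for illustration.

```python
# Hypothetical mapping from a rebuild level to the gmake target described above.
CLEAN_TARGETS = {
    "incremental": "clean",          # leaves the dependencies alone
    "reinstall-deps": "cleanall",    # dependencies re-run their own install step
    "from-scratch": "distcleanall",  # deletes all dependencies and rebuilds them
}


def clean_command(level):
    """Return the gmake command (as an argument list) for a rebuild level."""
    try:
        return ["gmake", CLEAN_TARGETS[level]]
    except KeyError:
        raise ValueError(f"unknown rebuild level: {level!r}") from None
```

A CI configuration could then use `clean_command("incremental")` for ordinary PRs and reserve `clean_command("from-scratch")` for changes that touch the build system.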
I configured 11.0-RELEASE workers and ran gmake cleanall before all of the builds.
I just ran several builds to check the stability,
but found that a test case in file.jl fails randomly, even on different workers [1][2].
LoadError: SystemError: opening file /tmp/juliaZ4oVpm: Interrupted system call
...
while loading /home/julia/ci/worker/12cur-amd64/build/test/file.jl, in expression starting on line 1134
The failing case relates to #13559
Were you getting that on 12.0-CURRENT as well, or only 11? I haven't seen that on 11.
@ararslan [1] is on an 11.0-RELEASE physical machine.
[2] is in an 11.0-RELEASE jail.
And... I cannot reproduce it by running the test cases manually.
Hm, that's rather bizarre. I wonder if it's anything like our spurious InterruptException on macOS.
(I'm going to make a loop to run gmake test-file.)
I still cannot reproduce that error. I have already run gmake test-file 1000+ times from my shell...
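A loop like that can be sketched as follows. This is not the script that was actually used; `run_until_failure` is a made-up helper that reruns a command until it exits with a non-zero status.

```python
import subprocess


def run_until_failure(command, max_runs):
    """Run `command` up to `max_runs` times.

    Returns the 1-based index of the first run that exits non-zero,
    or None if every run succeeded.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,  # discard output; only the exit status matters here
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return attempt
    return None
```

For the case discussed here, the call would be `run_until_failure(["gmake", "test-file"], max_runs=1000)`, run from the Julia build directory on the worker.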
Have you been running that on the same machines that are running the CI service or are you running it locally? It could be something specific to the machines.
I did those 1000+ test-file runs on the worker bsd-worker...
[1] is the worker bsd-worker; it runs in an 11.0 jail. The host is a Dell R710.
[2] is the worker gaebolg; it runs on a physical machine (IBM x3650), also on 11.0.
They are different machines.
I just blindly googled and found this:
https://lists.freebsd.org/pipermail/freebsd-arch/2007-September/006778.html
Does it help?
(I need time to understand the tons of terms in that article...)
Does it help?
Maybe, not sure yet.
So when you were running it before, you were seeing the error on both workers, but in repeated testing in a jail you're not seeing it? Have you tried repeatedly in isolation on the physical machine as well?
@ararslan I found a way to reproduce it on both BSD and Linux!
This script triggers the error within 2 iterations on my machine:
https://gist.github.com/iblis17/b39e37071f3d816076e2770b338a8d07
(Why does running it through (g)make test-file save it from crashing?)
@ararslan Could you confirm that?
Sorry, haven't had time to look at that much. Will take a closer look soon.
Ready to enable webhook?
All recent builds passed.
Which tests do you run? In particular, libgit2-online and pkg should both be failing after the libgit2 upgrade. For reference, Travis and AppVeyor test all, download, libgit2-online, and pkg.
I run gmake testall (see line 3 of https://julia.iblis.cnmc.tw/#/builders/1/builds/34/steps/5/logs/stdio).
I will add the other tests... maybe I misunderstood testall.
Ah, I see. I recommend taking a look at how Travis and AppVeyor run the tests. In particular, they do the equivalent of
julia test/runtests.jl all download libgit2-online pkg
I don't believe that download et al. are part of the testall Make target.
I reconfigured my buildbot with: ['env', 'JULIA_CPU_CORES=6', 'gmake', 'testall', 'test-download', 'test-pkg', 'test-libgit2-online']
Now test-pkg fails as expected:
https://julia.iblis.cnmc.tw/#/builders/1/builds/38/steps/5/logs/stdio
Good enough now?
We can try it unless anyone has any reasons not to - if it proves unreliable, slow, or burdensome to keep it working then maybe we can re-evaluate after a while, but worth getting more checks I think. How long should the server it's hosted on remain available for this purpose?
How long should the server it's hosted on remain available for this purpose?
You mean how long we can keep this service enabled? Maybe 1 year... (until I finish my master's degree).
How long does it take to do an incremental build and test at this point? (You set it up to do incremental rather than fresh builds, right?)
About 40-50 minutes for an incremental build.
I have 2 workers, so the running queue holds at most 2 builds.
Okay, that sounds great. +1 from me for turning on the webhook now.
Hmm, I just found that if someone rebases again before the previous build finishes, the old build is not canceled. Is this a limitation of the webhook? I'm curious whether Travis can handle this situation.
travis has a special feature for it. buildbot might have something you could turn on? otherwise we have a code snippet for fast-failing that queries the travis or appveyor api to see if any newer builds are queued for the same PR
OK, I will look into it.
(For reference, the build queue is here: https://julia.iblis.cnmc.tw/#/builders/1)
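The fast-fail idea can be sketched without tying it to any particular CI API: given the builds known for each PR, any unfinished build that is older than the newest one for the same PR is superseded and could be canceled. The `Build` record and its fields are hypothetical; a real setup would have to fill them from the Buildbot, Travis, or AppVeyor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    # Hypothetical record; the field names are illustrative only.
    number: int     # increasing build number
    pr: int         # pull request the build belongs to
    finished: bool  # True once the build has completed


def superseded_builds(builds):
    """Return unfinished builds that have a newer build for the same PR."""
    newest = {}
    for build in builds:
        if build.pr not in newest or build.number > newest[build.pr]:
            newest[build.pr] = build.number
    return [b for b in builds if not b.finished and b.number < newest[b.pr]]
```

This only decides which builds are stale; actually stopping them depends on what the CI service exposes.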
Another feature that would be great is if you could implement build skipping. Currently someone can add [ci skip] to a commit message and that will keep Travis and AppVeyor from running on that commit. Would it be possible to detect that and skip running the FreeBSD CI if [ci skip] is given?
Akin to that, another useful feature would be to be able to skip just the FreeBSD build, just as we can currently skip the Windows build with [av skip] (though unfortunately Travis provides no equivalent for skipping macOS and/or Linux). Maybe something like [bsd skip].
Now ci skipping should work, I just implemented it:
[ci skip]
[skip ci]
[bsd skip]
[skip bsd]
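For illustration only, detecting these four markers in a commit message could look like the sketch below. It is not the code from the Buildbot setup; the name `should_skip` and the case-insensitive matching are assumptions.

```python
import re

# The four markers listed above. Case-insensitive matching is an assumption
# of this sketch, not confirmed behavior of the actual implementation.
SKIP_PATTERN = re.compile(r"\[(ci skip|skip ci|bsd skip|skip bsd)\]", re.IGNORECASE)


def should_skip(commit_message):
    """Return True if the commit message contains one of the skip markers."""
    return SKIP_PATTERN.search(commit_message) is not None
```

With this sketch, a message containing `[av skip]` does not match, so a Windows-only skip would still let the FreeBSD build run.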
Wonderful, thanks!
I tried it in #22821 but it didn't seem to work. Does buildbot have a built-in feature for that already?
@tkelman I am checking my code again...
No, it's not a built-in feature. PR is here: https://github.com/buildbot/buildbot/pull/3443
I think everyone's pretty happy with this, so closing.
Thanks again for this, Iblis! It's fantastic to have better support for FreeBSD, plus CI that runs way faster than any of the others.