sf 0.7-0 is now on CRAN; see results. We see a regression on r-devel-linux-x86_64-fedora-clang; see the end of this file:
> i = st_intersection(s) # all intersections
terminate called after throwing an instance of 'std::__1::bad_function_call'
what(): std::exception
This must come from code in geos.cpp; @dbaston do you have any idea what might be the cause? Does anyone have a suggestion how to run checks on this platform? r-hub?
I think I understand the issue; it's described here: https://stackoverflow.com/a/24600954/2171894
I can attempt a fix but it would be nice to test on the platform in question before committing. As it is, I can't even compile sf with clang locally after half an hour of fighting with autoconf.
Thanks! I can help testing either with r-hub, or in a docker container (if I can find/make one).
If you can point me to a Docker container that matches the CRAN environment, that would be really helpful. I think I will need to do some local testing to make sure I really understand the problem.
I doubt that docker will help. I run Fedora 28, have built R-devel under clang:
$ R CMD config CC
clang
$ R CMD config CXX
clang++ -std=gnu++11
For now, with R-3.5.1, GDAL 2.4.0dev-358fa65bf3, released 2018/09/11 and gcc, I can't R CMD build master, with:
polygonize.cpp:112:8: error: ‘GDALContourGenerateEx’ was not declared in this scope
if (GDALContourGenerateEx((GDALRasterBandH) poBand, (void *) poLayer,
^~~~~~~~~~~~~~~~~~~~~
polygonize.cpp:112:8: note: suggested alternative: ‘GDALContourGenerate’
if (GDALContourGenerateEx((GDALRasterBandH) poBand, (void *) poLayer,
^~~~~~~~~~~~~~~~~~~~~
GDALContourGenerate
make: *** [/home/rsb/topics/R/R351-share/lib64/R/etc/Makeconf:168: polygonize.o] Error 1
ERROR: compilation failed for package ‘sf’
* removing ‘/tmp/RtmpeqJHBh/Rinst710e708ddba2/sf’
On clang/R-devel, sf_0.7-0 will not install with GDAL 2.4.0dev-358fa65bf3, released 2018/09/11:
clang++ -std=gnu++11 -I"/home/rsb/topics/R/trunk/builddir-clang/include" -DNDEBUG -I/usr/local/include -I/usr/local/include -I/usr/local/include -I"/home/rsb/topics/R/R-clang-devel-libs/Rcpp/include" -I/usr/local/include -fpic -g -O2 -c polygonize.cpp -o polygonize.o
polygonize.cpp:112:8: error: use of undeclared identifier 'GDALContourGenerateEx'
if (GDALContourGenerateEx((GDALRasterBandH) poBand, (void *) poLayer,
^
1 error generated.
Both were caused by the GDAL 2.4.0 development version, that is, polygonize.cpp is not future-compatible and needs urgent attention (set up a Travis variant with development versions of GEOS, GDAL and PROJ). Now installing all the expletive dependencies suggested by sf (tidyverse essentially) to be able to get to the actual problem.
CONCLUSION
R CMD check --as-cran passes without error for sf_0.7-0 and master, for GEOS 3.7.0, GDAL 2.3.2 and PROJ 5.2.0 for my R-devel/clang build on Fedora 28; files (I bumped version number locally):
check_master.zip.
Have just pulled current GDAL 2.4.0 master, which seems to complete the transition of ContourGenerate to marching_squares, and GDALContourGenerateEx is in alg/gdal_alg.h now. Will report back here when checked (sf 0.7-0 and master).
R CMD check --as-cran passes without error for sf_0.7-0 and master, for GEOS 3.7.0, GDAL 2.4.0dev-adc21c19cb and PROJ 5.2.0 for my R-devel/clang build on Fedora 28; files
check_master_gdal-2.4.0.zip
Thanks! Do you have any ideas how we could reproduce the error seen on CRAN?
We'd need to know details of their F28/clang, R-devel is fixed, but we'd need their GEOS and Rcpp version details too. We'd get some of this from the full sf-Ex.Rout file (GEOS anyway). For me:
$ clang++ --version
clang version 6.0.1
CRAN has:
r-devel-linux-x86_64-fedora-clang r-devel Linux x86_64 Fedora 28 2x 6-core Intel Xeon E5-2440 0 @ 2.40GHz clang version 7.0.0; GNU Fortran 8.1 Details
This regression is now replicated by both OS-X platforms (also both clang):
r-release-osx-x86_64 r-release OS X x86_64 OS X 10.11.6 Mac Pro, Quad-Core Intel Xeon 2.93 GHz Xcode 8.2.1, clang 4.0.0, GNU Fortran 6.1
r-oldrel-osx-x86_64 r-oldrel OS X x86_64 OS X 10.11.6 Mac Pro, Quad-Core Intel Xeon 2.93 GHz Xcode 8.2.1, clang 4.0.0, GNU Fortran 6.1
The clang 7 release isn't due until Fedora 29 (target release now 31 October). I see some issues for clang 7 when linking to shared objects built with gcc 8.1.1 (here like GEOS?).
I can try to upgrade one of my systems on release day. I don't know that it's worth putting F29-beta into a container. Why is CRAN F28 using clang 7 (yes, trying to find problems early, so reasonable), but why does the problem surface in clang 4 but not in my clang 6? Something in Rcpp/GEOS? Could they re-check 0.6-3?
Do we know where the CRAN Fedora 28 GEOS comes from (I don't think this is the issue, as gcc F28 passes)? It could be clang 7 needing something more from gcc-built shared objects (but clang 4 also fails on OSX). Why is it only failing at that point?
@dbaston this bug fix did NOT make it into the sf 0.7-0 release on CRAN (despite my commit message). Can it be the cause of the trouble, in the combination with the unique pointer usage? (It did not cause trouble in previous, 0.6-3, where we did not have the unique pointers).
@edzer I don't think they're related. The error should come up when a unique_ptr up is default-constructed and then later gets a value associated with it with up.reset(some_pointer). In that case, the custom deleter never gets associated with up, and the bad_function_call comes up whenever up goes out of scope.
Trouble is, I don't see this situation anywhere in the geos.cpp code. We're always assigning a pointer and a deleter to a unique_ptr after creating it. So I can only guess that this is happening inside some of the STL code (sort, push_back, ??) but I need to see that locally to try and zero in on a fix.
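For reference, the failure mode described above can be sketched in isolation. This is only an illustration under assumptions, not code from geos.cpp: IntPtr and both function names are made up for the example, and an int stands in for a GEOS geometry. To keep the example from aborting, the empty deleter is invoked by hand inside a try block and the pointer is released before the unique_ptr goes out of scope.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical alias: a unique_ptr whose deleter is a std::function,
// similar in shape to the pointer type seen in the backtrace.
using IntPtr = std::unique_ptr<int, std::function<void(int*)>>;

// Default-construct, then reset(): the pointer is stored, but the deleter
// stays an empty std::function. Returns true if calling it throws.
bool reset_leaves_deleter_empty() {
    IntPtr up;                      // deleter is an empty std::function
    up.reset(new int(42));          // only the pointer is replaced
    assert(!up.get_deleter());      // the deleter is still empty

    bool threw = false;
    try {
        up.get_deleter()(up.get()); // what the destructor would do
    } catch (const std::bad_function_call&) {
        threw = true;
    }
    delete up.release();            // free by hand so ~unique_ptr does nothing
    return threw;
}

// Passing the deleter at construction binds it, so cleanup works.
bool bound_deleter_runs() {
    bool called = false;
    {
        IntPtr up(new int(42), [&called](int* p) { called = true; delete p; });
    }                               // deleter runs here
    return called;
}
```

In real code the throw would happen inside the unique_ptr destructor, which is noexcept, so the program calls std::terminate instead of reaching a catch block.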
Maybe clang7 is the ticket to reproducing it; I will try again this weekend to build sf with clang. (The problem I ran into was that setting CXX to clang++ caused the GDAL test in autotools to not get the -std=c++11 flag, somehow.)
The CRAN F28/clang set-up is detailed at: https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-clang; maybe CXX='clang++ -stdlib=c++'? I didn't do this in my tests, will repeat.
And ... bang - with CXXSTD='-stdlib=c++', sf install fails on configure.
stdlib_c++_output.zip. Probably my abuse of R/trunk/config.site, same type of error for rgdal.
Compiled with clang 7 on Fedora 29 using the compiler flags linked by @rsbivand , and not seeing any failures.
Now with SHLIB_CXXLDFLAGS='-stdlib=libstdc++' no failures for clang 6 on F28 here.
I think this was accidentally closed by GitHub magic.
The docker file I just added confirms: with fedora 28/clang, the same GDAL/GEOS/PROJ as the CRAN check, I cannot reproduce the bug found on the CRAN platform. How shall we proceed?
Wait for F29 and clang 7? What about the -stdlib= argument? The closure yesterday could be the lengthy github outages, with action transactions only completed late last night CEST. At least the outages make me feel happier about R-Forge, github does outages too ...
Side comment - could I set up a no-code repo with rocker/travis etc. testing for rgeos and rgdal? I'd trigger the runs by committing something, for example? I'm seeing Debian/Ubuntu people who didn't read the R-sig-geo request to check workflows after changing exception handling to free memory better, but who run GEOS 3.5.1 binaries which give different answers to geos-config --cclibs and --libs, and where Debian/Ubuntu GEOS didn't symlink to libgeos.so from the libgeos-3.5.1.so and so on. Had they checked, or had I had access to an ancient system, I wouldn't need to update rgeos only days after releasing to CRAN.
No doubt you could, but I have no clue how to (i) set triggers from r-forge commits, (ii) check an R package in travis from a no-code repo. Moving the repos entirely to github would solve both problems, and also be more inviting for guest contributions.
Dummy repo, with travis job collecting tarball from R-Forge (R-Forge builds the source tarball). Manual trigger by touching and committing a file listing revision numbers (e.g. svn log output). I'd need to check that the tarball build had completed.
The CRAN OSX builds are failing with clang 4 (and I've tried clang 7 on Fedora 29), so I don't think waiting gets us anything. But without access to a platform that can reproduce it, I'm a bit stumped. Am I correct that CRAN can only test things that have already been released? (seems a bit unintuitive, but I guess resources are limited)
I can reproduce this error on a macbook that I have here (for the purpose of testing sf). #872 was indeed not the cause, as @dbaston suggested. Any ideas what I can try out?
I don't know what "clang 4" means; clang -v gives an entirely different output on the macbook (like: Apple LLVM version 9.0.0 clang-900.0.39.2)
@edzer can you get the line of code that produced the crash? Or, insert enough Rcout calls to figure out what the line is?
Here's a backtrace from lldb:
> i = st_intersection(s) # all intersections
libc++abi.dylib: terminating with uncaught exception of type std::__1::bad_function_call: std::exception
Process 18689 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fff971d4d42 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
-> 0x7fff971d4d42 <+10>: jae 0x7fff971d4d4c ; <+20>
0x7fff971d4d44 <+12>: movq %rax, %rdi
0x7fff971d4d47 <+15>: jmp 0x7fff971cdcaf ; cerror_nocancel
0x7fff971d4d4c <+20>: retq
Target 0: (R) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff971d4d42 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff972c2457 libsystem_pthread.dylib`pthread_kill + 90
frame #2: 0x00007fff9713a420 libsystem_c.dylib`abort + 129
frame #3: 0x00007fff95c8e94a libc++abi.dylib`abort_message + 266
frame #4: 0x00007fff95cb3c17 libc++abi.dylib`default_terminate_handler() + 243
frame #5: 0x00007fff967c2713 libobjc.A.dylib`_objc_terminate() + 124
frame #6: 0x00007fff95cb0d49 libc++abi.dylib`std::__terminate(void (*)()) + 8
frame #7: 0x00007fff95cb0dc3 libc++abi.dylib`std::terminate() + 51
frame #8: 0x0000000107a1cd4b sf.so`__clang_call_terminate + 11
frame #9: 0x0000000107a5cfdf sf.so`std::__1::__vector_base<std::__1::unique_ptr<GEOSGeom_t, std::__1::function<void (GEOSGeom_t*)> >, std::__1::allocator<std::__1::unique_ptr<GEOSGeom_t, std::__1::function<void (GEOSGeom_t*)> > > >::~__vector_base() + 223
frame #10: 0x0000000107a5bbf6 sf.so`CPL_nary_intersection(Rcpp::Vector<19, Rcpp::PreserveStorage>) + 5014
frame #11: 0x0000000107a29e76 sf.so`_sf_CPL_nary_intersection + 134
frame #12: 0x000000010010bd3a libR.dylib`R_doDotCall(ofun=<unavailable>, nargs=<unavailable>, cargs=<unavailable>, call=0x0000000107654320) at dotcode.c:570 [opt]
frame #13: 0x000000010010d6b3 libR.dylib`do_dotcall(call=0x0000000107654320, op=<unavailable>, args=<unavailable>, env=<unavailable>) at dotcode.c:1252 [opt]
frame #14: 0x000000010013bb3d libR.dylib`Rf_eval(e=<unavailable>, rho=0x0000000107654630) at eval.c:728 [opt]
frame #15: 0x000000010014df80 libR.dylib`do_begin(call=<unavailable>, op=0x0000000101827628, args=0x00000001076542b0, rho=0x0000000107654630) at eval.c:2191 [opt]
frame #16: 0x000000010013b86c libR.dylib`Rf_eval(e=<unavailable>, rho=0x0000000107654630) at eval.c:700 [opt]
frame #17: 0x000000010014be52 libR.dylib`R_execClosure(call=0x000000010985b238, newrho=0x0000000107654630, sysparent=<unavailable>, rho=<unavailable>, arglist=<unavailable>, op=0x0000000107654c38) at eval.c:1607 [opt]
frame #18: 0x000000010013b9ea libR.dylib`Rf_eval(e=0x000000010985b238, rho=0x0000000107654a40) at eval.c:747 [opt]
frame #19: 0x000000010014e435 libR.dylib`do_set(call=0x000000010985b2e0, op=0x0000000101827778, args=0x000000010985b2a8, rho=0x0000000107654a40) at eval.c:2583 [opt]
frame #20: 0x000000010013b86c libR.dylib`Rf_eval(e=<unavailable>, rho=0x0000000107654a40) at eval.c:700 [opt]
frame #21: 0x000000010014df80 libR.dylib`do_begin(call=<unavailable>, op=0x0000000101827628, args=0x000000010985b318, rho=0x0000000107654a40) at eval.c:2191 [opt]
frame #22: 0x000000010013b86c libR.dylib`Rf_eval(e=<unavailable>, rho=0x0000000107654a40) at eval.c:700 [opt]
frame #23: 0x000000010013b86c libR.dylib`Rf_eval(e=<unavailable>, rho=0x0000000107654a40) at eval.c:700 [opt]
frame #24: 0x000000010014df80 libR.dylib`do_begin(call=<unavailable>, op=0x0000000101827628, args=0x000000010985b4a0, rho=0x0000000107654a40) at eval.c:2191 [opt]
frame #25: 0x000000010013b86c libR.dylib`Rf_eval(e=<unavailable>, rho=0x0000000107654a40) at eval.c:700 [opt]
frame #26: 0x000000010014be52 libR.dylib`R_execClosure(call=0x0000000107655940, newrho=0x0000000107654a40, sysparent=<unavailable>, rho=<unavailable>, arglist=<unavailable>, op=0x000000010985a8e8) at eval.c:1607 [opt]
frame #27: 0x000000010018bf27 libR.dylib`dispatchMethod(op=0x0000000106980988, sxp=0x000000010985a8e8, dotClass=0x0000000109a0b038, cptr=0x00007fff5fbfb9a0, method=0x00000001072c12a8, generic=<unavailable>, rho=<unavailable>, callrho=<unavailable>, defrho=<unavailable>) at objects.c:335 [opt]
frame #28: 0x000000010018ba3f libR.dylib`Rf_usemethod(generic="st_intersection", obj=<unavailable>, call=<unavailable>, args=<unavailable>, rho=0x00000001076555c0, callrho=0x0000000101852e70, defrho=<unavailable>, ans=<unavailable>) at objects.c:371 [opt]
frame #29: 0x000000010018c13b libR.dylib`do_usemethod(call=<unavailable>, op=<unavailable>, args=<unavailable>, env=0x00000001076555c0) at objects.c:451 [opt]
frame #30: 0x000000010013b86c libR.dylib`Rf_eval(e=<unavailable>, rho=0x00000001076555c0) at eval.c:700 [opt]
frame #31: 0x000000010014be52 libR.dylib`R_execClosure(call=0x00000001060e33c0, newrho=0x00000001076555c0, sysparent=<unavailable>, rho=<unavailable>, arglist=<unavailable>, op=0x0000000106980988) at eval.c:1607 [opt]
frame #32: 0x000000010013b9ea libR.dylib`Rf_eval(e=0x00000001060e33c0, rho=0x0000000101852e70) at eval.c:747 [opt]
frame #33: 0x000000010014e435 libR.dylib`do_set(call=0x00000001060e3468, op=0x0000000101827778, args=0x00000001060e3430, rho=0x0000000101852e70) at eval.c:2583 [opt]
frame #34: 0x000000010013b86c libR.dylib`Rf_eval(e=<unavailable>, rho=0x0000000101852e70) at eval.c:700 [opt]
frame #35: 0x000000010014f92f libR.dylib`do_eval(call=<unavailable>, op=<unavailable>, args=<unavailable>, rho=<unavailable>) at eval.c:2961 [opt]
frame #36: 0x0000000100140146 libR.dylib`bcEval(body=<unavailable>, rho=0x00000001076552b0, useCache=<unavailable>) at eval.c:6429 [opt]
frame #37: 0x000000010013b701 libR.dylib`Rf_eval(e=<unavailable>, rho=<unavailable>) at eval.c:624 [opt]
frame #38: 0x000000010014be52 libR.dylib`R_execClosure(call=0x0000000106113120, newrho=0x00000001076552b0, sysparent=<unavailable>, rho=<unavailable>, arglist=<unavailable>, op=0x00000001018873c8) at eval.c:1607 [opt]
frame #39: 0x000000010013fd04 libR.dylib`bcEval(body=<unavailable>, rho=0x000000010610b078, useCache=<unavailable>) at eval.c:6400 [opt]
frame #40: 0x000000010013b701 libR.dylib`Rf_eval(e=<unavailable>, rho=<unavailable>) at eval.c:624 [opt]
frame #41: 0x0000000100149eb9 libR.dylib`forcePromise(e=0x0000000107655ed8) at eval.c:520 [opt]
frame #42: 0x000000010013b965 libR.dylib`Rf_eval(e=<unavailable>, rho=<unavailable>) at eval.c:647 [opt]
frame #43: 0x000000010014fc81 libR.dylib`do_withVisible(call=<unavailable>, op=<unavailable>, args=0x000000010601f9c8, rho=<unavailable>) at eval.c:2990 [opt]
frame #44: 0x000000010018b280 libR.dylib`do_internal(call=<unavailable>, op=<unavailable>, args=<unavailable>, env=0x0000000107655f80) at names.c:1360 [opt]
frame #45: 0x00000001001402f9 libR.dylib`bcEval(body=<unavailable>, rho=0x0000000107655f80, useCache=<unavailable>) at eval.c:6449 [opt]
frame #46: 0x000000010013b701 libR.dylib`Rf_eval(e=<unavailable>, rho=<unavailable>) at eval.c:624 [opt]
frame #47: 0x000000010014be52 libR.dylib`R_execClosure(call=0x00000001061130b0, newrho=0x0000000107655f80, sysparent=<unavailable>, rho=<unavailable>, arglist=<unavailable>, op=0x000000010601fb18) at eval.c:1607 [opt]
frame #48: 0x000000010013fd04 libR.dylib`bcEval(body=<unavailable>, rho=0x000000010610b078, useCache=<unavailable>) at eval.c:6400 [opt]
frame #49: 0x000000010013b701 libR.dylib`Rf_eval(e=<unavailable>, rho=<unavailable>) at eval.c:624 [opt]
frame #50: 0x000000010014be52 libR.dylib`R_execClosure(call=0x00000001061278b0, newrho=0x000000010610b078, sysparent=<unavailable>, rho=<unavailable>, arglist=<unavailable>, op=0x0000000106127ae0) at eval.c:1607 [opt]
frame #51: 0x000000010013b9ea libR.dylib`Rf_eval(e=0x00000001061278b0, rho=0x0000000101852e70) at eval.c:747 [opt]
frame #52: 0x000000010017bf48 libR.dylib`Rf_ReplIteration(rho=0x0000000101852e70, savestack=<unavailable>, browselevel=<unavailable>, state=0x00007fff5fbfe9c0) at main.c:258 [opt]
frame #53: 0x000000010017d43f libR.dylib`run_Rmainloop [inlined] R_ReplConsole(rho=<unavailable>, savestack=0, browselevel=0) at main.c:308 [opt]
frame #54: 0x000000010017d3d6 libR.dylib`run_Rmainloop at main.c:1059 [opt]
frame #55: 0x0000000100000f5b R`main + 27
frame #56: 0x00007fff970a6235 libdyld.dylib`start + 1
frame #57: 0x00007fff970a6235 libdyld.dylib`start + 1
(lldb)
After having filled the function CPL_nary_intersection with Rcout calls, it looks like the problem arises when the function returns (an Rcout statement right before the return statement prints its output).
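That observation is consistent with the backtrace, where the abort happens in a vector destructor inside CPL_nary_intersection: local objects are destroyed after the last statement before the return has already run. A minimal sketch of that ordering, with a made-up Tracer type and a string stream standing in for Rcout:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: writes a marker to `out` when it is destroyed.
struct Tracer {
    std::ostringstream& out;
    explicit Tracer(std::ostringstream& o) : out(o) {}
    ~Tracer() { out << "destructor;"; }
};

int traced(std::ostringstream& out) {
    Tracer t(out);             // local object with a destructor
    out << "before return;";   // plays the role of the last Rcout call
    return 0;                  // t is destroyed only after this point
}

std::string run_trace() {
    std::ostringstream out;
    traced(out);
    assert(!out.str().empty()); // sanity check: something was recorded
    return out.str();
}
```

Here run_trace() yields "before return;destructor;", so output printed right before a return statement says nothing about whether the destructors that follow will succeed.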
I was able to get an OS X environment up and running and reproduced the issue. I have a fix in https://github.com/r-spatial/sf/pull/875.
There's something that's giving me further discomfort, though. The deleters for the unique_ptr reference the GEOS context handle, but then CPL_geos_finish ends up being called before the deleters. I don't know if this is actually OK for some reason that escapes me at the moment, or if this file needs some additional tweaking to prevent that. I think an easy way to ensure that the context is still around when we try to use it would be to open a new block scope ({) after calling CPL_geos_init and close it (}) before calling CPL_geos_finish.
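A rough sketch of the block-scope idea, using stand-ins rather than the real GEOS API: FakeContext, fake_init, fake_finish and make_tracked_geom are all hypothetical, and the context only records the order of events so the difference is visible.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a GEOS context handle; it logs the order of events.
struct FakeContext {
    bool alive = false;
    std::vector<std::string> events;
};

void fake_init(FakeContext& ctx)   { ctx.alive = true;  ctx.events.push_back("init"); }
void fake_finish(FakeContext& ctx) { ctx.alive = false; ctx.events.push_back("finish"); }

using GeomPtr = std::unique_ptr<int, std::function<void(int*)>>;

// The deleter captures the context, as the deleters in the thread do.
GeomPtr make_tracked_geom(FakeContext& ctx) {
    assert(ctx.alive);  // geometries are only created while the context is live
    return GeomPtr(new int(0), [&ctx](int* g) {
        ctx.events.push_back(ctx.alive ? "destroy (context alive)"
                                       : "destroy (context finished)");
        delete g;
    });
}

// No inner scope: the vector outlives the call to fake_finish.
void without_block_scope(FakeContext& ctx) {
    fake_init(ctx);
    std::vector<GeomPtr> geoms;
    geoms.push_back(make_tracked_geom(ctx));
    fake_finish(ctx);
}   // geoms destroyed here, after the context was finished

// Inner scope: all geometries are destroyed before fake_finish runs.
void with_block_scope(FakeContext& ctx) {
    fake_init(ctx);
    {
        std::vector<GeomPtr> geoms;
        geoms.push_back(make_tracked_geom(ctx));
    }   // geoms destroyed here, while the context is still alive
    fake_finish(ctx);
}
```

With the real library the first ordering would mean the deleters use a handle that has already been finished; the inner scope rules that out without changing any other code.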
Solves the bug for now - thanks!!
... and also on CRAN's r-devel-linux-x86_64-fedora-clang: https://cran.r-project.org/web/checks/check_results_sf.html