Julia: Segfault on 1.0.x due to changes in `precompile`/`nospecialize` statements

Created on 17 Dec 2019 · 11 Comments · Source: JuliaLang/julia

Recently JuliaImages has been seeing segfaults on Julia 1.0.5:

and several others. It is triggered by the following test script:

```julia
using Images, TestImages
testimage("cameraman")
```

but only if you use [email protected]; with [email protected] everything works normally. The only difference between those two versions is the addition of some `@nospecialize` and `precompile` statements.
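For readers unfamiliar with the two constructs mentioned above, here is a minimal sketch of what such statements look like. The module and the function `describe` are hypothetical names chosen for illustration; they are not taken from the package in question.

```julia
# Hypothetical module; `describe` is an illustrative name only.
module PrecompileSketch

# `@nospecialize` asks the compiler not to generate a separate
# specialization of this method for each concrete type of `x`.
describe(@nospecialize(x)) = string("value of type ", typeof(x))

# `precompile` requests compilation of the method for the given argument
# types without calling it. It returns a Bool indicating success.
const precompiled = precompile(describe, (Any,))

end # module

println(PrecompileSketch.describe(1))
```

Statements like these only change when and how code is compiled, not what it computes, which is why a segfault triggered by them is surprising.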

I've bisected this to 1.0.4, specifically

```
7fb55412bb3133be0f9c2b77c55213d1ee37f656 is the first bad commit
commit 7fb55412bb3133be0f9c2b77c55213d1ee37f656
Author: Raghvendra Gupta <[email protected]>
Date:   Mon Dec 17 20:20:13 2018 +0530

    Fix #30006, getindex accessing fields that might not exist (#30405)

    * Fix #30006, range getindex accessing fields that might not exist
    * Add tests for #30006

    (cherry picked from commit 64133f68a68a2bb52a8908bab25c32150a7e84fd)

:040000 040000 aed5b2f6c67d1d6271f99a63e50799f8d674a6d5 af0c598084bed8453fa09045bf89371de44103ff M      test
```

I will follow this up with a more direct test, because that's a surprising outcome for a file-loading operation.

All 11 comments

Perhaps unsurprisingly, this is not entirely reproducible. For now, do not trust that bisection result.

FWIW, similar segfaults occur in Gadfly.jl with Colors v0.10 or v0.11. (cf. https://github.com/GiovineItalia/Gadfly.jl/pull/1369)

Colors v0.10.0 (w/ precompilation) with:

| FixedPointNumbers | ColorTypes | Test |
|-------------------|-----------------|--------------------|
| v0.6.1 (w/o pc) | v0.8.0 (w/o pc) | :heavy_check_mark: |
| v0.6.1 (w/o pc) | v0.8.1 (w/ pc) | :x: segfault |
| v0.7.0 (w/ pc) | v0.8.0 (w/o pc) | :x: segfault |
| v0.7.0 (w/ pc) | v0.8.1 (w/ pc) | :x: segfault |

Edit:
In the case of Images.jl above, a segfault occurred even with Colors v0.9.6 (w/o precompilation). Perhaps the common factor is that the dependency graph contains multiple paths.

Edit2:
In the case of Gadfly.jl, the segfault occurs on Julia v1.0.3 (x86_64-linux-gnu), too.

BTW, I guess that method redefinition is not completely compatible with precompilation.
https://github.com/JuliaImages/ImageFiltering.jl/issues/90

Does Gadfly depend on ImageFiltering? (I'm trying to understand the basis for your guess.)

> Does Gadfly depend on ImageFiltering?

Probably not. I don't think they are exactly the same issue.

There are still a lot of dependencies, but we have narrowed down the list of possible causes.
https://github.com/kimikage/Issue34121.jl
https://travis-ci.org/kimikage/Issue34121.jl

I believe I have worked around this for JuliaImages with FileIO 1.2.1, which just skips precompilation if the Julia version is < 1.1. Obviously this is a bandaid and does nothing to address the underlying problem, but anyone testing this issue should pin FileIO at 1.2.0.
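A version-gated workaround of the kind described above could look roughly like the following. This is only a sketch: the module name, `load_stub`, `_precompile_`, and `PRECOMPILE_ENABLED` are hypothetical, and the actual FileIO 1.2.1 code may be organized differently.

```julia
# Hypothetical package module; all names here are illustrative assumptions.
module GatedPrecompile

load_stub(name::AbstractString) = "loaded " * name

# Collect the package's precompile statements in one function.
function _precompile_()
    precompile(load_stub, (String,))
end

# Only run the precompile statements on Julia 1.1 or newer,
# so that Julia 1.0.x never executes them.
const PRECOMPILE_ENABLED = VERSION >= v"1.1"
if PRECOMPILE_ENABLED
    _precompile_()
end

end # module

println(GatedPrecompile.PRECOMPILE_ENABLED)
```

To reproduce the original failure instead, the affected release can be pinned with the standard Pkg API, for example `Pkg.pin(PackageSpec(name="FileIO", version="1.2.0"))`.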

Even with exactly the same versions of every dependency package, the results differ. :dizzy_face:
I guess that this depends on the order of installation.
https://travis-ci.org/kimikage/Issue34121.jl/builds/631182241

Edit:
The cause of the difference (i.e. non-reproducibility) may be multi-threading.
Edit2:
However, JULIA_NUM_THREADS = 1 seems to have essentially no effect.
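When testing the threading hypothesis, it may help to confirm that the setting actually took effect in the session that crashes. A small sketch, where `thread_report` is a hypothetical helper name:

```julia
using Base.Threads: nthreads

# Hypothetical helper: reports the requested and the actual thread count.
function thread_report()
    requested = get(ENV, "JULIA_NUM_THREADS", "(unset)")
    return (requested = requested, nthreads = nthreads())
end

println(thread_report())
```

Note that `JULIA_NUM_THREADS` must be set in the environment before Julia starts; changing `ENV` inside a running session does not change `nthreads()`.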

I also encountered a segfault in JLD2 on Julia v1.0.5. However, since nobody else seems to have reported it, I guess the problem depends on other packages or on how they are built or (pre-)compiled. (I'm not going to submit an issue until I am convinced that the problem is due to another cause.)

I believe that the redefinition of `Broadcast._bcs1` affects the reproducibility of the segfault. However, I guess that is only because `Broadcast._bcs1` is called so frequently. There are many possible sources of side effects (e.g. use of dictionaries).

So far, segfaults have occurred using the packages which use native binaries. However, this may be a spurious correlation or a coincidence.

I didn't take this problem very seriously because I expected the segfaults to happen only with certain package combinations.
However, the problem is getting worse. Packages that do not trigger segfaults keep being updated, while packages that do trigger them are being left behind. Although that is the desired behavior of a compatibility "cap", most packages do not support backporting.

Therefore, as with FileIO 1.2.1, workarounds that limit precompilation may be needed in other packages.
The tricky part is choosing which packages to target. In addition, once such countermeasures are rolled out, the number of version combinations will explode, making it harder to determine the cause. Of course it is possible to pin to the versions before the countermeasure, but be aware that the segfault we know about is probably just the tip of the iceberg.

BTW, this seems to be a problem not only on v1.0.5 but across v1.0.x, so it might be a good idea to update the title.
Also, although I cannot rule it out as an accomplice, `@nospecialize` is probably innocent.
