Julia: Optimizer fails to DCE typeassert with ::Type{<:Union} (string performance)

Created on 20 Aug 2020  ยท  6Comments  ยท  Source: JuliaLang/julia

In eliminating invalidations from AbstractString methods I added type-asserts like these. I assumed that these would be elided in cases where the type was known concretely, but that assumption appears to be wrong. Here's a demo:

julia> code_typed(sizeof, (SubString{String},))
1-element Vector{Any}:
 CodeInfo(
1 โ”€ %1 = Base.getfield(s, :ncodeunits)::Int64
โ”‚        invoke Core.TypeVar(Symbol("#s68")::Symbol, Union{UInt16, UInt32, UInt8}::Any)::TypeVar
โ”‚        invoke Core.TypeVar(Symbol("#s68")::Symbol, Union{UInt16, UInt32, UInt8}::Any)::Core.Compiler.PartialTypeVar(var"#s68"<:Union{UInt16, UInt32, UInt8}, true, true)
โ”‚   %4 = Base.mul_int(%1, 1)::Int64
โ””โ”€โ”€      return %4
) => Int64

You can see that statements 2 and 3 aren't used by any subsequent code, yet they are still there. This is further confirmed by

julia> code_llvm(sizeof, (SubString{String},))

;  @ strings/basic.jl:177 within `sizeof'
define i64 @julia_sizeof_3256({ {}*, i64, i64 }* nocapture nonnull readonly align 8 dereferenceable(24)) {
top:
  %1 = alloca [2 x {}*], align 8
  %.sub = getelementptr inbounds [2 x {}*], [2 x {}*]* %1, i64 0, i64 0
; โ”Œ @ strings/substring.jl:64 within `ncodeunits'
; โ”‚โ”Œ @ Base.jl:33 within `getproperty'
    %2 = getelementptr inbounds { {}*, i64, i64 }, { {}*, i64, i64 }* %0, i64 0, i32 2
; โ””โ””
; โ”Œ @ strings/substring.jl:65 within `codeunit'
   store {}* inttoptr (i64 140505618344944 to {}*), {}** %.sub, align 8
   %3 = getelementptr inbounds [2 x {}*], [2 x {}*]* %1, i64 0, i64 1
   store {}* inttoptr (i64 140505642590608 to {}*), {}** %3, align 8
   %4 = call nonnull {}* @j1_TypeVar_3258({}* inttoptr (i64 140505697665616 to {}*), {}** nonnull %.sub, i32 2)
; โ””
  store {}* inttoptr (i64 140505618344944 to {}*), {}** %.sub, align 8
  store {}* inttoptr (i64 140505642590800 to {}*), {}** %3, align 8
  %5 = call nonnull {}* @j1_TypeVar_3259({}* inttoptr (i64 140505697665616 to {}*), {}** nonnull %.sub, i32 2)
; โ”Œ @ int.jl:88 within `*'
   %6 = load i64, i64* %2, align 8
; โ””
  ret i64 %6
}

julia> code_native(sizeof, (SubString{String},))
    .text
; โ”Œ @ basic.jl:177 within `sizeof'
    pushq   %r15
    pushq   %r14
    pushq   %r13
    pushq   %r12
    pushq   %rbx
    subq    $16, %rsp
    movq    %rdi, %rbx
    movabsq $140505618344944, %r12  # imm = 0x7FCA03786BF0
; โ”‚โ”Œ @ substring.jl:65 within `codeunit'
    movq    %r12, (%rsp)
    movabsq $140505639939952, %rax  # imm = 0x7FCA04C1EF70
    movq    %rax, 8(%rsp)
    movabsq $TypeVar, %r13
    movabsq $jl_system_image_data, %r14
    movq    %rsp, %r15
    movq    %r14, %rdi
    movq    %r15, %rsi
    movl    $2, %edx
    callq   *%r13
; โ”‚โ””
    movq    %r12, (%rsp)
    movabsq $140505639940112, %rax  # imm = 0x7FCA04C1F010
    movq    %rax, 8(%rsp)
    movq    %r14, %rdi
    movq    %r15, %rsi
    movl    $2, %edx
    callq   *%r13
; โ”‚โ”Œ @ int.jl:88 within `*'
    movq    16(%rbx), %rax
; โ”‚โ””
    addq    $16, %rsp
    popq    %rbx
    popq    %r12
    popq    %r13
    popq    %r14
    popq    %r15
    retq
    nopw    %cs:(%rax,%rax)
    nop
; โ””

and especially by Profile lines like

  1โ•Ž    โ•Ž    โ•Ž    โ•Ž    โ•Ž    โ•Ž    โ•Ž 19   @Base/strings/basic.jl:177; sizeof
 10โ•Ž    โ•Ž    โ•Ž    โ•Ž    โ•Ž    โ•Ž    โ•Ž  10   @Base/boot.jl:366; TypeVar(n::Symbol, ub::Any)
  1โ•Ž    โ•Ž    โ•Ž    โ•Ž    โ•Ž    โ•Ž    โ•Ž  8    @Base/strings/substring.jl:65; codeunit
  7โ•Ž    โ•Ž    โ•Ž    โ•Ž    โ•Ž    โ•Ž    โ•Ž   7    @Base/boot.jl:366; TypeVar(n::Symbol, ub::Any)

Contrast with:

julia> fastsizeof(s::AbstractString) = ncodeunits(s) * sizeof(fastcodeunit(s))
fastsizeof (generic function with 1 method)

julia> fastcodeunit(s::SubString) = codeunit(s.string)
fastcodeunit (generic function with 1 method)

julia> code_typed(fastsizeof, (SubString{String},))
1-element Vector{Any}:
 CodeInfo(
1 โ”€ %1 = Base.getfield(s, :ncodeunits)::Int64
โ”‚   %2 = Base.mul_int(%1, 1)::Int64
โ””โ”€โ”€      return %2
) => Int64

julia> code_llvm(fastsizeof, (SubString{String},))

;  @ REPL[30]:1 within `fastsizeof'
define i64 @julia_fastsizeof_3457({ {}*, i64, i64 }* nocapture nonnull readonly align 8 dereferenceable(24)) {
top:
; โ”Œ @ strings/substring.jl:64 within `ncodeunits'
; โ”‚โ”Œ @ Base.jl:33 within `getproperty'
    %1 = getelementptr inbounds { {}*, i64, i64 }, { {}*, i64, i64 }* %0, i64 0, i32 2
; โ””โ””
; โ”Œ @ int.jl:88 within `*'
   %2 = load i64, i64* %1, align 8
; โ””
  ret i64 %2
}

julia> code_native(fastsizeof, (SubString{String},))
    .text
; โ”Œ @ REPL[30]:1 within `fastsizeof'
; โ”‚โ”Œ @ int.jl:88 within `*'
    movq    16(%rdi), %rax
; โ”‚โ””
    retq
    nopw    %cs:(%rax,%rax)
    nop
; โ””

So, it looks like I screwed up the performance of string processing. :cry: Apologies!

However, rather than backing those out it seems the best way to fix this would be in the optimizer? I've tentatively tagged this issue as such.

Possibly-related (though it may be different, i.e., hitting a limit in Union size or something):

julia> f() = (x = rand(); x < 0.25 ? RGB{N0f8}(1, 0, 0) :
                          x < 0.50 ? BGR{N0f8}(1, 0, 0) :
                          x < 0.75 ? HSL(0.0f0, 1.0f0, 0.5f0) :
                                     HSL(0.3, 0.8, 0.2))::Colorant
f (generic function with 1 method)

julia> code_typed(f)
1-element Vector{Any}:
 CodeInfo(
1 โ”€โ”€ %1  = $(Expr(:foreigncall, :(:jl_threadid), Int16, svec(), 0, :(:ccall)))::Int16
โ”‚    %2  = Base.sext_int(Int64, %1)::Int64
โ”‚    %3  = Base.add_int(%2, 1)::Int64
โ”‚    %4  = invoke Random.default_rng(%3::Int64)::Random.MersenneTwister
โ”‚    %5  = Base.getfield(%4, :idxF)::Int64
โ”‚    %6  = Random.MT_CACHE_F::Int64
โ”‚    %7  = (%5 === %6)::Bool
โ””โ”€โ”€โ”€       goto #3 if not %7
2 โ”€โ”€ %9  = $(Expr(:gc_preserve_begin, :(%4)))
โ”‚    %10 = Base.getfield(%4, :state)::Random.DSFMT.DSFMT_state
โ”‚    %11 = Base.getfield(%4, :vals)::Vector{Float64}
โ”‚    %12 = $(Expr(:foreigncall, :(:jl_array_ptr), Ptr{Float64}, svec(Any), 0, :(:ccall), :(%11)))::Ptr{Float64}
โ”‚    %13 = Base.getfield(%4, :vals)::Vector{Float64}
โ”‚    %14 = Base.arraylen(%13)::Int64
โ”‚          invoke Random.dsfmt_fill_array_close1_open2!(%10::Random.DSFMT.DSFMT_state, %12::Ptr{Float64}, %14::Int64)::Any
โ”‚          $(Expr(:gc_preserve_end, :(%9)))
โ””โ”€โ”€โ”€       Base.setfield!(%4, :idxF, 0)::Int64
3 โ”„โ”€       goto #4
4 โ”€โ”€ %19 = Base.getfield(%4, :vals)::Vector{Float64}
โ”‚    %20 = Base.getfield(%4, :idxF)::Int64
โ”‚    %21 = Base.add_int(%20, 1)::Int64
โ”‚          Base.setfield!(%4, :idxF, %21)::Int64
โ”‚    %23 = Base.arrayref(false, %19, %21)::Float64
โ””โ”€โ”€โ”€       goto #5
5 โ”€โ”€       goto #6
6 โ”€โ”€ %26 = Base.sub_float(%23, 1.0)::Float64
โ””โ”€โ”€โ”€       goto #7
7 โ”€โ”€       goto #8
8 โ”€โ”€       goto #9
9 โ”€โ”€ %30 = Base.lt_float(%26, 0.25)::Bool
โ””โ”€โ”€โ”€       goto #11 if not %30
10 โ”€       goto #18
11 โ”€ %33 = Base.lt_float(%26, 0.5)::Bool
โ””โ”€โ”€โ”€       goto #13 if not %33
12 โ”€       goto #17
13 โ”€ %36 = Base.lt_float(%26, 0.75)::Bool
โ””โ”€โ”€โ”€       goto #15 if not %36
14 โ”€       goto #16
15 โ”€       nothing::Nothing
16 โ”„ %40 = ฯ† (#14 => $(QuoteNode(HSL{Float32}(0.0f0,1.0f0,0.5f0))), #15 => $(QuoteNode(HSL{Float64}(0.3,0.8,0.2))))::Union{HSL{Float32}, HSL{Float64}}
17 โ”„ %41 = ฯ† (#12 => $(QuoteNode(BGR{N0f8}(1.0,0.0,0.0))), #16 => %40)::Union{BGR{N0f8}, HSL}
18 โ”„ %42 = ฯ† (#10 => $(QuoteNode(RGB{N0f8}(1.0,0.0,0.0))), #17 => %41)::Any
โ”‚          Core.typeassert(%42, Main.Colorant)::Colorant
โ”‚    %44 = ฯ€ (%42, Colorant)
โ””โ”€โ”€โ”€       return %44
) => Colorant

There's no actual need to execute that type-assert (though inference should keep it in preference to Any), though this one doesn't surprise me as much as the first one.

CC @kimikage

optimizer

Most helpful comment

A better way to write types like this is Union{Type{UInt8},Type{UInt16},Type{UInt32}}, which should work around this issue.

All 6 comments

It's not really a DCE issue. The compiler couldn't prove that those functions don't throw. I'm also surprised it didn't inline the Core.TypeVar, which is potentially what is inhibiting the nothrow analysis.

Thanks for the clarification! Thoughts on whether this can be straightforwardly fixed or whether we need to back out those changes?

I don't think it should be that bad to fix. There's basically two ways to go about it. The quick way would be to figure out why the Core.TypeVar isn't inlining and tweaking whatever is preventing that such that our nothrow model for _typevar can kick in. The other would be for inference to start annotating effect-free information on CodeInfos, like we do pure already. I don't think it's terribly hard, but I might not get to it for a little bit.

Thanks for the info. I don't think it's a rush but we should think about it before releasing 1.6. I'll put a milestone on this issue just so we don't forget. Like I said, if it proves to be the expedient solution I'm 100% willing to revert those changes; it's nice to have things perform reasonably in the face of poor inference information, but not at the cost of good performance when inference works well.

A better way to write types like this is Union{Type{UInt8},Type{UInt16},Type{UInt32}}, which should work around this issue.

Original issue is fixed, but to me it seems useful to leave this open for future compiler improvements. I've taken off the 1.6 milestone, though.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

helgee picture helgee  ยท  3Comments

sbromberger picture sbromberger  ยท  3Comments

yurivish picture yurivish  ยท  3Comments

manor picture manor  ยท  3Comments

omus picture omus  ยท  3Comments