Julia: keyword arguments are very slow

Created on 2 Jan 2015 · 18Comments · Source: JuliaLang/julia

The script that demonstrate this can be found here. Pasted below

#!/usr/bin/julia -f

f() = nothing

forward_nothing(func) = func()
forward_pos_only(func, args...) = func(args...)
forward_kw(func, args...; kws...) = func(args...; kws...)

do_sth(a, b, c) = begin
    global g = a * b * c
end

macro time_forward(ex)
    quote
        println($(Expr(:quote, ex)))
        gc()
        @time for i in 1:10000000
            $(esc(ex))
        end
    end
end

@time_forward forward_nothing(f)
@time_forward forward_pos_only(f)
@time_forward forward_kw(f)
@time_forward do_sth("a", "b", "c")
@time_forward do_sth(1, 2, 3)
@time_forward do_sth(1.2, 2.3, 3.4)
@time_forward (do_sth(1, 2, 3); do_sth(1.2, 2.3, 3.4))

The output of the script on my laptop

forward_nothing(f)
elapsed time: 0.019369954 seconds (0 bytes allocated)
forward_pos_only(f)
elapsed time: 6.2e-8 seconds (0 bytes allocated)
forward_kw(f)
elapsed time: 0.95345393 seconds (960000000 bytes allocated, 33.34% gc time)
do_sth("a","b","c")
elapsed time: 1.014413397 seconds (800000000 bytes allocated, 28.80% gc time)
do_sth(1,2,3)
elapsed time: 0.04512597 seconds (0 bytes allocated)
do_sth(1.2,2.3,3.4)
elapsed time: 0.125168819 seconds (160000000 bytes allocated, 37.53% gc time)
begin 
    do_sth(1,2,3)
    do_sth(1.2,2.3,3.4)
end
elapsed time: 0.157331348 seconds (160000000 bytes allocated, 30.24% gc time)

There are several interesting things here,

It seems that the compiler can inline f(args...) and figure out it is useless and eliminate the loop altogether (I don't think my cpu can excute a million loops in tens of nanosecond) but it cannot do it for f(). This particular case isn't very interesting but I'm wondering if there's any missing optimizations here.
Forwarding keyword argument is very costy, even when there's nothing to be forwarded (probably a big part of it is memory allocation?). Although it would be nice to make keyword argument fast in general, the slowness of forwarding empty keyword argument means that it is very costy to support keyword arguments. One example (also how I discover this issue) is a version of invoke that supports keyword arguments. This version is ~3 times slower than the builtin invoke and a version without keyword argument support and simply forward to the builtin invoke is as fast as the builtin one.
Just for comparison, I've also included some other random functions and it seems that keyword arguments are much more expensive than global variables and is as expensive as concatenating strings. What I don't quite understand either is that why is it necessary to allocate memory when assigning floating point numbers to a global variable but not for integers? (Edit: it seems that the integer version does no allocation because of the cache of boxed value.)

Anothing thing I've noticed is that the extra keyword argument got by the functions is in a different format (Array{(Symbol, Any), 1}) from the format of argument (Array{Any, 1}) to the .env.kwsorter. Is it necessary to do this conversion? This probably won't affect the performance for empty forwarding but feels inefficient in general.

Test done on current master as of a few hours ago.

julia> versioninfo()
Julia Version 0.4.0-dev+2369
Commit bab9ec5* (2015-01-02 01:47 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz
  WORD_SIZE: 64
  BLAS: libblas
  LAPACK: liblapack
  LIBM: libm
  LLVM: libLLVM-3.3

The llvm version installed is 3.5 and I have no idea why it prints llvm-3.3....

keyword arguments performance

Source

yuyichao

Most helpful comment

This would be one of the few things I had wished would make it for 1.0: indeed, many APIs have been designed with their slowness constraint, so they have been often avoided. I occasionally see a function in base which would benefit from a redesign to use keywords, so a pass could be done before freezing the APIs to update such functions (if keywords get faster).

rfourquet on 18 Nov 2017

👍7

All 18 comments

The performance penalty associated with keyword arguments is documented as a performance tip and is a duplicate of #8843.

jiahao on 2 Jan 2015

I see. Now I remember that I've read those tip before but didn't really understand/remember them very well because I didn't really know how kwargs work at that time......

yuyichao on 2 Jan 2015

Please feel free to rename or open a new one for the other minor issues mentioned above if you think it is not the time yet to optimize keyword argument performance.

yuyichao on 2 Jan 2015

Although actually I think it might be possible to get a lot better performance if the argument to the kwsorter changes to ::Array{(Symbol, Any), 1}. I just hacked my version of invoke that support keyword argument by extending .env.kwsorter directly and the performance is much better (3x).

yuyichao on 2 Jan 2015

The llvm version installed is 3.5 and I have no idea why it prints llvm-3.3....

I fixed this bit in #9566

tkelman on 5 Jan 2015

@tkelman Thanks. I noticed that it prints the right version yesterday but didn't figured out why..... =)

yuyichao on 5 Jan 2015

Slightly off topic: I just stumbled across this as well and, as recommended above, had a look at the performance tips. I found this sentence confusing: "Functions are specialized on the types of keyword arguments, so these declarations will not affect performance of code inside the function." Is it not contradictory? Is there a "not" missing, although, that wouldn't make sense either. If the function is specialized on the types of the keyword it should be faster, no? Anyways, maybe something to clean up.

mauro3 on 7 Feb 2015

AFAIK, what it means is that the function body is specialized so the function body will be compiled to the same level as if those were normal arguments.

julia> f(;a=0, b=1) = a + b
f (generic function with 1 method)

julia> @code_llvm f()

define i64 @julia_f_42853() {
top:
  ret i64 1
}

The overhead here is that when the function is called, it goes through another dispatch function which is slow but that's not part of the "function body" and you don't need to write a separate function with only normal arguments to let julia do type inference and increase performance.

I don't know if this is really what it mean and I do find that confusing too...

yuyichao on 7 Feb 2015

Thanks, that kind of makes sense. But I don't think that sentence helps anyone who doesn't know about the internals of kw-arg functions. And those who know probably don't need to read the performance tips.

(BTW, @code_llvm and friends often don't work properly with kw-args, see #9818, but should be ok for your example)

mauro3 on 7 Feb 2015

Yeah, I figured what I understand now after I figured out how kw-args works.

And yeah, I know @code_* and code_* doesn't work for keyword arguments. Here's a more "complete" example for that...

julia> f(;a=0, b=1) = if a > 0 && b < 1
       a * b + b + a
       else
       a - b * a - b
       end
f (generic function with 1 method)

julia> @code_llvm f()

define i64 @julia_f_42805() {
top:
  %0 = call i64 @"julia___f#0___42806"(i64 0, i64 1)
  ret i64 %0
}                                                                                  

julia> @code_typed f.env.kwsorter(Any[])
1-element Array{Any,1}:                                                            
 :($(Expr(:lambda, Any[symbol("#s2")], Any[Any[:a,:b,symbol("#s8"),symbol("#s9"),symbol("#s10"),symbol("#s1"),symbol("#s3"),symbol("#s4"),:_var1,:_var2,:_var3,:_var0,:_var4],Any[Any[symbol("#s2"),Array{Any,1},0],Any[:a,Any,2],Any[:b,Any,2],Any[symbol("#s8"),UnitRange{Int64},18],Any[symbol("#s9"),Int64,2],Any[symbol("#s10"),(Int64,Int64),18],Any[symbol("#s1"),Int64,18],Any[symbol("#s3"),Int64,18],Any[symbol("#s4"),Any,18],Any[:_var1,Int64,18],Any[:_var2,Int64,18],Any[:_var3,(ASCIIString,Any,ASCIIString),18],Any[:_var0,Int64,18],Any[:_var4,Int64,18]],Any[]], :(begin $(Expr(:line, 0, symbol(""), symbol("")))                                                    
        a = 0                                                                      
        b = 1                                                                      
        _var1 = (top(arraylen))(#s2::Array{Any,1})::Int64                          
        _var2 = (top(box))(Int64,(top(ashr_int))(_var1::Int64,(top(box))(Int32,(top(checked_trunc_sint))(Int32,1))))                                                  
        #s8 = $(Expr(:new, UnitRange{Int64}, 1, :(((top(getfield))(Intrinsics,:select_value))((top(sle_int))(1,_var2::Int64)::Bool,_var2::Int64,(top(box))(Int64,(top(sub_int))(1,1)))::Int64)))                                                         
        #s9 = (top(getfield))(#s8::UnitRange{Int64},:start)::Int64                 
        unless (top(box))(Bool,(top(not_int))(#s9::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(#s8::UnitRange{Int64},:stop)::Int64,1))::Bool)) goto 1   
        2:                                                                         
        _var0 = #s9::Int64
        _var4 = (top(box))(Int64,(top(add_int))(#s9::Int64,1))
        #s1 = _var0::Int64
        #s9 = _var4::Int64
        #s3 = (top(box))(Int64,(top(sub_int))((top(box))(Int64,(top(mul_int))(#s1::Int64,2)),1))
        #s4 = (top(arrayref))(#s2::Array{Any,1},#s3::Int64)::Any
        unless #s4 === :b::Bool goto 4
        b = (top(arrayref))(#s2::Array{Any,1},(top(box))(Int64,(top(add_int))(#s3::Int64,1)))::Any
        goto 6
        4: 
        unless #s4 === :a::Bool goto 5
        a = (top(arrayref))(#s2::Array{Any,1},(top(box))(Int64,(top(add_int))(#s3::Int64,1)))::Any
        goto 6
        5: 
        _var3 = $(Expr(:call1, :(top(tuple)), "unrecognized keyword argument \"", symbol("#s4"), "\""))
        throw(ErrorException((top(_apply))(call,string,_var3::(ASCIIString,Any,ASCIIString)))::ErrorException)
        6: 
        3: 
        unless (top(box))(Bool,(top(not_int))((top(box))(Bool,(top(not_int))(#s9::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(#s8::UnitRange{Int64},:stop)::Int64,1))::Bool)))) goto 2
        1: 
        0: 
        return __f#0__(a,b)::Any
    end::Any))))

julia> code_typed(eval(symbol("__f#0__")), (Int, Int))
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:a,:b], Any[Any[symbol("#s13")],Any[Any[:a,Int64,0],Any[:b,Int64,0],Any[symbol("#s13"),Bool,2]],Any[]], :(begin $(Expr(:line, 1, :none, :f))
        unless (top(slt_int))(0,a::Int64)::Bool goto 0
        #s13 = (top(slt_int))(b::Int64,1)::Bool
        goto 1
        0: 
        #s13 = false
        1: 
        unless #s13::Bool goto 2 # line 2:
        return (top(box))(Int64,(top(add_int))((top(box))(Int64,(top(add_int))((top(box))(Int64,(top(mul_int))(a::Int64,b::Int64)),b::Int64)),a::Int64))
        goto 3
        2:  # line 4:
        return (top(box))(Int64,(top(sub_int))((top(box))(Int64,(top(sub_int))(a::Int64,(top(box))(Int64,(top(mul_int))(b::Int64,a::Int64)))),b::Int64))
        3: 
    end::Int64))))

yuyichao on 7 Feb 2015

This would be a breaking feature, so it's not eligible for a point release

vtjnash on 15 Sep 2016

@vtjnash said that:

This would be a breaking feature, so it's not eligible for a point release

So, maybe this issue should change its actual milestone (0.6.x).

diegozea on 4 Feb 2017

What here would be a breaking feature? I'm confused.

JeffBezanson on 4 Feb 2017

Depends how it gets implemented. If we can make them faster without changing semantics at all then it could be backported, but who knows how likely that is at this point.

tkelman on 4 Feb 2017

Keyword args need a redesign anyway, so I don't think it would be worth the effort to optimize this for 0.6, then again after the redesign, so moving milestone.

JeffBezanson on 17 May 2017

Given this is a performance issue, perhaps it can be addressed post 1.0.

ViralBShah on 18 Nov 2017

rfourquet on 18 Nov 2017

👍7

We'll improve keyword argument performance in the future. This may open up some API changes, at which point we can add keyword methods of those functions and keep the positional argument versions around as legacies. We'll probably want an opt-in mechanism to make sure your code is up to the standards of your current Julia version.