The script that demonstrate this can be found here. Pasted below
#!/usr/bin/julia -f
f() = nothing
forward_nothing(func) = func()
forward_pos_only(func, args...) = func(args...)
forward_kw(func, args...; kws...) = func(args...; kws...)
do_sth(a, b, c) = begin
global g = a * b * c
end
macro time_forward(ex)
quote
println($(Expr(:quote, ex)))
gc()
@time for i in 1:10000000
$(esc(ex))
end
end
end
@time_forward forward_nothing(f)
@time_forward forward_pos_only(f)
@time_forward forward_kw(f)
@time_forward do_sth("a", "b", "c")
@time_forward do_sth(1, 2, 3)
@time_forward do_sth(1.2, 2.3, 3.4)
@time_forward (do_sth(1, 2, 3); do_sth(1.2, 2.3, 3.4))
The output of the script on my laptop
forward_nothing(f)
elapsed time: 0.019369954 seconds (0 bytes allocated)
forward_pos_only(f)
elapsed time: 6.2e-8 seconds (0 bytes allocated)
forward_kw(f)
elapsed time: 0.95345393 seconds (960000000 bytes allocated, 33.34% gc time)
do_sth("a","b","c")
elapsed time: 1.014413397 seconds (800000000 bytes allocated, 28.80% gc time)
do_sth(1,2,3)
elapsed time: 0.04512597 seconds (0 bytes allocated)
do_sth(1.2,2.3,3.4)
elapsed time: 0.125168819 seconds (160000000 bytes allocated, 37.53% gc time)
begin
do_sth(1,2,3)
do_sth(1.2,2.3,3.4)
end
elapsed time: 0.157331348 seconds (160000000 bytes allocated, 30.24% gc time)
There are several interesting things here,
f(args...)
and figure out it is useless and eliminate the loop altogether (I don't think my cpu can excute a million loops in tens of nanosecond) but it cannot do it for f()
. This particular case isn't very interesting but I'm wondering if there's any missing optimizations here.invoke
that supports keyword arguments. This version is ~3 times slower than the builtin invoke and a version without keyword argument support and simply forward to the builtin invoke
is as fast as the builtin one.Anothing thing I've noticed is that the extra keyword argument got by the functions is in a different format (Array{(Symbol, Any), 1}
) from the format of argument (Array{Any, 1}
) to the .env.kwsorter
. Is it necessary to do this conversion? This probably won't affect the performance for empty forwarding but feels inefficient in general.
Test done on current master as of a few hours ago.
julia> versioninfo()
Julia Version 0.4.0-dev+2369
Commit bab9ec5* (2015-01-02 01:47 UTC)
Platform Info:
System: Linux (x86_64-unknown-linux-gnu)
CPU: Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz
WORD_SIZE: 64
BLAS: libblas
LAPACK: liblapack
LIBM: libm
LLVM: libLLVM-3.3
The llvm version installed is 3.5 and I have no idea why it prints llvm-3.3....
The performance penalty associated with keyword arguments is documented as a performance tip and is a duplicate of #8843.
I see. Now I remember that I've read those tip before but didn't really understand/remember them very well because I didn't really know how kwargs work at that time......
Please feel free to rename or open a new one for the other minor issues mentioned above if you think it is not the time yet to optimize keyword argument performance.
Although actually I think it might be possible to get a lot better performance if the argument to the kwsorter changes to ::Array{(Symbol, Any), 1}
. I just hacked my version of invoke
that support keyword argument by extending .env.kwsorter
directly and the performance is much better (3x).
The llvm version installed is 3.5 and I have no idea why it prints llvm-3.3....
I fixed this bit in #9566
@tkelman Thanks. I noticed that it prints the right version yesterday but didn't figured out why..... =)
Slightly off topic: I just stumbled across this as well and, as recommended above, had a look at the performance tips. I found this sentence confusing: "Functions are specialized on the types of keyword arguments, so these declarations will not affect performance of code inside the function." Is it not contradictory? Is there a "not" missing, although, that wouldn't make sense either. If the function is specialized on the types of the keyword it should be faster, no? Anyways, maybe something to clean up.
AFAIK, what it means is that the function body is specialized so the function body will be compiled to the same level as if those were normal arguments.
julia> f(;a=0, b=1) = a + b
f (generic function with 1 method)
julia> @code_llvm f()
define i64 @julia_f_42853() {
top:
ret i64 1
}
The overhead here is that when the function is called, it goes through another dispatch function which is slow but that's not part of the "function body" and you don't need to write a separate function with only normal arguments to let julia do type inference and increase performance.
I don't know if this is really what it mean and I do find that confusing too...
Thanks, that kind of makes sense. But I don't think that sentence helps anyone who doesn't know about the internals of kw-arg functions. And those who know probably don't need to read the performance tips.
(BTW, @code_llvm
and friends often don't work properly with kw-args, see #9818, but should be ok for your example)
Yeah, I figured what I understand now after I figured out how kw-args works.
And yeah, I know @code_*
and code_*
doesn't work for keyword arguments. Here's a more "complete" example for that...
julia> f(;a=0, b=1) = if a > 0 && b < 1
a * b + b + a
else
a - b * a - b
end
f (generic function with 1 method)
julia> @code_llvm f()
define i64 @julia_f_42805() {
top:
%0 = call i64 @"julia___f#0___42806"(i64 0, i64 1)
ret i64 %0
}
julia> @code_typed f.env.kwsorter(Any[])
1-element Array{Any,1}:
:($(Expr(:lambda, Any[symbol("#s2")], Any[Any[:a,:b,symbol("#s8"),symbol("#s9"),symbol("#s10"),symbol("#s1"),symbol("#s3"),symbol("#s4"),:_var1,:_var2,:_var3,:_var0,:_var4],Any[Any[symbol("#s2"),Array{Any,1},0],Any[:a,Any,2],Any[:b,Any,2],Any[symbol("#s8"),UnitRange{Int64},18],Any[symbol("#s9"),Int64,2],Any[symbol("#s10"),(Int64,Int64),18],Any[symbol("#s1"),Int64,18],Any[symbol("#s3"),Int64,18],Any[symbol("#s4"),Any,18],Any[:_var1,Int64,18],Any[:_var2,Int64,18],Any[:_var3,(ASCIIString,Any,ASCIIString),18],Any[:_var0,Int64,18],Any[:_var4,Int64,18]],Any[]], :(begin $(Expr(:line, 0, symbol(""), symbol("")))
a = 0
b = 1
_var1 = (top(arraylen))(#s2::Array{Any,1})::Int64
_var2 = (top(box))(Int64,(top(ashr_int))(_var1::Int64,(top(box))(Int32,(top(checked_trunc_sint))(Int32,1))))
#s8 = $(Expr(:new, UnitRange{Int64}, 1, :(((top(getfield))(Intrinsics,:select_value))((top(sle_int))(1,_var2::Int64)::Bool,_var2::Int64,(top(box))(Int64,(top(sub_int))(1,1)))::Int64)))
#s9 = (top(getfield))(#s8::UnitRange{Int64},:start)::Int64
unless (top(box))(Bool,(top(not_int))(#s9::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(#s8::UnitRange{Int64},:stop)::Int64,1))::Bool)) goto 1
2:
_var0 = #s9::Int64
_var4 = (top(box))(Int64,(top(add_int))(#s9::Int64,1))
#s1 = _var0::Int64
#s9 = _var4::Int64
#s3 = (top(box))(Int64,(top(sub_int))((top(box))(Int64,(top(mul_int))(#s1::Int64,2)),1))
#s4 = (top(arrayref))(#s2::Array{Any,1},#s3::Int64)::Any
unless #s4 === :b::Bool goto 4
b = (top(arrayref))(#s2::Array{Any,1},(top(box))(Int64,(top(add_int))(#s3::Int64,1)))::Any
goto 6
4:
unless #s4 === :a::Bool goto 5
a = (top(arrayref))(#s2::Array{Any,1},(top(box))(Int64,(top(add_int))(#s3::Int64,1)))::Any
goto 6
5:
_var3 = $(Expr(:call1, :(top(tuple)), "unrecognized keyword argument \"", symbol("#s4"), "\""))
throw(ErrorException((top(_apply))(call,string,_var3::(ASCIIString,Any,ASCIIString)))::ErrorException)
6:
3:
unless (top(box))(Bool,(top(not_int))((top(box))(Bool,(top(not_int))(#s9::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(#s8::UnitRange{Int64},:stop)::Int64,1))::Bool)))) goto 2
1:
0:
return __f#0__(a,b)::Any
end::Any))))
julia> code_typed(eval(symbol("__f#0__")), (Int, Int))
1-element Array{Any,1}:
:($(Expr(:lambda, Any[:a,:b], Any[Any[symbol("#s13")],Any[Any[:a,Int64,0],Any[:b,Int64,0],Any[symbol("#s13"),Bool,2]],Any[]], :(begin $(Expr(:line, 1, :none, :f))
unless (top(slt_int))(0,a::Int64)::Bool goto 0
#s13 = (top(slt_int))(b::Int64,1)::Bool
goto 1
0:
#s13 = false
1:
unless #s13::Bool goto 2 # line 2:
return (top(box))(Int64,(top(add_int))((top(box))(Int64,(top(add_int))((top(box))(Int64,(top(mul_int))(a::Int64,b::Int64)),b::Int64)),a::Int64))
goto 3
2: # line 4:
return (top(box))(Int64,(top(sub_int))((top(box))(Int64,(top(sub_int))(a::Int64,(top(box))(Int64,(top(mul_int))(b::Int64,a::Int64)))),b::Int64))
3:
end::Int64))))
This would be a breaking feature, so it's not eligible for a point release
@vtjnash said that:
This would be a breaking feature, so it's not eligible for a point release
So, maybe this issue should change its actual milestone (0.6.x).
What here would be a breaking feature? I'm confused.
Depends how it gets implemented. If we can make them faster without changing semantics at all then it could be backported, but who knows how likely that is at this point.
Keyword args need a redesign anyway, so I don't think it would be worth the effort to optimize this for 0.6, then again after the redesign, so moving milestone.
Given this is a performance issue, perhaps it can be addressed post 1.0.
This would be one of the few things I had wished would make it for 1.0: indeed, many APIs have been designed with their slowness constraint, so they have been often avoided. I occasionally see a function in base which would benefit from a redesign to use keywords, so a pass could be done before freezing the APIs to update such functions (if keywords get faster).
We'll improve keyword argument performance in the future. This may open up some API changes, at which point we can add keyword methods of those functions and keep the positional argument versions around as legacies. We'll probably want an opt-in mechanism to make sure your code is up to the standards of your current Julia version.
Most helpful comment
This would be one of the few things I had wished would make it for 1.0: indeed, many APIs have been designed with their slowness constraint, so they have been often avoided. I occasionally see a function in base which would benefit from a redesign to use keywords, so a pass could be done before freezing the APIs to update such functions (if keywords get faster).