We're increasingly seeing this kind of manual optimization in Base code:
function f(...)
check_cond() || throw(SomeError(...))
end
rewritten as
@noinline throw_some_error(...) = throw(SomeError(...))
function f(...)
check_cond() || throw_some_error(...)
end
It would be great if the compiler could figure this out and do the transformation automatically. I don't think it needs to be terribly complicated either: throw
should never be a common code path in Julia and throwing each kind of error could be its own outlined function, which would avoid an explosion of these automatically outlined throw functions.
Ref https://github.com/JuliaLang/julia/issues/23281
Regarding
throw should never be a common code path in Julia and throwing each kind of error could be its own outlined function
I think one of the complications is in things like
function f(a, b)
check_cond() || throw(SomeError("This was the error $(repr(a)): $b"))
end
which is equivalent to
function f(a, b)
if !check_cond()
_tmp1 = string(repr(a))
_tmp2 = string(b)
_tmp3 = string("This was the error ", _tmp1, ": ", _tmp2)
_tmp4 = SomeError(_tmp3)
throw(_tmp4)
end
end
Clearly, just outlining the throw(_tmp4)
is useless (we want to outline the whole block) so there probably needs to be some escape analysis here showing that the _tmp
values are only used when throwing? And the way the exception is created is not unique for the exception type.
Something analogous to this (I think, correct me if I'm wrong) impacts overflow/underflow checks in various simple operations (e.g. in complex division, which could be made ~20% faster by outlining).
A minimal example is:
function checkedadd(x::Float64, y::Float64)
z=y*rand()
if z>1.0e16 # some _unlikely_ scaling operation
z/=rand()
end
return x+z
end
@noinline function scale_outlined(z::Float64)
return z/rand()
end
function checkedadd_outlined(x::Float64, y::Float64)
z=y*rand()
if z>1.0e16 # some _unlikely_ scaling operation
z=scale_outlined(z)
end
return x+z
end
The straightforward, non-outlined version of checkedadd(x,y)
is slower than the outlined version (for x
and y
values that do not hit the scaling condition):
checkedadd : 5.688 ns
checkedadd_outlined : 4.551 ns
(on the other hand, if the supposedly unlikely condition is actually met, the outlined version is slower---but that won't matter in the vast majority of overflow/underflow-related cases.)
there probably needs to be some escape analysis here
It's a very simple version of escape analysis, no? Could we simply outline all blocks that are determine to end with an unreachable (::Union{}
)? That's something that inference already knows.
scale_outlined!(z)
This will not modify the local variable z
; scale_outlined!
only changes the value of its own local binding. I don't know whether this affects performance but it's important to note.
scale_outlined!(z)
This will not modify the local variable
z
;scale_outlined!
only changes the value of its own local binding. I don't know whether this affects performance but it's important to note.
Ah, thanks - it doesn't change the outcome though. I've updated the previous comment to correctly modify the variable z
.
Clearly, just outlining the
throw(_tmp4)
is useless (we want to outline the whole block) so there probably needs to be some escape analysis here showing that the_tmp
values are only used when throwing?
You can just outline the basic block a throw
occurs in. Here's a slight variation of your example:
function f(a, b)
if rand() < 0.1
_tmp1 = string(repr(a))
_tmp2 = string(b)
_tmp3 = string("This was the error ", _tmp1, ": ", _tmp2)
_tmp4 = ErrorException(_tmp3)
throw(_tmp4)
end
end
```jl
julia> @code_lowered f(1, 2)
CodeInfo(
2 1 โ Core.NewvarNode(:(_tmp1)) โ
โ Core.NewvarNode(:(_tmp2)) โ
โ Core.NewvarNode(:(_tmp3)) โ
โ Core.NewvarNode(:(_tmp4)) โ
โ %5 = (Main.rand)() โ
โ %6 = %5 < 0.1 โ
โโโ goto #3 if not %6 โ
3 2 โ %8 = (Main.repr)(a) โ
โ _tmp1 = (Main.string)(%8) โ
4 โ _tmp2 = (Main.string)(b) โ
5 โ _tmp3 = (Main.string)("This was the error ", _tmp1, ": ", _tmp2) โ
6 โ _tmp4 = (Main.ErrorException)(_tmp3) โ
7 โ %13 = (Main.throw)(_tmp4) โ
โโโ return %13 โ
3 โ return โ
)
The basic block containing the `throw` is this bit:
```jl
3 2 โ %8 = (Main.repr)(a) โ
โ _tmp1 = (Main.string)(%8) โ
4 โ _tmp2 = (Main.string)(b) โ
5 โ _tmp3 = (Main.string)("This was the error ", _tmp1, ": ", _tmp2) โ
6 โ _tmp4 = (Main.ErrorException)(_tmp3) โ
7 โ %13 = (Main.throw)(_tmp4) โ
โโโ return %13 โ
So that's what you would outline into its own function body. This approach seems pretty simple to me. It's not as general as an optimization that figures out when outlining something like scaling would improve performance, but honestly, I think that's a pretty different beast and is a case where it's perfectly reasonable for someone to do some manual outlining.
My comment was mostly regarding
throw should never be a common code path in Julia and throwing each kind of error could be its own outlined function, which would avoid an explosion of these automatically outlined throw functions.
and the note that it is not just the throwing of the error you want to outline, but also the creation of the inputs to the error.
and the note that it is not just the throwing of the error you want to outline, but also the creation of the inputs to the error.
Taking the entire basic block addresses that in most cases: that includes all "straight line" code leading up to the throw. Of course, you may want to do something a little more aggressive and outline any set of basic blocks that can only lead to the throw call. That would also handle cases like this:
function f(a, b)
rand() < 0.1 && throw(ErrorException("Blah $a: $(rand(Bool) ? b : "meh")"))
return a/b
end
julia> @code_lowered f(1, 2)
CodeInfo(
2 1 โ %1 = (Main.rand)() โ
โ %2 = %1 < 0.1 โ
โโโ goto #6 if not %2 โ
2 โ %4 = (Main.rand)(Main.Bool) โ
โโโ goto #4 if not %4 โ
3 โ #temp# = b โ
โโโ goto #5 โ
4 โ #temp# = "meh" โ
5 โ %9 = #temp# โ
โ %10 = (Base.string)("Blah ", a, ": ", %9) โ
โ %11 = (Main.ErrorException)(%10) โ
โ (Main.throw)(%11) โ
โโโ goto #6 โ
3 6 โ %14 = a / b โ
โโโ return %14 โ
)
Basic blocks 2, 3, 4 and 5 should all be outlined. This could be determined by post-domination.
@Liozou did some experiments on outlining? I don't recall if he managed to do the outlining from within the optimizer.
I did some attempts but unfortunately I was far from producing anything concrete really... The two main difficulties were:
throw
+ error input creation} subfunction, I guess it's pretty well-defined so this should be manageable.How about @throw expr
which first identifies the local arguments in expr รก la https://github.com/c42f/FastClosures.jl/blob/master/src/FastClosures.jl#L93, and then outlines the block at macro expansion time? cc @c42f
Name it @outline
and we can call it a day.
That's a pretty clever fast-and-dirty way to do it and way easier than the compiler pass. Could also potentially do caching and deduplication based on the outlined expression AST.
Interesting. Another possible name could be @unlikely
, in analogy to __unlikely
in the C code.
function f(a, b)
@unlikely rand() < 0.1 && throw(ErrorException("Blah $a: $(rand(Bool) ? b : "meh")"))
return a/b
end
Outlining should be strickly harder and less optimum than improving the compiler.
Improving the compiler to do this would be great.
If you do decide to use a macro, feel free to import whatever you like from FastClosures
. It's not complete but it might be a useful starting point.
No, not to improve the compiler to do this, but to make allocation in branch not an issue anymore.
Oh, by creating the GC frame only on the branch? I'd read some earlier comments about that and for some reason I assumed it was already done in 1.0.
Most helpful comment
This will not modify the local variable
z
;scale_outlined!
only changes the value of its own local binding. I don't know whether this affects performance but it's important to note.