Julia: optimization request: automatically outline throws

Created on 17 Oct 2018 · 18Comments · Source: JuliaLang/julia

We're increasingly seeing this kind of manual optimization in Base code:

function f(...)
    check_cond() || throw(SomeError(...))
end

rewritten as

@noinline throw_some_error(...) = throw(SomeError(...))

function f(...)
    check_cond() || throw_some_error(...)
end

It would be great if the compiler could figure this out and do the transformation automatically. I don't think it needs to be terribly complicated either: throw should never be a common code path in Julia and throwing each kind of error could be its own outlined function, which would avoid an explosion of these automatically outlined throw functions.

Source

StefanKarpinski

❤1

Most helpful comment

scale_outlined!(z)

This will not modify the local variable z; scale_outlined! only changes the value of its own local binding. I don't know whether this affects performance but it's important to note.

JeffBezanson on 17 Oct 2018

👍2

All 18 comments

Ref https://github.com/JuliaLang/julia/issues/23281

Regarding

throw should never be a common code path in Julia and throwing each kind of error could be its own outlined function

I think one of the complications is in things like

function f(a, b)
    check_cond() || throw(SomeError("This was the error $(repr(a)): $b"))
end

which is equivalent to

function f(a, b)
    if !check_cond()
        _tmp1 = string(repr(a))
        _tmp2 = string(b)
        _tmp3 = string("This was the error ", _tmp1, ": ", _tmp2)
        _tmp4 = SomeError(_tmp3)
        throw(_tmp4)
    end
end

Clearly, just outlining the throw(_tmp4) is useless (we want to outline the whole block) so there probably needs to be some escape analysis here showing that the _tmp values are only used when throwing? And the way the exception is created is not unique for the exception type.

KristofferC on 17 Oct 2018

Something analogous to this (I think, correct me if I'm wrong) impacts overflow/underflow checks in various simple operations (e.g. in complex division, which could be made ~20% faster by outlining).

A minimal example is:

function checkedadd(x::Float64, y::Float64)
    z=y*rand()
    if z>1.0e16 # some _unlikely_ scaling operation
        z/=rand()
    end
    return x+z
end

@noinline function scale_outlined(z::Float64)
    return z/rand()
end

function checkedadd_outlined(x::Float64, y::Float64)
    z=y*rand()
    if z>1.0e16 # some _unlikely_ scaling operation
        z=scale_outlined(z)
    end
    return x+z
end

The straightforward, non-outlined version of checkedadd(x,y) is slower than the outlined version (for x and y values that do not hit the scaling condition):
checkedadd : 5.688 ns checkedadd_outlined : 4.551 ns
(on the other hand, if the supposedly unlikely condition is actually met, the outlined version is slower---but that won't matter in the vast majority of overflow/underflow-related cases.)

thchr on 17 Oct 2018

there probably needs to be some escape analysis here

It's a very simple version of escape analysis, no? Could we simply outline all blocks that are determine to end with an unreachable (::Union{})? That's something that inference already knows.

mbauman on 17 Oct 2018

scale_outlined!(z)

This will not modify the local variable z; scale_outlined! only changes the value of its own local binding. I don't know whether this affects performance but it's important to note.

JeffBezanson on 17 Oct 2018

👍2

scale_outlined!(z)

This will not modify the local variable z; scale_outlined! only changes the value of its own local binding. I don't know whether this affects performance but it's important to note.

Ah, thanks - it doesn't change the outcome though. I've updated the previous comment to correctly modify the variable z.

thchr on 17 Oct 2018

Clearly, just outlining the throw(_tmp4) is useless (we want to outline the whole block) so there probably needs to be some escape analysis here showing that the _tmp values are only used when throwing?

You can just outline the basic block a throw occurs in. Here's a slight variation of your example:

function f(a, b)
    if rand() < 0.1
        _tmp1 = string(repr(a))
        _tmp2 = string(b)
        _tmp3 = string("This was the error ", _tmp1, ": ", _tmp2)
        _tmp4 = ErrorException(_tmp3)
        throw(_tmp4)
    end
end

```jl
julia> @code_lowered f(1, 2)
CodeInfo(
2 1 ─ Core.NewvarNode(:(_tmp1)) │
│ Core.NewvarNode(:(_tmp2)) │
│ Core.NewvarNode(:(_tmp3)) │
│ Core.NewvarNode(:(_tmp4)) │
│ %5 = (Main.rand)() │
│ %6 = %5 < 0.1 │
└── goto #3 if not %6 │
3 2 ─ %8 = (Main.repr)(a) │
│ _tmp1 = (Main.string)(%8) │
4 │ _tmp2 = (Main.string)(b) │
5 │ _tmp3 = (Main.string)("This was the error ", _tmp1, ": ", _tmp2) │
6 │ _tmp4 = (Main.ErrorException)(_tmp3) │
7 │ %13 = (Main.throw)(_tmp4) │
└── return %13 │
3 ─ return │
)

The basic block containing the `throw` is this bit:
```jl
3 2 ─ %8  = (Main.repr)(a)                                                                          │
  │         _tmp1 = (Main.string)(%8)                                                               │
4 │         _tmp2 = (Main.string)(b)                                                                │
5 │         _tmp3 = (Main.string)("This was the error ", _tmp1, ": ", _tmp2)                        │
6 │         _tmp4 = (Main.ErrorException)(_tmp3)                                                    │
7 │   %13 = (Main.throw)(_tmp4)                                                                     │
  └──       return %13                                                                              │

So that's what you would outline into its own function body. This approach seems pretty simple to me. It's not as general as an optimization that figures out when outlining something like scaling would improve performance, but honestly, I think that's a pretty different beast and is a case where it's perfectly reasonable for someone to do some manual outlining.

StefanKarpinski on 17 Oct 2018

👍1

My comment was mostly regarding

throw should never be a common code path in Julia and throwing each kind of error could be its own outlined function, which would avoid an explosion of these automatically outlined throw functions.

and the note that it is not just the throwing of the error you want to outline, but also the creation of the inputs to the error.

KristofferC on 18 Oct 2018

and the note that it is not just the throwing of the error you want to outline, but also the creation of the inputs to the error.

Taking the entire basic block addresses that in most cases: that includes all "straight line" code leading up to the throw. Of course, you may want to do something a little more aggressive and outline any set of basic blocks that can only lead to the throw call. That would also handle cases like this:

function f(a, b)
    rand() < 0.1 && throw(ErrorException("Blah $a: $(rand(Bool) ? b : "meh")"))
    return a/b
end

julia> @code_lowered f(1, 2)
CodeInfo(
2 1 ─ %1  = (Main.rand)()                                                            │
  │   %2  = %1 < 0.1                                                                 │
  └──       goto #6 if not %2                                                        │
  2 ─ %4  = (Main.rand)(Main.Bool)                                                   │
  └──       goto #4 if not %4                                                        │
  3 ─       #temp# = b                                                               │
  └──       goto #5                                                                  │
  4 ─       #temp# = "meh"                                                           │
  5 ┄ %9  = #temp#                                                                   │
  │   %10 = (Base.string)("Blah ", a, ": ", %9)                                      │
  │   %11 = (Main.ErrorException)(%10)                                               │
  │         (Main.throw)(%11)                                                        │
  └──       goto #6                                                                  │
3 6 ─ %14 = a / b                                                                    │
  └──       return %14                                                               │
)

Basic blocks 2, 3, 4 and 5 should all be outlined. This could be determined by post-domination.

StefanKarpinski on 18 Oct 2018

@Liozou did some experiments on outlining? I don't recall if he managed to do the outlining from within the optimizer.

vchuravy on 18 Oct 2018

I did some attempts but unfortunately I was far from producing anything concrete really... The two main difficulties were:

finding which variables needed to be passed as arguments to the outlined function. In this case, if you only want to outline the {throw + error input creation} subfunction, I guess it's pretty well-defined so this should be manageable.
creating a function at compile-time, because that messes with the world age system. I'm not sure we ever really found a definite answer to that one.

Liozou on 18 Oct 2018

How about @throw expr which first identifies the local arguments in expr á la https://github.com/c42f/FastClosures.jl/blob/master/src/FastClosures.jl#L93, and then outlines the block at macro expansion time? cc @c42f

KristofferC on 31 Oct 2018

👍1

Name it @outline and we can call it a day.

mbauman on 31 Oct 2018

That's a pretty clever fast-and-dirty way to do it and way easier than the compiler pass. Could also potentially do caching and deduplication based on the outlined expression AST.

StefanKarpinski on 31 Oct 2018

👍1

Interesting. Another possible name could be @unlikely, in analogy to __unlikely in the C code.

function f(a, b)
    @unlikely rand() < 0.1 && throw(ErrorException("Blah $a: $(rand(Bool) ? b : "meh")"))
    return a/b
end

c42f on 1 Nov 2018

Outlining should be strickly harder and less optimum than improving the compiler.

yuyichao on 1 Nov 2018

Improving the compiler to do this would be great.

If you do decide to use a macro, feel free to import whatever you like from FastClosures. It's not complete but it might be a useful starting point.

c42f on 1 Nov 2018

No, not to improve the compiler to do this, but to make allocation in branch not an issue anymore.

yuyichao on 1 Nov 2018

Oh, by creating the GC frame only on the branch? I'd read some earlier comments about that and for some reason I assumed it was already done in 1.0.

c42f on 1 Nov 2018

Was this page helpful?

0 / 5 - 0 ratings