When optimizing code, I often run up against cases in which the compiler is missing a key fact. Sometimes it could but does not infer it; sometimes there’s no reasonable way for the compiler to know.
In those cases, there is nothing to do but drop to assembly. (In normal code you could write `if !x { panic }`, but in many of these cases, that is prohibitively expensive.)
I propose that we add unsafe.Assume. It accepts a boolean expression. The expression is typechecked but never evaluated. However, the compiler may assume that it evaluates to true when compiling other code.
I imagine the most common uses would be things like `unsafe.Assume(p != nil)`, `unsafe.Assume(0 <= i && i < len(s))`, and `unsafe.Assume(x < 64)`, for nil checks, bounds checks, and shift amounts, respectively.
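For concreteness, here is a sketch of how the proposed API might be used for the shift-amount case. Note that `unsafe.Assume` is hypothetical; this code does not compile with any existing toolchain:

```go
package hints

import "unsafe"

// shl shifts x left by s bits. Callers guarantee s < 64, but the
// compiler cannot see that guarantee across the call boundary.
func shl(x uint64, s uint) uint64 {
	unsafe.Assume(s < 64) // hypothetical: lets the compiler emit a bare shift
	return x << s
}
```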
What happens if the assertion does not hold?
The compiler may generate code that does all sorts of awful things. It is decidedly unsafe.
To me, this sounds like "Assume". I would think "Assert" means if the assertion fails, the program crashes (more like the C assert function).
@cherrymui thanks, proposal updated.
Another approach would be `unsafe.Unreachable()`. The compiler can assume that no code path will lead to a call of that function. This effectively lets you write
```go
if p == nil {
	unsafe.Unreachable()
} else {
	// Here the compiler knows that p != nil.
}
```
and then at some point during code generation the compiler removes the `p == nil` test and the `Unreachable` call.
@ianlancetaylor I like it. One nice thing about that is that it mirrors the not-unsafe way to write it:
```go
if p == nil {
	panic("unreachable")
}
```
becomes
```go
if p == nil {
	unsafe.Unreachable()
}
```
And the specification is simpler.
The only downside I see is that it doesn't fit on one line. :P It also doesn't scream unsafe at me the way `unsafe.Assume` does, even though they are expressively equivalent...but I guess the package name suffices for that.
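To spell out the equivalence (a sketch in terms of the two hypothetical APIs; neither exists today):

```go
// unsafe.Assume(cond) can be written in terms of unsafe.Unreachable:
if !cond {
	unsafe.Unreachable()
}

// and unsafe.Unreachable() in terms of unsafe.Assume:
unsafe.Assume(false)
```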
What happens if you ask the compiler to assume something it knows is false? Like something that reduces to

```go
x := -1
unsafe.Assume(x > 0)
```
> What happens if you ask the compiler to assume something it knows is false?
Then you're in UB world: It depends on the compiler and the details. I'm guessing that in practice, with cmd/compile right now, probably not too much. With gccgo, you probably start spewing random numbers until you generate the kubernetes code base.
If the compiler doesn't know that `x <= -1`, sure, UB. But if the compiler has proved that `x <= -1` and you ask it to assume that `x > 0`, it seems like it should error out.
> it seems like it should error out.
Perhaps optionally? I think that a compiler that simply ignored `unsafe.Unreachable`/`unsafe.Assume` should be spec-compliant.
I like this a lot for my code; I'm terrified of what happens if other people have it.
I'm not sure we should go down this road. This is basically a "performance annotation" and it's one more thing everyone needs to understand. I'm happy to give up 5% in performance to never have to make (or see) any such annotations.
There are a lot of other annotations people could want: `unsafe.Unlikely`, `unsafe.UnrollThisLoop`, etc.
If we do this I like Ian's API.
> (In normal code you could write `if !x { panic }`, but in many of these cases, that is prohibitively expensive.)
@josharian could you clarify what part is expensive? The added code size? Or perhaps making some small functions non-inlineable with the current compiler?
Another idea that comes to mind: the compiler could treat `panic("unreachable")` in a special way, generating less code for it, as there would likely be many identical calls within a single binary.
Change https://golang.org/cl/165358 mentions this issue: cmd/compile: add unsafe.Unreachable
I wrote a rudimentary implementation so that I could experiment with it: https://golang.org/cl/165358
> I'm happy to give up 5% in performance to never have to make (or see) any such annotations.
Some data points, for `addVV_g` in `math/big`, based just on what I could hack in a half hour...
Going from my recently-optimized CLs to adding a few unsafe.Unreachable annotations:
```
name            old time/op  new time/op  delta
AddVV/1-8       5.49ns ± 5%  4.36ns ± 1%  -20.58%  (p=0.000 n=45+41)
AddVV/2-8       6.95ns ± 4%  4.88ns ± 1%  -29.81%  (p=0.000 n=48+43)
AddVV/3-8       8.25ns ± 6%  6.18ns ± 1%  -25.14%  (p=0.000 n=44+46)
AddVV/4-8       9.34ns ± 4%  7.01ns ± 2%  -24.94%  (p=0.000 n=47+47)
AddVV/5-8       10.7ns ± 2%   7.7ns ± 4%  -28.48%  (p=0.000 n=45+44)
AddVV/10-8      17.7ns ± 4%  11.1ns ± 2%  -37.22%  (p=0.000 n=50+39)
AddVV/100-8      159ns ± 1%   118ns ± 2%  -25.78%  (p=0.000 n=34+42)
AddVV/1000-8    1.43µs ± 5%  1.14µs ± 3%  -19.99%  (p=0.000 n=47+49)
AddVV/10000-8   14.2µs ± 5%  11.4µs ± 2%  -20.15%  (p=0.000 n=49+50)
AddVV/100000-8   143µs ± 4%   113µs ± 2%  -21.01%  (p=0.000 n=48+48)
```
Going from regular to an unrolled loop, which requires yet more annotations:
```
name            old time/op  new time/op  delta
AddVV/1-8       6.38ns ± 1%  4.31ns ± 1%  -32.50%  (p=0.000 n=38+43)
AddVV/2-8       9.52ns ± 1%  5.17ns ± 1%  -45.67%  (p=0.000 n=43+44)
AddVV/3-8       11.4ns ± 1%   6.3ns ± 2%  -44.14%  (p=0.000 n=46+43)
AddVV/4-8       13.6ns ± 1%   7.2ns ± 1%  -47.09%  (p=0.000 n=44+40)
AddVV/5-8       22.5ns ± 3%   9.8ns ± 2%  -56.62%  (p=0.000 n=45+43)
AddVV/10-8      40.6ns ± 4%  14.5ns ± 4%  -64.35%  (p=0.000 n=48+49)
AddVV/100-8      414ns ± 2%   163ns ± 2%  -60.61%  (p=0.000 n=49+50)
AddVV/1000-8    4.04µs ± 2%  1.62µs ± 4%  -59.89%  (p=0.000 n=35+46)
AddVV/10000-8   40.3µs ± 1%  16.8µs ± 4%  -58.38%  (p=0.000 n=32+49)
AddVV/100000-8   403µs ± 2%   173µs ± 8%  -57.18%  (p=0.000 n=34+49)
```
This gets us within striking distance of the hand-optimized assembly for small n. Going from my unrolled Go+Unreachable code to assembly:
```
name            old time/op  new time/op   delta
AddVV/1-8       4.07ns ± 3%   4.31ns ± 1%    +5.76%  (p=0.000 n=42+43)
AddVV/2-8       4.78ns ± 2%   5.17ns ± 1%    +8.17%  (p=0.000 n=42+44)
AddVV/3-8       5.84ns ± 2%   6.34ns ± 2%    +8.65%  (p=0.000 n=46+43)
AddVV/4-8       6.75ns ± 4%   7.18ns ± 1%    +6.35%  (p=0.000 n=50+40)
AddVV/5-8       7.51ns ± 2%   9.76ns ± 2%   +29.86%  (p=0.000 n=47+43)
AddVV/10-8      9.84ns ± 4%  14.48ns ± 4%   +47.11%  (p=0.000 n=49+49)
AddVV/100-8     49.5ns ± 5%  163.1ns ± 2%  +229.25%  (p=0.000 n=48+50)
AddVV/1000-8     434ns ± 4%   1622ns ± 4%  +273.53%  (p=0.000 n=50+46)
AddVV/10000-8   5.50µs ± 4%  16.79µs ± 4%  +204.95%  (p=0.000 n=41+49)
AddVV/100000-8  61.0µs ± 9%  172.8µs ± 8%  +183.11%  (p=0.000 n=49+49)
```
And the main difference now is that the hand-rolled assembly can do `ADC mem reg`, rather than loading from memory into a register and then doing `ADC reg reg`. That's fixable straightforwardly in the compiler. So I think it is possible this could let us remove the assembly entirely.
I plan to do the ADC upgrade anyway. I'll report back once I've done it; might be a few days.
None of the `unsafe.Unreachable` annotations I've provided could ever be inferred by the compiler; they come from upstream invariants.
And if you want to see what the simple, hinted code looks like:
```go
func addVV_g(z, x, y []Word) (c Word) {
	for i := 0; i < len(z); i++ {
		if i >= len(x) || i >= len(y) {
			// The calling code ensures that x and y are no longer than z.
			// The compiler doesn't see this; this hint prevents bounds
			// checks in the bits.Add call below.
			unsafe.Unreachable()
		}
		zi, cc := bits.Add(uint(x[i]), uint(y[i]), uint(c))
		z[i] = Word(zi)
		c = Word(cc)
	}
	return
}
```
> There are a lot of other annotations people could want: `unsafe.Unlikely`, `unsafe.UnrollThisLoop`, etc.
Fair enough. That said, I think this is plausibly qualitatively different:

- I can see wanting a way to hint about branch likeliness, although that's not actually unsafe. And there have been myriad performance compiler requests around nil checks, BCE, etc., and almost none around branch likeliness.
- Loop unrolling can be done manually when you really need it (see the sketch below); these annotations cannot.
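For reference, a minimal sketch of such manual unrolling; the function and the 4x factor are illustrative:

```go
// sumUnrolled adds up the elements of s, handling four elements per
// iteration to reduce loop overhead, then finishing the tail one at a time.
func sumUnrolled(s []int) (t int) {
	i := 0
	for ; i+4 <= len(s); i += 4 {
		t += s[i] + s[i+1] + s[i+2] + s[i+3]
	}
	for ; i < len(s); i++ {
		t += s[i]
	}
	return t
}
```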
I'd be interested to see the benchmarks for `addVV_g` with the bounds checks hoisted out of the loop:
```go
func addVV_g(z, x, y []Word) (c Word) {
	n := len(z)
	x, y = x[:n:n], y[:n:n]
	for i := 0; i < n; i++ {
		zi, cc := bits.Add(uint(x[i]), uint(y[i]), uint(c))
		z[i] = Word(zi)
		c = Word(cc)
	}
	return
}
```
I suspect the code using `unsafe.Unreachable()` is faster for a small number of iterations. I would expect, however, the performance of the two implementations to converge as the number of iterations increases.
Also, if the compiler inlined code containing `for` loops (https://github.com/golang/go/issues/14768), then we might see bounds checks optimized/merged away in real code (probably not in micro-benchmarks). Inlining seems likely to be a big win for `addVV_g`, which has a very short and simple loop. It is also a massive advantage that pure Go implementations of small functions like this have over their assembly equivalents.
I can see annotations like `unsafe.Unreachable()` being useful when the slice index comes from an external source that is known to be in range. This happens a lot in the compiler. But this also seems like a scenario where we really do want a bounds check to catch bugs.

EDIT: dropped the length-check panics; I don't think they are necessary for this use case, and with them the function doesn't work when `len(z) == 0`.
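For illustration, a sketch of that external-index scenario using the proposed API; the `table` type and its invariant are hypothetical, and `unsafe.Unreachable` does not exist today:

```go
type table struct {
	// names is indexed by IDs that were validated when the table was
	// built, so lookups are always in range; the compiler cannot prove it.
	names []string
}

func (t *table) lookup(id int) string {
	if id < 0 || id >= len(t.names) {
		// Hypothetical hint: removes the bounds check below, at the
		// cost of the bug-catching safety net discussed above.
		unsafe.Unreachable()
	}
	return t.names[id]
}
```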
I did a few benchmarks on the generic code to see what I got. Hoisting the bounds check on y seemed to make very little difference, which is interesting because it did remove the bounds check from the loop, implying the critical path is the carry chain (I am using an i5 3570k).
Secondly, I tried applying David's for-loop inlining experiment, CL 148777. I still applied the bounds-check hoisting change to get rid of the `range` statement, since David's patch doesn't handle that. This did seem to make a big difference at small loop lengths, which makes sense since it saves writing 9 words to the stack (3 for each slice):
```
name          old time/op  new time/op  delta
AddVV/1       5.70ns ± 0%  1.91ns ± 3%  -66.56%  (p=0.000 n=7+10)
AddVV/2       6.83ns ± 0%  2.73ns ± 1%  -60.08%  (p=0.000 n=10+8)
AddVV/3       7.75ns ± 0%  3.59ns ± 1%  -53.69%  (p=0.000 n=9+10)
AddVV/4       9.20ns ± 0%  4.59ns ± 0%  -50.05%  (p=0.000 n=10+10)
AddVV/5       10.6ns ± 0%   5.5ns ± 2%  -48.16%  (p=0.000 n=10+10)
AddVV/10      16.8ns ± 0%  10.5ns ± 1%  -37.63%  (p=0.000 n=10+9)
AddVV/100      128ns ± 0%   120ns ± 1%   -6.38%  (p=0.000 n=10+10)
AddVV/1000    1.13µs ± 0%  1.16µs ± 1%   +2.14%  (p=0.000 n=10+10)
AddVV/10000   11.5µs ± 0%  11.8µs ± 1%   +2.89%  (p=0.000 n=10+7)
AddVV/100000   117µs ± 0%   120µs ± 2%   +2.68%  (p=0.000 n=9+10)
```
I suspect the slightly slower speed limit is down to alignment noise or a similar effect.
Patch to remove the `range` statement and the bounds check on y in the loop:
```diff
--- a/src/math/big/arith.go
+++ b/src/math/big/arith.go
@@ -54,7 +54,9 @@ func divWW_g(u1, u0, v Word) (q, r Word) {
 // The resulting carry c is either 0 or 1.
 func addVV_g(z, x, y []Word) (c Word) {
-	for i := range x[:len(z)] {
+	n := len(z)
+	x, y = x[:n:n], y[:n:n]
+	for i := 0; i < n; i++ {
 		zi, cc := bits.Add(uint(x[i]), uint(y[i]), uint(c))
 		z[i] = Word(zi)
 		c = Word(cc)
```
> The expression is typechecked but never evaluated. However, the compiler may assume that it evaluates to true when compiling other code.
That seems dangerous to me: it adds a large space of undefined behavior (that is, any program that violates the assumption), but no way to validate that the program does not actually exhibit that behavior (contrast `-fsanitize=undefined`).
To me, the biggest advantage (by far!) of undefined behavior is that it can be sanitized: unlike a Go `panic` (which can be `recover`ed and relied upon for control flow, as in the case of `fmt` with nil pointers), a violation of an `unsafe.Assume` condition _always_ indicates a programmer error, and can be reported as such.
So instead, I would propose that the expression _always_ be evaluated, and the result discarded except in sanitized (perhaps race-detector-enabled?) builds. That makes it possible to optimize away arithmetic expressions and pure functions (by far the common use-cases), but allows a sanitizer to verify the assumptions — without producing inconsistent behavior if the expression has side-effects.
`unsafe.Unreachable` would also address the sanitization problem: the compiler can always choose to insert dynamic checks, and report violations, on any path marked unreachable. So perhaps that's another argument in favor of `Unreachable` over `Assume`.
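As a sketch of what such a dynamic check might look like, here is an ordinary-Go stand-in for what a sanitizer-enabled build could substitute for `unsafe.Unreachable` (the function and package are hypothetical; only the stdlib calls are real):

```go
package sanitize

import (
	"fmt"
	"os"
	"runtime"
)

// unreachable reports the location of the violated invariant and aborts,
// instead of letting the compiler delete the branch that called it.
func unreachable() {
	_, file, line, _ := runtime.Caller(1)
	fmt.Fprintf(os.Stderr, "unreachable statement reached at %s:%d\n", file, line)
	os.Exit(2)
}
```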
I think there is unanimity at this point that Unreachable is the better API.
I'm not really sure I see a use case here besides supporting `assert()`-like behavior. I personally like assertions and don't really buy into the opinions in the FAQ here. I specifically disagree that assertions are a way to not handle an error, because compiling out the assertion code is only a (production) compile-time optimization.
Perhaps we may want to suggest that assertions `panic` on assertion failure, with the panic counting as uncovered code (for purposes of test coverage reporting).

My thinking for suggesting we not compile assertions out is to stay in line with the rationale for including bounds checking in arrays, etc.
The use case here is in guiding the compiler in making optimization choices, when the programmer has information unavailable to the compiler. See the comments above at https://github.com/golang/go/issues/30582#issuecomment-469805570 and https://github.com/golang/go/issues/30582#issuecomment-469810081, where @josharian used his knowledge that `z` was at least as long as `x` or `y` to get better code. That fact is ensured by the callers of the unexported function, but the compiler is not able to deduce it.
I apologize that my previous comment was a bit unclear.
As you targeted this proposal at the `unsafe` package, I believe we agree that its direct goal runs counter to the Go philosophy of being a "memory-safe" language and that it tries to increase undefined behavior in return for increased performance. If this proposal had been made specifically to remove bounds checks that a programmer decided were unnecessary, I believe the proposal would have been dismissed out of hand, as it harkens back to the C philosophy of trusting the programmer to not do something stupid. In this era of cheap and abundant CPU power, I think these micro-optimizations are foolhardy. I think time is better spent helping equip the compiler to predict these more complicated relationships itself.
This leads me to the leap that I made to turn this into a meaningful `assert()` invariant and the exciting compiler optimizations possible if the compiler took full advantage of these assertion expressions. (I realize this extrapolation is somewhat of a 180° from the original intent of this proposal, but it may result in similarly optimized code that is actually safer.)
Firstly, let me clarify my expectations of assertions:
So, how can adding assertions make the code more optimized?
I realize this is essentially a counter-proposal to the spirit of the one you were proposing, and requires much deeper compiler inference to realize its full potential, but I think it can offer many of the gains you were looking for in a much safer way.
I welcome your comments and would like to hear alternate viewpoints defending the original proposal as well.
You can build `assert` using build tags, `panic`, and `unsafe.Unreachable`.
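A minimal sketch of that construction, assuming the proposed `unsafe.Unreachable` existed; the `assert` build tag and file names are illustrative:

```go
// assert_off.go
//go:build !assert

package mypkg

import "unsafe"

// assert promises the compiler that cond holds.
// A false cond is undefined behavior in this build.
func assert(cond bool) {
	if !cond {
		unsafe.Unreachable() // proposed API; does not exist today
	}
}
```

```go
// assert_on.go
//go:build assert

package mypkg

// assert checks cond at runtime when built with -tags assert.
func assert(cond bool) {
	if !cond {
		panic("assertion failed")
	}
}
```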
Improving the compiler is always welcome. We do some of the things you suggest: we use the information provided by assertions when possible and (via inlining) we cull unnecessary assertions and checks. (These assertions take many forms, among them nil pointer checks, bounds checks, and many plain old branches.)
We generate compile errors when some assertions are known to fail, such as indexing with a constant that is known to be out of bounds. But we can't increase the set of compilation errors much without it being a language change.
Improving the compiler further isn't always an option. We may not have the time. We may not know how. We may decide that it isn't worth the concomitant compiler slowdown. And sometimes humans know things the compiler simply never will. This proposal is intended to act as a pressure release valve in such scenarios, when necessary for performance in critical sections of code.
The question of whether there is sufficient value in adding a language construct for checked pre- and post-conditions is a very interesting one. But it is not this proposal, nor is it really an alternative to this proposal. From the compiler's perspective, you can simulate a checked pre-condition with an `if` statement and a `panic`; if that sufficed, I would not have filed this proposal. (Checked post-conditions are more complicated, but also less common and usually less relevant to compiler optimizations.)
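For example, here is a checked pre-condition that the compiler can already exploit today; the function and the constant 8 are illustrative:

```go
// readHeader requires at least 8 bytes of input.
func readHeader(b []byte) uint64 {
	if len(b) < 8 {
		panic("readHeader: short input")
	}
	// After the check above, the compiler knows len(b) >= 8, so the
	// constant-index accesses below compile without bounds checks.
	return uint64(b[0]) | uint64(b[1])<<8 | uint64(b[2])<<16 | uint64(b[3])<<24 |
		uint64(b[4])<<32 | uint64(b[5])<<40 | uint64(b[6])<<48 | uint64(b[7])<<56
}
```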
If you'd like to pursue your assertion proposal ideas further, I suggest you file a new issue to discuss them, and link to it here.
@josharian, I agree with you on all points.
I do think that it is worth pointing out where these two proposals do overlap:
I do concede this is a very small slice of the Venn diagram, and I expect your proposal to bear fruit long before the compiler's static-analysis subsystem is rich enough to make the goals I described above remotely achievable.
I look forward to submitting a compiler-aware assertion mechanism as a future proposal.
Also, several ranged value type proposals that I have already read may dovetail very nicely with these.
I was thinking some more about the properties of `unsafe.Unreachable`. Specifically:

1. The program should be allowed to crash if an `Unreachable` statement is ever reached, and should do so by default.
2. The `stderr` output of such a crash should indicate where the invariant violation occurred.
3. The compiler should be allowed to optimize under the assumption that the program never reaches an `Unreachable` statement.

It occurs to me that we actually already have a statement that satisfies those properties:
```go
func Unreachable() {
	go panic("unreachable")
}
```
Since the `panic` occurs on a new goroutine, the compiler should know that it cannot be recovered and the program will terminate, and the compiler and runtime are allowed to schedule and run that goroutine immediately. The traceback for a panic currently indicates where the panic occurred, and to my knowledge nothing prevents us from adding more information about how the goroutine was created.
On the other hand, since the new goroutine does not communicate or synchronize with any other goroutine in the program (in particular, it does not have any _happens-before_ relationship with any remaining statement in `main.main`), the spec (by my reading) does not require the compiler or runtime to ever schedule it at all, and thus a very aggressive optimizing compiler could omit it entirely.
I'm simultaneously awed and horrified. Congratudolences?
@bcmills that’s a very clever bit of language-lawyering. But I think I’d rather be explicit. And force importing unsafe, with all the signaling and extra-linguistic guardrails that come with that.
Interesting, but the idea that there should be a difference between debug mode and release mode is false. I have spent years doing life-critical software in a large, famous company, and there the rule was that the production binary must be the tested binary, which almost always meant the binary with debug information available, in order to be able to do postmortem analysis in case of a crash.
Nevertheless, `unsafe.Unreachable` seems to give some substantial performance increases. But I think one of the strong points of Go is that it has very little undefined behavior. So I think `unsafe.Unreachable` can still serve as a performance hint for the statements that follow it, but it should always cause a panic if it is reached, and the check that leads to `unsafe.Unreachable` should not be optimized out in any case.
> So I think `unsafe.Unreachable` can still serve as a performance hint for the statements that follow it, but it should always cause a panic if it is reached, and the check that leads to `unsafe.Unreachable` should not be optimized out in any case.
We already have this: `panic`. In the cases under discussion, it is the check itself that is expensive, performance-wise.