The TestFloat64SpecialCases test in math/big (ratconv_test.go) is currently broken on the android/arm64 builder, with log:
https://build.golang.org/log/fbc581637a0375532f0d819ae67bc9b9abbe8ba9
Breakage started at e4ba400 (math/big: accept non-decimal floats with Rat.SetString).
cc @griesemer
Hm. It's only broken on android-arm-wikofever. @eliasnaur, it looks like you're the owner of that builder. Is there anything special about that arm device? It's a Cortex-A53 CPU?
I don't think it's a special device, other than it being an Android phone.
The test also broke on darwin/arm64 (iPhone):
https://build.golang.org/log/0b86dd35833a34f28785d335153946ed9a5a3691
which I assume uses an entirely different CPU.
Indeed, and that appears to be an iPhone 7+. Interesting.
Here's a smaller test case: https://play.golang.org/p/7OUzcEgoVGk
In the playground, and elsewhere, running this code produces the correct result:
r = 1/100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 (true)
len(r) = 308
f = 1e-305 (false)
e = 1e-305
On android-arm-wikofever (and presumably darwin/arm64) we get:
r = 1/96541187637860552075819821540903887880024095043179244007497829804714413865809870901655967120022719342752763287637146850362464306041721929996152029995888654682648990460561537640800770715572834310810731676381392326036599721803873644105423265719091992037347983408132957025607386110373485836090285769528130906264626714751533056 (true)
len(r) = 325
f = 1e-323 (false)
e = 1e-305
That is the fraction is completely wrong.
The fraction is correct for 1e-164, and then wrong for 1e165.
Change https://golang.org/cl/169721 mentions this issue: math/big: temporarily disable buggy shlVU assembly for arm64
I've tracked this down a bit more. Here is a stand-alone test file (to be added to the math/big test files) that exposes the problem:
package big
import (
"strings"
"testing"
)
func TestSHLVU(t *testing.T) {
// compute 10^n via 5^n << n
const n = 165
p := nat(nil).expNN(nat{5}, nat{n}, nil)
p = p.shl(p, uint(n)) // <<< this doesn't work using 路shlVU
// p = nat(nil).shl(p, uint(n)) // <<< this works
got := string(p.utoa(10))
want := "1" + strings.Repeat("0", n)
if got != want {
t.Errorf("got %s\nwant %s\n", got, want)
}
}
When run with the 路shlVU assembly routine (in arith_arm64.s) enabled, this test fails on arm:
$ gomote run -path '$PATH,$WORKDIR/go/bin' $MOTE go/bin/go test -run SHLVU ../src/math/big
go_android_exec: adb wait-for-device exec-out while [[ -z $(getprop sys.boot_completed) ]]; do sleep 1; done;
go_android_exec: adb push /var/folders/f6/d2bhfqss2716nxm8gkv1fmb80000gn/T/workdir-host-darwin-amd64-eliasnaur-android/tmp/go-build825267701/b001/big.test /data/local/tmp/go_android_exec/big.test-89906/big.test
[ 1%] /data/local/tmp/go_android_exec/big.test-89906/big.test
...
[100%] /data/local/tmp/go_android_exec/big.test-89906/big.test
/var/folders/f6/d2bhfqss2716nxm8gkv1fmb80000gn/T/workdir-host-darwin-amd64-eliasnaur-android/tmp/go-build825267701/b001/big.test: 1 file pushed. 10.1 MB/s (6329512 bytes in 0.597s)
go_android_exec: adb exec-out export TMPDIR="/data/local/tmp/go_android_exec/big.test-89906"; export GOROOT="/data/local/tmp/go_android_exec/goroot"; export GOPATH="/data/local/tmp/go_android_exec/big.test-89906/gopath"; export CGO_ENABLED=0; export GOPROXY=; export GOCACHE="/data/local/tmp/go_android_exec/gocache"; export PATH=$PATH:"/data/local/tmp/go_android_exec/goroot/bin"; cd "/data/local/tmp/go_android_exec/goroot/src/math/big"; '/data/local/tmp/go_android_exec/big.test-89906/big.test' -test.run=SHLVU; echo -n exitcode=$?
--- FAIL: TestSHLVU (0.00s)
shl_test.go:18: got 999999999999999999999999999999999999997461262869164377797676543286813257059020658007941683889923711150197932805888983316150450477342246492361833965695145908050591744
want 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
FAIL
exitcode=1go_android_exec: adb exec-out rm -rf /data/local/tmp/go_android_exec/big.test-89906
FAIL math/big 0.988s
Error running run: exit status 1
When the assembly routine is disabled, the code works fine.
When the assembly routine is enabled but the test case is tweaked slightly such that it uses a nat(nil) receiver rather than p, then the code also seems to work. This is likely an indicator as to what might be wrong with the assembly code.
When fixing this, one should also look into 路shrVU and verify that it doesn't suffer from the same problem.
cc: @erifan @josharian who seem to have written some of this code
Looks like this routine was written by @erifan. The problem is definitely not obvious at a first read through the code. Let me know if you'd like me to take a closer look; otherwise, I will (tentatively) leave this for @erifan.
Also, the shift==0 case is missing an optimization. If z.ptr == x.ptr, there's no need to do a copy at all; you can simply return.
The pure Go code uses the copy built-in to do the copy, which does this optimization under the hood. amd64 is missing the shift==0 optimization entirely. I'll file a separate issue for that.
It's definitively subtle. The problem doesn't appear for n < 165, at least for https://play.golang.org/p/7OUzcEgoVGk code.
Sorry, I'll fix this bug.
The problem is that the loop in the assembler code goes forward through the arrays. It has to go backward. Otherwise the output overwrites the input.
The same is true of shrVU.
Of course I'm wrong about shrVU, which has to go forward. Only shlVU has to go backward.
@bradfitz I can't imagine this doesn't fail on all arm64 CPUs, yet I see no arm64 builders on build.golang.org apart from the mobiles. Are they gone or am I blind?
With generic arm64 builders, gri wouldn't have to go through that painful debugging session.
The code has been written and I will submit it immediately.
Change https://golang.org/cl/169780 mentions this issue: math/big: fix the bug in assembly implementation of shlVU on arm64
I found that the implementation of shlVU on amd64 does not consider the case where the high byte of z overlaps with the low byte of x, so I also did not consider this point. Do we need to consider this situation?
We only have to consider word level overlap, not byte level overlap, if that is what you mean. But we do have to consider the case where the high word of z overlaps with the low word of x. I don't see any obvious problem in the amd64 implementation but maybe there is one.
Byte-overlap should never happen, it's either one or more words, or none - otherwise we'd have a serious bug elsewhere. You can safely assume that all pointers are word-aligned.
The assembly code should behave essentially the same as shlVU_g in arith.go.
Yes I mean Word level overlap. And I can construct an negative case for this situation, which can prove that the current pure Go implementation are defective.
func TestShlVUOverlap(t *testing.T) {
a := argVW{nat{2, 2, 2, 2, 2, _M << 1 &_M, 1}, nat(nil).set(nat{1, 1, 1, 1, 1, 1, 1, 1, 1, _M}), 1, 1}
// save a.x for error message because it will be overrided.
b := nat(nil).make(10)
copy(b, a.x)
got := a.x[0:6].shl(a.x[4:10], uint(a.y))
for i, zi := range got {
if zi != a.z[i] {
t.Errorf("special shl(%v, %v)\n\tgot z[%d] = %#x; want %#x", b[4:10], a.y, i, zi, a.z[i])
break
}
}
}
Add this test to file math/big/arith_test.go, and run
$ ../../../bin/go test -run TestShlVU
With the pure Go implementation, we get:
--- FAIL: TestShlVUOverlap (0.00s)
arith_test.go:260: special shl([1 1 1 1 1 18446744073709551615], 1)
got z[0] = 0x4; want 0x2
FAIL
exit status 1
FAIL math/big 0.072s
The current use cases of function shl didn't expose this bug, and as function nat.shl is not a public function, so I wonder if we need to consider this situation when implementing this function or is there some mechanisms that can guarantee this situation never happen?
shlVU is an internal function, so we don't need to consider all possible ways to call it. We only need to consider uses of the exported methods. We can trust that the code in the math/big package understands the limitations of shlVU.
We can trust that the code in the math/big package understands the limitations of
shlVU.
The code might, but I couldn't tell you those limitations offhand. It'd be good to document them.
Once they're documented, we can start fuzzing them using those documented limitations. Since there are multiple implementations, these are good fuzz candidates. (I have started on this.)
Change https://golang.org/cl/168257 mentions this issue: math/big: add comprehensive aliasing tests (and minor fixes to Exp, Rand)
@gopherbot Please open backport issue to 1.13
Backport issue(s) opened: #32940 (for 1.13).
Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
@erifan Why do you think this should be backported to Go 1.13? Go 1.13 hasn't been released yet, it can't have backports. Did you mean another release?
Hi @dmitshur , I just hope that this fix can go into Go1.13. This fix was merged before the release date of Go1.12.6, but it did not enter the Go1.12.6. I'm not sure if it will go into Go.1.13, so I raised this request.
It would be better if it could get into an earlier release, because some programs are now affected by this bug, such as https://github.com/shopspring/decimal.
@erifan Since the fix is in master now, it will be a part of Go 1.13 when we release that. We create new release branches and the cherry-pick process begins at the time of the first release candidate, but we haven't cut that yet (only the first beta has been made by now). (See https://golang.org/s/release for more info about the Go release cycle.)
I'll close the 1.13 cherry-pick issue since it's not necessary.
@dmitshur Ok, thanks.
Excuse me @dmitshur , There's another question. Will there be a Go1.12.7 version before Go1.13? If so, I hope this fix can be ported in Go1.12.7. Or, if there is Go1.12.7, no backport is needed and this fix will be included? Thanks.
There will be a Go 1.12.7 release soon (we aim to make minor releases monthly). We can consider cherry-picking this fix to 1.12 if it meets the backporting criteria (described here). If it doesn't, then it'll be available starting with 1.13.
@erifan According to the original issue, this breakage started with commit e4ba400. That commit isn't a part of 1.12. So does this problem exist in 1.12?
@dmitshur In Go, the breakage started from commit https://github.com/golang/go/commit/e4ba40030f9ba4b61bb28dbf78bb41a7b14e6788, but for user code, the affected version may be earlier than 1.12, and there is currently no release version that fixes this issue. The user of Project https://github.com/shopspring/decimal complained to me about this issue. Since it is a library function, users will use it frequently, so I think this is a security issue. If possible, please port it to Go1.12.7, thank you.
Ok, thanks @erifan. I've repurposed #32940 as the 1.12 backport issue.
I'll discuss whether it meets the backporting criteria with the team tomorrow. If it gets approved for a cherry-pick, we'll need to make a cherry-pick CL for it as described here; would you like to work on that?
@dmitshur Yeah, I'd like to, thanks.
Change https://golang.org/cl/185041 mentions this issue: [release-branch.go1.12] math/big: fix the bug in assembly implementation of shlVU on arm64