strconv.FormatInt is quite slow, we can simply looping divide 10 to get length.

Maybe we can use a table such [9, 99, 999, 9999, 99999, ...] to get the len by comparison.
It seems faster than devide.
@ngaut interesting... how did you get the time, by using some "go benchmark" thing? I haven't searched or tried it yet. Perhaps later I can try to see if I can also run some benchmark and output the time you did now. (Did you get the time while running a specific test or...?)
You can use go tool pprof to get the profile, and use the command "list functionName" to get the source list. I was running sysbench point get, the full command is:
sysbench/1.0.15/share/sysbench/oltp_point_select.lua --db-driver=mysql --mysql-host=IP --mysql-port=4000 --mysql-user=root --mysql-password="" --mysql-db=test --tables=1 --table-size=10000000 --report-interval=3 --threads=512 --time=600 run
@liufuyang
@ngaut 是不是主要是170行用时最多? 我是不是也可以先简单点用go benchmark去测一下170行的用时, 看看能否找到其他办法(比如你说的looping)而不用len(strcov.FormatInt()) 来节省一些时间?
You are right. please go ahead.
@ngaut Perhaps something like this?
https://github.com/pingcap/tidb/pull/9726
Hi, I think that binary search is much more faster than any implementations now, and more stable for integers in any range. For any Uint64, we only need exactly 4 or 5 compare operation to get the decimal length.
I write the code and benchmark it with version A (FormatUint now), version B(#9726) and version C(Binary search), and version C is much faster than A and B.
Here is my implementation, test and benchmark code, https://paste.ubuntu.com/p/wFDch63QRz/ , just put it in any where in GOPATH and use go bench you can run it easily.
Here is the test result on my mac:
goos: darwin
goarch: amd64
pkg: github.com/TennyZhuang/tmp/fastlen
BenchmarkFastLenUint/A-8 1 34276262694 ns/op 15995662720 B/op 500000003 allocs/op
BenchmarkFastLenUint/B-8 1 8764096826 ns/op 0 B/op 0 allocs/op
BenchmarkFastLenUint/C-8 1 3525812797 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/TennyZhuang/tmp/fastlen 59.293s
I will clean the code and make a PR in the later.
@TennyZhuang I am not 100% sure but I guess in real cases the input might be a distribution that is heavily skewed on small values. With this assumption the binary search might not be the fastest method? 🤔
@liufuyang I think that stable performance is more important.
And for skewed small values, /=10 directly is still a good solution, we can use some strategies to reduce the cost for large value, like fast pass by (x/=10000, cnt+=4), etc. Then we don't need 20 compare operations for large value.
Usually, the value is less than 64k, so we can just iterator the array instead of binary search. And most of value may less than 512. @TennyZhuang
For 512,we need 3 compare operation in iterator, and we need only 4 compare
operation in binary search. I don’t think it’s worth to lost the stable
performance.
We can do some optimization for small value and keep stable performance is
use skewed pivot in binary search, which result that:
for smaller value(Len <= 8), we need 4 comparation, for bigger value(Len
8), we need
goroutine notifications@github.com于2019年3月19日 周二10:25写道:
Usually, the value is less than 64k, so we can just iterator the array
instead of binary search. And most of value may less than 512.
@TennyZhuang https://github.com/TennyZhuang—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/pingcap/tidb/issues/9699#issuecomment-474173055, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AIvK3vC8-sQ8Bs5GzR7zbTvIws673bqSks5vYEqugaJpZM4bs3AT
.
Good idea. Could you file a PR?
@TennyZhuang I just made a small change (add two simple if condition) like this to reduce some comparison time. It seems working pretty well and the benchmark result got reduced as well. I think it would probably work well in real case as a number is between 0 to 9999999 the comparison time is around 3 to 5 times and no more than that. Also the code looks quite simple.
func LenOfUint64Fast(x uint64) int {
i := 1
// reduce comparison time for large numbers
if x > 999 {
i = 4
}
if x > 9999999 {
i = 8
}
for ; ; i++ {
if x <= uintSizeTable[i] {
return i
}
}
}
So I have tried this code
Result - for testing int values range from 0 to 64000:
BenchmarkFastLenUint/A-Original___________-8 1 22944293769 ns/op
BenchmarkFastLenUint/B-ForLoopTable_______-8 1 4294984461 ns/op
BenchmarkFastLenUint/B-ForLoopTable-faster-8 1 3068998904 ns/op
BenchmarkFastLenUint/C-BinarySearch_______-8 1 2984798412 ns/op
Result - for testing int values range from 0 to uint64_max:
BenchmarkFastLenUint/A-Original___________-8 1 36181359550 ns/op
BenchmarkFastLenUint/B-ForLoopTable_______-8 1 11351055496 ns/op
BenchmarkFastLenUint/B-ForLoopTable-faster-8 1 8545260401 ns/op
BenchmarkFastLenUint/C-BinarySearch_______-8 1 4496741431 ns/op
It seems to me that, as long as we know the most input values are small, then I think a simple solution might be good enough?
bits.LenXX is binary length, not the decimal length
Yiding Cui notifications@github.com于2019年3月19日 周二17:27写道:
For unsigned ones. Golang has implement bits.LenXX() which has a good
performance in my memory.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/pingcap/tidb/issues/9699#issuecomment-474262369, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AIvK3vT8RtHrtxwWTilJSBS4-zc4hSpMks5vYK1tgaJpZM4bs3AT
.
@liufuyang Why binary is faster in 0 to 64000, or even in 0-512 in my test...
goos: darwin
goarch: amd64
pkg: github.com/TennyZhuang/tmp/fastlen
BenchmarkFastLenUint/A-8 1 12867804162 ns/op 1287473488 B/op 402335460 allocs/op
BenchmarkFastLenUint/B-8 1 2651512545 ns/op 0 B/op 0 allocs/op
BenchmarkFastLenUint/C-8 1 2448256680 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/TennyZhuang/tmp/fastlen 30.197s
Test result for rand.Uint64() % 512
@liufuyang I don't think the complexity of code is important in this function, because it's never need to be modified again (If needed, just rewrite it).
And in the fact, binary search also can be written in 3-4 line codes and I really know how to implement it, but because Golang has no promise to do loop expansion optimization, and has no macro support, I expand it manually for extreme best performance.
@TennyZhuang Sounds cool 😃Do you want to open a PR or you want to use this PR?
If you open a PR, feel free to use/sherry-pick some of my commits here (or you can make a PR to merge into my branch as well, if it is helpful or simpler in anyway. I added a few more tests here, they might be useful.)
@liufuyang I can make some change on your PR if I can get the permission to commit on it~
Maybe at the weekend
@TennyZhuang Thanks. I think it might be cumbersome to give permissions to others to commit onto it 🤔 But I think you can use your fork, add my fork to your git remote, make a branch out of my branch, then to open PR onto my repo, then I can merge your change into my current PR. Let me know whether it works :)
@TennyZhuang I slightly changed the code like this And the result for range 0 to 64000 is faster than the current binary search code.
BenchmarkFastLenUint/A-Original___________-8 1 21884530878 ns/op
BenchmarkFastLenUint/B-ForLoopTable_______-8 1 3826016657 ns/op
BenchmarkFastLenUint/B-ForLoopTable-faster-8 1 2499548809 ns/op
BenchmarkFastLenUint/C-BinarySearch_______-8 1 2698554376 ns/op
But I guess if you further change your binary search to optimise for 0 to 64000 range you can get it faster. Then we have to see how others think about the code complexity vs performance gained :)
@TennyZhuang @ngaut I have tried to make the binary search also optimised for uint range from 0 to 99999. And now seems this is the fastest version I can get. testing code
Revison 3 output:
go test -bench BenchmarkFastLenUint
goos: darwin
goarch: amd64
pkg: github.com/liufuyang/tmp
BenchmarkFastLenUint_full_range/A-Original___________-8 1 37365669595 ns/op
BenchmarkFastLenUint_full_range/B-ForLoopTable_______-8 1 11693312380 ns/op
BenchmarkFastLenUint_full_range/B-ForLoopTable-faster-8 1 8057200995 ns/op
BenchmarkFastLenUint_full_range/C-BinarySearch_______-8 1 4395312270 ns/op
BenchmarkFastLenUint_0_to_64000/A-Original___________-8 1 20238129710 ns/op
BenchmarkFastLenUint_0_to_64000/B-ForLoopTable_______-8 1 4140235502 ns/op
BenchmarkFastLenUint_0_to_64000/B-ForLoopTable-faster-8 1 2459727521 ns/op
BenchmarkFastLenUint_0_to_64000/C-BinarySearch_______-8 1 2396862771 ns/op
BenchmarkFastLenUint_0_to_512/A-Original___________-8 1 16671585365 ns/op
BenchmarkFastLenUint_0_to_512/B-ForLoopTable_______-8 1 3296527923 ns/op
BenchmarkFastLenUint_0_to_512/B-ForLoopTable-faster-8 1 3285367634 ns/op
BenchmarkFastLenUint_0_to_512/C-BinarySearch_______-8 1 2499339741 ns/op
PASS
ok github.com/liufuyang/tmp 154.225s
I will make a PR onto the existing PR to discuss whether to use this version.