The results of the parse_int benchmark microbenchmark reported on http://julialang.org/ are somewhat misleading because the C version used as a baseline is written in an inefficient way. I see at least two obvious problems with the C implementation:
1) strlen
is unnecessarily called in https://github.com/JuliaLang/julia/blob/master/test/perf/micro/perf.c#L30. Moreover it is called at every iteration! An idiomatic C code for that would be something along the lines of:
long parse_int(const char *s, long base) {
long n = 0;
for (; *s; ++s) { // note that strlen is not called
char c = *s;
long d = 0;
if (c >= '0' && c <= '9') d = c-'0';
else if (c >= 'A' && c <= 'Z') d = c-'A' + (int) 10;
else if (c >= 'a' && c <= 'z') d = c-'a' + (int) 10;
else exit(-1);
if (base <= d) exit(-1);
n = n*base + d;
}
return n;
}
Note that the code is not only more efficient but also smaller and doesn't contain useless calls to strlen
.
2) The benchmark reports not only string to integer conversion (parsing) time but also integer to string conversion time. But that's not really a problem if the tests for other languages are done in the same way. The problem is that conversion from integer to string in the C benchmark is done using sprintf
which is one of the most inefficient ways to it. In http://zverovich.net/2013/09/07/integer-to-string-conversion-in-cplusplus.html I made a comparison of several methods (I used C++ but the point is true for C as well excluding the methods that use std::string
). Even a naive implementation of ltoa (https://github.com/vitaut/format-benchmark/blob/master/ltoa.c) is ~70% faster than sprintf
. The reason is that sprintf
is not an integer to string conversion function, but a general formatting function. So comparing its performance to that of integer to string conversion functions in other languages such as hex
in Python or toString
in Javascript doesn't make much sense. For instance, the most direct counterpart to sprintf
in Python is str.format
and not hex
.
I like the improvement described in (1). As for (2), my understanding is that sprintf
is pretty much the only portable idiomatic way to convert an integer to a string in standard C without writing a custom routine to do it.
This certainly needs to be fixed for 0.2 - especially since the git blame finds me as the author of this!
I see a 16% improvement in parse_int
performance with this change. I'm gonna commit it.
Feel free to submit a pull request for a better ltoa implementation.
I see from your blog, @vitaut, that you've done a lot of work on fast formatting and integer printing in various languages. If you're interested, I think that our routines are respectable, but we always value performance and I'm sure they could use some further tuning since they were originally written.
Most helpful comment
This certainly needs to be fixed for 0.2 - especially since the git blame finds me as the author of this!