Julia: The results of the parse_int microbenchmark are misleading

Created on 28 Oct 2013  路  5Comments  路  Source: JuliaLang/julia

The results of the parse_int benchmark microbenchmark reported on http://julialang.org/ are somewhat misleading because the C version used as a baseline is written in an inefficient way. I see at least two obvious problems with the C implementation:

1) strlen is unnecessarily called in https://github.com/JuliaLang/julia/blob/master/test/perf/micro/perf.c#L30. Moreover it is called at every iteration! An idiomatic C code for that would be something along the lines of:

long parse_int(const char *s, long base) {
    long n = 0;
    for (; *s; ++s) { // note that strlen is not called
        char c = *s;
        long d = 0;
        if (c >= '0' && c <= '9') d = c-'0';
        else if (c >= 'A' && c <= 'Z') d = c-'A' + (int) 10;
        else if (c >= 'a' && c <= 'z') d = c-'a' + (int) 10;
        else exit(-1);

        if (base <= d) exit(-1);
        n = n*base + d;
    }
    return n;
}

Note that the code is not only more efficient but also smaller and doesn't contain useless calls to strlen.

2) The benchmark reports not only string to integer conversion (parsing) time but also integer to string conversion time. But that's not really a problem if the tests for other languages are done in the same way. The problem is that conversion from integer to string in the C benchmark is done using sprintf which is one of the most inefficient ways to it. In http://zverovich.net/2013/09/07/integer-to-string-conversion-in-cplusplus.html I made a comparison of several methods (I used C++ but the point is true for C as well excluding the methods that use std::string). Even a naive implementation of ltoa (https://github.com/vitaut/format-benchmark/blob/master/ltoa.c) is ~70% faster than sprintf. The reason is that sprintf is not an integer to string conversion function, but a general formatting function. So comparing its performance to that of integer to string conversion functions in other languages such as hex in Python or toString in Javascript doesn't make much sense. For instance, the most direct counterpart to sprintf in Python is str.format and not hex.

performance

Most helpful comment

This certainly needs to be fixed for 0.2 - especially since the git blame finds me as the author of this!

All 5 comments

I like the improvement described in (1). As for (2), my understanding is that sprintf is pretty much the only portable idiomatic way to convert an integer to a string in standard C without writing a custom routine to do it.

This certainly needs to be fixed for 0.2 - especially since the git blame finds me as the author of this!

I see a 16% improvement in parse_int performance with this change. I'm gonna commit it.

Feel free to submit a pull request for a better ltoa implementation.

I see from your blog, @vitaut, that you've done a lot of work on fast formatting and integer printing in various languages. If you're interested, I think that our routines are respectable, but we always value performance and I'm sure they could use some further tuning since they were originally written.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

omus picture omus  路  3Comments

dpsanders picture dpsanders  路  3Comments

manor picture manor  路  3Comments

iamed2 picture iamed2  路  3Comments

musm picture musm  路  3Comments