https://github.com/JuliaLang/julia/pull/28670 and https://github.com/JuliaLang/julia/pull/28661 make a bit of an ugly hack to avoid a Char + UInt8 addition, which seems to be quite slow now, in order to significantly improve performance. Perhaps we can improve it?
https://github.com/JuliaLang/julia/pull/28661#issuecomment-413124171 seems to help, but its complete performance impact has not yet been tested.
using Printf
using BenchmarkTools

# Global digit output buffer.
const DIGITS = zeros(UInt8, 20)

# Fill DIGITS with the decimal digits of d.
# Val(0) uses Char arithmetic ('0' + digit); Val(1) uses plain integer arithmetic (48 + digit).
function decode_dec(d::Integer, ::Val{N}) where {N}
    neg, x = Base.Printf.handlenegative(d)
    pt = i = Base.ndigits0z(x)
    while i > 0
        DIGITS[i] = (N == 0 ? '0' : 48) + rem(x, 10)
        x = div(x, 10)
        i -= 1
    end
    return
end
julia> @btime decode_dec(12323321, Val(0))
65.534 ns (0 allocations: 0 bytes)
julia> @btime decode_dec(12323321, Val(1))
17.997 ns (0 allocations: 0 bytes)
In the lowered code we can see that the + between the integer and the Char did not inline:
invoke Main.:+(%16::Char, %17::UInt64)::Char
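For reference, the typed call above can be inspected with something like the following once the definitions are loaded (the exact output depends on the Julia version):
@code_typed decode_dec(12323321, Val(0))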
For Val(0), '0' is converted to an Int32 and the result is converted back to a Char by +, then it is converted to a UInt8 by the assignment. Even if inlining and constant propagation worked here, only the first Char to Int32 conversion could be done at compile time. For Val(1), only one conversion from Int to UInt8 is needed. So maybe this isn't the fairest comparison. (Note also that UInt8('0'+x) and UInt8(48+x) are not functionally equivalent, as the former verifies that the intermediate value is a valid codepoint.)
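Spelled out, the two branches do roughly the following (a sketch of the conversion chains described above, not the actual lowered code):
DIGITS[i] = '0' + rem(x, 10)   # Val(0): Char -> integer, add, integer -> Char (checked), then Char -> UInt8 on assignment
DIGITS[i] = 48 + rem(x, 10)    # Val(1): Int addition, then a single Int -> UInt8 conversion on assignment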
Sure, but the exact details here are not important. The point of this issue is that removing Char arithmetic in different string code in Base consistently gives significant performance improvements (https://github.com/JuliaLang/julia/pull/28787, https://github.com/JuliaLang/julia/pull/28670, https://github.com/JuliaLang/julia/pull/28661). That might be fine, and we just need to audit the Base code for where Char arithmetic is used unnecessarily and document that it is slowish.
How difficult do you think it would be to eventually optimize Char arithmetic for these sorts of cases? Char literals are often more self-documenting than the equivalent integers, and it's a bit sorrowful to see readability have to trade off against performance.
Ok, I misunderstood this as a request/desire to improve the performance of Char+Int.
If that is possible, so that we don't have to manually change '0' + c to 48 + UInt32(c) etc., that would be a nice fix to this issue.
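Concretely, the kind of by-hand rewrite meant here looks roughly like the following (illustrative only; digitchar and digitbyte are made-up names, not code from the linked PRs):
digitchar(d::Integer) = '0' + d          # readable, but relies on Char arithmetic
digitbyte(d::Integer) = UInt8(48 + d)    # the manual integer form one ends up writing for speed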
It should be possible to compute this more efficiently. Let me think about it. @bkamins is also pretty good at this kind of thing and might be interested 😁
Would this help at least in the most common case of ASCII?
@inline function Base.:+(x::Char, y::Integer)
    u = reinterpret(UInt32, x)
    if u < 0x80000000                     # leading byte < 0x80, i.e. a single-byte (ASCII) Char
        z = Int32(u >> 24) + Int32(y)     # for an ASCII Char the codepoint is just the top byte
        # result still ASCII: re-pack directly; otherwise fall back to the checked constructor
        return z < 0x80 ? reinterpret(Char, z << 24) : Char(z)
    end
    Char(Int32(x) + Int32(y))             # general (non-ASCII) case
end
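For what it's worth, one way to check whether this pays off is to re-run the decode_dec benchmark from the top of the issue after evaluating the method above (results will of course vary by machine and Julia version):
@btime decode_dec(12323321, Val(0))   # Char + Int path, should now hit the ASCII fast path
@btime decode_dec(12323321, Val(1))   # integer-only baseline for comparison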
Another option could be to keep precomputed arrays mapping:
- Char (reinterpreted to UInt32) to UInt32 (EDIT: now I realized that this mapping is more problematic than the other - I have to think if it is doable efficiently - but even if not, the other one is not problematic)
- UInt32 to Char
This would eat up ~16MB, so I am not fully sure how feasible this would be. But it would speed up conversion, and addition/subtraction should then be fast. Additionally, branching would only happen at bounds checking, so it could be disabled if one knew the operation was safe. For instance, in the code above you could use @inbounds, as it would be known to be safe.
EDIT: I have done some initial benchmarking and it is not faster (unless I am doing something incorrectly).
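For concreteness, the UInt32-to-Char direction of the table idea might look roughly like this (a sketch only; CP_TO_CHAR and char_from_codepoint are made-up names, and it assumes one entry per codepoint is acceptable):
const CP_TO_CHAR = Char[Char(cp) for cp in UInt32(0):UInt32(0x10ffff)]   # ~4 bytes per entry
# The lookup replaces the bit-shuffling in Char(::UInt32); bounds checking is then the only
# branch, so it can be elided with @inbounds when the codepoint is known to be in range.
@inline char_from_codepoint(cp::UInt32) = @inbounds CP_TO_CHAR[cp + 1]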
I know there is reluctance towards having different CharX types, but since ASCII is a rather important subset of UTF-8, how about adding a type parameter Char{T} where T would be true only if the character is known to be ASCII, in order to help dispatch to optimized ASCII operations? The detection could happen at parsing time and it would not change the underlying storage type.
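To illustrate the dispatch idea only (a rough sketch; Base.Char itself can't be re-parameterized from a package, so MyChar below is a made-up stand-in, and how the ASCII flag would be set at parse time is left out):
struct MyChar{ASCII} <: AbstractChar
    u::UInt32                            # same left-justified storage as Base.Char
end
# The type parameter lets codepoint extraction skip the general UTF-8 decode:
Base.codepoint(c::MyChar{true})  = c.u >> 24
Base.codepoint(c::MyChar{false}) = codepoint(reinterpret(Char, c.u))
# Arithmetic then needs no runtime "is this ASCII?" branch:
Base.:+(c::MyChar, y::Integer) = Char(codepoint(c) + y)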