https://github.com/JuliaLang/julia/pull/28670 and https://github.com/JuliaLang/julia/pull/28661 make a bit of an ugly hack to avoid a Char + UInt8 addition, which seems to be quite slow now, in order to significantly improve performance. Perhaps we can improve it?
https://github.com/JuliaLang/julia/pull/28661#issuecomment-413124171 seems to help, but its complete performance impact has not yet been tested.
using Printf
using BenchmarkTools

# Global digit output buffer.
const DIGITS = zeros(UInt8, 20)

# Fill DIGITS with the decimal digits of d.
# Val(0) uses Char arithmetic ('0' + digit); Val(1) uses plain integer arithmetic (48 + digit).
function decode_dec(d::Integer, ::Val{N}) where {N}
    neg, x = Base.Printf.handlenegative(d)
    pt = i = Base.ndigits0z(x)
    while i > 0
        DIGITS[i] = (N == 0 ? '0' : 48) + rem(x, 10)
        x = div(x, 10)
        i -= 1
    end
    return
end
julia> @btime decode_dec(12323321, Val(0))
65.534 ns (0 allocations: 0 bytes)
julia> @btime decode_dec(12323321, Val(1))
17.997 ns (0 allocations: 0 bytes)
In the lowered code we can see that the + between the integer and the Char did not inline:
invoke Main.:+(%16::Char, %17::UInt64)::Char
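For reference, the typed call above can be inspected with something like the following once the definitions are loaded (the exact output depends on the Julia version):
@code_typed decode_dec(12323321, Val(0))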
For Val(0), '0' is converted to an Int32 and the result is converted back to a Char by +, then it is converted to a UInt8 by the assignment. Even if inlining and constant propagation worked here, only the first Char to Int32 conversion could be done at compile time. For Val(1), only one conversion from Int to UInt8 is needed. So maybe this isn't the fairest comparison. (Note also that UInt8('0'+x) and UInt8(48+x) are not functionally equivalent, as the former verifies that the intermediate value is a valid codepoint.)
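Spelled out, the two branches do roughly the following (a sketch of the conversion chains described above, not the actual lowered code):
DIGITS[i] = '0' + rem(x, 10)   # Val(0): Char -> integer, add, integer -> Char (checked), then Char -> UInt8 on assignment
DIGITS[i] = 48 + rem(x, 10)    # Val(1): Int addition, then a single Int -> UInt8 conversion on assignment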
Sure, but the exact details here are not important. The point of this issue is that removing Char arithmetic in different string code in Base consistently gives significant performance improvements (https://github.com/JuliaLang/julia/pull/28787, https://github.com/JuliaLang/julia/pull/28670, https://github.com/JuliaLang/julia/pull/28661). That might be fine, and we just need to audit the Base code for where Char arithmetic is used unnecessarily and document that it is slowish.
How difficult do you think it would be to eventually optimize Char arithmetic for these sorts of cases? Char literals are often more self-documenting than the equivalent integers, and it's a bit sorrowful to see readability have to trade off against performance.
Ok, I misunderstood this as a request/desire to improve the performance of Char+Int.
If that is possible, so that we don't have to manually change '0' + c to 48 + UInt32(c) etc., that would be a nice fix to this issue.
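Concretely, the kind of by-hand rewrite meant here looks roughly like the following (illustrative only; digitchar and digitbyte are made-up names, not code from the linked PRs):
digitchar(d::Integer) = '0' + d          # readable, but relies on Char arithmetic
digitbyte(d::Integer) = UInt8(48 + d)    # the manual integer form one ends up writing for speed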
It should be possible to compute this more efficiently. Let me think about it. @bkamins is also pretty good at this kind of thing and might be interested 😁
Would this help at least in the most common case of ASCII?
@inline function Base.:+(x::Char, y::Integer)
    u = reinterpret(UInt32, x)
    if u < 0x80000000                     # leading byte < 0x80, i.e. a single-byte (ASCII) Char
        z = Int32(u >> 24) + Int32(y)     # for an ASCII Char the codepoint is just the top byte
        # result still ASCII: re-pack directly; otherwise fall back to the checked constructor
        return z < 0x80 ? reinterpret(Char, z << 24) : Char(z)
    end
    Char(Int32(x) + Int32(y))             # general (non-ASCII) case
end
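For what it's worth, one way to check whether this pays off is to re-run the decode_dec benchmark from the top of the issue after evaluating the method above (results will of course vary by machine and Julia version):
@btime decode_dec(12323321, Val(0))   # Char + Int path, should now hit the ASCII fast path
@btime decode_dec(12323321, Val(1))   # integer-only baseline for comparison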
Another option could be to keep precomputed arrays mapping:
- Char (reinterpreted to UInt32) to UInt32 (EDIT: now I realized that this mapping is more problematic than the other - I have to think if it is doable efficiently - but even if not, the other one is not problematic)
- UInt32 to Char
This would eat up ~16MB, so I am not fully sure how feasible this would be. But it would speed up conversion, and addition/subtraction should then be fast. Additionally, branching would only happen at bounds checking, so it could be disabled if one knew the operation was safe. For instance, in the code above you could use @inbounds, as it would be known to be safe.
EDIT: I have done some initial benchmarking and it is not faster (unless I am doing something incorrectly).
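For concreteness, the UInt32-to-Char direction of the table idea might look roughly like this (a sketch only; CP_TO_CHAR and char_from_codepoint are made-up names, and it assumes one entry per codepoint is acceptable):
const CP_TO_CHAR = Char[Char(cp) for cp in UInt32(0):UInt32(0x10ffff)]   # ~4 bytes per entry
# The lookup replaces the bit-shuffling in Char(::UInt32); bounds checking is then the only
# branch, so it can be elided with @inbounds when the codepoint is known to be in range.
@inline char_from_codepoint(cp::UInt32) = @inbounds CP_TO_CHAR[cp + 1]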
I know there is reluctance towards having different CharX types, but since ASCII is a rather important subset of UTF-8, how about adding a type parameter Char{T} where T would be true only if the character is known to be ASCII, in order to help dispatch to optimized ASCII operations? The detection could happen at parsing time and it would not change the underlying storage type.
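To illustrate the dispatch idea only (a rough sketch; Base.Char itself can't be re-parameterized from a package, so MyChar below is a made-up stand-in, and how the ASCII flag would be set at parse time is left out):
struct MyChar{ASCII} <: AbstractChar
    u::UInt32                            # same left-justified storage as Base.Char
end
# The type parameter lets codepoint extraction skip the general UTF-8 decode:
Base.codepoint(c::MyChar{true})  = c.u >> 24
Base.codepoint(c::MyChar{false}) = codepoint(reinterpret(Char, c.u))
# Arithmetic then needs no runtime "is this ASCII?" branch:
Base.:+(c::MyChar, y::Integer) = Char(codepoint(c) + y)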