Compiled on stable Rust in release mode:
#[no_mangle]
pub fn foo(a: u128, b: u128) -> bool
{
    a == b
}

#[no_mangle]
pub fn foo2(a: u128, b: u128) -> bool
{
    ((a >> 64) as u64) == ((b >> 64) as u64) && (a as u64) == (b as u64)
}

#[no_mangle]
pub fn foo3(a: u128, b: u128) -> bool
{
    bar((a >> 64) as u64, a as u64, (b >> 64) as u64, b as u64)
}

#[no_mangle]
pub fn bar(a1: u64, a2: u64, b1: u64, b2: u64) -> bool
{
    a1 == b1 && a2 == b2
}
Produces:
foo:
movq xmm0, rcx
movq xmm1, rdx
punpcklqdq xmm1, xmm0
movq xmm0, rsi
movq xmm2, rdi
punpcklqdq xmm2, xmm0
pcmpeqb xmm2, xmm1
pmovmskb eax, xmm2
cmp eax, 65535
sete al
ret
foo2:
xor rsi, rcx
xor rdi, rdx
or rdi, rsi
sete al
ret
foo3:
xor rsi, rcx
xor rdi, rdx
or rdi, rsi
sete al
ret
bar:
xor rdi, rdx
xor rsi, rcx
or rsi, rdi
sete al
ret
It seems like all three functions should produce the same assembly.
I'm definitely not an expert on this, but my understanding is that each 128-bit integer is passed in two 64-bit registers. All three functions therefore do equivalent work, so with optimizations on I'd expect them to compile to the same instructions, unless the SIMD version is somehow faster. At a glance it looks like both more work and more instructions, though, so I suspect it isn't.
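To make the "two 64-bit registers each" reasoning concrete, here is a minimal sketch of the comparison written the way the scalar output for foo2, foo3 and bar computes it: xor the matching halves, or the two results together, and test for zero. The names `halves` and `eq_xor_or` are hypothetical helpers invented for this illustration and are not part of the original report. The sketch only shows that the xor/or form is logically equivalent to `a == b`; it makes no claim about what any particular compiler version emits for it.

```rust
/// Hypothetical helper: split a u128 into its (high, low) 64-bit halves,
/// matching how each argument occupies two 64-bit registers.
fn halves(x: u128) -> (u64, u64) {
    ((x >> 64) as u64, x as u64)
}

/// Hypothetical helper: 128-bit equality spelled out as the
/// xor / xor / or / test-for-zero sequence seen in the scalar assembly.
pub fn eq_xor_or(a: u128, b: u128) -> bool {
    let (a_hi, a_lo) = halves(a);
    let (b_hi, b_lo) = halves(b);
    // Each xor is zero only if the two halves match; the or is zero
    // only if both xors are zero, i.e. both halves match.
    ((a_hi ^ b_hi) | (a_lo ^ b_lo)) == 0
}
```

For any pair of inputs, `eq_xor_or(a, b)` should agree with `a == b`, including values that differ only in the high half or only in the low half.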
Kind of surprised nobody noticed this bad codegen before...
llc trunk: https://godbolt.org/z/9FqY3w
Seems to be a regression between LLVM 4.0.1 and LLVM 5.0.
LLVM-4.0.1:
foo: # @foo
# BB#0:
xor rsi, rcx
xor rdi, rdx
or rdi, rsi
sete al
ret
LLVM-5.0:
foo: # @foo
# BB#0:
movq xmm0, rcx
movq xmm1, rdx
punpcklqdq xmm1, xmm0 # xmm1 = xmm1[0],xmm0[0]
movq xmm0, rsi
movq xmm2, rdi
punpcklqdq xmm2, xmm0 # xmm2 = xmm2[0],xmm0[0]
pcmpeqb xmm2, xmm1
pmovmskb eax, xmm2
cmp eax, 65535
sete al
ret
# -- End function
I was going to say that perhaps the xmm version still has better timing characteristics, but it does not, even on architectures where movqs are free.
Reported upstream: https://bugs.llvm.org/show_bug.cgi?id=41971
Fixed on nightly: https://godbolt.org/z/1KHG1S
Fixed upstream: https://github.com/llvm/llvm-project/commit/15df05152d3d6dcaf92b3456a079357e2d876e79