Linux: Clang's integrated assembler does not support the movzxw instruction mnemonic

Created on 28 Apr 2020  Â·  31Comments  Â·  Source: ClangBuiltLinux/linux

arch/x86/crypto/crc32c-pcl-intel-asm_64.S:173:2: error: invalid instruction mnemonic 'movzxw'
 movzxw (%rdi, %rax, 2), %rsi
 ^~~~~~
  AR      certs/built-in.a
  CC      arch/x86/realmode/rm/video-mode.o
make[2]: *** [scripts/Makefile.build:350: arch/x86/crypto/crc32c-pcl-intel-asm_64.o] Error 1
make[1]: *** [scripts/Makefile.build:505: arch/x86/crypto] Error 2
make[1]: *** Waiting for unfinished jobs....
[ARCH] x86_64 [FIXED][LINUX] 5.9 [TOOL] integrated-as

Most helpful comment

@arndb
You want to send a v2 with some updates?

Link: https://github.com/ClangBuiltLinux/linux/issues/1010
Reviewed-by: Nathan Chancellor <[email protected]>
Reported-by: Sedat Dilek <[email protected]>
Tested-by: Sedat Dilek <[email protected]>

Credits for (Suggested-by: ?) : Diab Neiroukh <[email protected]>

All 31 comments

based on

  1. https://bugs.freedesktop.org/show_bug.cgi?id=33386
  2. https://github.com/jtkukunas/zlib/commit/4026219d33914e221d4d9765e67310afdf54caec
    it looks like movzxw is a psuedo instruction that expands to different instructions based on targeting 32b vs 64b x86. cc @jcai19

The gcc support of movzxw seems to be a hack as it does not support the other suffixes of movzx,

$ cat bad.s
movzxw (%rdi, %rax, 2), %rsi

$ gcc -c good.s -o good.o
$ echo $?
0
$ objdump -d good.o

good.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:
0: 48 0f b7 34 47 movzwq (%rdi,%rax,2),%rsi

$ cat bad.s
movzxq (%rdi, %rax, 2), %rsi
$ gcc -c bad.s -o bad.o
bad.s: Assembler messages:
bad.s:1: Error: unsupported syntax for `movzx'

Maybe we should fix this in the kernel instead of the integrated assembler. Also, may I know which config I should use to reproduce? Thanks.

In bad.s, you have movzxq (no w), which is not the same as the disassembly of good.o's movzwq (yes w). Please triple check.

Yes, that was my point. GCC supports movzx with 'w' suffix, but not other suffix such as 'q'. Is that intentional?

@jcai19 maybe I'm wrong but doesn't movzxw expand to movzwl _or_ movzwq depending on the platform?

@jcai19 maybe I'm wrong but doesn't movzxw expand to movzwl or movzwq depending on the platform?

That's precisely how this psuedo behaves.

➜  /tmp gcc -m64 foo.S -o foo.o -c
➜  /tmp objdump -d foo.o          

foo.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <.text>:
   0:   48 0f b7 34 47          movzwq (%rdi,%rax,2),%rsi
➜  /tmp gcc -m32 foo.S -o foo.o -c
➜  /tmp objdump -d foo.o          

foo.o:     file format elf32-i386


Disassembly of section .text:

00000000 <.text>:
   0:   0f b7 34 47             movzwl (%edi,%eax,2),%esi
➜  /tmp cat foo.S 
#ifdef __x86_64__
movzxw (%rdi, %rax, 2), %rsi
#else
movzxw (%edi, %eax, 2), %esi
#endif

Thanks @lazerl0rd and @nickdesaulniers for the clarification and example. I guess my confusion came from the fact gnu as supported different suffixes of instructions like mov (e.g. movw and movl) and generated consistent opcodes, but for movzw it only supported its 'w' form but not the others, and treated it as a pseudo instruction and lowered it into different opcodes, depending on the target architecture and the type of operands -- on x64, gnu as assembles movzxw into movzxl if the registers used are 32-bit, or movzxq for 64-bit.

$ cat foo32.s
movw 0x8, %ax
movl $0x000F, %eax
movzxw (%edi, %eax, 2), %esi

$ gcc -m32 -c foo32.s -o foo32.o
$ objdump -d foo32.o

foo32.o: file format elf32-i386

Disassembly of section .text:

00000000 <.text>:
0: 66 a1 08 00 00 00 mov 0x8,%ax
6: b8 0f 00 00 00 mov $0xf,%eax
b: 0f b7 34 47 movzwl (%edi,%eax,2),%esi

$ cat foo64.s
movw 0x8, %ax
movl $0x000F, %eax
movzxw (%edi, %eax, 2), %esi
movzxw (%rdi, %rax, 2), %rsi

$ gcc -m64 -c foo64.s -o foo64.o
$ objdump -d foo64.o

foo64.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:
0: 66 8b 04 25 08 00 00 mov 0x8,%ax
7: 00
8: b8 0f 00 00 00 mov $0xf,%eax
d: 67 0f b7 34 47 movzwl (%edi,%eax,2),%esi
12: 48 0f b7 34 47 movzwq (%rdi,%rax,2),%rsi

IMO it is clearer to change this in the kernel to explicitly use an instruction instead of a pseudo one, as suggested in the last comment. Also, not sure if this is the right source but it seems movzxw is not explicitly supported by x86 assembly.

After multiple formatting tries, I seem to sit comfortably at https://github.com/ZestProjects/linux/commit/65d285ac51d89a02d8acd4901b763211de6706da.

Thanks for the update! FWIW, gnu as assembles movzxw into either movzxl or movzxq if the operands are 32-bit or 64-bit respectively, as in the example.

nice @lazerl0rd , please send it.

Arnd independently sent a patch: https://lore.kernel.org/lkml/[email protected]/

Arnd independently sent a patch: https://lore.kernel.org/lkml/[email protected]/

No worries, I've been having email issues and will be for a week or two so I'm kinda MIA.

Hmm, with clang-10 and @arndb patch I do not get this fixed...

diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -170,7 +170,7 @@ continue_block:

        ## branch into array
        lea     jump_table(%rip), %bufp
-       movzxw  (%bufp, %rax, 2), len
+       movzxq  (%bufp, %rax, 2), len
        lea     crc_array(%rip), %bufp
        lea     (%bufp, len, 1), %bufp
        JMP_NOSPEC bufp

...and I see now:

> clang-10 -Wp,-MD,arch/x86/crypto/.crc32c-pcl-intel-asm_64.o.d -nostdinc -isystem /home/dileks/src/llvm-toolchain/install/lib/clang/10.0.1/include -I./arch/x86/include -I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Qunused-arguments -D__ASSEMBLY__ -fno-PIE -Werror=unknown-warning-option -Wa,-gdwarf-4 -m64 -Wa,-gdwarf-4 -Wa,--compress-debug-sections=zlib -DCC_USING_FENTRY  -DMODULE  -c -o arch/x86/crypto/crc32c-pcl-intel-asm_64.o arch/x86/crypto/crc32c-pcl-intel-asm_64.S

arch/x86/crypto/crc32c-pcl-intel-asm_64.S:173:2: error: invalid instruction mnemonic 'movzxq'
 movzxq (%rdi, %rax, 2), %rsi
 ^~~~~~

@jcai19

Can you look at this please?
I am here on Linux v5.7.2 and llvm-toolchain v10.0.1-rc+.

I build with -no-integrated-as and looked at the generated object-file.

0000000000000414 <continue_block>:
     414:       48 81 fe c8 00 00 00    cmp    rsi,0xc8
     41b:       0f 82 4d 0f 00 00       jb     136e <small>
     421:       48 c7 c0 ab 0a 00 00    mov    rax,0xaab
     428:       f7 e6                   mul    esi
     42a:       48 c1 e8 10             shr    rax,0x10
     42e:       48 8d 0c c1             lea    rcx,[rcx+rax*8]
     432:       48 8d 14 c1             lea    rdx,[rcx+rax*8]
     436:       4c 8d 1c c2             lea    r11,[rdx+rax*8]
     43a:       4d 31 c9                xor    r9,r9
     43d:       4d 31 d2                xor    r10,r10
     440:       48 8d 3d 00 00 00 00    lea    rdi,[rip+0x0]        # 447 <continue_block+0x33>
     447:       48 0f b7 34 47          movzx  rsi,WORD PTR [rdi+rax*2]
     44c:       48 8d 3d 29 00 00 00    lea    rdi,[rip+0x29]        # 47c <crc_128>
     453:       48 8d 3c 37             lea    rdi,[rdi+rsi*1]
     457:       ff e7                   jmp    rdi
     459:       90                      nop
     45a:       90                      nop
     45b:       90                      nop

000000000000045c <full_block>:
...

Hmm, when using make OBJDUMP=llvm-objdump:

$ llvm-objdump -d ./arch/x86/crypto/crc32c-intel.o | grep -A50 continue_block
0000000000000414 continue_block:
     414: 48 81 fe c8 00 00 00          cmpq    $200, %rsi
     41b: 0f 82 4d 0f 00 00             jb      3917 <small>
     421: 48 c7 c0 ab 0a 00 00          movq    $2731, %rax
     428: f7 e6                         mull    %esi
     42a: 48 c1 e8 10                   shrq    $16, %rax
     42e: 48 8d 0c c1                   leaq    (%rcx,%rax,8), %rcx
     432: 48 8d 14 c1                   leaq    (%rcx,%rax,8), %rdx
     436: 4c 8d 1c c2                   leaq    (%rdx,%rax,8), %r11
     43a: 4d 31 c9                      xorq    %r9, %r9
     43d: 4d 31 d2                      xorq    %r10, %r10
     440: 48 8d 3d 00 00 00 00          leaq    (%rip), %rdi
     447: 48 0f b7 34 47                movzwq  (%rdi,%rax,2), %rsi
     44c: 48 8d 3d 29 00 00 00          leaq    41(%rip), %rdi
     453: 48 8d 3c 37                   leaq    (%rdi,%rsi), %rdi
     457: ff e7                         jmpq    *%rdi
     459: 90                            nop
     45a: 90                            nop
     45b: 90                            nop

000000000000045c full_block:

OK, with make LLVM_IAS=1 OBJDUMP=llvm-objdump I am able to compile with @arndb patch.

$ ll arch/x86/crypto/crc32c-pcl-intel-asm_64.o
-rw-r--r-- 1 dileks dileks 18K Jun 12 16:24 arch/x86/crypto/crc32c-pcl-intel-asm_64.o

Objdump:

$ llvm-objdump -d arch/x86/crypto/crc32c-pcl-intel-asm_64.o
0000000000000044 continue_block:
      44: 48 81 fe c8 00 00 00          cmpq    $200, %rsi
      4b: 0f 82 4d 0f 00 00             jb      3917 <small>
      51: 48 c7 c0 ab 0a 00 00          movq    $2731, %rax
      58: f7 e6                         mull    %esi
      5a: 48 c1 e8 10                   shrq    $16, %rax
      5e: 48 8d 0c c1                   leaq    (%rcx,%rax,8), %rcx
      62: 48 8d 14 c1                   leaq    (%rcx,%rax,8), %rdx
      66: 4c 8d 1c c2                   leaq    (%rdx,%rax,8), %r11
      6a: 4d 31 c9                      xorq    %r9, %r9
      6d: 4d 31 d2                      xorq    %r10, %r10
      70: 48 8d 3d 00 00 00 00          leaq    (%rip), %rdi
      77: 48 0f b7 34 47                movzwq  (%rdi,%rax,2), %rsi
      7c: 48 8d 3d 29 00 00 00          leaq    41(%rip), %rdi
      83: 48 8d 3c 37                   leaq    (%rdi,%rsi), %rdi
      87: ff e7                         jmpq    *%rdi
      89: 90                            nop
      8a: 90                            nop
      8b: 90                            nop

000000000000008c full_block:

Hmm, I have respinned my LLVM_IAS=1 patches and see this again:

  clang-10 -Wp,-MD,arch/x86/crypto/.crc32c-pcl-intel-asm_64.o.d -nostdinc -isystem /home/dileks/src/llvm-toolchain/install/lib/clang/10.0.1/include -I./arch/x86/include -I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Qunused-arguments -no-integrated-as -D__ASSEMBLY__ -fno-PIE -Werror=unknown-warning-option -Wa,-gdwarf-4 -m64 -Wa,-gdwarf-4 -Wa,--compress-debug-sections=zlib -DCC_USING_FENTRY  -DMODULE  -c -o arch/x86/crypto/crc32c-pcl-intel-asm_64.o arch/x86/crypto/crc32c-pcl-intel-asm_64.S
arch/x86/crypto/crc32c-pcl-intel-asm_64.S: Assembler messages:
arch/x86/crypto/crc32c-pcl-intel-asm_64.S:173: Error: unsupported syntax for `movzx'
make[5]: *** [scripts/Makefile.build:349: arch/x86/crypto/crc32c-pcl-intel-asm_64.o] Error 1

UPDATE: Aaargh forget to throw out -no-integrated-as.

NO, still seeing the Error.

More coffee. Everything OK.

@dileks go chase it upstream ;) it was sent to our mailing list if, you're subscribed.

@arndb
You want to send a v2 with some updates?

Link: https://github.com/ClangBuiltLinux/linux/issues/1010
Reviewed-by: Nathan Chancellor <[email protected]>
Reported-by: Sedat Dilek <[email protected]>
Tested-by: Sedat Dilek <[email protected]>

Credits for (Suggested-by: ?) : Diab Neiroukh <[email protected]>

@lazerl0rd

Sorry if I have ignored your patch and of course this should have your S-o-b.

@lazerl0rd

Sorry if I have ignored your patch and of course this should have your S-o-b.

No worries, it's just great to see this bug resolved :).

@dileks thanks for noticing, commenting, and tagging. Don't forget to close out the bug once a FIXED tag is applied!

No I did not forget. I thought when I close you will not notice. But on "closed" you get a message via GitHub, so next time I close myself.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

tpimh picture tpimh  Â·  3Comments

nathanchance picture nathanchance  Â·  3Comments

nathanchance picture nathanchance  Â·  3Comments

nathanchance picture nathanchance  Â·  3Comments

tpimh picture tpimh  Â·  4Comments