Esp-idf: Bug in xtensa-esp32-elf-gcc generated machine code (L32R a6, 0x????) (IDFGH-913)

Created on 5 Apr 2019 · 6 Comments · Source: espressif/esp-idf

Environment

Module or chip used: ESP32-WROOM-32
IDF version (run git describe --tags to find it): v4.0-dev-225-g2f8b6cfc7
Build System: Make
Compiler version (run xtensa-esp32-elf-gcc --version to find it):
// crosstool-ng-1.22.0-80-g6c4433a5
Operating System: Linux

Problem Description

xtensa-esp32-elf-gcc emits wrong machine code in some situations. Specifically, the machine code generated for some l32r a6, 0x???? instructions appears to have the wrong byte order and to be missing the immediate field.

While studying ESP32 machine code, I found a strange pattern in the disassembly that I couldn't explain. After some searching and guesswork, I have a rough idea of what might have gone wrong in the compiler. See the section "Steps to reproduce" for details.

Expected Behavior

For every instruction of the form l32r a6, 0xabcd, the generated machine code should be 0xabcd61.

Actual Behavior

For some l32r a6, 0xabcd instructions, the machine code turns out to be 0x610000, which happens to encode a different instruction, xsr.lbeg a0. Interestingly, the linker silently ignores the problem during linking and relocation.

Steps to reproduce

I don't have time to write a program of my own that reproduces the problem exactly. However, the binary Wi-Fi library shipped with esp-idf should give you enough information about it.
I have attached a snippet of the disassembly of the function esp_wifi_get_channel before and after relocation (linking).

after_relocation.txt
before_relocation.txt

Search for 610000 or xsr.lbeg to locate the problem.

The disassembly of this function is particularly useful for showing the problem. The function is supposed to take 2 arguments (stored in registers a2 and a3). Register a6 is never written anywhere in the function, yet it is read several times. The only place where it could be assigned a value is where 610000 resides, which looks shockingly similar to 000061, the encoding of l32r a6, offset.

EDIT: Double-checked the machine code with a hex editor. The problem is indeed in the compiler rather than in objdump.
(screenshot: bug)

EDIT: Also note the garbage bytes cf, ff that follow the 610000 in before_relocation.txt, and the garbage instruction 400e8745: fd4c movi.n a13, 79 that follows xsr.lbeg a0 in after_relocation.txt. I have no idea what is happening in the compiler and linker here.

Source

zekunhao1995

Most helpful comment

Search for 610000 or xsr.lbeg to locate the problem.

I've found the following:

  c8:   f01d        retw.n
  ca:   610000          xsr.lbeg    a0
            cc: R_XTENSA_SLOT0_OP   .text.esp_wifi_get_channel+0x8
  cd:   cf              .byte 0xcf
  ce:   ff              .byte 0xff
  cf:   0a1c        movi.n  a10, 16
  d1:   0628        l32i.n  a2, a6, 0

and I can tell two things:
first: bytes at 0xca and 0xcb are two padding zeros.
second: relocation at address 0xcc applies to a whole instruction.

The code above is fine, the disassembly is buggy because the disassembler got out of sync with instruction stream.

A patch that fixes this objdump issue is available here: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=4b8e28c79356265b2c111e044142fb6d6d2db44e
It is a part of binutils since release 2.31.
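The "out of sync" effect can be reproduced with a small sketch. It walks the bytes from the listing above twice: once starting at 0xca (the padding) and once at 0xcc (the relocation address). The byte values are read off the listing; the length rule in `insn_length` is a simplifying assumption (low nibble of the first byte at or above 8 means a 2-byte narrow instruction, otherwise 3 bytes), not a full Xtensa decoder, and all function names are made up for illustration.

```python
# Bytes at offsets 0xc8..0xd2, read off the listing above. objdump prints each
# instruction as a little-endian value, so "f01d" is stored in memory as 1d f0.
CODE = bytes.fromhex("1df0" "0000" "61cfff" "1c0a" "2806")
BASE = 0xC8


def insn_length(first_byte: int) -> int:
    """Simplified length rule (assumption): op0 >= 8 means a 2-byte narrow op."""
    return 2 if (first_byte & 0x0F) >= 8 else 3


def sweep(start: int) -> list[tuple[int, int]]:
    """Linear sweep from `start`; returns (address, little-endian value) pairs."""
    result = []
    pos = start - BASE
    while pos < len(CODE):
        length = insn_length(CODE[pos])
        chunk = CODE[pos:pos + length]
        if len(chunk) < length:
            break  # ran off the end of the sample bytes
        result.append((BASE + pos, int.from_bytes(chunk, "little")))
        pos += length
    return result


if __name__ == "__main__":
    for start in (0xCA, 0xCC):
        print(f"sweep from {start:#x}:")
        for addr, value in sweep(start):
            print(f"  {addr:x}: {value:x}")
```

Starting at 0xca, the first 3-byte group swallows the 0x61 opcode byte together with the two zeros; starting at 0xcc, the same byte begins a 3-byte group that includes cf and ff. In this sample both sweeps agree again from 0xcf onward.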

jcmvbkbc on 5 Apr 2019


All 6 comments

xtensa-esp32-elf-gcc emits wrong machine code in some situations.

The compiler does not emit machine code; it only emits assembly code. The assembler turns that into machine code. You can invoke the compiler with -S instead of -c to get the assembly output and see what's there.

jcmvbkbc on 5 Apr 2019

Search for 610000 or xsr.lbeg to locate the problem.

I've found the following:
  c8: f01d        retw.n
  ca: 610000          xsr.lbeg    a0
          cc: R_XTENSA_SLOT0_OP   .text.esp_wifi_get_channel+0x8
  cd: cf              .byte 0xcf
  ce: ff              .byte 0xff
  cf: 0a1c        movi.n  a10, 16
  d1: 0628        l32i.n  a2, a6, 0
and I can tell two things:
first: bytes at 0xca and 0xcb are two padding zeros.
second: relocation at address 0xcc applies to a whole instruction.

The code above is fine, the disassembly is buggy because the disassembler got out of sync with instruction stream.

A patch that fixes this objdump issue is available here: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=4b8e28c79356265b2c111e044142fb6d6d2db44e
It is a part of binutils since release 2.31.

Thanks for the reply. However, note the order of the three bytes. According to the Xtensa ISA, shouldn't the machine code at ca be 000061 instead of 610000? In other words, shouldn't the three bytes in linear order be 61 00 00 in the attached hex editor screenshot?

According to the ISA, XSR.* instructions have the following form:
23 <- 0110 0001 sr t 0000 -> 0, where sr = 0000 0000 for lbeg, t = 0000 for a0.
Because of little-endianness, this should translate to 00 00 61 in the hex editor.

However, the L32R at, imm16 instruction looks like this:
23 <- imm16 t 0001 -> 0, where t = 0110 for a6
So the correct form should be 61 00 00 in the hex editor, which is definitely not the case currently...
Also, trailing zeros are not what GCC's assembler usually produces. In my experience, it usually fills the offset with some plausible number and changes it later during the relocation phase.
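The two bit layouts quoted above can be checked by packing the fields into a 24-bit word and writing it out least significant byte first. This is only a sketch of those two layouts as described in this comment; the helper names `encode_xsr` and `encode_l32r` are hypothetical.

```python
def encode_xsr(sr: int, t: int) -> bytes:
    """XSR layout from the comment: 0110 0001 | sr (8 bits) | t (4 bits) | 0000."""
    word = (0x61 << 16) | ((sr & 0xFF) << 8) | ((t & 0xF) << 4) | 0x0
    return word.to_bytes(3, "little")


def encode_l32r(t: int, imm16: int) -> bytes:
    """L32R layout from the comment: imm16 (16 bits) | t (4 bits) | 0001."""
    word = ((imm16 & 0xFFFF) << 8) | ((t & 0xF) << 4) | 0x1
    return word.to_bytes(3, "little")


if __name__ == "__main__":
    # sr = 0 (lbeg), t = 0 (a0)
    print("xsr.lbeg a0     ->", encode_xsr(0x00, 0).hex(" "))
    # t = 6 (a6), immediate still zero
    print("l32r a6, imm=0  ->", encode_l32r(6, 0x0000).hex(" "))
    # t = 6 (a6), immediate 0xabcd as in the "Expected Behavior" section
    print("l32r a6, 0xabcd ->", encode_l32r(6, 0xABCD).hex(" "))
```

In memory order, the opcode byte 0x61 comes last for xsr.lbeg a0 and first for any l32r a6, which is the distinction this comment relies on.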

EDIT: You can also verify this with other disassemblers such as:
https://onlinedisassembler.com/
Here are the screenshots:

Original buggy version: (screenshot: bug_oda1)

After correction: (screenshot: nobug)

EDIT: Also note that the code is still not correct even after relocation. after_relocation.txt shows the same piece of code after linking. It is taken from the disassembly of the full ROM image. See the end of my problem description for details. The instruction 400e8745: fd4c movi.n a13, 79 that follows the problematic xsr instruction is useless, since the a13 register is never used after it. That instruction is also not in the object file. I suspect it is part of the erroneous machine code that coincidentally resembles a movi instruction.

zekunhao1995 on 5 Apr 2019

Ahh, I realized what went wrong here. It is indeed an objdump problem. The code is correct.
Instead of being 0x610000, the instruction is actually 0xffcf61, which decodes to l32r a6 with a literal offset of -0xC4 bytes from the aligned PC, i.e. address 0x8, matching the relocation target. The two zero bytes are for alignment purpose since most of the branching instructions can only branch to 32bit aligned addresses.

/* There is a branch to 0xcc elsewhere */
  c8:   f01d        retw.n                 /* Return instruction; execution never continues past this point */
  ca:   610000          xsr.lbeg    a0     /* The 0x61 byte is at 0xcc, a 32-bit-aligned address thanks to the two zero bytes */
            cc: R_XTENSA_SLOT0_OP   .text.esp_wifi_get_channel+0x8
  cd:   cf              .byte 0xcf
  ce:   ff              .byte 0xff             /* 0xffcf61 is the actual instruction */
  cf:   0a1c        movi.n  a10, 16
  d1:   0628        l32i.n  a2, a6, 0
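To double-check the reading of 0xffcf61, here is a minimal decoding sketch. It assumes the L32R address rule as I understand it from the Xtensa ISA: the 16-bit immediate is extended with ones, shifted left by two, and added to the instruction address plus 3 with the low two bits cleared. The function name `decode_l32r` is made up for illustration.

```python
def decode_l32r(raw: bytes, pc: int) -> tuple[int, int, int]:
    """Decode a 3-byte L32R found at address `pc`.

    Returns (register number, byte offset, literal address).
    Assumed rule: target = ((pc + 3) & ~3) + (ones-extended imm16 << 2).
    """
    word = int.from_bytes(raw, "little")
    if word & 0xF != 0x1:
        raise ValueError("low nibble is not 0001, so this is not an L32R")
    reg = (word >> 4) & 0xF
    imm16 = word >> 8
    offset = (imm16 | ~0xFFFF) << 2  # ones-extension always gives a negative offset
    base = (pc + 3) & ~3
    return reg, offset, base + offset


if __name__ == "__main__":
    reg, offset, target = decode_l32r(bytes.fromhex("61cfff"), pc=0xCC)
    print(f"l32r a{reg}, offset {offset:#x}, literal at {target:#x}")
```

Under that assumption the three bytes 61 cf ff at 0xcc load a6 from offset 0x8 of the section, which agrees with the R_XTENSA_SLOT0_OP relocation against .text.esp_wifi_get_channel+0x8 in the listing.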

zekunhao1995 on 5 Apr 2019


The two zero bytes are for alignment purpose since most of the branching instructions can only branch to 32bit aligned addresses.

Only call{0,4,8,12} instructions require 4-byte-aligned target addresses on Xtensa; ordinary branches and jumps don't require any alignment of the target address. But a branch is sometimes a bit faster when its target is aligned, so the assembler aligns it. This behavior can be disabled with the --no-target-align option to the assembler.
There are additional alignment complexities with the loop instruction and the first instruction of the loop body.
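As a rough illustration of the arithmetic involved, the sketch below computes how much padding brings an offset up to a 4-byte boundary, and why a call target is always 4-byte aligned. The call formula is my reading of the ISA (aligned PC, plus the sign-extended 18-bit offset times four, plus 4), and both function names are hypothetical.

```python
def padding_to_align(offset: int, alignment: int = 4) -> int:
    """Number of pad bytes needed to round `offset` up to `alignment`."""
    return (-offset) % alignment


def call_target(pc: int, imm18: int) -> int:
    """Assumed CALLn rule: (pc & ~3) + (sign-extended imm18 << 2) + 4."""
    signed = imm18 - (1 << 18) if imm18 & (1 << 17) else imm18
    return (pc & ~3) + (signed << 2) + 4


if __name__ == "__main__":
    # retw.n ends at 0xca in the listing; the next 4-byte boundary is 0xcc.
    print("padding after 0xca:", padding_to_align(0xCA))
    # Whatever the offset field holds, the computed target is a multiple of 4.
    print("call target aligned:", call_target(0xCA, 0x3FFFF) % 4 == 0)
```

The two padding bytes at 0xca and 0xcb in the listing are consistent with this: they move the following instruction to 0xcc.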

jcmvbkbc on 5 Apr 2019

A preview release of the toolchain that includes the updated binutils version is now available; see https://www.esp32.com/viewtopic.php?f=10&t=7400&p=31257#p41667.