After working around issue #509, an arm64 defconfig still fails to link with ThinLTO with the following LLD error:
ld.lld: error: lto.tmp: SHF_MERGE section size must be a multiple of sh_entsize
While the error message is not very informative, the section it refers to is .init.rodata and the size is 56 bytes, while sh_entsize is 16 bytes.
Clearly there's something funky going on with ThinLTO, because according to the ELF specification, LLD is correct here. However, changing shouldMerge in lld/ELF/InputFiles.cpp to return false instead of immediately aborting when the section size is not a multiple of sh_entsize produces a kernel that boots.
cc @rui314 @GeorgiiR @smithp35 for input on the LLD side.
Apologies for the delay in replying.
The requirement in ELF is for the SHF_MERGE section to consist of an array of identically sized entries. LLD's position in giving an error when a SHF_MERGE section doesn't conform to that is conservative. If the compiler has taken advantage of the entry size, such as when the constants are loaded with vector instructions with alignment requirements, LLD could be protecting against a run-time error. If the compiler hasn't then LLD could just treat the SHF_MERGE section as a normal Data section (returning false) and there won't be a problem.
I don't know what happens in the case with the kernel, it will be safe to return false if the data in the section is never used. I don't know whether the boot process touches it or not.
In practice the uses of SHF_MERGE in clang and gcc tends to use these sections for 8, 16 and 32 byte aligned data that is often loaded by vector instructions, in these cases I think that a malformed SHF_MERGE section is likely going to cause strange runtime behaviour if the data is accessed.
A case could be made for only giving an error message if the sh_entsize was <= sh_align as this would imply that the compiler has taken advantage of the structure of the data, however given this is the first time I've seen this error occur, I think it is more indicative of a real problem than LLD being fussy.
Thanks for the response, that makes sense.
I managed to track down the code that generates this section. It's in drivers/soc/tegra/fuse/speedo-tegra210.c:
static const u32 __initconst cpu_process_speedos[][CPU_PROCESS_CORNERS] = {
{ 2119, UINT_MAX },
{ 2119, UINT_MAX },
};
static const u32 __initconst gpu_process_speedos[][GPU_PROCESS_CORNERS] = {
{ UINT_MAX, UINT_MAX },
{ UINT_MAX, UINT_MAX },
};
static const u32 __initconst soc_process_speedos[][SOC_PROCESS_CORNERS] = {
{ 1950, 2100, UINT_MAX },
{ 1950, 2100, UINT_MAX },
}
Where __initconst is simply sets the section to .init.rodata. The generated ELF file contains the following:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
...
[ 9] .init.rodata PROGBITS 0000000000000000 00046c 000038 10 AM 0 0 4
...
A case could be made for only giving an error message if the sh_entsize was <= sh_align as this would imply that the compiler has taken advantage of the structure of the data
Note that this appears not to be the case here. sh_entsize is 16 bytes, but sh_align is 4.
We can work around the problem by disabling CONFIG_ARCH_TEGRA_210_SOC for now.
Here's a stand-alone reproducer:
test.c:
#include <stdio.h>
#define SIZE_AB 2
#define SIZE_C 3
#define __initconst __attribute__((__section__(".init.rodata")))
static const unsigned int __initconst a[][SIZE_AB] = {
{ 0, 0 },
{ 0, 0 },
};
static const unsigned int __initconst b[][SIZE_AB] = {
{ 0, 0 },
{ 0, 0 },
};
static const unsigned int __initconst c[][SIZE_C] = {
{ 0, 0, 0 },
{ 0, 0, 0 },
};
int main() {
for (int i = 0; i < 2; i++) {
printf("%u\n", a[i][0]);
printf("%u\n", b[i][0]);
printf("%u\n", c[i][0]);
}
return 0;
}
Following the kernel build process:
$ clang -flto=thin -fvisibility=default -c -o test.o test.c
$ ld.lld -r -o all.o test.o
ld.lld: error: lto.tmp: SHF_MERGE section size must be a multiple of sh_entsize
The error message is now more informative:
ld.lld-11: error: drivers/built-in.a(soc/tegra/fuse/speedo-tegra210.o):(.init.rodata): SHF_MERGE section size (56) must be a multiple of sh_entsize (16)
This issue is currently the only one preventing arm64 defconfig to be built without binutils: travis log.
I just tested https://reviews.llvm.org/D72194 at Diff 252101 and it DID fix the issue for me.
@MaskRay please help prioritize review of https://reviews.llvm.org/D72194 with me.
Note: we also hit a ton of #716 warnings. Also, #939 which I'm not enthused with the patch in hand.
https://reviews.llvm.org/D72194 just landed upstream. @samitolvanen can you please test ToT LLVM and close this issue if we're all set? Then we can follow up on re-enabling 210 in Android.
Confirmed, I don't see the problem anymore with ToT LLVM. Thanks!
@qperret do we still have CONFIG_ARCH_TEGRA_210_SOC disabled in ACK? Once we upgrade the kernel toolchain to a point that contains https://reviews.llvm.org/D72194, then we should be able to remove it (but not today).
https://bugs.llvm.org/show_bug.cgi?id=43627 should probably be closed too.
https://bugs.llvm.org/show_bug.cgi?id=43627 should probably be closed too.
Done.