Cilium: "'stddef.h' file not found" in dev. VM

Created on 30 Apr 2020 · 27 Comments · Source: cilium/cilium

Issue

Trying to run the unit tests from inside the dev. VM results in the following error:

In file included from unit-test.c:8:
/usr/include/stdlib.h:31:10: fatal error: 'stddef.h' file not found
#include <stddef.h>

The error occurs because we recently switched the compiler for the BPF unit tests to Clang. The dev. VM images (4.9 and net-next) don't ship a full Clang, only the strict minimum required to compile BPF programs.

Workaround

The following steps, proposed by André for a similar issue on v1.7, work as a quick fix:

sudo mv /usr/bin/clang{,.bak}
sudo mv /usr/bin/llc{,.bak}
sudo apt-get install -y clang-7 llvm-7
sudo update-alternatives --install /usr/bin/clang clang /usr/lib/llvm-7/bin/clang 1000
sudo update-alternatives --install /usr/bin/llc llc /usr/lib/llvm-7/bin/llc 1000

To revert:

sudo mv /usr/bin/clang{.bak,}
sudo mv /usr/bin/llc{.bak,}

Proper Fix

The proper fix requires reverting part of cilium/packer-ci-build#200 so that the VM's Clang can compile for x86.

Reported-by: Nate Sweet nathanjsweet@pm.me

kind/bug

Source

pchaigno

All 27 comments

I get a segmentation fault in the BPF unit test on master with the above workaround on a fresh dev VM:

make[2]: Entering directory '/home/vagrant/go/src/github.com/cilium/cilium/test/bpf'
clang -Wall -Wextra -Werror -Wshadow -Wno-unused-parameter -Wno-address-of-packed-member -Wno-unknown-warning-option -Wno-gnu-variable-sized-type-not-at-end -Wdeclaration-after-statement -I../../bpf/ -I../../bpf/include -I. -D__NR_CPUS__=2 -O2 -target bpf -std=gnu89 -nostdinc -emit-llvm -c elf-demo.c -o - | llc -march=bpf -mcpu=probe -filetype=obj -o elf-demo.o
clang -Wall -Wextra -Werror -Wshadow -Wno-unused-parameter -Wno-address-of-packed-member -Wno-unknown-warning-option -Wno-gnu-variable-sized-type-not-at-end -Wdeclaration-after-statement -I../../bpf/ -I../../bpf/include -I. -D__NR_CPUS__=2 -O2 -I../../bpf/ unit-test.c -o unit-test
make[2]: Leaving directory '/home/vagrant/go/src/github.com/cilium/cilium/test/bpf'
test/bpf/unit-test
Makefile:216: recipe for target 'unit-tests' failed
make[1]: *** [unit-tests] Segmentation fault
make[1]: Leaving directory '/home/vagrant/go/src/github.com/cilium/cilium'
Makefile:201: recipe for target 'tests' failed
make: *** [tests] Error 2

jrajahalme on 27 May 2020

(gdb) run
Starting program: /home/vagrant/go/src/github.com/cilium/cilium/test/bpf/unit-test 

Program received signal SIGSEGV, Segmentation fault.
0x0000000000470698 in memcmp (x=<error reading variable: Cannot access memory at address 0x7fffff7fefc8>, y=<error reading variable: Cannot access memory at address 0x7fffff7fefc0>, 
    len=<error reading variable: Cannot access memory at address 0x7fffff7fefb8>) at ../../bpf/include/linux/../bpf/builtins.h:322
322 {

jrajahalme on 27 May 2020

Some sort of memcmp loop:

#0  0x0000000000470698 in memcmp (x=<error reading variable: Cannot access memory at address 0x7fffff7fefc8>, y=<error reading variable: Cannot access memory at address 0x7fffff7fefc0>,
    len=<error reading variable: Cannot access memory at address 0x7fffff7fefb8>) at ../../bpf/include/linux/../bpf/builtins.h:322
#1  0x00000000004706e5 in __bpf_memcmp_builtin (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:252
#2  __bpf_memcmp (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:301
#3  memcmp (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:323
#4  0x00000000004706e5 in __bpf_memcmp_builtin (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:252
#5  __bpf_memcmp (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:301
#6  memcmp (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:323
#7  0x00000000004706e5 in __bpf_memcmp_builtin (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:252
#8  __bpf_memcmp (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:301
#9  memcmp (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:323
#10 0x00000000004706e5 in __bpf_memcmp_builtin (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:252
#11 __bpf_memcmp (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:301
#12 memcmp (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:323
#13 0x00000000004706e5 in __bpf_memcmp_builtin (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:252
#14 __bpf_memcmp (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:301
#15 memcmp (x=0x7fffffffd268, y=0x7fffffffd260, len=1) at ../../bpf/include/linux/../bpf/builtins.h:323

jrajahalme on 27 May 2020

Line 252 is in the #else branch of this: #if __clang_major__ >= 10

jrajahalme on 27 May 2020

@borkmann Any way to get the bpf unit test to pass on the dev VM?

jrajahalme on 27 May 2020

> @borkmann Any way to get the bpf unit test to pass on the dev VM?

Hm, weird. I'm not using the dev VM; what is different in this environment from, say, Travis CI, where it seems to pass? Would upgrading to clang-10 fix it? I'm a bit puzzled why SIGSEGV is hit, though.

borkmann on 27 May 2020

Maybe it runs out of stack? The infinite loop/recursion between the three memcmp functions is conditional on __clang_major__ being less than 10, so I'd think updating clang would help.

jrajahalme on 27 May 2020

@borkmann ^^^

jrajahalme on 27 May 2020

> Maybe it runs out of stack? The infinite loop/recursion between the three memcmp functions is conditional on __clang_major__ being less than 10, so I'd think updating clang would help.

Ok, what happens if you change #if __clang_major__ >= 10 to #if 0 and run the builtin memcmp also with clang-10? Still crashing or not?

borkmann on 27 May 2020

@borkmann Can't run clang 10 or 11 on the dev VM, see description of this issue. I was applying the workaround provided by @aanm above to downgrade to clang-7.

jrajahalme on 27 May 2020

> @borkmann Can't run clang 10 or 11 on the dev VM, see description of this issue. I was applying the workaround provided by @aanm above to downgrade to clang-7.

So a precompiled clang-10 version like the one we pull in Travis [0] is not an option here? Alternatively, would it help if the packer-ci boxes [1] were also built with the x86 backend? Given that cilium-runtime is different from packer-ci, we could enable both backends there.

[0] https://github.com/cilium/cilium/blob/master/.travis/prepare.sh
[1] https://github.com/cilium/packer-ci-build/blob/master/provision/ubuntu/install.sh

borkmann on 27 May 2020

I don't see why using .travis/prepare.sh would not work. Can't test that right now though.

jrajahalme on 27 May 2020

Enabling the x86 backend in packer-ci would be better, though: there is less to download at dev VM start time if the Vagrant box is already available.

jrajahalme on 27 May 2020

@borkmann Maybe do both, so that [0] would bridge us over whenever [1] is out of date?

jrajahalme on 27 May 2020

Using the precompiled clang-10 version "fixes" the segfault. Not sure what is happening here. I'll try to send the PR to update the VM image today.

pchaigno on 27 May 2020

> Using the precompiled clang-10 version "fixes" the segfault. Not sure what is happening here. I'll try to send the PR to update the VM image today.

Even if you do the #if 0 ...

> Ok, what happens if you change #if __clang_major__ >= 10 to #if 0 and run the builtin memcmp also with clang-10? Still crashing or not?

... with clang-10?

borkmann on 27 May 2020

> Ok, what happens if you change #if __clang_major__ >= 10 to #if 0 and run the builtin memcmp also with clang-10? Still crashing or not?

Still crashing, even outside the dev. VM. Like Jarno said, there seems to be a memcmp loop, but I'm not sure why __bpf_memcmp_builtin() ends up calling memcmp()... Other builtins are fine.

pchaigno on 27 May 2020

> > Ok, what happens if you change #if __clang_major__ >= 10 to #if 0 and run the builtin memcmp also with clang-10? Still crashing or not?

> Still crashing, even outside the dev. VM. Like Jarno said, there seems to be a memcmp loop, but I'm not sure why __bpf_memcmp_builtin() ends up calling memcmp()... Other builtins are fine.

Afaik, for __builtin_memcmp() the compiler could still decide to make a call to glibc's memcmp(), for example. But I'm puzzled why it would segfault. Does the same happen when using gcc for x86 compilation?

borkmann on 27 May 2020

> Afaik, for __builtin_memcmp() the compiler could still decide to make a call to glibc's memcmp(), for example.

Here it's calling our memcmp() implementation, hence the loop.

pchaigno on 27 May 2020

> > Afaik, for __builtin_memcmp() the compiler could still decide to make a call to glibc's memcmp(), for example.

> Here it's calling our memcmp() implementation, hence the loop.

Got it, just reproduced locally, will look into it.

borkmann on 27 May 2020

> > > Afaik, for __builtin_memcmp() the compiler could still decide to make a call to glibc's memcmp(), for example.

> > Here it's calling our memcmp() implementation, hence the loop.

> Got it, just reproduced locally, will look into it.

So renaming our internal memcmp() to memcmp2() fixes the issue. It looks like Clang decides not to inline __builtin_memcmp() and instead emits a call to an available memcmp(); it picks the one we define, which is how we end up in the loop you mentioned. Interesting. :)

borkmann on 27 May 2020

Then I'm guessing the only reason it doesn't break on other builtins is because Clang does inline them. The rules for that seem to be a bit different between __builtin_memcmp() and others. I tried switching from void to char because of the following doc. statement but it didn't help:

> Constant evaluation support for the __builtin_mem* functions is provided only for arrays of char, signed char, unsigned char, or char8_t, despite these functions accepting an argument of type const void*.

pchaigno on 27 May 2020

> Then I'm guessing the only reason it doesn't break on other builtins is because Clang does inline them. The rules for that seem to be a bit different between __builtin_memcmp() and others. I tried switching from void to char because of the following doc. statement but it didn't help:

> > Constant evaluation support for the __builtin_mem* functions is provided only for arrays of char, signed char, unsigned char, or char8_t, despite these functions accepting an argument of type const void*.

Yeah, tried that as well earlier. I just switched to __builtin_bcmp() in the PR.

borkmann on 27 May 2020

https://github.com/cilium/packer-ci-build/pull/218 fixes the issue in our VM images but we haven't updated vagrant_box_defaults.rb to use the new images yet. Planning to do that today.

pchaigno on 24 Jun 2020

I am still able to reproduce the issue :disappointed:

pchaigno on 26 Jun 2020

I'm also hitting this issue in the dev VM. @pchaigno Are you planning to check in your fix?

aditighag on 11 Aug 2020

The fix for this is currently blocked by https://github.com/cilium/packer-ci-build/issues/230. I'll try to update the 4.9 and 4.19 VM images to unblock at least these.

pchaigno on 11 Aug 2020
