Tl;dr: The warnings being spit out when compiling neovim in release mode are not bogus. They are bugs in gcc's optimizer. This effectively re-opens issue #5061.
nvim --version
: NVIM v0.2.2
Build type: RelWithDebInfo
LuaJIT 2.0.5
Compilation: /usr/lib64/ccache/cc -Wconversion -DNVIM_MSGPACK_HAS_FLOAT32 -DNVIM_UNIBI_HAS_VAR_FROM -O2 -g -DMIN_LOG_
LEVEL=3 -Og -g -Wall -Wextra -pedantic -Wno-unused-parameter -Wstrict-prototypes -std=gnu99 -Wvla -fstack-protector-s
trong -fdiagnostics-color=auto -DINCLUDE_GENERATED_DECLARATIONS -D_GNU_SOURCE <removed private -I directory stuff>
~/neovim> uname -a
Linux <snip> 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
~/neovim> lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.4 (Maipo)
Release: 7.4
Codename: Maipo
$TERM
: n/agcc --version
: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
gdb --version
:GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
This GDB was configured as "x86_64-redhat-linux-gnu".
I decided to look into the compiler warnings generated while compiling neovim in release mode. When inspecting the code, I found it to be well-formed but not fully clear; the possible control paths were convoluted or not-obvious. Every control path led to a properly initialized value.
Digging deeper, I looked at the disassembly for a particular function: server_stop
from here. I worked through it instruction by instruction, and I found the bug. Looking at the disassembly:
Dump of assembler code for function server_stop:
0x0000000000530edc <+0>: push %r12
0x0000000000530ede <+2>: push %rbp
0x0000000000530edf <+3>: push %rbx
0x0000000000530ee0 <+4>: sub $0x120,%rsp
0x0000000000530ee7 <+11>: mov %rdi,%rsi
0x0000000000530eea <+14>: mov %fs:0x28,%rax
0x0000000000530ef3 <+23>: mov %rax,0x118(%rsp)
0x0000000000530efb <+31>: xor %eax,%eax
0x0000000000530efd <+33>: mov $0x100,%edx
0x0000000000530f02 <+38>: lea 0x10(%rsp),%rdi
0x0000000000530f07 <+43>: callq 0x51cf5f <xstrlcpy>
0x0000000000530f0c <+48>: mov $0x0,%ebx
0x0000000000530f11 <+53>: jmp 0x530f38 <server_stop+92>
0x0000000000530f13 <+55>: movslq %ebx,%rax
0x0000000000530f16 <+58>: shl $0x3,%rax
0x0000000000530f1a <+62>: add 0x43902f(%rip),%rax # 0x969f50 <watchers+16>
0x0000000000530f21 <+69>: mov (%rax),%r12
0x0000000000530f24 <+72>: mov %r12,%rsi
0x0000000000530f27 <+75>: lea 0x10(%rsp),%rdi
0x0000000000530f2c <+80>: callq 0x436d30 <strcmp@plt>
0x0000000000530f31 <+85>: test %eax,%eax
0x0000000000530f33 <+87>: je 0x530f42 <server_stop+102>
0x0000000000530f35 <+89>: add $0x1,%ebx
0x0000000000530f38 <+92>: mov 0x439002(%rip),%ebp # 0x969f40 <watchers>
0x0000000000530f3e <+98>: cmp %ebx,%ebp
0x0000000000530f40 <+100>: jg 0x530f13 <server_stop+55>
0x0000000000530f42 <+102>: cmp %ebp,%ebx
0x0000000000530f44 <+104>: jl 0x530f7e <server_stop+162>
0x0000000000530f46 <+106>: lea 0x10(%rsp),%rax
0x0000000000530f4b <+111>: mov %rax,(%rsp)
0x0000000000530f4f <+115>: mov $0x6d0c44,%r9d
0x0000000000530f55 <+121>: mov $0x1,%r8d
0x0000000000530f5b <+127>: mov $0xc5,%ecx
0x0000000000530f60 <+132>: mov $0x6d0c58,%edx
0x0000000000530f65 <+137>: mov $0x0,%esi
0x0000000000530f6a <+142>: mov $0x3,%edi
0x0000000000530f6f <+147>: mov $0x0,%eax
0x0000000000530f74 <+152>: callq 0x504d14 <do_log>
0x0000000000530f79 <+157>: jmpq 0x531003 <server_stop+295>
0x0000000000530f7e <+162>: mov $0x6d0c30,%edi
0x0000000000530f83 <+167>: callq 0x555573 <os_getenv>
0x0000000000530f88 <+172>: test %rax,%rax
0x0000000000530f8b <+175>: je 0x530fa8 <server_stop+204>
0x0000000000530f8d <+177>: mov %rax,%rsi
etc...
The important lines are here:
<+98>: cmp %ebx,%ebp
<+100>: jg 0x530f13 <server_stop+55>
<+102>: cmp %ebp,%ebx
<+104>: jl 0x530f7e <server_stop+162>
These come from the original code:
SocketWatcher *watcher;
// <snip>
int i = 0; // Index of the server whose address equals addr.
for (; i < watchers.ga_len; i++) {
watcher = ((SocketWatcher **)watchers.ga_data)[i];
// <snip>
}
if (i >= watchers.ga_len) {
ELOG("Not listening on %s", addr);
return;
}
Normally, the if statement _should_ catch the only control path where watcher
is not assigned: watchers.ga_len <= 0
. When compiled, the assembly lines 98
and 100
correspond to checking if i < watchers.ga_len
, and the lines 102
and 104
correspond to checking if i >= watchers.ga_len
. The assembly seems to compare ebp
(which is watchers.ga_len
) with ebx
(which is i
), and jump if greater; then do the same comparison and jump if less. This is where gcc makes a mistake: it flips the order of the cmp
instruction. This means that the REAL behavior is first check if i < watchers.ga_len
and then check if i < watchers.ga_len
. Which means the code inside the if statement is NEVER executed; no combination of i
and watchers.ga_len
will ever trigger ELOG()
.
So not only is this a use of an uninitialized value if watchers.ga_len == 0
(or technically, if it's less than zero too), it also clobbers any error detection if the for loop reaches the last entry (which would normally cause i == watchers.ga_len
too).
It took me a few hours to go through this process, so I haven't looked at any of the other Wmaybe-uninitialized errors. I suspect they may be similar bugs, though.
The first thing I suggest is to put more test coverage around the warning areas. Gcc is clearly doing some weird stuff.
Since this is a bug in GCC itself, we should probably submit a bug report. I have no idea where to even begin, because it may have already been reported. Searching google for stuff like "gcc wmaybe-uninitialized bug" just comes up with people complaining about false positives, not actual code generation bugs.
I would suggest checking the fall-through logic on for loops. While that if statement (if (i >= watchers.ga_len)
) technically handles the case where watchers.ga_len == 0
, I don't think it was intended to do so?
I would also suggest to use unsigned
variables where appropriate. It may help gcc figure out these not-obvious control flow issues, and helps readability. Though I understand that this is treated as a matter of style and not correctness in the open source community, for whatever reason, unless there's something I'm missing.
rm -r build
make clean
make CMAKE_BUILD_TYPE=RelWithDebInfo
gdb build/bin/nvim
disassemble server_stop
n/a
n/a
I would also suggest to use unsigned variables where appropriate. It may help gcc figure out these not-obvious control flow issues, and helps readability. Though I understand that this is treated as a matter of style and not correctness in the open source community, for whatever reason, unless there's something I'm missing.
This is for whatever reason defined in the style guide, but our style guide suggests using size_t
there and not int
. If garray_T was not used throughout the whole code base where new code normally uses kvec (don鈥檛 know why it was not used here, it is not like msgpack_rpc/server.c could possibly be inherited from Vim; maybe that code just was merged in before merging in kvec.h), it could be refactored to use size_t, but there are simply too many places where garray_T is already used inherited from Vim. And the very same style guide suggests that iterator variables like i
should have the same type as integer they are compared with.
Thanks @Phlosioneer . Do you plan to send a patch?
I've fixed the bug for server_stop
and submitted a PR. I'm going to investigate the other warnings and add their fixes to the PR. Should I create a tracking issue for them? They're likely to be as complicated and long-winded as this bug, so putting them in the comments here doesn't seem correct.
Most helpful comment
I've fixed the bug for
server_stop
and submitted a PR. I'm going to investigate the other warnings and add their fixes to the PR. Should I create a tracking issue for them? They're likely to be as complicated and long-winded as this bug, so putting them in the comments here doesn't seem correct.