Zig: MSVC Compiler 19.22.xxxxx generates invalid code which leads to inability to build libuserland

Created on 7 Aug 2019  ยท  7Comments  ยท  Source: ziglang/zig

Related to #2740

With the upgrade from 19.21 (VS 2017) to 19.22, Microsoft seems to have broken something related to our FNV hash implementation. Building libuserland leads to this error:

invalid builtin function: 'hasDecl'

The workaround is to use VS 2017 or build tools older than 19.22.

os-windows stage1 upstream

Most helpful comment

Tested this again with

set(CMAKE_CXX_FLAGS_RELEASE_INIT
        "/MT /O2 /Ob2 /D NDEBUG")

and

set(CMAKE_CXX_FLAGS_RELEASE_INIT
        "/MT /O2 /Ob3 /D NDEBUG")

(Ob3 is a new flag in VS 2019) on compilers 19.24.28314 and 19.24.28316 without walking into the issue anymore. Maybe we can close this (I'll make a PR to reset the optimization flags to default) ?

All 7 comments

going to put this here for posterity:

STATUS | CL.EXE | VS | CMAKE | INLINE
:-: | :-: | :-: | :-: | :-:
โœ“ | 19.21.27702 | 2019 16.0 | Release | /Ob2
โŒ | 19.22.27905 | 2019 16.2 | Release | /Ob2
โœ“ ||| MinSizeRel | /Ob1
โœ“ ||| Release | /Ob1 (hacked)
โœ“ | 19.23.28105.4 | 2019 16.3.0-pre.4.0 | MinSizeRel | /Ob1
โŒ ||| Release | /Ob2
โœ“ || 2019 16.3.2 | MinSizeRel | /Ob1
โŒ ||| Release | /Ob2
โœ“ | 19.24.28117.0 | 2019 16.4.0-pre.1.0 | MinSizeRel | /Ob1
โŒ ||| Release | /Ob2
โœ“ | 19.24.28314.0 | 2019 16.4.2 | MinSizeRel | /Ob1
โŒ ||| Release | /Ob2
โœ“ || 2019 16.5.0-pre.1.0 | MinSizeRel | /Ob1
โŒ ||| Release | /Ob2
โœ“ | 19.24.28316.0 | 2019 16.4.4 | Release | /Ob2
โœ“ | 19.25.28508.3 | 2019 16.5.0-pre.2.0 | Release | /Ob2

Can it be related to the optimizations? In radare2 we met with memmove optimization bug: https://developercommunity.visualstudio.com/content/problem/583227/vs-2019-cl-1921277022-memmove-instrinsic-optimizat.html

Based on @XVilka's link, that could very well be our issue. Microsoft has acknowledged the bug and so eventually Azure will be updated with a patched MSVC. Until then I am testing the MinSizeRel workaround.

The CI is using MinSizeRel to work around this issue. Thanks @mikdusan.

I'm not actually able to reproduce it locally. Let's keep this issue open until the fix is widely available.

I investigated this with

$ cl.exe
Microsoft (R) C/C++ Optimizing Compiler Version 19.23.28105.4 for x64

at 17f2af10b56dd508ebd4a22834fd594cb5a49864.

The root cause is that building the global builtin function table corrupts memory while adding entries. They will then later not be available when trying to access them.

The bug only occurs with CMake's Release target, so in order to get debug symbols I had to hack a bit using https://gitlab.kitware.com/cmake/community/wikis/FAQ#make-override-files. In my case I used a RelWithDebInfo and overriding flags to be sure to have /O2 /Ob2.

set(CMAKE_USER_MAKE_RULES_OVERRIDE_CXX
   ${CMAKE_CURRENT_SOURCE_DIR}/cxx_flag_overrides.cmake)

```cmake
if(MSVC)
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO_INIT "/MD /Zi /O2 /Ob2 /D NDEBUG")
endif()

Building target `libuserland` (with the command `zig0.exe build --override-lib-dir <path>/<to>/zig/lib libuserland install -Doutput-dir=<path>/<to>/zig/build -Drelease=true -Dlib-files-only --prefix <path>/<to>/Install/zig`) will then fail.

So I launched it via the debugger. Now, while running `define_builtin_fns` from `codegen.cpp`, we can observe `g->builtin_fn_table._entries` being corrupted (it's a sparse array in memory) when executing `create_builtin_fn(..., "errorReturnTrace");`.

The disasm is

```asm
                  create_builtin_fn(g, BuiltinFnIdErrorReturnTrace, "errorReturnTrace", 0);
00007FF625EE5219  lea         edx,[r14+28h]  
00007FF625EE521D  lea         ecx,[rdx-27h]  
00007FF625EE5220  call        qword ptr [__imp_calloc (07FF62C995030h)]  
00007FF625EE5226  mov         rsi,rax  
00007FF625EE5229  test        rax,rax  
00007FF625EE522C  je          define_builtin_fns+2999h (07FF625EE5B59h)  
00007FF625EE5232  lea         rbx,[rax+8]  
00007FF625EE5236  mov         qword ptr [rbp+28h],rax  
00007FF625EE523A  mov         rcx,rbx  
00007FF625EE523D  lea         r8d,[r14+10h]  
00007FF625EE5241  lea         rdx,[string "errorReturnTrace" (07FF62A907A28h)]  
00007FF625EE5248  call        buf_init_from_mem (07FF625EDCE10h)  
00007FF625EE524D  lea         r8,[rbp+28h]  
00007FF625EE5251  mov         qword ptr [rbp+30h],rbx  
00007FF625EE5255  lea         rdx,[rbp+30h]  
00007FF625EE5259  mov         rcx,rdi  
00007FF625EE525C  mov         dword ptr [rax],65h  
00007FF625EE5262  mov         qword ptr [rax+20h],r14  
00007FF625EE5266  call        HashMap<Buf * __ptr64,BuiltinFnEntry * __ptr64,&buf_hash,&buf_eql_buf>::put (07FF625DA17F2h)  

This is basically create_builtin_fn inlined, see we're creating the builtin entry, its name and inserting it into the hashmap.
Near the end, at mov dword ptr [rax],65h the memory pointed to by builtin_fn->name.list.items (rax) is stomped and so the key for the hashmap is garbage.

If you look at the asm generated for the previous builtins, the part after the call to buf_init_from_mem is nearly identical, and then changes slightly starting from "errorReturnTrace".

I'm pretty sure this is a codegen bug, but maybe not the one linked above. An easy workaround is to surround one of these two functions with #pragma optimize("", off/on) and the bug will disappear. I guess splitting defined_builtin_fns in two might also help. Anyways we should probably file a bug report to Microsoft.

Tested this again with

set(CMAKE_CXX_FLAGS_RELEASE_INIT
        "/MT /O2 /Ob2 /D NDEBUG")

and

set(CMAKE_CXX_FLAGS_RELEASE_INIT
        "/MT /O2 /Ob3 /D NDEBUG")

(Ob3 is a new flag in VS 2019) on compilers 19.24.28314 and 19.24.28316 without walking into the issue anymore. Maybe we can close this (I'll make a PR to reset the optimization flags to default) ?

@Sahnvour,

compilers 19.24.28314 and 19.24.28316

Something odd here. At my last update I tested 19.24.28314 and the issue remained. This was building against the same LLVM binary bundle used by CI.

I speculated that maybe this issue can be resolved by rebuilding LLVM with new compiler. Unfortunately I ran into difficulties rebuilding LLVM on my system and left that unresolved.

Are you using your own LLVM dependency?

update: 19.24.28316.0 and 19.25.28508.3 work for me with Release and /Ob2 so assuming CI is running minimal version we should be fine going back to /Ob2.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dobkeratops picture dobkeratops  ยท  3Comments

andrewrk picture andrewrk  ยท  3Comments

S0urc3C0de picture S0urc3C0de  ยท  3Comments

andrewrk picture andrewrk  ยท  3Comments

andrewrk picture andrewrk  ยท  3Comments