Go: all: port to Windows/ARM32

Created on 29 Jun 2018 · 62 Comments · Source: golang/go

Hi everyone, we will be submitting a patch in the near future that adds Windows/ARM32 support to GO. All but a few tests are passing, and this implementation has been used to compile GO itself and run Docker containers on Windows/ARM32. We look forward to working with the community to iron out the last remaining issues and get this merged!

NeedsInvestigation OS-Windows

jordanrh1

👍62 ❤26 🎉8

Most helpful comment

I'm curious why the port was developed out-of-tree. I'm asking because I'm afraid it's much more complicated to properly review such a big change, as a port to a new architecture is, in a single CL.

In any case, please check https://github.com/golang/go/blob/master/CONTRIBUTING.md and the linked from there https://golang.org/doc/contribute.html for more details.

(Personal wish: Please spell the name of the language as 'Go'.)

cznic on 30 Jun 2018

🎉9 👍5

All 62 comments

@jordanrh1 you should read https://github.com/golang/go/wiki/PortingPolicy

Alex

alexbrainman on 30 Jun 2018

@cznic We were under a tight deadline and did not know if it was feasible. This is the earliest I could engage the community. We can break down the change into smaller chunks if necessary.

@alexbrainman Thanks for the link. We are committed to meeting the requirements outlined in the doc.

jordanrh1 on 30 Jun 2018

👍10

We are committed to meeting the requirements outlined in the doc.

SGTM.

Alex

alexbrainman on 30 Jun 2018

👍3 👎2

Change https://golang.org/cl/125643 mentions this issue: dashboard: add windows-arm builder

gopherbot on 24 Jul 2018

Change https://golang.org/cl/125648 mentions this issue: cmd/link: support windows/arm

gopherbot on 25 Jul 2018

Change https://golang.org/cl/125649 mentions this issue: debug/pe: support windows/arm

gopherbot on 25 Jul 2018

Change https://golang.org/cl/125645 mentions this issue: runtime: add definitions for windows/arm

gopherbot on 25 Jul 2018

Change https://golang.org/cl/125647 mentions this issue: cmd/api: support windows/arm

gopherbot on 25 Jul 2018

Change https://golang.org/cl/125646 mentions this issue: cmd/dist: support windows/arm

gopherbot on 25 Jul 2018

Change https://golang.org/cl/128715 mentions this issue: runtime: support windows/arm

gopherbot on 9 Aug 2018

Change https://golang.org/cl/128716 mentions this issue: syscall: support windows/arm

gopherbot on 9 Aug 2018

Change https://golang.org/cl/130056 mentions this issue: cmd/dist: support windows/arm

gopherbot on 20 Aug 2018

Change https://golang.org/cl/131282 mentions this issue: cmd/vet: remove exclusions for callbackasm

gopherbot on 25 Aug 2018

Change https://golang.org/cl/127665 mentions this issue: windows: add support for windows/arm

gopherbot on 24 Sep 2018

Change https://golang.org/cl/137015 mentions this issue: cmd/vendor: update golang.org/x/sys/windows for windows/arm support

gopherbot on 24 Sep 2018

Sounds like the builder is still failing (#28854). @jordanrh1, what work remains to have this port stable for the release?

bcmills on 19 Nov 2018

These are the failing runtime tests:

TestPanicInlined
TestAbort
TestArenaCollision (out of memory)
TestSmhasherTwoNonzero (out of memory)
TestSmhasherSparse
TestSmhasherPermutation
TestStackWrapperStackPanic/sigpanic/CallersFrames
TestNetpollDeadlock (out of memory)

Some are due to out of memory, and the rest I believe are due to issues with traceback.

It looks like there are some new failures in html/template.

jordanrh1 on 19 Nov 2018

Some are due to out of memory,

I am pretty sure it is OK to skip tests that take too much memory on RPI.

and the rest I believe are due to issues with traceback.

Please, see if you can debug and fix it. Feel free to ask questions - everyone will help.

It looks like there are some new failures in html/template.

I can see from your comment https://github.com/golang/go/issues/28854#issuecomment-440520454 that it could be the same bug.

Alex

alexbrainman on 21 Nov 2018
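
The suggestion to skip memory-hungry tests on the RPI builder could look roughly like the sketch below. It is only an illustration: `lowMemoryPort`, `skipper`, and `skipIfLowMemory` are hypothetical names, not helpers from the Go tree, and the choice of windows/arm as the only low-memory port is an assumption taken from this thread. A `*testing.T` satisfies the `skipper` interface, so a real test could pass `t` directly.

```go
package main

import (
	"fmt"
	"runtime"
)

// lowMemoryPort reports whether the given port is assumed to run on a
// memory-constrained builder. The windows/arm choice is an assumption.
func lowMemoryPort(goos, goarch string) bool {
	return goos == "windows" && goarch == "arm"
}

// skipper is the subset of *testing.T that the helper needs.
type skipper interface {
	Skipf(format string, args ...interface{})
}

// skipIfLowMemory skips the calling test on low-memory ports.
func skipIfLowMemory(t skipper, test string) {
	if lowMemoryPort(runtime.GOOS, runtime.GOARCH) {
		t.Skipf("skipping %s on %s/%s: needs more memory than the builder has",
			test, runtime.GOOS, runtime.GOARCH)
	}
}

// printSkipper stands in for *testing.T so the sketch runs as a program.
type printSkipper struct{}

func (printSkipper) Skipf(format string, args ...interface{}) {
	fmt.Printf("SKIP: "+format+"\n", args...)
}

func main() {
	// Prints a SKIP line only when built for windows/arm.
	skipIfLowMemory(printSkipper{}, "TestArenaCollision")
}
```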

Change https://golang.org/cl/153518 mentions this issue: builders: skip writing snapshop for windows/arm builder

gopherbot on 11 Dec 2018

Change https://golang.org/cl/153719 mentions this issue: runtime/pprof: fix TestCPUProfileWithFork on Windows/ARM

gopherbot on 12 Dec 2018

Change https://golang.org/cl/153718 mentions this issue: runtime: fix profiling on windows/ARM

gopherbot on 12 Dec 2018

Change https://golang.org/cl/153839 mentions this issue: runtime: pass LR to sigprof on windows/arm

gopherbot on 12 Dec 2018

Change https://golang.org/cl/153844 mentions this issue: link/internal/ld: fix TestRuntimeTypeAttrInternal on windows/arm

gopherbot on 12 Dec 2018

Change https://golang.org/cl/153830 mentions this issue: cmd/link: skip TestRuntimeTypeAttrInternal on windows/arm

gopherbot on 13 Dec 2018

Change https://golang.org/cl/154357 mentions this issue: cmd/nm: report windows/arm as relocatable in TestGoExec

gopherbot on 15 Dec 2018

@hyangah Does the order of regions matter in TestAnalyzeAnnotations? This test is failing on windows/arm because the order of regions is different from what is expected.

On Linux:

        task 1: task0
                start: 33721 end: 62861 complete: true
                2 goroutines
                4 regions:
                        task0.region0(goid=25)
                        (goid=26)
                        task0.region1(goid=26)
                        task0.region2(goid=26)
                1 children:
                        task1

On Windows:

        task 1: task0
                start: 1006000 end: 673010992 complete: true
                2 goroutines
                4 regions:
                        task0.region0(goid=43)
                        (goid=44)
                        task0.region2(goid=44)
                        task0.region1(goid=44)
                1 children:
                        task1

On Linux, the order of regions is 0, (unnamed), 1, 2. On Windows, the order of regions is 0, (unnamed), 2, 1. Is ordering significant or should the test be changed to accept an arbitrary ordering of regions?

jordanrh1 on 17 Dec 2018

Change https://golang.org/cl/154560 mentions this issue: windows: use netevent.dll in TestFormatMessage for windows/arm

gopherbot on 17 Dec 2018

@jordanrh1 the regions are supposed to be sorted based on the timestamp so I think it needs investigation.

https://github.com/golang/go/blob/a1aafd8b28ada0d40e2cb25fb0762ae171eec558/src/cmd/trace/annotations.go#L349

Can you capture the trace with the command and share it?
go test -run=TestAnalyzeAnnotations runtime/trace -savetraces

hyangah on 18 Dec 2018

@hyangah, Here is the trace, thank you for looking :)
TestAnalyzeAnnotations.trace.zip

jordanrh1 on 18 Dec 2018

Change https://golang.org/cl/154777 mentions this issue: cmd/trace: force regions order in TestAnalyzeAnnotations

gopherbot on 18 Dec 2018

@jordanrh1, thank you for the trace.
According to the trace, the system didn't seem to provide high-enough time resolution required for this testing or for the execution tracer to be useful. All the traced events were timestamped either 0 or 1013008ns. So, sorting based on the timestamp doesn't guarantee a unique ordering this test assumes.

I sent the cl/154777 to slow the traced program (hoping that 1ms gap between events is sufficient).
Alternatively, we can just skip the test on windows.

hyangah on 18 Dec 2018

@hyangah Thanks for looking. The high resolution clock runs at 1 MHz on this machine, which I think would be fast enough to get meaningful timestamps. There could be a bug in the timestamp code. I will investigate.

jordanrh1 on 18 Dec 2018

@jordanrh1 I see. The traced code is just creating the events back to back, so requiring them to have distinct timestamps is unreasonable. They all can happen within 1usec and end up with the same timestamp. So, I think the assumption the test makes is wrong. I will just sort the region list based on the name.

hyangah on 18 Dec 2018

Change https://golang.org/cl/154817 mentions this issue: windows/svc: use wevtutil.exe instead of powershell for windows/arm

gopherbot on 18 Dec 2018

@hyangah I'm seeing a resolution of about 1ms in nanotime() on windows/arm. I think there is a problem with the implementation.

jordanrh1 on 18 Dec 2018

Change https://golang.org/cl/154758 mentions this issue: runtime: use QPC for nanotime and time.now on windows/arm

gopherbot on 18 Dec 2018

@hyangah FYI, the test is passing after fixing the time functions.

jordanrh1 on 18 Dec 2018

👍1

Change https://golang.org/cl/154761 mentions this issue: Revert "runtime: use QPC for nanotime and time.now on windows/arm"

gopherbot on 19 Dec 2018

Change https://golang.org/cl/154762 mentions this issue: runtime: use QPC to implement cputicks() on windows/arm

gopherbot on 19 Dec 2018

Getting closer:

The failing test is TestCallbackPanic(). I'm suspecting stack corruption.

I also want to modify cputicks() to call onosstack(_QueryPerformanceCounter, ...), and I want to modify onosstack() to not modify any global data. These routines are called deep in the scheduler and I am concerned they could be corrupting state.

jordanrh1 on 27 Dec 2018

👍1

Here is the issue with TestCallbackPanic().

TestCallbackPanic() tests what happens when Go code running in a callback panics. Go loads a native dll (user32.dll), calls a function that is expected to invoke a callback (EnumWindows), and then panics within the callback. It registers deferrals at several points, and verifies that the deferrals run and that the panic can be recovered from. In the Go function that calls EnumWindows, it registers a deferral to unload user32.dll when the function returns. That way, user32.dll will be unloaded no matter what, even if EnumWindows panics. This function looks like this:

// nestedCall calls into Windows, back into Go, and finally to f.
func nestedCall(t *testing.T, f func()) {
        c := syscall.NewCallback(callback)
        d := GetDLL(t, "user32.dll")
        defer d.Release()
        d.Proc("EnumWindows").Call(c, uintptr(*(*unsafe.Pointer)(unsafe.Pointer(&f))))
}

The problem arises in the interaction between EnumWindows and Go's panic unwinding logic. When Go encounters a panic in a callback, it unwinds the g0 stack to the point before the outgoing cgo call that invoked the callback was made. That means in this case it will reset the g0 stack pointer to what it was just before the call to EnumWindows. If you look at the stack pointer on entry of EnumWindows, and on entry of FreeLibrary (which is called by defer d.Release()), they are the same value. It is my understanding that this is by design. The logic that restores the stack pointer is in func unwindm:

                // Restore sp saved by cgocallback during
                // unwind of g's stack (see comment at top of file).
                mp := acquirem()
                sched := &mp.g0.sched
                switch GOARCH {
                default:
                        throw("unwindm not implemented")
                case "386", "amd64", "arm", "ppc64", "ppc64le", "mips64", "mips64le", "s390x", "mips", "mipsle":
                        sched.sp = *(*uintptr)(unsafe.Pointer(sched.sp + sys.MinFrameSize)) // <-- restore sp from saved value on the stack
                case "arm64":
                        sched.sp = *(*uintptr)(unsafe.Pointer(sched.sp + 16))
                }

It restores the g0 stack pointer from the value saved on the stack by cgocallback. cgocallback saves m->g0->sched.sp to the stack, which was previously written by cgocall when EnumWindows was called.

havem:
        // Now there's a valid m, and we're running on its m->g0.
        // Save current m->g0->sched.sp on stack and then set it to SP.
        // Save current sp in m->g0->sched.sp in preparation for
        // switch back to m->curg stack.
        // NOTE: unwindm knows that the saved g->sched.sp is at 4(R13) aka savedsp-8(SP).
        MOVW    m_g0(R8), R3
        MOVW    (g_sched+gobuf_sp)(R3), R4     // <--- load last saved value of g0's SP (when EnumWindows was invoked)
        MOVW    R4, savedsp-8(SP)              // <--- save this value to the stack, where unwindm knows where to find it
        MOVW    R13, (g_sched+gobuf_sp)(R3)

So, the fact that we see FreeLibrary execute with the same stack pointer as EnumWindows seems to be by design. EnumWindows was not allowed to complete since a panic occurred while it was calling back into Go code, and its stack was deallocated. This shouldn't be an issue, since it is generally an error to maintain references to stack local memory that outlive the stack frame.

This is where things get weird. EnumWindows isn't as straightforward as you might think. It allocates memory from the heap before invoking the callback, and frees this memory before it returns. It uses two mechanisms to ensure this memory gets cleaned up in case the callback experiences an error:

it tracks the allocations with a heap tracking mechanism. Bookkeeping for this allocation is stored on the stack
it has an SEH cleanup handler that frees the allocations and unlinks the bookkeeping information from the global tracking database

When the user32 module is unloaded, its garbage collector runs and scans for outstanding resources, which include outgoing calls that have not been completed, and their associated resources. Since the SEH handler never ran (because an explicit call to panic() does not trigger SEH handlers), the global tracking database still has a reference to the bookkeeping information that was stored on the stack. However, this stack space was deallocated, and new code is running over it, which corrupts the data. The bookkeeping code faults when it loads a bad pointer from the stack.

As you can see, this error occurs due to a very specific interaction between Go's deferral mechanism, Go's callback mechanism, and user32's resource tracking mechanism. In order for C code to be Go-panic-safe, it should not allocate any resources, and should not maintain any references to stack memory.

In terms of fixes, I'm not sure there's a way to ensure that C code which relies on SEH or C++ exceptions has a chance to clean up in case of a Go panic. It seems best to recover() in the callback, and not panic across callback boundaries. Another option is to find a replacement for EnumWindows that is safe to panic across and use that for the test instead. @alexbrainman @ianlancetaylor, what do you think?

jordanrh1 on 28 Dec 2018

Any idea why the test passes on x86? What's the key difference?

ianlancetaylor on 28 Dec 2018

I think it has to do with how subsequent code overwrites the stack. If the data structure on the stack doesn't get overwritten, the error will not manifest. amd64 and x86 have the same underlying issue, but they must not corrupt the bookkeeping data structure.

jordanrh1 on 29 Dec 2018

I'm not sure there's a way to ensure that C code which relies on SEH or C++ exceptions has a chance to clean up in case of a go panic.

I agree. This is not implemented, and it won't be simple to implement.

It seems best to recover() in the callback, and not panic across callback boundaries.

I agree in general, but I don't think it is acceptable for our runtime tests. I think Go users do write code that panics from a callback into the calling function. We should try and make it work, if possible.

Another option is to find a replacement for EnumWindows that is safe to panic across and use that for the test instead

That sounds good. I don't remember who came up with EnumWindows to test it all, but anything else should be fine too. As long as it works and is easily available on any Windows computer that is supported by Go.

Alex

alexbrainman on 29 Dec 2018

@ianlancetaylor One of the reasons it doesn't readily manifest on amd64/386 is that amd64/386 tests run on Windows Desktop, which has a different implementation of user32 than IoT Core, which is where the ARM tests run. IoT Core uses 'minuser', which is a completely rewritten window manager. I was able to reproduce the exact crash on AMD64 by running the test on IoT Core. In the deferral, I called a function that used a large chunk of g0 stack (in a legal way), to ensure that any stack based data structures still referenced would be overwritten with garbage data. Later, when the user32 garbage collector crashed, it crashed in exactly the same place as ARM, because the stack-based bookkeeping structure had been overwritten in the deferral.

On ARM, to further confirm that stack corruption was the issue, I inserted code in callbackasm1 to save the current g0 SP so that when unwindm restores the g0 SP, it will restore it to a value that is safely below the stack for EnumWindows. With this change, the crash went away because EnumWindows's stack was no longer being overwritten in the deferral. The test ran many times successfully.

Given these findings I have increased confidence that this is the issue.

jordanrh1 on 30 Dec 2018
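
Why the crash depends on how much stack the deferral uses can be illustrated with a toy model. This is only a simulation with a byte slice; it does not touch the real g0 stack, and `stack`, `push`, and `bookkeepingSurvives` are hypothetical names with arbitrarily chosen frame sizes.

```go
package main

import "fmt"

// stack is a toy downward-growing stack backed by a byte slice.
type stack struct {
	mem []byte
	sp  int
}

// push reserves size bytes below sp, fills them, and returns the new sp.
func (s *stack) push(size int, fill byte) int {
	s.sp -= size
	for i := s.sp; i < s.sp+size; i++ {
		s.mem[i] = fill
	}
	return s.sp
}

// bookkeepingSurvives models the sequence described above: a foreign call
// stores a record on the stack, a panic resets sp without cleanup, and a
// deferral then reuses the same stack region with a frame of deferFrame bytes.
func bookkeepingSurvives(deferFrame int) bool {
	s := &stack{mem: make([]byte, 128), sp: 128}
	saved := s.sp

	// Foreign call: a 32-byte frame whose lowest 8 bytes hold the record
	// that a global registry still points to.
	record := s.push(32, 0x00)
	for i := 0; i < 8; i++ {
		s.mem[record+i] = 0xAB
	}

	// Panic: sp is restored to the value saved before the call.
	s.sp = saved

	// Deferral: runs over the region the foreign frame used to occupy.
	s.push(deferFrame, 0xFF)

	for i := 0; i < 8; i++ {
		if s.mem[record+i] != 0xAB {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("small deferral frame, record intact:", bookkeepingSurvives(16))
	fmt.Println("large deferral frame, record intact:", bookkeepingSurvives(32))
}
```

In this model a 16-byte deferral frame leaves the record untouched, while a 32-byte frame overwrites it, which mirrors the observation that the bug only shows up when later code happens to reuse the relevant part of the stack.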

Change https://golang.org/cl/155923 mentions this issue: runtime: use EnumTimeFormatsEx instead of EnumWindows in callback tests

gopherbot on 30 Dec 2018

Exciting stuff:

jordanrh1 on 31 Dec 2018

🚀2 ❤2

@jordanrh1 Is there an issue for porting to Windows/ARM64 too? Thanks.

petemoore on 7 Jan 2019

@petemoore There is not.

jordanrh1 on 7 Jan 2019

It seems exes built with GOOS=windows GOARCH=arm with the current master won't run on Windows RT 8.1, failing with "This app can't run on your PC". This is with code signature checking subverted. I haven't looked into it yet so I'm unsure of details and I know this platform is out of support, but is this a configuration that theoretically should be working? By the way, fantastic work.

0xacf on 19 Feb 2019

You also need to specify GOARM=7. I'll look into whether there's a way to do this by default.

jordanrh1 on 19 Feb 2019

Unfortunately it's still failing with the same error. I made sure to clear the Go build cache and delete the output .exe, so I'm fairly sure it is getting rebuilt properly. I've put my test file / command below.

$ cat hello.go
package main

import "fmt"

func main() {
  fmt.Println("hello, world")
}

$ GOOS=windows GOARCH=arm GOARM=7 ./go/bin/go build hello.go

0xacf on 19 Feb 2019

If you have time, try starting it under the usermode debugger, and see if it offers any clues:

windbg.exe hello.exe

If it fails to start, then it's probably being rejected by the loader. To troubleshoot loader errors, connect a kernel debugger and turn on loader snaps:

!gflag +sls

Then try running the program. Loader spew will be printed to the windbg window.

jordanrh1 on 20 Feb 2019

I definitely have time to do some tests and try to fix this, but likewise, don't worry about this platform too much if nobody considers it important!

Unfortunately it appears security restrictions in Windows RT prevent the debugger from creating or attaching to processes. If you know of any other way to get log messages out of the linker / loader I'll happily try that, otherwise I'll reply later if I end up getting the debugger working somehow.

Edit: Running the program a different way, I can see it produces ERROR_BAD_EXE_FORMAT (%1 is not a valid Win32 application.)

0xacf on 20 Feb 2019
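
When the loader rejects a binary and no debugger is available, one thing that can still be checked on the build machine is the machine type recorded in the PE header. The sketch below uses the standard debug/pe package; `machineName` is a hypothetical helper, and whether the machine type is actually the cause of the error above is not established in this thread.

```go
package main

import (
	"debug/pe"
	"fmt"
	"os"
)

// machineName maps a PE machine field to a short label.
func machineName(m uint16) string {
	switch m {
	case pe.IMAGE_FILE_MACHINE_I386:
		return "386"
	case pe.IMAGE_FILE_MACHINE_AMD64:
		return "amd64"
	case pe.IMAGE_FILE_MACHINE_ARMNT:
		return "arm (ARMNT)"
	case pe.IMAGE_FILE_MACHINE_ARM64:
		return "arm64"
	default:
		return fmt.Sprintf("unknown (0x%x)", m)
	}
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: pemachine file.exe")
		return
	}
	f, err := pe.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "open:", err)
		return
	}
	defer f.Close()
	fmt.Println("machine:", machineName(f.FileHeader.Machine))
}
```

Running it against hello.exe on the Linux build host would show which machine type the linker wrote, without needing anything on the target device.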

@jordanrh1 I managed to run windows-arm on one of my RPIs. The only bit I am still missing is the git command. What do you do for git on windows-arm?

@bradfitz is there a git command that is written in pure Go? I would just compile it to windows-arm and use it there. I would not need anything fancy, just basic git functionality.

Thank you.

Alex

alexbrainman on 24 Mar 2019

@alexbrainman, the builders don't use git. They only run the x/build/cmd/buildlet binary and the cmd/coordinator pushes code in tarballs at them.

bradfitz on 24 Mar 2019

@alexbrainman, the builders don't use git.

Fair enough. I suppose I have to find a way to develop on this thing.

Alex

alexbrainman on 24 Mar 2019

I do all my development on linux and cross compile to windows/arm.

jordanrh1 on 25 Mar 2019

I do all my development on linux and cross compile to windows/arm.

I do too.

But I was trying to fix #30711. And that involved running cmd/go.TestScript, and cmd/go.TestScript runs the go.exe executable. So I needed all moving parts. I used $GOROOT/src/bootstrap.bash (luckily for me it worked without a hitch) to create a snapshot of the current Go tree with executables and built packages, and copied all these files onto the RPI. And then ran the test.

I did not have Git, so I had to be careful when changing files to fix #30711. The fix was small enough for me to manage without Git. But I would not use that setup for anything bigger than that.

RPI runs standard Windows file and printer sharing services, and I was able to mount shares from RPI on my Windows PC. Maybe (I did not try that) I could just use Git on my Windows PC to manage Go repo installed on RPI. I will see next time I use that.

Another thing. While debugging, I searched for a windows-arm prebuilt image on https://golang.org/dl and it is not there. @bradfitz maybe we should put the latest windows-arm image at https://golang.org/dl. I doubt there is a need for an .msi file, but we should put a .zip file alongside 386 and amd64.

Also @jordanrh1 I struggled to get Windows 10 running on some of my RPIs.

I have an RPI 3 B+. I downloaded "IoT Dashboard", and it gives me the option of installing "Windows 10 IoT Core (17763)". I created an SD card (SanDisk Ultra 16GB), but the RPI does not boot - the red power light comes on, and I get a rainbow image on my HDMI screen, but nothing happens.

I also have RPI 2 B. I tried booting this with the same image - red light is on, green light flashes 4 slow times followed by 4 quick times, same rainbow image on the screen.

I managed to run Windows on RPI 3 B. But I would like to make my other RPIs run Windows. Do you have any suggestions?

Also I noticed that the windows-arm builder is not listed on https://build.golang.org. Can you, please, investigate what happened to it?

I also wonder, if we could run windows-arm build on multiple RPI devices to make it faster?

Maybe we could even run it on virtual hardware (crazy idea). Like

https://www.newventuresoftware.com/blog/running-windows-10-iot-core-in-a-virtual-machine

https://stackoverflow.com/questions/30276610/how-to-deploy-windows-10-iot-rasp-pi-image-as-a-virtual-machine

Thank you.

Alex

alexbrainman on 26 Mar 2019

@jordanrh1, per http://golang.org/wiki/PortingPolicy#requirements-for-a-new-port, “[t]he builder must already be running” for new ports. I don't see any results for the windows-arm builder on https://build.golang.org since March 20.

Has this port been abandoned?

bcmills on 20 May 2019

Hi - there is already a port available at https://github.com/thongtech/go-windows-arm64 - does this help?