vAmiga: new Universal App doesn't start emulation on M1

Created on 25 Nov 2020 · 35 Comments · Source: dirkwhoffmann/vAmiga

v0.9.15 doesn't start the emulation on an M1 Mac mini. I can open the app and configure everything, but as soon as I start the emulation, it stays at a blank screen. If I start the app with "open with Rosetta" checked, it works, although the emulation is way too fast (discussed in another issue for VirtualC64).

Compatibility

All 35 comments

Xcode output just before it stops:

2020-11-25 09:50:48.739121+0100 vAmiga[1948:75168] setup
DiskMountDialog.120::awakeFromNib()
DialogController.68::awakeFromNib()
DiskMountDialog.239::windowDidResize(_:)
DiskMountDialog.239::windowDidResize(_:)
DiskMountDialog.199::insertDiskAction(_:): insertDiskAction df0
MyDocument.439::loadScreenshots(): Seeking screenshots for disk with id 13183632172158702109
MyDocument.448::loadScreenshots(): 0 screenshots loaded
Animation.197::zoomIn(steps:): Zooming in...

v0.9.15.2 emulation does not start on Apple Silicon.

There are some loads of misaligned addresses that probably should be fixed.

Quick fix:
(vAmiga) > Edit Scheme... > Diagnostics > Enable "Thread Sanitizer" and "Undefined Behavior Sanitizer"

Replacing read16 and read32 in Serialization.h with this quick and dirty hack gets rid of the misaligned address accesses.

inline u16 read16(u8 *& buffer)
{
    u8 b1 = *buffer;
    buffer += 1;
    u8 b2 = *buffer;
    buffer += 1;
    u16 result = (b1 << 8) | b2;
    return result;
}

inline u32 read32(u8 *& buffer)
{
    u8 b1 = *buffer;
    buffer += 1;
    u8 b2 = *buffer;
    buffer += 1;
    u8 b3 = *buffer;
    buffer += 1;
    u8 b4 = *buffer;
    buffer += 1;

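    // Cast before shifting: a u8 is promoted to signed int, so b1 << 24 can overflow
    u32 result = ((u32)b1 << 24) | ((u32)b2 << 16) | ((u32)b3 << 8) | (u32)b4;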
    return result;
}

But several data races make the emulator stall on Apple silicon. They also occur on x86 machines, but they do not cause any problems there.

data races make the emulator stall on Apple silicon. They also occur on x64 machines.

Thanks a lot for digging into that!

How did you detect the race conditions on x64 machines? Using the Sanitizer setting mentioned above?

Being able to debug these conditions on x64 Macs would be brilliant news. It would mean that I can rule out (hopefully all) Mac Silicon bugs without having such a machine in my possession.

Yes, use the settings above. And thanks for your brilliant work!

To tackle the memory alignment issue, let's do some benchmarking first. Here is my example code:

#include <arpa/inet.h>

unsigned char a[1024];

unsigned short read16(int i) {
    return a[i] << 8 | a[i+1]; 
}

unsigned long read32(int i) {
    return a[i] << 24 | a[i+1] << 16 | a[i+2] << 8 | a[i+3]; 
}

unsigned short read16_2(int i) {
    return htons(((unsigned short *)a)[i]);
}

unsigned long read32_2(int i) {
    return htonl(((unsigned long *)a)[i]);
}

Code produced by x86-64 gcc (-O3):

read16(int):
        movsx   rax, edi
        add     edi, 1
        movzx   edx, BYTE PTR a[rax]
        movsx   rdi, edi
        movzx   eax, BYTE PTR a[rdi]
        sal     edx, 8
        or      eax, edx
        ret
read32(int):
        movsx   rax, edi
        lea     edx, [rdi+3]
        movzx   eax, BYTE PTR a[rax]
        movsx   rdx, edx
        movzx   edx, BYTE PTR a[rdx]
        sal     eax, 24
        or      eax, edx
        lea     edx, [rdi+1]
        add     edi, 2
        movsx   rdx, edx
        movsx   rdi, edi
        movzx   edx, BYTE PTR a[rdx]
        sal     edx, 16
        or      eax, edx
        movzx   edx, BYTE PTR a[rdi]
        sal     edx, 8
        or      eax, edx
        cdqe
        ret
read16_2(int):
        movsx   rdi, edi
        movzx   eax, WORD PTR a[rdi+rdi]
        rol     ax, 8
        ret
read32_2(int):
        movsx   rdi, edi
        mov     rax, QWORD PTR a[0+rdi*8]
        bswap   eax
        mov     eax, eax
        ret
a:
        .zero   1024

Code produced by x86-64 clang (-O3):

read16(int):                             # @read16(int)
        movsxd  rax, edi
        movzx   eax, word ptr [rax + a]
        rol     ax, 8
        ret
read32(int):                             # @read32(int)
        movsxd  rax, edi
        movzx   ecx, byte ptr [rax + a]
        shl     ecx, 24
        movzx   edx, byte ptr [rax + a+1]
        shl     rdx, 16
        movsxd  rcx, ecx
        or      rcx, rdx
        movzx   edx, byte ptr [rax + a+2]
        shl     rdx, 8
        or      rdx, rcx
        movzx   eax, byte ptr [rax + a+3]
        or      rax, rdx
        ret
read16_2(int):                           # @read16_2(int)
        movsxd  rax, edi
        movzx   eax, word ptr [rax + rax + a]
        rol     ax, 8
        ret
read32_2(int):                           # @read32_2(int)
        movsxd  rax, edi
        mov     eax, dword ptr [8*rax + a]
        bswap   eax
        ret
a:
        .zero   1024

Bottom line:

  • gcc: The original code produces more compact (and most likely faster) code for both word and long word accesses
  • clang: The original code produces more compact (and most likely faster) code for long word accesses. Word accesses are pretty much the same.
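
One caveat when reading the listings above: the two pairs of functions are not quite equivalent. `((unsigned short *)a)[i]` indexes by element rather than by byte, and `unsigned long` is 8 bytes on 64-bit macOS and Linux, which is why the `read32_2` listings show a scale factor of 8 (and, for gcc, a QWORD load). A minimal sketch of a like-for-like comparison, using fixed-width types and byte addresses for both variants. The function names are made up for illustration, and `memcpy` stands in for the pointer cast so that the load is also valid at unaligned addresses:

```cpp
#include <arpa/inet.h>   // ntohs, ntohl
#include <cstdint>
#include <cstring>       // std::memcpy

// Byte-wise variants: assemble the big-endian value from single bytes.
inline uint16_t read16_bytes(const uint8_t *p)
{
    return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

inline uint32_t read32_bytes(const uint8_t *p)
{
    return (static_cast<uint32_t>(p[0]) << 24) |
           (static_cast<uint32_t>(p[1]) << 16) |
           (static_cast<uint32_t>(p[2]) <<  8) |
            static_cast<uint32_t>(p[3]);
}

// Load-and-swap variants: copy the raw bytes into a fixed-width integer,
// then convert from big-endian (network) order to host order.
inline uint16_t read16_load(const uint8_t *p)
{
    uint16_t v;
    std::memcpy(&v, p, sizeof v);
    return ntohs(v);
}

inline uint32_t read32_load(const uint8_t *p)
{
    uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return ntohl(v);
}
```

Both variants take the same byte address and return the same value, so any difference in the generated code comes from the access pattern alone.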

In v0.9.16.1, I've introduced new macros for big endian memory access:

//
// Accessing memory
//

// Reads a value in big-endian format
#define R8BE(a)  (*(u8 *)(a))
#define R16BE(a) HI_LO(*(u8 *)(a), *(u8 *)((a)+1))
#define R32BE(a) HI_HI_LO_LO(*(u8 *)(a), *(u8 *)((a)+1), *(u8 *)((a)+2), *(u8 *)((a)+3))

#define R8BE_ALIGNED(a)  (*(u8 *)(a))
#define R16BE_ALIGNED(a) (htons(*(u16 *)(a)))
#define R32BE_ALIGNED(a) (htonl(*(u32 *)(a)))

// Writes a value in big-endian format
#define W8BE(a,v)  { *(u8 *)(a) = (v); }
#define W16BE(a,v) { *(u8 *)(a) = HI_BYTE(v); *(u8 *)((a)+1) = LO_BYTE(v); }
#define W32BE(a,v) { W16BE(a,HI_WORD(v)); W16BE((a)+2,LO_WORD(v)); }

#define W8BE_ALIGNED(a,v)  { *(u8 *)(a) = (u8)(v); }
#define W16BE_ALIGNED(a,v) { *(u16 *)(a) = ntohs((u16)v); }
#define W32BE_ALIGNED(a,v) { *(u32 *)(a) = ntohl((u32)v); }

The faster _ALIGNED variants are used for accessing the Amiga memory. They will work on ARM as well, because all accesses will happen at aligned memory locations. The other variants are utilized by the Serializer since the values inside a snapshot are not aligned in general.
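
The _ALIGNED variants depend on every address being suitably aligned. A minimal sketch of how that assumption could be checked in debug builds; `isAlignedFor` is a hypothetical helper and not part of vAmiga:

```cpp
#include <cstdint>

// Returns true if p satisfies the alignment requirement of type T.
template <typename T>
inline bool isAlignedFor(const void *p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

A call such as `assert(isAlignedFor<uint16_t>(addr));` placed in front of a 16-bit aligned access would flag a violated assumption on an x64 machine as well, without needing ARM hardware.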

v0.9.16.1 still no joy on Apple silicon. a few days ago I had it running with roger's fixes though...

I do have a spare Macmini9,1. if you want me to, I can setup a VPN / Screen Sharing account for you. that's how roger was able to do some debugging on M1.

Hmm, I am uncertain whether the cause of the failure on M1 has been found .... It is not clear to me whether it ran OK with roger's fixes on M1 or not ....

If it did run OK with the above fixes, then zip the project and post it here ... dirk can then run a file compare and patch it into the version on GitHub ....

Unfortunately, I have already wiped the Mac mini on which roger did the debugging. I only kept the working .app, which runs slow like molasses if started without the "Thread Sanitizer" of Xcode.

Oh I see ... and also there seems to be no ARM emulator on Intel Macs to test the ARM build of vAmiga or vC64 directly ... Apple wants us to buy the real hardware 😬

Another way to test the ARM build of the emulator code would be to make a blank iOS project and copy only the emulator source code folder into it ... then instantiate the C64 or Amiga object, and deploy and run the emulator on an iPhone/iPad with an AXX processor ... which shares the same ABI as the M1 processor ... so in theory the same problems should arise in an iPad/iPhone build of the vAmiga/C64 emulator ... no?

I have a mac mini with M1. I also get the same issue that emulation doesn't seem to start.

I have an iPhone with an AppleSilicon A9 processor. And emulation seems to start on that machine... I see the DF0 drive head clicking ...

Question for the M1 owners ... do you at least see the hand-with-disk image?

next I will try to boot defender of the crown on the A9 processor ...

I only see a black screen when I try to boot with both 1.3 kick or the built-in AROS

the emulator itself taken from master branch without its GUI loads defender of the crown on an iPhone AppleSilicon A9 processor

(the first seconds of the video nothing happens ...because XCode loads the code package onto the AppleSilicon powered iPhone... wait a bit)

https://user-images.githubusercontent.com/17108995/103284306-2b6a8a00-49db-11eb-8083-add2c8ac4e6d.mp4

Good news. I got it to boot :)

There seems to be a problem with this code

void
Oscillator::waitUntil(u64 deadline)
{
#ifdef __MACH__

   // mach_wait_until(deadline);

#else

    assert(false);
    // TODO: MISSING IMPLEMENTATION

#endif
}

If I comment out the wait code above, things seem to start running (of course too fast), but the sleep value that comes in here is wayyy too big, so it looks like the code just sits sleeping in there.

deadline    u64 515659668836899

but the sleep value that comes in here is wayyy too big

Wow, that's great news! So it's due to the Mach conversion unit thing that was mentioned in another thread (where I replied the code would already take care of it 🙄).

Yeah. I think the code needs some extra checks :) I can run more tests if you find something.

@emoon good news that M1 is back in the game 🤩!!

The timing bug on Apple silicon and its background were mentioned in this thread, see the second post: https://github.com/dirkwhoffmann/virtualc64/issues/592

That perfectly explains now why the isolated emulator on A9 AppleSilicon needed the warp mode set to true !!!

At first I paid no attention to the fact that I had to set it to true ... to make the isolated emulator code work on the iPhone... 🙄

If I now comment out the setWarp(true) call like this

    //armAmiga->setWarp(true);
    DiskFile *df0file = DiskFile::makeWithFile(df0_path);
    if (df0file) {
        fprintf(stderr, "disk found, insert the disk %s", df0_path);
        Disk *disk = Disk::makeWithFile(df0file);
        armAmiga->df0.ejectDisk();
        armAmiga->df0.insertDisk(disk);
        armAmiga->df0.setWriteProtection(false);
    }
    armAmiga->run();

and run it again on the A9 chip then it does not load defender of the crown anymore ... the A9 gets stuck as on the m1 chip very early in boot process ... no drive actions

I guess setWarp(true) skips the defective wait code ...

@dirkwhoffmann should I send you the xcode project with the isolated emulator code for testing on iPhone ARM chip ?

should I send you the xcode project with the isolated emulator code for testing on iPhone ARM chip ?

Yes, that's the way to go

PM with zipped project sent ... 😎

Too bad for Apple... they wanted to sell us tons of M1 Macs and now we just use our iPhones for ARM development instead 😂

I think I got it:

This

Oscillator::waitUntil(u64 deadline)
{
#ifdef __MACH__
    mach_wait_until(deadline);
#else
...

has to be replaced by that:

Oscillator::waitUntil(u64 deadline)
{
#ifdef __MACH__
    mach_wait_until(nanos_to_abs(deadline));
#else
...

My nanos_to_abs function was alright, but it was not called in all the necessary places.
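
The thread does not show the body of nanos_to_abs, so the following is only a sketch of what such a conversion usually looks like on macOS, assuming the standard `mach_timebase_info` call. The names `Timebase`, `hostTimebase`, `nanosToAbs` and `absToNanos` are made up for illustration, and the 1/1 fallback outside macOS is a placeholder that merely keeps the sketch compilable:

```cpp
#include <cstdint>
#ifdef __MACH__
#include <mach/mach_time.h>
#endif

// Ratio that converts Mach absolute time units (ticks) to nanoseconds:
// nanos = ticks * numer / denom
struct Timebase {
    uint32_t numer;
    uint32_t denom;
};

// Queries the timebase of the host. Outside macOS there is no Mach clock,
// so the 1/1 value returned there is an assumption of this sketch.
inline Timebase hostTimebase()
{
#ifdef __MACH__
    mach_timebase_info_data_t info;
    mach_timebase_info(&info);
    return { info.numer, info.denom };
#else
    return { 1, 1 };
#endif
}

// Converts ticks to nanoseconds (the product may overflow for huge inputs).
inline uint64_t absToNanos(uint64_t ticks, Timebase tb)
{
    return ticks * tb.numer / tb.denom;
}

// Converts nanoseconds to ticks, the unit mach_wait_until expects.
inline uint64_t nanosToAbs(uint64_t nanos, Timebase tb)
{
    return nanos * tb.denom / tb.numer;
}
```

On Intel Macs the timebase is commonly reported as 1/1, so a nanosecond deadline passed unconverted happens to work. On Apple silicon it is commonly reported as 125/3. If that holds here, an unconverted nanosecond deadline lies roughly 41.7 times further along the tick counter than intended, which would fit the overly long sleep observed above.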

Yes, works fine now :)

yup, confirmed! works fine now ;-)

Great news and great teamwork!

I'll upload version v0.16.2 tomorrow.

I'll upload version v0.16.2 tomorrow.

0.9.16.2 ;-)

this thread can be closed, v0.9.16.2 runs on Apple silicon.
issue resolved, v0.9.16.2 works on Apple silicon.

thanks everybody, happy new year to all!

issue resolved, v0.9.16.2 works on Apple silicon.

Could you do one thing before I close the thread? Could you check if snapshot saving (and loading) works? 😬

yup: taking and reverting to a snapshot works for me.

yup: taking and reverting to a snapshot works for me.

Great 😎. I'll close it then. Please reopen if other issues arise.

What would also be interesting (only theoretically) is whether snapshots saved on M1 can be loaded on an Intel machine ... but in practice it's maybe not that important ...
