TrinityCore does not compile on aarch64

Created on 2 May 2019 · 11 Comments · Source: TrinityCore/TrinityCore

Description:

TrinityCore does not compile on aarch64

Current behaviour:

TrinityCore does not compile on aarch64

Expected behaviour:

It should compile on aarch64

Steps to reproduce the problem:

Try to compile on aarch64

Branch(es):

3.3.5

TC rev. hash/commit:

24fbbee4b9af7b5226772378dd83b78c103d969d

Operating system:

Linux 4.19 aarch64 Debian Stretch

Notes:

/home/trinitycore/TrinityCore/dep/g3dlite/include/G3D/AtomicInt32.h:122:29: error: invalid output constraint '=a' in asm
                          : "=a" (nz)
                            ^
/home/trinitycore/TrinityCore/dep/g3dlite/include/G3D/AtomicInt32.h:150:29: error: invalid output constraint '=a' in asm
                          : "=a" (ret)
                            ^
/home/trinitycore/TrinityCore/dep/g3dlite/include/G3D/System.h:504:15: error: invalid output constraint '=a' in asm
            : "=a" (timelo),
              ^

Those parts are x86 assembly which obviously won't work on ARM.

Priority-FutureFeatureRequest


All 11 comments

trinitycore NEEDS SSE2.

No it does not, especially not in caps.
The only place using SSE2 is the 2006-2008 implementation of the Fast Mersenne Twister algorithm.
It is easy enough to replace, at some performance cost, which is usually the first step when porting an application to another CPU architecture.

However, none of this really matters, because one of the original authors has kept improving this implementation since then, including support for ARM, with and without NEON (which is the ARM counterpart to SSE2)!

So this is a non-issue and 100% qualifies as a feature.

We tried to drop the SSE2 requirement and performance dropped a lot, so yes, TrinityCore NEEDS SSE2, and we are not going to drop that requirement to make TC run on underpowered machines.

We are not willing to sacrifice x86/x64 performance for the sake of supporting architectures nobody on the team cares about.

@Aokromes I think you should stop judging feature requests, and systems so quickly. It is toxic to potential future contributors, and lacks facts.

Otherwise, could someone please remind me how we got from a feature request to "let's make x86 slow by removing a feature"?

I was hoping this issue could be the place to discuss how this can be done, maybe with people who are interested in making it happen. But instead it turned into a political debate on how x86 should rule the world :(

It's a FACT, we tested it.

Hey @Artox :) We currently don't have any active developer interested enough in ARM, so we can't provide ARM support, but this doesn't mean we won't merge PRs that add support (or maybe even get interested in the discussion). In a way, we can support ARM the same way we support Mac OS: nobody notices when we break the build on that platform, but when we do, there is a Mac OS guy who asks to push some fixes.

The G3D version we use (v9) is quite old. I even still have a branch with G3D10 at https://github.com/jackpoz/TrinityCore/tree/G3D10, but it still doesn't support ARM. I have been following G3D development through the years across all their "new" repos (sourceforge -> self-hosted -> CodePlex -> self-hosted -> sourceforge xD), but their code structure seems to have changed quite a lot. It might be easier to just patch the current G3D 9 instead of trying to upgrade it (I don't even know if they support ARM in the latest revision).

We tried replacing the https://github.com/TrinityCore/TrinityCore/tree/3.3.5/dep/SFMT library with https://en.cppreference.com/w/cpp/numeric/random/mersenne_twister_engine but the performance slowdown wasn't worth the change. SFMT is not really a library we update.
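For readers who have not used it, here is a minimal sketch of what a std::mersenne_twister_engine based helper could look like. The names threadEngine and randomInRange are hypothetical and are not TrinityCore's API; this only illustrates the standard-library approach mentioned above, which was measured to be slower than SFMT.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper, not TrinityCore code: one engine per thread,
// seeded once from std::random_device.
inline std::mt19937& threadEngine()
{
    thread_local std::mt19937 engine{std::random_device{}()};
    return engine;
}

// Returns a uniformly distributed integer in the closed range [min, max].
inline uint32_t randomInRange(uint32_t min, uint32_t max)
{
    std::uniform_int_distribution<uint32_t> dist(min, max);
    return dist(threadEngine());
}
```

std::mt19937 is plain portable C++, so it builds on any architecture, but it has no hand-written SIMD path like SFMT does.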

(btw funny thing, I checked if I can fire up an ARM VM on Azure but I only got search results about "Azure Resource Manager" xD I guess the world is running out of 3-letter abbreviations).

I remember you worked on porting AnothEr project to ARM, how did that go?

By the way just for reference, we have https://github.com/TrinityCore/TrinityCoreCustomChanges/wiki also where we host every kind of interesting fork that we prefer not to merge. There are some branches and also some links to other forks maintained by other people outside of TC org, let me know if you'd like to add a project to that list too :)

Hai @jackpoz - by now it is 2 other projects - porting worked just fine. I even got the first one to run on Windows on ARM!

G3D only requires minimal changes. These are:

  • cpuid
  • cdecl
  • atomic integer math

I have a patch ready in my fork - but largely untested anywhere but Linux.
My approach was to remove the x86 assembler, and fall back to compiler builtins.
For MSVC that means the Interlocked functions, and for GCC the __sync_ builtins, which LLVM also supports.
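To illustrate the idea (this is not the actual patch from the fork), an atomic 32-bit integer built only on compiler builtins could look roughly like this. The class name PortableAtomicInt32 and its method names are made up for the example and are not G3D's interface.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical stand-in for an atomic int32 without any x86 inline assembly.
class PortableAtomicInt32
{
public:
    explicit PortableAtomicInt32(int32_t v = 0) : m_value(v) {}

    // Atomically adds x and returns the value held before the addition.
    int32_t add(int32_t x)
    {
#if defined(_MSC_VER)
        return _InterlockedExchangeAdd(reinterpret_cast<volatile long*>(&m_value), x);
#else
        return __sync_fetch_and_add(&m_value, x);
#endif
    }

    // Atomically decrements and returns the new value.
    int32_t decrement()
    {
#if defined(_MSC_VER)
        return _InterlockedDecrement(reinterpret_cast<volatile long*>(&m_value));
#else
        return __sync_sub_and_fetch(&m_value, 1);
#endif
    }

    // If the current value equals comperand, stores exchange.
    // Returns the value held before the call in either case.
    int32_t compareAndSet(int32_t comperand, int32_t exchange)
    {
#if defined(_MSC_VER)
        return _InterlockedCompareExchange(reinterpret_cast<volatile long*>(&m_value), exchange, comperand);
#else
        return __sync_val_compare_and_swap(&m_value, comperand, exchange);
#endif
    }

    int32_t value() const { return m_value; }

private:
    volatile int32_t m_value;
};
```

The compiler picks the right instructions for the target (for example lock-prefixed instructions on x86, exclusive load/store or atomics on ARM), so the same source builds on both.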

SFMT however is a bigger project. I am looking at:
https://github.com/MersenneTwister-Lab/SFMT/

This one uses SIMD instructions on ARM, PPC and x86, and even has a fallback path to a plain C implementation.
Updating the version in TC however is a pain, I will see if I can come up with a prototype.

So, I got it to compile with version 1.5.1 of SFMT that I linked above, but not runtime tested yet.
I do not know if updating sfmt will give x86 users any benefits at all at this point.

Is there a test case and/or benchmark for the TrinityCore SFMTRand class, especially its BRandom(), since all other methods call BRandom() internally?
That way we could put numbers on this update for x86.
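In case no such benchmark exists, a rough sketch of one could be as simple as the following. benchmarkGenerator and BenchResult are hypothetical names, and the example is written against any callable generator because SFMTRand itself is not available in a standalone snippet.

```cpp
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical result type: the checksum exists only so the compiler
// cannot optimize the generator calls away.
struct BenchResult
{
    uint64_t checksum;
    double milliseconds;
};

// Calls gen() `count` times and measures the elapsed wall-clock time.
template <typename Generator>
BenchResult benchmarkGenerator(Generator& gen, uint64_t count)
{
    auto const start = std::chrono::steady_clock::now();
    uint64_t checksum = 0;
    for (uint64_t i = 0; i < count; ++i)
        checksum += gen();
    auto const end = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> const elapsed = end - start;
    return BenchResult{checksum, elapsed.count()};
}
```

With a std::mt19937 named engine, benchmarkGenerator(engine, 100000000) gives a baseline. To time SFMTRand, one could store a lambda that calls BRandom() on an instance in a variable and pass that instead, assuming BRandom() returns an integer.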

SFMT v1.5.1 does come with a small benchmark. Here are some results for future reference with Mersenne exponent 19937 (./test-*-M19937 -s):

# 1. Ryzen 7 1700X - C
32 bit BLOCK:109ms for 100000000 randoms generation
32 bit SEQUE:176ms for 100000000 randoms generation
64 bit BLOCK:110ms for 50000000 randoms generation
64 bit SEQUE:148ms for 50000000 randoms generation

# 2. Ryzen 7 1700X - SSE2
32 bit BLOCK:43ms for 100000000 randoms generation
32 bit SEQUE:102ms for 100000000 randoms generation
64 bit BLOCK:43ms for 50000000 randoms generation
64 bit SEQUE:73ms for 50000000 randoms generation

# 3. Marvell Armada 8040 - C
32 bit BLOCK:226ms for 100000000 randoms generation
32 bit SEQUE:452ms for 100000000 randoms generation
64 bit BLOCK:226ms for 50000000 randoms generation
64 bit SEQUE:352ms for 50000000 randoms generation

# 4. Marvell Armada 8040 - NEON
32 bit BLOCK:242ms for 100000000 randoms generation
32 bit SEQUE:416ms for 100000000 randoms generation
64 bit BLOCK:242ms for 50000000 randoms generation
64 bit SEQUE:316ms for 50000000 randoms generation

# 5. Marvell Armada 388 - C
32 bit BLOCK:703ms for 100000000 randoms generation
32 bit SEQUE:1110ms for 100000000 randoms generation
64 bit BLOCK:703ms for 50000000 randoms generation
64 bit SEQUE:859ms for 50000000 randoms generation

# 6. Marvell Armada 388 - NEON
32 bit BLOCK:377ms for 100000000 randoms generation
32 bit SEQUE:826ms for 100000000 randoms generation
64 bit BLOCK:377ms for 50000000 randoms generation
64 bit SEQUE:607ms for 50000000 randoms generation

So the differences aren't prohibitively large imo, especially taking the clocks and wattage into account.

Another small update:
Now I can also report that the server actually works on aarch64. Tested by entering world with a character :) - on master branch.

I will prepare a PR with those changes I am most confident about (g3d) - for further discussion.
