Go: cmd/asm: ARM64 instructions FLDPQ and FSTPQ not supported

Created on 12 Aug 2020 · 40 Comments · Source: golang/go

What version of Go are you using (`go version`)?

$ go version
go version go1.15 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build668753232=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Assemble the following ARM64 assembly file:

TEXT foo(SB),0,$0-0
        FLDPQ (R0), (F0, F1)
        FSTPQ (F0, F1), (R0)

What did you expect to see?

An object file with instructions that disassemble to the GNU-style instructions

   0:   ad400400    ldp q0, q1, [x0]
   4:   ad000400    stp q0, q1, [x0]
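
For readers who want to check the two expected words, here is a minimal Go sketch (not part of the original report) that builds the A64 "LDP/STP (SIMD&FP), signed offset" encoding for Q registers from its bit fields. The function name `encodePairQ` is hypothetical; the field layout is taken from the ARM reference and reproduces the two words listed above.

```go
package main

import (
	"errors"
	"fmt"
)

// encodePairQ is a hypothetical helper: it returns the A64 encoding of
// LDP (load=true) or STP (load=false) for a pair of Q registers with a
// signed, 16-byte-scaled offset from base register rn.
func encodePairQ(load bool, rt, rt2, rn uint32, offset int32) (uint32, error) {
	if rt > 31 || rt2 > 31 || rn > 31 {
		return 0, errors.New("register number out of range")
	}
	// imm7 is a signed 7-bit field scaled by 16 for the Q variant.
	if offset%16 != 0 || offset < -1024 || offset > 1008 {
		return 0, errors.New("offset must be a multiple of 16 in [-1024, 1008]")
	}
	imm7 := uint32(offset/16) & 0x7f
	// opc=10 (Q), 101, V=1, 010 (signed offset), L in bit 22.
	enc := uint32(0xad000000) | imm7<<15 | rt2<<10 | rn<<5 | rt
	if load {
		enc |= 1 << 22
	}
	return enc, nil
}

func main() {
	ld, _ := encodePairQ(true, 0, 1, 0, 0)
	st, _ := encodePairQ(false, 0, 1, 0, 0)
	fmt.Printf("%08x    ldp q0, q1, [x0]\n", ld) // ad400400
	fmt.Printf("%08x    stp q0, q1, [x0]\n", st) // ad000400
}
```

Words computed this way could in principle be emitted with a WORD directive as a stopgap, though that does not help for instructions that need a literal pool.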

What did you see instead?

$ go tool asm test.s
test.s:2: unrecognized instruction "FLDPQ"
test.s:3: unrecognized instruction "FSTPQ"
asm: assembly of test.s failed

It appears that only FLDPS and FLDPD are supported. Going through the instruction lists, no FP instructions with a 128-bit operand size seem to be supported at all. It would be great to have them added as I need them for a project.

FeatureRequest NeedsInvestigation arch-arm64

clausecker

👍1

All 40 comments

I found a bunch more issues with the assembler. Is it okay if I make a large issue just gathering all of them?

clausecker on 12 Aug 2020

👍1

Some more valid (I hope) instructions that don't encode:

VLD1R.P 1(R1), [V9.B8] // displacement 8 is incorrectly accepted
FMOVD $0x8040201008040201, F10 // won't generate a literal pool load, won't encode
VCMTST V8.B8, V9.B8, V9.B8 // unknown instruction
VSUB V10.B8, V9.B8, V10.B8 // invalid rrr, VADD works

There's probably a bunch more. Right now, almost every single SIMD instruction I want to use is broken in the assembler. This is not a sustainable state of affairs.

clausecker on 12 Aug 2020

👍3

@fuzxxl Thank you for the observation.
Yes, the current Go assembler on ARM64 is not complete. We have added assembler support for instructions as they were needed. Can you tell us which instructions you want to use now? We are happy to enable them, or you can try to do it if you are interested. 😊

zhangfannie on 13 Aug 2020

@zhangfannie Best would be to have support for all SIMD and FP instructions, but fixing those I mentioned might be a good start. I have not written the code yet, so I can't tell for sure what exactly it is going to involve. If you like, I can try to implement the algorithm using the GNU assembler and then tell you what instructions I used. How does that sound?

clausecker on 13 Aug 2020

👍1

Best would be to have support for all SIMD and FP instructions, but fixing those I mentioned might be a good start.

Please don't take this as a criticism, but as an observer of a number of requests of this class, the best results are obtained when the OP, you in this case, can enumerate exactly which instructions to add. I have no explanation why requests for _all XXX instructions_ are unsuccessful, but encourage you to list precisely the instructions you would like to see added as there is anecdotal evidence that requests formed in this way are resolved faster.

davecheney on 14 Aug 2020

❤2

Thanks, will do so. I'll proceed and prototype the algorithm in the GNU assembler so I can tell you which instructions exactly I'm going to need.

clausecker on 14 Aug 2020

👍4

So, I've now sketched out a prototype using the UNIX assembler. The following new instructions are required:

VEOR (possibly already implemented)
VBIT
VBSL
VLD1 (possibly already implemented)
VLD1.P (might already be implemented)
FMOVD $imm64, Fr (translates to a load from a literal pool; having FMOVQ too would be nice)
VMOVI (seems like that one is already implemented)
VLD1R.P (implemented, but incorrectly; cf. example above)
VCMTST
VSUB (already implemented but defective)
VAND (possibly already implemented)
VUSHR (possibly already implemented)
VSHL (possibly already implemented)
UADDLV (possibly already implemented)
VMOV Vx.y[z], Vx.y[z] (possibly already implemented)
VUXTL
VUXTL2
VADD (possibly already implemented)
VST1 (possibly already implemented)
PRFM (possibly already implemented; nice to have)

Especially important is the literal-pool FMOVD $imm64 as I can't replace that one with a WORD directive or similar due to the impossibility of manipulating literal pools directly.

clausecker on 19 Aug 2020

👍1

@fuzxxl We will enable the above instructions ASAP. 🙂

zhangfannie on 19 Aug 2020

👍1

Thanks! Very much appreciated.

clausecker on 19 Aug 2020

@fuzxxl The current Go assembler already supports the _FMOV (scalar, immediate)_ instruction; the Go syntax is FMOVS and FMOVD. According to the ARM reference, there is no instruction like FMOVQ.

zhangfannie on 19 Aug 2020

The instruction I want is listed in the section “LDR (literal, SIMD&FP)” of the ARM reference:

LDR <St>, label
LDR <Dt>, label
LDR <Qt>, label

here, label has a PC-relative addressing mode referring to a literal in a nearby literal pool. These instructions should therefore be translated to Go syntax as

FMOVS $imm32, Fr
FMOVD $imm64, Fr
FMOVQ $imm128, Fr

The assembler needs to place the immediate in a nearby literal pool and substitute its address.

Of course, the assembler is free to use the “FMOV (scalar, immediate)” variant if the immediate fits. Similarly, it should translate these to movi or mvni if the immediate is of the appropriate form.

The desired usage is to load a bit mask into a SIMD register:

FMOVD $0x8040201008040201, F0

Perhaps the mnemonics MOVS, MOVD, and MOVQ might indeed be better here as the instructions do not directly deal with floating point numbers.

clausecker on 19 Aug 2020

👍1

Change https://golang.org/cl/249757 mentions this issue: cmd/asm: fix the error of checking the post-index offset of VLD[1-4]R instructions of arm64

gopherbot on 21 Aug 2020
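
For context on the post-index check mentioned in this change, here is a rough Go sketch (not taken from the changeset) of the rule in the ARM reference: for the immediate post-index form of LD1R to LD4R, the offset must equal the number of registers times the element size in bytes. The function name `postIndexImm` and the string form of the arrangement are assumptions made for illustration only.

```go
package main

import (
	"fmt"
)

// postIndexImm is a hypothetical helper: it returns the only immediate
// post-index offset that is valid for VLD<n>R.P with the given vector
// arrangement (for example "B8" or "D2").
func postIndexImm(n int, arrangement string) (int, error) {
	if n < 1 || n > 4 {
		return 0, fmt.Errorf("register count %d out of range", n)
	}
	elemSize := map[string]int{
		"B8": 1, "B16": 1,
		"H4": 2, "H8": 2,
		"S2": 4, "S4": 4,
		"D1": 8, "D2": 8,
	}
	size, ok := elemSize[arrangement]
	if !ok {
		return 0, fmt.Errorf("unknown arrangement %q", arrangement)
	}
	return n * size, nil
}

func main() {
	imm, _ := postIndexImm(1, "B8")
	fmt.Println("VLD1R.P with B8 expects offset", imm)
}
```

Under this rule, the `VLD1R.P 1(R1), [V9.B8]` example from earlier in the thread uses the expected offset of 1.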

Change https://golang.org/cl/249758 mentions this issue: cmd/internal/obj/arm64: enable some SIMD instructions

gopherbot on 21 Aug 2020

Thanks for the changes. You made a typo in changeset 249758, file asm7.go:5111:

    case 101: // /FMOVS/FMOVD/FMOVQ(liternal, SIMD&FP)

clausecker on 21 Aug 2020

I will modify it. 😅 Thank you.

zhangfannie on 21 Aug 2020

Due to some further optimisations in the code, I might also need the following instructions:

VUZP1 and VUZP2
VUXTL and VUXTL2 (or their generic variants, USHLL and USHLL2)
VADDP (might already be implemented)
VBIF

It would be great if you could add them, too. I hope this covers all of them.

Thanks for all your great help! I'm feeling like we're getting somewhere with this.

clausecker on 25 Aug 2020

@fuzxxl You are welcome. We will enable the assembler's support for these instructions ASAP. 🙂

zhangfannie on 26 Aug 2020

👍1

Thanks a lot for your hard work!

clausecker on 26 Aug 2020

Change https://golang.org/cl/251519 mentions this issue: cmd/internal/obj/arm64: add support for arm64 instruction FLDPQ and FSTPQ

gopherbot on 31 Aug 2020

Change https://golang.org/cl/253659 mentions this issue: cmd/asm: add more SIMD instructions on arm64

gopherbot on 9 Sep 2020

@zhangfannie I've tried your new instructions, thanks a lot! One thing I noticed is that FMOVD and FMOVQ take a vector register for an argument (e.g. V0) despite having an F prefix in the mnemonic. Taking an FP register (e.g. F0) might have been more idiomatic. I also noticed that it is unfortunately not possible to specify 128 bit immediates. For example,

FMOVQ $0x123456789abcdef0123456789abcdef, V0

fails to assemble with an “immediate out of range” message. And even if I choose a suitably sized immediate like

FMOVQ $0x123456789abcdef0, V0

the assembler only generates a 64 bit literal in the literal pool despite this instruction reading 128 bits from the literal pool. It would be great to have this improved.

clausecker on 11 Sep 2020

@fuzxxl Regarding the "immediate out of range" issue: the error is reported by strconv.ParseUint(str, 0, 64), which p.atoi() calls while parsing the immediate. Go currently only supports bit sizes 0, 8, 16, 32 and 64; if the bit size is below 0 or above 64, an error is returned (please see https://golang.org/pkg/strconv/#ParseUint). If we want to parse a 128 bit immediate, we need to support a 128 bit size. @cherrymui Can we do it? Thank you.

zhangfannie on 14 Sep 2020
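
To illustrate the limitation described above (this is not code from the assembler), the Go sketch below shows `strconv.ParseUint` rejecting the 128-bit literal with a range error, plus one possible way to split such a literal into two 64-bit halves. The helpers `overflows64` and `splitHex128` are hypothetical names introduced here.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// overflows64 reports whether s parses as an integer too large for 64 bits.
func overflows64(s string) bool {
	_, err := strconv.ParseUint(s, 0, 64)
	return errors.Is(err, strconv.ErrRange)
}

// splitHex128 is a hypothetical helper: it splits a hex literal of up to
// 32 digits (with optional 0x prefix) into its high and low 64-bit halves.
func splitHex128(s string) (hi, lo uint64, err error) {
	digits := strings.TrimPrefix(s, "0x")
	if len(digits) == 0 || len(digits) > 32 {
		return 0, 0, errors.New("expected 1 to 32 hex digits")
	}
	cut := len(digits) - 16 // everything before the last 16 digits is the high half
	if cut < 0 {
		cut = 0
	}
	if cut > 0 {
		if hi, err = strconv.ParseUint(digits[:cut], 16, 64); err != nil {
			return 0, 0, err
		}
	}
	if lo, err = strconv.ParseUint(digits[cut:], 16, 64); err != nil {
		return 0, 0, err
	}
	return hi, lo, nil
}

func main() {
	const lit = "0x123456789abcdef0123456789abcdef"
	fmt.Println("overflows 64 bits:", overflows64(lit))
	hi, lo, err := splitHex128(lit)
	fmt.Printf("hi=%#x lo=%#x err=%v\n", hi, lo, err)
}
```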

Go doesn't have 128-bit integers (for now), so I don't think we want to do that. Instead, can we let it take two 64-bit operands, for the high and low parts separately? E.g. FMOVQ $1, $2, V3 is V3 = 1<<64 + 2 ? This doesn't look as nice but maybe fine.

Or, we only support loading from memory, and let user write large constants to memory, something like FMOVQ x(SB), V3 with

DATA x+0(SB)/8, $lo64
DATA x+8(SB)/8, $hi64
GLOBL x(SB),RODATA,$16

The GNU assembler doesn't seem to support directly loading large constants either. It needs a label. We could use a global data symbol.

cherrymui on 14 Sep 2020

@cherrymui The problem with a global symbol (as opposed to a literal pool) is that instructions are wasted generating the address of the global symbol. This incurs needless cost in high-performance code. Having two separate operands would be fine with me and I suppose it's a good alternative. Another idea would be to support a form that loads from a local label such that the programmer can manually create a literal pool.

Any resolution on the inconsistency of using V0 instead of F0 for the register we load into? After all, the instruction mnemonic indicates an FP-type instruction, not a SIMD-type instruction.

clausecker on 14 Sep 2020

How many instructions are there like this, which take a constant that doesn't fit into 64 bits? Is this the only case, or are there many more? I'm less sure about supporting manual constant pool management, mainly for consistency (there have been several instructions that use the constant pool managed by the assembler).

I agree that the line between FP instruction and vector instruction is blurry. Does this instruction have anything specific to FP? If not, maybe we should just use VMOVQ (or VMOV) and take a V register?

cherrymui on 14 Sep 2020

@cherrymui I am unaware of any other instructions operating on 128 bit constants in a literal pool. I think having to write this one with two immediates is a workable choice.

The instruction is coded as taking an FP-style register in the ARM reference manual, so FMOVQ is consistent with that choice. I am inclined to say that both VMOVQ $imm, V0 and FMOVQ $imm, F0 should be supported as aliases to establish consistency in both SIMD and FP code and to make writing macros easier. In any case, the same choice should be made for literal-pool-loading FMOVD, too (this one appears to be somewhat more common in FP code for loading float64 constants, but it's also useful in vector code). It's not my call to make though.

clausecker on 14 Sep 2020

@cherrymui @clausecker Thank you for the solution. I will post a patch to write 128 bit with two immediates.

zhangfannie on 15 Sep 2020

👍1

@cherrymui @clausecker I spent some time trying to find a SIMD-FP instruction to do the "Dn << 64 + Dm = Qd" operation we need. Unfortunately, I did not find such an instruction 😥. Do you have any ideas? Thank you.

zhangfannie on 15 Sep 2020

@zhangfannie I'm not sure what you mean. The idea was to have the literal-loading form of FMOVQ be

    FMOVQ $foo, $bar, F0

where foo and bar will be placed in a nearby literal pool such that they are adjacent. This then compiles to something like

    FMOVQ pool(PC), F0
    ...
pool:
    DWORD $foo
    DWORD $bar

Or perhaps I misunderstood something here?

clausecker on 15 Sep 2020

👍1
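
The literal-pool scheme described above can be sketched in Go as follows (an illustration only, not the assembler's implementation). `literal128` lays out two 64-bit words as one little-endian 16-byte pool entry, and `encodeLDRLiteralQ` builds the "LDR (literal, SIMD&FP)" word for a Q register from a PC-relative byte offset. Both names are hypothetical, and which of the two assembler operands becomes the low or high half is left open here; the sketch simply calls them lo and hi.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// literal128 returns the 16 bytes of a pool entry holding hi:lo,
// stored little-endian so that lo occupies the lower addresses.
func literal128(lo, hi uint64) [16]byte {
	var b [16]byte
	binary.LittleEndian.PutUint64(b[0:8], lo)
	binary.LittleEndian.PutUint64(b[8:16], hi)
	return b
}

// encodeLDRLiteralQ is a hypothetical helper: it encodes "ldr q<rt>, label"
// where label lies pcOffset bytes from the instruction.
func encodeLDRLiteralQ(rt uint32, pcOffset int32) (uint32, error) {
	if rt > 31 {
		return 0, errors.New("register number out of range")
	}
	// imm19 is a signed word offset, giving a range of about +/-1 MiB.
	if pcOffset%4 != 0 || pcOffset < -(1<<20) || pcOffset >= 1<<20 {
		return 0, errors.New("offset must be a multiple of 4 within +/-1 MiB")
	}
	imm19 := uint32(pcOffset/4) & 0x7ffff
	// opc=10 (Q), 011, V=1, 00, imm19, Rt.
	return 0x9c000000 | imm19<<5 | rt, nil
}

func main() {
	pool := literal128(0x8040201008040201, 0)
	enc, _ := encodeLDRLiteralQ(0, 8)
	fmt.Printf("%08x    ldr q0, <pc+8>\n", enc)
	fmt.Printf("pool: % x\n", pool[:])
}
```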

@clausecker Sorry, my understanding is wrong. The patch will be submitted soon. Thank you for the explanation. 😊

zhangfannie on 16 Sep 2020

@cherrymui @zhangfannie While we're at it, I have difficulty understanding what the Go syntax for a load or store instruction like these is:

ldr q0, [x5, #32]
str q0, [x5, #32]

I'm sure they are implemented (at least I hope), but none of these variants seems to work:

FMOVQ 32(R5), F0
VMOVQ 32(R5), V0
MOVQ 32(R5), F0
MOVQ 32(R5), V0

Have I missed anything?

clausecker on 17 Sep 2020

FMOVQ/VMOVQ seem the right syntax (MOVQ is not). They don't seem to be implemented yet, though. For now, FMOVQ only supports constant source operand, and VMOVQ does not exist. (We have VMOV, but that is currently used only for moves between V registers.)

cherrymui on 17 Sep 2020
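
While the mnemonics are missing, the machine words for the two instructions asked about can be worked out by hand. The Go sketch below (an illustration, with the hypothetical name `encodeLdStQ`) builds the "LDR/STR (immediate, SIMD&FP), unsigned offset" encoding for a Q register, using the field layout from the ARM reference, for base register 5 and offset 32.

```go
package main

import (
	"errors"
	"fmt"
)

// encodeLdStQ is a hypothetical helper: it encodes LDR (load=true) or
// STR (load=false) of Q register rt at [x<rn>, #offset], where offset is
// an unsigned multiple of 16.
func encodeLdStQ(load bool, rt, rn, offset uint32) (uint32, error) {
	if rt > 31 || rn > 31 {
		return 0, errors.New("register number out of range")
	}
	// imm12 is unsigned and scaled by 16 for the Q variant.
	if offset%16 != 0 || offset/16 > 0xfff {
		return 0, errors.New("offset must be a multiple of 16 in [0, 65520]")
	}
	// size=00, 111, V=1, 01, opc=1x (128-bit), imm12, Rn, Rt.
	enc := uint32(0x3d800000) | (offset/16)<<10 | rn<<5 | rt
	if load {
		enc |= 1 << 22
	}
	return enc, nil
}

func main() {
	ld, _ := encodeLdStQ(true, 0, 5, 32)
	st, _ := encodeLdStQ(false, 0, 5, 32)
	fmt.Printf("%08x    load q0 from base 5, offset 32\n", ld)
	fmt.Printf("%08x    store q0 to base 5, offset 32\n", st)
}
```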

@cherrymui Being able to load/store SIMD and FP registers would be a useful addition I suppose. I can use VLDR1 in many cases, but the addressing mode is pretty restrictive.

clausecker on 17 Sep 2020

I agree it is useful. It just needs to be implemented.

cherrymui on 17 Sep 2020

@cherrymui

Thanks! If you're adding this one, could you also add VBIC and VUSRA? It turns out these ones can be used to shave off some instructions in my code.

clausecker on 17 Sep 2020

@clausecker We will do it. Thank you.

zhangfannie on 18 Sep 2020

🚀1

@zhangfannie Super cool! Reading the manual further, I believe VSRI, VSLI, VUADDW, and VUADDW2 might be very helpful, too.

clausecker on 18 Sep 2020

Change https://golang.org/cl/255900 mentions this issue: cmd/asm: fix the issue of moving 128-bit integers to vector registers on arm64

gopherbot on 21 Sep 2020

@clausecker The patch is ready and is under internal review. I will submit it as soon as possible. Thank you.

zhangfannie on 29 Sep 2020

🚀1

Change https://golang.org/cl/258397 mentions this issue: cmd/asm: add several arm64 SIMD instructions

gopherbot on 30 Sep 2020
