Zig: inline assembly improvements

Created on 18 Nov 2016 · 19 comments · Source: ziglang/zig

This inline assembly performs exit(0) on x86_64 Linux:

    asm volatile ("syscall"
        : [ret] "={rax}" (-> usize)
        : [number] "{rax}" (60),
            [arg1] "{rdi}" (0)
        : "rcx", "r11");

Here are some flaws:

1. 60 and 0 are number literals and need to be cast to a type to be valid; if you don't cast them, the compiler hits an assertion failure. Assembly syntax should include types for inputs.
2. [number], [arg1], and [ret] are unused, which is awkward.
3. We need multiple return values (see #83).
4. Do we really need this complicated constraint syntax? Maybe we can operate on inputs and outputs.
5. Let's go digging into some real-world inline assembly code to see the use cases.
6. (Resolved by #2080.) When we get errors from parsing assembly, we don't attach them to the offset within the assembly string.

proposal


andrewrk
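For comparison, here is a sketch of the same register-binding idea in GCC/Clang extended asm for C, which uses the same outputs : inputs : clobbers layout as the example above. It assumes x86_64 Linux, and raw_syscall1 is a hypothetical helper name. Because the inputs are ordinary typed C expressions, their width comes from their types rather than from untyped literals (the first flaw listed above). It is exercised with getpid and close rather than exit so that it returns.

```c
#include <errno.h>       /* EBADF, for checking the raw error return */
#include <sys/syscall.h> /* SYS_close, SYS_getpid, SYS_exit */
#include <unistd.h>      /* getpid() */

/* x86_64 Linux only (assumption: GCC or Clang extended asm).
   Syscall number goes in rax, first argument in rdi, result comes back in rax.
   The syscall instruction overwrites rcx and r11, so they are listed as clobbers. */
static long raw_syscall1(long number, long arg1)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)              /* output: rax, typed as long */
                     : "a"(number), "D"(arg1) /* inputs: rax and rdi, typed by their C types */
                     : "rcx", "r11", "memory");
    return ret; /* on failure the kernel returns -errno */
}
```

Calling raw_syscall1(SYS_exit, 0) would perform the exit(0) from the original example (SYS_exit is 60 on x86_64).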


Most helpful comment

Another avenue to go down is the MSVC way of doing inline assembly. MSVC does a smart, augmented assembly where you can transparently access C/C++ variables from the assembly. An example would be a memcpy implementation:

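#include <stddef.h>

// Assumed typedefs (not in the original comment). MSVC only supports __asm
// blocks when targeting 32-bit x86, hence esi/edi/ecx rather than rsi/rdi/rcx.
typedef unsigned char u8;
typedef size_t memory_index;

void
CopyMemory(u8* Dst, u8* Src, memory_index Length)
{
    __asm {
        mov esi, Src
        mov edi, Dst
        mov ecx, Length
        rep movsb
    }
}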

It provides a really nice experience. However, MSVC isn't smart about the registers, so _all_ registers used are backed up to the stack before the assembly is emitted and restored afterwards. This avoids the mess of having to specify clobbered registers, but at the cost of a fair bit of performance.

The smart syntax is awesome, but it might be hard to fit with an LLVM backend if you do not want to write an entire assembler as well.

kiljacken on 9 Dec 2016
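For contrast with the MSVC approach, here is a sketch of the same rep movsb copy in GCC/Clang extended asm for C, assuming x86 or x86_64; copy_memory is a hypothetical name. Instead of the compiler saving every register the block touches, each register that rep movsb reads and advances is declared as a read-write operand pinned to that register, and a "memory" clobber tells the compiler that the block writes memory it cannot see.

```c
#include <stddef.h>

/* x86/x86_64 only (assumption: GCC or Clang extended asm).
   rep movsb copies rcx bytes from [rsi] to [rdi], advancing all three registers,
   so each is a read-write ("+") operand: D = rdi, S = rsi, c = rcx.
   Relies on the direction flag being clear, which the SysV ABI guarantees on entry. */
static void copy_memory(void *dst, const void *src, size_t n)
{
    /* volatile: the updated operands are never read afterwards, so without it
       the compiler would be free to delete the statement. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory"); /* the copy writes memory behind the compiler's back */
}
```

Only the three registers the instruction actually uses are named, so the compiler has to work around those and nothing else.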


All 19 comments

One idea:

const result = asm volatile ("rax" number: usize, "rdi" arg1: usize, "rcx", "r11")
    -> ("rax" ret: usize)  "syscall" (60, 0);

This shuffles the syntax around and makes it more like a function call. Clobbers are extra "inputs" that have neither a name nor a type. The register names are still clunky.

This proposal also operates on the assumption that all inline assembly can operate on inputs and outputs.

andrewrk on 18 Nov 2016

@ofelas, can I get your opinion on this proposal?

andrewrk on 18 Nov 2016

Right, you really made me think here. I haven't done that much asm in Zig yet; here are a few examples I've used on x86. They primarily struggle with the issue of multiple return values. The examples below may not be correct; I always end up spending some time reading the GCC manuals when doing inline asm in C, as it isn't always straightforward.

I just skimmed through the discussions over at Rust users and Rust inline assembly. They seem to have similar debates, and it seems that the asm feature may not be used that much. If you really need highly optimized or complex asm, wouldn't you break out to standalone asm (or possibly LLVM IR)?

I guess what we have to play with is what LLVM provides, at least as long as Zig has a tight connection to it (according to LLVM Weekly, there seem to be discussions about also supporting Cretonne in Rust).

With the above proposal, would I write the PPC eieio (and isync, sync) like this: _ = asm volatile () -> () "eieio" (); and in the old style: _ = asm volatile ("eieio");? This may typically be available as an intrinsic barrier, I guess. I think I read somewhere that the _ would be the same as Nim's discard; it may not be needed, as this asm doesn't return anything.

Not sure I answered your question...

inline fn rdtsc() -> u64 {
    var low: u32 = undefined;
    var high: u32 = undefined;
    // output in eax and edx, could probably movl edx, fingers x'ed...
    low = asm volatile ("rdtsc" : [low] "={eax}" (-> u32));
    high = asm volatile ("movl %%edx,%[high]" : [high] "=r" (-> u32)); 
    ((u64(high) << 32) | (u64(low)))
}

The above is obviously a kludge. I initially hoped to write it more like this; however, it feels strange to have to specify the outputs twice, both on the left-hand side and inside the asm outputs, with the potential for mixing up the order, which may be important.

inline fn rdtsc() -> u64 {
    // output in eax and edx
    var low: u32 = undefined;
    var high: u32 = undefined;
    low, high = asm
        // no side effects
        ("rdtsc"
         : [low] "={eax}" (-> u32), [high] "={edx}" (-> u32)
         : // No inputs
         : // No clobbers
         );
    ((u64(high) << 32) | (u64(low)))
}

Or possibly like this, without having to initialize the output-only parameters to undefined/zeroes/0:

inline fn rdtsc() -> u64 {
    // output in eax and edx
    const (low: u32, high: u32) = asm
        // no side effects
        ("rdtsc"
         : [low] "={eax}" (-> u32), [high] "={edx}" (-> u32)
         : // No inputs
         : // No clobbers
         );
    ((u64(high) << 32) | (u64(low)))
}

I've also tinkered with the cpuid instruction, which is particularly nasty:

inline fn cpuid(f: u32) -> u32 {
    // See: https://en.wikipedia.org/wiki/CPUID, there's a boatload of variations...
    var id: u32 = 0;
    if (f == 0) {
        // Multiple outputs (as an ASCII string) which we mark as clobbered and just leave untouched
        return asm volatile ("cpuid" : [id] "={eax}" (-> u32): [eax] "{eax}" (f) : "ebx", "ecx", "edx");
    } else {
        return asm volatile ("cpuid" : [id] "={eax}" (-> u32): [eax] "{eax}" (f));
    }
}

ofelas on 18 Nov 2016
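In the first rdtsc variant above, edx is read by a second asm statement, so nothing stops the compiler from reusing edx between the two statements. Here is a minimal sketch, in GCC/Clang extended asm for C, of the two-output form the later variants are reaching for, assuming x86 or x86_64; read_tsc is a hypothetical name. Both halves come out of one statement, pinned to eax and edx.

```c
#include <stdint.h>

/* x86/x86_64 (assumption: GCC or Clang extended asm).
   rdtsc puts the low 32 bits of the counter in eax and the high 32 bits in edx.
   Both are outputs of the same statement, so no other code can run in between. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    /* volatile: two calls must not be merged into one read */
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```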

With the proposal, rdtsc would look like this in zig:

fn rdtsc() -> u64 {
    const low, const high = asm () -> ("eax" low: u32, "edx" high: u32) "rdtsc" ();
    ((u64(high) << 32) | (u64(low)))
}

This seems like an improvement.

Here is cpuid with the proposal. I propose that instead of naming the function after the assembly instruction, we name it after the information we want. So let's choose one of the use cases: getting the vendor ID.

fn vendorId() -> (result: [12]u8) {
    const a: &u32 = (&u32)(&result[0 * @sizeOf(u32)]);
    const b: &u32 = (&u32)(&result[1 * @sizeOf(u32)]);
    const c: &u32 = (&u32)(&result[2 * @sizeOf(u32)]);
    // the vendor string comes back in ebx, edx, ecx (in that order)
    *a, *b, *c = asm () -> ("ebx" a: u32, "edx" b: u32, "ecx" c: u32) "cpuid" ();
}

Once again, volatile is not necessary here. cpuid doesn't have side effects; we only want to extract information from the assembly.

So far, so good. Any more use cases?

andrewrk on 18 Nov 2016


Yes, that ain't too shabby, so with the correct input in eax it is;

fn vendorId() -> (result: [12]u8) {
    const a: &u32 = (&u32)(&result[0 * @sizeOf(u32)]);
    const b: &u32 = (&u32)(&result[1 * @sizeOf(u32)]);
    const c: &u32 = (&u32)(&result[2 * @sizeOf(u32)]);
    // in: eax=0; out: eax=max accepted eax value (clobbered/ignored), string in ebx, edx, ecx
    *a, *b, *c = asm ("eax" func: u32) -> ("ebx" a: u32, "edx" b: u32, "ecx" c: u32, "eax") "cpuid" (0);
}

Would something like this be possible, ignoring my formatting?

result = asm ( // inputs
        "=r" cnt: usize = count,
        "=r" lhs: usize = &left,
        "=r" rhs: usize = &right,
        "=r" res: u8 = result,
        // clobbers
        "al", "rcx", "cc")
        -> ( // outputs
        "=r" res)
        // multiline asm string
        \\movq %[count], %rcx
        \\1:
        \\movb -1(%[lhs], %rcx, 1), %al
        \\xorb -1(%[rhs], %rcx, 1), %al
        \\orb %al, %[res]
        \\decq %rcx
        \\jnz 1b
        // args/parameters
        (count, &left, &right, result);

ofelas on 18 Nov 2016
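Here is a sketch of the vendorId use case in GCC/Clang extended asm for C, assuming x86 or x86_64; cpu_vendor is a hypothetical name. Leaf 0 goes in eax, and the twelve vendor bytes come back in ebx, edx, ecx, in that order, so the order of the copies matters. The "0" and "2" input constraints tie the inputs to output operands 0 (eax) and 2 (ecx). As noted above, no volatile is needed because only the outputs matter.

```c
#include <stdint.h>
#include <string.h>

/* x86/x86_64 (assumption: GCC or Clang extended asm).
   Writes the 12-byte CPUID vendor string plus a terminating NUL into out. */
static void cpu_vendor(char out[13])
{
    uint32_t max_leaf, b, c, d;
    __asm__("cpuid"
            : "=a"(max_leaf), "=b"(b), "=c"(c), "=d"(d)
            : "0"(0u), "2"(0u)); /* eax = 0 (leaf), ecx = 0 (subleaf) */
    (void)max_leaf;              /* eax now holds the highest standard leaf; unused */
    memcpy(out, &b, 4);          /* e.g. "Genu" on Intel */
    memcpy(out + 4, &d, 4);      /* e.g. "ineI" */
    memcpy(out + 8, &c, 4);      /* e.g. "ntel" */
    out[12] = '\0';
}
```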


> Yes, that ain't too shabby, so with the correct input in eax it is;

Ah right, nice catch.

I like putting the values of the inputs above as you did. Then we don't need them below.

Is the count arg needed just for the movq instruction? It seems like we could pass it directly in a register.

And finally, result should be an output instead of an input, right?

So it would look like this:

const result = asm ( // inputs
        "{rcx}" cnt: usize = count,
        "=r" lhs: usize = &left,
        "=r" rhs: usize = &right,
        // clobbers
        "al", "rcx", "cc")
        -> ( // outputs
        "=r" res: u8)
        // multiline asm string
        \\1:
        \\movb -1(%[lhs], %rcx, 1), %al
        \\xorb -1(%[rhs], %rcx, 1), %al
        \\orb %al, %[res]
        \\decq %rcx
        \\jnz 1b
);

This is a good example of why we should retain the constraint syntax, since we might want {rcx} or =r.

andrewrk on 19 Nov 2016

I'm not too familiar with x86 asm; I nicked that example from the Rust discussions. In this case rcx (ecx in 32-bit) is a loop counter, somewhat similar to ctr on PowerPC, so the movq, decq, jnz sequence drives the loop. As long as that condition is met, it probably doesn't matter. Maybe it could have been done with the loop instruction, which decrements and tests at the same time.

result is both an input and an output, like if you were updating a cksum or similar where you would feed in an initial or intermediate value that you want to update.

Are you planning to support all the various architecture specific input/output/clobber constraints and indirect inputs/outputs present in LLVM?

ofelas on 19 Nov 2016
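Here is a sketch of the byte-comparison loop in GCC/Clang extended asm for C, assuming x86_64; diff_accumulate is a hypothetical name. It shows how constraints express the points above: the count is a read-write register ("+r") that drives the decq/jnz loop, the accumulator is both an input and an output ("+q"), and the scratch byte register is an early-clobber output ("=&q") instead of a hard-coded al clobber. The result is zero exactly when the first n bytes match, and passing in a previous result lets you chain calls across buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* x86_64 only (assumption: GCC or Clang extended asm).
   ORs the XOR of each byte pair into acc, walking from the end of the buffers. */
static uint8_t diff_accumulate(uint8_t acc, const uint8_t *lhs, const uint8_t *rhs, size_t n)
{
    uint8_t tmp;
    if (n == 0)
        return acc; /* the decq/jnz loop below assumes at least one byte */
    __asm__("1:\n\t"
            "movb -1(%[lhs],%[cnt],1), %[tmp]\n\t"
            "xorb -1(%[rhs],%[cnt],1), %[tmp]\n\t"
            "orb %[tmp], %[acc]\n\t"
            "decq %[cnt]\n\t"
            "jnz 1b"
            : [cnt] "+r"(n),   /* loop counter: read and decremented */
              [acc] "+q"(acc), /* accumulator: both an input and an output */
              [tmp] "=&q"(tmp) /* scratch byte, written while inputs are still live */
            : [lhs] "r"(lhs), [rhs] "r"(rhs)
            : "cc", "memory"); /* flags change; buffers are read through pointers */
    return acc;
}
```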


As kiljacken says, I personally really, _really_ enjoy the Intel syntax over GAS, as D has done it (except for GDC, which is based on GCC). I'm only assuming it'll be harder to implement an MSVC-style inline assembly feature.

dd86k on 19 Oct 2017

The end game is that we will have our own assembly syntax, like D, which will end up being compiled to LLVM-compatible syntax. It's just a lot of work.

At first I tried to use the Intel syntax, but LLVM's support for it is buggy, and some of the instructions are messed up to the point of causing silent bugs.

andrewrk on 19 Oct 2017


Points 1 and 2 in the OP seem to be solved.

SamTebbs33 on 31 Aug 2019

OUTDATED

This has been split off into #5241. This comment will no longer be updated.

New Inline Asm Syntax

asm (arches) (bindings, clobbers) (:return_register|void|noreturn) { local_labels body } (else ...)? + config? (somewhere)

Arches

An optional list of target architectures. If this is null, the block is assumed to be for all architectures (an assembler error is always a compile error). Otherwise, one of these must match builtin.arch, or an else branch must be present. This is a list rather than a single value as some architectures have mutually compatible subsets (e.g. 8086/x86/x86_64, MIPS/RISC-V).

Bindings and Clobbers

Bindings have the form "register" name: type = init_value. name can be _, if the register is desired only for initialisation. name can also be a variable in scope, in which case type and init_value are omitted, and changes to this register's value are taken as changes to the variable. init_value can be undefined, in which case type can be omitted (it doesn't matter much in assembly anyway), unless name is the return register (more on this later). Clobbers are simply "register".

Return Register

A binding can be nominated as the return value, with :name. (Allowing :"register" would cause parsing ambiguity, and this can be trivially done with a binding anyway.) void and noreturn are also allowed. Reaching the end of a noreturn block is safety-checked UB.

Local Labels

A list of local labels. Formatted as strings.

Local labels are unique to the block: %(label) matches %(label) within the block, and is guaranteed not to match anything else in the program. They are listed within the braces of the body because they really don't make sense outside that context.

Body

The assembly code itself, as a string. If this fails to assemble, it's a compile error.

The following macros are defined:

%[name]
Register, as specified in bindings section.
%(label)
Label, as listed in local labels section.
@[variable]
Pre-mangled global variable name. Used to reference globals. See #5211.
@(function)
Pre-mangled function name. Used to call functions. See #5211.

A literal % or @ is escaped with another one: %% or @@. Strictly speaking, if we're substituting text, only one of @[] and @() is needed -- but, if we want to integrate the assembler with the compiler, the distinction may be important, so I've listed both.

Else

If arches is non-null and none of the listed architectures match builtin.arch, this is compiled instead. Can be used to switch on architectures, optimise a specific architecture only, or simply @compileError. If this is not present, a target mismatch is a compile error.

N.B.: An else branch is only allowed if arches is non-null. This decision was made because, when you set arches to null, either you know execution will never reach this point on the wrong architecture, or you only care about compiling for a specific architecture. In the former case, you definitely want an unexpected architecture to be a compile error; and in the latter, to support a new architecture, the laziest thing you can do is start caring.

Config

Configuration is passed in a pragma (#5239) with the following fields:

impure
This block has side effects.
stack(n)
This block allocates n bytes on the stack. Defaults to 0.
calls(funcs)
This block calls the functions listed in funcs. Defaults to .{}.

Example

const builtin = @import("builtin");

const fib_asm = fn (n: u32) u32 {
    return asm (.{ builtin.Arch.riscv64, builtin.Arch.riscv32 }) @{
        stack(12),
        calls(.{ fib_iter }),
    } (
        "a0" this  : u32 = 0,
        "a1" next  : u32 = 1,
        "=r" to_go : u32 = n,
    ) :this {
        .{ "loop", "end" }

        \\%(loop):
        \\  beqz %[to_go], %(end)
        // We can do function pro/epi at callsite!
        \\  addi sp, sp, -12
        \\  sd ra, 0(sp)
        \\  sw %[to_go], 8(sp)
        \\  call @(fib_iter)
        \\  lw %[to_go], 8(sp)
        \\  ld ra, 0(sp)
        \\  addi sp, sp, 12
        \\  addi %[to_go], %[to_go], -1
        \\  j %(loop)
        \\%(end):
    } else @compileError("Your machine could be better");
};

// Actually returns two values, but the compiler has no way to express that
const fib_iter = fn @{callconv(.Naked)} (this: u32, next: u32) void {
    // No need to check architecture -- we'll only call this from fib_asm
    asm (null) @{impure} (
        "a0" this,
        "a1" next,
        "=r" temp = undefined,
    ) void {
        .{}

        \\  add %[temp], %[this], %[next]
        \\  mv %[this], %[next]
        \\  mv %[next], %[temp]
    };
};

TL;DR: Benefits over Status Quo

If any of the sections are missed, the compiler can detect exactly which ones
Order of mandatory components has a logical progression, just like function declaration
Option to tie to target architecture
Registers have types
Can express non-returning and valueless assembly
Can reference global variables and call functions
Won't unexpectedly jump to random points in the program
Communicates metadata to compiler, but does not require it
Provides alternative for unsupported architectures
Can be automatically distinguished from status quo, albeit with some lookahead
Can be automatically derived from status quo

EleanorNB on 1 May 2020
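The local-labels section of the proposal asks for labels that are unique per block. As a point of comparison, GCC and Clang extended asm in C already provide %=, which expands to a number unique to each instance of the asm statement. Here is a minimal sketch, assuming x86 or x86_64; loop_count is a hypothetical name, and the loop exists only to give the labels something to do.

```c
/* x86/x86_64 (assumption: GCC or Clang extended asm).
   Counts down n to zero, returning the number of iterations.
   %= makes the label names unique to this asm instance. */
static unsigned loop_count(unsigned n)
{
    unsigned steps = 0;
    __asm__("test %[n], %[n]\n\t"
            "jz .Ldone%=\n"
            ".Lloop%=:\n\t"
            "incl %[steps]\n\t"
            "decl %[n]\n\t"
            "jnz .Lloop%=\n"
            ".Ldone%=:"
            : [n] "+r"(n), [steps] "+r"(steps)
            :
            : "cc"); /* test/inc/dec modify the flags */
    return steps;
}
```

Numeric local labels such as 1: with 1b/1f references, as in the comparison loop earlier in this thread, are the other common way to avoid label collisions.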

Ok, sorry, I changed it. I can't help it, I'm a perfectionist.

EleanorNB on 1 May 2020

Ok, it's a living document. I'll admit it.

EleanorNB on 1 May 2020

I've split it off into its own issue. See above.

EleanorNB on 1 May 2020

Hey @andrewrk -- given the emphasis on stabilisation in this release cycle, should we take the time to get this right now, so we're not stuck with it forever?

EleanorNB on 8 May 2020

Hey, I did a fairly major rework of #5241 recently. Now there's a more powerful constraint syntax.

EleanorNB on 10 May 2020

Possible inspiration from Rust: New inline assembly syntax available in nightly

andrewrk on 9 Jun 2020


For those who want to look further into that, there's more here.

There's a lot of good stuff there. The two deal-breakers for me are contextually repurposed syntax (out is not a function, reg is not a variable) and behind-the-scenes non-configurable action (assigning outputs). I've updated #5241 with the good stuff.

EleanorNB on 10 Jun 2020
