Zig: inline assembly improvements

Created on 18 Nov 2016 · 19 Comments · Source: ziglang/zig

This inline assembly does exit(0) on x86_64 linux:

    asm volatile ("syscall"
        : [ret] "={rax}" (-> usize)
        : [number] "{rax}" (60),
            [arg1] "{rdi}" (0)
        : "rcx", "r11");

Here are some flaws:

  • 60 and 0 are number literals and need to be cast to a type to be valid. If you don't cast them, the compiler hits an assertion failure. Assembly syntax should include types for inputs.
  • [number], [arg1], and [ret] are unused, which is awkward.
  • We need multiple return values (see #83).
  • Do we really need this complicated constraint syntax? Maybe we can operate on inputs and outputs.
  • let's go digging into some real world inline assembly code to see the use cases.
  • ~when we get errors from parsing assembly, we don't attach them to the offset from within the assembly string.~ #2080
proposal

All 19 comments

One idea:

const result = asm volatile ("rax" number: usize, "rdi" arg1: usize, "rcx", "r11")
    -> ("rax" ret: usize)  "syscall" (60, 0);

This shuffles the syntax around and makes it more like a function call. Clobbers are extra "inputs" that don't have a name and a type. The register names are still clunky.

This proposal also operates on the assumption that all inline assembly can operate on inputs and outputs.

@ofelas can I get your opinion on this proposal?

Right, you really made me think here. I haven't done that much asm in Zig yet, but here are a few that I've used on x86. They primarily struggle with the issue of multiple return values. The examples below may not be correct: I always end up spending some time reading the GCC manuals when doing inline asm in C, as it isn't always straightforward.

I just skimmed through the discussion over at Rust users and Rust inline assembly, they seem to have similar discussions and it seems that the asm feature may not be used that much. If you really need highly optimized or complex asm wouldn't you break out to asm (or possibly llvm ir)?

I guess what we have to play with is what LLVM provides, at least as long as zig has a tight connection to it (It seems there are discussions on also supporting Cretonne in Rust according to the LLVM Weekly).

With the above proposal, would I write the PPC eieio (and isync, sync) like this: _ = asm volatile () -> () "eieio" (); versus the old style _ = asm volatile ("eieio");? This may typically be available as an intrinsic barrier, I guess. I think I read somewhere that the _ would be the same as Nim's discard; it may not be needed, as this asm doesn't return anything.

Not sure I answered your question...

inline fn rdtsc() -> u64 {
    var low: u32 = undefined;
    var high: u32 = undefined;
    // output in eax and edx, could probably movl edx, fingers crossed...
    low = asm volatile ("rdtsc" : [low] "={eax}" (-> u32));
    high = asm volatile ("movl %%edx,%[high]" : [high] "=r" (-> u32)); 
    ((u64(high) << 32) | (u64(low)))
}

The above is obviously a kludge. I initially hoped to write it more like this. It does, however, feel strange having to specify the outputs twice, both on the lhs and inside the asm outputs, with the potential of mixing up the order, which may be important.

inline fn rdtsc() -> u64 {
    // output in eax and edx
    var low: u32 = undefined;
    var high:u32 = undefined;
    low, high = asm
        // no side effects
        ("rdtsc"
         : [low] "={eax}" (-> u32), [high] "={edx}" (-> u32)
         : // No inputs
         : // No clobbers
         );
    ((u64(high) << 32) | (u64(low)))
}

Or possibly like this, without having to initialise the output-only parameters to undefined/zeroes/0:

inline fn rdtsc() -> u64 {
    // output in eax and edx
    const (low: u32, high: u32) = asm
        // no side effects
        ("rdtsc"
         : [low] "={eax}" (-> u32), [high] "={edx}" (-> u32)
         : // No inputs
         : // No clobbers
         );
    ((u64(high) << 32) | (u64(low)))
}

I've also tinkered with the cpuid instruction which is particularly nasty;

inline fn cpuid(f: u32) -> u32 {
    // See: https://en.wikipedia.org/wiki/CPUID, there's a boatload of variations...
    var id: u32 = 0;
    if (f == 0) {
        // Multiple outputs (as an ASCII string) which we mark as clobbered and just leave untouched
        return asm volatile ("cpuid" : [id] "={eax}" (-> u32): [eax] "{eax}" (f) : "ebx", "ecx", "edx");
    } else {
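        // cpuid overwrites ebx, ecx and edx for every leaf, so they are clobbered here too
        return asm volatile ("cpuid" : [id] "={eax}" (-> u32): [eax] "{eax}" (f) : "ebx", "ecx", "edx");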
    }
}

With the proposal, rdtsc would look like this in zig:

fn rdtsc() -> u64 {
    const low, const high = asm () -> ("eax" low: u32, "edx" high: u32) "rdtsc" ();
    ((u64(high) << 32) | (u64(low)))
}

This seems like an improvement.
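
For comparison, here is a minimal sketch of the same two-output binding in GCC-style extended asm in C, whose colon-separated outputs/inputs/clobbers layout the status-quo syntax resembles. The helper names combine_halves and read_tsc are made up for illustration, and the non-x86 fallback is an assumption added so the sketch compiles on any target.

```c
#include <stdint.h>

/* Join two 32-bit halves into one 64-bit value, high half first. */
static inline uint64_t combine_halves(uint32_t high, uint32_t low) {
    return ((uint64_t)high << 32) | (uint64_t)low;
}

/* Hypothetical helper: stores the time-stamp counter in *out and returns 1
 * on x86 with a GCC-compatible compiler; returns 0 and leaves *out alone
 * on any other target. */
static inline int read_tsc(uint64_t *out) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t low, high;
    /* One statement, two outputs: "=a" binds eax, "=d" binds edx. */
    __asm__ volatile ("rdtsc" : "=a"(low), "=d"(high));
    *out = combine_halves(high, low);
    return 1;
#else
    (void)out;
    return 0;
#endif
}
```

Because both outputs are declared in one statement, the compiler knows edx is written by rdtsc, which the two-statement kludge earlier in the thread cannot express.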

cpuid with the proposal. I propose that instead of naming the function after the assembly instruction, we name it after the information we want. So let's choose one of the use cases, get vendor id.

fn vendorId() -> (result: [12]u8) {
    const a: &u32 = (&u32)(&result[0 * @sizeOf(u32)]);
    const b: &u32 = (&u32)(&result[1 * @sizeOf(u32)]);
    const c: &u32 = (&u32)(&result[2 * @sizeOf(u32)]);
   // cpuid leaf 0 returns the vendor string in ebx, edx, ecx order
   *a, *b, *c = asm () -> ("ebx" a: u32, "edx" b: u32, "ecx" c: u32) "cpuid" ();
}

Once again, volatile is not necessary here. cpuid doesn't have side effects; we only want to extract information from the assembly.

So far, so good. Any more use cases?

Yes, that ain't too shabby, so with the correct input in eax it is;

fn vendorId() -> (result: [12]u8) {
    const a: &u32 = (&u32)(&result[0 * @sizeOf(u32)]);
    const b: &u32 = (&u32)(&result[1 * @sizeOf(u32)]);
    const c: &u32 = (&u32)(&result[2 * @sizeOf(u32)]);
   // in eax=0, out: eax=max accepted eax value(clobbered/ignored), string in ebx, edx, ecx
   *a, *b, *c = asm ("eax" func: u32) -> ("ebx" a: u32, "edx" b: u32, "ecx" c: u32, "eax") "cpuid" (0);
}
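
As a point of reference, the vendor-id case can be sketched in GCC-style extended asm in C as follows. CPUID leaf 0 stores the string in ebx, edx, ecx (in that order), and eax is both the input leaf and an output, which the "+a" constraint expresses. The names vendor_id, put_le32, pack_vendor and read_vendor_id are hypothetical, and the non-x86 fallback is an assumption added for portability.

```c
#include <stdint.h>

/* Twelve vendor characters plus a terminating NUL. */
struct vendor_id {
    char text[13];
};

/* Write the four bytes of reg to dst, least significant byte first. */
static inline void put_le32(char *dst, uint32_t reg) {
    for (int i = 0; i < 4; i++) {
        dst[i] = (char)((reg >> (8 * i)) & 0xFFu);
    }
}

/* Assemble the string from the registers in ebx, edx, ecx order. */
static inline struct vendor_id pack_vendor(uint32_t ebx, uint32_t edx,
                                           uint32_t ecx) {
    struct vendor_id v;
    put_le32(v.text + 0, ebx);
    put_le32(v.text + 4, edx);
    put_le32(v.text + 8, ecx);
    v.text[12] = '\0';
    return v;
}

/* Hypothetical helper: returns 1 and fills *out on x86, 0 elsewhere. */
static inline int read_vendor_id(struct vendor_id *out) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t eax = 0, ebx, ecx, edx;
    /* "+a": eax carries the leaf number in and is overwritten on the way out. */
    __asm__ volatile ("cpuid" : "+a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx));
    *out = pack_vendor(ebx, edx, ecx);
    return 1;
#else
    (void)out;
    return 0;
#endif
}
```

Splitting the register packing into a pure function keeps the asm statement down to the bindings themselves, which is the part the proposals in this thread are trying to make less clunky.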

Would something like this be possible, ignoring my formatting?

result = asm ( // inputs
        "=r" cnt: usize = count,
        "=r" lhs: usize = &left,
        "=r" rhs: usize = &right,
        "=r" res: u8 = result,
        // clobbers
        "al", "rcx", "cc")
        -> ( // outputs
        "=r" res)
        // multiline asm string
        \\movq %[cnt], %rcx
        \\1:
        \\movb -1(%[lhs], %rcx, 1), %al
        \\xorb -1(%[rhs], %rcx, 1), %al
        \\orb %al, %[res]
        \\decq %rcx
        \\jnz 1b
        // args/parameters
        (count, &left, &right, result);

> Yes, that ain't too shabby, so with the correct input in eax it is;

Ah right, nice catch.

I like putting the values of the inputs above as you did. Then we don't need them below.

Is the count arg necessary just to have the movq instruction? It seems like we could pass that in a register directly.

And then finally result should be an output instead of an input right?

So it would look like this:

const result = asm ( // inputs
        "{rcx}" cnt: usize = count,
        "=r" lhs: usize = &left,
        "=r" rhs: usize = &right,
        // clobbers
        "al", "rcx", "cc")
        -> ( // outputs
        "=r" res: u8)
        // multiline asm string
        \\1:
        \\movb -1(%[lhs], %rcx, 1), %al
        \\xorb -1(%[rhs], %rcx, 1), %al
        \\orb %al, %[res]
        \\decq %rcx
        \\jnz 1b
);

This is a good example of why we should retain the constraint syntax, since we might want {rcx} or =r.

I'm not too familiar with x86 asm; I nicked that example from the Rust discussions. In this case rcx (and ecx in 32 bit) is a loop counter, somewhat similar to ctr on PowerPC, so the movq, decq, jnz drive the loop. As long as that condition is met, it probably doesn't matter how the count gets into rcx. Maybe it could have been done with the loop instruction, which decrements and tests at the same time.

result is both an input and an output, like if you were updating a cksum or similar where you would feed in an initial or intermediate value that you want to update.
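
To make the in/out role of result concrete, here is a rough plain-C sketch of what the assembly loop above computes: walking from the last byte to the first, it ORs the XOR of each byte pair into a running value. The function name xor_accumulate is made up for illustration. One difference is deliberate: this version returns result unchanged when count is 0, whereas the assembly as written decrements before testing and so assumes count is at least 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold the byte-wise XOR of lhs and rhs into result, last byte first.
 * A return value of 0 means all count bytes matched and result started at 0. */
static inline uint8_t xor_accumulate(const uint8_t *lhs, const uint8_t *rhs,
                                     size_t count, uint8_t result) {
    for (size_t i = count; i > 0; i--) {
        result |= (uint8_t)(lhs[i - 1] ^ rhs[i - 1]);
    }
    return result;
}
```

Since the initial value of result is read by the first orb, a binding that is output-only would lose it; in GCC-style constraints this is the read-write ("+r") case.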

Are you planning to support all the various architecture specific input/output/clobber constraints and indirect inputs/outputs present in LLVM?

Another avenue to go down is the MSVC way of doing inline assembly. M$ does a smart augmented assembly, where you can transparently access C/C++ variables from the assembly. An example would be a memcpy implementation:

void
CopyMemory(u8* Dst, u8* Src, memory_index Length)
{
    __asm {
        mov rsi, Src
        mov rdi, Dst
        mov rcx, Length
        rep movsb
    }
}

It provides a really nice experience. However, MSVC isn't smart about the registers, so _all_ registers used are saved to the stack before the assembly block and restored after it. This avoids the mess of having to specify clobbered registers, but at the cost of a fair bit of performance.

The smart syntax is awesome, but it might be hard to fit with an LLVM backend if you do not want to write an entire assembler as well.
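
For contrast, a sketch of the same copy in GCC-style extended asm in C: instead of naming variables inside the assembly text, each variable is tied to a register through a constraint, so the compiler saves only what it needs to. The name copy_bytes and the memcpy fallback for non-x86 targets are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies len bytes from src to dst and returns dst. */
static inline void *copy_bytes(void *dst, const void *src, size_t len) {
    void *ret = dst;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+D", "+S", "+c" pin dst, src and len to (r/e)di, (r/e)si and (r/e)cx
     * and mark them read-write, since rep movsb advances all three. */
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(len)
                      :
                      : "memory");
#else
    memcpy(dst, src, len); /* portable fallback for other targets */
#endif
    return ret;
}
```

The cost of this style is exactly the constraint clutter the thread is discussing: the assembly text is one line, and the bindings take three.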

As kiljacken says, I personally really, _really_ enjoy the Intel syntax over GAS as D has done it (except for GDC, which is based on GCC). I'm only assuming it'll be harder to implement a MSVC-styled inline assembly feature.

The end game is we will have our own assembly syntax, like D, which will end up being compiled to llvm compatible syntax. It's just a lot of work.

I at first tried to use the Intel syntax but llvm support for it is buggy and some of the instructions are messed up to the point of having silent bugs.

Points 1 and 2 in the OP seem to be solved.

OUTDATED

This has been split off into #5241. This comment will no longer be updated.

New Inline Asm Syntax

asm (arches) (bindings, clobbers) (:return_register|void|noreturn) { local_labels body } (else ...)? + config? (somewhere)

Arches

An optional list of target architectures. If this is null, the block is assumed to be for all architectures (an assembler error is always a compile error). Otherwise, one of these must match builtin.arch, or an else branch must be present. This is a list rather than a single value as some architectures have mutually compatible subsets (e.g. 8086/x86/x86_64, MIPS/RISC-V).

Bindings and Clobbers

Bindings have the form "register" name: type = init_value. name can be _, if the register is desired only for initialisation. name can also be a variable in scope, in which case type and init_value are omitted, and changes to this register's value are taken as changes to the variable. init_value can be undefined, in which case type can be omitted (it doesn't matter much in assembly anyway), unless name is the return register (more on this later). Clobbers are simply "register".

Return Register

A binding can be nominated as the return value, with :name. (Allowing :"register" would cause parsing ambiguity, and this can be trivially done with a binding anyway.) void and noreturn are also allowed. Reaching the end of a noreturn block is safety-checked UB.

Local Labels

A list of local labels. Formatted as strings.

Local labels are unique to the block: %(label) matches %(label) within the block, and is guaranteed not to match anything else in the program. They are listed within the braces of the body because they really don't make sense outside that context.

Body

The assembly code itself, as a string. If this fails to assemble, it's a compile error.

The following macros are defined:

  • %[name]
    Register, as specified in bindings section.
  • %(label)
    Label, as listed in local labels section.
  • @[variable]
    Pre-mangled global variable name. Used to reference globals. See #5211.
  • @(function)
    Pre-mangled function name. Used to call functions. See #5211.

A literal % or @ is escaped with another one: %% or @@. Strictly speaking, if we're substituting text, only one of @[] and @() is needed -- but, if we want to integrate the assembler with the compiler, the distinction may be important, so I've listed both.

Else

If arches is non-null and none of the listed architectures match builtin.arch, this is compiled instead. Can be used to switch on architectures, optimise a specific architecture only, or simply @compileError. If this is not present, a target mismatch is a compile error.

N.B.: An else branch is only allowed if arches is non-null. This decision was made because, when you set arches to null, either you know execution will never reach this point on the wrong architecture, or you only care about compiling for a specific architecture. In the former case, you definitely want an unexpected architecture to be a compile error; and in the latter, to support a new architecture, the laziest thing you can do is start caring.

Config

Configuration is passed in a pragma (#5239) with the following fields:

  • impure
    This block has side effects.
  • stack(n)
    This block allocates n bytes on the stack. Defaults to 0.
  • calls(funcs)
    This block calls the functions listed in funcs. Defaults to .{}.

Example

const builtin = @import("builtin");

const fib_asm = fn (n: u32) u32 {
    return asm (.{ builtin.Arch.riscv64, builtin.Arch.riscv32 }) @{
        stack(12),
        calls(.{ fib_iter }),
    } (
        "a0" this  : u32 = 0,
        "a1" next  : u32 = 1,
        "=r" to_go : u32 = n,
    ) :this {
        .{ "loop", "end" }

        \\%(loop):
        \\  beqz %[to_go], %(end)
        // We can do function pro/epi at callsite!
        \\  addi sp, sp, -12
        \\  sd ra, 0(sp)
        \\  sw %[to_go], 8(sp)
        \\  call @(fib_iter)
        \\  lw %[to_go], 8(sp)
        \\  ld ra, 0(sp)
        \\  addi sp, sp, 12
        \\  addi %[to_go], %[to_go], -1
        \\  j %(loop)
        \\%(end):
    } else @compileError("Your machine could be better");
};

// Actually returns two values, but the compiler has no way to express that
const fib_iter = fn @{callconv(.Naked)} (this: u32, next: u32) void {
    // No need to check architecture -- we'll only call this from fib_asm
    asm (null) @{impure} (
        "a0" this,
        "a1" next,
        "=r" temp = undefined,
    ) void {
        .{}

        \\  add %[temp], %[this], %[next]
        \\  mv %[this], %[next]
        \\  mv %[next], %[temp]
    };
};

TL;DR: Benefits over Status Quo

  • If any of the sections are missed, the compiler can detect exactly which ones
  • Order of mandatory components has a logical progression, just like function declaration
  • Option to tie to target architecture
  • Registers have types
  • Can express non-returning and valueless assembly
  • Can reference global variables and call functions
  • Won't unexpectedly jump to random points in the program
  • Communicates metadata to compiler, but does not require it
  • Provides alternative for unsupported architectures
  • Can be automatically distinguished from status quo, albeit with some lookahead
  • Can be automatically derived from status quo

Ok, sorry, I changed it. I can't help it, I'm a perfectionist.

Ok, it's a living document. I'll admit it.

I've split it off into its own issue. See above.

Hey @andrewrk -- given the emphasis on stabilisation in this release cycle, should we take the time to get this right now, so we're not stuck with it forever?

Hey, I did a fairly major rework of #5241 recently. Now there's a more powerful constraint syntax.

Possible inspiration from Rust: New inline assembly syntax available in nightly

For those who want to look further into that, there's more here.

There's a lot of good stuff there. The two deal-breakers for me are contextually repurposed syntax (out is not a function, reg is not a variable) and behind-the-scenes non-configurable action (assigning outputs). I've updated #5241 with the good stuff.
