Zig: Proposal: Support nested anonymous struct and unions.

Created on 4 May 2018 · 23 comments · Source: ziglang/zig

Proposal: Support nested anonymous struct and unions

| Section | Value |
|-----------------|-----------------------------------------------------------------|
| Author: | Byron Heads |
| Implementation: | |
| Status: | Draft |

Abstract

A useful and common pattern found in a few other languages is the ability for structs and unions to contain anonymous structs and unions.

Example: Views into packed unions

const ARGBColor = packed union {
   bytes: u32,
   packed struct {
      a: u8,
      r: u8,
      g: u8,
      b: u8,
   },
};

var color = ARGBColor{ .a = 0, .r = 255, .g = 0, .b = 255 };
color.r = 128;

glFunction(color.bytes);
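For comparison, here is a C11 rendering of the ARGBColor example. C11 already accepts an anonymous struct inside a union, so the channels are reached as `color.r` with no intermediate name. This is a sketch: `argb_color` and `make_magenta_then_dim` are illustrative names, and a `raw[4]` byte view is used instead of `bytes` so that nothing depends on host endianness.

```c
#include <stdint.h>

/* C11 anonymous struct inside a union: a, r, g, b become members of
   argb_color itself, so they are accessed as c.r rather than c.argb.r. */
typedef union {
    uint32_t bytes;          /* whole pixel as one 32-bit value */
    uint8_t raw[4];          /* byte view, independent of host endianness */
    struct {
        uint8_t a, r, g, b;  /* anonymous: no member name needed */
    };
} argb_color;

/* Mirrors the example above: build a color, then change one channel. */
static inline argb_color make_magenta_then_dim(void)
{
    argb_color color = { .raw = {0, 255, 0, 255} };  /* a=0, r=255, g=0, b=255 */
    color.r = 128;
    return color;
}
```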

const VirtualRegister32 = packed union {
  F32: u32,
  packed struct {
    H16: packed union {
      F16: u16,
      packed struct {
        H8: u8,
        L8: u8,
      },
    },
    L16: packed union {
      F16: u16,
      packed struct {
        H8: u8,
        L8: u8,
      },
    },
  },
};

var r1 = VirtualRegister32{ .F32 = 0 };

r1.L16.L8 = 9;
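The nested register example can be sketched in C11 the same way: the outer anonymous struct puts `H16` and `L16` directly on the union, and each half exposes `F16` plus anonymous `H8`/`L8` bytes. The names `virtual_register32` and `vreg_bytes` are illustrative. One caveat the sketch makes visible: `H8` is simply the first byte in memory, so on a little-endian host it is actually the low-order byte of `F16`.

```c
#include <stdint.h>
#include <string.h>

/* C11 sketch of VirtualRegister32: F32 overlays two named 16-bit halves,
   each of which exposes F16 plus anonymous H8/L8 bytes. */
typedef union {
    uint32_t F32;
    struct {                              /* anonymous: H16/L16 live on the union */
        union {
            uint16_t F16;
            struct { uint8_t H8, L8; };   /* anonymous: H16.H8, H16.L8 */
        } H16;
        union {
            uint16_t F16;
            struct { uint8_t H8, L8; };   /* anonymous: L16.H8, L16.L8 */
        } L16;
    };
} virtual_register32;

/* Copies the register's storage out as bytes, in memory order. */
static inline void vreg_bytes(const virtual_register32 *r, uint8_t out[4])
{
    memcpy(out, r, 4);
}
```

Usage mirrors the Zig proposal: `virtual_register32 r1 = { .F32 = 0 }; r1.L16.L8 = 9;` sets the last byte in memory order.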

Example: Struct OOP

struct base {
  int type_id;
  struct base *next;
};

struct derived {
  struct base; /* anonymous */
  int a_new_field;
  /* ... */
};

Pros

Removes the need to name nested structs where the name is not meaningful
Can reduce the need to cast between struct and union types (especially when working with C)
Reduces the need for property functions

Cons

Could add confusion at the call site
Can be solved by casting and property functions
Can be solved by naming the nested struct

proposal

Source

bheads

👍6

All 23 comments

How would you access that struct, though, without adding confusion at the call site? Doesn't the following accomplish what you are looking for, and more?

const ARGBColor = packed union {
    bytes: u32,
    argb: packed struct {
      a: u8,
      r: u8,
      g: u8,
      b: u8,
    },
};

BraedonWooding on 5 May 2018

In C the parent struct/union basically adopts the members of the anonymous child struct/union.
I also find it useful:

https://stackoverflow.com/questions/8932707/what-are-anonymous-structs-and-unions-useful-for-in-c11

sjpschutte on 5 May 2018

Yes, but my point was that it obscures the call site (I clarified my original comment a little): when you write .a, is it in a union with bytes alongside .r, .g, .b, or is it part of a struct that is itself unioned with bytes? I just don't see how adding it helps anything. Having to say that you are talking about the struct argb doesn't add noise; it adds clarity. Also, since Zig allows methods in structs, it complicates things :).

Take the following call: foo.bar.a makes it clear that a is a property of bar, which is a union member in foo, but foo.a removes that clarity and isn't much more succinct. Maybe I'm just someone who likes things to be explicit, but I don't see the gain here for the loss of clarity.

BraedonWooding on 5 May 2018

The idea is c.argb.a doesn't add any new information. In this case I care about the elements of the color, and need to pass the entire color to say gl.

I don't think this code is hard to understand or that naming the struct makes a difference.

var c = Color{
  .a = 0, 
  .r = 43, 
  .g = 213,
  .b = 0
};

c.g = 99;

glFunc(c.bytes);

Vs 

c.argb.g=86;

bheads on 5 May 2018

Can someone give an example that is not solved by Zig's other features?
Having a bytes representation can be done without unions:

const Color = packed struct { r: u8, g: u8, b: u8 };
var c: Color = undefined;
const bytes = ([]u8)((&c)[0..]);

Of course, you would probably have toBytes and asBytes functions.

As for another example from the stack overflow linked earlier:

typedef struct {
    bool is_float;
    union {
       float f;
       char* s;
    };
} mychoice_t;

Here, you should really use tagged unions in Zig:

const MyChoice = union(enum) {
    Float: f32,
    String: []const u8,
};

Before we can consider nested anonymous structs/unions we need to know the exact use case. Arguing over example code that is solved by Zig's other features is not very productive. Link some Zig code that could be improved by this feature, or link some C code that is not easily mapped to Zig because of nested struct/unions. We want real world examples, not "made up on the spot to get a feature into a language" examples.

Hejsil on 5 May 2018

👍3
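To make the trade-off concrete, the Stack Overflow `mychoice_t` type can be completed in C11 as below. The anonymous union lets callers write `v.f` and `v.s` directly, but nothing checks `is_float` before a read, which is exactly the check a Zig `union(enum)` performs. The constructor names `choice_float` and `choice_string` are hypothetical, and `s` is made `const char *` so string literals can be stored without a cast.

```c
#include <stdbool.h>

/* Manually tagged variant: the tag and the payload are kept in sync
   only by convention, unlike Zig's union(enum). */
typedef struct {
    bool is_float;
    union {              /* anonymous: members are v.f and v.s */
        float f;
        const char *s;
    };
} mychoice_t;

static inline mychoice_t choice_float(float f)
{
    mychoice_t c;
    c.is_float = true;
    c.f = f;
    return c;
}

static inline mychoice_t choice_string(const char *s)
{
    mychoice_t c;
    c.is_float = false;
    c.s = s;
    return c;
}
```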

Recently I used it for color and vector representations. Zig solves tagged unions nicely, but this is for packed structs. I understand this is a QoL enhancement, but casting, bit shifting, and functions to do something as trivial as accessing the bytes in a color value seem like a bad solution.

Here is a better example (typing code on a phone sucks). Here the two example color formats both work with the foo function without it having to know what order the format is in.

const ARGB = union {
  bytes : u32, 
  struct {
    a: u8, r:u8, g:u8, b:u8
  }
};

const RGBA = union {
  bytes: u32,
  struct {
    r: u8, g:u8, b: u8, a:u8
  }
};

fn foo(c: var) void {
  c.r *=2;
  glFoobar(c.bytes);
}

And yes, the component struct could have a common name like bits or something, but I don't find that this adds more information, just more typing, since in most cases you're manipulating the components and passing an array of them off to be rendered.

bheads on 5 May 2018
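Both sides of this exchange can be seen in a small C11 sketch. Because the anonymous structs expose the same member names, one generic routine can double `.r` on either layout (C has no `var` parameters, so a macro stands in for the generic `foo`); yet the resulting `.bytes` differ between layouts, which is the point raised in the reply. `argb_t`, `rgba_t`, `fake_gl_upload`, and `DOUBLE_RED_AND_UPLOAD` are hypothetical names.

```c
#include <stdint.h>

/* Two layouts with the same anonymous member names but different byte order. */
typedef union { uint32_t bytes; uint8_t raw[4]; struct { uint8_t a, r, g, b; }; } argb_t;
typedef union { uint32_t bytes; uint8_t raw[4]; struct { uint8_t r, g, b, a; }; } rgba_t;

/* Stand-in for glFoobar: a real API would interpret bytes in one fixed order. */
static inline uint32_t fake_gl_upload(uint32_t bytes) { return bytes; }

/* Plays the role of Zig's `c: var`: works on any type that has members
   named r and bytes, because the anonymous struct exposes them directly. */
#define DOUBLE_RED_AND_UPLOAD(c) ((c).r *= 2, fake_gl_upload((c).bytes))
```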

There are two glaring issues with the example code you gave; the first one being that the order is different therefore the value of bytes will be different, yet you are passing it into the same function which presumably wants them in the same order? I just can't think of a function that wouldn't care about the order of those 4 components, without you telling it the specific order??

That is, "both work with the foo function without having to know what order the format is in" makes no sense to me. It's a colour; the order matters if you are setting a standard, and I don't see how glFoobar could use an invariant standard without you passing in a type that maps to that standard.

Regardless, if we look past this and pretend that the order doesn't matter, you could also just write:

const ARGB = packed struct {
    a: u8, r:u8, g:u8, b:u8,
    pub fn bytes(self: &this) []u8 {
        return @ptrCast(&u8, self)[0..4];
    }
};

const RGBA = packed struct {
    r: u8, g:u8, b: u8, a:u8,
    pub fn bytes(self: &this) []u8 {
        return @ptrCast(&u8, self)[0..4];
    }
};

fn foo(c: var) void {
  c.r *=2;
  glFoobar(c.bytes());
}

Note: I just did a translation; there are many things you could do to improve this, it's a rough idea though.

Keep in mind that I'm presuming your code is correct! Maybe double check glFoobar, because, as I've said, I can't think of one that is order-invariant :).

BraedonWooding on 5 May 2018
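The accessor-function alternative has a direct C counterpart. The sketch below keeps the channels in a plain struct and derives the 32-bit view in a small inline function; `memcpy` is the portable C way to reinterpret the storage, and optimizing compilers usually reduce it to a single 32-bit load once the call is inlined (an expectation, not a guarantee). `argb_plain` and `argb_bytes` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Plain struct, no union: the channels are ordinary fields. */
typedef struct { uint8_t a, r, g, b; } argb_plain;
_Static_assert(sizeof(argb_plain) == 4, "expected no padding");

/* Accessor in the spirit of the bytes() method above: copy the four
   channel bytes into a u32 without any pointer casting. */
static inline uint32_t argb_bytes(const argb_plain *c)
{
    uint32_t out;
    memcpy(&out, c, sizeof out);
    return out;
}
```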

Yes, foo doesn't care about the byte order in this example; all it cares about is doubling the red channel and passing it to a fake OpenGL function. That function would care about the order, but that is not foo's concern (foo assumes you passed the correct struct for the format of the OpenGL device). This is the beauty of generic code: why implement foo for each possible color format (that could include 24-bit colors too!)? This also assumes lots of interaction with C functions.

Having to cast seems wrong, and the function is unnecessary in that case. Does the compiler ensure it's inlined? If not, then you won't use the function in critical paths of code and will end up always casting, and code with lots of casting == bugs.

Without this feature I would just name the nested struct; the union is too useful to give up, but the name adds more to remember without adding anything.

here is another version of the color object(note: I am not familiar with aligning in zig yet so might be wrong).

const ColorARGB= packed union align(4) {
  bytes : u32, 
  packed struct {
    a: u8,
    packed union align(1) {
       pure: u24,
       packed struct {
          r:u8,
          g:u8,
          b:u8
      }
    }
  },
  array: [4]u8,  
};

Vectors are also common:

const Vector = packed union align(16) {
  array: [4]f32,
  packed struct {   // this can be named, but naming it adds more to remember
    x: f32,
    y: f32,
    z: f32,
    w: f32,
  },
  simd: f128,
};

bheads on 5 May 2018
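The Vector example maps onto C11 as well, minus the `f128` member, since standard C has no portable 128-bit float type. In this sketch `_Alignas(16)` gives the union the 16-byte alignment that aligned SSE loads expect, and the anonymous struct makes `v.x` alias `v.array[0]`. `vec4` and `vec4_dot` are hypothetical names.

```c
#include <stddef.h>

/* C11 sketch of Vector: an array view and an x/y/z/w view of the same
   16 bytes; the SIMD member from the Zig example is omitted. */
typedef union {
    _Alignas(16) float array[4];
    struct { float x, y, z, w; };   /* anonymous: v.x aliases v.array[0] */
} vec4;

/* Dot product via the array view; code elsewhere can use v.x, v.y, ... */
static inline float vec4_dot(const vec4 *p, const vec4 *q)
{
    float sum = 0.0f;
    for (size_t i = 0; i < 4; i++)
        sum += p->array[i] * q->array[i];
    return sum;
}
```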

I suggest you run some C code that does what you want; you'll find that the bytes value is NOT the same (you may have to mark the structs as volatile else C may optimise the order, though I sincerely doubt it). This is talking about the ARGB and RGBA example of course.

Your function doesn't 'care about order' as all it does is access r but as I said glFoo is the one that cares about it not foo. glFoo must care about the order of the bytes property else it must perform some superfluous operation, since you get given an array that is either [A, R, G, B] or [R, G, B, A] literally none of the members are in the same order. :). So by extension your function does care about order, but as I've said before the source of that is glFoo :).

Now onto some of your comments and your examples; I'll split it up so I can clarify a few things.

Having to cast seems wrong and the function unnecessary in that case.

Maybe have a look at what casting actually is in code to understand; raw data has no type, so when you cast typically you are indicating to the compiler what type you want it to pretend it is. So even if you use a union you are still casting; it is just up to whether or not you see the cast in your code or if it is behind the scenes. Unnecessary enough to add an entire new construct? I would say not, I would say it is even extra necessary to indicate what is actually going on.

Does the compiler ensure its inlined?

Different issue, not dependent on this; and yes it would/should.

if not then you won't use the function in critical paths of code, and will end up always casting, and code with lots of casting == bugs.

Why wouldn't you? I think you're overvaluing the benefit of inlining. A function call is pretty minimal as all things go. I don't understand how this amounts to lots of casting? You're performing a single cast per 'bytes' call, the same as if you had a flattened nested struct. I'll cover the fact that casting is often not actually carried out in assembly/machine code a little further below :).

As with your other examples I don't see any real benefit towards having them be non-nested? For example let's break down the vector one;

The simd bit follows the same casting approach as before, though you'll have to align-cast the Vector pointer to widen its alignment to 16, then ptr-cast it across to an f128 (similarly if you wanted to get a u32 from the colour).
- i.e. do const simd = @ptrCast(&f128, @alignCast(16, &self));
And to get it as an array you can just do @ptrCast(&f32, &self)[0..4], which would give you a slice of the contents; again, I don't see any difference.

Of course, in some cases casting will add more instructions, for example when casting to a higher promoted type such as f128, and definitely when you cast across bounds like floating point to integer (though this isn't relevant when talking about pointer casting). So in some cases it'll add an instruction or two, but in reality those instructions would exist even if you just used a flattened nested struct.

Basically this issue seems to be at a standstill. To convince me (not speaking for everyone of course, but I would presume @Hejsil too, based on his comments), I would need to see a significant difference in some of these areas:

- ease of calling: having .simd() vs .simd is purely subjective. I personally prefer the function, since you could even block editing through it, and adding a whole new feature for a subjective choice like this is not a good enough reason.
- a difference in the instructions generated
- a safer way to do things
- a real world example!

Maybe instead we should add a small std module that helps you perform these casts safer such as std.cast.widenStructAsSingle(Vector) (super bad name) which would give you the f128 simd result, of course you would wrap this up in a nice function; but this would be error free as we would provide some nice compile time checks. Or maybe have a std.cast.toArray(Vector) which would return each struct member as an array. These would remove any chance for 'bugs' :), as well as accomplish what you are looking for.

BraedonWooding on 5 May 2018

Huh? If C unions didn't keep the order, then unions would not be safe to use. Maybe you're talking about packing; with align, all of these unions would work in C as intended. Also, C/C++ don't rearrange the order of members. They can add padding between members based on alignment, but that's it. Without that guarantee you couldn't pass unions/structs around. So yes, without align and packed the struct of u8 may not line up (though I don't know of any modern compiler that wouldn't pack that correctly), but anyone doing this kind of data-packing optimization would understand this, or their code wouldn't work.

Function calls are expensive in critical-path code (16 ms is not a lot of time to update thousands of vectors and color arrays): they save registers to the stack, and you risk a cache miss when calling the function. In C/C++ your example functions would all be macros, or you'd use a compiler that inlines your code or errors if it cannot.

The purpose of the union is to not cast and not have the data modified at all. In the vector example you want that data to fit into a simd register and you want to avoid calling the unaligned asm, you also want to access each axis without bit shifting and casting all the time.

So currently, just naming the components is really the alternative; this request is a QoL enhancement.

Some examples from github projects:
https://github.com/arkanis/single-header-file-c-libs/blob/master/math_3d.h#L155
https://github.com/scoopr/vectorial/blob/master/include/vectorial/simd4f_sse.h#L30

bheads on 5 May 2018

Fairly certain the functions in question will in fact be inlined. You can enforce this by marking them as inline and get a compile error if for some reason that can't be done.

const warn = @import("std").debug.warn;

pub fn main() void
{
    const ARGB = packed struct {
            a: u8, r:u8, g:u8, b:u8,
    };
    const Color = extern union {
        bytes: u32,
        argb: ARGB,
    };

    var x = Color{ .argb = ARGB{.a = 0xDE, .r = 0xC0, .g = 0xEF, .b = 0xBE,},};

    warn("{X8}\n", x.bytes);
}

is functionally identical to

const warn = @import("std").debug.warn;

pub fn main() void
{
    const ARGB = packed struct {
        a: u8, r:u8, g:u8, b:u8,

        pub inline fn bytes(self: &this) &u32 {
            return @ptrCast(&u32, @alignCast(@alignOf(u32), self));
        }
    };

    var x = ARGB{.a = 0xDE, .r = 0xC0, .g = 0xEF, .b = 0xBE,};

    warn("{X8}\n", *x.bytes());
}

tgschultz on 5 May 2018

Edited as it sounded a bit harsh; I didn't mean to be vitriolic, it just came out that way. I'm sorry @bheads if you took offense; it was written as I woke up :).

It will inline! Even if it doesn't, you can force it to inline with that keyword, which, unlike C++ __inline or whatever it is called, will actually inline.
About C/C++ moving things around: I was actually wrong, thanks for the correction; I updated my comment to remove that bit. What I was thinking of was something different entirely. Regardless, it was a separate point and not relevant; I was just trying to help you see why your code would produce a different bytes field, as the order wasn't the same.
And as stated above, they are not just functionally identical; they are instruction-identical.
Function calls aren't expensive. I don't know what position you're talking from, but unless you're targeting a very slow microprocessor where you want to squeeze out every last bit of performance, you simply won't care that much about function calls; the biggest cost to them is the impact on the cache, imo. Regardless, keep in mind I stated they weren't 'that' expensive; running anything millions of times adds up. AGAIN, though, as a final time: the compiler will inline it...
The data isn't modified; when you use unions, however, you do allow the data to be modified? I gave you an explicit description of what a cast is instruction-wise (not very detailed, but to a moderate depth).
I don't think align/ptrcast does bit shifting, but functionally and instructionally it does the same thing as your union example :). So if either one does casting/bit shifting, so will your union. If they aren't instructionally the same and one produces slower code, either that is a bug/missed optimisation or a valid reason why this feature is good; maybe go have a look :).
Finally, thanks for the GitHub files. The second one just straight up doesn't use nested anonymous structs/unions (maybe a mistake), and the first one really needs to, otherwise the naming gets all off; and if you don't want to, you could easily make a function that returns that, since casting from an array of [16] to [4][4] is possible. Again, casting in this case is just a compiler thing, I'm about 99% sure; it'll probably require a single instruction for size or something, but as long as you keep them as arrays till the end and then make them slices, everything will be fine.

Overall, maybe IRC is a better place to talk about inlining and other non-relevant points; as stated below, let's keep this about nested unions.

BraedonWooding on 6 May 2018

I haven't read this issue yet, I'll try to clear up the confusion when I do.

But let's remember our community motto

Be excellent to each other

@BraedonWooding No need to repeat yourself. Even if someone is nitpicking, don't accuse them of nitpicking. Just ignore it and move on.

Meta-conversation is off topic. That means this comment (the one I am typing right now) is off topic and replying to this comment is off topic.

andrewrk on 6 May 2018

👍1

Back to the topic at hand...

I do use something like this in C where you have a sort of fake inheritance
using anonymous structs. You have a base struct. Other structures embed
the base struct at the beginning of themselves. You then upcast pointers
from the base type to the actual type. It was a pattern I first saw in
Amiga programming more decades ago than I want to think about. Often you
had a member "base" at the beginning of each derived struct. However, it
became a little nicer and cleaner with anonymous structs.

struct base {
     int type_id;
     struct base *next;
};

struct derived {
     struct base; /* anonymous */
     int a_new_field;
     /* ... */
};

Now if I have a variable of type derived, then it appears to have fields
type_id and next. If I change the definition of base, derived automatically
gets the changes.

I have not had much use for anonymous unions but where I have seen them
used was for some form of variant record:

struct {
    int variant_type;
    union {
        struct variant_A a_var;
        struct variant_B b_var;
        struct variant_C c_var;
        /* ... */
    }; /* anonymous */
} variant_struct;

Obviously, this variant record type is better handled in Zig. However, the first example with
the anonymous structs is pretty handy. While enum structs in Zig cover
most of this, they are not possible to extend after the fact. The C
version is.

Grr. I replied via email and now the formatting is gone... Anyone know how to fix that?

kyle-github on 6 May 2018
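The base/derived pattern described above can be written portably in C by giving `base` a name and keeping it as the first member: standard C guarantees that a pointer to a struct, suitably converted, points to its first member and vice versa. Embedding `struct base;` anonymously, as in the comment, is not standard C11 and needs a compiler extension such as GCC's `-fms-extensions` or `-fplan9-extensions`. `TYPE_DERIVED` and `as_derived` are hypothetical names.

```c
#include <stddef.h>

enum { TYPE_DERIVED = 1 };   /* hypothetical type id */

struct base {
    int type_id;
    struct base *next;
};

struct derived {
    struct base base;        /* named, and must stay the first member */
    int a_new_field;
};

/* Downcast with a runtime check on the type id; returns NULL on mismatch.
   The cast is valid only because base is the first member of derived. */
static inline struct derived *as_derived(struct base *b)
{
    return (b && b->type_id == TYPE_DERIVED) ? (struct derived *)b : NULL;
}
```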

@kyle-github Your first example is pretty neat, actually. I didn't know you could "inline" an existing struct into another in C. This C OOP pattern is also used in Zig code, though a little differently. Because Zig is allowed to rearrange the fields of a non-packed struct, casting from base to derived is not a good idea, as you have no guarantee that base is the first field of derived. Instead, Zig has the @fieldParentPtr builtin (example). This builtin requires that the base field in derived have a name.

For packed structs in Zig, your OOP model would work just fine, as the order is preserved. I do, however, think that even if you are using packed, you should probably still use @fieldParentPtr, as it reduces the opportunity for bugs. Rearrange any fields in your struct, and @fieldParentPtr will still work as expected. And if you ever change the type of the base field in derived, @fieldParentPtr gives a compile error.

@fieldParentPtr does not, however, give you base's fields in derived, which is the main argument for nested anonymous structs/unions.

Also, on another note: aren't unions that represent data in two or more forms only really a thing because they avoid C strict-aliasing bugs, since casting between pointer types violates strict aliasing?

Hejsil on 6 May 2018
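On the strict-aliasing question, a C sketch of the two well-defined ways to view a float's bits: reading the other member of a union (permitted in C, though not in C++) and `memcpy`. The tempting `*(uint32_t *)&f` is undefined behavior under C's effective-type rules. The function names are hypothetical, and the expected bit pattern in the checks assumes IEEE 754 binary32 floats.

```c
#include <stdint.h>
#include <string.h>

/* Union type punning: writing f and reading u is defined in C. */
static inline uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun = { .f = f };
    return pun.u;
}

/* memcpy reinterpretation: defined in both C and C++. */
static inline uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    _Static_assert(sizeof u == sizeof f, "float must be 32 bits");
    memcpy(&u, &f, sizeof u);
    return u;
}
```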

Hi @Hejsil, yes the "OOP" method is heavily used in a lot of C. Often the "base" object will actually be the vtable plus some sort of type ID or something simple.

I am not sure I am tracking your point about @fieldParentPtr. Unless I am understanding its use incorrectly, it seems to go in the other direction: downcasting rather than upcasting?

The point about extensibility is that using something like enum struct is hugely more clean but it cannot be extended later. You must modify the enum directly to add another type. The C method allows you to extend a base type in a different compilation unit. It is definitely limited, but does not constrain you to modify the original code.

kyle-github on 6 May 2018

@kyle-github

const Base = struct {
    const Self = this;

    type_id: u64,
    next: ?&Self,

    // Could use a vtable instead.
    fooFn: fn(&Self) usize,

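    // type_hash is assumed to be a user-provided comptime helper that
    // maps a type to a unique id; it is not a builtin.
    fn cast(self: &Self, comptime D: type) ?&D {
        if (type_hash(D) != self.type_id)
            return null;
        return @fieldParentPtr(D, "base", self);
    }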

    fn foo(self: &Self) usize {
        return self.fooFn(self);
    }
};

const Derived = struct {
    const Self = this;

    base: Base,
    data: usize,

    fn init(data: usize) Self {
        return Self { .base = Base { .type_id=type_hash(Self), .next=null, .fooFn=foo }, .data=data };
    }

    fn foo(base: &Base) usize {
        const self = @fieldParentPtr(Self, "base", base);
        return self.data;
    }
};

alexnask on 7 May 2018
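For C readers, the counterpart of `@fieldParentPtr` in the code above is the `container_of` idiom popularized by the Linux kernel: subtract the field's `offsetof` from the field pointer to recover the enclosing struct. It is defined locally here rather than taken from any header, and unlike the first-member cast it works wherever `base` sits in the struct. `derived_data_from_base` is a hypothetical helper.

```c
#include <stddef.h>

/* Recover a pointer to the enclosing struct from a pointer to one of
   its fields, like Zig's @fieldParentPtr. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct base { int type_id; };

struct derived {
    int data;                /* base deliberately NOT the first member */
    struct base base;
};

static inline int derived_data_from_base(struct base *b)
{
    return container_of(b, struct derived, base)->data;
}
```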

Struct embedding has a long history before Go made it popular. The origin seems to be the Plan 9 C compiler (GCC has it today, see https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html). If it is "standard enough" for a flavor of C to have it, it might be worthwhile to do in Zig. Then again, Zig has much better support for inheritance.

isaachier on 11 Jun 2018

👍2

Came here to write this very proposal. I've used it in both C and Go, and it has been very useful for code clarity, in my opinion. I hope it makes it in. 😄

Nairou on 12 Nov 2019

I think the following issue forms an argument in this discussion, but I do not know for which side.
When I import a C header in zig with

use @cImport({
    @cInclude("rgb.h");
});

containing following structure:

typedef struct{
    union {
        struct {
            union {
                uint8_t r;
                uint8_t red;
            };
            union {
                uint8_t g;
                uint8_t green;
            };
            union {
                uint8_t b;
                uint8_t blue;
            };
        };
        uint8_t raw[3];
    };
}  CRGB ;

I would like to be able to interface with C libraries containing these features / headers,
since one of Zig's selling points is the seamless integration with existing C libraries.

belse-de on 19 Dec 2019

👍5
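To show what a Zig translation would need to preserve, here is the CRGB type from that header used as ordinary C11: every channel has two names plus a raw byte view, all sharing storage. The helper `crgb_set` is hypothetical.

```c
#include <stdint.h>

/* The CRGB type from the comment: nested anonymous unions and structs
   give each channel two names plus a raw byte-array view. */
typedef struct {
    union {
        struct {
            union { uint8_t r; uint8_t red; };
            union { uint8_t g; uint8_t green; };
            union { uint8_t b; uint8_t blue; };
        };
        uint8_t raw[3];
    };
} CRGB;

/* Hypothetical helper: set all channels through the long names. */
static inline void crgb_set(CRGB *c, uint8_t red, uint8_t green, uint8_t blue)
{
    c->red = red;
    c->green = green;
    c->blue = blue;
}
```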

Thanks @Hejsil I was glad to be pointed to this issue by @vexu.

...some C code that is not easily mapped to Zig because of nested struct/unions. We want real world examples.

I am currently porting liburing's io_uring helpers to Zig.

I realized that mapping io_uring_sqe to Zig is almost impossible to do in a future-proof way because the kernel is adding loads of unions to the struct, and our std lib definition can't keep up without breaking all existing Zig code that references the struct using the pure u32 or u64 instead of going through the union.

i.e. Every time we update our std lib version of io_uring_sqe to add a new union to match the kernel, we break all Zig code that uses the older version of the struct.

Real world code example is here: https://github.com/ziglang/zig/issues/6349

jorangreef on 15 Sep 2020

👍1
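The compatibility problem described above can be illustrated in C with a simplified, hypothetical struct (loosely modeled on `io_uring_sqe`, NOT its real layout). Wrapping an existing field in an anonymous union adds new names for the same bytes while keeping the old name and offset, so existing source keeps compiling unchanged; without anonymous unions, the new version would force every user to go through a union member name.

```c
#include <stddef.h>
#include <stdint.h>

/* Version 1: a plain flags field. */
struct sqe_v1 {
    uint8_t opcode;
    uint32_t rw_flags;
};

/* Version 2: the same bytes gain operation-specific names via an
   anonymous union; the old name and its offset are unchanged. */
struct sqe_v2 {
    uint8_t opcode;
    union {
        uint32_t rw_flags;
        uint32_t fsync_flags;
        uint32_t poll32_events;
    };
};

_Static_assert(offsetof(struct sqe_v1, rw_flags) == offsetof(struct sqe_v2, rw_flags),
               "layout must not change");

/* The same source line compiles against both versions. */
static inline void set_flags_v1(struct sqe_v1 *s, uint32_t f) { s->rw_flags = f; }
static inline void set_flags_v2(struct sqe_v2 *s, uint32_t f) { s->rw_flags = f; }
```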

This feature could use this syntax:

const U = extern union {
    full: u32,
    usingnamespace _: extern struct {
        a: u8,
        b: u8,
        c: u8,
        d: u8
    },
};

Tetralux on 15 Sep 2020

There is a parallel discussion in #6349

Rocknest on 21 Sep 2020
