Crystal: [Feature Request] Add bitfields to C structs

Created on 14 Jan 2017 · 41Comments · Source: crystal-lang/crystal

Summary

This is an RFC to add bit-fields to structures. This only applies to C structs (i.e. structs defined within a lib).

Motivation

I'm writing an Operating System kernel in Crystal and had to resort to ugly bit twiddling to simulate bit-fields, which significantly reduces readability and maintainability.

I'm aware that this is a very specific use case, but I'm positive that the vast majority of Crystal users, especially those who are used to working with C code, could benefit from this.

Detailed design

Currently, C structs look somewhat like this:

lib C
  struct Foo
    foo : UInt32
    bar : UInt32
  end
  struct Bar
    foo, bar : UInt32
  end
end

Bit-fields could be added in a backward-compatible way by adding another colon after the type, followed by the size in bits. This syntax is currently illegal, and shouldn't be used in any existing codebase, so doing it that way would probably be safe. This approach would look somewhat like the following:

lib C
  struct Foo
    hi : UInt32 : 16
    lo : UInt32 : 16
  end
  struct Bar
    hi, lo : UInt32 : 16
  end
end

Another way would be to prepend the size in bits to the field, followed by a colon and the usual field definition. Since identifiers can't start with a number, this would also currently be considered illegal by the compiler, and also shouldn't break any existing codebase. That approach would look somewhat like the following:

lib C
  struct Foo
    16: hi : UInt32
    16: lo : UInt32
  end
  struct Bar
    16: hi, lo : UInt32
  end
end

I personally prefer the second syntax, because I think that it's more readable, especially when multiple fields are defined on the same line.

Update

@RX14 proposed the following syntax:

lib C
  @[Packed]
  struct Foo
    @[BitSize(16)]
    hi : UInt32
    lo : UInt32
  end

  @[Packed]
  struct Bar
    @[BitSize(16)]
    hi, lo : UInt32
  end
end

All of the aforementioned approaches would roughly translate to the following C code:

struct Foo {
  uint32_t hi : 16
  uint32_t lo : 16
} foo_t

Implementation details

Bit-fields should only work with integer primitives for reliability.

The size of the struct should equal the sum of all bits, in bytes (as usual).
That's a little more complicated when using bit-flags and can be confusing if you're not familiar with that concept. The following is an example of how sizes should be calculated:

lib C
  @[Packed]
  struct Foo
    @[BitSize(16)]
    foo_hi, foo_lo : UInt32 # 16 bits + 16 bits = 4 bytes
    bar : UInt32 # 32 bits = 4 bytes
  end # 8 bytes
end

discussion

Source

SplittyDev

👍5

All 41 comments

The first notation could work, I don't like the second one. I want to propose a third option, which may look more like a generic type:

lib C
  struct Foo
    foo : Int32(2)
    bar : Int32(2)
  end
  struct Bar
    foo, bar : Int32(2)
  end
end

Many data structures in networking and on-disk files use this to save some bytes here and there. And just like in C, this is not an every-day feature, but one which makes code having to do this so much more readable and safer. Even if none of our notation proposals are accepted, I'd like to see this feature.

Papierkorb on 15 Jan 2017

@Papierkorb I thought about doing it like that too, but I think there's too much ambiguity between that and actual generics. Especially since numbers are allowed in real generics (e.g. StaticArray(T, N)).
I guess it wouldn't cause any issues since generics aren't allowed in libraries anyway, but still, it'd be easy to confuse that with actual generics.

SplittyDev on 15 Jan 2017

Especially since numbers are allowed in real generics

Which was exactly my point:

which may look more like a generic type

Papierkorb on 15 Jan 2017

@Papierkorb Ah yeah, I see. Well I wouldn't mind either way, I just think it's kinda confusing.

SplittyDev on 15 Jan 2017

Using Int32(2) doesn't make much sense except from c compatibility. It should be BitInt(2).

konovod on 15 Jan 2017

@konovod How would BitInt(N) make more sense?
BitInt would be a completely new type with the same functionality as a normal integer. So why add unnecessary complexity?

Neither of field : T : N or N: field : T requires new types.
Also, you really don't wanna access the bit-fields as actual bit arrays. You wanna be able to work with them in the same way as you'd work with a normal integer. That's the whole point of having bit-fields.

SplittyDev on 15 Jan 2017

How about making this an attribute like @[Packed]. It's not such a common feature so verbosity should be fine.

RX14 on 15 Jan 2017

@RX14 sounds good, but wouldn't the attribute be superfluous?
Let's say we have the following struct:

lib C
  @[PackedBits]
  struct Foo
    2 : foo, bar, baz : UInt32
  end
end

Wouldn't just using @[Packed] be sufficient? The compiler should be able to easily detect the presence of a bit-field by just matching against a new rule. So even though an attribute would clearly indicate the presence of bit-fields, I don't know whether it's actually necessary.

SplittyDev on 15 Jan 2017

@SplittyDev no, I meant like this:

struct Foo
  @[BitSize(2)]
  foo, bar, baz : UInt32
end

RX14 on 15 Jan 2017

👍1

@RX14 Oh, yeah I think that's a good idea.
Would that cause any issues with the way multiple fields are handled?
I'm not sure how Crystal handles that, but in other languages I've used, attributes only apply to the first element immediately following the attribute, so would it even be possible to make one attribute work with multiple fields?

SplittyDev on 15 Jan 2017

@SplittyDev well, BitInt isn't ideal, but why UInt32? What will change if it is UInt64 or UInt16? And what will change if it will be signed Int32 as in first post? I think the name should show that it is a special field, something like UInt(N), but Int is already used and tells nothing about N is bits, not bytes.
On the other hand i like the idea of feature and any syntax is fine because usages will be somewhere in the low-level implementations, not everyday code.

konovod on 15 Jan 2017

@konovod There's a reason for that. Consider the following code:

lib C
  @[Packed]
  struct Foo
    @[BitSize(8)]
    high, low : UInt16
  end
end

Now, both high and low are one byte wide, but their type is still UInt16, which is two bytes wide.
The important part here is that sizeof(Foo) is actually equal to sizeof(UInt16).

Without bit-fields you'd do something like that to get the same effect:

struct Foo
  def initialize(@value : UInt16)
  end

  def high
    (@value >> 8) & 0xFF
  end

  def low
    @value & 0xFF
  end
end

SplittyDev on 15 Jan 2017

I know what bitfields are for, i used them. But in this example what is a point of naming high and low UInt16, not UInt8 or UInt32? It is better (both to compiler and to the reader of code) if they will have specific type that has a size of N bits. 16 is a just misleading.

konovod on 15 Jan 2017

But in this example what is a point of naming high and low UInt16, not UInt8 or UInt32?

I'm afraid I don't quite understand what you mean by that.
Of course, in that trivial example I could've used two UInt8 fields instead of two UInt16 fields with a width of 8 bits. And in that case it would've also been a lot cleaner to do it that way.

That was only a very basic example though. Here's a more complex one:

lib C
  @[Packed]
  struct Page
    @[BitSize(1)] present, rw, user, accessed, dirty : UInt32
    @[BitSize(7)] unused : UInt32
    @[BitSize(20)] frame : UInt32
  end
end

Which equals the following C code:

struct page {
    uint32_t    present    : 1;
    uint32_t    rw         : 1;
    uint32_t    user       : 1;
    uint32_t    accessed   : 1;
    uint32_t    dirty      : 1;
    uint32_t    unused     : 7;
    uint32_t    frame      : 20;
} __attribute__((packed));

The size of the Page structure equals (5 + 7 + 20) = 32 bits (= 4 bytes).

SplittyDev on 15 Jan 2017

compare with

 lib C
   @[Packed]
   struct Page
     @[BitSize(0x01)] present, rw, user, accessed, dirty : UInt16
     @[BitSize(0x07)] unused : UInt16
     @[BitSize(0x14)] frame : UInt16
   end
 end

the size is still 32 bits = 4 bytes. What have changed for a compiler? Nothing. What UInt16 means in this context - just nothing. So it would be better if it would be some UIntGeneric type.

konovod on 15 Jan 2017

@konovod By adding a generic type to be used with bit-fields you're basically creating an integer that could hold infinitely many bits. Code like that would break on a wide range of CPU models and architectures. Usually, the max size of a bit-field is limited by its type.

SplittyDev on 15 Jan 2017

@SplittyDev Well, that depends on implementation. It can e.g. be always 32 (64?) bits, or automatically widens to a minimal size that can hold given number of bits (and fails to compile if @[BitSize(65)]).

konovod on 15 Jan 2017

@konovod In that case you'd still have the problem that you couldn't work with the generic integer easily. What if a function expects an Int32 and you pass it an UIntGeneric? I think adding something like that just overcomplicates everything.

SplittyDev on 15 Jan 2017

@SplittyDev such problem already exists - if something expects Int16 and you pass 0 there it fails because you must type 0i16. I hope this will be solved eventually and you will be able to pass any integer to function that expects integer without unnecessary conversions. Anyway, right now you still have to convert Int16 to pass it to function that expects Int32, i don't think that converting an UIntGeneric too will add much troubles.

konovod on 15 Jan 2017

I guess we could fully embrace LLVM's arbitrary-sized integers with stuff like UInt(7).

RX14 on 15 Jan 2017

@konovod Crystal doesn't support implicit type casts. Why would you add a type that you'd have to convert to another type every time you wanna use it, instead of just using the actual type?

I just don't see how that would be helpful, at all.

SplittyDev on 15 Jan 2017

@RX14 yeah that would be possible. Although I still don't get how one would benefit from doing it that way.

SplittyDev on 15 Jan 2017

@SplittyDev well i'm guessing that the underlying struct will be represented as such in the IR. Even on clang.

RX14 on 15 Jan 2017

@RX14 it probably will.

But I think the compiler should handle this, and not the user.
If the user wants the type to behave like a UInt16, then it should behave that way, even if the actual width (sign + value bits) is less than the size of the type.

C does it that way for a reason, and I think that it would be better to be consistent with C there.

SplittyDev on 15 Jan 2017

@SplittyDev looks like not actually: https://godbolt.org/g/KVQTxJ

RX14 on 15 Jan 2017

@SplittyDev Behavior will be same as Int16\Int8\Float32 - you have to convert it if you want to pass it somewhere where another type is expected. Completely consistent with the rest of language. And even if you declare the field as UInt32 you will have to use explicit conversion to e.g. write there a 0 or 1 or do anything useful.
Sign is another problem btw - i think signed bitfields should be just forbidden. They can create a lot of unexpected behavior and I can't imagine valid uses for them.

konovod on 15 Jan 2017

@konovod I've updated the 'implementation details' section of the RFC. Only allowing unsigned integers seems sensible.

@RX14, that's very interesting. I'm not very good at reading LLVM code, but it looks like this just allocates a chunk of memory big enough to hold the struct, and uses bitwise arithmetic to do the trick.

SplittyDev on 15 Jan 2017

@konovod actually, thinking about it, I don't think that allowing signed values would cause any issues.

SplittyDev on 15 Jan 2017

@SplittyDev after some thoughts i agree, they aren't such evil.

konovod on 15 Jan 2017

I'm pretty sure it's possible to write a macro that allows bitfield access to a property. Something like:

@[Extern]
struct Foo
  bitfields @raw : Int32, [
    hi: {Int32, 16},
    low: {Int32, 16},
  ]
end

The bitfields macro would define getters and setters, and using bit manipulation would set/get values through the underlying field.

asterite on 15 Jan 2017

I was thinking the same as @asterite.

We can move the struct out of lib, adding the @[Extern] flag so it continues to behave as a C struct we can now add methods, thus add getters and setters abstractions. Not as declarative as C bitfields, more expressive, but not particularly harder to read or write.

struct Foo
  @foo : Int32

  def hi
    @foo >> 4
  end

  def lo
    @foo & 0xffff
  end
end

ysbaddaden on 15 Jan 2017

@asterite, @ysbaddaden Let's say I have a struct with a size of 128 bits. I'd need two UInt64 fields to represent that and the macro to support that would quickly become extremely complicated. Also, what about packing? Would the struct really look like a C struct in memory?

Imho, a macro is only a temporary solution. Having support for that in the compiler would be a permanent solution and I'm sure that it's gonna be useful for a lot of low-level developers.

SplittyDev on 15 Jan 2017

@SplittyDev You can mark a struct with @[Packed] to make it packed. It's possible to do it with arbitrary sizes, my macro was just an example of a possible interface. For example we could have this:

@[Extern]
@[Packed]
struct Foo
  bitfields [
    hi: {Int32, 16},
    low: {Int32, 16},
  ]
end

Here we don't specify the backing field, it can probably be generate by the macro (or we can pass a name for that). Then for the total size we can use UInt8[total_size] and read/write from that.

I personally think that a general solution that doesn't need to modify the core language, if readable/usable enough, is always superior to modifying the language, specially for functionality that is very rarely needed.

asterite on 15 Jan 2017

👍1

@asterite That would only work for getting and setting aligned bits though.
Consider this:

struct {
  uint32_t a : 3
  uint32_t b : 3
  uint32_t c : 2
} foo_t;

~Something like that would be completely impossible with macros.~
Update: Forget what I said, it's wrong.

SplittyDev on 15 Jan 2017

@SplittyDev I don't understand why

asterite on 15 Jan 2017

Oh, I'm sorry @asterite I wrote that from my mobile and didn't pay enough attention. What I said is wrong.

SplittyDev on 15 Jan 2017

Is there something to do here? Do we define a macro in the stdlib or do leave this to a shard?

bew on 6 May 2018

That might be solved by using newly introduced (#6063) Annotation type (if merged).

Sija on 6 May 2018

Here's something I wrote inspired by @asterite's suggestion above. It's not actually storing the data in compressed bytes like in C but allows for reading in bitfields and outputting them again with changed values. This is what I needed for my use case of reading in 416 bits with random sized values. I'd appreciate any useful suggestions.

https://github.com/elorest/bitfields

elorest on 9 Dec 2018

❤1

Anybody here opposed to closing this? It seems like the usecases can be solved through a shard, which we can consider to include into the standard library if it becomes really popular or otherwise necessary.