Crystal: [Feature Request] Add bitfields to C structs

Created on 14 Jan 2017  路  41Comments  路  Source: crystal-lang/crystal

Summary

This is an RFC to add bit-fields to structures. This only applies to C structs (i.e. structs defined within a lib).

Motivation

I'm writing an Operating System kernel in Crystal and had to resort to ugly bit twiddling to simulate bit-fields, which significantly reduces readability and maintainability.

I'm aware that this is a very specific use case, but I'm positive that the vast majority of Crystal users, especially those who are used to working with C code, could benefit from this.

Detailed design

Currently, C structs look somewhat like this:

lib C
  struct Foo
    foo : UInt32
    bar : UInt32
  end
  struct Bar
    foo, bar : UInt32
  end
end

Bit-fields could be added in a backward-compatible way by adding another colon after the type, followed by the size in bits. This syntax is currently illegal, and shouldn't be used in any existing codebase, so doing it that way would probably be safe. This approach would look somewhat like the following:

lib C
  struct Foo
    hi : UInt32 : 16
    lo : UInt32 : 16
  end
  struct Bar
    hi, lo : UInt32 : 16
  end
end

Another way would be to prepend the size in bits to the field, followed by a colon and the usual field definition. Since identifiers can't start with a number, this would also currently be considered illegal by the compiler, and also shouldn't break any existing codebase. That approach would look somewhat like the following:

lib C
  struct Foo
    16: hi : UInt32
    16: lo : UInt32
  end
  struct Bar
    16: hi, lo : UInt32
  end
end

I personally prefer the second syntax, because I think that it's more readable, especially when multiple fields are defined on the same line.

Update

@RX14 proposed the following syntax:

lib C
  @[Packed]
  struct Foo
    @[BitSize(16)]
    hi : UInt32
    lo : UInt32
  end

  @[Packed]
  struct Bar
    @[BitSize(16)]
    hi, lo : UInt32
  end
end

All of the aforementioned approaches would roughly translate to the following C code:

struct Foo {
  uint32_t hi : 16
  uint32_t lo : 16
} foo_t

Implementation details

Bit-fields should only work with integer primitives for reliability.

The size of the struct should equal the sum of all bits, in bytes (as usual).
That's a little more complicated when using bit-flags and can be confusing if you're not familiar with that concept. The following is an example of how sizes should be calculated:

lib C
  @[Packed]
  struct Foo
    @[BitSize(16)]
    foo_hi, foo_lo : UInt32 # 16 bits + 16 bits = 4 bytes
    bar : UInt32 # 32 bits = 4 bytes
  end # 8 bytes
end
discussion

All 41 comments

The first notation could work, I don't like the second one. I want to propose a third option, which may look more like a generic type:

lib C
  struct Foo
    foo : Int32(2)
    bar : Int32(2)
  end
  struct Bar
    foo, bar : Int32(2)
  end
end

Many data structures in networking and on-disk files use this to save some bytes here and there. And just like in C, this is not an every-day feature, but one which makes code having to do this so much more readable and safer. Even if none of our notation proposals are accepted, I'd like to see this feature.

@Papierkorb I thought about doing it like that too, but I think there's too much ambiguity between that and actual generics. Especially since numbers are allowed in real generics (e.g. StaticArray(T, N)).
I guess it wouldn't cause any issues since generics aren't allowed in libraries anyway, but still, it'd be easy to confuse that with actual generics.

Especially since numbers are allowed in real generics

Which was exactly my point:

which may look more like a generic type

@Papierkorb Ah yeah, I see. Well I wouldn't mind either way, I just think it's kinda confusing.

Using Int32(2) doesn't make much sense except from c compatibility. It should be BitInt(2).

@konovod How would BitInt(N) make more sense?
BitInt would be a completely new type with the same functionality as a normal integer. So why add unnecessary complexity?

Neither of field : T : N or N: field : T requires new types.
Also, you really don't wanna access the bit-fields as actual bit arrays. You wanna be able to work with them in the same way as you'd work with a normal integer. That's the whole point of having bit-fields.

How about making this an attribute like @[Packed]. It's not such a common feature so verbosity should be fine.

@RX14 sounds good, but wouldn't the attribute be superfluous?
Let's say we have the following struct:

lib C
  @[PackedBits]
  struct Foo
    2 : foo, bar, baz : UInt32
  end
end

Wouldn't just using @[Packed] be sufficient? The compiler should be able to easily detect the presence of a bit-field by just matching against a new rule. So even though an attribute would clearly indicate the presence of bit-fields, I don't know whether it's actually necessary.

@SplittyDev no, I meant like this:

struct Foo
  @[BitSize(2)]
  foo, bar, baz : UInt32
end

@RX14 Oh, yeah I think that's a good idea.
Would that cause any issues with the way multiple fields are handled?
I'm not sure how Crystal handles that, but in other languages I've used, attributes only apply to the first element immediately following the attribute, so would it even be possible to make one attribute work with multiple fields?

@SplittyDev well, BitInt isn't ideal, but why UInt32? What will change if it is UInt64 or UInt16? And what will change if it will be signed Int32 as in first post? I think the name should show that it is a special field, something like UInt(N), but Int is already used and tells nothing about N is bits, not bytes.
On the other hand i like the idea of feature and any syntax is fine because usages will be somewhere in the low-level implementations, not everyday code.

@konovod There's a reason for that. Consider the following code:

lib C
  @[Packed]
  struct Foo
    @[BitSize(8)]
    high, low : UInt16
  end
end

Now, both high and low are one byte wide, but their type is still UInt16, which is two bytes wide.
The important part here is that sizeof(Foo) is actually equal to sizeof(UInt16).

Without bit-fields you'd do something like that to get the same effect:

struct Foo
  def initialize(@value : UInt16)
  end

  def high
    (@value >> 8) & 0xFF
  end

  def low
    @value & 0xFF
  end
end

I know what bitfields are for, i used them. But in this example what is a point of naming high and low UInt16, not UInt8 or UInt32? It is better (both to compiler and to the reader of code) if they will have specific type that has a size of N bits. 16 is a just misleading.

But in this example what is a point of naming high and low UInt16, not UInt8 or UInt32?

I'm afraid I don't quite understand what you mean by that.
Of course, in that trivial example I could've used two UInt8 fields instead of two UInt16 fields with a width of 8 bits. And in that case it would've also been a lot cleaner to do it that way.

That was only a very basic example though. Here's a more complex one:

lib C
  @[Packed]
  struct Page
    @[BitSize(1)] present, rw, user, accessed, dirty : UInt32
    @[BitSize(7)] unused : UInt32
    @[BitSize(20)] frame : UInt32
  end
end

Which equals the following C code:

struct page {
    uint32_t    present    : 1;
    uint32_t    rw         : 1;
    uint32_t    user       : 1;
    uint32_t    accessed   : 1;
    uint32_t    dirty      : 1;
    uint32_t    unused     : 7;
    uint32_t    frame      : 20;
} __attribute__((packed));

The size of the Page structure equals (5 + 7 + 20) = 32 bits (= 4 bytes).

compare with

 lib C
   @[Packed]
   struct Page
     @[BitSize(0x01)] present, rw, user, accessed, dirty : UInt16
     @[BitSize(0x07)] unused : UInt16
     @[BitSize(0x14)] frame : UInt16
   end
 end

the size is still 32 bits = 4 bytes. What have changed for a compiler? Nothing. What UInt16 means in this context - just nothing. So it would be better if it would be some UIntGeneric type.

@konovod By adding a generic type to be used with bit-fields you're basically creating an integer that could hold infinitely many bits. Code like that would break on a wide range of CPU models and architectures. Usually, the max size of a bit-field is limited by its type.

@SplittyDev Well, that depends on implementation. It can e.g. be always 32 (64?) bits, or automatically widens to a minimal size that can hold given number of bits (and fails to compile if @[BitSize(65)]).

@konovod In that case you'd still have the problem that you couldn't work with the generic integer easily. What if a function expects an Int32 and you pass it an UIntGeneric? I think adding something like that just overcomplicates everything.

@SplittyDev such problem already exists - if something expects Int16 and you pass 0 there it fails because you must type 0i16. I hope this will be solved eventually and you will be able to pass any integer to function that expects integer without unnecessary conversions. Anyway, right now you still have to convert Int16 to pass it to function that expects Int32, i don't think that converting an UIntGeneric too will add much troubles.

I guess we could fully embrace LLVM's arbitrary-sized integers with stuff like UInt(7).

@konovod Crystal doesn't support implicit type casts. Why would you add a type that you'd have to convert to another type every time you wanna use it, instead of just using the actual type?

I just don't see how that would be helpful, at all.

@RX14 yeah that would be possible. Although I still don't get how one would benefit from doing it that way.

@SplittyDev well i'm guessing that the underlying struct will be represented as such in the IR. Even on clang.

@RX14 it probably will.

But I think the compiler should handle this, and not the user.
If the user wants the type to behave like a UInt16, then it should behave that way, even if the actual width (sign + value bits) is less than the size of the type.

C does it that way for a reason, and I think that it would be better to be consistent with C there.

@SplittyDev looks like not actually: https://godbolt.org/g/KVQTxJ

@SplittyDev Behavior will be same as Int16\Int8\Float32 - you have to convert it if you want to pass it somewhere where another type is expected. Completely consistent with the rest of language. And even if you declare the field as UInt32 you will have to use explicit conversion to e.g. write there a 0 or 1 or do anything useful.
Sign is another problem btw - i think signed bitfields should be just forbidden. They can create a lot of unexpected behavior and I can't imagine valid uses for them.

@konovod I've updated the 'implementation details' section of the RFC. Only allowing unsigned integers seems sensible.

@RX14, that's very interesting. I'm not very good at reading LLVM code, but it looks like this just allocates a chunk of memory big enough to hold the struct, and uses bitwise arithmetic to do the trick.

@konovod actually, thinking about it, I don't think that allowing signed values would cause any issues.

@SplittyDev after some thoughts i agree, they aren't such evil.

I'm pretty sure it's possible to write a macro that allows bitfield access to a property. Something like:

@[Extern]
struct Foo
  bitfields @raw : Int32, [
    hi: {Int32, 16},
    low: {Int32, 16},
  ]
end

The bitfields macro would define getters and setters, and using bit manipulation would set/get values through the underlying field.

I was thinking the same as @asterite.

We can move the struct out of lib, adding the @[Extern] flag so it continues to behave as a C struct we can now add methods, thus add getters and setters abstractions. Not as declarative as C bitfields, more expressive, but not particularly harder to read or write.

struct Foo
  @foo : Int32

  def hi
    @foo >> 4
  end

  def lo
    @foo & 0xffff
  end
end

@asterite, @ysbaddaden Let's say I have a struct with a size of 128 bits. I'd need two UInt64 fields to represent that and the macro to support that would quickly become extremely complicated. Also, what about packing? Would the struct really look like a C struct in memory?

Imho, a macro is only a temporary solution. Having support for that in the compiler would be a permanent solution and I'm sure that it's gonna be useful for a lot of low-level developers.

@SplittyDev You can mark a struct with @[Packed] to make it packed. It's possible to do it with arbitrary sizes, my macro was just an example of a possible interface. For example we could have this:

@[Extern]
@[Packed]
struct Foo
  bitfields [
    hi: {Int32, 16},
    low: {Int32, 16},
  ]
end

Here we don't specify the backing field, it can probably be generate by the macro (or we can pass a name for that). Then for the total size we can use UInt8[total_size] and read/write from that.

I personally think that a general solution that doesn't need to modify the core language, if readable/usable enough, is always superior to modifying the language, specially for functionality that is very rarely needed.

@asterite That would only work for getting and setting aligned bits though.
Consider this:

struct {
  uint32_t a : 3
  uint32_t b : 3
  uint32_t c : 2
} foo_t;

~Something like that would be completely impossible with macros.~
Update: Forget what I said, it's wrong.

@SplittyDev I don't understand why

Oh, I'm sorry @asterite I wrote that from my mobile and didn't pay enough attention. What I said is wrong.

Is there something to do here? Do we define a macro in the stdlib or do leave this to a shard?

That might be solved by using newly introduced (#6063) Annotation type (if merged).

Here's something I wrote inspired by @asterite's suggestion above. It's not actually storing the data in compressed bytes like in C but allows for reading in bitfields and outputting them again with changed values. This is what I needed for my use case of reading in 416 bits with random sized values. I'd appreciate any useful suggestions.

https://github.com/elorest/bitfields

Anybody here opposed to closing this? It seems like the usecases can be solved through a shard, which we can consider to include into the standard library if it becomes really popular or otherwise necessary.

Yes, let's close this.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

HCLarsen picture HCLarsen  路  162Comments

chocolateboy picture chocolateboy  路  87Comments

farleyknight picture farleyknight  路  64Comments

fridgerator picture fridgerator  路  79Comments

RX14 picture RX14  路  62Comments