Crystal: New offsetof() expression

Created on 20 Mar 2019 · 31Comments · Source: crystal-lang/crystal

I thought it would be nice to have a C-style offsetof() macro besides sizeof(), which could be practical when interfacing with C. My implementation currently looks like this:

macro offsetof(path, property_name)
  {%
    type = path.resolve
    properties = type.instance_vars
    offset_expr = "0"
  %}

  {% for current_property in properties %}
    {% if current_property.name == property_name %}
      ({{offset_expr.id}})
    {% end %}
    {% offset_expr = offset_expr + " + sizeof(" + "#{current_property.type.name}" + ")" %}
  {% end %}
end

This seems to work for me, although I don't know whether Crystal performs some sort of packing/alignment which would break this. If so, is there a different way to approach this?

help-wanted feature compiler

Source

malte-v

Most helpful comment

@asterite Concrete use case:
I'm currently getting into Vulkan (the graphics API) and I'm using Crystal since it's my programming language of choice. At some point you will need tell Vulkan how to pass data from the main program (written in Crystal) to some shader (written in GLSL/HLSL/...). In order to do that, you must define a data structure which contains all per-vertex data. A minimalistic example would be:

struct Vertex
  property pos : Vec2
  property color : Vec3

  def initialize(@pos = Vec2.zero, @color = Vec3.zero)
  end
end

Then, you fill in a structure containing the information on how this data is laid out:

struct Vertex
  # ...

  # this is just for completeness, see below for the relevant part
  def self.get_binding_description
    description = Vk::VertexInputBindingDescription.new
    description.binding = 0
    description.stride = sizeof(Vertex)
    description.input_rate = Vk::VertexInputRate::Vertex

    return description
  end

  def self.get_attribute_descriptions
    descriptions = uninitialized StaticArray(Vk::VertexInputAttributeDescription, 2)

    # this is for the position
    descriptions[0].binding = 0    # you can ignore
    descriptions[0].location = 0   # these two
    descriptions[0].format = Vk::Format::R32G32Sfloat # this basically means "two 32-bit floats"
    descriptions[0].offset = offsetof(Vertex, pos)  # uh-oh... how should I do this?

    # this is for the color
    descriptions[1].binding = 0
    descriptions[1].location = 1
    descriptions[1].format = Vk::Format::R32G32B32Sfloat # this means "three 32-bit floats"
    descriptions[1].offset = offsetof(Vertex, color)

    descriptions
  end
end

Without considering hard-coding the offsets an option, I don't think there is a different way to solve this problem. The use case might be rare, but when you're in a situation like above, you're pretty much stuck. I think seamless C interop is one of Crystal's major selling points, and sometimes you need to deal with numerical offsets. I can understand you don't want to "pollute" the language with features that are rarely used, but this simply cannot be implemented without modifying the compiler.

malte-v on 20 Mar 2019

👍6

All 31 comments

It's not safe to calculate member offsets by hand. There is no guarantee that the binary data layout corresponds to the order and size in the type declaration. LLVM can and will move members around.

So only LLVM can tell the actual offset used of a specific ivar, but we can expose this as an intrinsic method like sizeof.

straight-shoota on 20 Mar 2019

Thanks for the quick clarification! Probably, LLVMOffsetOfElement is what should be used here (it's already in LibLLVM). I would be happy to help implementing this if you point me in the right direction. This requires introducing a new AST node, right? And if so, what's the process of creating one?

malte-v on 20 Mar 2019

Before adding any new feature to the standard library or the language we should answer the question of why this is needed. Why do you need the offsetof function?

asterite on 20 Mar 2019

struct Vertex
  property pos : Vec2
  property color : Vec3

  def initialize(@pos = Vec2.zero, @color = Vec3.zero)
  end
end

Then, you fill in a structure containing the information on how this data is laid out:

struct Vertex
  # ...

  # this is just for completeness, see below for the relevant part
  def self.get_binding_description
    description = Vk::VertexInputBindingDescription.new
    description.binding = 0
    description.stride = sizeof(Vertex)
    description.input_rate = Vk::VertexInputRate::Vertex

    return description
  end

  def self.get_attribute_descriptions
    descriptions = uninitialized StaticArray(Vk::VertexInputAttributeDescription, 2)

    # this is for the position
    descriptions[0].binding = 0    # you can ignore
    descriptions[0].location = 0   # these two
    descriptions[0].format = Vk::Format::R32G32Sfloat # this basically means "two 32-bit floats"
    descriptions[0].offset = offsetof(Vertex, pos)  # uh-oh... how should I do this?

    # this is for the color
    descriptions[1].binding = 0
    descriptions[1].location = 1
    descriptions[1].format = Vk::Format::R32G32B32Sfloat # this means "three 32-bit floats"
    descriptions[1].offset = offsetof(Vertex, color)

    descriptions
  end
end

malte-v on 20 Mar 2019

👍6

I think the usecase is valid :+1:

We can probably go with offsetof(Type, @var) and I guess it could work with both classes and structs, though for classes it might not be that useful. And error on primitive types and such.

asterite on 20 Mar 2019

👍1

Maybe offsetof(Type.@var), leveraging property access syntax (var.@property) to reduce confusion whether @var refers to Type's or some in the current scope?

jhass on 21 Mar 2019

👍1

@jhass It's a good idea. But since this is mostly to interact with C libraries I'd prefer to use the same syntax as C.

asterite on 21 Mar 2019

👍2

Ok, so I started working on this yesterday and it looks like we have a somehow working implementation to begin with: https://github.com/malte-v/crystal/commit/384e26a13eeb1c261aac1fee6ddb06c33d402429

Mostly I just looked up how sizeof works and added something equivalent for offsetof. Here's how it looks like:

struct Foo
  @x = 0_i64
  @y = 34_i32
end

foo = Foo.new

puts offsetof(typeof(foo), @y) #=> 8

Currently, my implementation just expands all OffsetOf nodes to NumberLiterals in the main semantic visitor by calling LLVMOffsetOfElement. I'm not entirely sure whether this is good practice, because from my understanding Crystal has two different ways of handling sizeof expressions:

If the type given to sizeof is not a typeof expression, it's simply expanded to a NumberLiteral by calling LLVMGetSizeOfTypeInBits which returns a UInt64
Otherwise, the sizeof expression is translated into some kind of LLVM-IR-specific sizeof expression using LLVMSizeOf, which returns a ValueRef (I think so?)

Do we also need this distinction with offsetof? I'm wondering because it looks like my current implementation works fine with typeof. If we also need the second variant for offsetof, we could probably generate a getelementptr, right?

Anyways, I wanted to say thank you for keeping the code clean and easy to understand so even newcomers con contribute!

malte-v on 24 Mar 2019

LLVM cannot and will not reorder struct members. It doesn't even do padding. So this can probably be done with the exact same logic as your original macro.

RX14 on 24 Mar 2019

@malte-v If you have sizeof(typeof(something)) then you can't compute the type right away because the type of something might change. For example:

a = 1
loop do
  sizeof(typeof(a))
  a = "hello"
end

In the first pass the compiler will infer that a is an Int32 and so typeof(a) will be Int32. However once the loop is analyzed and after seeing a = "hello" the compiler will conclude that a is Int32 | String and so the previous type of typeof wasn't the true, final one.

That's why in offsetof if you have typeof you can't compute the size right away. You have to wait for the types to settle. I _think_ we shouldn't be using LLVMSizeOf because once types settle we can compute it using LLVMGetSizeOfTypeInBits, but that would be an improvement.

asterite on 24 Mar 2019

@RX14 LLVM does padding:

struct Foo
  @x = 0_i64
  @y = 34_i8
  @z = 5_i32
end

puts "real size: " + sizeof(Foo).to_s
puts "real x offset: " + offsetof(Foo, @x).to_s
puts "real y offset: " + offsetof(Foo, @y).to_s
puts "real z offset: " + offsetof(Foo, @z).to_s

puts "wrong size: #{sizeof(Int64) + sizeof(Int8) + sizeof(Int32)}"
puts "wrong z offset: #{sizeof(Int64) + sizeof(Int8)}"

Output:

real size: 16
real x offset: 0
real y offset: 8
real z offset: 12
wrong size: 13
wrong z offset: 9

@asterite I see. Now I'm wondering if we should allow typeof in offsetof at all. Wouldn't it be weird if you don't know the type of an expression but you know the name of one of its instance variables?

malte-v on 24 Mar 2019

@malte-v I guess for now we can disallow it until someone needs it. But I can't see why we would disallow it. It can make for more DRY code. Like:

def name_offset(obj)
  offsetof(typeof(obj), @name)
end

asterite on 24 Mar 2019

@asterite I agree we shouldn't invest time into this right now. I also added an instance_offsetof expression which works for classes:

class Foo
  @x = 0_i64
  @y = 34_i8
  @z = 5_i32
end   

puts "class sizeof: " + instance_sizeof(Foo).to_s
puts "class offsetof z: " + instance_offsetof(Foo, @z).to_s   


struct Bar
  @x = 0_i64
  @y = 34_i8
  @z = 5_i32
end   

puts "struct sizeof: " + sizeof(Bar).to_s
puts "struct offsetof z: " + offsetof(Bar, @z).to_s   


bar = Bar.new
puts "z by offset: " + Pointer(Int32).new(pointerof(bar).address + offsetof(Bar, @z)).value.to_s

Output:

class sizeof: 24
class offsetof z: 16
struct sizeof: 16
struct offsetof z: 12
z by offset: 5

But why does the class instance take up more space? Another thing I encountered was a transform method in cleanup_transformer.cr which is only defined for InstanceSizeof:

def transform(node : InstanceSizeOf)
  exp_type = node.exp.type?

  if exp_type
    instance_type = exp_type.instance_type.devirtualize
    unless instance_type.class?
      node.exp.raise "#{instance_type} is not a class, it's a #{instance_type.type_desc}"
    end
  end

  if expanded = node.expanded
    return expanded
  end

  node
end

Is this only for checking that the type passed toinstance_sizeof is a class? If so, why isn't this included in MainVisitor?

malte-v on 24 Mar 2019

A class is a reference type. Memory is allocated for it and you get a pointer to it, and the pointer is passed around (unlike a struct where the entire struct is passed around, no pointer involved). The sizeof a class type is then the size of a pointer. That's why there's instance_sizeof: to know the size behind that pointer. And it doesn't make sense to ask this for a struct type.

For offsetof there's no point to define two things: it's always the offset of a field relative to the beginning (if it's for a class, that means behind the pointer).

So instance_offsetof can go away.

asterite on 24 Mar 2019

Also a class type always hold a type ID as the first field. Then if you have a class Foo and a class Bar < Foo, Foo is still represented as a pointer and behind it you can know whether it's Foo or Bar at runtime.

asterite on 24 Mar 2019

@asterite The only difference between offsetof and instance_offsetof is that instance_offsetof uses llvm_struct_type insteaf of llvm_type:

def size_of(type)
  size = llvm_typer.size_of(llvm_typer.llvm_type(type))
  size = 1 if size == 0
  size
end

def instance_size_of(type)
  llvm_typer.size_of(llvm_typer.llvm_struct_type(type))
end

def offset_of(type, element_index)
  llvm_typer.offset_of(llvm_typer.llvm_type(type), element_index)
end

def instance_offset_of(type, element_index)
  llvm_typer.offset_of(llvm_typer.llvm_struct_type(type), element_index)
end

First I tried to use the original offset_of implementation with classes, which gave me weird results like 140112558951168. Then I tried using llvm_struct_type to construct the LLVM type like instance_sizeof does, and it worked fine. To be honest, I don't know the difference between the two. So what should be done to remove the instance_offsetof?
To me it seemed logical that we need a dedicated instance_offsetof expression since a pointer can't have instance variables and we want to know the offset of some instance variable of a class behind the pointer.

malte-v on 24 Mar 2019

You need to use llvm_type if it's a struct, llvm_struct_type if it's a class. You can check this by calling struct? on the type

asterite on 25 Mar 2019

I just added specs as following:

require "../../spec_helper"

describe "Semantic: offsetof" do
  it "types offsetof" do
    assert_type("offsetof(String, @length)") { int32 }
  end

  it "errors on undefined instance variable" do
    assert_error "struct Foo; @a = 0; end; offsetof(Foo, @b)", "type Foo doesn't have an instance variable called @b"
  end

  it "errors on typeof inside offsetof expression" do
    assert_error "struct Foo; @a = 0; end; foo = Foo.new; offsetof(typeof(foo), @a)", "can't use typeof inside offsetof expression"
  end

  it "gives error if using offsetof on something that can't have instance variables" do
    assert_error "offsetof(Int32, @foo)", "type Int32 can't have instance variables"
  end

  it "gives error if using offsetof on something that's neither a class nor a struct" do
    assert_error "module Foo; @a = 0; end; offsetof(Foo, @a)", "Foo is neither a class nor a struct, it's a module"
  end

  it "errors on offsetof uninstantiated generic type" do
    assert_error "struct Foo(T); @a = 0; end; offsetof(Foo, @a)", "can't take offsetof element @a of uninstantiated generic type Foo(T)"
  end
end

I'm happy with it right now and I'd say it's ready for review: #7589

malte-v on 25 Mar 2019

❤1

Maybe offsetof(Type.@var), leveraging property access syntax (var.@property) to reduce confusion whether @var refers to Type's or some in the current scope?

I tending more towards this syntax now. Type.@var makes it clear that this is about the @var ivar of Type. In Type, @var, it's easy to confuse @var with a ivar in the current scope.

straight-shoota on 26 Mar 2019

👍1

@straight-shoota How would you combine that with typeof? For example offsetof(typeof(exp), @var).

asterite on 26 Mar 2019

I know that Symbols are possibly going to be dropped, but isn't offsetof(Type, :var) better?
Or even Type.offsetof(:var) to make it similar to responds_to??

konovod on 26 Mar 2019

@konovod @var, unlike :var, shows that offsetof only works with instance variables.

malte-v on 26 Mar 2019

👍1

@asterite Anything wrong with offsetof(typeof(exp).@var)?

straight-shoota on 26 Mar 2019

@konovod I wouldn't want to add such a rarely used method to every type scope.

straight-shoota on 26 Mar 2019

@straight-shoota The logic is a bit harder. You can then have offsetof(foo.bar.@baz) and then you need to separate the foo.bar from @baz and so on. I think making it look like C's syntax is fine.

asterite on 26 Mar 2019

👍1

What I mean is, I don't think there's a need to invent new syntax for this if the only place this comes from is C and it works fine there, it will be familiar to C users and using a different syntax doesn't seem to give any benefit.

asterite on 26 Mar 2019

👍3

Yes, let's keep it simple.

straight-shoota on 26 Mar 2019

@malte-v well, compiler will say it anyway if anybody will try to use it with a method.
But @var looks like expression, while :var clearly points that it's a member name, not a value.

konovod on 26 Mar 2019

Ruby uses :@var in some places, such as instance_var_get(:@foo). It emphasizes that it's not the actual ivar (because it's a symbol) but still refers to the ivar @foo.