Crystal: What to do with struct inheritance

Created on 28 Apr 2016 · 15 Comments · Source: crystal-lang/crystal

Right now struct inheritance is possible. For example:

struct Point2
  getter x, y

  def initialize(@x = 0, @y = 0)
  end
end

struct Point3 < Point2
  getter x, y, z

  def initialize(x = 0, y = 0, @z = 0)
    super(x, y)
  end
end

Regardless of whether the above makes semantic sense (let's not discuss this), struct inheritance has a big issue.

You'd expect Point2 to have two members, x : Int32 and y : Int32, so its total size is 8 bytes. You'd expect Point3 to have three members and a total size of 12 bytes. That is currently the case, so no problem so far.
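
The sizes above can be checked with sizeof. This is a minimal sketch, not the code from the issue: the explicit Int32 annotations are added so it compiles on its own, and Point3 is written as a standalone struct (an assumption made here) because current compilers reject inheriting from a non-abstract struct.

```crystal
struct Point2
  getter x : Int32
  getter y : Int32

  def initialize(@x = 0, @y = 0)
  end
end

# Standalone on purpose: no inheritance from Point2 in this sketch.
struct Point3
  getter x : Int32
  getter y : Int32
  getter z : Int32

  def initialize(@x = 0, @y = 0, @z = 0)
  end
end

puts sizeof(Point2) # two Int32 fields: 8 bytes
puts sizeof(Point3) # three Int32 fields: 12 bytes
```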

However, let's say we have an array of Point2:

points = [] of Point2

An array has a size, capacity and a buffer holding the elements. You'd expect the buffer to have Point2 elements next to each other, and since each Point2 occupies 8 bytes they'll be next to each other, so every 8 bytes would belong to a different Point2 object. However...

points = [] of Point2
points << Point3.new

The above is valid, because Point3 < Point2. But Point3 occupies 12 bytes! Not only that, but if we want to know whether we have a Point2 or Point3 inside the array's buffer we also need to store something else: the type id. Suppose we use an Int32 for the type id. In total, every element inside the array's buffer would need to be 16 bytes long. This is probably unexpected, because if Point2 occupies 8 bytes and I want an array of them, I don't want them to suddenly occupy double the space (or more, imagine someone extends Point3 with more fields).
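
The cost of carrying a type id per element can be observed with a union of two unrelated structs, which the compiler also stores as a type id plus enough room for the largest member. A minimal sketch; the names Small and Large are hypothetical, and the exact union size depends on the compiler's padding, so it is printed rather than assumed.

```crystal
struct Small
  getter a : Int32 = 0
  getter b : Int32 = 0
end

struct Large
  getter a : Int32 = 0
  getter b : Int32 = 0
  getter c : Int32 = 0
end

# Every slot must fit the largest member plus a type id.
mixed = [] of Small | Large
mixed << Small.new
mixed << Large.new

puts sizeof(Small)         # 8
puts sizeof(Large)         # 12
puts sizeof(Small | Large) # larger than sizeof(Large); exact value not assumed here
```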

C# has structs too, and they don't support inheritance. The reason is basically what I just explained (in the official docs they also mention structs can't be inherited).

Now, C# allows compiling a library and then linking against it. The generated code must know in advance the size of a struct inside an array. If you let someone extend a struct then the generated code will be wrong.

In Crystal we compile a program by analyzing all its source code, so we can effectively know whether a struct was inherited and make an array of such structs occupy the necessary size. However, if we go that route it will mean that in the future we will never be able to generate small libraries and link them together separately. Still, the issue of an array of structs being bigger than what you'd expect is a separate issue that should be given some thought.

So, we must decide what to do with this. We have these choices:

1. Disallow struct inheritance. The exception would be primitive-like types like Value, Struct and Int, etc., which kind of use inheritance (it's mostly conceptual here) but you can't have an array of Value, Struct or Int. In the future you'll anyway be able to have an array of Object, and if you put a struct inside it, it will be boxed.
2. Allow struct inheritance but only from abstract structs. Non-abstract structs won't be inheritable. The Point2/Point3 example won't compile. An array of a non-abstract struct will only contain such structs so the size of the buffer elements can be known in advance. For the case of an abstract struct, and in order to provide binary compatibility, we'll probably want to represent it as a {type_id, pointer_to_value} in memory (now that I think of it, modules should probably also be represented in this way).
3. Allow full struct inheritance. This will mean that if you have an array of struct Foo you can't rely on the element size being the size of Foo, because someone might inherit your struct. We could add a final annotation to prevent that. But compiling a piece of code and linking it later won't be possible anymore.

I personally would like 1 to happen. A struct is meant to be used when you either have an immutable type, or a kind of type for which you care about its memory size and know its layout. You wouldn't want its size to change under the hood.

It also simplifies the language and its implementation. If you need an abstract struct from which you have many inherited structs and then you need an array of such abstract struct, you can use a module instead. Or use an alias of all struct types (an alias is better for preventing more types from being added, because a module can be included in other types too).
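
The alias alternative could look roughly like this. It is a sketch under assumptions: Circle, Square and AnyShape are hypothetical names, not types from the issue. The alias names a closed union, so no other type can join it later.

```crystal
struct Circle
  getter radius : Float64

  def initialize(@radius)
  end

  def area : Float64
    Math::PI * radius * radius
  end
end

struct Square
  getter side : Float64

  def initialize(@side)
  end

  def area : Float64
    side * side
  end
end

# Closed set of struct types; adding a member means editing this line.
alias AnyShape = Circle | Square

shapes = [] of AnyShape
shapes << Circle.new(1.0)
shapes << Square.new(2.0)

areas = shapes.map(&.area)
puts areas
```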

In case you wonder, for reference types (classes) this isn't a problem because a class is represented as a pointer to where its data is stored. The first member behind that pointer is a type_id. That's why unions of class types, and even unions of class types and nil, can be represented with a single pointer, so the elements of an array of such types always occupy the size of a pointer (nil is encoded as the null pointer).
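
The pointer-sized representation can be checked with sizeof, which for a class reports the size of the reference rather than of the instance. A minimal sketch; Node and Leaf are hypothetical classes, and the comparisons are against the pointer size instead of a hard-coded number.

```crystal
class Node
  property value : Int64 = 0_i64
  property extra : Int64 = 0_i64
end

class Leaf
end

ptr_size = sizeof(Pointer(Void))

puts sizeof(Node) == ptr_size        # a reference is a single pointer
puts sizeof(Node | Leaf) == ptr_size # union of classes: still one pointer
puts sizeof(Node | Nil) == ptr_size  # nil is encoded as the null pointer
puts instance_sizeof(Node)           # the heap-allocated object itself is larger
```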

Also note that right now the compiler doesn't handle an array, or proc, of a struct type well when it has subtypes. #2382 and #2527 are cases of this.

This issue is not new to me and @waj, but it's time we make a decision.

/cc @waj @bcardiff

asterite

Most helpful comment

There is also the option of allowing inheritance but not allowing polymorphism. For example:

points = [] of Point2 # 8 bytes per element
points << Point3.new(1,2,3)  # Error! The << method takes a Point2, can't cast.

This is similar to what C++ does: a value of a struct type takes exactly its own size in memory, nothing more, and can't hold any other type. You can use polymorphism only with reference types. There are already differences in how structs and classes behave, so I think this one makes sense here.

lbguilherme on 28 Apr 2016

👍3

All 15 comments

I'd be happy with 1, especially assuming that using included modules as type restrictions still works:

module M
end

struct A
  include M
end

struct B
  include M
end

a = Array(M).new
a << A.new
a << B.new

pp a
puts typeof(a)

will on 28 Apr 2016

@will Yes, that works and will continue to work :-)

asterite on 28 Apr 2016

I also prefer 1. Composition over inheritance 👍

sdogruyol on 28 Apr 2016

Maybe it means Point3.new.is_a?(Point2) would return false?

lbguilherme on 28 Apr 2016

Of course I forgot to say that there could be other solutions to this 😊

@lbguilherme I didn't know C++ does that. I think we shouldn't copy that behaviour, it would be very confusing if you could inherit a struct but it didn't act in a polymorphic way. I'd prefer to disallow inheritance in that case and resort to a base module.

asterite on 28 Apr 2016

👍1

I prefer 1, as long as, as already mentioned, structs can be built up from partials (via modules).

Worth considering is something struct-union-like, as an additional type-storage-style, which of course then would need the type_id in first position, but then it would be a known fact.

ozra on 28 Apr 2016

I think @lbguilherme's solution is nice; basically composition with sugar. This would seem somewhat analogous to Go's struct embedding, which is very useful.

A downside that @asterite mentioned is that if the exact same syntax is used for structs and classes, one may assume the one will behave like the other. So a modification of both the syntax and the terminology would be important.

I think this is clearer than using modules as suggested as a workaround, and feels more like a lower-level, non-abstracted type, which I think is the point of a struct in Crystal in the first place.

Perelandric on 30 Apr 2016

I really like Go's struct embedding. I think in a way it's similar to D's alias this. It's like method_missing but the compiler forwards the method to the aliased type(s) for you, and even gives an error if a method is found in more than one alias. In Go this is similar. In Crystal one could use method_missing but you'd have to check if the types respond to the method and then do the call, otherwise give a compile error. But I think this is pretty common and needed so I'd maybe like to introduce such a feature in Crystal, with a built-in syntax (don't know how yet). It's very useful for mocking and for abstracting stuff. But of course I have to check this with @waj and others.

If we do that, I think struct inheritance won't be needed at all.
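
Something close to embedding can be approximated with plain composition and the standard library's delegate macro. This is only a sketch of that idea, not the built-in syntax discussed above: the @base field name is an assumption, and Point3 wraps a Point2 instead of inheriting from it.

```crystal
struct Point2
  getter x : Int32
  getter y : Int32

  def initialize(@x = 0, @y = 0)
  end
end

struct Point3
  getter z : Int32

  # The embedded value; stored inline, so Point3 stays a flat value type.
  @base : Point2

  # Forward x and y to the embedded Point2.
  delegate x, y, to: @base

  def initialize(x = 0, y = 0, @z = 0)
    @base = Point2.new(x, y)
  end
end

p3 = Point3.new(1, 2, 3)
puts p3.x
puts p3.y
puts p3.z
```

With this approach Point3 is not a Point2, so an array of Point2 keeps its 8-byte elements and cannot receive a Point3.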

asterite on 30 Apr 2016

+1 on the 1 option

benoist on 2 May 2016

+1 for option 1. Modules and, maybe in the future, aliases should be better ways to factor the code.

The main selling points of structs are value semantics and stack allocation. Since inheritance doesn't play nicely with those, for the reasons explained above, it should be disallowed.

bcardiff on 2 May 2016

We finally decided to keep struct inheritance but only from abstract structs (solution number 2). Non-abstract structs won't be inheritable, but with an abstract struct you could still model a hierarchy and use it as an array element. This also goes nicely with Value, Number, Int and Float, all of which are abstract structs and have subtypes. And it will also automatically make all primitive types like Nil, Int32, Char and Symbol non-inheritable.

asterite on 2 May 2016

👍1

Revisiting this issue after thinking about it; glad to see you've already arrived at the same conclusion: I agree completely with "abstract is inheritable".

ozra on 3 May 2016

Point 2 is now implemented: non-abstract structs can't be inherited, and arrays of abstract structs, procs with abstract structs as arguments, and casting to a base abstract struct all work well.

After implementing this I realized it's very simple to allow non-abstract structs to be inherited. The compiler knows if a struct is inherited so it could represent such a type well (with a type id and enough space for all children). However, with this we could inherit Int32 and other primitive types, and suddenly all [] of Int32 in your program would take a lot more space. That's not nice. And I think that's not nice for any kind of non-abstract struct: you want memory layout guarantees. Well, we could add a final annotation, but that makes the language more complex and it's something more to think about.

So in the end choosing point 2 is the best option: you can use a hierarchy with intermediate abstract structs, you can use is_a?, you can reuse code, you get polymorphism and you get memory layout guarantees.
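
The chosen design can be illustrated with a small hierarchy. This is a sketch under assumptions: the abstract base Point and the coords and dimensions methods are hypothetical names introduced here, not part of the issue's original example.

```crystal
abstract struct Point
  # Each concrete struct reports its own coordinates.
  abstract def coords : Array(Int32)

  # Shared code lives in the abstract base.
  def dimensions : Int32
    coords.size
  end
end

struct Point2 < Point
  getter x : Int32
  getter y : Int32

  def initialize(@x = 0, @y = 0)
  end

  def coords : Array(Int32)
    [x, y]
  end
end

struct Point3 < Point
  getter x : Int32
  getter y : Int32
  getter z : Int32

  def initialize(@x = 0, @y = 0, @z = 0)
  end

  def coords : Array(Int32)
    [x, y, z]
  end
end

# Polymorphism goes through the abstract base only.
points = [] of Point
points << Point2.new(1, 2)
points << Point3.new(1, 2, 3)

dims = points.map(&.dimensions)
puts dims
puts points[1].is_a?(Point)

# Concrete structs keep a fixed layout because nothing can inherit from them.
puts sizeof(Point2)
```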

asterite on 3 May 2016

Reopening because something's missing in the implementation.

asterite on 3 May 2016
