Crystal: Casting doesn't work with generic classes and recursive type aliases/unions

Created on 8 Nov 2016 · 9 Comments · Source: crystal-lang/crystal

A great example of this is JSON::Type. It is defined as an alias to the Nil | Bool | Int64 | Float64 | String | Array(Type) | Hash(String, Type) union.

This works:

1_i64.as(JSON::Type)

This works, too:

[1_i64.as(JSON::Type), 2_i64.as(JSON::Type)].as(JSON::Type)

However, this doesn't work:

{"a" => [1_i64, 2_i64]}.as(JSON::Type)

This compiles without error, but segfaults if you call any method on the result:

[1_i64, 2_i64].as(JSON::Type)

@RX14 has told me that this is a limitation of casting generic classes to unions (the cast would need to change the binary representation). I'm posting this for the purposes of documentation and discussion.

bug compiler

All 9 comments

This:

{"a" => [1_i64, 2_i64]}

is typed by Crystal to be Hash(String, Array(Int64)), which isn't any of the types listed in the alias JSON::Type. That's why the cast can't be done. The same applies to:

[1_i64, 2_i64] # Array(Int64)

However, it seems a compiler bug is letting that one compile.
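To check the inferred types yourself, here is a minimal sketch using typeof; the variable names are only illustrative:

```crystal
hash = {"a" => [1_i64, 2_i64]}
array = [1_i64, 2_i64]

# typeof reports the compile-time type each literal was given.
puts typeof(hash)  # Hash(String, Array(Int64))
puts typeof(array) # Array(Int64)
```

Neither type appears in the union behind JSON::Type, which lists only Array(Type) and Hash(String, Type).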

Unions, big aliases and casting are a huge source of confusion and pain in the language. I'd like to improve that somehow, I just don't know how yet.

1_i64 is in the type union (Int64).
[1_i64, 2_i64] is in the type union: Array(Type), so Array(Int64).
"a" is a String.
{"a" => [1_i64, 2_i64]} is in the type union: Hash(String, Type). Array(Type) is a Type. Int64 is a Type, thus Hash(String, Type) should match Hash(String, Array(Int64)).

Array(Int64) is internally represented with these fields: a size, a capacity, and a pointer to the data. In LLVM terms, this will be struct { i32, i32, i64* }. However, Array(Type) will have a pointer to Type, which is a union type. A union is represented in LLVM as {i32, void*} (this isn't actually valid LLVM), but the point is that a union is composed of two parts: a type id (the i32) and the data, whose size is that of the largest type in the union. So the final array type, in LLVM, is something like struct { i32, i32, {i32, void*}* }.

Now, we have:

struct { i32, i32, i64* }         ; Array(Int64)
struct { i32, i32, {i32, void*}*} ; Array(Type)

Casting would mean just having the compiler treat the raw bytes of the first type as the second type, which is not correct: the contents pointed to by an i64* are not laid out the same way as those pointed to by a {i32, void*}*. So you'd need to transform the data, which is something a cast doesn't do. A cast just reinterprets the underlying bytes as a different type.

Now, this is a long and complex explanation, and this is why I think casts and unions are not as easy and nice as I originally thought. I'd like users not to have to learn this, but right now you have to in order to program well in Crystal. Basically, in a typed language the binary representation of things is of great importance, which is usually not a concern in dynamic languages.
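Since the data has to be transformed rather than reinterpreted, one way to get there is to build a new container element by element. The sketch below assumes a local alias named JsonLike as a stand-in for JSON::Type (the name is made up for this example) and uses Array#map and Hash#transform_values to do the conversion:

```crystal
# Hypothetical stand-in for JSON::Type, declared locally for this sketch.
alias JsonLike = Nil | Bool | Int64 | Float64 | String | Array(JsonLike) | Hash(String, JsonLike)

ints = [1_i64, 2_i64] # Array(Int64)

# Widen each element; map allocates a new array whose slots hold the union.
widened = ints.map { |n| n.as(JsonLike) } # Array(JsonLike)

# Array(JsonLike) is a member of the union, so this cast is fine.
as_json = widened.as(JsonLike)

# Same idea for the hash: rebuild the values so the result is Hash(String, JsonLike).
hash = {"a" => ints}
converted = hash.transform_values do |arr|
  arr.map { |n| n.as(JsonLike) }.as(JsonLike)
end
hash_json = converted.as(JsonLike)
```

Every step copies the data into a freshly allocated container, which is the transformation a plain cast does not perform.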

@asterite So are you saying that casting has no way to resolve a recursive type alias?

Somehow the compiler needs to be able to expand, e.g.

alias Type = Int64 | Array(Type)

into

alias Type = Int64 | Array(Int64 | Array(Type))

which of course further expands to

alias Type = Int64 | Array(Int64 | Array(Int64 | Array(Type)))

Somehow casting needs to be smart enough to handle this.

@trans The alias has nothing to do with this. The rule is: if the binary representation of the type _inside_ the generic has to change, the cast fails.

As a note:

This:

alias Type = Int64 | Array(Int64 | Array(Int64 | Array(Type)))

Is not equivalent to this:

alias Type = Int64 | Array(Int64) | Array(Array(Int64)) | ...

The first one can store an array that has both ints and arrays as elements, such as [1_i64, [5_i64]], whereas the latter can't.
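A small sketch of that difference, assuming a recursive alias named Nested (a name chosen only for this example):

```crystal
# Hypothetical recursive alias: an Int64, or an array of such values.
alias Nested = Int64 | Array(Nested)

# One array holding both an int and a nested array.
mixed = [1_i64, [5_i64] of Nested] of Nested

puts typeof(mixed) # Array(Nested)

# An untyped literal of the same shape is inferred as an array of a union,
# which matches neither Array(Int64) nor Array(Array(Int64)).
plain = [1_i64, [5_i64]]
puts typeof(plain)
```

The "of Nested" annotations make each array store its elements as the union from the start, so no cast of an existing Array(Int64) is involved.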

@lbguilherme Oops. You're right, I removed it.

@RX14 The alias was just the means of creating the union. I should have used the word union instead.

Does as do casting? I didn't realize as actually changed anything. I thought it just guaranteed an intersection of a type set for the compiler.

Right now, as doesn't really cast anything. From a semantic viewpoint, all it does is narrow or widen type unions. For example, you can do 1.as(Int32|String) or (rand < 0.5 ? "aa" : 5).as(Int32). It will not try to arrange the inner data in a different way or go deeper into the type. Changing the element type of an array requires a map.

[1, 2].as(Array(String|Int32)) # not ok
[1, 2].map(&.as(String|Int32)) # ok
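A runnable sketch of the widening and narrowing described above; the variable names are illustrative, and as? is the variant that returns nil instead of raising when the runtime type doesn't match:

```crystal
# Widening: the value is still 1, but its compile-time type is now Int32 | String.
widened = 1.as(Int32 | String)

# Narrowing: checked at runtime, raises TypeCastError if the value is not an Int32.
narrowed = widened.as(Int32)

# as? returns nil on a mismatch instead of raising.
maybe_string = widened.as?(String)

# Element-wise widening builds a new array with the union as its element type.
mapped = [1, 2].map(&.as(String | Int32))
puts typeof(mapped)
```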

as doesn't cast in the way that you can convert an Int32 to an Int64, but (as far as I know, maybe @asterite can clarify) using as will change the memory representation of unions (or remove the union). You can see this by using sizeof() on unions of ints of different widths, or of other value types. I would assume that the compiler assumes there's a single binary representation of each type when it's doing its codegen phase, so as has to transform between those. Note that it doesn't change the value of the data at all, just how its "slot" in memory is represented. I'm not sure how classes (casts to a supertype or subtype) work.
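The sizeof observation can be tried with a sketch like the following. The alias name Small is an assumption made for this example, and the exact size of the union depends on the platform, so only the individual integer sizes are annotated:

```crystal
# Hypothetical union of two integer widths.
alias Small = Int8 | Int64

puts sizeof(Int8)  # 1
puts sizeof(Int64) # 8
# The union needs room for a type id plus its largest member,
# so it is bigger than either member on its own.
puts sizeof(Small)

# Widening an Int8 into the union keeps the value but changes the slot it lives in.
x = 1_i8.as(Small)
```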

The problem comes with generic types. There's no way to change the binary representation of the types inside a generic, or even to verify that all the values in a generic container have the right type. For example, an array lives on the heap, and values of any allowed type can be inserted into it even after you use as. This is why you have to use as inside map when you are "casting" an array: the result is then a copy which isn't modifiable by another thread. Maybe there should be a #copy_as method on Array to make the casting less verbose.
