Crystal: Incorrect overload selected for Array(Array(T)) vs Array(T)

Created on 29 Mar 2020 · 15Comments · Source: crystal-lang/crystal

def foo(a : Array(T)) forall T
  puts "one"
end

def foo(a)
  puts "two"
end

z = [5].as(Array(Array(Int32)) | Array(Int32))
foo(z)

Prints "two" but should print "one". I'm still not exactly sure why this happens.

bug topicsemantic

Source

asterite

Most helpful comment

The problem is that this is a bug where the problem is not the code. We need to define how we want it to work first.

Here's the code again:

class Gen(T)
end

def foo(a : Gen(T)) forall T
  puts "one"
end

def foo(a)
  puts "two"
end

z = Gen(Int32).new.as(Gen(Gen(Int32)) | Gen(Int32))

foo(z)

The type of z is a union of:

Gen(Gen(Int32))
Gen(Int32)

When the compiler tries to match that union type against Gen(T) it will do this:

Take the first type in the union: Gen(Gen(Int32)). It matches against Gen(T). T becomes Gen(Int32).
The the second type in the union: Gen(Int32). Try to match it against Gen(T). However, we said T is Gen(Int32). So it's matching against Gen(Gen(Int32)). No match.

The only match is the first one, Gen(Gen(Int32)), where T is Gen(Int32).

Then the next overload is considered, and the entire union matches.

That's why we are seeing that behavior.

How it should work? Should T become the union of those types? But if T is Gen(Int32) | Int32 it would mean we had Gen(Gen(Int32) | Int32) in the first place, which is not the case.

We basically need to dispatch the union type to two different instantiations of the same overload. But the code never does that in any place of the compiler.

So this is a huge change, if we want to implement it.

Or... other ideas of how this should match?

asterite on 31 Mar 2020

👍3

All 15 comments

Without the dependency on Array:

class Gen(T)
end

def foo(a : Gen(T)) forall T
  puts "one"
end

def foo(a)
  puts "two"
end

z = Gen(Int32).new.as(Gen(Gen(Int32)) | Gen(Int32))

foo(z)

asterite on 29 Mar 2020

Not sure if it's relevant, but the bug appears inconsistent.

def foo(a : Array(T)) forall T
  a.each_with_index do |a, i|
    p "first call for #{a} of class #{a.class}"
    foo(a)
  end
end

def foo(a : T) forall T
  p "second call for #{a} of class #{a.class}"
  p a
end  

foo([1, [2, [[3]], [4, [[5]]]]])

"first call for 1 of class Int32"
"second call for 1 of class Int32"
1
"first call for [2, [[3]], [4, [[5]]]] of class Array(Array(Array(Array(Int32)) | Int32) | Array(Array(Int32)) | Int32)"
"first call for 2 of class Int32"
"second call for 2 of class Int32"
2
"first call for [[3]] of class Array(Array(Int32))"
"second call for [[3]] of class Array(Array(Int32))"
[[3]]
"first call for [4, [[5]]] of class Array(Array(Array(Int32)) | Int32)"
"first call for 4 of class Int32"
"second call for 4 of class Int32"
4
"first call for [[5]] of class Array(Array(Int32))"
"first call for [5] of class Array(Int32)"
"first call for 5 of class Int32"
"second call for 5 of class Int32"
5

stevensonmt on 29 Mar 2020

I'm not enough of a mathematician to do this correctly, but bear with me.
So numbers belong to the set N.
[[3]] and [[5]] belong to the same set or type A for N ⊂ A.
[2, [[3]]] and [4, [[5]]] would also belong to the same set or type B for which (A | N) ⊂ B.
So the array [2, [[3]], [4, [[5]]] is a collection C which is B(N | B). When the compiler sees this pattern it determines that N | B is the unit type T. When calling the first method on C it will call the second method on any B that is not B(B). Because [[3]] is A and is not B, the second method is called. Because [4, [[5]]] is B, the first method is called. For this instance the unit type T is no longer N | B, but rather N | A. Because A is defined as containing N but not itself, N is the unit type T for the first method called on A.

If any of that makes sense and is factual, the compiler's actions are surprising but correct. The expected output would only be possible if the compiler waits to define the unit type T until all possible recursion is done.

stevensonmt on 30 Mar 2020

It might be a correct guess but there's no recursion going on.

It works wrong because we have a bug in our code.

asterite on 30 Mar 2020

Recursion might be the wrong term for the algorithm process, but it does seem to have something to do with the recursive union type.
B(A|B) triggers it when C(A|B) does not.

stevensonmt on 30 Mar 2020

Doing some old-fashioned puts debugging in the compiler and discovered this when I ran it against @stevensonmt's example code:

Crystal::Matches(
 @cover=true,
 @matches=
  [#<Crystal::Match:0x10cf32330
    @arg_types=[Array(Array(Int32))],
    @context=
     #<Crystal::MatchContext:0x10cf52900
      @def_free_vars=["T"],
      @defining_type=main,
      @free_vars={"T" => Array(Int32)},
      @instantiated_type=main,
      @strict=false>,
    @def=
     def foo(a : Array(T)) forall T
  a.each_with_index do |a, i|
    p("first call for #{a} of class #{a.class}")
    foo(a)
  end
end,
    @named_arg_types=nil>,
   #<Crystal::Match:0x10cf32300
    @arg_types=[(Array(Array(Int32)) | Int32)],
    @context=
     #<Crystal::MatchContext:0x10cf52800
      @def_free_vars=["T"],
      @defining_type=main,
      @free_vars={"T" => (Array(Array(Int32)) | Int32)},
      @instantiated_type=main,
      @strict=false>,
    @def=
     def foo(a : T) forall T
  p("second call for #{a} of class #{a.class}")
  p(a)
end,
    @named_arg_types=nil>],
 @owner=main,
 @success=true)

Crystal::Type#lookup_matches_without_parents is correctly matching Array(Array(Int32)) (the [[3]]) to the first method but also lumping it in with Int32 in a second match. I haven't figured out how that match is used yet, but it seems to be choosing the second match somewhere.

jgaskins on 31 Mar 2020

Interesting bug, if I try this: https://carc.in/#/r/8t59

def foo(a : Array(T)) forall T
  a.each_with_index do |a, i|
    p "first call for #{a} of class #{a.class}"
    foo(a)
  end
end

def foo(a : Array(Array(Int32)))
  a.each_with_index do |a, i|
    p "first call for #{a} of class #{a.class}"
    foo(a)
  end
end

def foo(a : T) forall T
  p "second call for #{a} of class #{a.class}"
  p a
end  

foo([1, [2, [[3]], [4, [[5]]]]])

Then I get this:

"first call for 1 of class Int32"
"second call for 1 of class Int32"
1
"first call for [2, [[3]], [4, [[5]]]] of class Array(Array(Array(Array(Int32)) | Int32) | Array(Array(Int32)) | Int32)"
"first call for 2 of class Int32"
"second call for 2 of class Int32"
2
"first call for [[3]] of class Array(Array(Int32))"
"first call for [3] of class Array(Int32)"
"first call for 3 of class Int32"
"second call for 3 of class Int32"
3
"first call for [4, [[5]]] of class Array(Array(Array(Int32)) | Int32)"
"first call for 4 of class Int32"
"second call for 4 of class Int32"
4
"first call for [[5]] of class Array(Array(Int32))"
"first call for [5] of class Array(Int32)"
"first call for 5 of class Int32"
"second call for 5 of class Int32"
5

faustinoaq on 31 Mar 2020

Looks like the bug is somewhere in this file. Haven't found it yet. I'll look some more after work today.

jgaskins on 31 Mar 2020

The problem is that this is a bug where the problem is not the code. We need to define how we want it to work first.

Here's the code again:

class Gen(T)
end

def foo(a : Gen(T)) forall T
  puts "one"
end

def foo(a)
  puts "two"
end

z = Gen(Int32).new.as(Gen(Gen(Int32)) | Gen(Int32))

foo(z)

The type of z is a union of:

Gen(Gen(Int32))
Gen(Int32)

When the compiler tries to match that union type against Gen(T) it will do this:

Take the first type in the union: Gen(Gen(Int32)). It matches against Gen(T). T becomes Gen(Int32).
The the second type in the union: Gen(Int32). Try to match it against Gen(T). However, we said T is Gen(Int32). So it's matching against Gen(Gen(Int32)). No match.

The only match is the first one, Gen(Gen(Int32)), where T is Gen(Int32).

Then the next overload is considered, and the entire union matches.

That's why we are seeing that behavior.

How it should work? Should T become the union of those types? But if T is Gen(Int32) | Int32 it would mean we had Gen(Gen(Int32) | Int32) in the first place, which is not the case.

We basically need to dispatch the union type to two different instantiations of the same overload. But the code never does that in any place of the compiler.

So this is a huge change, if we want to implement it.

Or... other ideas of how this should match?

asterite on 31 Mar 2020

👍3

This makes me question whether I understand forall. My mental model is that it instantiates a new overload for every single type it was called with, including splitting unions. For example:

def foo(a : T) forall T
end

foo [1, "b"].first

In this example, it seems that there would be 2 overloads for foo — one for Int32 and one for String. Is that correct or is there just one for Int32 | String?

jgaskins on 31 Mar 2020

@jgaskins there is just one, for the union type.

It works exactly the same way without forall. forall doesn't change which instantiations occur, it just increases the expressiveness of type restrictions.

RX14 on 31 Mar 2020

👍2

Exactly. forall just lets you talk about that T and use it in code later on. But it doesn't change how things match.

That said, in my mind forall should also expand union types and provide one overload for each, but that's not the case. And maybe we shouldn't do it. Should we do it for each subclass of a type hierarchy too?

Another thing to consider:

class Gen(T)
end

def foo(a : Gen(T)) forall T
  p T
end

foo(Gen(Int32).new || Gen(Char).new)

Currently fails:

Error: no overload matches 'foo' with type (Gen(Char) | Gen(Int32))

Overloads are:
 - foo(a : Gen(T))
Couldn't find overloads for these types:
 - foo(a : Gen(Int32))

It's because Gen(Char) matches, T becomes Char, then Gen(Int32) doesn't match and there's no overload to cover that.

The entire situation is very confusing.

My advice is to avoid using forall if not needed. In the example in the forums the T wasn't needed at all.

asterite on 31 Mar 2020

The type of z is a union of:

Gen(Gen(Int32))

Gen(Int32)

When the compiler tries to match that union type against Gen(T) it will do this:

Take the first type in the union: Gen(Gen(Int32)). It matches against Gen(T). T becomes Gen(Int32).

The the second type in the union: Gen(Int32). Try to match it against Gen(T). However, we said T is Gen(Int32). So it's matching against Gen(Gen(Int32)). No match.

The only match is the first one, Gen(Gen(Int32)), where T is Gen(Int32).

Then the next overload is considered, and the entire union matches.

That's why we are seeing that behavior.

How it should work? Should T become the union of those types? But if T is Gen(Int32) | Int32 it would mean we had Gen(Gen(Int32) | Int32) in the first place, which is not the case.

We basically need to dispatch the union type to two different instantiations of the same overload. But the code never does that in any place of the compiler.

So this is a huge change, if we want to implement it.

Or... other ideas of how this should match?

Thanks for the clear delineation of the problem. I never expected to uncover such a fundamental bug. How important is this corner case?

stevensonmt on 31 Mar 2020

How important is this corner case?

There's lucky framework, amber, athena and thousand of shards out there. We have a compiler. There's the standard library with a lot of features.

It's a bug but it's not preventing people from creating wonderful things. I think this is a very low priority bug. It's pretty rare to have to match against Array(T) and wanting to get that T. It's a bit leaky. And if you really need to do that, you can do it by doing typeof(array.first) or something like that too. So there are workarounds.

asterite on 31 Mar 2020

😕2 👍2

It works exactly the same way without forall. forall doesn't change which instantiations occur, it just increases the expressiveness of type restrictions.

Exactly. forall just lets you talk about that T and use it in code later on. But it doesn't change how things match.

Ahh, okay, similar to the idea of &block vs yield (purposely ignoring the block inlining for yield for this analogy), where they both match the exact same way but the &block version gives you a reference to the block itself but there's only one instantiation of the &block overload?

jgaskins on 31 Mar 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

spec --location to work on any line in indvidual spec

will · 3Comments

Float JSON representation is wrong

RX14 · 3Comments

Unify structs and C structs

oprypin · 3Comments

Allow accessing named tuples with strings

costajob · 3Comments

Allow type annotation + initialization `@var : Type = value`

oprypin · 3Comments