Crystal: Semi-formal specification of the new "global" type inference algorithm

Created on 30 Mar 2016  路  14Comments  路  Source: crystal-lang/crystal

This is a follow up to #1824 that explains a bit more formally how would the new "global" type inference algorithm will work.

The overall idea is that before having to resolve method calls and instantiate method definitions the compiler gets to know the type of all global, class and instance variables. Of course right now that's not the case. Note that local variables and method arguments don't necessary have to be type annotated: this only affects global, class and instance variables.

The most obvious way to declare the type of such variables is by using a straight type annotation. For example:

class Foo
  @x : Int32
end

This tells the compiler that Foo's @x is of type Int32. However, with the above code if we do:

Foo.new

the compiler will complain saying something like "@x wasn't initialized in all of Foo's initialize methods, rendering it nil, and that doesn't match the type Int32 specified". That is, checking that later @x is effectively Int32 is something that the compiler will still do.

If you don't want to explicitly declare the type of a variable you have other ways. The syntactic rules are:

  1. @x = <literal>, where <literal> is a ilteral such as 1, nil, [] of Int32 or "foo". In that case the variable is inferred to have the type of such literal.
  2. @x = T.new(...). @x is inferred to have the type T.
  3. @x = var, where var is a method argument that has a restriction of type T. @x is inferred to have the type T.
  4. @x = var where var is a method argument that has a default value that matches rules 1 or 2. In that case the type is inferred from the default argument.
  5. @x = LibFoo.function where LibFoo is a lib type. @x is inferred to have the return type of that function. This is easy to do because there's no need to instantiate methods and the types are explicitly declared in C bindings.
  6. LibFoo.function(out @x), where LibFoo is a lib type. @x is inferred to have the type of the function argument in that position minus a pointer type (for example if the argument is Int32* then @x is inferred to be Int32).

Note that the final inferred type will be the union of all cases where those rules are applied. Additionally, the Nil type is added for instance vars if they are not initialized in all initialize methods, or for global/class vars if they are not initialized at all.

Some examples with instance variables of the above rules:

# Rule 1
class Foo
  # inferred to be Int32
  @x = 1
end
# Rule 1
class Foo
  def initialize
    # inferred to be Int32
    @x = 1
  end
end
# Rule 2
class Foo
  def initialize
    # inferred to be MemoryIO
    @x = MemoryIO.new
  end
end
# Rule 3
class Foo
  def initialize(io : IO)
    # inferred to be IO
    @io = io
  end
end

# This is also valid
class Foo
  # inferred to be IO
  def initialize(@io : IO)
    # the compiler actually rewrites this to code in the above snippet
  end
end
# Rule 4
class Foo
  def initialize(x = 1)
    # inferred to be Int32
    @x = x
  end
end

# Also valid
class Foo
  def initialize(@x = 1)
    # inferred to be Int32
  end
end

# Also valid
class Foo
  def initialize(@x = MemoryIO.new)
    # inferred to be MemoryIO
  end
end
# Rule 5
lib LibFoo
  fun foo : Int32
end

class Foo
  def initialize
    # inferred to be Int32
    @x = LibFoo.foo
  end
end
# Rule 6
lib LibFoo
  fun foo(out x : Int32*)
end

class Foo
  def initialize
    # inferred to be Int32
    LibFoo.foo(out @x)
  end
end

An example with multiple types:

class Foo
  def initialize
    @x = "hello"
  end

  def initialize(@x : Int32)
  end

  # @x is inferred to be String | Int32
end

An example with methods other than initialize:

class Foo
  def lazy
    # inferred to be Array(Int32)?
    @lazy ||= [] of Int32

    # The above is syntax sugar for:
    #
    #     @lazy || (@lazy = [] of Int32)
    #
    # so the rule is applied from the right-hand side.
    # `Nil` is added because it's not initialized in all `initialize`
    # or in a catch-all initialization
  end
end

An example with class variables:

class Foo
  # Inferred to be Int32 because of Rule 1
  @@x = 1
end
class Foo
  # inferred to be Int32? because of rule 3
  def self.x=(@@x : Int32)
  end
end

The idea of this behaviour is to have to add as few type annotations as possible. Or, put another way, don't have to put redundant type annotations.

However, type annotations are necessary if instance/class/global variables don't match the above rules. For example:

class Foo
  def initialize(argument)
    # No way to infer syntactically the type of @x
    @x = argument.call
  end

  # so we have to use a type annotation
  @x : SomeType
end

In general, existing code will probably need to add a few type annotations like the above one, and some more type annotations in initialize arguments. This will have the added benefit that you won't be able to construct an object with an invalid type and later find out, with some obscure error, that an instance variable was mis-used.

We will try to make error messages as clear as possible when a type couldn't be inferred because none of the rules where matched, maybe also including a link to the above rules or directly saying them in the error message.

This is the most probable path we will take in upcoming releases, but of course we'd like to know your opinion.

accepted compiler

Most helpful comment

@lbguilherme Yes. There's an idea of restricting T.new methods to always return T to avoid this confusion but it's not decided yet.

All 14 comments

In the last example, would annotating argument as something like Proc(Int32) work, causing @x to be Int32?

@kirbyfan64 Nope, because that would mean having to resolve the call method. Of course Proc is kind of a built-in type, but the call method is just a regular method... so no. The rules have to be very simple.

Following these rules, for the following snippet:

class Node
  def initialize(@key : Int32)
    @priority = calculate_from_key
  end

  private def calculate_from_key
    if @key > 10
      1_000
    else
      "low"
    end
  end
end

It will require me indicate that @priority is of types Int32 | String upfront, correct? Assume this since type cannot be inferred from a Call node?

@luislavena Yes, exactly. Anything that's @x = ... that doesn't follow one of the rules I described above will need an explicit type annotation. The idea is that no method instantiation (method type checking) needs to be done in order to know the type of global/class/instance variables.

Another example:

class Node
  def initialize
    x = 1
    # The compiler *won't* deduce that @x is an Int32
    # because it doesn't follow one of the above rules
    @x = x
  end
end

The simplicity and limitations of inference are a pro in my opinion, it makes the final type of members clearer than it would have been with deeper reaching inference. +1 :-)

@asterite excellent, thank you! Rules defined here makes sense!

class A
  def self.new
    "foo"
  end
end

$x = A.new

What would happen here? $x would be inferred as A, and then the compiler would error later trying to put a String into a A variable?

@lbguilherme Yes. There's an idea of restricting T.new methods to always return T to avoid this confusion but it's not decided yet.

I agree this would be better for self-consistency. But one big issue from stdlib:

abstract class Channel
  def self.new
    Unbuffered(T).new
  end

  def self.new(capacity)
    Buffered(T).new(capacity)
  end
end

@lbguilherme Not an issue, because Channel.new returns a Channel. Not exactly a Channel, but it's compatible with it (a subclass). So the rule about T.new is actually making the variable be "T or any of its subclasses", also known as "T+", but that would be transparent for the programmer.

@asterite Great!!

Redefining new is done with care and not whimsically, so I don't see a good reason for restricting its' potential superficially really.

Redefining new is done with care and not whimsically

That alone sounds like a perfect reason to ensure .new returns T or any of its subclasses (aka: Principle of least surprise).

:smile:

There's "common programs" and there's DSL-scenarios, where some trickery can always be of benefit ;-)

Was this page helpful?
0 / 5 - 0 ratings