This is a follow-up to #1824 that explains a bit more formally how the new "global" type inference algorithm will work.
The overall idea is that, before resolving method calls and instantiating method definitions, the compiler gets to know the type of all global, class and instance variables. Right now that's not the case. Note that local variables and method arguments don't necessarily have to be type-annotated: this only affects global, class and instance variables.
The most obvious way to declare the type of such variables is by using a straight type annotation. For example:
```crystal
class Foo
  @x : Int32
end
```
This tells the compiler that Foo's @x is of type Int32. However, with the above code if we do:
Foo.new
the compiler will complain with something like "@x was not initialized in all of Foo's initialize methods, rendering it nilable, which doesn't match the declared type Int32". That is, the compiler will still check that @x is effectively Int32.
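A minimal sketch of code that passes that check: every `initialize` assigns `@x`, so the declared type is satisfied.

```crystal
class Foo
  @x : Int32

  def initialize
    # @x is assigned in every initialize, so it's never nil
    @x = 0
  end
end

Foo.new # OK
```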
If you don't want to explicitly declare the type of a variable, there are other ways. The syntactic rules are:
1. `@x = <literal>`, where `<literal>` is a literal such as `1`, `nil`, `[] of Int32` or `"foo"`. The variable is inferred to have the type of that literal.
2. `@x = T.new(...)`. `@x` is inferred to have the type `T`.
3. `@x = var`, where `var` is a method argument with a type restriction `T`. `@x` is inferred to have the type `T`.
4. `@x = var`, where `var` is a method argument whose default value matches rules 1 or 2. In that case the type is inferred from the default value.
5. `@x = LibFoo.function`, where `LibFoo` is a lib type. `@x` is inferred to have the return type of that function. This is easy to do because there's no need to instantiate methods, and the types are explicitly declared in C bindings.
6. `LibFoo.function(out @x)`, where `LibFoo` is a lib type. `@x` is inferred to have the type of the function argument in that position minus a pointer type (for example, if the argument is `Int32*` then `@x` is inferred to be `Int32`).

Note that the final inferred type will be the union of all the cases where those rules apply. Additionally, the `Nil` type is added for instance vars that are not initialized in all `initialize` methods, and for global/class vars that are not initialized at all.
Some examples with instance variables of the above rules:
```crystal
# Rule 1
class Foo
  # inferred to be Int32
  @x = 1
end

# Rule 1
class Foo
  def initialize
    # inferred to be Int32
    @x = 1
  end
end

# Rule 2
class Foo
  def initialize
    # inferred to be MemoryIO
    @x = MemoryIO.new
  end
end

# Rule 3
class Foo
  def initialize(io : IO)
    # inferred to be IO
    @io = io
  end
end

# This is also valid
class Foo
  # @io is inferred to be IO; the compiler actually rewrites
  # this to the code in the previous snippet
  def initialize(@io : IO)
  end
end

# Rule 4
class Foo
  def initialize(x = 1)
    # inferred to be Int32
    @x = x
  end
end

# Also valid
class Foo
  # @x is inferred to be Int32
  def initialize(@x = 1)
  end
end

# Also valid
class Foo
  # @x is inferred to be MemoryIO
  def initialize(@x = MemoryIO.new)
  end
end

# Rule 5
lib LibFoo
  fun foo : Int32
end

class Foo
  def initialize
    # inferred to be Int32
    @x = LibFoo.foo
  end
end

# Rule 6
lib LibFoo
  fun foo(x : Int32*)
end

class Foo
  def initialize
    # inferred to be Int32
    LibFoo.foo(out @x)
  end
end
```
An example with multiple types:
```crystal
class Foo
  def initialize
    @x = "hello"
  end

  def initialize(@x : Int32)
  end

  # @x is inferred to be String | Int32
end
```
An example with methods other than initialize:
```crystal
class Foo
  def lazy
    # inferred to be Array(Int32)?
    @lazy ||= [] of Int32

    # The above is syntax sugar for:
    #
    #   @lazy || (@lazy = [] of Int32)
    #
    # so the rule is applied from the right-hand side.
    # `Nil` is added because @lazy is not initialized in all
    # `initialize` methods or in a catch-all initialization.
  end
end
```
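If you want to drop the `Nil` from that union, a catch-all initialization does it (a sketch based on Rule 1):

```crystal
class Foo
  # catch-all initialization: inferred as Array(Int32), without Nil,
  # because @lazy is always initialized
  @lazy = [] of Int32

  def lazy
    @lazy
  end
end
```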
An example with class variables:
```crystal
class Foo
  # inferred to be Int32 because of Rule 1
  @@x = 1
end

class Foo
  # @@x is inferred to be Int32? because of Rule 3
  def self.x=(@@x : Int32)
  end
end
```
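The same rules would cover global variables; a hypothetical sketch:

```crystal
# Rule 1: $x is inferred to be Int32
$x = 1

# Rule 2: $io is inferred to be MemoryIO
$io = MemoryIO.new
```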
The idea of this behaviour is to require as few type annotations as possible. Or, put another way, to avoid having to write redundant type annotations.
However, type annotations are necessary if instance/class/global variables don't match the above rules. For example:
```crystal
class Foo
  def initialize(argument)
    # no way to infer the type of @x syntactically
    @x = argument.call
  end

  # so we have to use a type annotation
  @x : SomeType
end
```
In general, existing code will probably need a few type annotations like the one above, plus some more in initialize arguments. This has the added benefit that you won't be able to construct an object with an invalid type and only later find out, through some obscure error, that an instance variable was misused.
We will try to make error messages as clear as possible when a type couldn't be inferred because none of the rules were matched, maybe also including a link to the rules above or stating them directly in the error message.
This is the most probable path we will take in upcoming releases, but of course we'd like to know your opinion.
In the last example, would annotating argument as something like Proc(Int32) work, causing @x to be Int32?
@kirbyfan64 Nope, because that would mean having to resolve the call method. Of course Proc is kind of a built-in type, but the call method is just a regular method... so no. The rules have to be very simple.
Following these rules, for the following snippet:
```crystal
class Node
  def initialize(@key : Int32)
    @priority = calculate_from_key
  end

  private def calculate_from_key
    if @key > 10
      1_000
    else
      "low"
    end
  end
end
```
It will require me to indicate upfront that @priority is of type Int32 | String, correct? I assume this is because the type cannot be inferred from a Call node?
@luislavena Yes, exactly. Anything that's @x = ... that doesn't follow one of the rules I described above will need an explicit type annotation. The idea is that no method instantiation (method type checking) needs to be done in order to know the type of global/class/instance variables.
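Concretely, the `Node` snippet above would need something like this (a sketch; the annotation syntax follows the earlier examples):

```crystal
class Node
  # explicit annotation: the value comes from a method call,
  # which none of the inference rules cover
  @priority : Int32 | String

  def initialize(@key : Int32)
    @priority = calculate_from_key
  end

  private def calculate_from_key
    @key > 10 ? 1_000 : "low"
  end
end
```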
Another example:
```crystal
class Node
  def initialize
    x = 1
    # The compiler *won't* deduce that @x is an Int32,
    # because this doesn't follow any of the above rules
    @x = x
  end
end
```
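Under these rules there would be two ways to make that snippet type-check (a sketch, not compiler-verified):

```crystal
# Option 1: move the value into a restricted argument so Rule 3 applies
class Node
  def initialize(x : Int32)
    # inferred to be Int32
    @x = x
  end
end

# Option 2: keep the local variable and annotate @x explicitly
class Node
  @x : Int32

  def initialize
    x = 1
    @x = x
  end
end
```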
The simplicity and limitations of the inference are a pro in my opinion: it makes the final type of members clearer than it would be with deeper-reaching inference. +1 :-)
@asterite Excellent, thank you! The rules defined here make sense!
```crystal
class A
  def self.new
    "foo"
  end
end

$x = A.new
```
What would happen here? Would $x be inferred as A, and would the compiler then error later when trying to put a String into an A variable?
@lbguilherme Yes. There's an idea of restricting T.new methods to always return T to avoid this confusion but it's not decided yet.
I agree this would be better for self-consistency. But here's one big issue from the stdlib:
```crystal
abstract class Channel(T)
  def self.new
    Unbuffered(T).new
  end

  def self.new(capacity)
    Buffered(T).new(capacity)
  end
end
```
@lbguilherme Not an issue, because Channel.new returns a Channel. Not exactly a Channel, but it's compatible with it (a subclass). So the rule about T.new is actually making the variable be "T or any of its subclasses", also known as "T+", but that would be transparent for the programmer.
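A sketch of what that "T+" reading would mean for the `Channel` example:

```crystal
class Worker
  def initialize
    # Rule 2: inferred as Channel(Int32)+ (Channel(Int32) or any
    # subclass), even though .new actually returns an Unbuffered(Int32)
    @chan = Channel(Int32).new
  end
end
```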
@asterite Great!!
Redefining new is done with care and not whimsically, so I don't see a good reason to superficially restrict its potential.
> Redefining new is done with care and not whimsically
That alone sounds like a perfect reason to ensure .new returns T or any of its subclasses (aka: Principle of least surprise).
:smile:
There's "common programs" and there's DSL-scenarios, where some trickery can always be of benefit ;-)