Crystal: Introduce syntax for free variables

Created on 11 Sep 2016 · 32 Comments · Source: crystal-lang/crystal

Right now a free variable in a type restriction is one with a single letter, optionally followed by a digit. For example:

def foo(x : T)
  T
end

foo(1) # => Int32
foo("hello") # => String

Here T takes the type of the argument x.

This is used for example to create an Array filled with an initial value:

class Array(T)
  def initialize(size : Int, value : T)
    # ...
  end
end

Array.new(3, 'a') # => ['a', 'a', 'a']

Here T isn't set (we wrote Array.new rather than Array(Int32).new, so T wasn't specified explicitly), so T becomes Char. Because that T is also the T of Array(T), the returned array is an Array(Char).
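
To make the inference described above concrete, here is a minimal sketch using a hypothetical generic class Box(T) (not part of the standard library) in place of Array. It assumes the behaviour explained in this issue: when the type argument is left out, it is deduced from the argument passed to new.

```crystal
# Box is a hypothetical stand-in for Array, used only for illustration.
class Box(T)
  def initialize(@value : T)
  end

  # Returns the type argument this instance was created with.
  def element_type
    T
  end
end

inferred = Box.new('a')      # T is not given, so it is deduced as Char
explicit = Box(Int32).new(1) # T is given explicitly

puts inferred.element_type # prints Char
puts explicit.element_type # prints Int32
```

Box(Int32).new('a') would be rejected at compile time, because with T already set to Int32 a Char argument no longer matches.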

Another use case is Enumerable#map:

module Enumerable(T)
  def map(&block : T -> U)
    ary = [] of U
    each { |e| ary << yield e }
    ary
  end
end

Here U becomes the block's type, and so we can create an array of the block's type.

The problem with this single-letter convention is that a type like U can't exist, because if someone defines class U at the top level, then U isn't a free variable anymore: U doesn't take the block's type, but must be that class U.

Because of this last problem, we disallowed naming types with single letters at the top-level.

Another bad side effect of this is that type arguments, like the T in Array(T), must also be a single letter, because in the Array#initialize example above we needed T to be a free variable but also match the type argument.

This feels like the language is limiting us too much.

For the case where T can either be a free variable or match a type argument, like in the Array#initialize example above, we could actually note that the T is a type argument, and never match it against a top-level type named T. We can implement this right now; it is backwards compatible and solves part of the problem.

The problem still remains for that U that isn't a type argument but we still want to set to a type inferred by the compiler. Or the first foo example above.

I propose we introduce syntax to say "this is a free variable, please deduce the type of this". Possible syntax can be:

def foo(x : 'T)

def foo(x : `T)

def foo(x : %T)
def foo(x : ^T)
def foo(x : @T)
def foo(x : ?T)
def foo(x : $T)

Or maybe another symbol. I'm personally inclined towards something that has good syntax highlighting out of the box :-)

I think this will be needed in very few places, so I'd prefer this and be able to name types and type arguments however I want than to be limited.

Enumerable#map then becomes (assuming we go with %):

module Enumerable(T)
  def map(&block : T -> %U)
    ary = [] of U
    each { |e| ary << yield e }
    ary
  end
end

/cc @bcardiff @waj

accepted draft compiler

asterite

👍5 👎3

Most helpful comment

I'm not fond of introducing a new syntax merely to reallow single letter types. In examples one may use Foo and Bar instead of A and B. It's not such a big deal.

I like how generics/freevars are a single letter. It helps understand that it's a placeholder type.

That being said, I prefer a definition that isn't separated from the freevar itself (def (T) foo(x : T) is weird). As long as the syntax is only for freevars and not for generics, I can live with ~T in the rare places it would be required.

ysbaddaden on 11 Sep 2016

👍6

All 32 comments

How do you imagine this when the type variable needs to be used multiple times?

def merge(a : Enumerable('T), b : Enumerable('T))
  # ...
end

I find that tedious.
If we want to differentiate type variables from constants we should introduce a scope for them. In types we have that using generics Array(T). So that will hide any T constant.
In generic types we need an order so we can create them. In methods we don't and that is why we avoid the issue.

I am still fine with the single letter(+ digit) convention.
But if we want to get rid of that I would vote for something like:

def (T) merge(a : Enumerable(T), b : Enumerable(T))
  # ...
end

class Array(T)
  typevar S # so S can be used for all methods inside the lexical scope as a type var (yet it reminds me of C++ templates...)

  def merge(other: Array(S)) : Array(T|S)
  end
end

bcardiff on 11 Sep 2016

Hmm... I didn't find your first example tedious. The problem with def (T) ... is that now a Def ASTNode will need an array of free variables, making them all a bit bigger, so the memory footprint will increase. With 'T or %T it's just a new AST node, FreeVar, that is only used when needed, and its analysis is straightforward (no need to check if something is a free var relative to the def where it's used).

The typevar S syntax is more confusing for me. I think Swift and Rust have something like that and I find it extremely confusing (and I remember @waj told me the same thing regarding Swift).

In any case, they are valid proposals, we are just brainstorming :-)

asterite on 11 Sep 2016

what about _T, _U

kostya on 11 Sep 2016

def (T) ... may not increase the memory footprint if the freevars list is a context that helps you choose whether T is a freevar or a constant. After parsing the def you can get rid of that context.

If we go for 'T then Array(T) should be Array('T) for consistency IMO. And it feels like again, just splitting the grammar with a rule.

a) starting with ' , or
b) single letters (+digit)

the other alternative is lexical scoping as I propose.
Any other will be inconsistent.

for sure: brainstorming open minded mode on ;-)

bcardiff on 11 Sep 2016

I don't understand why Array(T) needs to change to Array('T)... where? In the class declaration? When using it as a type restriction?

asterite on 11 Sep 2016

class Array(^T)
  # T is a constant
  # ^T is binding to the free var T
end

Otherwise I find it confusing what the following def will accept:

def merge(a: Array(^T), b: Array(T))
end

Are a and b of the same type, or of different types?
And what does T mean inside that def?

bcardiff on 11 Sep 2016

Please, not ': it's a strange symbol when it appears alone. I don't like it in Rust either, and editors would have to change their syntax parsing.

kostya on 11 Sep 2016

👍2

python style __U__ :)

kostya on 11 Sep 2016

In:

def merge(a: Array(^T), b: Array(T))
end

assuming ^T is the syntax for free variables, ^T is a free variable. Array(T) will try to match T to an existing type T and will of course fail if there's no such type. But I don't see why we need to declare class Array(^T).

It basically will work like this:

class Array(T)
  def foo(x : T) # this T is the T of the array
    T
  end

  def bar(x : ^T) # this is a free variable
    T
  end
end

def foo(x : ^T) # this is a free variable
  T
end

def error(x : T) # this is an error, because there's no top-level type T
end

ary = Array(Int32).new
ary.foo(1) # => Int32
ary.foo('a') # error, Char doesn't match Int32
ary.bar(1) # => Int32
ary.bar('a') # => Char (because ^T is a free variable)

foo(1) # => Int32
foo('a') # => Char

But also:

class Array(T)
  def initialize(size : Int, value : T) # this T matches the type argument
    # ...
  end
end

# 'a' matches T, which is the array type argument.
# Because it's not yet set, and because this is an `initialize` method
# (for `new` it would be the same), T becomes Char, which also becomes
# the type argument, so we get Array(Char)
Array.new(3, 'a') # => ['a', 'a', 'a']

# Here T is set to Char, so the T in the method will be Char,
# and it doesn't match Bool
Array(Char).new(3, true) # Error, Bool doesn't match Char

This last case seems a bit strange, but I think it makes sense, and it will keep the number of ^T low.

So in most cases, in generic types, we won't need to use that annoying symbol ^T. We only need it when we really want a free variable that's not an existing type argument, for example in Enumerable#map or Array#+(other : Array(^U)). That is very few places, so the chances of riddling the language with strange symbols are very low.

asterite on 11 Sep 2016

👍3

I find this example confusing:

def merge(a: Array(^T), b: Array(T))
end

Are ^T and T the same type or not? If they are not (which I find less confusing than if they are), then I should use ^T everywhere in the method definition. In other words, does the ^ symbol become part of the type name or not? That makes the syntax for type vars defined locally (to the method) different from that of class type vars.

If we wanted to keep the same syntax for the type vars, then we need a way to declare the type vars at the method level. If we had chosen angle brackets this would be really simple (like in C#, Java or C++):

class Foo<T>
  def bar<Q>(x : Q)
  end
end

Instead we decided to use parentheses, easier for the eyes and for the parser. But having two pairs of parentheses one after another in the method definition doesn't look nice (and is probably ambiguous to parse), so I think @bcardiff's suggestion of putting them before the method name looks more appealing.

class Foo(T)
  def (Q) bar(x : Q)
  end
end

I find this syntax much more homogeneous, more clean, without strange characters added.

Given all this, I don't find it especially limiting that single-letter class names are reserved for free type vars. It's a really simple rule to learn that makes the syntax even simpler.

What if we allow longer type var names but still forbid declaring single-letter classes? Type vars at the class level are always declared explicitly so this is easy to fix. At the method level, if you want to use long type vars, use the @bcardiff syntax. Otherwise nothing has to be done, keep using the simpler syntax:

class Foo(Type)
  def bar(x : Q)
  end

  def (Another) bar(x : Another)
  end
end

waj on 11 Sep 2016

The other alternative is to go with Haskell type variables that match variables grammar and not constants. As expressed in https://github.com/crystal-lang/crystal/issues/3112#issuecomment-239858693

bcardiff on 11 Sep 2016

The main issue with disallowing single-letter top-level type names is that examples become a bit longer: class A; end; ... vs; class NeedToThinkANameMaybeFoo; end; .... I don't know, I can create a class named T in Java, C#, Ruby... I think I should be able to do that in Crystal too. Maybe it's not a big deal :-)

It's true that we can remove the restriction of type variables being single letter names right now because in Hash(Key, Value), using Key inside Hash will always mean the Key of Hash, never a free variable (well, it will act as a free variable if it's a type parameter of the uninstantiated generic class that has the method, and that's OK). I'll at least implement this change and it'll make the language a bit more flexible.

I now understand the issue with ^T and T. In that case I would give an error, maybe at parse time, because T is used both as a type and as a free variable.

^T isn't part of the type name, it just means: T is a free variable in this method, assign it the type you deduce, either from the argument type or from the block's type.

Basically, what I proposed is that this (I'll use ~ because I find it less noisy):

def foo(x : ~T)
end

is the same as what you propose as:

def (T) foo(x : T)
end

The second form is longer, and now we have two pairs of parentheses, and a duplicated T. On the other hand we don't need to think of a new symbol, so either way is fine. I maybe prefer ~T only because I feel its uses are very little, so adding this as part of a method signature (explaining that a method can have an optional list of free variable names) instead of seeing it as a modifier (in a type restriction, ~T means T is a free variable) feels like the user has to learn more. Plus it increases a Def's size, and remember a lot of Defs are created (one for each instantiation plus one untyped).

For example:

module Enumerable(T)
  def (U) map(&block : T -> U)
  end

  # vs.

  def map(&block : T -> ~U)
  end
end

In the first method I feel my 👀 are going from left to right, while in the second form I just read it and when I read ~U I see it's a free var.

In any case, either way is fine with me, we just need to settle on a syntax, or decide to leave things like they are now.

asterite on 11 Sep 2016

👍1

I'm not fond of introducing a new syntax merely to reallow single letter types. In examples one may use Foo and Bar instead of A and B. It's not such a big deal.

I like how generics/freevars are a single letter. It helps understand that it's a placeholder type.

ysbaddaden on 11 Sep 2016

👍6

If the case that you're trying to address with ~T is going to be rare, then why does it have to be identified by a single cryptic prefix-character? Why not use something more verbose for this, and save single-character prefixes for something which is going to be typed a lot of times?

drosehn on 12 Sep 2016

👍1

+1 to the "I don't really see the single letter restriction as a problem" crowd.

+1 to the "def (T) foo(x : T) is weird" crowd. At first sight it looks as if it could be a return type annotation (I think it will especially feel like that for C-and-derivatives programmers).

Finally, +1 to @drosehn: if it really needs to be fixed, and if it really is so infrequent, let's go for something that shouts its meaning to your face.

def foo(x: newtype(T)) or something of the sort (just to provide a strawman :P).

Symbols combined with type names feel quite "rusty". I know it's just syntax and it's really silly, but whenever I think of giving Rust another try, reading code with that kind of syntax makes me look for some other thing to do with my spare time.

mverzilli on 12 Sep 2016

👍2

I like the idea of using a word like newtype(T), it's true that we don't need to use symbols for this.

I forgot to mention another example of why disallowing single-letter names at the top-level is "bad": #3269 . For example:

N = (ARGV[0]? || 10).to_i

N.times do
end

Or what about making it easy to use E in a short script?

E = Math::E

# ...

That is, top-level constants like these are also currently disallowed.

I prefer to use sometext(T) in a few places (I think most users will never need this) than limiting all scripts, benchmarks or small prototypes where one can use very short names to try things out.

asterite on 12 Sep 2016

👍1

I've bashed my head back and forth and written syntax upon syntax for this very issue back in the days, in my various toy-langs, (and coded C++ for shitloads of years) - with those biases in mind, this is my 2 satoshis take:

I've used the terms "typeparam" for T in class Foo(T) and "typevar" for T in def foo(x : T).

Allowing longer typeparam names for generic types sounds very good: class Foo(Bar, Qwo).
The Single Letter Limitation: no problem. Seriously! How hard is it to do Exp = Math::E or TMP = (ARGV[0]? || 10).to_i... or class Foo, class Bar, class FooSub, etc.?
- Easily readable and grokable "real" source code should be priority one, making examples and throw-ups easy should be way second to that.
Using the same symbol for a method typevar and a typeparam with the intent of _T being something else_ is bad practise imo (class Foo(T), and then foo(x : T) - don't do that!).
- With long typeparam names - that's made a lot easier: use long names in generic types, and single letter for method specific! StdLib containers could use de facto naming ElementT etc.
The only reason I see to explicitly mark method typevars is to prevent accidental use of a typeparam instead of the intended method scoped use. And also to allow typevars that are _not used in the method's formal parameters_. But then additional syntax is needed for calls. And decoration is out of the question.
- I think allowing only /[A-Z][0-9]/ for typevars is very much OK.
- If the safe-guarding is wanted, I think the explicit listing is better: def (Q) foo(x: Bar(Q), y: Q) instead of resorting to decoration. It does look weird. Agree with that. As @drosehn points out, decoration should be saved for things that are done often. Also, decoration, if done only on first use, looks... sketchy: def foo(x: Bar(~Q), y: Q), in that case, it should be part of the name. But I don't like that.
- I would then actually prefer an error if a method typevar has the same name as a generic typeparam, when it's been explicitly listed as such. One should use different names.
- In my alternative-syntax-fork I use Type<T>, with that def foo<T>(x : T) becomes natural as per @waj's example. But, agreed, with the paren-notation of Crystal, it does look a bit weird. But still better than decoration.
- If we go one step further with generic parametrizations (I'd really like that: DRY and performant) this could also create function variations _where the T is not used in the method formal parameters_, or not even a typevar, but a "generic configuration" (make a variation of the method based on some constant value, etc., just like for generic types), for instance (the code from a PoC database I intend to re-implement from scratch in Crystal):

template <typename T>
inline auto read_varilen_integer(u8*& data) -> T {
    T unsigned_value = read_varilen_natural<T>(data);

    return ((unsigned_value & 1) ? ~(unsigned_value >> 1)
    : (unsigned_value >> 1));
}

Here the T is used only to generic-parametrize the _call_ to read_varilen_natural (defined similarly) and for the _return type_. This would still require additional syntax for calls in Crystal though. An alternative for this in Crystal is of course to pass the type as an argument - but it's not certain that it will be optimized away by LLVM.
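
A rough sketch of the "pass the type as an argument" alternative mentioned above. The method name empty_buffer is hypothetical. The sketch restricts the argument to T.class so that T is bound to whichever type is passed, and it declares T with the forall syntax that this thread settles on further down. Whether the extra argument gets optimized away is not something this example demonstrates.

```crystal
# Hypothetical helper: T never appears as the type of a value argument,
# it is only captured from the type passed as `type`.
def empty_buffer(type : T.class) : Array(T) forall T
  [] of T
end

ints = empty_buffer(Int32)
ints << 1

puts typeof(ints)                 # prints Array(Int32)
puts typeof(empty_buffer(String)) # prints Array(String)
```

The call site stays ordinary Crystal (empty_buffer(Int32)), so no new call syntax is needed, at the cost of an argument that exists only to carry a type.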

I don't think the Def ASTNode size @asterite mentions should be an argument in the choice. Few defs (best case: none) will have the list, and so it will just be an additional 8-byte nil.

I know we all have very different backgrounds into Crystal. I want a cleaner, more productive, replacement for C++. Others come from Ruby and see everything in a dynamic perspective. I prefer to have control to make efficient stuff when _I need to_, and as clear, straightforward, readable code in most cases. If performance wasn't an issue, we'd all code in functional languages - right? The only reason for imperative langs to exist imo is to be able to tell the machine _how_ to do what we want, to squeeze those extra cycles.

ozra on 12 Sep 2016

👍1

If performance wasn't an issue, we'd all code in functional languages - right?

I disagree with this. I mean, some things are just easier to represent with objects. In particular, plugins are nearly impossible with statically-typed functional languages, short of doing ugliness like:

data MyObj = MyObj { doSomething :: Int -> Int }

and then explicitly construct it:

myPluginEntryPoint = MyObj { doSomething = (\a -> a) }

But that's not any more readable than just using ABCs! Even functional languages that _do_ have plugin systems usually use objects.

Back on topic: my vote:

Leave single-letter vars invalid at the top-level. Short code snippets are a bit longer (that's kind of ironic), but I think it makes code overall more readable/accessible.
If someone needs to have a generic parameter that isn't an argument, they can do something like:

def f(normal_arg : NormalType, T)

T isn't really an argument; it's just a generic parameter. So the type _appears_ to be a normal argument, but it's all compile-time. Someone could call it like f(arg, SomeType).

Failing that, Haskell's solution could also work: use lowercase letters for generic parameters. They're not allowed as types to begin with, anyway. This wouldn't really hurt readability, and you don't end up with the weirdness of using symbols (e.g. %T).

refi64 on 12 Sep 2016

I think the current situation is fine as it is, but merging #3294 would be nice. Not being able to define single-letter constants at the top level only affects short scripts, which use top-level code. And for that I say there's always a simple workaround: Just put it in a class/singleton.

RX14 on 12 Sep 2016

👍1

@kirbyfan64 - yes, I exaggerated for effect, sorry for deviating from the subject.
I love the def f(normal_arg : NormalType, T) idea!

ozra on 12 Sep 2016

So, today we discussed this with @waj and @bcardiff

The thing is... did you know that only top-level names like T and T1 are disallowed? But not nested names. But with just this rule things can still break. For example:

module Moo
  class Foo
    def foo(x : U) # U is a free variable
    end
  end
end

# Later, someone reopens Moo and defines a class U
module Moo
  class U
  end
end

Oops, the free variable U above now becomes bound to Moo::U, so compilation breaks.

I know, the solution is simply to disallow these short names as types everywhere, right? Well, what about Math::E? Should we rename it because of this?

In fact, I initially implemented the rule above and later decided to only do this for the top-level because Math::E would break. But this rule is incomplete, as we just saw.

So, for us it doesn't feel right that the language constrains the names we can choose... it would take a lot of fun out if we had to rename Math::E to something else.

Also, free variables are not a very common feature, so adding syntax for that instead of disallowing short type names feels like the correct thing to do. But, free variables are usually short, like T and U, and we can still do that, so we can still use that as a convention, while at the same time allowing T, U and E for type names, be it for small scripts, sample code or math constants.

What syntax to use? We've settled with:

module Enumerable(T)
  def (U) map(&block : T -> U)
  end
end

This is because if we use something like ~U or newtype(U) it would be confusing to use U and ~U in the same method. With def (U) that's impossible to happen.

Don't worry, using free variables like that is very uncommon, most occurrences are in the standard library, and the syntax is neither hard nor full of strange symbols, so we think it's still a happy syntax.

We'll also free the names for type variables, so class Foo(Bar) can be written if someone wants to (#3294), though by convention one would use a short name.
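
As an illustration of the relaxed naming mentioned above, here is a small sketch with descriptive type parameter names. The Registry class and its methods are hypothetical. Inside the class, Key and Value always refer to the type parameters, so no extra syntax is needed.

```crystal
# Hypothetical generic class with multi-letter type parameters.
class Registry(Key, Value)
  def initialize
    @entries = Hash(Key, Value).new
  end

  # Key and Value here are the class's type parameters, not free variables.
  def add(key : Key, value : Value) : Nil
    @entries[key] = value
  end

  def fetch(key : Key) : Value?
    @entries[key]?
  end
end

registry = Registry(String, Int32).new
registry.add("answer", 42)

p registry.fetch("answer")  # prints 42
p registry.fetch("missing") # prints nil
```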

asterite on 13 Sep 2016

👍1

👎 from me, it looks too strange:

module Enumerable(T)
  def (U) map(&block : T -> U)
  end
end

newtype(U) - 👍 interesting: no change in syntax, just a new compiler method, like is_a?

kostya on 14 Sep 2016

Uhhh...I'm sorry, but that looks kinda weird. :O

And the idea to use lowercase letters never even got any consideration. :(

refi64 on 14 Sep 2016

@kostya I don't know why you say that it looks weird, in Java it's almost the same.

public <T> void foo(T x) { }

This:

def (U) map(&block : T -> U)
end

is read: "given U a free variable, the method map has the following signature".

Also, what about this:

def push(x : newtype(U), y : Array(newtype(U)))
end

Seems a bit long, we have to use newtype every time we need to mention U as a free variable. With the proposed syntax:

def (U) push(x : U, y: Array(U))
end

Much simpler to read and to analyze.

@kirbyfan64 We considered lowercase letters, but it's weird:

def foo(x : a)
  a = 1 # ???
  a::T # ???
end

a usually means var or call, and now it could also mean a type, and some things, like a::T are not supported by the current grammar, so this would be a huge change.

asterite on 14 Sep 2016

The main con of lowercase is that simple highlighters won't be able to color : a as a type.
So when reading the source it is not easy to tell which variables are type variables and which are normal variables.

I really tried to push Haskell-like type variables.

In Haskell you don't mix type variables with value variables on the same line, which is not the case in Crystal.

apply :: (a -> b) -> a -> b
apply f a = f a -- here a is a value of type a

bcardiff on 14 Sep 2016

One last attempt at more explicitness :P

def (forall U) push(x : U, y: Array(U))
end

If you want to introduce more than one free var:

def (forall U,V,W) zip(us : Array(U), vs : Array(V), ws : Array(W))
end

mverzilli on 14 Sep 2016

Based on a discussion that fired because of @mverzilli 's message :-), we finally settled on this syntax:

def map(&block : T -> U) forall U
def map(&block : T -> U) : ReturnType forall T, U

Pros:

Using forall U instead of (U) is more human, and there are fewer parentheses and symbols
Methods are still greppable (I can search def foo, with def (U) foo I can't do that anymore)
forall is used in Purescript, so it's not just a crazy idea of ours

Cons:

A method named forall can't be invoked in the same line as in the method definition (it still can be invoked with self.forall, like any keyword). We don't think this is a real breaking change.
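
A minimal sketch of how the agreed syntax reads in practice. The method names type_of_arg and map_values are hypothetical. The map_values body follows the Enumerable#map example from the top of this issue, with the free variables declared through forall.

```crystal
# T is a free variable: it takes the type of whatever is passed as x.
def type_of_arg(x : T) forall T
  T
end

# T is bound from the array's element type, U from the block's return type.
def map_values(values : Array(T), &block : T -> U) : Array(U) forall T, U
  result = [] of U
  values.each { |value| result << yield value }
  result
end

puts type_of_arg(1)       # prints Int32
puts type_of_arg("hello") # prints String

p map_values([1, 2, 3]) { |n| n.to_s } # prints ["1", "2", "3"]
```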

asterite on 14 Sep 2016

👍3 👎2

Is it now a _requirement_, or is it just a _feature_ for safety?

ozra on 14 Sep 2016

👍1

@ozra In the next version both forall and using a single-letter name will make a type be considered a free variable. Then in the subsequent version we'll remove the single-letter name rule and you'll have to use forall.

asterite on 14 Sep 2016

Only for freevars or also for generics?

ysbaddaden on 14 Sep 2016

@ysbaddaden Only for freevars.

In the standard library I only found 64 uses of freevars.

asterite on 14 Sep 2016

ozra on 14 Sep 2016
