Right now a free variable in a type restriction is one with a single letter, optionally followed by a digit. For example:
def foo(x : T)
  T
end
foo(1) # => Int32
foo("hello") # => String
Here T takes the type of the argument x.
This is used for example to create an Array filled with an initial value:
class Array(T)
  def initialize(size : Int, value : T)
    # ...
  end
end
Array.new(3, 'a') # => ['a', 'a', 'a']
Here T isn't set (because we didn't do Array(Int32), we didn't specify T explicitly) so T becomes Char, and because T is also the T of Array(T), the returned array is Array(Char).
Another use case is Enumerable#map:
module Enumerable(T)
  def map(&block : T -> U)
    ary = [] of U
    each { |e| ary << yield e }
    ary
  end
end
Here U becomes the block's type, and so we can create an array of the block's type.
The problem with this single-letter convention is that a type like U can't exist, because if someone defines class U at the top level, then U isn't a free variable anymore: U doesn't take the block's type, but must be that class U.
Because of this last problem, we disallowed naming types with single letters at the top-level.
Another bad side-effect of this is that type arguments, like the T in Array(T), must be single letters, because in the Array#initialize example above we needed T to be a free variable but also match the type argument.
This feels like the language is limiting us too much.
For the case where T can either be a free variable or match a type argument, like in the Array#initialize example above, we could actually note that the T is a type argument, and we never match it against a top-level type named T. We can implement this right now and this is backwards compatible and solves part of the problem.
The problem still remains for that U that isn't a type argument but that we still want to set to a type inferred by the compiler. Or the first foo example above.
I propose we introduce syntax to say "this is a free variable, please deduce the type of this". Possible syntax can be:
def foo(x : 'T)
def foo(x : `T)
def foo(x : %T)
def foo(x : ^T)
def foo(x : @T)
def foo(x : ?T)
def foo(x : $T)
Or maybe another symbol. I'm personally inclined towards something that has good syntax highlighting out of the box :-)
I think this will be needed in very few places, so I'd rather have this and be able to name types and type arguments however I want than be limited.
Enumerable#map then becomes (assuming we go with %):
module Enumerable(T)
  def map(&block : T -> %U)
    ary = [] of U
    each { |e| ary << yield e }
    ary
  end
end
/cc @bcardiff @waj
How do you imagine this when the type variable needs to be used multiple times?
def merge(a : Enumerable('T), b : Enumerable('T))
  # ...
end
I find that tedious.
If we want to differentiate type variables from constants we should introduce a scope for them. In types we have that using generics Array(T). So that will hide any T constant.
In generic types we need an order so we can create them. In methods we don't and that is why we avoid the issue.
I am still fine with the single letter(+ digit) convention.
But if we want to get rid of that I would vote for something like:
def (T) merge(a : Enumerable(T), b : Enumerable(T))
  # ...
end
Or
class Array(T)
  typevar S # so S can be used for all methods inside the lexical scope as a type var (yet it reminds me of C++ templates...)
  def merge(other : Array(S)) : Array(T | S)
  end
end
Hmm... I didn't find your first example tedious. The problem with def (T) ... is that now a Def ASTNode will need an array of free variables, making them all a bit bigger and the memory footprint will increase. With 'T or %T it's just a new AST node, FreeVar, that is only used when needed, and its analysis is straightforward (no need to check if something is a free var relative to the def where it's used).
The typevar S syntax is more confusing for me. I think Swift and Rust have something like that and I find it extremely confusing (and I remember @waj told me the same thing regarding Swift).
In any case, they are valid proposals, we are just brainstorming :-)
what about _T, _U
def (T) ... may not increase the memory footprint if the freevars list is a context that will help you choose if T is a freevar or a constant. And after parsing the def you will get rid of that context.
If we go for 'T then Array(T) should be Array('T) for consistency IMO. And it feels like, again, just splitting the grammar with a rule.
a) starting with ', or
b) single letters (+digit)
the other alternative is lexical scoping as I propose.
Any other will be inconsistent.
for sure: brainstorming open minded mode on ;-)
I don't understand why Array(T) needs to change to Array('T)... where? In the class declaration? When using it as a type restriction?
class Array(^T)
  # T is a constant
  # ^T is binding to the free var T
end
Otherwise I find it confusing what the following def will accept:
def merge(a: Array(^T), b: Array(T))
end
Are both a & b of the same type or not?
And what does T mean inside that def?
Please, not ': it's a strange symbol on its own, I don't like it in Rust either, and editors would have to change their syntax parsing.
python style __U__ :)
In:
def merge(a: Array(^T), b: Array(T))
end
assuming ^T is the syntax for free variables, ^T is a free variable. Array(T) will try to match T to an existing type T and will of course fail if there's no such type. But I don't see why we need to declare class Array(^T).
It basically will work like this:
class Array(T)
  def foo(x : T) # this T is the T of the array
    T
  end
  def bar(x : ^T) # this is a free variable
    T
  end
end
def foo(x : ^T) # this is a free variable
  T
end
def error(x : T) # this is an error, because there's no top-level type T
end
ary = Array(Int32).new
ary.foo(1) # => Int32
ary.foo('a') # error, Char doesn't match Int32
ary.bar(1) # => Int32
ary.bar('a') # => Char (because ^T is a free variable)
foo(1) # => Int32
foo('a') # => Char
But also:
class Array(T)
  def initialize(size : Int, value : T) # this T matches the type argument
    # ...
  end
end
# 'a' matches T, which is the array type argument.
# Because it's not yet set, and because this is an `initialize` method
# (for `new` it would be the same), T becomes Char, which also becomes
# the type argument, so we get Array(Char)
Array.new(3, 'a') # => ['a', 'a', 'a']
# Here T is set to Char, so the T in the method will be Char,
# and it doesn't match Bool
Array(Char).new(3, true) # Error, Bool doesn't match Char
This last case seems a bit strange, but I think it makes sense, and it will keep the number of ^T low.
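The inference rule described here can be tried in a small sketch. Stack is a hypothetical class made up for illustration, and the sketch assumes the compiler infers the type argument from the arguments passed to new, which is what the Array example relies on.

```crystal
# Hypothetical generic class, mirroring the Array(T) example.
class Stack(T)
  getter items : Array(T)

  # T here is the type argument of Stack, not a free variable.
  def initialize(size : Int, value : T)
    @items = Array(T).new(size, value)
  end
end

# T is not given explicitly, so it is inferred from 'a' and becomes Char.
stack = Stack.new(3, 'a')
puts typeof(stack) # prints Stack(Char)
p stack.items      # prints ['a', 'a', 'a']

# Stack(Char).new(3, true) would be rejected at compile time,
# because Bool doesn't match the already-set T (Char).
```

No extra symbol is needed in this case, because T is already known to be a type argument of the enclosing generic type.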
So in most cases, in generic types, we won't need to use that annoying symbol ^T. We only need it when we really want a free variable that's not an existing type argument, for example in Enumerable#map or Array#+(other : Array(^U)), so in very few places, so chances of riddling the language with strange symbols are very low.
I find this example confusing:
def merge(a: Array(^T), b: Array(T))
end
Are ^T and T the same type or not? If they are not (which I find less confusing than if they are), then I should use ^T everywhere in the method definition. In other words, does the ^ symbol become part of the type name or not? That makes the syntax for type vars defined locally (to the method) different from class type vars.
If we wanted to keep the same syntax for the type vars, then we need a way to declare the type vars at the method level. If we had chosen angle brackets this would be really simple (like in C#, Java or C++):
class Foo<T>
  def bar<Q>(x : Q)
  end
end
Instead we decided to use parentheses, easier on the eyes and for the parser. But having two pairs of parentheses one after another in the method definition doesn't look nice (and is probably ambiguous to parse), so I think @bcardiff's suggestion of putting them before the method name looks more appealing.
class Foo(T)
  def (Q) bar(x : Q)
  end
end
I find this syntax much more homogeneous, more clean, without strange characters added.
Given all this said, I don't find the fact that single letter class names are reserved for free type vars especially limiting. It's a really simple rule to learn that makes the syntax even simpler.
What if we allow longer type var names but still forbid declaring single letter classes? Type vars at the class level are always declared explicitly so this is easy to fix. At the method level, if you want to use long type vars, use the @bcardiff syntax. Otherwise nothing has to be done, keep using the simpler syntax:
class Foo(Type)
  def bar(x : Q)
  end
  def (Another) bar(x : Another)
  end
end
The other alternative is to go with Haskell type variables that match variables grammar and not constants. As expressed in https://github.com/crystal-lang/crystal/issues/3112#issuecomment-239858693
The main issue with disallowing single-letter top-level type names is that examples become a bit longer: class A; end; ... vs. class NeedToThinkANameMaybeFoo; end; ...
I don't know, I can create a class named T in Java, C#, Ruby... I think I should be able to do that in Crystal too. Maybe it's not a big deal :-)
It's true that we can remove the restriction of type variables being single letter names right now because in Hash(Key, Value), using Key inside Hash will always mean the Key of Hash, never a free variable (well, it will act as a free variable if it's a type parameter of the uninstantiated generic class that has the method, and that's OK). I'll at least implement this change and it'll make the language a bit more flexible.
I now understand the issue with ^T and T. In that case I would give an error, maybe at parse time, because T is used both as a type and as a free variable.
^T isn't part of the type name, it just means: T is a free variable in this method, assign it the type you deduce, either from the argument type or from the block's type.
Basically, what I proposed is that this (I'll use ~ because I find it less noisy):
def foo(x : ~T)
end
is the same as what you propose as:
def (T) foo(x : T)
end
The second form is longer, and now we have two pairs of parentheses, and a duplicated T. On the other hand we don't need to think of a new symbol, so either way is fine. I maybe prefer ~T only because I feel its uses are very few, so adding this as part of a method signature (explaining that a method can have an optional list of free variable names) instead of seeing it as a modifier (in a type restriction, ~T means T is a free variable) feels like the user has to learn more. Plus it increases a Def's size, and remember a lot of Defs are created (one for each instantiation plus one untyped).
For example:
module Enumerable(T)
  def (U) map(&block : T -> U)
  end
  # vs.
  def map(&block : T -> ~U)
  end
end
In the first method I feel my 👀 are going from left to right, while in the second form I just read it and when I read ~U I see it's a free var.
In any case, either way is fine with me, we just need to settle on a syntax, or decide to leave things like they are now.
I'm not fond of introducing a new syntax merely to reallow single letter types. In examples one may use Foo and Bar instead of A and B. It's not such a big deal.
I like how generics/freevars are a single letter. It helps understand that it's a placeholder type.
That being said, I prefer a definition that isn't separated from the freevar itself (def (T) foo(x : T) is weird). As long as the syntax is only for freevars and not for generics, I can live with ~T in the rare places it would be required.
If the case that you're trying to address with ~T is going to be rare, then why does it have to be identified by a single cryptic prefix-character? Why not use something more verbose for this, and save single-character prefixes for something which is going to be typed a lot of times?
+1 to the "I don't really see the single letter restriction as a problem" crowd.
+1 to the "def (T) foo(x : T) is weird" crowd. At first sight it looks as if it could be a return type annotation (I think it will especially feel like that for C-and-derivatives programmers).
Finally, +1 to @drosehn: if it really needs to be fixed, and if it really is so infrequent, let's go for something that shouts its meaning to your face.
def foo(x: newtype(T))
or something of the sort (just to provide a strawman :P).
Symbols combined with type names feel quite "rusty". I know it's just syntax and it's really silly, but whenever I think of giving Rust another try, reading code with that kind of syntax makes me look for some other thing to do with my spare time.
I like the idea of using a word like newtype(T), it's true that we don't need to use symbols for this.
I forgot to mention another example of why disallowing single-letter names at the top-level is "bad": #3269 . For example:
N = (ARGV[0]? || 10).to_i
N.times do
end
Or what about making it easy to use E in a short script?
E = Math::E
# ...
That is, top-level constants like these are also currently disallowed.
I prefer to use sometext(T) in a few places (I think most users will never need this) than limiting all scripts, benchmarks or small prototypes where one can use very short names to try things out.
I've bashed my head back and forth and written syntax upon syntax for this very issue back in the days, in my various toy-langs, (and coded C++ for shitloads of years) - with those biases in mind, this is my 2 satoshis take:
I've used the terms "typeparam" for T in class Foo(T) and "typevar" for T in def foo(x : T).
class Foo(Bar, Qwo).
Exp = Math::E or TMP = (ARGV[0]? || 10).to_i ... or class Foo, class Bar, class FooSub, etc.?
class Foo(T), and then foo(x : T) - don't do that!
ElementT etc.
/[A-Z][0-9]/ for typevars is very much OK.
def (Q) foo(x: Bar(Q), y: Q) instead of resorting to decoration. It does look weird. Agree with that. As @drosehn points out, decoration should be saved for things that are done often. Also, decoration, if done only on first use, looks... sketchy: def foo(x: Bar(~Q), y: Q), in that case, it should be part of the name. But I don't like that.
Type<T>, with that def foo<T>(x : T) becomes natural as per @waj's example. But, agreed, with the paren-notation of Crystal, it does look a bit weird. But still better than decoration.
T is not used in the method formal parameters, or not even a typevar, but a "generic configuration" (make a variation of the method based on some constant value, etc., just like for generic types), for instance (the code from a PoC database I intend to re-implement from scratch in Crystal):
template <typename T>
inline auto read_varilen_integer(u8*& data) -> T {
    T unsigned_value = read_varilen_natural<T>(data);
    return ((unsigned_value & 1) ? ~(unsigned_value >> 1)
                                 : (unsigned_value >> 1));
}
Here the T is used only to generic-parametrize the _call_ to read_varilen_natural (defined similarly) and for the _return type_. This would still require additional syntax for calls in Crystal though. An alternative for this in Crystal is of course to pass the type as an argument - but it's not certain that it will be optimized away by LLVM.
I know we all have very different backgrounds coming into Crystal. I want a cleaner, more productive, replacement for C++. Others come from Ruby and see everything in a dynamic perspective. I prefer to have control to make efficient stuff when _I need to_, and clear, straightforward, readable code in most cases. If performance wasn't an issue, we'd all code in functional languages - right? The only reason for imperative langs to exist imo is to be able to tell the machine _how_ to do what we want, to squeeze those extra cycles.
If performance wasn't an issue, we'd all code in functional languages - right?
I disagree with this. I mean, some things are just easier to represent with objects. In particular, plugins are nearly impossible with statically-typed functional languages, short of doing ugliness like:
data MyObj = MyObj { doSomething :: Int -> Int }
and then explicitly construct it:
myPluginEntryPoint = MyObj { doSomething = (\a -> a) }
But that's not any more readable than just using ABCs! Even functional languages that _do_ have plugin systems usually use objects.
Back on topic: my vote:
def f(normal_arg : NormalType, T)
T isn't really an argument; it's just a generic parameter. So the type _appears_ to be a normal argument, but it's all compile-time. Someone could call it like f(arg, SomeType).
Failing that, Haskell's solution could also work: use lowercase letters for generic parameters. They're not allowed as types to begin with, anyway. This wouldn't really hurt readability, and you don't end up with the weirdness of using symbols (e.g. %T).
I think the current situation is fine as it is, but merging #3294 would be nice. Not being able to define single-letter constants at the top level only affects short scripts, which use top-level code. And for that I say there's always a simple workaround: Just put it in a class/singleton.
@kirbyfan64 - yes, I exaggerated for effect, sorry for deviating from the subject.
I love the def f(normal_arg : NormalType, T) idea!
So, today we discussed this with @waj and @bcardiff
The thing is... did you know that only top-level names like T and T1 are disallowed? But not nested names. But with just this rule things can still break. For example:
module Moo
  class Foo
    def foo(x : U) # U is a free variable
    end
  end
end
# Later, someone reopens Moo and defines a class U
module Moo
  class U
  end
end
Oops, the free variable U above now becomes bound to Moo::U, so compilation breaks.
I know, the solution is simply to disallow these short names as types everywhere, right? Well, what about Math::E? Should we rename it because of this?
In fact, I initially implemented the rule above and later decided to only do this for the top-level because Math::E would break. But this rule is incomplete, as we just saw.
So, for us it doesn't feel right that the language constrains the names we can choose... it would take a lot of fun out if we had to rename Math::E to something else.
Also, free variables are not a very common feature, so adding syntax for that instead of disallowing short type names feels like the correct thing to do. But, free variables are usually short, like T and U, and we can still do that, so we can still use that as a convention, while at the same time allowing T, U and E for type names, be it for small scripts, sample code or math constants.
What syntax to use? We've settled with:
module Enumerable(T)
  def (U) map(&block : T -> U)
  end
end
This is because if we use something like ~U or newtype(U) it would be confusing to use U and ~U in the same method. With def (U) that can't happen.
Don't worry, using free variables like that is very uncommon, most occurrences of that are in the standard library, and the syntax is neither hard nor full of strange symbols, so we think it's still a happy syntax.
We'll also free the names for type variables, so class Foo(Bar) can be written if someone wants to (#3294), though by convention one would use a short name.
👎 for this, looks too strange:
module Enumerable(T)
  def (U) map(&block : T -> U)
  end
end
newtype(U) - :+1: interesting, not any change in syntax, just new compiler method, like is_a?
Uhhh...I'm sorry, but that looks kinda weird. :O
And the idea to use lowercase letters never even got any consideration. :(
@kostya I don't know why you say that it looks weird, in Java it's almost the same.
public <T> void foo(T x) { }
This:
def (U) map(&block : T -> U)
end
is read: "given U a free variable, the method map has the following signature".
Also, what about this:
def push(x : newtype(U), y : Array(newtype(U)))
end
Seems a bit long, we have to use newtype every time we need to mention U as a free variable. With the proposed syntax:
def (U) push(x : U, y: Array(U))
end
Much simpler to read and to analyze.
@kirbyfan64 We considered lowercase letters, but it's weird:
def foo(x : a)
  a = 1 # ???
  a::T # ???
end
a usually means var or call, and now it could also mean a type, and some things, like a::T, are not supported by the current grammar, so this would be a huge change.
The main con of lowercase is that simple highlighters won't be able to color : a as a type.
So when reading the source it is not easy to tell which variables are type variables and which are normal variables.
I really tried to push Haskell like type variables.
In Haskell you don't mix type variables with variables on the same line, which is not the case in Crystal.
apply :: (a -> b) -> a -> b
apply f a = f a -- here a is a value of type a
One last attempt at more explicitness :P
def (forall U) push(x : U, y: Array(U))
end
If you want to introduce more than one free var:
def (forall U,V,W) zip(us : Array(U), vs : Array(V), ws : Array(W))
end
Based on a discussion that fired because of @mverzilli 's message :-), we finally settled on this syntax:
def map(&block : T -> U) forall U
def map(&block : T -> U) : ReturnType forall T, U
Pros:
forall U instead of (U) is more human, and there are fewer parentheses and symbols
def foo, with def (U) foo I can't do that anymore)
forall is used in Purescript, so it's not just a crazy idea of ours
Cons:
forall can't be invoked in the same line as in the method definition (it still can be invoked with self.forall, like any keyword). We don't think this is a real breaking change.
Is it now a _requirement_, or is it just a _feature_ for safety?
@ozra In the next version both forall and using a single-letter name will make a type be considered a free variable. Then in the subsequent version we'll remove the single-letter name rule and you'll have to use forall.
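As a minimal sketch of the forall form, assuming a Crystal version where forall is available: Box and transform are hypothetical names made up for this example, not part of the standard library.

```crystal
# Hypothetical generic container, for illustration only.
class Box(T)
  getter value : T

  def initialize(@value : T)
  end

  # U is not a type argument of Box, so it is declared as a free variable
  # with forall and inferred from the block's return type.
  def transform(& : T -> U) : Box(U) forall U
    result = yield @value
    Box(U).new(result)
  end
end

box = Box.new(21).transform { |n| (n * 2).to_s }
puts box.value   # prints 42
puts typeof(box) # prints Box(String)
```

T needs no forall here because it is the type argument of Box; only U, which exists solely in the method signature, has to be declared.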
Only for freevars or also for generics?
@ysbaddaden Only for freevars.
In the standard library I only found 64 uses of freevars.
Ok