Julia: return type declarations

Created on 27 Jul 2012 · 105 comments · Source: JuliaLang/julia

Provide this convenient shorthand:

function foo(x)::T
  ...
  return z
end

for this:

function foo(x)
  ret::T
  ...
  ret = z
  return ret
end
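For instance, under the convert-on-return semantics settled on later in this thread, the shorthand behaves like this (a minimal sketch; this is the syntax that eventually shipped in Julia 0.5):

function inc(x)::Float64
    return x + 1
end

inc(1)  # => 2.0: the Int result is converted to the declared Float64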

Most helpful comment

Yes, that's really nice. On the input direction, this is similar to why I think it would be nice to do implicit conversion to the declared type of an argument, e.g.:

f(x::Nullable{Int} = 0) = x

f() # => returns Nullable(0)

Otherwise you have to write this as

f(x::Nullable{Int} = Nullable(0)) = x

which just seems obnoxiously redundant and unJulian.

All 105 comments

Glad to hear you're planning to do this. Will we be able to specify that another function only accepts as inputs functions with specific types of returned values?

Will we be able to specify that another function only accepts as inputs functions with specific types of returned values?

What you're talking about are function types, and I suspect we'll probably not support that, because covariance/contravariance/subtyping issues get pretty complicated and confusing. What kind of use cases did you have in mind for this? We did intend to have this at one point, and the syntax String-->Int would be used for a method that takes Strings and returns Ints.

The use cases are things like optimization functions: you want to insist that the function that purports to return a Hessian at a minimum is returning a Matrix. It's not so valuable that I'd worry about it if building support causes headaches.

There will still always be run-time checks for things like that, and potentially compiler modes that can do inference and warn if you're doing something that looks wrong. But I'm not too sold on adding function types back in. They're a lot of trouble for little gain in a dynamic system like Julia. Of course, in Haskell you simply need them.

Understood. I have no feel for these things, so I'm sure you're right.

Well, I'm not at all sure I'm right, so there's that ;-)

I agree with what davekong says in #1078.

Uint8 + Uint8 = Int64

Looks too strange and unexpected.

I understand what Stefan says on that issue, and I think that something like type declarations for return values can help. Maybe it's better to promote to Int for the calculation, but it's good to give the user the expected type.
Uint8 + Uint8 should be a Uint8, even if the calculation is internally promoted to Int64 (unless you need more bits).

If you have an Array of Uint8 occupying N bytes of memory, a single scalar multiplication gives you an object consuming 8 times more memory...

Or in my case: I defined an 8-bit bitstype and the following promotion rule [I'm not sure whether T, Int, or Uint8 should be the target]:

-(x::NucleicAcid, y::NucleicAcid)   = int(x) - int(y)
promote_rule{T<:Number}(::Type{NucleicAcid}, ::Type{T}) = T

Simply to get uppercase letters, I need to explicitly convert the output. But if I set the promotion rule to NucleicAcid, I'm going to get worse performance, as Stefan pointed out before...

julia> seq = nt"ACTG"
4-element NucleicAcid Array:
 A
 C
 T
 G

julia> seq .+ 32
4-element Int64 Array:
  97
  99
 116
 103

julia> nucleotideseq(seq .+ 32) 
4-element NucleicAcid Array:
 a
 c
 t
 g

I know I can simply define a function taking a NucleotideSeq and returning the same type, so users won't see the explicit conversion... But in general it's not intuitive.

I'm not sure if return type declarations can be useful here.
...Maybe something like the following?

promote_rule_IO(::TypeOne, ::TypeTwo) = TypeForOperate, TypeForOutputOnBinaryOperations 

In the case of arrays, I agree. See #1641 .

Having spent more time working with optimization, I have to say that I am now much more interested in one day adding the ability to do dispatch on functions typed by the combination of their input types and their return types. It would be nice to have the ability to write separate definitions for:

  • derivative(f::Real -> Real)
  • derivative(f::Real -> Vector{Real})
  • derivative(f::Vector{Real} -> Real)
  • derivative(f::Vector{Real} -> Vector{Real})

This is why in all my optimization routines, I pass the gradient into the objective function. Once it's an argument, you can do dispatch on it.
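A minimal sketch of that workaround, with all names illustrative: once the gradient buffer is an argument, its type participates in dispatch.

# Hypothetical sketch: the gradient buffer is an argument, so its type
# (here Vector{Float64}) is visible to multiple dispatch.
function finite_diff!(grad::Vector{Float64}, f, x::Vector{Float64}; h = 1e-8)
    fx = f(x)
    for i in eachindex(x)
        xh = copy(x)
        xh[i] += h
        grad[i] = (f(xh) - fx) / h   # forward-difference approximation
    end
    return grad
end

finite_diff!(zeros(2), x -> x[1]^2 + x[2]^2, [1.0, 2.0])  # ≈ [2.0, 4.0]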

One reason I'd like to be able to specify return types for functions is that it would improve looking at functions in the REPL. With return types I can much more easily tell at a glance what the functions in the list given by * do.

If anything, this would add a lot to the self-documenting nature of a function's source code. I really miss the type of the return value in the source/documentation of basically every other popular dynamic system. It would also help in mentally planning the body of the function while writing it. It would discourage annoying behavior like PHP's tendency to return a NULL, or a false, or a normal value from the same function. Such behavior could be made totally explicit to the reader with union types. Looking forward to this feature!

I'm working on a medium-ish Julia project. Even if this wasn't involved in the type system (i.e. if it just automatically added an assert at the bottom of the function so I don't have to do this manually everywhere), this would prevent a lot of errors and help with self-documentation.

I think the big win here is self-documentation. The ability to look at documentation and/or code and immediately understand the type intent is invaluable.

The main question here is monotonicity. It would be a nice property if declaring that foo(::AbstractA, ::AbstractB)::C implies that foo(a,b)::C for all a::AbstractA and b::AbstractB. However, this implies that return types declared for very generic methods apply to more specific methods as well.
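A concrete sketch of what that property would require, with hypothetical types (note: Julia does not enforce this today; 0.5-era syntax):

# Hypothetical sketch of the monotonicity property:
abstract AbstractA
type ConcreteA <: AbstractA end
f(a::AbstractA)::Int = 0          # the generic method promises an Int
f(a::ConcreteA)::Int = 1          # OK: keeps the promise for the subtype
# f(a::ConcreteA)::String = "1"   # would violate monotonicity: String is not <: Int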

Admittedly not the best solution, but why not just document the most specific return type that is returned for any input types? That's what I've done while starting to document some packages.

In one sense, allowing the most abstract method definition to constrain all other specific methods to return objects of the same type is a feature. It makes everything more predictable since you don't have to worry about whether a specific type will give unexpected results. Then, the author of the generic function has to choose the best abstraction level to leave enough room for more specific implementations (or leave it unspecified if needed).

The monotonicity constraint is indeed a feature. The main motivation was to use it to let map, broadcast, etc pick a suitable element type for the result in a sane and predictable way. (Unlike exploiting type inference for this, which is somewhat unpredictable)

I definitely agree that it's a good feature. It's one that has to be rather carefully designed, however.

The main thing I missed in Python compared to Matlab was that there are no return "types". Well, what Matlab does is not really return types but still, it specifies which variables are returned which helps both with documentation and correctness. So +1 for this.

What @JeffBezanson originally proposed is just syntactic sugar, so as far as I can tell, it's just a matter of implementing it (or not). Although I would propose to have it as a type-assert and not a type-convert:

function foo(x)::T
  ...
  return z
end

would be equivalent to

function foo(x)::T
  ...
  return typeassert(z, T)
end

(for a discussion of the subtle difference to Jeff's original proposal see https://groups.google.com/d/msg/julia-dev/pGvM_QVmjX4/V6OdzhwoIykJ )

The somewhat related topic which has been discussed here is whether the types of functions and methods should contain their calling and return signature. I think this is a sufficiently different topic that it should actually be a different issue. However, the present issue could be a stepping stone for that one.

I definitely see the argument for using a typeassert. But a big part of the value of doing a convert is that it makes it easier to write type-stable functions. For example you can write

f{T}(x::T)::T = foo ? x : x+1

and know you have a T->T function, without worrying about weird behaviors that + might have, and without getting nuisance assertion failures.

Keep in mind that LLVM is generally smart enough to figure out that if it has some code that goes from say Int8 to Int8 but passes through Int that it can just cut out the middle man, so conversion is generally more convenient and no less efficient.
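In the syntax that was eventually adopted, the difference between the two semantics looks like this (a minimal sketch):

# Convert semantics (what the ::T annotation does): the value is converted.
f_convert(x)::Float64 = x      # f_convert(1) returns 1.0

# Typeassert semantics (the alternative discussed above): a mismatch errors.
f_assert(x) = x::Float64       # f_assert(1) throws a TypeError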

I see. Two counter arguments:

  • The :: in the argument part of a function has a typeassert character and will throw an error. Thus it is a bit confusing to have two subtly different semantics of :: so close together.
  • From the perspective of writing more complex numerical functions: there I want type-stable code throughout the function and not a conversion at the end. The typeassert could provide at least some of that certainty. More generally, it could also catch some errors instead of silently converting. Also, presumably LLVM will struggle to optimise more complex functions?

Either way it would be good sugar to have.

(These two answers also clear up some of the questions I had in that referenced mailing list post, thanks)

Will this shorthand also be available for the function shorthand syntax?

incr(x::Int)::Int = x + 1

Which, by the way, makes me wonder if Julia might benefit from the syntax x::Float32 = 1 as sugar for x = convert(Float32, 1)? Similarly foo(x::T1)::T2 = bar might desugar to foo(x::T1) = convert(T2, bar)

Yes, that will be how it works.

Currently x::Float32 = 1 _declares_ x to be a Float32, in a manner similar to C. All assignments to x then call convert(Float32, ...), so we already do that part.
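For example (a minimal sketch of that declaration behavior):

function g()
    x::Float32 = 1   # declares x as Float32 for this scope
    x = 2            # later assignments convert too: x becomes 2.0f0
    return x         # => 2.0f0
end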

Important relevant discussion here, quoted below:

@toivoh:

The whole point of the monotonic return types idea (see the thread and discussion in #1090) was that it would provide upper bounds on return types that would be more predictable and not depend on type inference. Of course, adding return type annotations on methods could be seen as a slight duplication, and I know it's probably a bit complex to implement.

Still, anyone care to comment on this alternative?

@StefanKarpinski:

I still like the idea of monotonic return types, but I worry that it has an extreme non-locality property: you write a new method and it returns a value of a type that is completely unrelated to the code you wrote. By the same token, the behavior of code you write can change when someone else changes their code. One possibility to deal with that: if a "supermethod" has a type annotation, then all its "submethods" must have a type annotation or a warning is printed; if the type annotation of a submethod is not a subtype of its supermethod's, that should also be a warning (or maybe an error). That lets you hack things out without bothering with type annotations, getting a few warnings, but it means that you get an early warning if a non-local interaction between supermethods and submethods changes.

@toivoh:

Completely agree, there should at least be a warning if there is a submethod that has a less strict return type annotation than its supermethods, otherwise this could lead to lots of unintended breakage. That will cause some inertia in the return type annotations, especially in deep hierarchies, but I think most hierarchies will not be that deep. I guess an absent return type annotation would be the same as ::Any.

@StefanKarpinski:

That seems good. Having supermethod(args...)::Integer and submethod(args...) would be equivalent to submethod(args...)::Any, which would generate a warning since Any ⊈ Integer. The nice thing about that is that there's only one rule: submethods must have a return type annotation – explicit or implicit – which is a subtype of any supertype annotations. Since Any is the default return type – as it is for everything – the requirement that submethods of methods which are annotated with something stricter than ::Any also be annotated just falls out naturally.

In a sense, the entirety of this behavior is:

  1. return type declarations either raise an error or convert the return value (tbd)
  2. if a submethod's return type is not a subtype of the return type of its supermethod, emit a warning

In this form, there's no non-locality, which is good. The only non-local behavior is that a change in one place can cause another place to emit a warning, which is something we already do. Of course, there's a danger of this being as annoying as the ambiguous method business is, but then again, maybe not.

@toivoh made the observation that monotonic return types can help eliminate ambiguous method warnings since we can know that for certain method intersections, there is no return type that can satisfy monotonicity. In that case, we can automatically insert an error for the intersection method.

With #9364 this becomes even more appealing even with a simple implementation. When people start using return null in their functions you'll start seeing types of the form Union(Nullable{Union()}, Nullable{T}) which defeat the entire purpose of Nullables - you're better off using Union(Void, T) instead like in 0.3. These are easily fixed by declaring the return type as Nullable{T}, which makes functions with nullable returns as readable as C functions returning pointers, like in this toy example:

function mayberealsqrt{T <: FloatingPoint}(x :: Complex{T}) :: Nullable{T}
    !isreal(x) && return null
    xr = real(x)
    !(xr >= 0) && return null
    sqrt(xr)
end

As originally suggested, this would simply translate to:

function mayberealsqrt{T <: FloatingPoint}(x :: Complex{T})              
    ret :: Nullable{T}
    !isreal(x) && (ret = null; return ret)
    xr = real(x)
    !(xr >= 0) && (ret = null; return ret)
    ret = sqrt(xr); return ret
end

This is simple to understand and implement and neatly solves an immediate problem: making code people find natural to write (i.e. return null) work properly instead of becoming a pitfall.

The alternative of using typeassert seems to be more of a solution looking for a problem. Checking global monotonicity properties and issuing warnings if global consistency is broken sounds like a separate and more complicated issue to me and is compatible with the simple solution to this issue anyway. The problem with complicated issues is that they take forever to close. Another separate complicated issue is #8027 (so one could use Nullable{sqrt(real(T))} as the return type above and avoid specialization to Complex FloatingPoint), but that's compatible with the simple solution too.

I completely agree that the convert behavior is the right one, and Nullables are indeed a good example use case for it.

However I don't think Union(Void, T) is somehow a "better" type than Union(Nullable{Union()}, Nullable{T}). The union of Nullables does not defeat the purpose of Nullables, but complements it nicely. That union is still a subtype of Nullable, plus Nullable{Union()} performs the nice trick of delivering an empty type to future inference on get, effectively removing that branch of the union later on.

I simply looked at _X_=Union(Nullable{Union()}, Nullable{T}) and reasoned that Nullable{Union()} is an inefficient equivalent of Void (isomorphic types), so _X_ is an inefficient equivalent of Union(Void, Nullable{T}), which itself is an inefficient equivalent of Union(Void, T), which is the plain old type we started with. Surely adding two layers of inefficiency will not improve things? Wasn't the point of nullable types to get performance gains by avoiding union types? They are not avoided in _X_; it's an even more complicated union type than we had before nullables. That's what I meant by defeating their purpose.

There isn't any reason Nullable{Union()} is an inefficient equivalent of Void. Adding an extra type wrapper doesn't mean "please make this less efficient"; you have to look at and time actual code to see if there is any inefficiency.

Having more characters does not necessarily equate to "more complicated". The common ancestor of Void and Float64 is Any, while the common ancestor of Nullable{Union()} and Nullable{Float64} is Nullable, which is a smaller type. From that perspective (which is the one that matters to the compiler), _X_ is the less complicated type.

As I mentioned in the other thread, inference of get on _X_ where T=Float64 will give Float64:

julia> f() = get(randbool() ? Nullable{Union()}() : Nullable(1.0))

julia> code_typed(f,())
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[], Any[Any[symbol("#s2")],Any[Any[symbol("#s2"),Any,2]],Any[]], :(begin  # none, line 1:
        unless rand(GetfieldNode(Base.Random,:GLOBAL_RNG,Base.Random.MersenneTwister),Bool)::Bool goto 0
        #s2 = $(Expr(:new, Nullable{Union()}, true))::Nullable{Union()}
        goto 1
        0: 
        #s2 = $(Expr(:new, Nullable{Float64}, false, 1.0))::Nullable{Float64}
        1: 
        return get(#s2::Union(Nullable{Float64},Nullable{Union()}))::Float64
    end::Float64))))

Of course you are right that it's better to use Nullables to avoid Union types in the first place, but nothing about the brevity of Union(Void,T) makes it more "efficient" than the Union of Nullables.

The code does not look any worse for Union(Void, Float64), though:

julia> myget(::Void) = error()
julia> myget(x) = x
julia> myf() = myget(randbool() ? nothing : 1.0)
julia> code_typed(myf,())
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[], Any[Any[symbol("#s2")],Any[Any[symbol("#s2"),Any,2]],Any[]], :(begin  # none, line 1:
        unless rand(GetfieldNode(Base.Random,:GLOBAL_RNG,Base.Random.MersenneTwister),Bool)::Bool goto 0
        #s2 = nothing
        goto 1
        0: 
        #s2 = 1.0
        1: 
        return myget(#s2::Union(Void,Float64))::Float64
    end::Float64))))

The LLVM code looks a bit better for myf() than f(); at least there are no calls to @allocobj. Time measurements also indicate myf() is faster once I used rand() < 1e-16 to get fewer exceptions. Neither holds a candle to f2() = get(randbool() ? Nullable{Float64}() : Nullable(1.0)), where the get() call gets inlined. The return-type-optimized version would be f3() = let x :: Nullable{Float64}; randbool() ? (x = Nullable{Union()}()) : (x = Nullable(1.0)); get(x); end.

Anyhow, the point was that return type declarations mesh well with nullables, and getting good performance amounts to adding one return type declaration to the obvious code. For some reason f3() isn't quite as fast as f2(), but it's very close. The ratios I got using @time for a loop were f: 359%, myf: 258%, f2: 100% and f3: 115%.

My most recent thinking is that you may actually want two distinct features here:

  1. return annotations on methods implicitly convert the return value to that type
  2. typed generic functions, which raise errors if any method violates the asserted signature

Example of 1:

f()::Float64 = 1

julia> f()
1.0

Example of 2 (very hypothetical syntax):

function f::Float64 end # no methods + guaranteed return type
f() = 1

julia> f()
ERROR: TypeError: typeassert: expected Float64, got Int64

The reasoning is this. There are two reasons you want return-type annotations:

  1. as a shortcut to make sure that every exit point of a method definition returns the same type
  2. to allow type inference to reason about the return type of a generic function better (e.g. convert)

For the former, you want convert behavior, while for the latter convert behavior is bad since it introduces very non-local effects – namely that the behavior of a method can be massively and unexpectedly changed by a type annotation in a totally unrelated place. Arguably, you should also only be able to put a "function type" on a generic function when it's created, not at any later point in time.

See also: #210, #964, #8283.

:+1: @StefanKarpinski I like your breakdown of the cases very much, and I want both... also I think your hypothetical syntax looks good.

@StefanKarpinski I like the idea.

I guess that to handle methods with different signatures we would need to allow extending the constraint as well, just like how we extend functions now (although much more generically).

Would we create another method-definition table?

Examples of what I'm thinking about.

function convert{T}(::Type{T})::T # and somehow we need to tell the difference between this and a normal function definition. Probably by replacing the function keyword or by a macro?

function +{T<:Number}(::T...)::T
function +{T<:Number}(::AbstractArray{T}...)::AbstractArray{T} # OK I know this example is stupid but you get the point....

Maybe something like this:

function convert{T}::{Type{T},Any}-->T end

Kind of depends on how the new syntax for union all ends up looking after #8974.

So, that means a convert with a Type as the first argument and anything as the second must return that type? Seems pretty intuitive, and I don't think --> would cause any parsing ambiguities...

(2) would be super cool for writing performant numeric higher-order functions, especially if parametric versions become available. But even a simple non-parametric one would be great for starters.

One of the original reasons for return type annotations, not recently discussed here, is to easily determine from reading the signature what the return type is. Especially with a penchant for automatic conversion, this can be hard to identify or verify from the body of a method, and it seems like a silly thing to require reading implementation code to determine. This information, like the types of parameters in a dynamic language, could be given in a comment instead, but making it part of the language would standardize the presentation and have the other uses described elsewhere. Sorry if all this is too obvious to state. Parametric return types might be way beyond Julia's type system at this point and don't seem that crucial.

I think return types could play a very important part of any super-overloading strategy. For example, imagine a strategy which looks something like this:

using DataFrames

type CSVFile
  file::String
end

function read(csv::CSVFile, args...)::AbstractDataFrame
  readtable(csv.file, args...)
end

function read(csv::CSVFile)::AbstractArray
  readcsv(csv.file)
end

This kind of strategy could, for example, extend IO operations like read and write to any combination of input and output. This corresponds to (2) return types. See #11835

@bramtayl, is that intended for return-type overloading? How would it work?

@StefanKarpinski, I'm not sure I understand your question? An important component of that particular IO system above is to classify file names into individual types:

function classify_file_type(string)
  pieces = @> string split('.')   # @> is presumably the threading macro from Lazy.jl
  extension = pieces[end]
  if extension == "csv"
    CsvFile(string)
  elseif extension == "sqlite3"
    SQLiteFile(string)
  end
end
# and many more

Of course, using file extensions is dangerous, but users can also just specify the file type themselves.
For example, my_db_file = SQLiteFile("my.db").

After this is implemented, a function call like read(external_file)::Julia_type and write(Julia_object)::external_file_type could choose from methods based on a user's particular desired inputs and outputs.

My more general point, though, is that with coordination with package authors as well as using return type within the function overloading system, Julia could implement overloading to a much larger extent than the present.

@bramtayl I don't see how that's related to return type declarations. You can already do what you're proposing. Why do you think this would allow to " implement overloading to a much larger extent than the present"?

Maybe another example would help? Imagine that we've defined a type for a database table:

type DataBaseTable
  conn::DatabaseHandle
  name::String
end

Let's say that external is a DataBaseTable

answer = external[[:column_1, :column_2]]::AbstractArray

could import the table into Julia and then select columns 1 and 2.

But

answer = external[[:column_1, :column_2]]::DataBaseTable

wouldn't need to import anything into Julia. Instead, it could submit a SQL SELECT command to the database to have it create a new table called answer with just columns 1 and 2, and return a DataBaseTable pointer to that object. For a practical implementation of this, see http://cran.r-project.org/web/packages/dplyr/vignettes/databases.html. In Julia, this would correspond to a parallel overloaded version of DataFramesMeta built entirely for DataBaseTable types.

OK, I see. This is dispatch on return type, as mentioned above in this thread: https://github.com/JuliaLang/julia/issues/1090#issuecomment-12153946

Yup. Apparently, @johnmyleswhite arrived here first, which is surprising given his lukewarm reaction in JuliaDB/DBI.jl#8

That syntax already means something – it's a type assertion that the returned value is of some type. You can accomplish the same thing, however, by making the desired type an argument of the function and dispatching on that using ::Type{T}, e.g.: read(DataFrame, file, args...).

Dispatching on return types requires expressions to have well-defined types, which makes perfect sense in statically typed languages – where every expression has a type. In a dynamic language, expressions don't have types; values do. As a result, dispatch on return types doesn't make sense, since it's not generally meaningful to talk about the type context in which a call occurs.

read(DataFrame, file, args...)

could work, but somehow the syntax seems misleading because DataFrame is not an argument, but a return type. Maybe there's a way to do this with a bit of syntactic sugar.
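For reference, the ::Type{T} dispatch pattern Stefan describes can be written today; a sketch reusing the hypothetical CSVFile type from earlier in the thread (read_as is an illustrative name, and readtable/readcsv are assumed as in those comments):

# Dispatch on the *desired* output type by passing it as a value.
# Assumes `using DataFrames` and the CSVFile type defined above.
read_as(::Type{DataFrame}, csv::CSVFile, args...) = readtable(csv.file, args...)
read_as(::Type{AbstractArray}, csv::CSVFile) = readcsv(csv.file)

df  = read_as(DataFrame, CSVFile("data.csv"))      # dispatches on the desired output
arr = read_as(AbstractArray, CSVFile("data.csv"))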

I am new to Julia and this thread is very interesting. Is there a time scale for when return types will be implemented? I can't tell if there is consensus regarding how return types will be implemented.

I think it's a safe bet to operate on the assumption that the time scale is unknown.

That's unfortunate. I just listened to your presentation at JuliaCon 2015 (https://www.youtube.com/watch?v=lf1_FhMR7xA&index=35&list=PLP8iPy9hna6Sdx4soiGrSefrmOPdUWixM) where you talk about the problem with lifting for Nullable types and the need for return types. I come from the statistics/R/C++ side, and I think Nullable types are a welcome addition, but I agree with the point you made about lifting. Even when declaring basic operators, for example arithmetic and logical operators, you certainly need to specify return types or you'll find yourself rewriting code over and over again for different cases. Otherwise you climb into some kind of rabbit hole of trying to infer types, fighting with inconsistent return types and other nasties.

For Julia to be successful in the R community it definitely needs to do Nullable types simply and I think return types are important in trying to implement this.

What is the sticking point over implementing return types in Julia? Is it because it is a very large effort, or does it break Julia's functionality somehow? Or is it perhaps about how it should be implemented?

You mention simply tagging onto a database, e.g. SQLite. What about Cassandra? It now has a C API that Julia can tag onto, and its type system is sufficiently rich to allow data to be represented however you want; of course, it is also a high-performance big database. It may be possible to write an API that allows computation within the database itself from Julia, using Julia as a command interface for big-data statistical analysis on Cassandra. I think I may be getting ahead of myself there.

An oversimplified summary might be: you can either have a dynamically typed language or you can have static return types. Once you don't have static return types, getting the semantics right becomes much harder.

If you're interested in getting up to speed on this issue, I'd recommend going through all of the currently open issues for Julia and figuring out how they fit together. After that, you'll be in a good position to understand where things stand.

Thanks for the tip

Get started with Jeff's talks at JuliaCon 2014 and 2015: https://www.youtube.com/user/JuliaLanguage/playlists

I am not sure if this is the right place to ask this question, but where is the "function" keyword/type implemented? I have been searching through the Julia source code and have not been able to find it.

No, questions go to the julia-users mailing list, although this one might go to julia-dev (not both, though!).

@mauro3 thanks Jeff's talks are very good suggestion.

If this feature is implemented as originally described and one additionally defines return_type(f, argtypes) to return the declared return type of f(argtypes) if it exists and Any otherwise, then one could use this to define the return type of map more stably than with Base.return_types or with value-dependent computation. You could just do

@generated function mymap(f, x)
    T = return_type(f, (eltype(x),))
    :($T[f(x[i]) for i in 1:length(x)])
end

assuming https://github.com/JuliaLang/julia/pull/13412. Note that return_type only needs to work for leaf types for practical optimization purposes, defaulting to Any otherwise. That way the type changes only if the dispatch target changes.

This works even with functions like exp, which do not have a reasonable return type for the generic function more specific than Any. Using something like Stefan's example

function f::Float64 end

might seem like a reasonable restriction for exp initially, but then someone will want to define the exponential for BigFloats or complex numbers or matrices or some funny Lie algebras. Even for matrices you could have something like exp: SkewSymmetricMatrix -> RotationMatrix. This isn't ad hoc from the POV of Lie theory, but from the POV of the compiler's type system it definitely is. Then there's exp: Int -> Float64 and exp: BigInt -> BigFloat, which makes sense, but is also very ad hoc.

There is no hope in trying to reason about exp in the aggregate, but reasoning about individual methods is very doable, and seems to suffice.
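Until such a declared-type lookup exists, a rough stand-in for the return_type above could lean on the inference-based API, which this very thread warns is unpredictable; a sketch:

# Fragile stand-in for the hypothetical return_type, built on the
# (explicitly discouraged) inference-based Base.return_types:
function return_type(f, argtypes::Tuple)
    ts = Base.return_types(f, Tuple{argtypes...})
    return length(ts) == 1 ? ts[1] : Any   # on ambiguity, give up and use Any
end

return_type(exp, (Float64,))  # => Float64 (on versions where inference succeeds)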

Dispatching on return types could be implemented by adding a return_type keyword argument for all functions in base Julia

function add_together(number_1::Int, number_2::Int; return_type::Type = Number)
  if return_type <: Number
    number_1 + number_2
  else
    "$number_1$number_2"
  end
end

Does #13412 allow a solution here? A naive idea is a return-type Function type...

+1 to this eventually being nailed down.
Return types would be really useful for the Atom package I just released. (Check it out!)
https://github.com/jamesdanged/Jude
In it, I run a full syntax parse in Javascript of your Julia codebase to resolve names. It's not a huge step to do type checking. Return types would make it a lot easier to do type inference statically in a lot of situations, which could help catch a lot of errors. It would also enable better auto complete and safer refactoring capabilities. I realize that in Julia, objects are typed, not variables. But it's already good practice to write type stable code, and a lot of code actually is implicitly typed.

Note that if function foo(...)::T is just syntactic sugar for a convert(T, ret)::T as @JeffBezanson indicated, it should be possible to make T an arbitrary expression computed from the parameters. e.g.

function foo{A<:Number,B<:Number}(a::A, b::B)::promote_type(A,B)
    ...
end

This is pretty important in order have type-generic code. We don't want to encourage a lot of hard-coded ::Float64 declarations. (People already habitually over-type their arguments, often from the mistaken impression that it helps performance.)

However, this means that @jamesdanged won't necessarily be able to use the type declarations in any useful way from his Javascript parser.

Though a good deal of code is generic across multiple input types

function foo{A<:Number,B<:Number}(a::A, b::B)::promote_type(A,B)
    ...
end

there's also plenty of code that is more simply generic that would easily benefit

function foo{T<:Number}(a::T, b::T)::T
    ...
end

Also, if promote_type(A,B) uses multiple dispatch

function promote_type(::Type{A}, ::Type{B})
  C
end
function promote_type(::Type{C}, ::Type{D})
  E
end

then tooling like Jude could probably parse this.

It's probably not possible or desirable to have everything statically typed, but I think there's definitely plenty of room to incrementally type.

Maybe I'm pointing out the obvious, but if we combine expressions for the return type, individual method return type annotation, and the requirement of monotonicity, the desired behavior needs to be clarified a bit. Assume

function foo{T}(x::T)::bar(T) ... end

Then monotonicity would require that for any type T1, T2, if T1<:T2, then bar(T1)<:bar(T2). So far so obvious. Now we add

function foo(x::Int8)::Int32 ... end

monotonicity would definitely require that for any T, if Int8<:T and T!=Int8, then Int32<:bar(T). But would we also require Int32<:bar(Int8)? At first glance, I'd say no. In fact,

function /{T<:Number}(x::T, y::T)::T ... end
function /{T<:Integer}(x::T, y::T)::Float64 ... end
# and probably some more cases

doesn't look too bad. However, picking up @yuyichao's convert example, here it would make sense for the general declaration to enforce the return type expression on all specialized methods. Do we need to distinguish two cases here?

Frankly, I think that a monotonicity requirement is unworkable and would change the nature of the language too much. (It's certainly not a constraint that one would want to enforce in all cases, so at the very least you'd need a way to specify which return-type declarations are supposed to be monotonic and which are not.) @JeffBezanson's initial suggestion of return types as mere syntactic sugar seems a lot more sensible to me, and doesn't change Julia semantics.

It's a bit more than syntactic sugar, isn't it? I assume you can somehow query it (return_type?). Here is a scenario where this would be convenient:

function lazy_map{T}(f, xs::Vector{T})
   R = return_type(f, T)
   rs = Vector{Future{R}}()
   for x in xs
      push!(rs, @remote_async_lazy_delayed f(x))
   end
   rs
end

Here I want to allocate a collection, using a specific type (here R) that I know only after the function f has finished. However, I don't want to wait until f has been called. I don't know how to do this without knowing (a good approximation of) f's return type ahead of time.

And for this, monotonicity is needed: if given a Vector{Number}, all individual result types should be <: return_type(f, Number).

If f is declared (or inferred) to return a type R, then it must do so. But that is surely a trivial statement? If f violates the declaration, or if type inference returns a wrong result, that would clearly be a bug.

I assume you're instead speaking of different methods for the same function f, and want to impose certain conditions on the relation between (the return types) of different methods. For my case here, I'm only speaking of a single method f that might even be selected by the caller, or f might be a lambda expression that has only a single method.

Yes, I had an f with different methods in mind. From your example, it is not at all clear that this should not be the case. (Unless T is forced to be a concrete type, of course.) But even with a single method for f, allowing an arbitrary expression in the return type annotation, one could do crazy stuff. And yes, that probably should be considered a bug if it violates monotonicity.

Whether - or to what extent - monotonicity should be enforced by the compiler is another question, though.

See e.g. recent discussion in #11034. This has been discussed _a lot_.

Declaring return type T would cause return values to be wrapped with convert(T, x)::T, so you could be certain the requested type would be returned, or you'd get an error. As I see it, this really is just syntactic sugar. There is no guarantee that you can "look up" this declared type, since after all the declaration is optional. I do not acknowledge the return_type function.

In practice, declaring return type T will certainly cause the compiler to infer a type at least as specific as T. However making this a semantic guarantee is going too far. To the extent you can call the mythical return_type, you have to be prepared for it to return Any whenever it feels like it.
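Concretely, a sketch of that equivalence (half_desugared is an illustrative name):

# Per the description above, a declaration like
half(x)::Float64 = x / 2
# behaves as
half_desugared(x) = convert(Float64, x / 2)::Float64   # convert, then assert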

To the extent you can call the mythical return_type, you have to be prepared for it to return Any whenever it feels like it.

I still think this is worth it.

Fair enough --- the only question is how much people would complain when the result changes from version to version, or perhaps if you statically compile a program instead of running it with the JIT.

How about a dumb implementation of return_type that only looks up the matching method and returns the return type that would be computed (possibly based on abstract argument types)?
Though I guess there might not always be a matching method for abstract enough argument types (or in the case of method ambiguity, there might be several).

I certainly don't want to tell people to add a method to return_type to make the return type of their function discoverable. To me that would be far worse than the mostly-theoretical problems I described.

I certainly don't want to tell people to add a method to return_type to make the return type of their function discoverable.

Well, right now people have to add a method to (the somewhat misnamed) promote_op for that purpose. (OK, promote_op is only used in a few places, but still...) Having a return type annotation automatically generate a suitable return_type method sounds like an improvement to me. The question is what to do about functions without a return type annotation. Taking the promote_op route of defaulting to promote_type(ArgumentTypes...) is somewhat arbitrary (but seems to be a surprisingly good heuristic); Any seems like a safe but potentially inefficient choice. Relying on type inference would, in my eyes, be way too fragile. Maybe it would even be worth making the absence of a return type annotation discoverable, to allow switching to other heuristics then.

My new package ResultTypes has a really nice use case for this.

Ah, that is quite nice!

Yes, that's really nice. On the input direction, this is similar to why I think it would be nice to do implicit conversion to the declared type of an argument, e.g.:

f(x::Nullable{Int} = 0) = x

f() # => returns Nullable(0)

Otherwise you have to write this as

f(x::Nullable{Int} = Nullable(0)) = x

which just seems obnoxiously redundant and unJulian.
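The sugar would presumably expand to something like this (a sketch; the convert method for Nullable is from Base):

f(x::Nullable{Int} = convert(Nullable{Int}, 0)) = x

f()  # => Nullable(0)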

so I'd be able to write something like...

function groupby{A,B}(xs::Vector{A}, f::A -> B)::Dict{B, Vector{A}}
  result = Dict{B, Vector{A}}()
  for x in xs
    key = f(x)
    result[key] = push!(get(result, key, Vector{A}()), x)
  end
  result
end

No, this is not about function types which contain their return type. The best you can do is:

function groupby{A,B}(xs::Vector{A}, f, ::B)::Dict{B, Vector{A}}
  result = Dict{B, Vector{A}}()
  for x in xs
    key = f(x)
    result[key] = push!(get(result, key, Vector{A}()), x)
  end
  result
end

i.e. you'd need to manually pass in an instance of B.
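Usage would then look like, e.g.:

# Pass an instance of B (here a Bool) so the method can bind B:
groupby([1, 2, 3, 4], iseven, true)
# => Dict{Bool,Vector{Int}} with true => [2, 4] and false => [1, 3]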

I was kinda hoping if the function knew its return type, I'd be able to use it as the key in the dictionary.

Either way, I think it is an improvement. If nothing else, it should help with the documentation.

Actually, this works:

julia> fn(g, x)::Dict{Symbol, typeof(g(x))} = Dict{Symbol, typeof(g(x))}()
fn (generic function with 1 method)

julia> @code_warntype fn(sin, 5.6)
# ... looks good

Note that type-inference is fine irrespective of whether you use the return-type annotation.

That is pretty awesome!

I was hoping I had a way to look at a function and see what it would return for a given type.
sin returns a number, but without handing it a number and calling it, I can't know that it does. However, I'm pretty sure the compiler _does_ know.

Anyway - it is no big deal, it would make some code a bit more efficient, and make a couple of things a bit nicer. But overall? I can work around it pretty easily.

fn(g, x)::Dict{Symbol, typeof(g(x))} = Dict{Symbol, typeof(g(x))}()

Not sure what this tries to achieve but FYI g(x) is called twice.

My understanding was that the purpose is to annotate the return type. I'm aware that this comes at the cost of two function evaluations. Is there a way which avoids the two evaluations? (The frowned upon) Base.return_types cannot do it:

julia> fn{X}(g, x::X)::Dict{Symbol, Base.return_types(g, Tuple{X})[1]} = Dict{Symbol, typeof(g(x))}()
fn (generic function with 1 method)

julia> @code_warntype fn(sin, 5)
# looks bad

But yes, probably better to leave the annotation off.

Hi guys, I recently came across the Sparrow programming language, a statically compiled language with very simple syntax but with hyper-metaprogramming capabilities, that is, being able to go from run time to compile time. I think this is a revolutionary step forward in computing, and this approach would solve the challenges Julia has been facing. Since the language is static, it already has return types, but it does not suffer from the problems caused by the compiler not recognising types, for example the challenges with DataFrames (http://www.johnmyleswhite.com/notebook/2015/11/28/why-julias-dataframes-are-still-slow/).

It's something the D community has been seriously looking at (https://forum.dlang.org/thread/[email protected]). There you can find the link to the creator's PhD thesis - one of the best texts I've read on programming, period. I think this approach is something to be seriously considered.

Since the language is static, it already has return types

I might be missing something not having read all the details, but this sounds to me like a well-understood tradeoff: yes, with a static type system you can have arrow types, and instead of performance problems from a lack of type information, you get a compile-time error.

The case of DataFrame getindex could be solved, for example, with something like a @generated function that could also see (constant) argument values.

I'm hesitant to comment on an old, closed thread, but I would note that several of us have converged on translating DataFrames into row-iterators that generate rows as well-typed tuples (which is possible now that we have Nullable scalar objects that can handle missing values). I think it's likely that we'll simply remove getindex from a future variant of DataFrames rather than try to improve on it.

I didn't realise that the DataFrame issue was solved. @JeffBezanson why would you get a compile-time error?

@johnmyleswhite does this mean that in the future DataFrames will be able to efficiently process tables with columns of arbitrary type or will the types be bound to a specific set?

Let's have that conversation elsewhere and in a few months from now. :)

Fair enough

why would you get a compile-time error?

That's generally what happens when a compiler for a statically-typed language can't figure out the type of something. Which will happen eventually. To say much more we'd have to drill down on what "approach" you're talking about more specifically. For example Sparrow supports calling functions on constants at compile time, but the name-to-type mapping in a DataFrame isn't a compile-time constant.

+1 To row iterators. I've had success using tuples and NamedTuples with that approach.

As far as I can see from his thesis, he has developed semantics for the user to specify what should be done at compile time and at run time, as well as a default state where the compiler works out what should be done depending on the nature and typing of the inputs. "If a metaprogram has bugs, it will cause the compiler to crash or behave incorrectly", as would be the case in a run-time system. But I think he is the best person to speak about this aspect.

I think, however, it would certainly be possible to create rules that deal with this aspect as a sort of compile-time exception handling.

I have a completely type-stable row iterator for DataFrames in Query here. It uses NamedTuples to represent rows and works great. You can write arbitrary queries against DataFrames with Query, and there are no type instability problems because that iterator essentially solves the problem.

@JeffBezanson one of the major points of statically-typed languages is to get errors: they show you that most probably you got something wrong. In my view, if you want efficiency, you'd be better off in this category of languages. Moreover, I would argue that it's really important to be able to easily figure out how a particular piece of code translates to machine language (at least to some degree). If the language is allowed to do a lot of "clever" things, it becomes harder to understand the performance characteristics of the code. That inevitably leads to less efficient code. I think the example illustrated by @johnmyleswhite at http://www.johnmyleswhite.com/notebook/2015/11/28/why-julias-dataframes-are-still-slow/ is perfect here.

If, on the other hand, the language allows you to write simple code that everyone understands how it translates to machine code (again, to some degree), then the language must make it pretty clear what run time is. That means the language needs to create a clear distinction between run time and compile time.

Yes, the distinction between run-time and compile-time can be annoying in some cases, but I would argue that those cases appear very seldom in practice. After all, we know that our programs will never be like "compiler, please solve my problem (and you should be able to infer which problem I'm referring to)"

My 2 cents,
LucTeo

Let's not continue this broad design conversation on this issue as every comment sends a notification to many people.

@lucteo: Julia is not currently a static language and won't become one in the future, so I'm not sure what the point of your comments is. The alleged simplicity of having semantically different compile-time and run-time phases is contradicted by the many confusions and troubles that arise from this distinction in static languages: virtual vs non-virtual methods, overloading vs dispatch, etc. – these are some of the most chronically problematic and hard-to-explain issues in static languages.

I'll reply on the julia-dev list.

Another problem is that if the argument list gets too long and you want to put the ::ReturnType on the second line, Julia complains about invalid syntax: "ERROR: syntax: invalid "::" syntax".

If you put the closing ) on the last line then it parses:

julia> f(a, b, c
       )::Int = 3
f (generic function with 1 method)

It may be more aesthetic to do it this way:

julia> f(
           a, b, c
       )::Int = 3
f (generic function with 1 method)