Julia: make `any` and `all` interface consistent with `map`

Created on 22 Jan 2017 · 16 Comments · Source: JuliaLang/julia

i'm not sure whether it's better to ask this on Discourse, but has it been discussed to have Function{N} where N is the number of input arguments?

my use case is that i'd like to make the interface to any and all consistent with that of map. currently, the predicate versions of the former cannot take a variable number of iterables like the latter can. rather, you have to manually zip them up. no big deal, but consistency is nice.
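To make the asymmetry concrete, here is a minimal sketch. The arrays A and B are illustrative values, not taken from the issue.

```julia
A = [1, 2, 3]
B = [4, 5, 6]

# map accepts several iterables and passes one element of each to the function.
sums = map(+, A, B)

# The predicate form of all takes a single iterable, so the inputs are zipped
# by hand and each (a, b) tuple is destructured in the predicate's argument list.
all_less = all(((a, b),) -> a < b, zip(A, B))

# The same pattern applies to any.
any_equal = any(((a, b),) -> a == b, zip(A, B))
```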

the problem is that any and all have methods which specialize on Arrays and regions. and so a simple change of the signature to end with a var arg becomes ambiguous when exactly two arguments follow the predicate, since the second could be either a region or another iterable. a conceptually simple fix to this would be to dispatch on how many input args the predicate takes.

Labels: broadcast, design, speculative

All 16 comments

or make dims a keyword argument for the array/region signatures...
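For reference, a sketch of what a keyword-based region looks like. It assumes a Julia 1.x runtime, where reductions take the region as a dims keyword; the matrices are illustrative.

```julia
A = [true false; true true]

# Reduce along the first dimension (down each column); the result is 1×2.
cols = all(A; dims=1)

# With the region passed by keyword, the positional arguments are free for
# a predicate and the array, so there is nothing to confuse with the region.
rows = all(x -> x > 0, [1 2; 3 -4]; dims=2)
```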

Functions have their own types, but any function can be defined to have arbitrarily many methods with any number of arguments. So saying that, for example, typeof(+) <: Function{2}, isn't really meaningful; + is intuitively binary but you can define +(a, b, c), at which point typeof(+) would no longer be a subtype of Function{2}. Perhaps I'm misunderstanding your request, but it seems to me that what you're describing would require a massive change to the type system.
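A small sketch of that point, using a hypothetical function named combine and assuming Julia 1.x (where hasmethod is available). The function has a single type, while arity is a property of each method.

```julia
# A hypothetical generic function with two methods of different arity.
combine(a, b) = a + b
combine(a, b, c) = a + b + c

# Both methods belong to one function, and that function has one type.
n_methods = length(methods(combine))
is_function = typeof(combine) <: Function

# Arity is queried per method signature, not read off the function's type.
has_two = hasmethod(combine, Tuple{Int, Int})
has_three = hasmethod(combine, Tuple{Int, Int, Int})
has_four = hasmethod(combine, Tuple{Int, Int, Int, Int})
```

Adding a third method later would change the answer for some signature without changing typeof(combine), which is why a Function{2} parameter has no stable meaning here.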

> Functions have their own types, but any function can be defined to have arbitrarily many methods with any number of arguments.

I think the suggestion in this issue might come from a misunderstanding of the differences between a function and a method.

This is basically #17168 and #16191.
If changing the behavior of any and all specifically is considered useful, we can make this issue about that.

great to see parameterizing Function has already been discussed.

using a dims keyword would make the interface inconsistent with mean and friends.

title of this issue changed to reflect the desire to make any and all consistent with map.

> using a dims keyword would make the interface inconsistent with mean and friends.

Not if we also switched to using a dims keyword for mean and friends at the same time.

Wouldn't a keyword argument incur too large a penalty for all and any?

changing mean and friends to use a keyword for the region would be hugely disruptive. i'd suggest doing that only if it is decided to not parameterize Function.

@nalimilan it's not clear to me from the performance tips whether or not there is a penalty for keyword args. it says there is not if you call a function that has them with only positional args, but what if you call that same function with them?

lastly, could someone please elaborate on why Function types and generic functions are like oil and water?

The best way to find out whether the penalty is significant is to do some benchmarking.
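A rough sketch of such a benchmark, assuming a Julia 1.x runtime. The functions count_pos and count_kw are hypothetical stand-ins that differ only in how the dimension is passed. Timings from @elapsed are noisy, so no expected numbers are given, and a dedicated benchmarking package would likely give more reliable results.

```julia
# Two hypothetical reductions that differ only in how the dimension is passed.
count_pos(A, dim::Int) = sum(A .> 0; dims=dim)
count_kw(A; dim::Int=1) = sum(A .> 0; dims=dim)

A = rand(100, 100) .- 0.5

# Call each once first so that compilation time is excluded from the timings.
count_pos(A, 1)
count_kw(A; dim=1)

t_pos = @elapsed begin
    for _ in 1:1_000
        count_pos(A, 1)
    end
end

t_kw = @elapsed begin
    for _ in 1:1_000
        count_kw(A; dim=1)
    end
end

println("positional: ", t_pos, " s, keyword: ", t_kw, " s")
```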

> could someone please elaborate on why Function types and generic functions are like oil and water?

See section 4.6 of https://github.com/JeffBezanson/phdthesis/blob/master/main.pdf.

I'm pretty strongly against this idea — that is, allowing all(f, A, B, C) to mean all(args->f(args...), zip(A, B, C)). We've been generally working towards decoupling APIs into discrete operations, not combining them. I think it's fair to say that reductions operate on one data structure.

We allow mapping a function at the same time _only_ because there's a major optimization to be had… and it's an optimization that we don't have a good alternative for. In this case, there's no inherent reason why zipping shouldn't be as fast as a varargs.

what is the "major optimization" with varargs in map?

That's not a general rule for deciding what methods should be supported — it's just something that should be taken into account when deciding if we should combine two distinct and orthogonal operations into one method. But even then, my _only_ was far too strong there. The existence of an optimization isn't the only thing to consider.

I hope you'd agree that the primary purpose of all (and sum and friends) is to perform a reduction over a single data structure. With that in mind, the extension from all(X) to all(isnull, X) is not only efficient and convenient, but it reads nicely, too. And it's a nice pattern all throughout the reduce shorthand functions: sum(abs, X), etc.

It's not as obvious to me that all(|, A, B) should mean all(a | b for (a,b) in zip(A, B)), nor is it as obvious that it's as high in demand.
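The two spellings being contrasted can be sketched as follows. The data is illustrative, and iszero stands in for isnull, which may not exist in the Julia version at hand.

```julia
X = [-1, 2, -3]
A = [true, false, true]
B = [false, true, true]

# Mapping form of a reduction: abs is applied to each element while summing.
total = sum(abs, X)

# The same single-collection pattern with a predicate.
none_zero = !any(iszero, X)

# The explicit spelling of the two-collection case: a generator over zip.
either = all(a | b for (a, b) in zip(A, B))
```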

my question still stands. what's the optimization with varargs in map? if there isn't much of one, i'd suggest changing its interface to use zip.

Allowing multiple arguments to map is quite traditional.

I did this on a branch (kf/multiany for those interested, but not really in a usable state) because I thought I wanted it, but frankly I don't like it. The biggest question is what the truncation behavior should be if you pass multiple arguments of unequal length. E.g. zip truncates by default, and though that seems OK for any perhaps, it doesn't quite seem right for all, because something like all(==, a, b) sure looks like it would make sure all the elements are equal. So you might think that it should just throw an error, but it turns out that the truncating behavior is actually quite useful in a number of cases, so you introduce a keyword argument for whether the arguments must have equal length, but now you basically just have a keyword argument that turns end-of-iterator into an unconditional error. Also, you've now introduced data-dependent errors: because any and all often short-circuit, the error didn't even help you find the bug that you passed unequal lengths. And now you have to remember what the defaults of the keyword arguments are.
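A sketch of the truncation concern, with illustrative vectors a and b of unequal length. The up-front length check at the end is one possible workaround, not something proposed in the thread.

```julia
a = [1, 2, 3]
b = [1, 2, 3, 4]

# zip stops at the shorter input, so the extra element of b is never compared.
n_pairs = length(collect(zip(a, b)))
looks_equal = all(((x, y),) -> x == y, zip(a, b))

# An explicit length check runs regardless of the data, unlike an
# end-of-iterator error that a short-circuiting reduction may never reach.
same_length = length(a) == length(b)
strictly_equal = same_length && looks_equal
```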

All that to say, I don't like it, so here's a simpler design, I think:
We'll define:

splat(f) = args->f(args...)

and people can write

all(splat(|), zip(itrs...))

That's not too many extra characters, and the behavior is quite clear (truncation).
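A runnable sketch of this design. The helper is named splat_args here, a hypothetical name chosen to avoid clashing with any splat that the Julia version in use may already provide, and the input arrays are illustrative.

```julia
# Hypothetical helper: turns f(a, b, ...) into a function of a single tuple.
splat_args(f) = args -> f(args...)

A = [true, false, true]
B = [false, true, false]
itrs = (A, B)

# Each zipped tuple (a, b) is splatted into the binary operator.
all_either = all(splat_args(|), zip(itrs...))
any_both = any(splat_args(&), zip(itrs...))
```

Because the zip is written out, the truncation behavior is whatever zip does, with no extra keyword to remember.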
