Julia: Support guards/filters in comprehensions

Created on 9 Mar 2012 · 39 Comments · Source: JuliaLang/julia

Quoting @rtzui in #547:
On the other hand more powerful:

x=[1,2,3,4,510,1,2,3,1,9]
y=[sqrt(a) | a ∈ x, x%2==0]

(Discussion from @pao)
There is an open syntactic question. The Haskell syntax (shown above) has guards as boolean expressions separated by commas and interspersed with membership assertions, which works as long as you can tell the difference between an expression evaluating to a boolean and a loop assignment statement. Python uses the keyword "if" to precede each guard. I'm sure there are other approaches as well.

(From @JeffBezanson)
It's maybe not ideal, but you can accomplish this as:

x=[1,2,3,4,510,1,2,3,1,9]
y=[sqrt(a) | a in x[x%2==0] ]

All 39 comments

Jeff's workaround for this particular case is fine, but in general guards could include multiple comprehension variables, as in the Cartesian product except the diagonal:

Prelude> [(a,b) | a <- [1..4], b <- [1..4], a /= b]
[(1,2),(1,3),(1,4),(2,1),(2,3),(2,4),(3,1),(3,2),(3,4),(4,1),(4,2),(4,3)]
>>> [(a,b) for a in range(1,5) for b in range(1,5) if a != b]
[(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 1), (4, 2), (4, 3)]

I don't see how to get there with the current list comprehension capabilities.

I call what we have array comprehensions. The size of the output array is determined by the ranges before the computation starts, and that would be difficult with guards. One could always extract a subarray at the end though, if guards are used, and they should not be too difficult to implement either.
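A minimal sketch of the "extract a subarray at the end" idea, written in current Julia syntax (which postdates this comment). The full array is built with its size fixed by the ranges, and a Boolean mask over the same ranges then picks out the entries that pass the guard. The names `full`, `mask`, and `kept` are illustrative only.

```julia
# The size is fixed by the ranges: a 4x4 matrix of pairs.
full = [(a, b) for a in 1:4, b in 1:4]

# The guard, evaluated over the same ranges, as a Boolean matrix.
mask = [a != b for a in 1:4, b in 1:4]

# Logical indexing flattens the result to a vector, in column-major order.
kept = full[mask]
```

Note that every element is still computed before the mask is applied, so this does not help when the body must not run for rejected elements.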

We do need to debate if we want this, and the actual syntax. @StefanKarpinski Given that you came up with the original comprehensions idea, what are your thoughts?

Filters and guards don't mix with multidimensional comprehensions. For 1-d comprehensions, I'm not convinced that it's really worth the additional syntax. What's the case for making it part of the syntax instead of just using a filter function?

/edit. here was something else, but it was BS.

I don't see the advantage of guards over array slices a la x[x>3], but I don't see the problem with multiple dimensions either, since slices already operate on multidimensional arrays.

Multiple comprehension variables would be cool though, but I think it would be cooler this way:

((1..4) insert appropriate operator here (1..4))[ (x,y) -> x /= y]

Because it doesn't work in higher dimensions: you can't just excise arbitrary items out of a matrix and still get a matrix.

One could set values to be excised to zero or some other default provided by the user. One could implement diag, triu, tril, spdiags, etc. using comprehensions. This kind of stuff would be great for experimenting and exploration. However, these would be bad implementations, since even though the running time complexity is the same, one ends up consuming many more flops than necessary.

-viral

On 09-Mar-2012, at 3:24 PM, Stefan Karpinski wrote:

Because it doesn't work in higher dimensions: you can't just excise arbitrary items out of a matrix and still get a matrix.



julia> x=[1 2 3 4; 5 6 7 9]
2x4 Int64 Array:
 1  2  3  4
 5  6  7  9

julia> [ a | a = x ]
{1, 5, 2, 6, 3, 7, 4, 9}

Today the output is always one dimensional. Or am I missing something?

Above comments re: full array comprehensions seem reasonable. There are contexts (like the Cartesian product minus the diagonal) where you'd be okay with that array flattened, and some where you would really prefer to never evaluate "invalid" pairs:

[ does_something_useful_but_throws_exception_if_equal(a, b) | a in 1:4, b in 1:4 ]

It might be the case that we'd rather this idiom be written as a double for loop with an inner if, but adding a guard to this expression is definitely more concise than the Cartesian for loop with an if statement. The Cartesian for loop was mentioned in #330 but doesn't seem to be documented in the Control Flow section of the manual.
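To illustrate the "never evaluate invalid pairs" point, here is a sketch in current Julia syntax, where a trailing `if` guard is accepted. The function `checked_ratio` is a hypothetical stand-in for a function that throws on equal arguments. The guard is tested before the body runs, so the function is never called with `a == b`.

```julia
# Hypothetical function that rejects equal arguments.
function checked_ratio(a, b)
    a == b && throw(ArgumentError("arguments must differ"))
    return (a + b) / (a - b)
end

# The guard filters the (a, b) pairs first; the body only sees pairs that pass.
vals = [checked_ratio(a, b) for a in 1:4, b in 1:4 if a != b]
```

The result is a flat vector of 12 values rather than a 4x4 matrix, since the diagonal entries are skipped.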

@rtzui Try [(a,b) | a in 1:4, b in 1:4]. The final array has one dimension per expression on the RHS of the bar.

I've written a small patch for the manual to mention the Cartesian for feature, JuliaLang/julialang.github.com#8.

I guess this could be implemented by transforming 1d comprehensions with guards to loops using push.

I guess this could be implemented by transforming 1d comprehensions with guards to loops using push.

That's a solid idea. Another option would be to pre-allocate the whole thing and then shrink at the end. Might want to choose between the two approaches based on a threshold. (Guards would still only work in the 1d case.)
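The two strategies can be sketched as follows, in current Julia syntax. The helper names `collect_push` and `collect_shrink` are hypothetical, and the element type is left as `Any` to keep the sketch short. This is not how the compiler lowers comprehensions, only an illustration of the two approaches.

```julia
# Strategy 1: start empty and grow the result with push!.
function collect_push(f, pred, xs)
    out = Any[]
    for x in xs
        pred(x) && push!(out, f(x))
    end
    return out
end

# Strategy 2: preallocate for the worst case, then shrink with resize!.
function collect_shrink(f, pred, xs)
    out = Vector{Any}(undef, length(xs))
    n = 0                      # number of elements kept so far
    for x in xs
        if pred(x)
            n += 1
            out[n] = f(x)
        end
    end
    return resize!(out, n)
end
```

For the example at the top of the thread, `collect_push(sqrt, iseven, x)` would apply `sqrt` only to the even entries of `x`. Which strategy is faster depends on how many elements pass the guard, which is why a threshold between the two was suggested.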

This is pretty absurd, but with https://gist.github.com/3677645 (UPDATE 2013-04-03: or Monads.jl) you can do:

julia> @mdo begin
         a <- MList(1:3)
         b <- MList(1:3)
         guard(MList, a!=b)
         return (a,b)
       end
MList([(1,2), (1,3), (2,1), (2,3), (3,1), (3,2)])

It would be good to have a while clause for list comprehensions too.
This is the proposal for python: http://www.python.org/dev/peps/pep-3142/

I would like to write symmetric matrices using list comprehensions:

symmat = [ f(x[i],y[j]) for i in 1:length(x), j in i:length(y) ] 

or why not, something like

symmat = [ f(x[i],y[j]) for i in 1:length(x), j in 1:length(y) if i<=j ] 

Maybe some operator, like :, could be overloaded in order to generate the combinations

symmat_with_diag = [ f(pair) for pair in x:y ] 
symmat_without_diag = [ f(pair) for pair in x:x ] 

Maybe a symmetric matrix type could be a vector holding only the upper or lower part, with special getindex and setindex methods, and the ability to be printed as a complete matrix or as a list.
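A rough sketch of that storage idea in current Julia syntax. `PackedSymmetric` is a hypothetical type, not something in Base. It keeps only the upper triangle in a flat vector and maps `(i, j)` to a slot in `getindex`. A `setindex!` method is omitted for brevity.

```julia
# Hypothetical type: stores the upper triangle (diagonal included) of an
# n-by-n symmetric matrix in a flat vector, column by column.
struct PackedSymmetric{T} <: AbstractMatrix{T}
    data::Vector{T}
    n::Int
end

# Build from a function of the indices, evaluating f only where i <= j.
function PackedSymmetric(f, n::Integer)
    data = [f(i, j) for j in 1:n for i in 1:j]
    return PackedSymmetric{eltype(data)}(data, n)
end

Base.size(S::PackedSymmetric) = (S.n, S.n)

# Swap the indices so that i <= j, then compute the slot: columns 1..j-1
# hold (j - 1) * j / 2 entries in total, and i is the offset within column j.
function Base.getindex(S::PackedSymmetric, i::Int, j::Int)
    if i > j
        i, j = j, i
    end
    return S.data[(j - 1) * j ÷ 2 + i]
end
```

With `S = PackedSymmetric((i, j) -> 10i + j, 3)`, only six values are stored, and `S[2, 1]` returns the same entry as `S[1, 2]`. Because the type subtypes `AbstractMatrix`, it prints as a complete matrix.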

S = [1, -1, 2, -2, 0]

As mentioned in dupe issues, comprehensions in other languages allow something along the lines of

[x for x in S if x > 0]

The work around for this problem seems to be

S[[x for x in S] .> 0]

Is this going to be the preferred choice? Or is there a more Julian way?

There is of course the filter method.

filter(x -> x > 0, S)

but this is getting further from the beauty of comprehension syntax.

S[[x for x in S] .> 0] is O(n²), isn't it?

It's O(2n), but that 2 is pretty important.

Yeah, I think the whole idea of comprehension filtering is that the filtering happens during construction, like using an if-else block inside a for loop instead of running a full for loop and filtering afterwards.

I've come around to the idea that having an if clause in a comprehension should force the result to be one-dimensional and grow as necessary. In particular, I just found myself writing Pkg code and wanting to do this:

[ver for (ver,info) in avail if head == info.sha1]

This is pretty hard to express otherwise. It can be done with filter and then map, but it's a bit awkward:

[keys(filter((ver,info)->head == info.sha1, avail))...]

This took me several minutes to get right. Having an if clause flatten comprehensions would allow us to express things that we can't right now, such as writing [ f(x,y) for x=v, y=w if true ] and getting a vector. Currently, this requires either a reshape or a pair of for loops, both of which are rather awkward ways to express something that's simple to write in other languages.
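For illustration, this is how the flattening behaves in current Julia, where the guard syntax is available. The ranges `v` and `w` and the function `f` are placeholders chosen for the sketch.

```julia
v = 1:2
w = 1:3
f(x, y) = 10x + y

# Without a guard, the shape follows the ranges: a 2x3 matrix.
m = [f(x, y) for x = v, y = w]

# With a guard, even a trivially true one, the result is a vector.
flat = [f(x, y) for x = v, y = w if true]
```

Here `flat` holds the same six values as `m`, in column-major order, so it matches `vec(m)`.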

+1 for allowing if clauses in comprehensions by forcing them to be 1D.

I hate to suggest syntax, but [... if true] seems like an awkward spelling to get a construction that's different than the existing array comprehension. Is [[ ... ]] ambiguous?

It's a bit awkward, but it just kind of falls out and isn't the worst idiom I've ever seen. [[ ... ]] is totally non-obvious.

another +1 for allowing filters and forcing 1D array

One more +1 for the if clause and vectorized output.

[[ ... ]] is totally non-obvious.

Totally fair, hence the "hating to suggest syntax" preface.

I am interested in trying to create this functionality for three cases, if there isn't someone more capable with time to address it, although it might be several months before I could make much progress. I would initially attempt three versions: a Python-like 1d vector comprehension, a Dict comprehension, and a Dict comprehension with a 1d output:

# [func(x) for x in vector if condition]

func(x) = x^3
vector = [1:10]
result = Array(Any,0)
for i = vector
    if i < 5 
        push!(result, func(i))
    end
end
result


# {func(K,V) for K,V in dict if condition}

func(x,y) = y^3
dict = {"a"=>1, "b"=>2, "c"=>3}
result = Dict{Any, Any}()
for i = keys(dict)
    if dict[i] < 500
        result[i] = func(i, dict[i])
    end
end
result


# 1d [func(K,V) for K,V in dict if condition]

func(x,y) = y^3
dict = {"a"=>1, "b"=>2, "c"=>3}
result = Array(Any,0)
for i = keys(dict)
    if dict[i] < 500
        push!(result, func(i, dict[i]))
    end
end
result

Does this seem useful?

It's a bit of a challenging job, but should be doable. Code that does similar things is already in there. It will involve a bunch of hacking on the parser, which is written in Scheme, and I can't guarantee that @JeffBezanson will be happy with your patch in the end, but it will certainly get the ball rolling on this feature, even if it takes some iteration.

You're definitely not stepping on anyone's toes. We're not very territorial around here. There's more than enough work to go around ;-)

PS. One thing that's surprised me is how unwrapping high-end functions can give big speed improvements. One exercise correlates two rows from 330 csv files with large NA counts. The dataframe solution takes 1.75 seconds, with the best R solution at 1.53, but unwrapping it and reading in text drops it to 1.15 seconds.

Yes, in some sense it's a strength that you can do this and get good performance, but on the other hand it's a weakness that you have to. We're not planning on leaving it that way though.

@john9631 the comments re: cheat sheet, blog, etc. are great but not germane to having more generalized comprehensions. They're excellent items for discussion on either the julia-users or julia-dev mailing lists!

I am commencing this work and will document my successes and failures using iJulia. At this point failure seems the more likely but I'm looking forward to the challenge.

If anyone should see me making a mistake please feel free to provide information :-)

Has there been any update on this subject?
Maybe I am too late, but I would like to make a suggestion for the use of if in array comprehensions:

L = [ i for i in R if i%2 > 0]       # OK: returns a 1D array
L = [ i+j for i in R, j in S if i>j]    # ERROR: 'if' not allowed in matrix comprehensions
L = [ i+j for i in R for j in S if i>j ]    # OK: returns a 1D array (a la Python)

if would not be allowed in matrix comprehensions.
The use of two for would return always a 1d array (a la Python).
That also would allow the second variable to depend on the first.

L = [ i+j for i=1:10 for j=i:10 if (i+j)%2==0 ]   # OK: returns a 1D array

I think that in this way array comprehensions are perfectly clear, and it doesn't mess with the current syntax.
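For comparison, a sketch of how the nested `for` forms from this proposal behave in current Julia, where they are accepted. The ranges `R` and `S` are placeholders chosen for the sketch.

```julia
R = 1:4
S = 1:3

# Single for with a guard: a 1-d result.
odds = [i for i in R if i % 2 > 0]

# Two for clauses with a guard: also 1-d, with i as the outer loop.
sums = [i + j for i in R for j in S if i > j]

# The inner range may depend on the outer variable.
evens = [i + j for i = 1:10 for j = i:10 if (i + j) % 2 == 0]
```

One difference from the proposal: current Julia does not reject the comma form with a guard, such as `[i + j for i in R, j in S if i > j]`. It accepts it and returns a vector.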

I like Raul's idea for syntax.

Any updates on this? I would help if I wasn't such a noob! It would make Julia more attractive I think... I used list comprehensions with guards all the time in Python and they saved a lot of time/space.

S[S.> 0]

What is the future/present of this now that generators exist?

I know there's been a lot of discussion around final 0.5 feature triage, but I think this would be a great candidate. I'd hate for us to release 0.5 with generators only to have python-users everywhere wonder why this didn't come with it. From the discussion in https://github.com/JuliaLang/julia/pull/15023, it sounds like a pretty straightforward parsing re-write, which does limit the number of devs who'd feel comfortable making the change, but I think it'd be well worth it.

:+1:

I think @StefanKarpinski's proposed placement of if in #16389 is cleaner than Raul's trailing if, i.e.

x^2 if x % 3 == 0 for x = 1:100 # versus
x^2 for x = 1:100 if x % 3 == 0