Julia: Support guards/filters in comprehensions

Created on 9 Mar 2012 · 39 Comments · Source: JuliaLang/julia

Quoting @rtzui in #547:
On the other hand more powerful:

x=[1,2,3,4,510,1,2,3,1,9]
y=[sqrt(a) | a ∈ x, a%2==0]

(Discussion from @pao)
There is an open syntactic question. The Haskell syntax (shown above) has guards as boolean expressions separated by commas and interspersed with membership assertions, which works as long as you can tell the difference between an expression evaluating to a boolean and a loop assignment statement. Python uses the keyword "if" to precede each guard. I'm sure there are other approaches as well.

(From @JeffBezanson)
It's maybe not ideal, but you can accomplish this as:

x=[1,2,3,4,510,1,2,3,1,9]
y=[sqrt(a) | a in x[x%2==0] ]

pao

Most helpful comment

I know there's been a lot of discussion around final 0.5 feature triage, but I think this would be a great candidate. I'd hate for us to release 0.5 with generators only to have python-users everywhere wonder why this didn't come with it. From the discussion in https://github.com/JuliaLang/julia/pull/15023, it sounds like a pretty straightforward parsing re-write, which does limit the number of devs who'd feel comfortable making the change, but I think it'd be well worth it.

quinnj on 9 May 2016

👍10 👎1

All 39 comments

Jeff's workaround for this particular case is fine, but in general guards could include multiple comprehension variables, as in the Cartesian product except the diagonal:

Prelude> [(a,b) | a <- [1..4], b <- [1..4], a /= b]
[(1,2),(1,3),(1,4),(2,1),(2,3),(2,4),(3,1),(3,2),(3,4),(4,1),(4,2),(4,3)]

>>> [(a,b) for a in range(1,5) for b in range(1,5) if a != b]
[(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 1), (4, 2), (4, 3)]

I don't see how to get there with the current list comprehension capabilities.
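
For comparison, here is a minimal sketch of the same product-minus-diagonal in Julia, assuming a later release (0.5 or newer) in which comprehensions accept chained `for` clauses and a trailing `if`. That syntax did not exist when this comment was written, and the name `offdiag` is only illustrative.

```julia
# Assumes Julia 0.5 or later; `offdiag` is a hypothetical name.
# Chained `for` clauses nest like loops (a is the outer variable),
# and the trailing `if` drops the diagonal pairs.
offdiag = [(a, b) for a in 1:4 for b in 1:4 if a != b]

println(length(offdiag))  # 16 pairs minus the 4 diagonal ones
println(offdiag[1])       # first pair in the same order as the Haskell output
```

With the comma form (`for a in 1:4, b in 1:4 if a != b`) the same twelve pairs come out, but with `a` varying fastest, so the order differs from the Haskell and Python results.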

pao on 9 Mar 2012

I call what we have array comprehensions. The size of the output array is determined by the ranges before the computation starts, and that would be difficult with guards. One could always extract a subarray at the end though, if guards are used, and they should not be too difficult to implement either.

We do need to debate if we want this, and the actual syntax. @StefanKarpinski Given that you came up with the original comprehensions idea, what are your thoughts?

ViralBShah on 9 Mar 2012

Filters and guards don't mix with multidimensional comprehensions. For 1-d comprehensions, I'm not convinced that it's really worth the additional syntax. What's the case for making it part of the syntax instead of just using a filter function?

StefanKarpinski on 9 Mar 2012

Edit: there was something else here, but it was BS.

I don't see the advantage of guards over array slices à la x[x>3], but I also don't see the problem with multiple dimensions, since slices already operate on multidimensional arrays.

Multiple comprehension variables would be cool though, but I think it would be cooler this way:

((1..4) insert appropriate operator here (1..4))[ (x,y) -> x /= y]

rtzui on 9 Mar 2012

Because it doesn't work in higher dimensions: you can't just excise arbitrary items out of a matrix and still get a matrix.
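
To make the shape problem concrete, here is a small sketch, assuming Julia 1.x syntax (dot-broadcasting, `iseven`); the names `A` and `evens` are illustrative. The size of the comprehension is fixed by its ranges, and selecting entries by a predicate has to collapse the result to a vector because the surviving entries no longer form a rectangle.

```julia
# Assumes Julia 1.x; `A` and `evens` are hypothetical names.
A = [a * b for a in 1:3, b in 1:4]   # size (3, 4) is known from the ranges alone

# Logical indexing keeps only the even entries; the result is a flat Vector.
evens = A[iseven.(A)]

println(size(A))
println(size(evens))
```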

StefanKarpinski on 9 Mar 2012

One could set values to be excised to zero or some other default provided by the user. One could implement diag, triu, tril, spdiags, etc. using comprehensions. This kind of stuff would be great for experimenting and exploration. However, these would be bad implementations, since even though the running time complexity is the same, one ends up consuming many more flops than necessary.

ViralBShah on 9 Mar 2012

julia> x=[1 2 3 4; 5 6 7 9]
2x4 Int64 Array:
 1  2  3  4
 5  6  7  9

julia> [ a | a = x ]
{1, 5, 2, 6, 3, 7, 4, 9}

Today the output is always one-dimensional. Or am I missing something?

rtzui on 9 Mar 2012

Above comments re: full array comprehensions seem reasonable. There are contexts (like the Cartesian product minus the diagonal) where you'd be okay with that array flattened, and some where you would really prefer to never evaluate "invalid" pairs:

[ does_something_useful_but_throws_exception_if_equal(a, b) | a in 1:4, b in 1:4 ]

It might be the case that we'd rather this idiom be written with a ~~double-for-loop-with-inner-if, but adding a guard to this expression is definitely more concise~~ Cartesian for loop and if statement, which was mentioned in #330 but doesn't seem to be documented in the Control Flow section of the manual.

pao on 9 Mar 2012

@rtzui Try [(a,b) | a in 1:4, b in 1:4]. The final array has one dimension per expression on the RHS of the bar.
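
A quick check of the one-dimension-per-range rule, written in the `for` syntax of later Julia releases (1.x assumed, since the `|` form shown in this thread no longer parses). The names `M` and `V` are illustrative. In those later versions a comprehension over a matrix also keeps the matrix's shape rather than flattening it.

```julia
# Assumes Julia 1.x; `M` and `V` are hypothetical names.
M = [(a, b) for a in 1:4, b in 1:4]   # two ranges give a two-dimensional result
println(size(M))

x = [1 2 3 4; 5 6 7 9]
V = [a for a in x]                    # a single iterator that itself has a shape
println(size(V))
```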

pao on 9 Mar 2012

I've written a small patch for the manual to mention the Cartesian for feature, JuliaLang/julialang.github.com#8.

pao on 10 Mar 2012

I guess this could be implemented by transforming 1d comprehensions with guards to loops using push.

JeffBezanson on 4 May 2012

> I guess this could be implemented by transforming 1d comprehensions with guards to loops using push.

That's a solid idea. Another option would be to pre-allocate the whole thing and then shrink at the end. Might want to choose between the two approaches based on a threshold. (Guards would still only work in the 1d case.)
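
The two strategies can be sketched by hand as follows. This is only an illustration under assumed Julia 1.x syntax, not what the parser actually emits; `filtered_push` and `filtered_shrink` are hypothetical helper names.

```julia
# Assumes Julia 1.x. Both helpers are hypothetical, hand-written stand-ins
# for what a guarded 1d comprehension could be lowered to.

# Strategy 1: start empty and grow the result with push!.
function filtered_push(f, pred, xs)
    out = Vector{Any}()
    for x in xs
        pred(x) && push!(out, f(x))
    end
    return out
end

# Strategy 2: preallocate for the worst case, then shrink at the end.
function filtered_shrink(f, pred, xs)
    out = Vector{Any}(undef, length(xs))
    n = 0                      # number of elements that passed the guard
    for x in xs
        if pred(x)
            n += 1
            out[n] = f(x)
        end
    end
    return resize!(out, n)     # drop the unused tail
end

println(filtered_push(x -> x^2, iseven, 1:6))
println(filtered_shrink(x -> x^2, iseven, 1:6))
```

The second form avoids repeated growth but can over-allocate badly when few elements pass the guard, which is presumably why a threshold between the two is suggested.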

StefanKarpinski on 4 May 2012

This is pretty absurd, but with https://gist.github.com/3677645 (UPDATE 2013-04-03: or Monads.jl) you can do:

julia> @mdo begin
         a <- MList(1:3)
         b <- MList(1:3)
         guard(MList, a!=b)
         return (a,b)
       end
MList([(1,2), (1,3), (2,1), (2,3), (3,1), (3,2)])

pao on 9 Sep 2012

It would be good to have a while clause for list comprehensions too.
This is the proposal for Python: http://www.python.org/dev/peps/pep-3142/

diegozea on 4 Apr 2013

I would like to write Symmetric matrices using list comprehension:

symmat = [ f(x[i],y[j]) for i in 1:length(x), j in i:length(y) ]

or why not, something like

symmat = [ f(x[i],y[j]) for i in 1:length(x), j in 1:length(y) if i<=j ]

Maybe some operator, like :, could be overloaded in order to generate the combinations

symmat_with_diag = [ f(pair) for pair in x:y ]

symmat_without_diag = [ f(pair) for pair in x:x ]

Maybe a symmetric matrix type could be a vector holding only the upper or lower part, with special getindex and setindex methods, and the ability to be printed as a complete matrix or as a list.

diegozea on 10 Apr 2013

S = [1, -1, 2, -2, 0]

As mentioned in dupe issues, comprehensions in other languages allow something along the lines of

[x for x in S if x > 0]

The workaround for this problem seems to be

S[[x for x in S] .> 0]

Is this going to be the preferred choice? Or is there a more Julian way?

milktrader on 6 Jul 2013

👍2

There is of course the filter method.

filter(x -> x > 0, S)

but this is getting further from the beauty of comprehension syntax.

milktrader on 6 Jul 2013

👍1

S[[x for x in S] .> 0] is _O(n²)_, isn't it?

diegozea on 6 Jul 2013

It's O(2n), but that 2 is pretty important.

StefanKarpinski on 6 Jul 2013

Yeah, I think the whole idea of comprehension filtering is that the filtering happens during construction, like using an if-else block inside a for loop instead of running a full for loop and filtering afterwards.
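
The three spellings discussed here can be put side by side. This is a sketch assuming Julia 0.5 or later, where the guarded comprehension exists; the variable names are illustrative.

```julia
# Assumes Julia 0.5 or later; the result names are hypothetical.
S = [1, -1, 2, -2, 0]

two_pass   = S[[x for x in S] .> 0]   # build a full copy, then index with a mask
via_filter = filter(x -> x > 0, S)    # single pass, but no comprehension syntax
one_pass   = [x for x in S if x > 0]  # guard applied while the result is built

println(two_pass)
println(via_filter)
println(one_pass)
```

All three produce the same vector; they differ only in how many temporaries are built along the way.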

quinnj on 6 Jul 2013

I've come around to the idea that having an if clause in a comprehension should force the result to be one-dimensional and grow as necessary. In particular, I just found myself writing Pkg code and wanting to do this:

[ver for (ver,info) in avail if head == info.sha1]

This is pretty hard to express otherwise. It can be done with filter and then map, but it's a bit awkward:

[keys(filter((ver,info)->head == info.sha1, avail))...]

This took me several minutes to get right. Having an if clause flatten comprehensions would allow us to express things that we can't right now, such as writing [ f(x,y) for x=v, y=w if true ] and getting a vector. Currently, this requires either a reshape or a pair of for loops, both of which are rather awkward ways to express something that's simple to write in other languages.
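
A minimal sketch of both ideas, assuming Julia 1.x, where the guarded form is available. The `avail` dictionary and `head` value below are made-up stand-ins, not the real Pkg data structures.

```julia
# Assumes Julia 1.x. `avail` and `head` are hypothetical stand-ins for the
# Pkg data mentioned above; each entry is a NamedTuple with a `sha1` field.
avail = Dict(
    v"0.1.0" => (sha1 = "abc",),
    v"0.2.0" => (sha1 = "def",),
    v"0.3.0" => (sha1 = "abc",),
)
head = "abc"

# Guarded comprehension over the (key, value) pairs of the dictionary.
vers = [ver for (ver, info) in avail if head == info.sha1]
println(sort(vers))   # Dict iteration order is unspecified, so sort for display

# An always-true guard flattens a two-range comprehension into a vector.
flat = [x + y for x in 1:2, y in 1:3 if true]
println(size(flat))
```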

StefanKarpinski on 27 Aug 2013

+1 for allowing if clauses in comprehensions by forcing them to be 1D.

johnmyleswhite on 29 Aug 2013

I hate to suggest syntax, but [... if true] seems like an awkward spelling to get a construction that's different than the existing array comprehension. Is [[ ... ]] ambiguous?

pao on 29 Aug 2013

It's a bit awkward, but it just kind of falls out and isn't the worst idiom I've ever seen. [[ ... ]] is totally non-obvious.

StefanKarpinski on 30 Aug 2013

another +1 for allowing filters and forcing 1D array

milktrader on 2 Sep 2013

One more +1 for the if clause and vectorized output.

lsorber on 3 Sep 2013

> [[ ... ]] is totally non-obvious.

Totally fair, hence the "hating to suggest syntax" preface.

pao on 3 Sep 2013

I am interested in trying to implement this functionality, unless someone more capable has time to address it, although it might be several months before I make much progress. I would initially attempt three versions: a Python-like 1d vector comprehension, a Dict comprehension, and a Dict comprehension with a 1d output:

# [func(x) for x in vector if condition]

func(x) = x^3
vector = [1:10]
result = Array(Any,0)
for i = vector
    if i < 5 
        push!(result, func(i))
    end
end
result


# {func(K,V) for K,V in dict if condition}

func(x,y) = y^3
dict = {"a"=>1, "b"=>2, "c"=>3}
result = Dict{Any, Any}()
for i = keys(dict)
    if dict[i] < 500
        result[i] = func(i, dict[i])
    end
end
result


# 1d [func(K,V) for K,V in dict if condition]

func(x,y) = y^3
dict = {"a"=>1, "b"=>2, "c"=>3}
result = Array(Any,0)
for i = keys(dict)
    if dict[i] < 500
        push!(result, func(i, dict[i]))
    end
end
result

Does this seem useful?
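
For reference, the three cases can be written directly in later Julia releases. This is a sketch assuming Julia 1.x; `cube`, `d`, and the result names are illustrative and not part of any proposal in this thread.

```julia
# Assumes Julia 1.x; all names here are hypothetical.
cube(x) = x^3
d = Dict("a" => 1, "b" => 2, "c" => 3)

# 1. Vector comprehension with a guard.
vec_result = [cube(x) for x in 1:10 if x < 5]

# 2. Dict built from a guarded generator of pairs.
dict_result = Dict(k => cube(v) for (k, v) in d if v < 500)

# 3. Guarded comprehension over a Dict, producing a 1d array.
flat_result = [cube(v) for (k, v) in d if v < 500]

println(vec_result)
println(dict_result)
println(sort(flat_result))   # sorted because Dict iteration order is unspecified
```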

john9631 on 14 Oct 2013

It's a bit of a challenging job, but should be doable. Code that does similar things is already in there. It will involve a bunch of hacking on the parser, which is written in Scheme, and I can't guarantee that @JeffBezanson will be happy with your patch in the end, but it will certainly get the ball rolling on this feature, even if it takes some iteration.

StefanKarpinski on 14 Oct 2013

You're definitely not stepping on anyone's toes. We're not very territorial around here. There's more than enough work to go around ;-)

> PS. One thing that surprised me is how unwrapping high-end functions can give big speed improvements. One exercise correlates two rows from 330 csv files with large NA counts. The dataframe solution takes 1.75 seconds, with the best R solution at 1.53, but unwrapping it and reading in text drops it to 1.15 seconds.

Yes, in some sense it's a strength that you can do this and get good performance, but on the other hand it's a weakness that you have to. We're not planning on leaving it that way though.

StefanKarpinski on 16 Oct 2013

@john9631 the comments re: cheat sheet, blog, etc. are great but not germane to having more generalized comprehensions. They're excellent items for discussion on either the julia-users or julia-dev mailing lists!

pao on 16 Oct 2013

I am commencing this work and will document my successes and failures using iJulia. At this point failure seems the more likely but I'm looking forward to the challenge.

If anyone should see me making a mistake please feel free to provide information :-)

john9631 on 14 Nov 2013

Has there been any update on this subject?
Maybe I am too late, but I would like to make a suggestion for the use of if in array comprehensions:

L = [ i for i in R if i%2 > 0]       # OK: returns a 1D array
L = [ i+j for i in R, j in S if i>j]    # ERROR: 'if' not allowed in matrix comprehensions
L = [ i+j for i in R for j in S if i>j ]    # OK: returns a 1D array (a la Python)

if would not be allowed in matrix comprehensions.
Using two for clauses would always return a 1d array (a la Python).
That would also allow the second variable to depend on the first.

L = [ i+j for i=1:10 for j=i:10 if (i+j)%2==0 ]   # OK: returns a 1D array

I think array comprehensions are perfectly clear this way, and it doesn't interfere with the current syntax.
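
The chained-`for` form with a dependent inner range can be tried as written in later Julia releases. This is a sketch assuming Julia 0.5 or later. In those releases the comma form with `if` is, as far as I know, accepted as well and returns a vector, so the ERROR case proposed above is not how things ended up.

```julia
# Assumes Julia 0.5 or later; `L` follows the naming used in the proposal.
# The inner range depends on the outer variable, and the guard applies
# to each (i, j) pair as it is generated.
L = [i + j for i = 1:10 for j = i:10 if (i + j) % 2 == 0]

println(length(L))
println(L[1])   # i = 1, j = 1
```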

RaulDurand on 16 Nov 2014

I like Raul's idea for syntax.

Any updates on this? I would help if I wasn't such a noob! It would make Julia more attractive I think... I used list comprehensions with guards all the time in Python and they saved a lot of time/space.

chriscoey on 7 Jan 2015

S[S.> 0]

FGFW on 15 Mar 2016

What is the future/present of this now that generators exist?

diegozea on 15 Mar 2016

👍2

:+1:

kmsquire on 9 May 2016

I think @StefanKarpinski's proposed placement of if in #16389 is cleaner than Raul's trailing if, i.e.

x^2 if x % 3 == 0 for x = 1:100 # versus
x^2 for x = 1:100 if x % 3 == 0
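
For what it's worth, the trailing-`if` spelling is the one that parses in Julia 0.5 and later. A minimal sketch of it with generators, assuming such a version; `squares` and `total` are illustrative names.

```julia
# Assumes Julia 0.5 or later; `squares` and `total` are hypothetical names.
# The same guarded generator can be materialised or consumed lazily.
squares = collect(x^2 for x = 1:100 if x % 3 == 0)
total   = sum(x^2 for x = 1:100 if x % 3 == 0)

println(length(squares))   # multiples of 3 between 1 and 100
println(first(squares))
println(total)
```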

ararslan on 18 May 2016

👍2
