Julia: Filter is not documented

Created on 3 Jun 2016 · 6 Comments · Source: JuliaLang/julia

After some experimentation, I found that Base.Filter creates an iterator from a predicate and another iterator. Example:

Filter(iseven, 1:7)

creates an iterator over the integers 2, 4, 6.

I couldn't find this in the documentation. But I think it's very convenient.

Note that this is distinct from the function filter, which instantiates a filtered collection, instead of an iterator.
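To make the distinction concrete, here is a minimal sketch. It assumes a Julia 1.x release, where the lazy version is spelled Iterators.filter (the name used later in this thread) rather than Base.Filter; the variable names are only illustrative.

```julia
# Assumes Julia 1.x: Iterators.filter is the lazy counterpart of filter.
lazy = Iterators.filter(iseven, 1:7)   # wraps the range; nothing is filtered yet
eager = filter(iseven, 1:7)            # allocates the filtered array right away

# The lazy iterator only produces values when something consumes it.
lazy_values = collect(lazy)

println(typeof(eager))
println(lazy_values == eager)
```

Both paths yield the same elements; they differ only in when the work and the allocation happen.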

doc

All 6 comments

A comment on your note: according to the source, this is not true.
filter(flt, itr) = Filter(flt, itr)
If none of the more special filter implementations match, filter is implemented as Filter.

I am new to Julia, so this is just a thought, but this is inconsistent and should at least be better documented.
Why should filter(flt, 1:4) return an array but filter(flt, it) an iterator? I would expect 1:4 and a user-defined iterator to be treated the same way, both returning an iterator.

I also noticed this inconsistency. Why not have filter always return an iterator? You can always force the Array to precipitate by calling collect(filter(flt,it)) if required.

Since Julia aims to be a language for HPC scientific computing, I think it is important to give client programmers control over memory complexity. Returning an Array costs O(N) memory, whereas returning an iterator lets the user process the returned objects one at a time, keeping the memory complexity down to O(1).
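A small sketch of that trade-off, assuming Julia 1.x and its Iterators.filter; the bound n is an arbitrary value chosen for illustration.

```julia
# Sum the even integers up to n in two ways.
n = 1_000_000  # arbitrary size, for illustration only

# Lazy: elements are streamed through sum one at a time,
# so no intermediate array of filtered values is built.
lazy_total = sum(Iterators.filter(iseven, 1:n))

# Eager: filter first allocates an array holding about n/2 integers,
# and sum then walks over that array.
eager_total = sum(filter(iseven, 1:n))

println(lazy_total == eager_total)
```

The results agree; the difference is only whether the intermediate collection ever exists in memory.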

julia> methods(filter)
# 7 methods for generic function "filter":
filter(f, a::Array{T<:Any,1}) at array.jl:952
filter(f, Bs::BitArray) at bitarray.jl:1746
filter(f, As::AbstractArray) at array.jl:937
filter(f, d::Associative) at dict.jl:273
filter(f, s::Set) at set.jl:166
filter(f, s::AbstractString) at strings/basic.jl:279
filter(flt, itr) at iterator.jl:112

julia> filter(iseven,1:4) # the type of 1:4 is a subtype of AbstractArray, so filter returns an array
2-element Array{Int64,1}:
 2
 4

julia> filter(iseven,(i for i in 1:4))  # input is an iterator, so filter returns an iterator
Filter{Base.#iseven,Base.Generator{UnitRange{Int64},##7#8}}(iseven,Base.Generator{UnitRange{Int64},##7#8}(#7,1:4))

julia> filter(iseven,[i for i in 1:4]) # input is an array, so filter returns an array
2-element Array{Int64,1}:
 2
 4

@berndbohmeier

Why should filter(flt, 1:4) return an array but filter(flt, it) an iterator? I would expect 1:4 and a user-defined iterator to be treated the same way, both returning an iterator.

julia> typeof(1:4)
UnitRange{Int64}

julia> UnitRange{Int64} <: AbstractArray
true

UnitRange is a subtype of AbstractArray in Julia, so filter returns an array.

@krcools

Why not have filter always return an iterator? You can always force the Array to precipitate by calling collect(filter(flt,it)) if required.
Since Julia aims to be a language for HPC scientific computing I think it is important to give client programmers control over memory complexity.

In my opinion, small or medium-sized arrays are used more frequently than large arrays among Julia users. If you want to control memory, an iterator can be used.

I am also new to Julia and am just trying to offer what I know. I hope it helps.

We discussed filter's behaviour over iterators in #13712. Returning an iterator is a performance gotcha, because if the result is iterated over N times, the filtering function will be called N times per element.
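The gotcha can be observed with a counting predicate. This is a sketch assuming Julia 1.x and Iterators.filter; the names calls and counted_iseven are hypothetical helpers, not part of Base.

```julia
# Hypothetical helper: count how often the predicate is evaluated.
calls = Ref(0)
counted_iseven(x) = (calls[] += 1; iseven(x))

lazy = Iterators.filter(counted_iseven, 1:4)

# Walk over the same lazy iterator three times.
for pass in 1:3
    for x in lazy
        # the body is irrelevant; iterating alone triggers the predicate
    end
end

println(calls[])  # 12: four elements, each tested on all three passes
```

Every pass re-runs the predicate on every underlying element, which is harmless for iseven but costly for an expensive test.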

I would suggest that filter always return a collection, and Iterators.ifilter (currently non-existent) return an iterator. This would mirror map and Iterators.imap.

If you plan to iterate over a filtered collection more than once and do not want to apply the filter multiple times, you should collect the outcome of filter. This is explicit in the code and indicates that the author is more worried about CPU cycles than memory. I strongly believe that in computational science the CPU and memory complexity should be part of the API, meaning you should be able to read it from the code.
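As a sketch of that explicit style, assuming Julia 1.x and Iterators.filter (predicate_calls and counted are hypothetical names used only to make the cost visible):

```julia
# Hypothetical helper: count how often the predicate is evaluated.
predicate_calls = Ref(0)
counted(x) = (predicate_calls[] += 1; iseven(x))

# The collect call is where the memory is spent and the predicate runs, once.
evens = collect(Iterators.filter(counted, 1:4))

# Later passes reuse the stored result and never call the predicate again.
totals = [sum(evens) for pass in 1:3]

println(predicate_calls[])
println(totals)
```

Here the memory-for-CPU trade is visible at the call site: dropping the collect keeps memory flat but reapplies the predicate on every pass.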

I agree that there might be cases where a single function doing both the filtering and the collecting is more efficient than calling collect(filter(...)). Going by the Julia style docs, such a function would be called collect_filter...

Iterators.filter is still not documented. Now that this is exported, it should get a docstring.
