After some experimentation, I found that Base.Filter creates an iterator from a predicate and another iterator. Example: Filter(iseven, 1:7) creates an iterator over the integers 2, 4, 6.
I couldn't find this in the documentation, but I think it's very convenient.
Note that this is distinct from the function filter, which instantiates a filtered collection instead of an iterator.
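For readers on a current release, here is a minimal sketch assuming Julia 1.x, where the lazy version is spelled Iterators.filter (it returns an Iterators.Filter object) and filter on an array or range still builds a new Vector eagerly.

```julia
# Lazy: builds an Iterators.Filter wrapper; iseven is not called yet.
lazy = Iterators.filter(iseven, 1:7)

# Eager: walks 1:7 immediately and allocates a Vector{Int} with the matches.
eager = filter(iseven, 1:7)

# Consuming the lazy iterator yields the same elements, one at a time.
for x in lazy
    println(x)
end

collect(lazy) == eager  # both hold 2, 4, 6
```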
A comment on your note: according to the source, this is not true.
```julia
filter(flt, itr) = Filter(flt, itr)
```
If none of the more specialized filter methods match, filter falls back to Filter.
I am new to Julia, so this is just a thought, but this is inconsistent and should at least be better documented.
Why should filter(flt, 1:4) return an array but filter(flt, it) an iterator? I would expect 1:4 and a user-defined iterator to be treated the same way and both to return an iterator.
I also noticed this inconsistency. Why not have filter always return an iterator? You can always force the Array to materialize by calling collect(filter(flt, it)) if required.
Since Julia aims to be a language for HPC and scientific computing, I think it is important to give client programmers control over memory complexity. Returning an Array results in O(N) memory, whereas returning an iterator lets the user process each returned object one at a time, keeping memory at O(1).
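A minimal sketch of the memory argument, assuming Julia 1.x: reducing over the lazy iterator keeps only a running total, while the eager version first allocates a Vector holding every match. The sum_evens_* helper names are hypothetical, chosen for illustration.

```julia
# O(1) extra memory: elements are produced and consumed one at a time.
sum_evens_lazy(n) = sum(Iterators.filter(iseven, 1:n))

# O(N) extra memory: filter allocates a Vector of all matches before summing.
sum_evens_eager(n) = sum(filter(iseven, 1:n))

# A generator with an `if` clause is another lazy spelling of the same filter.
sum_evens_gen(n) = sum(x for x in 1:n if iseven(x))

n = 1_000_000
sum_evens_lazy(n) == sum_evens_eager(n) == sum_evens_gen(n)
```

Wrapping each call in @allocated (after a warm-up call) is one way to compare the allocation behaviour on your own machine.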
```julia
julia> methods(filter)
# 7 methods for generic function "filter":
filter(f, a::Array{T<:Any,1}) at array.jl:952
filter(f, Bs::BitArray) at bitarray.jl:1746
filter(f, As::AbstractArray) at array.jl:937
filter(f, d::Associative) at dict.jl:273
filter(f, s::Set) at set.jl:166
filter(f, s::AbstractString) at strings/basic.jl:279
filter(flt, itr) at iterator.jl:112

julia> filter(iseven,1:4) # The type of 1:4 is a subtype of AbstractArray, so filter returns an array
2-element Array{Int64,1}:
 2
 4

julia> filter(iseven,(i for i in 1:4)) # The input is an iterator, so filter returns an iterator
Filter{Base.#iseven,Base.Generator{UnitRange{Int64},##7#8}}(iseven,Base.Generator{UnitRange{Int64},##7#8}(#7,1:4))

julia> filter(iseven,[i for i in 1:4]) # The input is an array, so filter returns an array
2-element Array{Int64,1}:
 2
 4
```
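The session above comes down to multiple dispatch picking the most specific method. Below is a minimal sketch of that 0.6-era behaviour, written against Julia 1.x; myfilter is a hypothetical name, not a Base function.

```julia
# Hypothetical function mirroring the Julia 0.6 dispatch rules shown above.
# Arrays (including ranges such as 1:4) hit the more specific method and get an eager Vector.
myfilter(f, a::AbstractArray) = filter(f, a)

# Any other iterable falls through to the generic method and gets a lazy iterator.
myfilter(f, itr) = Iterators.filter(f, itr)

gen = (i for i in 1:4)

typeof(myfilter(iseven, 1:4))  # a Vector, because 1:4 isa AbstractArray
typeof(myfilter(iseven, gen))  # an Iterators.Filter, because a Generator is not an AbstractArray
```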
@berndbohmeier
> Why should filter(flt, 1:4) return an array but filter(flt,it) an iterator. I would expect that 1:4 and an own iterator are treated the same way and return an iterator.
```julia
julia> typeof(1:4)
UnitRange{Int64}

julia> UnitRange{Int64} <: AbstractArray
true
```
UnitRange is a subtype of AbstractArray in Julia, so filter returns an array.
@krcools
> Why not have filter always return an iterator? You can always force the Array to precipitate by calling collect(filter(flt,it)) if required.
> Since Julia aims to be a language for HPC scientific computing I think it is important to give client programmers control over memory complexity.
In my opinion, small and medium-sized arrays are used more frequently than large arrays among Julia users. If you want to control memory, an iterator can be used.
I am also new to Julia and am just trying to offer what I know. I hope it helps.
We discussed filter's behaviour over iterators in #13712. Returning an iterator is a performance gotcha: if the result is iterated over N times, the filtering function will be called N times per element.
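A minimal sketch of that gotcha, assuming Julia 1.x: a predicate that counts its own calls shows that a lazy filter re-runs it on every pass, while an eager filter runs it once per element up front. count_predicate_calls is a hypothetical helper.

```julia
# Hypothetical helper: counts how many times the predicate runs when a
# filtered result over 1:4 is traversed `npasses` times, lazily vs eagerly.
function count_predicate_calls(npasses)
    calls = Ref(0)
    pred = x -> (calls[] += 1; iseven(x))

    # Lazy: the predicate is re-evaluated on every pass.
    lazy = Iterators.filter(pred, 1:4)
    for _ in 1:npasses
        total = 0
        for x in lazy
            total += x
        end
    end
    lazy_calls = calls[]

    # Eager: the predicate runs once per element, when filter is called.
    calls[] = 0
    eager = filter(pred, 1:4)
    for _ in 1:npasses
        total = 0
        for x in eager
            total += x
        end
    end
    eager_calls = calls[]

    return lazy_calls, eager_calls
end

count_predicate_calls(3)  # (12, 4): 4 elements × 3 passes vs 4 elements once
```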
I would suggest that filter always return a collection, and that Iterators.ifilter (currently non-existent) return an iterator. This would mirror map and Iterators.imap.
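The exported Iterators.filter mentioned later in this thread fills the lazy-filter role proposed here. A minimal sketch of the eager/lazy pairs, assuming Julia 1.x, with a generator expression standing in for the lazy map:

```julia
xs = 1:6

# Eager versions return new Vectors immediately.
squares = map(x -> x^2, xs)
odds = filter(isodd, xs)

# Lazy counterparts: a generator for map, Iterators.filter for filter.
lazy_squares = (x^2 for x in xs)
lazy_odds = Iterators.filter(isodd, xs)

# Lazy stages compose without intermediate arrays; only collect allocates.
pipeline = (x^2 for x in Iterators.filter(isodd, xs))
collect(pipeline)  # [1, 9, 25]
```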
If you plan to iterate over a filtered collection more than once and do not want to apply the filter multiple times, you should collect the outcome of filter. This is explicit in the code and indicates that the author is more worried about CPU cycles than memory. I strongly believe that in computational science the CPU and memory complexity should be part of the API, meaning you should be able to read it from the code.
I agree that there might be cases where a single function doing the filtering and collecting is more efficient than calling collect(filter(...)). According to the Julia style docs, such a function would be called collect_filter...
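A minimal sketch of such a single-pass helper, assuming Julia 1.x. collect_filter is hypothetical and not part of Base; it evaluates the predicate exactly once per element and pushes matches into a Vector.

```julia
# Hypothetical helper (not part of Base): filter and collect in one pass.
function collect_filter(f, itr)
    # Note: eltype is Any for many generators, so the result may be a Vector{Any}.
    out = Vector{eltype(itr)}()
    for x in itr
        if f(x)
            push!(out, x)
        end
    end
    return out
end

collect_filter(iseven, 1:7)  # [2, 4, 6]
```

In Julia 1.x, collect(Iterators.filter(f, itr)) is an existing one-liner that also traverses the input once and calls f once per element.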
Iterators.filter is still not documented. Now that it is exported, it should get a docstring.