It would be good to be able to iterate over columns (rows) as follows:
for col in columns(A)
# some code
end
This can be part of a more generic interface slices(A, dims)
, with rows(A)
and columns(A)
being alias for slices(A, 1)
and slices(A, 2)
respectively.
With this we can consider deprecating mapslices(f, A; dims)
in favor of map(f, slices(A, dims))
You'll save one line of code compared to
for i in 1:size(M,2)
col = slice(M,:,i)
# some code
end
Would there be any other advantages?
It makes the code clearer and easier to understand. I would prefer it over slice
, though I would rather call it columns
.
for col in columns(matrix)
# some code
end
and
for row in rows(matrix)
# some code
end
If anything, I guess there would be a single method taking the index of the dimension, e.g.
for row in eachslice(array, 1)
# some code
end
If slice(M,:,i)
were expensive, then this iterator could be implemented more efficiently. Not saying we should do this, but it's a possible argument for this kind of thing.
+1 for @mooxmirror's suggestion.
And I agree that the main benefit of implementing these functions is the code that is clearer and easier to understand.
This is somewhat similar to sparse matrix iteration: http://docs.julialang.org/en/release-0.4/stdlib/arrays/#Base.nzrange. If something like this were to be added, could it be designed to work efficiently for sparse and other fancy matrices too?
This also potentially allows compiler to optimize loops to be stride efficient. For example when it sees the following code:
for row in rows(matrix)
# some code
end
it could transform it to the following
for col in columns(matrix')
# some code
end
Just to add that I'd also enjoy having something along these lines. I particular like the
.jl
for row in rows(matrix)
#
end
notation
+1
The twoline version suggested by @andreasnoack of course does the job, but in a rather cryptic way given iterating over rows is such a natural operation.
But I am not sure if we should adopt the sub
or the slice
behaviour, keeping or dropping singleton dimensions respectively. I guess when I iterate over rows(matrix)
I would actually expect row vectors, i.e. keeping the singleton dimensions.
I also agree @nalimilan that we should have a general iterator for higher dimensional arrays. Alternatively to eachslice
I would call it slices
, and furthermore also have subs
for keeping the singleton dimension, as natural extensions to slice
and sub
.
OT: Like @nalimilan I would naturally expect eachslice(m, 1)
to return rows, much like slicedim(m, 1, rownumber)
also selects a row. But mapslices(f, m, [1])
applies f to columns. On the other hand mapslices(sum, m, [1])
is consistent with sum(m,1)
, the column-summation, but actually 1 is the dimension index of the rows and 2 of the columns...
Is it just me or is anyone else also confused by these semantics?
@axsk +1.
It would be nice to have consistency in dimension convention across the methods you have mentioned.
Well rows
and columns
is clearer of what is doing while slice
would be a bit more obscure.
I agree with @nalimilan with having a generic iterator but I think that rows
and columns
must be present.
@axsk Yeah It confuses me too.
Well rows and columns is clearer of what is doing while slice would be a bit more obscure.
I agree with @nalimilan with having a generic iterator but I think that rows and columns must be present.
As long as we don't provide nrow(x) = size(x, 1)
and ncol(x) = size(x, 2)
, we shouldn't offer eachrow(x) = eachslice(x, 1)
and eachcol(x) = eachslice(x, 2)
either.
I'll put in my vote for the following:
If there's interest in this, I can put together a PR for this and a RowIterator
, which is similar. Should I define any other methods? What file might this live in?
I think it would be great. I've implemented something similar for myself and find it very useful. It would be good to have it in Base.
A possibility might be to allow (kw argument?) to specify whether a view or a copy is returned for each column.
@cortner That would certainly be possible... might be best to include that flag as a parameter of the ColumnIterator
type. Maybe:
for c in columns(X, copy=true)
# something
end
I think it would be type unstable to use a keyword argument so it might be better to have two different functions for the view and the copy version.
@tbreloff Why not offer a general slice iterator as proposed by @nalimilan? We could still have the handy names for 2D.
Agree about slice iterator + simplified terminology for rows and columns.
Re type instability: is there nothing on the horizon that will fix type instabilities for kwargs?
@andreasnoack Yes a generic slice iterator would be better in terms of LOC, but might require much more thought to get right. I'm willing to attempt it and compare.
type unstable to use a keyword argument
separate methods is fine by me... happy to follow what the group wants.
The type instability has (semantically) nothing to do with kwargs but that you want to return a different type depending on the value of an argument.
Was thinking about:
immutable ColumnIterator{T, A<:AbstractVecOrMat}
a::A
end
...
Base.next(itr::ColumnIterator{:slice}, i::Int) = slice(itr.a, :, i), i+1
Base.next(itr::ColumnIterator{:copy}, i::Int) = itr.a[:,i], i+1
...
columns(a::AbstractVecOrMat) = ColumnIterator{:slice}(a)
columns(a::AbstractVecOrMat, should_copy) = ColumnIterator{:copy}(a)
# does a slice
for c in columns(X)
# something
end
# does a copy
for c in columns(X, true)
# something
end
And we could get tricky and create one array and just overwrite it at every iteration for the copy version, to reduce allocations. This could be generalized to a SliceIterator
I'm sure.
Adding an unused argument seems like an abuse of dispatch. I would prefer different names over that ou
r alternatively have the second argument be a type like Array
, ArrayView
etc indicating the type you want returned from the iterator.
I had a thought about this a while ago, see https://github.com/gasagna/ArraySlices.jl . It might be useful to anyone willing to take a go at this.
This would also be useful for comprehensions over rows/columns.
a simple way would be
rows(M::Matrix) = map(x->reshape(getindex(M, x, :), :, size(M)[2]), 1:size(M)[1])
columns(M::Matrix) = map(x->getindex(M, :, x), 1:size(M)[2])
@wookay except now that you can't take the transform of a non-numeric array, rows
breaks.
It's julia -e 'println("$(Dates.year(now()))")'
. We really need row and column iterators over matrices.
Open a PR? Or make a package with it?
@tkelman I've included the functionality in my own package. As for a PR (presumably to Base), there is ample code here and it's still been under debate for nearly 2 years. Not exactly encouraging.
@sbromberger yeah, non-iterating rows has updated it by reshape. thanks.
I think that we would add these rows, columns, ColumnIterator functionalities to https://github.com/JuliaCollections/Iterators.jl package.
Nice solution advertised in https://github.com/JuliaLang/julia/issues/23306#issuecomment-324708199
@wookay Iterators.jl has been deprecated in favour of IterTools.jl
I've open a issue about this feature request (which is nearly 2 years old) https://github.com/JuliaCollections/IterTools.jl/issues/11
I was considering using Julia for scientific computation (in my design of evolutionary algorithms for optimization), and a so simple operation like that (that I use a lot in Numpy) is not implemented after more than two years, It's very discouraging. It has not to be included in Base, it is enough to be in a optional recommended package to use it, but I have not found it in any package until now. For people working a lot in two-dimensional matrixes like it, it is a real pain not to be able to iterate easily in them.
Why not create that missing package yourself? The fact that it hasn't been done means that it isn't very important to most people; much more important stuff gets done all the time, and indeed the rate of evolution in Julia is insane. Since this is clearly all-important to you, I'd say just do it!
https://github.com/bramtayl/JuliennedArrays.jl
Does this not cover this functionality?
Ah yes, I'd forgotten that it also had those iterators. It's a brilliant package, highly recommended.
Why not create that missing package yourself? The fact that it hasn't been done means that it isn't very important to most people
It is quite important, basic, common, and it has been implemented - there is a PR for this in IterTools.jl waiting to be merged, but I am not sure why package maintainers are reluctant to review and merge it.
Thanks all for your responses. I am very grateful for them, and for the language itself. I like specially its flexibility through packages. I expect to write my own ones when I will have more experience in Julia. Thanks for the information about the mentioned packages.
I agree it is important, and quite basic/common. In my eyes I think it is too basic for putting it in its own package, which I have to count on being maintained.
This is why I usually write for loops or other mentioned here, but I am annoyed every time it comes up.
Of course you can write it yourself, but this also holds for many other functions in Base, (e.g. slice...) and I believe a scientific language needs to assist in such basic tasks...
Considering JuliennedArrays, this seems way more specific and complex. And IterTools seems to disagree that it fits there (in which point I agree as well).
I don't fully agree on slimming out Base and moving everything to different packages for a language targeting scientific computing (thats another discussion though).
But imho this should at least be available in LinearAlgebra.
I fully agree with @axsk. It is just so much more readable to write something like:
xs = collect(columns(X))
compared to:
sx = [X[:,i] for i in 1:size(X,2)]
@StefanKarpinski has mentioned before that it adds only a bit of code. In my opinion, the difference between C and Julia to a large part is made up out of seemingly small conveniences like this, leading to a big difference when all put together.
I also would like to have columns
and rows
iterators in LinearAlgebra
Having eachslice
seems like a reasonable API; then eachrow
and eachcol
can be shorthands.
I believe eachcolumn
is more readable but if there are another *col*
functions in LinearAlgebra base you may want to prefer eachcol
for consistency.
slices()
, rows()
and columns()
is another alternative that I find to be more slick.
The each
prefix is quite standard to indicate that an iterator is returned instead of a collection. The verb slices
indicates that what is returned is an eager representation of the collection of slices. That may seem nice to have but it forces allocation of all view objects for all slices whereas an iterator that yields one slice at a time can either reuse a single view object or, with some clever compiler tricks, determine when a view doesn't escape and eliminate the allocation that way.
Opened a PR (#29749) which I since updated to reflect the naming conventions in this thread. So, eachrow(M::AbstractVecOrMat)
and eachcol(M::AbstractVecOrMat)
and a general eachslice(A::AbstractArray; dims)
which return an iterator over views into the rows/columns of M
.
Closed by #29749. Thanks @arnavs!
Most helpful comment
I fully agree with @axsk. It is just so much more readable to write something like:
xs = collect(columns(X))
compared to:
sx = [X[:,i] for i in 1:size(X,2)]
@StefanKarpinski has mentioned before that it adds only a bit of code. In my opinion, the difference between C and Julia to a large part is made up out of seemingly small conveniences like this, leading to a big difference when all put together.