Julia: enumerate() like equivalents for iterating over columns/rows of a matrix

Created on 27 Dec 2015 · 45 Comments · Source: JuliaLang/julia

It would be good to be able to iterate over columns (rows) as follows:

for col in columns(A)
    # some code
end

This can be part of a more generic interface slices(A, dims), with rows(A) and columns(A) being aliases for slices(A, 1) and slices(A, 2) respectively.

With this we can consider deprecating mapslices(f, A; dims) in favor of map(f, slices(A, dims)).

arrays speculative

AzamatB

👍17
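
For orientation, here is a minimal sketch of the proposal using the names the thread eventually settles on (eachrow, eachcol and eachslice, see the final comment). It assumes a Julia 1.x release where these functions are available in Base; the matrix A and the variable names are illustrative only.

```julia
A = [1 2 3; 4 5 6]

# Iterate over columns and rows as views (no copies are made).
col_sums = [sum(col) for col in eachcol(A)]
row_sums = [sum(row) for row in eachrow(A)]

# eachcol(A) corresponds to slicing along dims=2 ...
same_slices = all(a == b for (a, b) in zip(eachcol(A), eachslice(A, dims=2)))

# ... whereas mapslices uses dims=1 to mean "apply f to each column".
via_mapslices = vec(mapslices(sum, A; dims=1))
via_map = map(sum, eachcol(A))
```

Note that the dims argument means different things in the two calls, which is the inconsistency discussed later in the thread.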

Most helpful comment

I fully agree with @axsk. It is just so much more readable to write something like:

xs = collect(columns(X))

compared to:

sx = [X[:,i] for i in 1:size(X,2)]

@StefanKarpinski has mentioned before that it adds only a bit of code. In my opinion, the difference between C and Julia to a large part is made up out of seemingly small conveniences like this, leading to a big difference when all put together.

sdewaele on 22 Sep 2018

👍9

All 45 comments

You'll save one line of code compared to

for i in 1:size(M,2)
    col = slice(M,:,i)
    # some code
end

Would there be any other advantages?

andreasnoack on 28 Dec 2015
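
The explicit index loop above still works in current Julia, with one change: slice (and sub) were later replaced by view. The following is a small sketch under that assumption; M is an illustrative matrix.

```julia
M = [1 2; 3 4]

for i in axes(M, 2)
    col = view(M, :, i)   # a view into column i, not a copy
    col .*= 10            # in-place update writes through to M
end
```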

It makes the code clearer and easier to understand. I would prefer it over slice, though I would rather call it columns.

for col in columns(matrix)
    # some code
end

and

for row in rows(matrix)
    # some code
end

lnsp on 28 Dec 2015

👍2

If anything, I guess there would be a single method taking the index of the dimension, e.g.

for row in eachslice(array, 1)
    # some code
end

nalimilan on 28 Dec 2015

👍2

If slice(M,:,i) were expensive, then this iterator could be implemented more efficiently. Not saying we should do this, but it's a possible argument for this kind of thing.

StefanKarpinski on 28 Dec 2015

+1 for @mooxmirror's suggestion.
And I agree that the main benefit of implementing these functions is code that is clearer and easier to understand.

AzamatB on 28 Dec 2015

This is somewhat similar to sparse matrix iteration: http://docs.julialang.org/en/release-0.4/stdlib/arrays/#Base.nzrange. If something like this were to be added, could it be designed to work efficiently for sparse and other fancy matrices too?

mauro3 on 3 Jan 2016

👍1
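
For readers unfamiliar with the sparse iteration pattern mentioned above, this is roughly what column-wise traversal with nzrange looks like. It is a sketch assuming a Julia 1.x release, where the sparse functionality lives in the SparseArrays standard library; the matrix S and the accumulator col_sums are made up for illustration.

```julia
using SparseArrays

# 3×3 sparse matrix with entries (1,1)=10, (3,1)=20, (2,3)=30.
S = sparse([1, 3, 2], [1, 1, 3], [10.0, 20.0, 30.0], 3, 3)
vals = nonzeros(S)

col_sums = zeros(size(S, 2))
for j in axes(S, 2)
    # nzrange(S, j) gives the positions in `vals` of the stored entries of column j.
    for k in nzrange(S, j)
        col_sums[j] += vals[k]
    end
end
```

Only stored entries are visited, which is the kind of efficiency a generic column iterator would ideally preserve for sparse matrices.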

This also potentially allows the compiler to optimize loops to be stride-efficient. For example, when it sees the following code:

for row in rows(matrix)
    # some code
end

it could transform it to the following

for col in columns(matrix')
    # some code
end

AzamatB on 18 Jan 2016

👍8
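
The stride argument rests on Julia storing arrays in column-major order. A small sketch of what that means in practice, assuming a Julia 1.x release (where transpose is lazy, unlike the eager matrix' of the time); whether a compiler could perform the rewrite automatically is not something this example demonstrates.

```julia
A = [1 2 3; 4 5 6]

# Column-major layout: stepping down a column moves 1 element in memory,
# stepping along a row moves size(A, 1) elements.
layout = strides(A)

# Rows of A hold the same values as columns of the lazy transpose ...
same_values = all(r == c for (r, c) in zip(eachrow(A), eachcol(transpose(A))))

# ... but only a materialized copy stores those values contiguously.
At = permutedims(A)
contiguous_rows = [copy(c) for c in eachcol(At)]
```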

Just to add that I'd also enjoy having something along these lines. I particularly like the

for row in rows(matrix)
    #
end

notation

cortner on 10 Feb 2016

+1
The two-line version suggested by @andreasnoack of course does the job, but in a rather cryptic way given that iterating over rows is such a natural operation.

But I am not sure if we should adopt the sub or the slice behaviour, keeping or dropping singleton dimensions respectively. I guess when I iterate over rows(matrix) I would actually expect row vectors, i.e. keeping the singleton dimensions.

I also agree with @nalimilan that we should have a general iterator for higher-dimensional arrays. As an alternative to eachslice I would call it slices, and furthermore also have subs for keeping the singleton dimension, as natural extensions to slice and sub.

OT: Like @nalimilan I would naturally expect eachslice(m, 1) to return rows, much like slicedim(m, 1, rownumber) also selects a row. But mapslices(f, m, [1]) applies f to columns. On the other hand mapslices(sum, m, [1]) is consistent with sum(m,1), the column-summation, but actually 1 is the dimension index of the rows and 2 of the columns...
Is it just me or is anyone else also confused by these semantics?

axsk on 24 Feb 2016

👍1
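
The two questions raised here (keeping or dropping the singleton dimension, and what a dimension index refers to) can be made concrete with a short sketch. It assumes a Julia 1.x release, where view covers both of the behaviours the comment attributes to sub and slice; A is an illustrative matrix.

```julia
A = [1 2 3; 4 5 6]

# Indexing with a range keeps the singleton dimension;
# indexing with a scalar drops it.
kept = view(A, 1:1, :)
dropped = view(A, 1, :)

# dims=1 in a reduction collapses dimension 1, giving one value per column.
column_sums = sum(A, dims=1)

# mapslices with dims=1 likewise hands each column to f.
also_column_sums = mapslices(sum, A; dims=1)

# eachslice with dims=1 instead iterates along dimension 1, yielding rows.
first_row = first(eachslice(A, dims=1))
```

So dims=1 selects columns as the unit of work in sum and mapslices, but rows in eachslice, which matches the confusion described above.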

@axsk +1.
It would be nice to have consistency in dimension convention across the methods you have mentioned.

AzamatB on 25 Feb 2016

Well, rows and columns make it clearer what is being done, while slice would be a bit more obscure.
I agree with @nalimilan about having a generic iterator, but I think that rows and columns must be present.
@axsk Yeah, it confuses me too.

Mice7R on 10 Mar 2016

👍2

Well, rows and columns make it clearer what is being done, while slice would be a bit more obscure.
I agree with @nalimilan about having a generic iterator, but I think that rows and columns must be present.

As long as we don't provide nrow(x) = size(x, 1) and ncol(x) = size(x, 2), we shouldn't offer eachrow(x) = eachslice(x, 1) and eachcol(x) = eachslice(x, 2) either.

nalimilan on 10 Mar 2016

I'll put in my vote for the following:

[image not preserved]

If there's interest in this, I can put together a PR for this and a RowIterator, which is similar. Should I define any other methods? What file might this live in?

tbreloff on 27 Jun 2016

I think it would be great. I've implemented something similar for myself and find it very useful. It would be good to have it in Base.

One possibility would be to allow specifying (via a keyword argument?) whether a view or a copy is returned for each column.

cortner on 27 Jun 2016

@cortner That would certainly be possible... might be best to include that flag as a parameter of the ColumnIterator type. Maybe:

for c in columns(X, copy=true)
    # something
end

tbreloff on 27 Jun 2016

I think it would be type unstable to use a keyword argument so it might be better to have two different functions for the view and the copy version.

@tbreloff Why not offer a general slice iterator as proposed by @nalimilan? We could still have the handy names for 2D.

andreasnoack on 27 Jun 2016

Agree about slice iterator + simplified terminology for rows and columns.

Re type instability: is there nothing on the horizon that will fix type instabilities for kwargs?

cortner on 27 Jun 2016

@andreasnoack Yes a generic slice iterator would be better in terms of LOC, but might require much more thought to get right. I'm willing to attempt it and compare.

type unstable to use a keyword argument

separate methods is fine by me... happy to follow what the group wants.

tbreloff on 27 Jun 2016

The type instability has (semantically) nothing to do with kwargs; it comes from wanting to return a different type depending on the value of an argument.

KristofferC on 27 Jun 2016
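
A small sketch of the point about value-dependent return types. The function names column, column_copy and column_view are hypothetical, and Base.return_types is an unexported reflection helper used here only to inspect what inference concludes; the exact inferred types may differ between Julia versions.

```julia
# One function whose result type depends on the *value* of `should_copy`.
column(A, j, should_copy::Bool) = should_copy ? A[:, j] : view(A, :, j)

# Two functions, each with a single concrete result type.
column_copy(A, j) = A[:, j]
column_view(A, j) = view(A, :, j)

# Ask inference what each call can return for a Matrix{Float64}.
mixed = first(Base.return_types(column, (Matrix{Float64}, Int, Bool)))
copied = first(Base.return_types(column_copy, (Matrix{Float64}, Int)))
viewed = first(Base.return_types(column_view, (Matrix{Float64}, Int)))
```

For the first function inference can only report a union of the two possibilities, whether the flag arrives positionally or as a keyword; the two separate functions each get one concrete type.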

Was thinking about:

immutable ColumnIterator{T, A<:AbstractVecOrMat}
    a::A
end

...
Base.next(itr::ColumnIterator{:slice}, i::Int) = slice(itr.a, :, i), i+1
Base.next(itr::ColumnIterator{:copy}, i::Int) = itr.a[:,i], i+1
...

columns(a::AbstractVecOrMat) = ColumnIterator{:slice}(a)
columns(a::AbstractVecOrMat, should_copy) = ColumnIterator{:copy}(a)

# does a slice
for c in columns(X)
    # something
end

# does a copy
for c in columns(X, true)
    # something
end

And we could get tricky and create one array and just overwrite it at every iteration for the copy version, to reduce allocations. This could be generalized to a SliceIterator I'm sure.

tbreloff on 27 Jun 2016

👍2
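
The snippet above uses pre-1.0 syntax (immutable, Base.next, slice) and elides parts of the implementation. As a rough illustration of the same idea in Julia 1.x, the following sketch encodes the copy/view choice in a Bool type parameter and uses the iterate protocol. It is not the code from the comment: the Copy parameter, the restriction to AbstractMatrix, and the copied_columns name are assumptions made for this example.

```julia
# The first type parameter records whether columns are copied (true) or viewed (false).
struct ColumnIterator{Copy, A<:AbstractMatrix}
    a::A
end

# Convenience constructor so the array type does not have to be spelled out.
ColumnIterator{Copy}(a::A) where {Copy, A<:AbstractMatrix} = ColumnIterator{Copy, A}(a)

Base.length(itr::ColumnIterator) = size(itr.a, 2)

function Base.iterate(itr::ColumnIterator{Copy}, j::Int = 1) where {Copy}
    j > size(itr.a, 2) && return nothing
    # Copy is known at compile time, so each method instance returns a single type.
    col = Copy ? itr.a[:, j] : view(itr.a, :, j)
    return col, j + 1
end

columns(a::AbstractMatrix) = ColumnIterator{false}(a)
copied_columns(a::AbstractMatrix) = ColumnIterator{true}(a)

X = [1 2 3; 4 5 6]
sums = [sum(c) for c in columns(X)]
```

Using two differently named functions rather than an unused positional argument follows the objection raised in the next comment.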

Adding an unused argument seems like an abuse of dispatch. I would prefer different names over that, or alternatively have the second argument be a type like Array, ArrayView etc. indicating the type you want returned from the iterator.

KristofferC on 27 Jun 2016

👍1

I had a thought about this a while ago, see https://github.com/gasagna/ArraySlices.jl . It might be useful to anyone willing to take a go at this.

gasagna on 28 Jun 2016

👍4

This would also be useful for comprehensions over rows/columns.

jebej on 16 May 2017

a simple way would be

rows(M::Matrix) = map(x->reshape(getindex(M, x, :), :, size(M)[2]), 1:size(M)[1])
columns(M::Matrix) = map(x->getindex(M, :, x), 1:size(M)[2])

wookay on 20 Jul 2017

❤1

@wookay except that now you can't take the transpose of a non-numeric array, so rows breaks.

It's julia -e 'println("$(Dates.year(now()))")'. We really need row and column iterators over matrices.

sbromberger on 21 Jul 2017

👍3

Open a PR? Or make a package with it?

tkelman on 21 Jul 2017

@tkelman I've included the functionality in my own package. As for a PR (presumably to Base), there is ample code here and it's still been under debate for nearly 2 years. Not exactly encouraging.

sbromberger on 21 Jul 2017

@sbromberger Yeah, I have updated the non-iterating rows to use reshape. Thanks.

I think we could add this rows, columns and ColumnIterator functionality to the https://github.com/JuliaCollections/Iterators.jl package.

wookay on 21 Jul 2017

Nice solution advertised in https://github.com/JuliaLang/julia/issues/23306#issuecomment-324708199

timholy on 24 Aug 2017

@wookay Iterators.jl has been deprecated in favour of IterTools.jl

I've opened an issue about this feature request (which is nearly 2 years old): https://github.com/JuliaCollections/IterTools.jl/issues/11

femtotrader on 27 Oct 2017

I was considering using Julia for scientific computation (in my design of evolutionary algorithms for optimization), and it is very discouraging that such a simple operation (one that I use a lot in NumPy) has not been implemented after more than two years. It does not have to be included in Base; an optional recommended package would be enough, but I have not found it in any package so far. For people who work a lot with two-dimensional matrices, as I do, it is a real pain not to be able to iterate over them easily.

dmolina on 2 Sep 2018

👍2

Why not create that missing package yourself? The fact that it hasn't been done means that it isn't very important to most people; much more important stuff gets done all the time, and indeed the rate of evolution in Julia is insane. Since this is clearly all-important to you, I'd say just do it!

timholy on 2 Sep 2018

👍1

https://github.com/bramtayl/JuliennedArrays.jl

Does this not cover this functionality?

cortner on 2 Sep 2018

Ah yes, I'd forgotten that it also had those iterators. It's a brilliant package, highly recommended.

timholy on 2 Sep 2018

Why not create that missing package yourself? The fact that it hasn't been done means that it isn't very important to most people

It is quite important, basic, common, and it has been implemented - there is a PR for this in IterTools.jl waiting to be merged, but I am not sure why package maintainers are reluctant to review and merge it.

AzamatB on 2 Sep 2018

Thanks all for your responses. I am very grateful for them, and for the language itself. I especially like its flexibility through packages. I expect to write my own once I have more experience in Julia. Thanks for the information about the mentioned packages.

dmolina on 2 Sep 2018

I agree it is important, and quite basic/common. In my eyes it is too basic to put in its own package, which I would then have to count on being maintained.
This is why I usually write for loops or use the other approaches mentioned here, but I am annoyed every time it comes up.
Of course you can write it yourself, but this also holds for many other functions in Base (e.g. slice...), and I believe a scientific language needs to assist in such basic tasks...

Considering JuliennedArrays, this seems way more specific and complex. And IterTools seems to disagree that it fits there (in which point I agree as well).

I don't fully agree with slimming down Base and moving everything to different packages for a language targeting scientific computing (that's another discussion though).
But imho this should at least be available in LinearAlgebra.

axsk on 2 Sep 2018

👍4

I fully agree with @axsk. It is just so much more readable to write something like:

xs = collect(columns(X))

compared to:

sx = [X[:,i] for i in 1:size(X,2)]

sdewaele on 22 Sep 2018

👍9

I also would like to have columns and rows iterators in LinearAlgebra

ekinakyurek on 11 Oct 2018

👍1

Having eachslice seems like a reasonable API; then eachrow and eachcol can be shorthands.

StefanKarpinski on 11 Oct 2018

👍2

I believe eachcolumn is more readable, but if there are other *col* functions in LinearAlgebra or Base you may want to prefer eachcol for consistency.

ekinakyurek on 11 Oct 2018

slices(), rows() and columns() are another alternative that I find to be more slick.

AzamatB on 11 Oct 2018

The each prefix is quite standard to indicate that an iterator is returned instead of a collection. The verb slices indicates that what is returned is an eager representation of the collection of slices. That may seem nice to have but it forces allocation of all view objects for all slices whereas an iterator that yields one slice at a time can either reuse a single view object or, with some clever compiler tricks, determine when a view doesn't escape and eliminate the allocation that way.

StefanKarpinski on 11 Oct 2018
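
The distinction between yielding one slice at a time and materializing all of them can be sketched as follows. This assumes a Julia 1.x release with eachcol available; whether the compiler actually eliminates the per-view allocation is version-dependent and not shown here.

```julia
A = [1.0 2.0; 3.0 4.0; 5.0 6.0]

# Consume one column at a time; only the running total is kept.
total = sum(sum(col) for col in eachcol(A))

# Eager alternative: a Vector holding one view object per column.
cols = collect(eachcol(A))

# The stored elements are still views, so writes go through to A.
cols[2][1] = 20.0
```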

Opened a PR (#29749) which I since updated to reflect the naming conventions in this thread. So, eachrow(M::AbstractVecOrMat) and eachcol(M::AbstractVecOrMat) and a general eachslice(A::AbstractArray; dims) which return an iterator over views into the rows/columns of M.
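
A short usage sketch of the three functions named above, assuming a Julia release that includes them (1.1 or later). The arrays A and M are illustrative. Because the iterators yield views, assignments inside the loops modify the original matrix.

```julia
# General case: slices of a 3-dimensional array along its third dimension.
A = reshape(collect(1:24), 2, 3, 4)
planes = collect(eachslice(A, dims=3))   # four 2×3 views

M = zeros(Int, 2, 3)

# enumerate pairs each row view with its index.
for (i, row) in enumerate(eachrow(M))
    row .= i          # fill row i with the value i
end

# Views write through, so this changes the first row of M.
for col in eachcol(M)
    col[1] += 10
end
```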