Julia: (Row)Vector equality with Matrices

Created on 21 May 2017 · 15 Comments · Source: JuliaLang/julia

Right now, RowVectors are implicitly embedded within Matrices, but not so for Vectors. This leads to funky behaviors such as:

julia> a = Vector(1:2)
2-element Array{Int64,1}:
 1
 2

julia> am = Matrix(2, 1); am[:] = 1:2; am
2×1 Array{Any,2}:
 1
 2

julia> a == am
false

julia> a' == am'
true

There are many ways we could try to fix this. To take the two extreme approaches, we could:

  • On the conservative side, we could just define ==(Matrix, Vector) and be done with it

  • Personally I'd rather push for automatic "do what makes sense" equality checking for types that have natural isomorphisms. E.g. if I have an Array{T,N}, I think it makes sense to define equality with an Array{T,N+1} that has a trailing singleton dimension, as the former is naturally embedded within the space of the second.
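A minimal sketch of what the second option could look like, assuming Julia 1.x syntax and 1-based arrays. The name `embedded_equal` is hypothetical and is not part of Base; it only illustrates the "trailing singleton dimensions don't matter" idea and is not a proposed implementation.

```julia
# Hypothetical helper, not part of Base: treats trailing singleton
# dimensions as insignificant when comparing two arrays.
function embedded_equal(A::AbstractArray, B::AbstractArray)
    n = max(ndims(A), ndims(B))
    # size(X, d) returns 1 for d > ndims(X), so the padding is implicit.
    for d in 1:n
        size(A, d) == size(B, d) || return false
    end
    # Shapes agree up to trailing singletons, so iteration order lines up.
    return all(x == y for (x, y) in zip(A, B))
end

a  = [1, 2]
am = reshape([1, 2], 2, 1)   # 2×1 column matrix with the same contents

a == am                   # false under the built-in ==
embedded_equal(a, am)     # true
embedded_equal(a, [1 2])  # false: a 1×2 row matrix has a different shape
```

The first option in the list would amount to adding only the vector/matrix case of this check.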

I'd be happy to submit a PR implementing whatever we reach consensus on. Hopefully this is a small change to just equality and doesn't get sidetracked into anything more fundamental.

Pinging a random subset of the LinAlg posse @stevengj @jiahao @mbauman

arrays

All 15 comments

Yeah - a bit more consistency on whether arrays with singleton indices are the "same" or not would be awesome.

RowVector complicates things because it has a non-trailing singleton dimension. This also gets interesting (to me, at least) when you combine this with non-one-based indexing.

I for one hated that MATLAB made all these things equivalent and enjoyed moving to Julia where vectors and matrices were different. Broadcasting seems to cover the cases (that I need) where you want to be relaxed about this, but I see that sometimes you probably want (at least some variant of) == to be relaxed about this also, which is more efficient than all(a .== b).
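For reference, a small illustration of the broadcasting route mentioned above, assuming Julia 1.x. Note that broadcasting is more permissive than an embedding-based equality would be, since it also expands non-trailing singleton dimensions.

```julia
a  = [1, 2]
am = reshape([1, 2], 2, 1)   # 2×1 column matrix

a == am           # false: the shapes differ
all(a .== am)     # true: broadcasting expands the missing trailing dimension

# Broadcasting also pairs a column with a row, producing a 2×2 comparison,
# so it accepts cases an embedding-based == would reject.
all([1, 1] .== [1 1])   # true
```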

RowVectors are naturally embedded into Matrices as well, so you'd consider [1,2,3]' == [1 2 3] by the same criteria. I'm not too concerned about this making vectors and column matrices "the same" since they're still quite easily distinguishable, e.g. via dispatch.

Seeing as RowVector already passes the equality check, I wasn't thinking about how to include that within my proposal. :)

~@StefanKarpinski coming back to this, I disagree with what you're saying about [1,2,3]' == [1,2,3]. If that is true, then we either break transitivity of == or we have some truly funky business going on with matrices where a == a' for non-square matrices:~

~If [1, 2, 3] == Matrix([1,2,3]), and [1,2,3]' == Matrix([1,2,3])', then we would have to have that Matrix([1,2,3]) == Matrix([1,2,3])', which seems like it should fail, since we shouldn't have Matrices equal to each other with different shapes.~

~A RowVector and a Vector, when viewed from the 2-dimensional viewpoint should be qualitatively different objects. It's only when we view them from the 1-dimensional viewpoint that they collapse down into the same thing, IMO.~

EDIT: I misread Stefan's post and this is all wrong

~EDIT: After writing that last sentence, I see that we have an ambiguity as to whether [1,2,3] == [1,2,3]' is viewing the objects from a 1-dimensional or 2-dimensional viewpoint. Dang. My gut reaction is to say "no", but I don't have a good reason behind why yet.~

Also wrong

I did NOT say that [1,2,3]' == [1,2,3] – that's crazy talk. I wrote [1,2,3]' == [1 2 3] – note the spaces on the right, not commas. I.e. a row vector is equal to a row matrix with equal contents.
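The distinction is easy to check at the REPL. This sketch assumes Julia 1.x, where `RowVector` was replaced by the `Adjoint` and `Transpose` wrappers, so `v'` is an `Adjoint` rather than a `RowVector`; the comparison results are the same as described here.

```julia
v = [1, 2, 3]

size(v')          # (1, 3): the adjoint of a vector behaves as a 1×3 row
v' == [1 2 3]     # true: a row vector equals a row matrix with equal contents
v' == [1, 2, 3]   # false: a row is not equal to a column vector
```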

Oh wow, mental parsing failure. Yes, I agree with everything you said then.

I'm in favor of this change.

I know you're hoping to avoid a fundamental design discussion, but I think this decision is intrinsically coupled with the decision at #14770. Allowing vectors to equal column matrices currently matches their behavior since we allow them to be indexed with trailing singletons. Equality matches behavior.

But if we disallow indexing with trailing singletons, then that means to me that we're deciding that vectors should not behave like matrices, so neither should they be equal.

Last I checked, my position on #14770 was the minority.

Let me clarify my position:

  • There should be conceptual embeddings T^0 --> T^1 --> T^2 --> T^3 --> ... from n-dimensional arrays to (n+1)-dimensional arrays for all n, into the leading dimensions of the higher space.

  • There should be a conceptual embedding of T* --> T^2 of row vectors into matrices as the second dimension.

  • This embedding guides how indexing should behave and how equality should behave – i.e. we should make the behavior match as much as possible.

  • In particular, this issue: vectors should be == to column matrices with the same contents, and row vectors should be == to row matrices with the same contents.

  • Also, indexing (aka #14770):

    • I do not have a problem with omitting an index into a trailing singleton dimension
    • I do not have a problem with indexing past the last dimension of an array with a 1
    • I want to get rid of "generalized linear indexing" in the sense of indexing into an array with fewer indices than it has dimensions with the last index used as a linear index into the remaining dimensions – unless the omitted dimensions all have size 1 (in which case it is ok)
  • If you ask for the size of an array past its number of dimensions you get 1 (as we have forever).

I believe this is a pretty coherent view in which the embedding of lower dimensions into higher ones is maintained but code unexpectedly "working" for the wrong dimension of argument is avoided. Yes, you can potentially pass an (n+1)-array into a loosely typed routine that expects an n-array and it might not error – but only if the last dimension is singleton, and then it will actually do exactly what you expect. What you cannot do under these rules is pass an (n+1)-array with a non-singleton last dimension and have that last dimension effectively ignored, leading to unexpected results and hard-to-find bugs instead of errors. I believe there was quite a bit of support for this view in #14770.
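The indexing rules in the list above can be illustrated as follows. This is a sketch that assumes Julia 1.x, which, as far as I can tell, behaves the way these rules describe; the variable names are only for illustration.

```julia
v = [10, 20, 30]
A = reshape(collect(1:6), 3, 2, 1)    # trailing dimension has size 1
B = reshape(collect(1:24), 2, 3, 4)   # trailing dimension has size 4

v[2, 1]      # 20: indexing past the last dimension with a 1
A[2, 2]      # 5: omitting the index into a trailing singleton dimension
size(v, 2)   # 1: sizes past ndims(v) are reported as 1

# Using the last index as a linear index into the remaining dimensions
# is rejected when those dimensions are not all of size 1.
rejected = try
    B[2, 5]
    false
catch err
    err isa BoundsError
end
```

Here `rejected` records whether the partial linear index raised a `BoundsError` instead of silently reading an element.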

[Edited to generalize the example at the end since non-column matrices can be linearly indexed, so it was a bad example. Eliminating linear indexing altogether is a whole other can of worms.]

Great, we're in complete agreement. It should be a relatively small patch to get indexing behaviors the rest of the way there.

Yes, I think what Stefan said makes the most sense - it's the right balance between scruffy and neat, seems to be what people expect, and is easy to use while catching the majority of "conceptual" size problems (not all, e.g. interaction with linear indexing when n == 1 in Stefan's post).

The thing which gets me is the special role of the index (not size) of 1 as being equivalent to singleton. Maybe I misunderstood, but wasn't there a push to not have assumptions about 1-based indexing in AbstractArray? Should we support singleton dimensions in zero-based arrays, for instance, for indexing and broadcasting behaviors? Or do we say that these are niche enough that singleton dimensions in these cases are corner cases that we won't officially support?

I don't think this is hard to generalize: it all depends on what the array returns for indices(A, d) where d > N. So long as your index value matches that, we can support it.

I think that we have to assume something about absent dimensions, which seems like it has to be that they range from 1:1. On the flip side, if you don't index into a dimension, as long as there's only one possible choice of index value – be it 1, 0, 76234, or -7 – then we can safely assume that's the index you meant.
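A quick check of what an ordinary array reports for an absent dimension. This assumes Julia 1.x, where `indices` was renamed to `axes`; arrays with custom offsets are not covered by this sketch.

```julia
v = [1, 2, 3]

axes(v, 1)   # Base.OneTo(3)
axes(v, 2)   # Base.OneTo(1): the absent dimension is treated as 1:1
v[3, 1]      # 3: the extra index matches the only value in axes(v, 2)
```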

@StefanKarpinski, @JeffBezanson and I have been slowly mulling this one over for the past few weeks. While we often allow vectors to behave like 1-column matrices, that's not always the case. A salient example is APL indexing: A[ones(5), ones(5)] is very different from A[ones(5,1), ones(5,1)]. Another example is how we allow appending elements to vectors but not column matrices. The data structures are different, even if they sometimes behave similarly.
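Both differences can be seen directly. This sketch assumes Julia 1.x and uses `ones(Int, ...)` so that the index arrays hold integers, which is an assumption on my part since the comment writes `ones(5)`; the names `i_vec`, `i_mat`, and `push_failed` are only for illustration.

```julia
A = reshape(collect(1:9), 3, 3)
i_vec = ones(Int, 5)      # 5-element vector of indices
i_mat = ones(Int, 5, 1)   # 5×1 matrix of indices

# The result shape is the concatenation of the index shapes.
size(A[i_vec, i_vec])     # (5, 5)
size(A[i_mat, i_mat])     # (5, 1, 5, 1)

v = [1, 2]
push!(v, 3)               # vectors can grow: v is now [1, 2, 3]

# A column matrix has no push! method, so this raises a MethodError.
push_failed = try
    push!(reshape([1, 2], 2, 1), 3)
    false
catch err
    err isa MethodError
end
```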

I believe the general rule here is that we allow vectors to participate in linear algebra as 1-column matrices, but in other respects their dimensionality leads to behaviors that are _different_, _observable_, and _distinct_.

Even more compelling is the fact that I really don't want to generalize a notion of equality that ignores trailing singleton dimensions. That would lead directly to a complaint that three dimensional "matrices" cannot participate in linear algebra — even though ones(3,2,1) would be equal to ones(3,2).
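To make the objection concrete, here is a sketch assuming Julia 1.x: the two arrays hold the same values, but only the matrix takes part in matrix-vector multiplication. The name `mul_failed` is only for illustration.

```julia
using LinearAlgebra

ones(3, 2) == ones(3, 2, 1)   # false: different number of dimensions
ones(3, 2) * ones(2)          # ordinary matrix-vector product, a 3-element vector

# A 3-dimensional array has no `*` method with a vector.
mul_failed = try
    ones(3, 2, 1) * ones(2)
    false
catch err
    err isa MethodError
end
```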

On the other hand the primary purpose of a RowVector is to participate in the linear algebra of matrices. In my view, the status quo is correct.

Resolved: we aren't going to allow for equality between different dimension arrays.
