from @alanedelman:
We really should think carefully about how the transpose of a vector should dispatch the various A_*op*_B*
methods. It must be possible to avoid new types and ugly mathematics. For example, vector'vector yielding a vector (#2472, #2936), vector' yielding a matrix, and vector'' yielding a matrix (#2686) are all bad mathematics.
What works for me mathematically (which avoids introducing a new type) is that for a 1-dimensional Vector v:
- v' is a no-op (i.e. just returns v),
- v'v or v'*v is a scalar,
- v*v' is a matrix, and
- v'A or v'*A (where A is an AbstractMatrix) is a vector.

A general _N_-dimensional transpose reverses the order of indices. A vector, having one index, should be invariant under transposition.
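For illustration (the sizes are arbitrary), the index-reversal rule is exactly what the existing permutedims does for an N-dimensional array:

```julia
A = rand(2, 3, 4)
size(permutedims(A, [3, 2, 1]))  # (4, 3, 2): reversing the order of the indices
# a vector has only one index, so there is nothing to reverse
```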
In practice v' is rarely used in isolation, and is usually encountered in matrix-vector products and matrix-matrix products. A common example would be to construct bilinear forms v'A*w and quadratic forms v'A*v which are used in conjugate gradients, Rayleigh quotients, etc.
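Spelled with the existing dot function (the matrix and vectors below are made up), the forms in question are:

```julia
A = [2.0 1.0; 1.0 3.0]
v = [1.0, 2.0]
w = [0.5, -1.0]

dot(v, A*w)              # the bilinear form v'A*w, a scalar
dot(v, A*v) / dot(v, v)  # a Rayleigh quotient, again a scalar
```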
The only reason to introduce a new Transpose{Vector}
type would be to represent the difference between contravariant and covariant vectors, and I don't find this compelling enough.
For example, vector'vector yielding a vector (#2472, #2936), vector' yielding a matrix, and vector'' yielding a matrix (#2686) are all bad mathematics.
The double-dual of a finite dimensional vector space is isomorphic to it, not identical. So I'm not clear on how this is bad mathematics. It's more that we tend to gloss over the distinction between things that are isomorphic in mathematics, because human brains are good at handling that sort of slippery ambiguity and just doing the right thing. That said, I agree that this should be improved, but not because it's mathematically incorrect, but rather because it's annoying.
How can v' == v, but v'*v != v*v?
Does it make more sense than we thought for x' * y to be its own operator?
The double-dual of a finite dimensional vector space is isomorphic to it, not identical.
(Speaking as myself now) It is not just isomorphic, it is naturally isomorphic, i.e. the isomorphism is independent of the choice of basis. I can’t think of a practical application for which it would be worth distinguishing between this kind of isomorphism and an identity. IMO the annoyance factor comes from making this kind of distinction.
Does it make more sense than we thought for x' * y to be its own operator?
That was the impression I got out of this afternoon’s discussion with @alanedelman.
I think what Jeff is asking is right on the money ... it is starting to look like x'*y and x*y' are making more sense than ever.
I'm in violent agreement with @stefan. Bad mathematics was not meant to mean wrong mathematics, it was meant to mean annoying mathematics. There are lots of things that are technically right, but not very nice ....
If we go with this logic, here are two choices we have:
1. x*x remains an error ..... perhaps with a suggestion "maybe you want to use dot"
2. x*x is the dot product (I don't love that choice)
If x and x' are the same thing, then if you want (x')*y to mean dot(x,y), that implies that x*y is also dot(x,y). There's no way out of that. We could make x'y and x'*y a different operation, but I'm not sure that's a great idea. People want to be able to parenthesize this in the obvious way and have it still work.
I would point out that if we allow x*x
to mean the dot product, there's basically no going back. That is going to get put into people's code everywhere and eradicating it will be a nightmare. So, natural isomorphism or not, this ain't pure mathematics and we've got to deal with the fact that different things in a computer are different.
Here is a practical discussion of distinguishing "up tuples" and "down tuples" that I like:
http://mitpress.mit.edu/sites/default/files/titles/content/sicm/book-Z-H-79.html#%_idx_3310
It carefully avoids words like "vector" and "dual", perhaps to avoid annoying people. I find the application to partial derivatives compelling though:
http://mitpress.mit.edu/sites/default/files/titles/content/sicm/book-Z-H-79.html#%_sec_Temp_453
Another reason to distinguish M[1,:]
and M[:,1]
is that currently our broadcasting behavior allows this very convenient behavior: M./sum(M,1)
is column-stochastic and M./sum(M,2)
is row-stochastic. The same could be done for normalization if we "fixed" the norm
function to allow application over rows and columns easily. Of course, we could still have sum(M,1)
and sum(M,2)
return matrices instead of up and down vectors, but that seems a bit off.
I like the idea of up and down vectors. The trouble is generalizing it to higher dimensions in a way that's not completely insane. Or you can just make vectors a special case. But that feels wrong too.
It's true that up/down may be a separate theory. The approach to generalizing them seems to be nested structure, which takes things in a different direction. Very likely there is a reason they don't call them vectors.
Also, x*y = dot(x,y)
would make *
non-associative, as in x*(y*z)
vs. (x*y)*z
. I really hope that we can avoid that.
Yes. To me, that's completely unacceptable. I mean technically, floating-point *
is non-associative, but it's nearly associative, whereas this would just be flagrantly non-associative.
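A concrete check, simulating a hypothetical v*w == dot(v,w) with explicit dot calls (the numbers are arbitrary):

```julia
x = [1.0, 2.0]; y = [3.0, 4.0]; z = [5.0, 6.0]

# if x*y meant dot(x,y) for vectors, then:
dot(x, y) * z   # (x*y)*z == 11.0 * z == [55.0, 66.0]
x * dot(y, z)   # x*(y*z) == x * 39.0 == [39.0, 78.0]
```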
We all agree that x*x
should not be the dot product.
The question remains whether we can think of v'w
and v'*w
as the dot product --
I really really like this working that way.
@JeffBezanson and I were chatting
A proposal is the following:
1. v' is an error for vectors (this is what mathematica does)
2. v'w and v'*w is a dot product (result = scalar)
3. v*w is an outer product matrix (result = matrix)
4. There is no distinction between row and column vectors. I liked this anyway and was happy to see mathematica's precedent.
From mathematica: http://reference.wolfram.com/mathematica/tutorial/VectorsAndMatrices.html
Because of the way Mathematica uses lists to represent vectors and matrices, you never have to distinguish between "row" and "column" vectors
Users have to be aware that there are no row vectors .... period.
Thus if M is a matrix:
- M[1,:]*v is an error ..... (assuming we go with M[1,:] is a scalar). The warning could suggest trying dot or '* or M[i:i,:]
- M[[1],:]*v or M[1:1,:]*v is a vector of length 1 (This is julia's current behavior anyway)
Regarding the closely related issue in https://groups.google.com/forum/#!topic/julia-users/L3vPeZ7kews
Mathematica compresses scalar-like array sections:
m = Array[a, {2, 2, 2}]
Out[49]= {{{a[1, 1, 1], a[1, 1, 2]}, {a[1, 2, 1],
a[1, 2, 2]}}, {{a[2, 1, 1], a[2, 1, 2]}, {a[2, 2, 1], a[2, 2, 2]}}}
In[123]:= Dimensions[m]
Dimensions[m[[All, 1, All]]]
Dimensions[m[[2, 1, All]]]
Dimensions[m[[2, 1 ;; 1, All]]]
Out[123]= {2, 2, 2}
Out[124]= {2, 2}
Out[125]= {2}
Out[126]= {1, 2}
[Edit: code formatting – @StefanKarpinski]
@alanedelman
assuming we go with M[1,:] is a scalar
do you mean M[1,:] is just a vector?
Yes, sorry. My mind meant M[1,:] was processing the scalar 1 :-)
Mathematica uses the period . rather than the asterisk * and then goes the whole 9 yards and makes (vector . vector) into a scalar, exactly what we are avoiding with the asterisk.
There are no doubt many problems with the period, one of which is that it just doesn't look like the "dot" in a dot product, and another of which is that it clashes with the "pointwise op" reading of dot.
Unicode does provide very nicely a character named "the dot operator" ⋅ (char(8901)), which we can imagine offering, so we could have (v ⋅ w) becoming synonymous with (v'*w).
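A sketch of what that could look like, assuming Unicode infix operators land (the #552 mentioned just below) and that ⋅ is otherwise unclaimed; the definition shown is just one plausible choice:

```julia
⋅(v, w) = sum(conj(v) .* w)   # one plausible definition of the infix dot operator
[1, 2] ⋅ [3, 4]               # 11, the same as dot([1,2], [3,4])
```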
In summary a current proposal subject for debate is
1. A[i,:] is a vector, as is A[:,i,j]
2. A[ i:i , : ] or A[ [i], : ] returns a matrix with one row
3. v'w or v'*w is the dot product for vectors (similarly v*w' for outer product)
4. v' is undefined for vectors (point the user to permutedims(v,1) ????)
5. v*A returns a vector if A is a matrix
6. v⋅w also returns the dot product (but does not go as far as mathematica's . by working on matrices)
7. v*w is undefined for vectors but a warning might tip the user off with good suggestions including ⋅
Consequences are that
a. if you stick to all vectors being column vectors, everything works
b. if you make everything a matrix, everything certainly works, and it's easy to make everything a matrix
c. if your mind can't distinguish a row vector from a one row matrix, chances are you'll be educated politely and gracefully
d. This dot notation ⋅ is kind of pleasant to the eye
Suggestion 5) looks very odd to me. I prefer v'*A
so that it is explicit that you are using the dual vector. This is particularly important in complex vector spaces where the dual isn't just a "shape" transformation.
I want to echo @StefanKarpinski that it would be rather unfortunate to lose our concise broadcasting behavior in all this. After this change, what is the concise syntax for taking a vector v and normalizing the columns of matrix A by those values? Currently, one can use A ./ v'. This is extremely nice for data manipulation.
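For concreteness, the idiom being defended, written against the behavior of the time (sum(A, 1) returning a 1×n matrix, v' turning a vector into a 1×n row); the data here is made up:

```julia
A = rand(3, 4)
v = vec(sum(A, 1))   # one value per column
A ./ v'              # v' is a 1×4 row, so column j of A is divided by v[j]
```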
Good questions.
My scheme does not preclude v'*A taking the complex conjugate of v and multiplying by A, and all the various other cases that I didn't yet explicitly mention, but readily could.
We could eliminate 5; perhaps that's desirable; it doesn't conform to my column vector rule.
This approach to broadcasting is cute and kludgy. One solution now is A ./ v[:,[1]]. It has the benefit of documenting which dimension is being broadcast on, and generalizes to higher dimensional arrays.
Oh and the v[:,[1]] solution has the virtue of NOT taking the complex conjugate, which is probably what the user intends .....
I LIKE THESE TWO EXAMPLES because the first is a LINEAR ALGEBRA example where a complex conjugate is desired very frequently, but the second example is a MULTIDIMENSIONAL DATA EXAMPLE where we want things to work in all dimensions, not just for matrices, and we very likely don't want a complex conjugate.
⋅ requires #552. This is the third time it's shown up in the past two weeks.
Another reason to distinguish M[1,:] and M[:,1] is that currently our broadcasting behavior allows this very convenient behavior: M./sum(M,1) is column-stochastic and M./sum(M,2) is row-stochastic. The same could be done for normalization if we "fixed" the norm function to allow application over rows and columns easily. Of course, we could still have sum(M,1) and sum(M,2) return matrices instead of up and down vectors, but that seems a bit off.
It seems to me that while having the broadcasting behavior is nice some of the time, you end up having to squeeze those extra unit dims outs just as often. Thus, having to do the opposite some of the time is OK if the rest of the system is nicer (and I do think having scalar dimensions dropped will make the system nicer). Thus you will need a function like
julia> widen(A::AbstractArray,dim::Int) = reshape(A,insert!([size(A)...],dim,1)...)
# methods for generic function widen
widen(A::AbstractArray{T,N},dim::Int64) at none:1
which will allow code like M ./ widen(sum(M,2),2)
or A ./ widen(v,1)
(see @blakejohnson example above)
M[:,0,:] and v[:,0] ?????
I'm more with @blakejohnson on the reduction issue; I personally think it's clearer to squeeze dimensions than to widen them. I suspect I'd be perpetually looking at the docs to figure out whether widen inserts the dimension at or after the indicated index, and the numbering gets a little more complex if you want to widen over multiple dimensions at once. (What does widen(v, (1, 2)) for a vector v do?) None of these are issues for squeeze.
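For reference, the squeeze spelling being preferred here, again against the reduction behavior of the time (sum(M, 2) keeps the reduced dimension):

```julia
M = rand(3, 4)
s = sum(M, 2)    # 3×1 matrix: the reduced dimension is kept
squeeze(s, 2)    # length-3 vector: the singleton dimension dropped explicitly
```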
Regardless of whether we would widen or squeeze per default, I think that Julia should follow the lead from numpy when it comes to widening and allow something like v[:, newaxis]
. But I do believe that I prefer to keep dimensions instead of discarding them, it's harder to catch a bug where you accidentally widened the wrong way than when you squeezed the wrong way (which will usually give an error).
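For what it's worth, the numpy-style widening can already be spelled with reshape (newaxis is numpy's name, not something Julia has):

```julia
v = [1, 2, 3]
reshape(v, length(v), 1)   # 3×1, like numpy's v[:, newaxis]
reshape(v, 1, length(v))   # 1×3, like numpy's v[newaxis, :]
```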
In the list of @alanedelman
I feel that
v*A returns a vector if A is a matrix
is not good.
v*A should be an error if A is not 1x1 (mismatch of the range of index); v'*A should be the proper way to do it.
One way to handle this issue is to automatically convert vector v to an nx1 matrix (when needed) and always treat v' as a 1xn matrix (never convert that to a vector or nx1 matrix).
Also we allow automatically converting a 1x1 matrix to a scalar number (when needed).
I feel that this represents a consistent and uniform way to think about linear algebra. (good mathematics)
A uniform way to handle all those issues is to allow automatic (type?) conversion (when needed)
between array of size (n), (n,1), (n,1,1),(n,1,1,1) etc (but not between arrays of size (n,1) and (1,n) )
(Just like we automatically convert real number to complex number when needed)
In this case an array of size (1,1) can be converted to a number (when needed) (See #4797 )
Xiao-Gang (a physicist)
This leaves v'*A however .... I really want v'*A*w to work
My impression of linear algebra in Julia is that it is very much organized like matrix algebra, although scalars and true vectors exist (which I think is good!)
Let us consider how to handle a product like x*y*z*w, where each factor may be a scalar, vector, or matrix, possibly with a transpose on it. Matrix algebra defines the product of matrices, where a matrix has size n x m. One approach would be to extend this definition so that n or m might be replaced by absent, which would act like a value of one as far as computing the product is concerned, but is used to distinguish scalars and vectors from matrices:
1. a scalar would be absent x absent
2. a vector would be n x absent
3. a transposed vector would be absent x n

Ideally, we would like to arrange things so that we never need to represent row vectors, but it would be enough to implement operations like x'*y and x*y'. I get the feeling that this is the flavor of scheme that many of us are searching for.
But I'm beginning to suspect that banning row vectors in this kind of scheme will come at a high cost. Example: Consider how we would need to parenthesize a product to avoid forming a row vector at any intermediate step: (a
is a scalar, u
and v
are vectors)
a*u'*v = a*(u'*v) // a*u' is forbidden
v*u'*a = (v*u')*a // u'*a is forbidden
To evaluate a product x*y'*z while avoiding producing row vectors, we would need to know the types of the factors before picking the multiplication order! If the user should do it himself, it seems like an obstacle to generic programming. And I'm not sure how Julia could do it automatically in a sane way.
Another reason not to fix the multiplication order in advance: I seem to remember that there was an idea to use dynamic programming to choose the optimal evaluation order of *(x,y,z,w)
to minimize the number of operations needed. Anything that we do to avoid forming row vectors will likely interfere with this.
So right now, introducing a transposed vector type seems like the most sane alternative to me. That, or doing everything as now but dropping trailing singleton dimensions when keeping them would result in an error.
Transposition is just a particular way to permute modes. If you allow v.'
where v
is a vector, then permutedims(v,[2 1])
should return the exact same thing. Either both return a special row vector type, or they introduce a new dimension.
Having a special type for row vectors doesn't look like a good solution to me, because what will you do about other types of mode-n vectors, e.g., permutedims([1:4],[3 2 1])
? I urge you to also take the multilinear algebra into account before taking a decision.
@toivoh mentioned that
"One approach would be to extend this definition so that n or m might be replaced by absent, which would act like a value of one as far as computing the product is concerned, but is used to distinguish scalar and vectors from matrices:
In multilinear algebra (or for high rank tensors) the above proposal corresponds to using absent to represent many indices of range 1, i.e. size (m,n,absent) may correspond to (m,n), (m,n,1), (m,n,1,1), etc.
If we use this interpretation of absent, then 1. and 2. are OK and nice to have, but 3. may not be OK.
We do not want to mix arrays of size (1,n) and (1,1,n).
I'm not a specialist in tensor theory, but I have used all the systems mentioned above (_without_ any add-on packages) for substantial projects involving linear algebra.
[TL;DR: skip to SUMMARY]
Here are the most common scenarios in which I have found a need for greater generality in array handling than commonplace matrix-vector operations:
(1) Functional analysis: For instance, as soon as you're using the Hessian of a vector-valued function, you need higher-order tensors to work. If you're writing a lot of math, it would be a huge pain to have to use special syntax for these cases.
(2) Evaluation control: For instance, given any product that can be computed, one should be able to compute any sub-entity of that product separately, because one might wish to combine it with multiple different sub-entities to form different products. Thus Toivo's concern about, e.g., a*u'
being forbidden is not just a compilation concern, but a programming concern; an even more common variant is pre-computing x'Q
to compute quadratic forms x'Q*y1
, x'Q*y2
, ... (where these must be done sequentially).
(3) Simplifying code: Several times when dealing with arithmetic operations mapped over multi-dimensional data sets, I have found that 6-7 lines of inscrutable looping or function-mapping code can be replaced with one or two brief array operations, in systems that provide appropriate generality. Much more readable, and much faster.
Here are my general experiences with the above systems:
MATLAB: Core language is limited beyond commonplace matrix-vector ops, so usually end up writing loops with indexing.
NumPy: More general capability than MATLAB, but messy and complicated. For almost every nontrivial problem instance, I had to refer to documentation, and even then sometimes found that I had to implement myself some array operation that I felt intuitively should have been defined. It seems like there are so many separate ideas in the system that any given user and developer will have trouble automatically guessing how the other will think about something. It is usually possible to find a short, efficient way to do it, but that way is not always obvious to the writer or reader. In particular, I feel that the need for widening and singleton dimensions just reflects a lack of generality in the implementation for applying operators (though maybe some find it more intuitive).
Mathematica: Clean and very general---in particular, all relevant operators are designed with higher-order tensor behavior in mind. Besides Dot, see for example the docs on Transpose, Flatten / Partition, and Inner / Outer. By combining just these operations, you can already cover most array-juggling use cases, and in version 9 they even have additional tensor algebra operations added to the core language. The downside is that even though the Mathematica way of doing something is clean and makes sense (if you know the language), it may not obviously correspond to the usual mathematical notation for doing it. And of course, the generality makes it difficult to know how the code will perform.
scmutils: For functional analysis, it is clean, general, and provides the most mathematically intuitive operations (both write and read) of any of the above. The up/down tuple idea is really just a more consistent and more general extension of what people often do in written mathematics using transpose signs, differentiation conventions, and other semi-standardized notions; but everything just works. (To write my Ph.D. thesis, I ended up developing a consistent and unambiguous notation resembling traditional math notation but isomorphic to Sussman & Wisdom's SICM syntax.) They've also used it for a differential geometry implementation [1], which has inspired a port to SymPy [2]. I have not used it for data analysis, but I would expect that in a generic array context where you only wanted one kind of tuple (like Mathematica's List), you could just pick one ("up") by convention. Again, generality obscures performance considerations to the programmer, but I would hope this is an area where Julia can excel.
SUMMARY
I think the proposed transposed-vector type should be characterized as the more general "down" tuple in scmutils, while regular vectors would be the "up" tuples. Calling them something like "vector" and "transposed-vector" would probably make more sense to people than calling them "up" and "down" (at the cost of brevity). This would support three categories of use:
(1) for data analysis, if people just want nested arrays, they only need "vector";
(2) for basic matrix-vector linear algebra, people can use "vector" and "transposed-vector" in direct correspondence with mathematical convention ("matrix" would be equivalent to a "transposed-vector" of "vector"s);
(3) for higher-order tensor operations (where there's less standardization and people usually have to think anyway), the implementation should support the full generality of the two-kind tuple arithmetic system.
I believe this approach reflects the emerging consensus above for the results of various operations, with the exception that those cases that earlier posts considered errors (v'
and v*A
) would actually give meaningful (and often useful) results.
[1] http://dspace.mit.edu/handle/1721.1/30520
[2] http://krastanov.wordpress.com/diff-geometry-in-python/
@thomasmcoffee sound like you are advocating for an explicit distinction between co- and contravariant vectors then.
I would think of that as a common application, but overly specific for what I'm advocating: to me, that has a geometric meaning implying a restriction to rectangular tensors of numbers (for coordinate representations). Since I imagine (without expertise in this area) that a suitable library of tensor algebra functions with standard arrays would usually suffice for this purpose, I'm sympathetic to Alan's point that this alone is not compelling enough to introduce a two-kind system in the core language.
I'm mainly thinking of other applications depending on more general nested structure, for instance, calculus on functions of multiple arguments of mixed dimensionality, which would be more difficult to develop as an "add-on" later if the core language did not support this distinction. Maybe we mean the same thing.
The problem with up and down vectors is that you need to extend the idea to general arrays. Otherwise vectors become something special and separate from arrays, instead of simply the 1-dimensional case of an array, which would lead to a whole mess of awful problems. I've thought a lot about how to do that, but haven't come up with anything acceptable. If you've got any good ideas of how to sanely generalize up and down vectors to arrays, I'd love to hear them.
Just trying to extrapolate this idea. As I understand it, in order to use an array to calculate with up and down vectors, you need to indicate for each dimension whether it is up or down. In general, this could be achieved by wrapping an array in something like
immutable UpDownTensor{T, N, UPMASK} <: AbstractArray{T, N}
    A::AbstractArray{T, N}
end
where UPMASK
would be a bit mask to indicate which dimensions are up. Then operations on non-wrapped arrays could be implemented by providing a default UPMASK
as a function of N
: vectors would default to a single up, matrices to the first up and the second down; then I'm not sure how it would be reasonably continued.
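A tiny sketch of that default (the helper name is invented; only the vector and matrix cases are stated above, and the alternation beyond that is one guess):

```julia
# hypothetical default: up for odd dimensions, down for even ones
default_upmask(N::Int) = [isodd(d) for d in 1:N]

default_upmask(1)   # [true]         -> a vector is "up"
default_upmask(2)   # [true, false]  -> a matrix is up x down
```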
Some random thoughts:
Well, this is certainly one generalization of the Transposed
type, and it certainly has some merit. Not sure whether it is feasible.
I think Toivo's suggestion is a reasonable realization of what I was advocating above. To me, the most sensible default is to continue alternating directions at higher orders: for instance, if someone provided the components of a power series as non-wrapped arrays, this would do the right thing.
But on further reflection, I think it could be very powerful to combine both ideas: (1) distinctions between up and down vectors and (2) distinctions between arrays and vectors. Then you could have objects where some dimensions are up, some are down, and some are "neutral." The reason to distinguish arrays and vectors is that, semantically, arrays are for organization (collecting multiple things of the same kinds), whereas vectors are for coordinatization (representing multidimensional spaces). The power of combining both distinctions in one object is that it can then serve both these purposes simultaneously. The neutral dimensions would be treated according to broadcasting rules, while the up/down dimensions would be treated according to tensor arithmetic rules.
Returning to an earlier example of mine, suppose you're computing a number of quadratic forms x'Q*y1, x'Q*y2, ...
for different vectors y1, y2, ...
. Following SICM, denote up tuples (column vectors) by (...)
and down tuples (row vectors) by [...]
. If you wanted to do this all at once, and you're stuck with up/down only, the conventional way would be to combine the yi
into a matrix Y = [y1, y2, ...]
using a down tuple (of up tuples), and compute r = x'Q*Y
, which gives you the results in a down tuple r
. But what if you want to multiply each of these results by a (column) vector v
? You can't just do r*v
, because you'll get a contraction (dot product). You can convert r
to an up tuple, and then multiply, which gives you your results in an up tuple (of up tuples). But suppose for the next step you need a down tuple? Semantically, you have a dimension going through your calculation that just represents a collection of things, which you always want broadcasted; but to achieve this in the strictly up/down world, you have to keep doing arbitrary context-dependent conversions to elicit the right behavior.
In contrast, suppose you also have neutral tuples (arrays), denoted {...}
. Then you naturally write ys = {y1, y2, ...}
as an array (of up tuples), so that r = x'Q*ys
is an array, and r*v
is also an array (of up tuples). Everything makes sense, and no arbitrary conversions are required.
Stefan suggests that distinguishing 1-D arrays from up/down vectors is disastrous, but I think this problem gets solved by the fact that most functions make sense operating on vectors _or_ on arrays, but NOT _either_. (Or, on matrices _or_ on arrays of vectors _or_ on vectors of arrays _or_ on arrays of arrays, but NOT _either_. And so on.) So with appropriate conversion rules, I haven't thought of a common case that wouldn't do the right thing automatically. Maybe someone can?
Upon looking deeper [1], I discovered that scmutils actually distinguishes what they call "vectors" from up and down tuples under the hood; but currently the conversion rules are set up so that these "vectors" are mapped to up tuples (as I had proposed earlier) whenever they enter the up/down world, with the caveat that "We reserve the right to change this implementation to distinguish Scheme vectors from up tuples." (Perhaps someone on campus can ask GJS if he had any specific ideas in mind.) The Sage system [2] largely separates handling of arrays from vectors and matrices (currently no core support for tensors), and the only problems I've experienced with this have to do with its lack of built-in conversion between them in cases that would obviously make sense.
[1] http://groups.csail.mit.edu/mac/users/gjs/6946/refman.txt --- starting at "Structured Objects"
[2] http://www.sagemath.org/
I was talking to @jiahao at the lunch table and he mentioned that the Julia team was trying to figure out how to generalize linear algebra operations to higher dimensional arrays. Two years ago I spent several months thinking about this because I needed it for KroneckerBio. I wanted to share my approach.
Let's consider just the product between two arrays for the moment. Other operations have a similar generalization. The three most common types of products when dealing with arrays are the outer product, the inner product, and the elementwise product. We usually think of doing operations like this between two objects, such as inner(A,B)
or A*B
. When doing these operations on higher-dimensional arrays, however, they are not done between the arrays as a whole, but between particular dimensions of the arrays. Multiple outer/inner/elementwise suboperations happen in a single operation between two arrays and each dimension of each array must be assigned to exactly one suboperation (either explicitly or to a default). For the inner and elementwise products, one dimension on the left must be paired with an equally sized dimension on the right. The outer product dimensions do not have to be paired. Most of the time the user is doing either an inner product or an elementwise product between a pair of dimensions and an outer product for all others. The outer product makes a good default because it is the most common and does not have to be paired.
I usually think about the dimensions as being named rather than being ordered, much like the x, y, and z axes of a plot. But if you want users to actually be able to access the arrays by ordered indexing (like A[1,2,5]
rather than A[a1=1, a3=5, a2=2]
) then you have to have a consistent procedure to order the results of an operation. I propose ordering the result by listing all the dimensions of the first array followed by listing all the dimensions of the second array. Any dimensions that participated in an inner product are squeezed out, and for dimensions that participated in an elementwise product, only the dimension from the second array gets squeezed out.
I am going to make up some notation for this. Feel free to Juliafy it. Let A be an array that is a1 by a2 by a3, and let B be an array that is b1 by b2. Let's say that array_product(A, B, inner=[2, 1], elementwise=[3, 2]) would take the inner product between dimensions a2 and b1, the elementwise product between a3 and b2, and the outer product of a1. The result would be an array that is a1 by a3.
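To make the example concrete, here is a brute-force check of those semantics with a comprehension; array_product itself is not implemented here and the sizes are arbitrary:

```julia
A = rand(2, 3, 4)   # a1 = 2, a2 = 3, a3 = 4
B = rand(3, 4)      # b1 = 3, b2 = 4

# inner product over (a2, b1), elementwise over (a3, b2), outer product over a1:
C = [sum([A[i, k, j] * B[k, j] for k in 1:3]) for i in 1:2, j in 1:4]
size(C)             # (2, 4), i.e. a1 by a3
```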
It should be clear that no binary or unary operator is going to have much meaning in the context of higher-dimension arrays. You need more than two arguments in order to specify what to do with each dimension. However, you can recapture the ease of linear algebra by making the Matlab operators shorthand for array operations on just the first two dimensions:
Matlab's A*B
is array_product(A, B, inner=[2,1])
.
Matlab's A.' is permute(A, [2,1]), where permute keeps unchanged all dimensions higher than the length of the permutation argument.
You can choose whether or not to throw errors when the dimensionality of the arrays is greater than 2 or even not equal to 2, like Mathematica does with vector transposes. If you are using just the general array calculations, you do not have to decide whether or not to take @wenxgwen's suggestion to interpret all (n, m) arrays as (n, m, 1) and (n, m, 1, 1). Only when using the linear algebra operators or other operators that expect array or particular dimensionality do you have to make this decision. I like @wenxgwen's suggestion, because there is little downside in a dynamically typed language.
I wrote a more detailed description of my system, which includes addition, exponentiation, and differentiation along with the chain rule and product rule for array calculus.
Thanks for the perspective! I found this quite enlightening to understand what kind of beast that a general array * array product really is.
May be interesting to cross-reference the proposals for multidimensional array multiplication with the semantics proposed for a matrix multiplication operator in PEP 0465. In particular:
1d vector inputs are promoted to 2d by prepending or appending a '1' to the shape, the operation is performed, and then the added dimension is removed from the output. The 1 is always added on the "outside" of the shape: prepended for left arguments, and appended for right arguments. The result is that matrix @ vector and vector @ matrix are both legal (assuming compatible shapes), and both return 1d vectors; vector @ vector returns a scalar... An infelicity of this definition for 1d vectors is that it makes @ non-associative in some cases ((Mat1 @ vec) @ Mat2 != Mat1 @ (vec @ Mat2)). But this seems to be a case where practicality beats purity
Duck typing in Python causes a special problem. Naturally, arrays and matrices should be interchangeable (same underlying data). But because Python discourages the checking of a type, matrices are not cast to the correct interface at the beginning of a function that expects an array and vice versa. This is why they must have different operator characters. Julia with runtime type checking and convert methods does not suffer from this ambiguity.
From PEP 0465:
An infelicity of this definition for 1d vectors is that it makes @ non-associative in some cases ((Mat1 @ vec) @ Mat2 != Mat1 @ (vec @ Mat2))
Notably, this kind of definition can produce incorrect results in Mathematica, because Dot
(.
) is assumed to be associative (Flat
) when evaluated symbolically (as with f
below, but not with g
):
In[1]:= f=X.(y.Z);
g:=X.(y.Z)
In[3]:= Block[{
X=Array[a,{2,2}],
y=Array[b,2],
Z=Array[c,{2,2}]
},{f,g}]
Out[3]= {{(a[1,1] b[1]+a[1,2] b[2]) c[1,1]+(a[2,1] b[1]+a[2,2] b[2]) c[2,1],(a[1,1] b[1]+a[1,2] b[2]) c[1,2]+(a[2,1] b[1]+a[2,2] b[2]) c[2,2]},{a[1,1] (b[1] c[1,1]+b[2] c[2,1])+a[1,2] (b[1] c[1,2]+b[2] c[2,2]),a[2,1] (b[1] c[1,1]+b[2] c[2,1])+a[2,2] (b[1] c[1,2]+b[2] c[2,2])}}
In[4]:= SameQ@@Expand[%]
Out[4]= False
From @drhagen:
Julia with runtime type checking and
convert
methods does not suffer from this ambiguity.
This is why I feel the right solution for Julia _should_ let the data itself distinguish between array-like semantics (for universal broadcasting) and tensor-like semantics (for possible contraction).
I am by no means an authority here, but I don't think that the general arbitrary-dimensional collection type (Array
) should support an operator that does dot product. This operator simply cannot be sensibly defined for this type because the dot product can be between any two dimensions, requiring additional arguments that cannot be supplied to a binary operator. The same goes for all the linear algebra operations, inv
, transpose
, etc.
To operate in the mathematical field of linear algebra, there should then be three more types, Matrix
, ColumnVector
, and RowVector
, on which all the normal linear algebra operators and functions work as normal.
Now that the type structure is defined well, you can make it easier on the user by adding implicit conversion for Matrix to Array{2}, ColumnVector to Array{1}, and RowVector to Array{2} (not sure about this one), Array{2} to Matrix, and Array{1} to ColumnVector.
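A notional sketch of that type split, in the Julia syntax of the time; the names and the handful of methods are made up for illustration and are not a complete design:

```julia
immutable ColVec{T}
    data::Vector{T}
end
immutable RowVec{T}
    data::Vector{T}
end

import Base: *
*(r::RowVec, c::ColVec) = dot(r.data, c.data)    # row * column  -> scalar
*(c::ColVec, r::RowVec) = c.data * r.data'       # column * row  -> outer product matrix
*(A::Matrix, c::ColVec) = ColVec(A * c.data)     # matrix * column -> column
```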
My proposal above (https://github.com/JuliaLang/julia/issues/4774#issuecomment-32705055) allows each dimension of a multidimensional structure to distinguish whether it has neutral ("collection"/"array"), up ("column"), or down ("row") semantics. I think what you're describing is then a special case.
The advantage of this general approach is that even in computations with many dimensions of data or space, you can get operators to just do the right thing without explicitly specifying which dimensions it should operate on. I think we agree that, at least in Julia, it is much more intuitive and readable for a user to specify once the meaning of the input data by choosing type parameters, than to have to specify the meaning of every operation by calling every instance with additional arguments giving dimensional indices. Implicit or explicit conversions can still be used, with full-dimensional generality, in the cases where the meaning must be changed midstream in unusual ways.
@thomasmcoffee I like your proposal very much. I implemented something vaguely similar in a DSL (long ago, far away) with a few guiding principles (aka personal opinions):
The biggest complaints I got back then (and they were loud) were about exactly the kind of inconvenience which your trivalent semantics (adding a neutral collection notion) solve. Nice! That idea never occurred to me, but makes so much sense now that you put it out there. I'd really like to use such a system, and I mean for real work. It would be nice if Julia could accommodate this!
What you guys seem to be describing is regular tensors. I doubt that is a common enough use case alone to justify being in the standard library, as the other two use cases (collections and linear algebra) are far more common. However, if it could be integrated seamlessly, I would support it. Could you give some examples of what some common operations would look under this system, such as vector-matrix multiplication, scalar-array multiplication, distributing the addition of one array over an array of arrays, etc.?
I think you're right David. We're really talking about two use cases.
The subset of Linear Algebra is most commonly needed by most people as you
say. Even there I advocate maintaining a distinction between v and v'.
What I really would like (insert disclosure of greed here) is Tensors with
first-class (or close) status...near native speed (in the limiting case,
compared to Linear Algebra performance) with easy syntax, no overwrought
typing issues, with co/contravariance encoded in the data, not foisted onto
the operators. Once I define the data semantics, the operations should
just work. Tensoral Duck Typing.
Perhaps tensors and TDT belong in a package and not in core, just on
grounds of relative popularity. But like the Julia declaration of
independence says, Julia is born of greed. And like Gordon Gecko says,
greed is good. :)
I think seamless integration is definitely achievable given a rich enough type family. Extending Toivo's https://github.com/JuliaLang/julia/issues/4774#issuecomment-32693110 above, it might notionally start off like this:
immutable AbstractTensorArray{T, N, UPMASK, DOWNMASK} <: AbstractArray{T, N}
    A::AbstractArray{T, N}
end
# where !any(UPMASK & DOWNMASK)
typealias AbstractColumnVector{T} AbstractTensorArray{T, 1, [true], [false]}
typealias AbstractRowVector{T} AbstractTensorArray{T, 1, [false], [true]}
typealias AbstractMatrix{T} AbstractTensorArray{T, 2, [false, true], [true, false]}
(currently AbstractMatrix{T}
simply aliases AbstractArray{T, 2}
; potentially, another name could be used here)
From here the following implementations seem logical:
1. The transpose method, after rearranging dimensions and the corresponding UPMASK and DOWNMASK indices, interchanges the UPMASK and DOWNMASK. Neutral dimensions would be unaffected.
2. AbstractArray{T, N} subtypes are typically converted by default to corresponding alternating AbstractTensorArray{T, N, [..., false, true, false, true], [..., true, false, true, false]} subtypes in tensor operations. This preserves the existing semantics of Julia's special array syntax for vectors and matrices.
3. A constructor function (array) for AbstractTensorArray is used to produce neutral dimensions, and can combine other AbstractTensorArrays (or types convertible thereto) to create a combined AbstractTensorArray with neutral top-level dimension.

Considering @drhagen's examples:
vector-matrix multiplication, scalar-array multiplication
c = 1 # Int
v = [1, 2] # Array{Int, 1}
M = [[1, 2] [3, 4]] # Array{Int, 2}
# scalar-array
c * M # UNCHANGED: *(Int, Array{Int, 2}) => Array{Int, 2}
# matrix-vector
M * v # *(Array{Int, 2}, Array{Int, 1}) => *(Matrix{Int}, ColumnVector{Int}) => ColumnVector{Int}
# vector-matrix
v' * M # transpose(Array{Int, 1}) => transpose(ColumnVector{Int}) => RowVector{Int}
# *(RowVector{Int}, Array{Int, 2}) => *(RowVector{Int}, Matrix{Int}) => RowVector{Int}
# (1-array)-(2-array)
v .* M # UNCHANGED: .*(Array{Int, 1}, Array{Int, 2}) => Array{Int, 2}
(using Matrix
with a definition corresponding to the AbstractMatrix
definition above)
distributing the addition of one array over an array of arrays
I take this to mean, semantically, the addition of one vector over an array of vectors, addition of one matrix over an array of matrices, and so on:
# vector-(vector-array)
ws = array([1, 2], [3, 4])
# TensorArray{Int, 2, [false, true], [false, false]}
v + ws # +(Array{Int, 1}, TensorArray{Int, 2, [false, true], [false, false]}) => +(ColumnVector{Int}, TensorArray{Int, 2, [false, true], [false, false]}) => TensorArray{Int, 2, [false, true], [false, false]}
# => array([2, 4], [4, 6])
# array-(vector-array)
u = array(1, 2) # TensorArray{Int, 1, [false], [false]}
u + ws # +(TensorArray{Int, 1, [false], [false]}, TensorArray{Int, 2, [false, true], [false, false]}) => TensorArray{Int, 2, [false, true], [false, false]}
# => array([2, 3], [5, 6])
# alternatively:
v .+ ws # .+(Array{Int, 1}, TensorArray{Int, 2, [false, true], [false, false]}) => TensorArray{Int, 2, [false, true], [false, false]}
# => array([2, 3], [5, 6])
# same effect, but meaning less clear:
v .+ M # UNCHANGED: .+(Array{Int, 1}, Array{Int, 2}) => Array{Int, 2}
# => [[2, 4] [4, 6]]
# matrix-(matrix-array)
Ns = array([[1, 2] [3, 4]], [[5, 6] [7, 8]])
# TensorArray{Int, 2, [false, false, true], [false, true, false]}
M + Ns # +(Array{Int, 2}, TensorArray{Int, 2, [false, false, true], [false, true, false]}) => +(Matrix{Int}, TensorArray{Int, 2, [false, false, true], [false, true, false]}) => TensorArray{Int, 2, [false, false, true], [false, true, false]}
# => array([[2, 4] [6, 8]], [[6, 8] [10, 12]])
Considering my earlier example of scaling a vector v
by several different quadratic forms x'M*w1, x'M*w2, ...
, for a final result x'M*w1*v, x'M*w2*v, ...
:
x = v
x' * M * ws * v # *(RowVector{Int}, Array{Int, 2}) => *(RowVector{Int}, Matrix{Int}) => RowVector{Int}
# *(RowVector{Int}, TensorArray{Int, 2, [false, true], [false, false]}) => TensorArray{Int, 1, [false], [false]}
# *(TensorArray{Int, 1, [false], [false]}, Array{Int, 1}) => *(TensorArray{Int, 1, [false], [false]}, ColumnVector{Int}) => TensorArray{Int, 1, [false, true], [false, false]}
# => array([27, 54], [59, 118])
In this notional implementation, I've assumed that AbstractArray
is left alone, and thus AbstractTensorArray
forms its own "space" in the type hierarchy. Things might be simplified if the whole AbstractArray
family were simply replaced by AbstractTensorArray
, but that's another discussion.
In the context of a package for something in quantum physics, I have been playing around with defining my own tensor type (actually more than one). Initially, I also had some notion of indices coming in two flavours (up and down, incoming and outgoing, covariant and contravariant, however you want to call it), with this information being stored in either a field in the type or even in a type parameter. After a while I decided this is just too big of a hassle. It is much easier to just associate the indices of the tensor to a vector space (which I was doing anyway already) and to allow this vector space to have a dual which is different. In practice, by vector space I just mean a simple Julia type which wraps the dimension of the space and whether it is the dual or not. If a tensor index is associated to a normal vector space, it is an up index; if it is associated to a dual space, it is a down index. If you want to work with tensors/arrays for which there is no distinction, you just define a different vector space type which doesn't distinguish between the normal vector space and its dual.
In this proposal, you can only contract tensor indices which are associated to vector spaces which are each others dual. ctranspose (=Hermitian conjugation) maps the vector space of every index onto its dual (together with the permutation of the indices in the case of a matrix, and a preferred definition for higher order tensors) etc.
Of course, normal transposition and complex conjugation are not really well defined in this setting (i.e. these are not basis independent concepts)
Minimalistically, it looks something like this:
immutable Space
    dim::Int
    dual::Bool
end
Space(dim::Int)=Space(dim,false) # assume normal vector space by default
dual(s::Space)=Space(s.dim,!s.dual)
matrix=Tensor((Space(3),dual(Space(5))))
# size is no longer sufficient to characterise the tensor and needs to be replaced by space
space(matrix) # returns (Space(3),dual(Space(5)))
space(matrix') # returns (Space(5),dual(Space(3)))
Of course, you could invent some syntax so as not to have to write Space constantly. You can create a different space type for which dual(s)==s in order to have tensors which do not distinguish between up and down indices, and so on. But of course there is no way this can still be built into Julia's standard Array type without breaking everything...
I've always wondered why there isn't a closer relationship between how tensors are used in engineering / physics and how they are handled in mathematical software programs. I found an interesting stack exchange conversation on the topic...http://math.stackexchange.com/questions/412423/differences-between-a-matrix-and-a-tensor. Here, also, was a good reference post.
http://www.mat.univie.ac.at/~neum/physfaq/topics/tensors
I use Matlab a lot for my daily scientific computing, but am a newbie on Julia. Here, I notice that there are a lot of discussions on high-dimensional Array multiplication and transposition or other similar array operations. I suggest taking a look at http://www.mathworks.com/matlabcentral/fileexchange/8773-multiple-matrix-multiplications--with-array-expansion-enabled
It basically follows the syntax similar to what @drhagen has mentioned in an earlier post, such as array_product(A, B, inner_A_dim=[1, 2], inner_B_dim=[3, 4]) for a product between arrays A and B on the given inner dimensions.
This is one Matlab package that can apply multiplication or transposition operations on selected dimensions. There is a manual in the package on how to implement these operations in Matlab, but I think the math theory should apply to other languages as well. Their idea is to implement array operations by avoiding For-loops, mostly relying on array reshaping and so on. So, it is extremely fast in Matlab. I don't know whether Julia prefers vectorized or devectorized operations (it seems the latter). I feel that vectorized operations are an advantage for high-dimensional array operations, if the core supports them. Maybe we should seriously consider this kind of array operation at this stage.
For your reference: Another similar implementation in Matlab for INV operation is here: http://www.mathworks.com/matlabcentral/fileexchange/31222-inversion-every-2d-slice-for-arbitrary-multi-dimension-array
Also notice that, after the release of the Matlab array operation support package in 2005, its download count has stayed high right up to today. In my practical experience, array operations are very frequently used in physics and other areas. I would say, if Julia has similar functions for operating on arrays with arbitrary sizes, the game will become very interesting!
another vote here for @alanedelman's proposed solution towards the top. here's a motivating example.
right now, a row-slice is a 2d array, while a column slice is a 1d array; which is weirdly asymmetric and ugly:
julia> A = randn(4,4)
4x4 Array{Float64,2}:
2.12422 0.317163 1.32883 0.967186
-1.0433 1.44236 -0.822905 -0.130768
-0.382788 -1.16978 -0.19184 -1.15773
-1.2865 1.21368 -0.747717 -0.66303
julia> x = A[:,1]
4-element Array{Float64,1}:
2.12422
-1.0433
-0.382788
-1.2865
julia> y = A[1,:]
1x4 Array{Float64,2}:
2.12422 0.317163 1.32883 0.967186
in particular, it means I can't multiply a row by a column and extract a number without doing some terribly ugly manipulation like
julia> dot(y[:],x)
2.4284575954571106
julia> (y*x)[1]
2.42845759545711
It's not a coherent proposal unless you make '*
a special operator, which is pretty dubious, since then x'*y
and (x')*y
do not mean the same thing. Moreover, it would make multiplication non-associative.
i understand the difficulties with x*y' and y'*x. It may be better to treat inner and outer products as separate operations, using e.g. dot(). (Perhaps also using cdot?)
But what are the arguments in favor of having a slice along the first dimension return an object whose dimension is different than a slice along the second dimension? For consistency, it seems that every time you slice, the dimension of the resulting object should be diminished by one.
@madeleineudell , I agree with you, but that's a different issue, see #5949 . Although that issue seems to be closed, I don't remember there was a clear agreement or conclusion.
Once we switch over to array views, it will become easier to explore those directions. In particular, saying slice(A, i, :)
gets you the behavior you want. (It does so right now, but at the cost of introducing a slower type, the SubArray.)
From a purely mathematical standpoint, all of the issues presented here come from a conflation of (and confusion between) what we mean by arrays and what we mean by vectors/tensors/matrices. Arrays, conceptually, are simply lists (or, in the case of n-dim arrays, lists of lists). As such, there is no natural specification for operations like array multiplication, transposition, etc. While functions like permutedims, element-wise operations, and axis-specific operations (mean, median, etc.) make sense and can be uniquely defined in a natural way, operations such as dot products cannot be.
As mentioned above, vectors and tensors are geometric objects, and while it's possible to represent them using arrays these representations do not contain the same richness of structure as the mathematical objects they represent. The transpose of a 1-dim array is a no-op; the transpose of a vector is its dual. The transpose of a 2-dim array can be uniquely and naturally defined as the permutation of its dimensions, but this is not generally true for tensors: while the natural case holds for rank (1,1) tensors (aka, matrices), a rank (2,0) tensor transposes into a rank (0,2) tensor. Again, by treating tensors as arrays, the geometric information that makes tensors tensors is lost.
This matters when defining operations such as dot products. A dot product has a specific geometric meaning (the projection of one vector onto the dual space defined by a second vector), and thus a consistent definition of dot products requires preservation of the geometric information contained in vectors. Using certain assumptions might make it possible to use arrays and still cover a majority of use cases, but these assumptions are messy (as seen by the various proposals in this thread) and actually makes things more difficult for anyone who needs the richer structure of tensors.
So, consider this a strong vote in favor of thomasmcoffee's suggestion for including a richer AbstractTensor type. My personal preference would be that operations such as transposition and dot products were not even defined for arrays, but as I suspect most people would not share that view, I would at least want the ability to create true tensors should the need arise.
The practical implication of this perspective seems to be that arrays should be identified with a subset of tensors, and transposing a 1-d array should give a DualVector
or perhaps an error. My view is that this is analogous to operations on real numbers that give complex numbers.
My perspective would be that the general AbstractArray family, a (multidimensional) data container, is sufficiently general to be an indispensable part of any technical programming language. A tensor following strict mathematical rules, even though I care for it dearly, is a good object for a dedicated package. In fact, I am working on something along the lines specified by @jdbates in https://github.com/Jutho/TensorToolbox.jl . It is so far undocumented and largely untested. I wrote it for the things I need personally in quantum many body physics, but I hope it is constructed in a way that is sufficiently general and extensible to be useful for the greater community of mathematicians and physicists who care about working with tensors.
To give some detail (copied from the JuliaQuantum forum): I decided to define a new type hierarchy for tensors, that is independent of the AbstractArray type of Julia (although the basic Tensor is just a wrapper for Array). This type hierarchy is supposed to work at a slightly more formal way. Tensor indices are associated to vector spaces (henceforth referred to as index spaces), and if the type of vector space to which the tensor index is associated is different from its dual, this corresponds to a tensor which distinguishes between covariant and contravariant indices.
So the first part of the package is the abstract part for defining vector spaces, where I match the type hierarchy of Julia objects to the mathematical hierarchy of vector spaces. A general vector space V comes in four varieties, corresponding to the representation theory of the general linear group on V, i.e. V itself (fundamental representation), conj(V), dual(V) and dual(conj(V)). For real vector spaces, conj(V)=V and there is only V and dual(V), corresponding to contravariant and covariant vectors. Then there are the inner product spaces, and at the top level of the hierarchy are the Euclidean spaces, which are inner product spaces with a standard Euclidean inner product (i.e. orthogonal basis). In physics, it is also useful to think about vector spaces which are decomposed into different sectors, i.e. they are graded by e.g. irreducible representations of symmetry actions.
Tensors are objects living in (a subspace of) the tensor product of some elementary vector spaces. However, aside from the standard Tensor, which is an object living in the tensor product space of its index spaces, one could build tensors which live in e.g. the invariant sector of a tensor product of spaces graded by irreps, the symmetric or antisymmetric subspace of a tensor product of identical spaces, ... One could have fermionic vector spaces as index spaces, which implies that a permutation of the tensor indices will induce certain sign changes depending on the parity sectors, etc...
Then there are supposed to be certain operations defined on tensors, the most important of which is contracting tensors, but also, e.g. orthogonal factorisations (singular value decomposition) etc. Finally there should be linear maps that map one tensor onto another one. They deserve a special type in that one typically doesn't want to fully encode them as a matrix, but rather in a way that the matrix-vector product can be computed efficiently, for use in iterative methods (Lanczos etc). My two existing packages so far (TensorOperations.jl and LinearMaps.jl) implement this functionality for standard Arrays; the tensor toolbox under construction would overload / redefine them for the new AbstractTensor hierarchy.
I hope that this package is sufficiently general so that it is also useful for the wider physics/mathematics community. E.g. if somebody comes along that creates a package for working with manifolds, he could then define a TangentSpace vector space as subspace of the abstract InnerProductSpace, and he can then immediately create tensors living in the tensor product of a few tangent and cotangent spaces. In fact, I am thinking of splitting the part for defining vector spaces into a separate package, that could grow into a package for defining mathematical structures/objects.
Finally, the interop with standard Julia comes from calling tensor on a standard Array, which wraps it into an object of type Tensor with the indices associated to spaces of type CartesianSpace. This is the standard real vector space R^n with the Euclidean product, where there is no distinction between covariant and contravariant indices. I think this best captures what a standard Julia Array is.
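A minimal mock of that interop, to make the description concrete (illustrative only: these are not the actual TensorToolbox.jl definitions, just stand-ins with the names used above):

```julia
# Stand-in definitions sketching the described interop; the real package's
# types and signatures may differ.
struct CartesianSpace
    n::Int
end

struct Tensor{T,N}
    data::Array{T,N}
    spaces::NTuple{N,CartesianSpace}
end

# Wrap a plain Array: every index is associated to a self-dual Cartesian space,
# so no covariant/contravariant distinction is introduced.
tensor(A::Array) = Tensor(A, map(CartesianSpace, size(A)))

t = tensor(rand(3, 4))   # index spaces are CartesianSpace(3) and CartesianSpace(4)
```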
@JeffBezanson, I'm ambivalent regarding treating arrays as subsets of tensors. No information is lost that way, but at the same time there are multiple possible interpretations for arrays, and the tensor interpretation doesn't always (or even usually) make sense. Consider images: an image can be thought of as a vector-valued field on a (typically 2d) manifold. Restricting that field to a rectangular grid gives you a structure that, naturally, you would want to represent using a 3d array. However, really, this is just a mapping from the space of grid points into the {R,G,B} vector space, so the geometric meaning of the first two dimensions (the x and y labels of the grid) is different from the geometric meaning of the third dimension (which is, in fact, a vector).
I'm not opposed to @Jutho's suggestion of splitting off the tensor mechanics into a separate package. He's probably right that the number of users who need the full tensor mechanics is much smaller than the number of people who just want straightforward array operations. The question we are really trying to ask here is "in what domain should linear algebra fall?"
The machinery of linear algebra is a substantive enough subset of the machinery of tensor algebra that, in my mind at least, it makes no sense to implement the former without also implementing the latter. Operations like v'M are more concisely and consistently represented if we have a notion of co- and contravariant vectors, but that already puts us most of the way towards general tensor operations.
I agree with you that this is conceptually similar to operations on real numbers which return complex numbers.
Consider images: an image can be thought of as a vector-valued field on a (typically 2d) manifold. Restricting that field to a rectangular grid gives you a structure that, naturally, you would want to represent using a 3d array. However, really, this is just a mapping from the space of grid points into the {R,G,B} vector space, so the geometric meaning of the first two dimensions (the x and y labels of the grid) is different from the geometric meaning of the third dimension (which is, in fact, a vector).
While this doesn't address or take away from your overall message, https://github.com/timholy/Images.jl/pull/135 is working toward an implementation of this idea for images. I'm hoping this also makes it easy to deal with color structure tensors, which I'm looking to use for a project.
I agree that tensors do not supersede arrays. This example above is indeed a different mathematical structure (i.e. a vector bundle or more generally a tensor bundle) whose representation also can be given as a multidimensional array by choosing a grid for the manifold coordinates and a basis for the vector space part. So indeed, you can have different mathematical objects/structures which are well defined in a coordinate-independent / basis-independent way but which can be represented (after choosing a coordinate system or a basis) as a multidimensional array. So multidimensional arrays are certainly not restricted to representing tensors. The other way around also fails, as not all tensors have a convenient representation using a multidimensional array. That is only the case when you use a particular basis known as the product basis, which is obtained by taking the direct product of all possible combinations of the individual basis vectors of the vector spaces involved in the tensor product space. In some cases, e.g. when using tensors in a symmetry-invariant subspace of the tensor product space, such a representation is no longer possible and you need to define a different basis for the complete space, with respect to which the tensor is just represented as a long one-dimensional list of numbers.
there are multiple possible interpretations for arrays, and the tensor interpretation doesn't always (or even usually) make sense. Consider images: an image can be thought of as a vector-valued field on a (typically 2d) manifold. Restricting that field to a rectangular grid gives you a structure that, naturally, you would want to represent using a 3d array. However, really, this is just a mapping from the space of grid points into the {R,G,B} vector space, so the geometric meaning of the first two dimensions (the x and y labels of the grid) is different from the geometric meaning of the third dimension (which is, in fact, a vector).
It was just this sort of distinction I was attempting to capture in the notional AbstractTensorArray proposal of https://github.com/JuliaLang/julia/issues/4774#issuecomment-38333295 by allowing both array-like and tensor-like dimensions. Under this scheme, I would expect to represent your example as
AbstractTensorArray{Uint8, 3, [false, true, false], [true, false, false]}
so that the x, y, and RGB dimensions are "down", "up", and "neutral", respectively. Geometric operations (e.g., affine transformations) could then handle the grid coordinate dimensions in tensor-like fashion while mapping over the RGB values in array-like fashion. (If you later want to treat the RGB values geometrically, you'd have to explicitly change the mask for that purpose, but I would guess that (a) it's less common that two different flavors of geometric operations will be applied to different subspaces of the same data table, and (b) in this situation, an explicit conversion would probably _improve_ the clarity of code.)
I hadn't considered the conjugate representations that @Jutho mentions, but it seems to me that this generalization could be handled by further extending the same masking approach, for complex spaces.
The question we are really trying to ask here is "in what domain should linear algebra fall?"
Once a design is settled for how array-like and tensor-like operations play together, the entities for linear algebra can just be defined by special cases (like the aliases I used above), so that the pure linear algebra user can be oblivious to the whole generalized tensor hierarchy until it's needed (but won't have to rewrite things if and when it is). So I would see no issue (except perhaps bloat) putting the whole thing in Base.
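A rough sketch of what such a masked array type could look like (hypothetical code: TensorArray and its up/down fields are illustrative names, not the actual proposal):

```julia
# Each dimension carries an "up" (contravariant) or "down" (covariant) mark, or neither
# ("neutral"). Geometric operations would treat up/down dimensions tensor-like and simply
# map over the neutral ones.
struct TensorArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
    up::NTuple{N,Bool}
    down::NTuple{N,Bool}
end

Base.size(t::TensorArray) = size(t.data)
Base.getindex(t::TensorArray, I...) = t.data[I...]

# The image example above: x is "down", y is "up", and the RGB dimension is neutral.
img = TensorArray(rand(UInt8, 4, 5, 3), (false, true, false), (true, false, false))
```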
so that the x, y, and RGB dimensions are "down", "up", and "neutral", respectively. Geometric operations (e.g., affine transformations) could then handle the grid coordinate dimensions in tensor-like fashion while mapping over the RGB values in array-like fashion. (If you later want to treat the RGB values geometrically, you'd have to explicitly change the mask for that purpose, but I would guess that (a) it's less common that two different flavors of geometric operations will be applied to different subspaces of the same data table, and (b) in this situation, an explicit conversion would probably improve the clarity of code.)
I think you're mixing something up here. In the discussion above, it was actually explained that the x and y coordinates did not carry the vector space interpretation, as they can correspond to coordinates on an arbitrary curved manifold, not necessarily a flat space. It was the RGB dimension that was given the vector interpretation, although this might actually also not be the best choice, as I seem to remember (I don't have a decent background in image processing) that color space is also rather curved. Also, even for the case where the domain (x and y) forms a vector space, why would x and y be up and down, or was this just an example of your notation?
Anyway, I also started with TensorToolbox.jl by denoting covariant and contravariant indices as some kind of parameters or mask, but this soon became a complete nightmare, which is why I switched to a representation where every tensor is an element of some vector space, and to perform operations, one has to check that spaces match, just like you need to check that sizes match when doing operations with arrays.
x and y coordinates did not carry the vector space interpretation
Sorry, I overread "rectangular grid" --- I guess @jdbates meant precisely what he said. But aren't we just talking about replacing dot products with generalized inner products? (Forgive me if I misunderstand, I spend almost all my time in Euclidean space :-)
every tensor is an element of some vector space
Seems like a nice idea --- I'd be interested to see some examples of how it works for the user (I didn't get very far reading the code).
I've got a new proposal for this issue.
(1) APL-style slicing.
size(A[i_1, ..., i_n]) == tuple(size(i_1)..., ..., size(i_n)...)
In particular, this means that "singleton slices" – i.e. slices where the index is scalar or zero-dimensional – are always dropped, and M[1,:] and M[:,1] are both vectors, rather than one being a vector while the other is a row matrix, or any other such distinction. (A small size illustration follows after point (4).)
(2) Introduce Transpose and ConjTranspose wrapper types for vectors and matrices. In other words, something like this:
immutable Transpose{T,n,A<:AbstractArray} <: AbstractArray{T,n}
    array::A
end
Transpose{T,n}(a::AbstractArray{T,n}) = Transpose{T,n,typeof(a)}(a)
and all the appropriate methods to make these work as they should for vectors and matrices. We may want to limit it to only working for vectors and matrices, since it's unclear what a general transpose should mean for arbitrary dimensions (although just reversing dimensions is tempting). When you write a' you get ConjTranspose(a), and likewise v.' produces Transpose(v).
(3) Define various specialized methods for (conjugate) transposed vectors and matrices, such as:
*(v::Transpose{T,1}, w::AbstractVector) = dot(v.array,w)
*(v::AbstractVector, w::Transpose{T,1}) = [ v[i]*w[j] for i=1:length(v), j=1:length(w) ]
etc., including replacing all the horrible At_mul_B functions and special parsing with lazy (conjugate) transpose construction followed by dispatch on the Transpose and ConjTranspose types.
(4) Restrict broadcasting operations to cases where the arguments are scalars or arrays with the same number of dimensions. Thus, the following, which currently works as shown, will fail:
julia> M = rand(3,4);
julia> M./M[1,:]
3x4 Array{Float64,2}:
1.0 1.0 1.0 1.0
0.516884 0.675712 2.11216 9.0797
1.00641 0.726229 2.48336 4.38751
julia> M./M[:,1]
3x4 Array{Float64,2}:
1.0 0.891557 0.561464 0.103968
1.0 1.16552 2.29433 1.82633
1.0 0.643353 1.38544 0.453257
Instead, you will have to do something like this:
julia> M./M[[1],:]
3x4 Array{Float64,2}:
1.0 1.0 1.0 1.0
0.516884 0.675712 2.11216 9.0797
1.00641 0.726229 2.48336 4.38751
julia> M./M[:,[1]]
3x4 Array{Float64,2}:
1.0 0.891557 0.561464 0.103968
1.0 1.16552 2.29433 1.82633
1.0 0.643353 1.38544 0.453257
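To make the slicing rule in point (1) concrete, here are the sizes it implies (behavior under the proposed rules, shown as comments; not a transcript of current Julia):

```julia
A = rand(2, 3, 4)
size(A[1, :, :])      # (3, 4)   -- the scalar index is dropped
size(A[:, 1, :])      # (2, 4)
size(A[[1], :, 2:3])  # (1, 3, 2) -- a one-element vector index keeps its dimension
```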
I believe this proposal solves all of the major issues we currently have:
1. v'' === v.
2. v' == v.
3. v'w is the dot product of v and w – in particular, it's a scalar, not a one-element vector.
4. v*w' is the outer product of v and w.
5. M*v is a vector.
6. M*v' is an error.
7. v'*M is a transposed vector.
8. v*M is an error.
9. The At_mul_B operators and special parsing go away.

:+1: to all of it. I did some work on 2 and 3 in #6837, but never finished it. @simonbyrne also looked into it.
+1 too. Sounds like it would offer a quite consistent behavior all over the place.
The only really disruptive part of this proposal would actually be that M[1,:] is an (implicitly vertical) vector rather than an explicitly horizontal row matrix. Otherwise, it's actually a pretty smooth, non-disruptive set of changes (one hopes). The main epiphany (for me) was that APL slicing behavior could be combined with lazy transposes. If we get buy-in, we can come up with a plan and split up the work. I really hope that lazy transposes and staged functions allow for some code reduction and simplification.
Yes, please! Tensor transpose should probably allow any user-defined permutation, with reversing the dims as the default.
Tensor transpose should probably allow any user-defined permutation, with reversing the dims as the default.
That seems like it would complicate the type a bit much; perhaps we can have a PermuteDims type that allows arbitrary lazy dimension permutation.
@Stefan: This seems a pretty good idea to work out the vector and 2-dim algebra. Just a few challenges:
Re 2 & 3: having had a rough stab at it, I came to the conclusion that the vector transpose should NOT be a subtype of AbstractVector, otherwise everything gets way too messy (see discussion on #6837). I think the most sane way forward is with Transpose{T,A} <: AbstractMatrix{T}, and a separate Covector type (+ Conjugate variants).
The other significant problem I came across is that often you want to dispatch on a specific matrix type, its transpose, or its conjugate transpose. Unfortunately, I couldn't come up with a way to express this via existing type machinery (see this mailing list discussion). Without this, I fear we'll be in for a lot of @eval-ing over 3x3 possible combinations of arguments.
@simonbyrne, I defer to your experience with the implementation. Does the rest of it seem reasonable?
I've pointed out (in less-public forums, so it probably bears a brief mention here) that a potential alternative is to handle all shaping _internally_, by expanding the types of indexes that SubArrays can use. In particular, one could have a "transposed range" type that would confer a transposed shape to the SubArray, even when the parent array is a Vector. (See https://github.com/JuliaLang/julia/blob/d4cab1dd127a6e13deae5652872365653a5f4010/base/subarray.jl#L5-L9 if you're unfamiliar with how SubArrays are/may be implemented.)
I am not certain whether this alternative strategy makes life easier or harder. It reduces the number of externally-facing types, which might mean that one needs fewer methods. (As someone who is _still_ filling in missing methods due to the Color transition in Images, this seems like a Good Thing.) On the other hand, in the absence of convenient triangular dispatch it could make it somewhat more awkward to write selective methods, which might exacerbate the problems raised by @simonbyrne.
Any insights would be most welcome.
Aside from such details, I like the shape of @StefanKarpinski's proposal. I am not wedded to APL-style indexing, but on balance I suspect it is a better choice than the Matlab-derived rules we have now.
Two thoughts:
1. If A[[2], :] becomes idiomatic, it seems a bit wasteful to have to create a vector just to wrap the single index 2. Should we consider allowing A[(2,), :] for the same thing, or something similar? I guess a single-element range is ok, but it would be nice to have a syntax for it that is almost as convenient as [2].
2. newaxis indexing.

I was thinking of proposing that indexing with semicolons, a la A[2;:], could be a different indexing mode where the result always has the same number of dimensions as A – i.e. drop no singletons, and indexing with anything having rank more than one is an error. Decided to leave that out of the core proposal for simplicity, but something like that does seem like a good thing to have.
I can see the concerns expressed by @simonbyrne. However, in principle, a covector is also just a vector living in a different vector space, namely the dual space. So making the Transpose or Covector type not a subtype of AbstractArray also feels somewhat unpleasing. A possible resolution, which would be a major breaking change and is probably not going to be considered (but I wanted to mention it anyway), is to give the whole AbstractArray family an extra type parameter trans, which could have values :N, :T or :C. For all methods that just assume a vector to be a one-dimensional list of numbers, they would not need to distinguish between different values of this final parameter, and so the corresponding method definitions can stay as they are now.
For N-dimensional arrays with N>2, there are various options. Either transpose gives an error and it is impossible to actually create an object of type AbstractArray{3,Float64,trans} where trans != :N, or, alternatively, :T just means row-major and transpose of a general array has the effect of reversing all dimensions. I think the latter is also the accepted convention by those people that use the Penrose graphical notation (see http://en.wikipedia.org/wiki/Penrose_graphical_notation although transpose is not explained there, but also see the cited book by Cvitanović).
I don't really see the role for arbitrary index permutations being supported by transpose; there is permutedims for that, and maybe some lazy approach using revamped SubArrays. In addition, the main motivation for this issue is to simplify the A_mul_B zoo, and higher order tensor contractions are not (and should not be) supported through normal multiplication anyway.
I am sure there are some new issues related with this approach that I have not yet thought of.
I think I figured out a reasonable solution to the dispatch problem here.
@Jutho's proposal seems interesting, and I think is worth exploring. Unfortunately, the only real way to evaluate these things is to try to implement them.
@toivoh,
1. A[2:2,:] will also retain the dimension, and that doesn't require allocation or any new syntax.
2. newaxis seems eminently feasible. Indeed, with the architecture in #8501 it seems possible to create broadcasting directly by indexing: have an index type that always resolves to 1 no matter what value the user fills into that slot.

The issue with 2:2 is the repetition if there is a long expression for the index instead of just 2. But of course you can always define your own function to create a range from an index.
Very good proposal :+1:.
Remind me why we want v' == v?
We don't really need that but it's kind of nice since the dual of a (finite dimensional) vector space is isomorphic to it.
Or more strongly, since Julia's arrays don't distinguish between covariant and contravariant indices, it only makes sense to think of these as vectors in a Cartesian space (Euclidean metric = identity matrix = Kronecker delta), where indeed the dual space is naturally isomorphic.
I'm not so sure we want v' == v, but I think that's pretty orthogonal to the rest. Do we want a column matrix and a vector to compare equal if they have equal elements?
That's actually a different issue since they have different numbers of dimensions.
In particular, this proposal effectively removes the identification between a vector and a column matrix – because if you slice a matrix horizontally or vertically, you get a vector either way. Previously you could kind of ignore trailing singleton dimensions – or pretend that there were more than there actually were. It will no longer be a good idea to do that because a vector can come from any slice of an array.
Would it make sense to convert something from 1-d to 2-d by adding a trailing singleton dimension?
With this proposal, I think that might no longer be a good idea. But maybe it is, since vectors still behave like columns while covectors behave like rows.
One thing I noted in #8416 is that sparsevec is messily faked as a single-column CSC matrix right now. Sparse should be able to fit into this fairly well once a proper 1-d sparse vector type gets implemented (which would fall out as the simplest useful case of a generic N-d COO type, just needs to be written).
Just taking this all in. So the following would not work?
A[1,:] * A * A[:,1] # row from a Matrix * Matrix * column from a matrix ???
You wrote
v'w is the dot product of v and w – in particular, it's a scalar, not a one-element vector.
Also v' * w is a scalar ?
I like the idea of dot(x,y) taking any two items whose shapes are (1,...,1,m,1,...,1) and returning the dot product no matter what. But I don't want x*y to give dot(x,y) in this sense unless x is a covector and y is a vector.
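A minimal sketch of such a shape-insensitive dot (flexdot is a hypothetical helper name; it simply ignores singleton dimensions):

```julia
using LinearAlgebra  # for `dot` on current Julia (it lived in Base in 0.3-era Julia)

# Accept any two arrays whose shapes are (1,...,1,m,1,...,1) and take the ordinary dot product.
function flexdot(x::AbstractArray, y::AbstractArray)
    length(x) == length(y) || throw(DimensionMismatch("lengths must match"))
    dot(vec(x), vec(y))
end

flexdot([1 2 3], [1, 2, 3])   # 14, even though one argument is 1x3 and the other is length-3
```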
Not sure if this is such a hot idea, but maybe it would be okay if A[:,1,1] was a vector and A[1,:,1] or A[:,1,:] were covectors. Feels better to trail along a dimension for the vector -- the slot on which you are allowed to contract the tensor, with standard linear algebra being 1 (row vectors) and 2 (column vectors).
In my view, the two major challenges we had previously addressed in this issue were:
(A) how to distinguish tensor semantics (for contractions) and array semantics (for broadcasting) when operating on multidimensional data;
(B) how to embed obvious and convenient linear algebra within a consistent framework that generalizes to higher dimensions.
It's not clear to me how this proposal deals with either of these issues. So far as I can tell, to achieve (A) still requires ad-hoc user manipulations (as with present-day functionality); and to address (B) using lazy wrappers would require something like the SubArray extensions suggested by @timholy, at which point it becomes a lazy version of the masking approach discussed some time ago. I can imagine providing additional support for (A) using some similar lazy mechanism (like a List wrapper type), but in all these cases it seems to me like laziness should be an optional strategy.
I don't know how many share @Jutho's view that "higher order tensor contractions are not (and should not be) supported through normal multiplication anyway", but I couldn't disagree more: I only do what I consider ordinary engineering math, and I need them all the time. While current languages like Mathematica and NumPy have their design limitations in this regard (as I've discussed above), they are at least supported! For instance, as soon as you want to use the gradient of a vector field in a simple numerical method, you need higher-order tensor contractions.
When you say, "...have their design limitations in this regard (as I've discussed above), they are at least supported", are you talking about missing functionality, or something fundamental about vectors and transposes that cannot be addressed at a higher level, or by adding functions?
Does anything about this proposal _conflict_ with improving on your points (A) and (B)?
I really don't see how tensor contractions are supported by MATLAB's standard multiplication operator *, or through any other built-in MATLAB function for that matter. NumPy has a built-in function (I forgot the name) but it is also rather limited as far as I remember.
I too need tensor contractions in their most general form all the time; that's exactly why I know that specifying the most general tensor contraction, let alone implementing it efficiently, is not entirely straightforward. That's why I argued that there need to be special functions for this, rather than trying to cram some half-working or rather specific functionality into the standard operators in Julia Base, which doesn't cover half the use cases. But I am happy to change my opinion, e.g. if there is one 'standard' contraction that is so much more important/useful than any other? But this might be very domain dependent and hence not suitable as a general rule for adoption in Julia Base.
Here's a contraction over the last index of A and the first index of B, sort of like Mathematica's dot:
# contract the last dimension of A with the first dimension of B
function contract(A, B)
    s = size(A)
    t = size(B)
    # flatten both operands to matrices, multiply, then restore the outer dimensions
    reshape(reshape(A, prod(s[1:end-1]), s[end]) * reshape(B, t[1], prod(t[2:end])),
            s[1:end-1]..., t[2:end]...)
end
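For example, contract(rand(2,3,4), rand(4,5)) has size (2,3,5), matching what Mathematica's Dot would give for those arrays.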
I've always been able to do general contractions with reshapes, permutes, and perhaps complex conjugates when needed, more or less like the above. Not sure what the big issue with tensors really is; why can't we just implement a few of these functions?
Yes, exactly. In this issue, all we want to settle on to move forward is what vector' should give. For tensor contractions, with appropriate types and staged functions I think we could actually get pretty high-performance implementations.
My feeling is that tensors will take care of themselves and we have to be sure that linear algebra users don't get frustrated.
My biggest concern is that
(take a row from a 2d array) * (2d array) * (take a column from a 2d array)
which is a common operation, still won't work unless we tag (take a row) as a covector or, perhaps better yet, tag it with a general slot index.
@JeffBezanson, when I say that these operations are supported, I mean that the built-in data types and functions are specifically designed with them in mind, for instance, like Mathematica's Dot function. Thus, for the user, there is a built-in, documented, and/or obvious way to do certain things. For any design, it is possible to achieve support for anything by adding functions, just as it is with the current implementation; so it's not an issue of technical conflict, it's an issue of design.
@Jutho, I don't use MATLAB much, so I can't comment. I agree NumPy's design is less coherent than Mathematica's (as I discussed above), but it also supports a richer range of behaviors. I agree that basic linear algebra should leave the general tensor machinery invisible to users who don't need it, but due to the terrific language features of Julia, it does not seem necessary to introduce divergent implementations for them, as both NumPy and Mathematica have been forced to do to some extent. It seemed to me this issue was, at least in part, about finding the right unified system for both, to reveal what specializations should be used for the common linear algebra cases: for instance, what to do about vector'.
A[1,:] * A * A[:,1] # row from a Matrix * Matrix * column from a matrix ???
Correct – you would have to write A[1,:]' * A * A[:,1].
Also v' * w is a scalar ?
Yes, v'w and v' * w are the same thing. One of the good things about this proposal is that it completely eliminates cheap syntactic hacks.
Not sure if this is such a hot idea ...
I don't think it is. One of the goals of this proposal is to make slicing and indexing rules symmetrical and this would make one of the indices special, which seems to me to defeat the entire purpose. If slicing is going to be asymmetrical, we might as well keep the current behavior.
@thomasmcoffee You'll just have to be more specific. Of course everybody wants things to be coherent, documented, obvious etc. The question is, does the proposal on the table serve those goals? Maybe the current proposal doesn't affect those goals at all, which is ok --- then as long as it leads to an improvement elsewhere, we still have a net improvement.
So let me get this straight
If A is not square
| | Current | Proposed | MATLAB |
| --- | --- | --- | --- |
| A * A[1,:] | No | Yes | No |
| A * A[1,:]' | Yes | No | Yes |
| A[:,1] * A | No | No | No |
| A[:,1]' * A | Yes | Yes | Yes |
and if A is square
| | Current | Proposed | MATLAB |
| --- | --- | --- | --- |
| A * A[:,1] | Yes | Yes | Yes |
| A * A[:,1]' | No | No | No |
| A[1,:] * A | Yes | No | Yes |
| A[1,:]' * A | No | Yes | No |
I swear I just posted an answer to this but somehow it disappeared into the ether. This is all correct. In the current arrangement, you need to consider whether you're taking a row slice or a column slice as well as whether you are multiplying on the left or right when deciding whether to transpose or not (columns are transposed on the left, rows are transposed on the right). In the proposal, you only consider which side you're multiplying on – you always transpose on the left, never on the right.
Would it be okay if dot(x,y) and dot(x.',y) and dot(x,y.') and dot(x.',y.') all give the same scalar, i.e. Σᵢ conj(xᵢ) * yᵢ? This way one can do dot(x,A*y) without thinking too much.
Those examples by @alanedelman feel a bit backward, transposing in the places you shouldn't, due to APL indexing. Maybe that's motivation enough to have a dimension-preserving variant of indexing, as I think has been discussed (e.g. A[i; :]?).
But then you would want that to give a covector in the case above.
@alanedelman, I don't see any reason those methods of dot shouldn't exist and give the same result.
I've always been able to do general contractions with reshapes, permutes, and perhaps complex conjugates when needed more or less like the above
That's exactly how it is implemented in the tensorcontract function in TensorOperations.jl, if you choose the :BLAS method, which is certainly the fastest for large tensors. I also wrote a native Julia implementation, using the Cartesian.jl functionality (and hopefully one day using staged functions), which eliminates the need for permutedims (extra memory allocation) and is faster for smaller tensors.
I was just responding to the false claim that MATLAB provides built-in functionality for this which Julia doesn't. Both reshape and permutedims are available in Julia. NumPy indeed has the tensordot function, which does exactly this but doesn't allow you to specify the index order of the output tensor, so you still need a permutedims afterwards if you had a specific output order in mind.
But this is getting too far from the current topic, which is indeed to get consistent behavior for vector and matrix algebra.
+1 for Stefan's proposal. It seems to give extremely clear but sufficiently flexible semantics. As a linear algebra user, even one accustomed to MATLAB style syntax, I'd find the rules simple enough to use.
I am a little confused about what ' is supposed to mean in general. If v is a vector, v' is a transpose. If a is a 2d array, a' would now be a transpose as well. These both seem to be defined in the interest of being able to easily form b' * a by contracting the first dimension of b with the first dimension of a.
There seems not to be consensus on a definition of a' when the dimension of a is >2. I haven't heard anyone propose anything other than reversing the dimensions, and this coincides with b' * a contracting the first dimension of b with the first dimension of a.
I do think it would be nice if we could state what ' does succinctly without referring to the size of the thing it is operating on.
It seems reasonable to have another contraction function available in Base for more general situations, eg contract(a, b, 2, 3) to contract the 2nd dimension of a with the 3rd of b.
By the way, dot(a,b) == a'*b when a and b are vectors, but dot(a,b) is currently undefined for matrices. Could we have dot(a,b) = trace(a'*b)?
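That generalization is just the Frobenius inner product, which can be computed without actually forming A'*B (a sketch; matdot is a hypothetical name):

```julia
using LinearAlgebra  # for `tr` on current Julia

# trace(A'*B) equals the sum of conj(A[i,j]) * B[i,j], so no matrix product is needed.
matdot(A::AbstractMatrix, B::AbstractMatrix) = sum(conj.(A) .* B)

A, B = rand(3, 3), rand(3, 3)
matdot(A, B) ≈ tr(A' * B)   # true
```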
@madeleineudell: I am a little confused about what ' is supposed to mean in general.
I share this concern. You can basically take properties 2-5, 7, 8 & 10 in my proposal as the defining characteristics. I.e. you want this to hold:
- v' is one-dimensional but not a vector
- v'' === v
- v' == v
- v'w is a scalar
- v*w' is a matrix
- v'*M is the same kind of thing as v'
- M' is two-dimensional but not a matrix, maybe a matrix view

In particular, there are no constraints on what it means or does for higher dimensional arrays. A general theory of what covectors are that encompasses higher dimensionalities would be nice, but I'm not convinced it can really be done without complicating things too much, e.g. by having up/down dimensions or having each dimension be labeled by an index.
It's clear at least that the general rule for ' is not that it would reverse the dimensions, since that is not what it would do for vectors and covectors.
The only simple rule I can think of that captures the behavior above for both vectors, covectors, and matrices is to permute the first two dimensions (a vector has an absent second dimension and a covector has an absent first and present second).
I would argue that this is not true if you want to think about general tensors. It might be true if you only think about matrices and vectors as some blocks with numbers in it, and you think of v as a column and v’ as a row.
However, if you think of v as some element of a vector space, and w=v' as some way to map v to an element w of the dual space V*, then w is still a vector, i.e. a one-dimensional object. In general, one needs a metric to define this mapping from V to V*, i.e. w_i = g_{i,j} v^j. Now if V is a Cartesian space, i.e. R^n with the standard Euclidean metric, then V* is naturally isomorphic to V. This means that there is no notion of upper or lower indices (covariant or contravariant) and hence that w_i = v_i = v^i = w^i. I would argue that this is the case used in most programming, i.e. the case that needs to be supported by Julia Base. (For a complex vector space, V* is naturally isomorphic to Vbar, the conjugate vector space, and hence the natural mapping is w_i = conj(v^i).)
The convention to denote vectors as columns with numbers, dual vectors as rows with numbers, and hence matrices (which are elements of V \otimes V*) as blocks with numbers is extremely convenient, but stops as soon as you also want to consider higher dimensional arrays, i.e. elements of higher order tensor product spaces. In that case, a 'row vector', i.e. a two-dimensional object where the first dimension has size 1, to say it in matlab/julia terminology, is some element of a tensor product space V1 \otimes V2, where V1 just happens to be R (or some other one-dimensional space). I agree that this might not be what you want as default behaviour, but then I would prefer that one should just not try to define transpose for a general array and refer to permutedims, like matlab does. Reversing the first two dimensions for a general tensor as default convention doesn't make sense. Transposing an element from some higher order tensor product space V1 \otimes V2 \otimes … \otimes Vn has no unique definition. The convention to reverse all dimensions just stems from a convenient graphical representation by Penrose, as referred to above. In addition, this is the one which maps matrix space (V \otimes V*) to itself (V* \otimes V** = V* \otimes V).
I can see two ways forward:
1) Make the convenience operator ‘ (and maybe even *) work with arbitrary higher-order arrays using some chosen convention, within the setting of Cartesian tensors (i.e. no distinction between upper or lower indices). This might bring some surprising results for typical use cases.
2) Restrict ‘ and * to vectors and matrices. Error on higher order arrays. I think this is the most popular approach (e.g. Matlab etc).
This post is a bit of me taking the other side from the position I took last year. Hope that's okay.
It finally hit me that there's a logic to the current approach, but it was never articulated. It looked so much like a hack that it just annoyed me. Somehow now that I understand it, I'm liking it more.
In the current approach, all arrays are infinite dimensional with implied dropped 1's. The apostrophe operator ' could have meant interchange the first two dimensions, but in fact as it exists today, it means
ndim(A) <= 2 ? interchange_first_two_dims : no_op
Sorry if everyone else saw that and I just missed it. My mind was bogged down with the idea that vectors were one dimensional, not infinite dimensional, and therefore a transpose ought to reverse dimensions, hence be a no_op.
I'm okay with the apostrophe doing this, or the apostrophe always interchanging the first two dimensions -- I don't think I care. I think the apostrophe exists for linear algebra and not multilinear algebra.
I toyed with whether apostrophe-star (" '* ") (if possible! or something else if impossible) should mean contract last dimension to first dimension (like Mathematica's dot) and have APL indexing and no covectors. Part of my brain still thinks that's worth exploring, but as I wake up this a.m. the current approach is seeming better and better. (Let's see how I feel later today.)
I don't regret starting this thread last year, but I'm wondering again what cases really annoy people today ... can we get a list of examples? ... and whether shedding light on these cases is better than changing what we do. I've read through this whole thread and see many many principles -- which is good, but not enough use cases to put my head around things.
I can say that I have often been annoyed by
[1 2 3] # ndims == 2
[1,2,3] # ndims == 1
[1;2;3] # ndims == 1
mainly because I can't seem to remember it.
Just a thought experiment. What functionality would be lost if we redefined * to serve as a contraction operator similar to what @alanedelman has in mind for '*? Wouldn't it remove the need for ' in linear algebraic operations at all (apart from functioning as a transpose for matrices)? I can only see that it would deprive us of the outer product w*v', which could easily be replaced by outer(w,v).
EDIT: Assuming APL slicing, that is. For example, A[1,:]*A*A[:,1] would have the expected result without the need for transposing the first operand with '.
I thought of this too. I think that v * w being a dot product just seems like overloading on steroids that may be error prone. This is one place where Mathematica's dot seems not so bad.
So to summarize, a contract-last-to-first seems reasonable, but whether it could be apostrophe-star, or could be *, or should be dot, has issues.
Somewhat unrelated but not completely unrelated .... I'd like to point out that the dot-star which everyone reads as dot-star originally (I'm told) was meant to be point-star, because the POINTWISE operator was what was meant.
I can say that I have often been annoyed by
[1 2 3] # ndims == 2
[1,2,3] # ndims == 1
[1;2;3] # ndims == 1
mainly because I can't seem to remember it.
I always thought "why don't we use only ,?"
What functionality would be lost if we redefined * to serve as a contraction operator similar to what @alanedelman has in mind for '*?
We would lose associativity: e.g. (M*v)*v would give the current dot(v,M*v) (a scalar), whereas M*(v*v) would give M.*dot(v,v) (a matrix).
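A concrete instance of that associativity failure, under the hypothetical convention that v*v means dot(v,v):

```julia
using LinearAlgebra

M = [1 2; 3 4]
v = [1, 2]
dot(M*v, v)     # 27            -- what (M*v)*v would mean: a scalar
dot(v, v) * M   # [5 10; 15 20] -- what M*(v*v) would mean: a matrix
```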
Why don't we just make dot the contraction operator (which is not associative anyway)? We could also define higher order contractions, e.g. ddot(A::Matrix,B::Matrix) == A⋅⋅B == trace(A'*B).
So do I understand correctly that the only purpose of introducing vector transposes is to save us from non-associativity of *? The ambiguity of the example of @alanedelman could be fixed by transposing one of the vectors (M*v*v' vs M*v'*v), but the same can be done with parentheses (M*(v*v) vs (M*v)*v) without all the hassle of implementing the transpose for vectors (as @Jutho already pointed out, implementing transpose for higher order tensors does not make sense mathematically anyway). To me the question is which notation is more readable/concise/elegant/pure.
Actually M*(v*v) and (M*v)*v are currently both parsed as *(M, v, v) to allow for a matrix product that optimizes the order of multiplication based on the sizes of the arguments, so the parser would have to be changed as well.
But at least personally I think that associativity is a pretty big deal. You have to translate between math and most programming languages; for Julia I still hope that you won't have to translate very much.
(as @Jutho already pointed out implementing transpose for higher order tensors does not make sense mathematically anyway).
I don’t think that’s exactly what I said.
What functionality would be lost if we redefined * to serve as a contraction operator similar to what @alanedelman has in mind for '*?
We would lose associativity: e.g. (M*v)*v would give the current dot(v,M*v) (a scalar), whereas M*(v*v) would give M.*dot(v,v) (a matrix).
Why don't we just make dot the contraction operator (which is not associative anyway)? We could also define higher order contractions, e.g. ddot(A::Matrix,B::Matrix) == A⋅⋅B == trace(A'*B).
With that line of reasoning, we just don’t need the multiplication operator for vectors and matrices anymore, we can just write everything in terms of dot:
A ∙ B
A ∙ v
v ∙ A ∙ w
v ∙ w
So that’s just the @pwl proposal but with * replaced with dot.
But at least personally I think that associativity is a pretty big deal. You have to translate between math and most programming languages; for Julia I still hope that you won't have to translate very much.
I guess part of the problem is that even in mathematics there are different conventions. Is thinking in terms of row and column vectors mathematics, or do we want the more abstract behaviour in terms of linear operators and dual spaces (where I guess an operator is not multiplied with a vector, but applied to a vector, so why not write A(v) instead of A*v or A ∙ v )?
I am not in big need of convenience syntax; I personally prefer to write dot(v,w) over v'*w every time, since the former more clearly expresses the basis-independent mathematical operation. (In fact, one first needs to define a scalar product before a natural mapping from vectors to dual vectors can even be defined.)
Sorry for being so ignorant, but why exactly can't M[1, :] be a transpose, behaving like v'?
I'm going to add to my list of things that bug me
1.
[1 2 3] # ndims == 2
[1,2,3] # ndims == 1
[1;2;3] # ndims == 1
2.
v=rand(3)
v' * v
currently has ndims == 1
(@StefanKarpinski 's proposal fixes this)
(v' * v)/( v' * v )
has ndims == 2
(this really really bugs me and would also be fixed)
Still I don't really want covectors, and APL indexing is something I keep going back and forth on --- still thinking, but I'd love to see the list of things that bug other folks in current Julia.
I definitely like the idea of cdot contracting last index to first index in addition to the * operator, and allowing dot(a,b,...) for very general contractions.
Despite Numpy's naming convention (and maybe the appearance of my previous post), I have somewhat mixed feelings about using dot for general tensor contractions. This is Wikipedia's first sentence:
In mathematics, the dot product, or scalar product (or sometimes inner product in the context of Euclidean space), is an algebraic operation that takes two equal-length sequences of numbers (usually coordinate vectors) and returns a single number.
In my mind, too, dot should return a scalar.
@brk00, the problem with your question is that it is not clear how to extend this to slices of higher-dimensional / higher-rank (I really dislike the word dimensional for this) arrays.
Sorry for being so ignorant, but why exactly can't M[1, :] be a transpose, behaving like v'?
It can, but how do you generalize this?
@Jutho, I am sorry, I should have phrased it differently. I meant your line "Transposing an element from some higher order tensor product space [...] has no unique definition", so there is no mathematical motivation for introducing one particular definition, or for defining it at all.
@alanedelman:
I'm going to add to my list of things that bug me
[1 2 3] # ndims == 2
[1,2,3] # ndims == 1
[1;2;3] # ndims == 1
Concatenation behavior is unrelated to this issue. Let's not muddy the waters further.
I definitely like the idea of cdot contracting last index to first index in addition to the * operator and allowing dot(a,b,...) for very general contractions.
This is also tangential. Let's stay focused on arrays and slicing.
In the current approach, all arrays are infinite dimensional with implied dropped 1's.
This is not really true. If this were true, there would be no distinction between ones(n), ones(n,1), ones(n,1,1), etc. But those are all different objects with different types which are not even equal to each other. We make some effort to implement things so that they behave _similarly_ but that's a far cry from arrays actually being infinite dimensional.
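A quick check of that point (current behavior, not a proposal):

```julia
size(ones(3))        # (3,)
size(ones(3, 1))     # (3, 1)
size(ones(3, 1, 1))  # (3, 1, 1)
ones(3) == ones(3, 1)   # false -- different number of dimensions, so not equal
```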
The list of things that are currently bothersome largely mirrors (a subset of) the nice properties of my above proposal. Namely:
- v'' !== v – in fact v'' != v; this is because they don't have the same rank.
- v' != v – same deal.
- v'w is a one-element vector, not a scalar.
- The A*_mul_B* functions, which are the most awful thing ever.

I agree with most of @StefanKarpinski's points, except for the v' == v bit: I don't think these should be equal. Yes they may be isomorphic but they still are different objects, in that they behave very differently: matrices M and M' are also isomorphic yet I don't think we would ever want them to be equal (unless they are Hermitian of course).
My general view with the Covector types was that they were to be relatively lightweight: other than basic operations (multiplication, addition, etc.), we would avoid defining too many operations (I would even be hesitant to define indexing). If you want to do something with it, you should convert it back to a vector and do your stuff there.
+1 to @simonbyrne's take, including his general approval of @StefanKarpinski's view.
Also agree with @simonbyrne.
Yes, v and v' have different types, but we already consider different array types with the same shape and data to be equal – for example speye(5) == eye(5). Why is covector different than sparse?
I hope that covectors would broadcast like row matrices. For arrays A and B that are considered equal, it so far holds that all(A .== B), which would not be the case if one is a vector and one is a covector.
To me, a covector is more like a row matrix than a vector or column matrix.
I guess a key question is what is size(v')? If the answer is (length(v),) then I think that v == v' should be true. If size(v') == (1,length(v)) then it should probably be false, but arguably then v' == reshape(v,1,length(v)) should be true. So the question is should v' be a special kind of vector or a special kind of matrix?
Increasingly I'm convinced that this issue is really about what transpose really means.
Covectors are not really arrays, and so I don't think we can really say what "shape" they have. A covector is really defined by what it does, that is *(::Covector, ::Vector) gives a scalar: as any AbstractVector doesn't do that, it's not really the same thing.
Another problem would be in the complex field: would we have v' == v or v.' == v?
@simonbyrne:
A covector is really defined by what it does, that is *(::Covector, ::Vector) gives a scalar: as any AbstractVector doesn't do that, it's not really the same thing.
This is a really good point. However, it might be rather frustrating if v' cannot be used as an object. Perhaps the right approach is to treat v' as a funny row matrix instead of a funny vector.
Perhaps the right approach is to treat v' as funny row matrix instead of a funny vector.
Kind of, but I don't think it should be a subtype of AbstractMatrix either: I think AbstractCovector should be a top-level type. I'd be happy to define length(::Covector), but I don't think we should define a size method.
I'm not sure about handling broadcasting: unless we can come up with reasonable criteria, I would err towards not defining broadcasting methods.
This discussion seems to be converging towards using transposes and vectors like they are used in engineering, i.e. think of everything as being matrices. Think of vectors as a column and of covectors as a row. This is good for vectors and linear maps in Cartesian space (the typical use case), but starts to fail if one tries to generalize to multilinear algebra or to more general vector spaces etc. There seem to be many confusions originating from the fact that for Cartesian spaces lots of things are equivalent which are not equivalent for general spaces. I am not necessarily opposed to this as Julia's default behavior, but I really disagree with some of the statements above as if they would be "mathematically correct". So let me contradict those below.
On 20 Oct 2014, at 17:39, Simon Byrne [email protected] wrote:
I agree with most of @StefanKarpinski's points, except for the v' == v bit: I don't think these should be equal. Yes they may be isomorphic but they still are different objects, in that they behave very differently: matrices M and M' are also isomorphic yet I don't think we would ever want them to be equal (unless they are Hermitian of course).
This statement makes no sense at any level.
1), I think you are abusing the meaning of isomorphic here. Isomorphic is a relation between two spaces (in this setting). In general, for every (real) vector space V there is a dual space V* of linear functions (for complex spaces, there is also the conjugate space and the dual of the conjugate space). These are typically different space, and there is not even a unique or natural mapping from one to the other, i.e. there is in general no meaning to associate elements v of V to elements phi of V*. The only general operation is applying the linear function, i.e. phi(v) is a scalar.
A natural mapping from V to V* can be defined once you have a bilinear form V x V -> scalars, typically an inner product / metric. One can then define phi_i = g_{i,j} v^j.
2), there is in fact no natural operation associated to "transposing a vector" (see point 3). This confusion comes from identifying vectors with column matrices.
Putting it probably too strongly, I would actually like to see any use case of the transpose of a vector that you cannot obtain using dot and maybe some outer product/tensor product operation?
However, if you want to use transpose as a way to map vectors to dual vectors, i.e. to map v in V to some phi=v' in V*, then you are assuming that you are working with the standard Euclidean inner product ( g_{i,j} = delta_{i,j} ). At that point, you are eradicating the distinction between covariant and contravariant indices, you are essentially working with Cartesian tensors, and V and V* become naturally isomorphic. As stated above, w_i = v^i = v_i = w^i, so yes, v==w and I would even say that there is nothing that distinguishes these two.
3) The operation "transpose" is originally only defined for linear maps from V -> W (http://en.wikipedia.org/wiki/Dual_space#Transpose_of_a_linear_map), and in fact even there it might not mean what you think. The transpose of a linear map A: V->W is a map A^T from W*->V*, i.e. it acts on vectors in W*, on dual vectors, and produces elements of V*, i.e. covectors of V. This means, if you think of it in terms of the usual transpose A^T of a matrix A, that this matrix A^T is to be multiplied with columns representing vectors in W* and that the column that comes out of this represents a covector of V. So at this point the identification of dual vectors with row vectors already fails.
However, in the typical use case of real vector spaces where V* and W* are identified with V and W via the standard Euclidean inner product, the transpose of a linear map can be identified with the adjoint of that linear map, which is indeed a map from V->W as most people think is also the case for the transpose of the map. In the complex case, this fails since ordinary matrix transposition (without complex conjugation) only makes sense as a map W*->V*; as a map W->V it is not a basis independent definition and thus not operationally meaningful.
As to what transpose should mean for higher dimensional arrays, within the matlab/engineering approach that we are converging to: it should be an error, it doesn't generalize. This doesn't mean that there is no way to define it. The problem is, what does a higher order array represent? Is it an element of a tensor product space V1 \otimes V2 \otimes … \otimes VN, is it a multilinear map acting on V1 x V2 x … x VN, is it a linear map from some tensor product space V1 \otimes V2 \otimes … \otimes VN to another tensor product space W1 \otimes W2 \otimes … \otimes WM? For the two-dimensional case, there is a resolution. Identifying linear maps A: V->W with vectors in W \otimes V*, transposition in the linear map sense corresponds to reversing the spaces in this tensor product space, A^i_j -> A_j^i with A^T in V* \otimes W** = V* \otimes W, which is indeed a map from W* -> V*. The nice thing about reversing dimensions for higher order objects is that it has a convenient graphical representation. In addition, this is the one which maps matrix space (V \otimes V*) to itself (V* \otimes V** = V* \otimes V).
In conclusion, I think that the problem is whether Julia’s Vector needs to capture the properties of an arbitrary general vector in the mathematical sense, or whether it just represents a column of numbers, a list, … The words vector, tensor, … have well-defined operational and basis independent meanings in mathematics, which might conflict with the kind of operations that you want to define on a Julia Vector. In reverse, some objects are really vectors from the mathematical point of view (dual vectors etc) which should probably not be identified with Julia Vectors, since the standard Julia (Abstract)Vector type has no way to distinguish between different vector spaces.
In that respect, there is some asymmetry, since the term Matrix is much less overloaded, even from the mathematical point of view a matrix is just a convenient representation of a linear map in a certain basis. Note that there are also many cases where you don’t want to represent a linear map as a Matrix, but rather as a function (this issue has come up before in e.g. the arguments of eigs etc). In that respect, I can see why matlab didn’t even bother to have a vector, a truly one-dimensional structure. In this approach, everything is just interpreted as a ‘matrix’, i.e. a block with numbers, anyway.
Question: let c be a covector. What is fft(c)?
Answer: fft(c')', which takes complex conjugates in potentially unexpected ways when compared with fft(c). Users might just benefit from it being undefined (or perhaps this would be good for users, if documented well?). I bet there's a lot more of these kinds of things.
I suspect the right thing to do in Base for now is to only define covector times another vector, covector times a matrix, and covector' to get a vector.
+1 to minimal support for covectors for now.
Yes, +1.
So if i mentioned that i'm pretty sure that
norm(covector,q) should be norm(vector,p) where 1/p + 1/q=1
from the Holder inequality, that wouldn't be implemented for a long time. :-)
Thank heavens for p=q=2
That wouldn't be all that hard to implement:
norm(c::Covector, q::Integer) = norm(c.vector, q/(q-1))
You'd probably want some additional checking to avoid q == 0 and q == 1.
I think q == 1 would be okay
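To make that concrete, here is a minimal sketch, assuming a hypothetical Covector wrapper with a .vector field (none of these names exist in Base) and using the Hölder conjugate p = q/(q-1), in the Julia 0.3-era syntax used elsewhere in this thread:

```julia
# Sketch only: hypothetical Covector wrapper; the dual q-norm of v' is the
# Hölder-conjugate p-norm of v, with 1/p + 1/q = 1, i.e. p = q/(q-1).
immutable Covector{V<:AbstractVector}
    vector::V
end

import Base: norm
function norm(c::Covector, q::Real)
    q == 1   && return norm(c.vector, Inf)   # limiting case p = Inf
    q == Inf && return norm(c.vector, 1)     # limiting case p = 1
    q > 1 || throw(ArgumentError("dual norm requires q >= 1"))
    norm(c.vector, q/(q - 1))
end
norm(c::Covector) = norm(c.vector, 2)         # p = q = 2
```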
I'm working on a writeup of the covector proposal (which I mostly support), but it may be useful to post now one point which makes precise @StefanKarpinski's notion of "funny row matrix".
Covectors are not vectors, but the description of why is not always clear. A covector (or bra-vector, as physicists would call it) is a linear functional that eats a vector and spits out a number which is a dot product.
More precisely, let:
V = V(F) and W = W(F) be vector spaces over some field of elements F,
v and w be vectors that are elements of V and W respectively,
<.,.> be an inner product such that <.,.> : V × W → F and v, w ↦ <v, w>.
The covector corresponding to v is the linear functional v' : W → F that performs the mapping w ↦ <v, w>.
The usual description ends with saying that v' is an element of a dual space V* with not much else said about what the dual space _is_. For computer science-y people, there is a simple description of v' as a currying of the inner product with respect to its first parameter, and V* as the collection of functions that are one-parameter curries with different first vectors. This Wikipedia article may be helpful to reconcile the different notations associated with the two descriptions, but they are expressing exactly the same concept.
Also, I initially thought that indexing into a covector should be disallowed. However, indexing v'[1] is equivalent to applying v' to the canonical basis vector e₁ = (1, 0, ...), so that v'[1] can be defined as <v, e₁>. More general indexing semantics like 1:n follow naturally from applying v' to a suitable projection matrix with each column being a canonical basis vector.
Considering the indexing semantics of covectors in the context of #987 gives another reason why covectors are not AbstractVectors: it is not possible in general to index v' cheaply because it all depends on how expensive quantities like <v, e₁> are to compute. In general you could have an inner product defined with respect to a matrix <v, w> = v'*A*w and the cost of indexing is dominated by the matvec product A*w (or A'*v), which is certainly too expensive to qualify as an AbstractVector.
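To illustrate that cost argument, a small sketch (hypothetical type and field names, not a proposal for Base): with a metric-weighted inner product, reading one "entry" of the covector costs a matvec, unlike indexing an AbstractVector.

```julia
# Sketch: indexing v' as <v, e_i>.  With the Euclidean inner product this is
# just conj(v[i]); with <v,w> = v'*A*w it is dominated by the matvec A*e_i.
immutable MetricCovector{T}
    v::Vector{T}
    A::Matrix{T}    # the metric; A = I recovers the Euclidean case
end

function Base.getindex(c::MetricCovector, i::Integer)
    e = zeros(eltype(c.v), length(c.v))
    e[i] = 1
    dot(c.v, c.A*e)   # <v, e_i>, an O(n^2) operation here
end
```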
Since we're talking about DFTs and norms... this crossed my mind today
if f is a function of a vector and produces a scalar, then I would like diff(f) to be a function that accepts a Vector and produces a Covector so that f(x) ~= f(x0) + diff(f)(x0) * (x-x0). A very common use of the gradient is to get a linear approximation of the incremental change in the output of the function given an incremental change in the input. If the input is a vector, then it feels natural for the gradient to be something that contracts along all dimensions of the input.
But if I were to go implement gradient descent, I would need to add (a scaled version of) the gradient at x to itself. In this sense, the gradient is a vector that points in the steepest direction, whose norm is proportional to the rate of change along that steepest direction.
My gut says "gradient of vector-input function at a point is a covector" is more important.
I suspect the right thing to do in Base for now is to only define:
covector times another vector
covector times a matrix
covector' to get a vector
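For reference, a sketch of just those three operations on a hypothetical wrapper (Julia 0.3-era syntax; Covector(v) stands for v', and in a real implementation the ' syntax itself would construct it):

```julia
immutable Covector{V<:AbstractVector}
    vector::V
end

import Base: *, ctranspose

# covector * vector -> scalar
*(c::Covector, w::AbstractVector) = dot(c.vector, w)
# covector * matrix -> covector, since v'*A == (A'*v)'
*(c::Covector, A::AbstractMatrix) = Covector(A'*c.vector)
# covector' -> vector
ctranspose(c::Covector) = c.vector
```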
Despite the impression you might have gotten from my previous dreadful posts, +1 to this.
Nevertheless, some remarks on this:
I'm working on a writeup of the covector proposal (which I mostly support), but it may be useful to post now one point which makes precise @StefanKarpinski's notion of "funny row matrix".
Covectors are not vectors, but the description of why is not always clear. A covector (or bra-vector, as physicists would call it) is a linear functional that eats a vector and spits out a number which is a dot product.
I think dual vectors/covectors (I also prefer the term linear functionals) can be defined without having an inner product. It is just that you need an inner product if you want to define a mapping from V to V*
More precisely:
Let V = V(F) and W = W(F) be vector spaces over some field of elements F,
v and w be vectors that are elements of V and W respectively,
<.,.> be an inner product such that <.,.> : V × W → F and v, w ↦ <v, w>
It’s pretty strange, and I think impossible, to have an inner product defined between two different vector spaces V and W. How do you define the property of positive definiteness?
The covector corresponding to v is the linear functional v' : W → F that performs the mapping w ↦ <v, w>. The usual description ends with saying that v' is an element of a dual space V* with not much else said about what the dual space is.
I think it is pretty clear, certainly for the finite-dimensional case, that the dual space, by its very name, is a vector space. However, the summary of my previous rant is that, Julia’s (Abstract)Vector type is more like a List. It can be used to represent certain vectors, and other one-dimensional data structures which don’t have the mathematical structure of a vector, but it is not sufficiently general to capture all objects which are mathematical vectors (since it cannot distinguish between different vector spaces).
Considering the indexing semantics of covectors in the context of #987 gives another reason why covectors are not AbstractVectors: it is not possible in general to index v' cheaply because it all depends on how expensive quantities like <v, e₁> are to compute. In general you could have an inner product defined with respect to a matrix <v, w> = v'*A*w and the cost of indexing is dominated by the matvec product A*w (or A'*v), which is certainly too expensive to qualify as an AbstractVector.
This is kind of a loop argument. As mentioned above, a linear functional (covector) is more general and exists even for vector spaces without inner product. Secondly, when the metric is a positive definite matrix A, the natural mapping from a vector v to a covector is indeed v’*A. However, what is this v’ here? Is it already a mapping from v to a different covector defined with respect to the standard Euclidean norm (with an identity as metric)? No it is not. If vectors have contravariant (upper) indices, and covectors have covariant (lower) indices, then the metric A has two lower indices and the inner product is v^i A_{i,j} w^j.
So actually, one of the points I was trying to make above, is that people often write v’ when they are not really trying to work with a covector. They just try to write expressions that are defined on vectors, such as an inner product v'*A*w.
On an aside, there was some remark before about associativity of the multiplication if we would have v’ == v as in Stefan’s original proposal. However, even without this, I would not say that * is associative, even if v’ is a covector which is not identified with ordinary vectors:
A*(v’*w) is a matrix
(A*v’)*w is producing an error
The gradient of a scalar valued function is indeed one of the proper applications of covectors, i.e. without argument the gradient is a covector. What is implicitly assumed in applications like conjugate gradient is that there is a metric (and actually an inverse metric) to map these covectors back into vectors. Otherwise there is no point in doing something like x^i (position vector) + alpha g_i (gradient). The proper way to write this is x^i + alpha delta^{i,j} g_j where delta^{i,j} is the inverse metric. If one tries to define optimization techniques on manifolds with a non-trivial metric, then one needs to take this (inverse) metric into account. There is a nice book on this:
http://sites.uclouvain.be/absil/amsbook/
and a corresponding matlab package:
http://www.manopt.org
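A small sketch of that point (hypothetical function; g is the metric in the chosen coordinates): the differential df is the covector the chain rule gives you, and the vector actually added in a descent step comes from applying the inverse metric.

```julia
# Sketch: on a manifold with metric g (a positive-definite matrix in these
# coordinates), df holds the covariant components df_i of the differential.
# The descent direction is the "raised" vector grad^i = (g^{-1})^{ij} df_j.
# In Cartesian space g == I and the distinction disappears.
function gradient_step(x::Vector, df::Vector, g::Matrix, alpha::Real)
    grad = g \ df
    return x - alpha*grad
end
```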
A*(v’*w) is a matrix
(A*v’)*w is producing an error
Oh, crap.
This thread was overdue for a post with a humorous image in it.
@Jutho I admit that I am not using the most general form of dual space, but I do not see how my description is at all circular given my choice of defining a dual space using the framework of a Banach space. The Banach space formalism is already overkill for finite-dimensional vector spaces anyway.
Circular is not the right term indeed. What I was trying to say with that paragraph is that I can reverse this line of reasoning regarding efficient evaluation of entries, since for example in the case of the (conjugate) gradient on curved manifolds, I guess the gradient (covector) g_i can be calculated efficiently, but the corresponding vector A^{i,j} g_j that will be added to x^i (with A^{i,j} the inverse metric) might not be efficient. I just now noticed how badly outlined my previous message appeared on GitHub; it was fine when I wrote the email. I tried to fix it now. I don't want to keep repeating myself, nor do I want to give the (wrong) impression as if I would not agree with the current proposals. The only points I was trying to make:
1) Not every mathematical vector is well captured by Julia's AbstractVector, nor do all objects that are subtypes of AbstractVector have the proper mathematical characteristics of a vector. In that sense, I like the name Matrix much more than the name Vector, as it has much less connotation to it. Hence, I guess I can agree with the fact that, whatever Covector type is being introduced as part of this proposal, it should not be a subtype of AbstractVector or even AbstractArray, even in those cases where V* is naturally isomorphic to V (Cartesian space, which makes up probably half of the applications).
2) When people write v', they are not really trying to construct a covector, they are just trying to write a bilinear mapping between two vectors (i.e. V x V to scalar), as in v'*A*w = v^i A_{i,j} w^j or even v'w = v^i delta_{i,j} w^j using the convenient matrix representation. I prefer to write dot(v,A*w) explicitly, but that's a personal choice. Since we have no information about upper or lower indices on matrices, one can construct use cases where in a scalar expression like v'*A*w both v' and w can be either vectors or covectors in the mathematical sense. The only consequence of this is that I am not sure whether I am truly happy with calling the type of v' Covector, as just like the name Vector it has too much mathematical connotation which might not be justified in most of the applications, which then leads to more (mostly irrelevant) discussions just like this.
@jutho +1 for gradient being covectors
I first learned this from Steve Smith's PhD thesis and have used this idea to explain clearly this vague idea of conjugate gradient for eigenvalue computations that people used to talk about in chemistry and physics.
I've been increasingly blown away by how internally consistent and self-contained the world of rank-2 tensors with one upper and one lower index is. This is what we conventionally call matrix computations or just plain old linear algebra. Heck, while I can find lots of examples like norm(v,p) or fft(v) where functions differ if they are vectors or covectors, I really don't have one single good example (yet!) of a function that is naturally different on a vector and a one-column matrix. (Someone help me, surely there must be one, even many!!)
I also for several years now have been concerned, like @jutho, that abstract vector spaces hadn't quite found their way into Julia yet; I remember chatting with @StefanKarpinski about this at my whiteboard years ago. Still, the overarching concerns of 1) newcomers to Julia must have an easy experience and 2) performance must trump this fancy stuff.
As a result of talking with @jiahao it really hit me that there are two very special worlds: Linear Algebra (arrays with one contra, and one co) and simple tensors (everyone is a co, no contras). The latter works well in APL and Mathematica. The former is all over linear algebra and was captured best in MATLAB before they grafted on arrays of dimension higher than 2. There is the fuller generality, which after a word with @JeffBezanson seemed like it wasn't so crazy.
By the way, I think it's taken about a year if not more, but we are finally taking this seriously :-)
Regarding
A*(v’*w) is a matrix
(A*v’)*w is producing an error
Only matrix multiply " * " is associative. The overloaded scalar times matrix was never associative. Not even in math, not even in MATLAB.
A*(v’*w) is a matrix
(A*v’)*w is producing an error
Good catch. Are there any cases though where non-errors produce different answers? A related question: what should a scalar * covector do?
The overloaded scalar times matrix was never associative. Not even in math, not even in MATLAB.
Operations involving only matrices and scalars are associative (see point 3). As we can treat vectors as 1-column arrays, including them doesn't cause a problem either. The wrinkle here is that the Covector is an operator that can reduce dimensions (map vectors to scalars), so I'm not sure how that fits in.
Scalar-matrix addition is a bigger concern, as that ruins distributivity (though if I recall correctly, I think that debate resolved to keep it):
(A+s)*v != A*v + s*v
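A quick runnable illustration of that wrinkle, using broadcasting .+ to stand in for MATLAB-style scalar-matrix addition:

```julia
A = [1.0 2.0; 3.0 4.0]
s = 1.0
v = [1.0, 1.0]

(A .+ s) * v     # adds s to every entry of A first: [5.0, 9.0]
A*v + s*v        # distributed form, with s*v a scalar-vector product: [4.0, 8.0]
```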
This is an extremely nice summary:
As a result of talking with @jiahao it really hit me that there are two very special worlds:
Linear Algebra (arrays with one contra, and one co) and there are simple tensors
(everyone is a co, no contras).
It is clear that this discussion is not about supporting the more general case, that's too confusing for people who don't need the full generality, cannot be included in the current AbstractArray hierarchy and is more suitable of a package (which I am actually working on). But the discussion is indeed, which of these two special worlds do we want to support, as they are not mutually compatible.
In the former, all vectors v are columns, all v' are covectors and all matrices are linear operators V->W (e.g. they live in W ⊗ V*). This excludes the representation of e.g. bilinear maps V x V -> scalar (e.g. v^i A_{i,j} w^j) as this requires a matrix with two lower indices (e.g. living in V* ⊗ W*). Of course, you can still use it in practice and write it as v'*A*w, but it kind of conflicts with the chosen nomenclature of the objects. In addition, it doesn't specify in which space higher order arrays are living.
The latter case resolves that issue since it has (V == V*), but leads to surprising results as v' == v. In addition, this solution is actually limited to real vector spaces, since even in complex spaces with a standard Euclidean inner product (i.e. 'complex Cartesian space'), the dual space is not naturally isomorphic to V but to conj(V) (V bar), the conjugate vector space (see Hilbert spaces in http://en.wikipedia.org/wiki/Complex_conjugate_vector_space ).
Regarding the non-associativity, in that respect the current behaviour, where v'*v doesn't produce a scalar but an array, is actually more consistent, since then both A*(v'*v) and (A*v')*v produce an error.
The "Linear Algebra" side of things can be additionally resolved into different choices of how to represent vectors and covectors:
I wonder if it is strictly necessary to also conflate (scalars, 1-vectors, and 1x1-matrices) in the "no true vectors" world; @alanedelman and I were discussing this yesterday and in numerical linear algebra the commutativity of vector * scalar and scalar * vector is used everywhere, yet the vector * scalar product is the one that doesn't care if it's doing (N,) * (,) or (N,1) * (1,1).
1) At the end of the day, the most important things are A) ease of use and B) performance.
2) The two worlds should be able to coexist in complete harmony. When doing engineering work, my opinion is that there is nothing more intuitive and easy than tensor notation. One suggestion, if I may: enable an environment flag, or allow a package to be imported at the beginning of a program. This would allow for ease of use and straightforward programming logic. Isn't this possible, or must we choose one overarching schema?
It seems like there are two options
1) enable import of a toolbox, patch or environmental flag to work concurrently
2) build a language that is capable of unifying the special worlds, yet still be easy to use and fast
I'm not sure how promising this is but I just stumbled upon a potentially interesting area of research.
May I direct your attention to:
Geometric Algebra research group
http://www.mrao.cam.ac.uk/~clifford/pages/introduction.htm
Lecture Course in Geometric Algebra
http://www.mrao.cam.ac.uk/~clifford/ptIIIcourse/course99/
Physical Applications of Geometric Algebra
http://www.mrao.cam.ac.uk/~clifford/ptIIIcourse/
A unified mathematical language for physics and engineering in the 21st century
http://www.mrao.cam.ac.uk/%7Eclifford/publications/ps/dll_millen.pdf
Geometric Algebra
http://arxiv.org/pdf/1205.5935v1.pdf
Foundations of Geometric Algebra Computing
http://link.springer.com/book/10.1007%2F978-3-642-31794-1
Geometric Algebra Computing
http://link.springer.com/book/10.1007%2F978-1-84996-108-0
After looking into this a little bit more, it seems really amazing.
As long as algebra and geometry have been separated, their progress have been slow and their uses limited; but when these two sciences have been united, they have lent each mutual forces, and have marched together towards perfection.
Imagine needing glasses and not knowing you needed glasses. Then, when you do get glasses, the world changes unexpectedly. GA is like glasses for the inside of your brain.
This guy seems to have the right idea...https://github.com/wolftype/versor
Typical matrix operation libraries have templated inlined functions for Vector and Matrix multiplication. Geometric Algebra combines many other maths (matrix, tensor, vector, and lie algebras). Versor is similar, but on steroids, where vectors and sparse matrices of various sizes are all just called multivectors and represent geometric elements beyond just xyz directions and transformation matrices. Circles, lines, spheres, planes, points are all algebraic elements, as are operators that spin, twist, dilate, and bend those variables. Both these elements and operators are multivectors which multiply together in many many many different ways.
This video goes into some good detail about Versor, the programmed GA math language.
https://www.youtube.com/watch?v=W4p-e-g37tg
These are the slides for this. https://github.com/boostcon/cppnow_presentations_2014/blob/master/files/generic_spaces.pdf
You can do some amazing tensor computations, really easily, optimized for speed when you compile.
@esd100, I think it would be helpful to keep discussion on this thread focused to the specific issue in the title.
@johnmyleswhite I did a search for the term "tensor" and it was mentioned 171 times (now 172 times). While I agree with your statement, in general, this thread has 157 comments (now 158), some of which deal directly and indirectly with the title of the original post (Taking vector transposes seriously). I think that my post deals indirectly with the title of the original post by offering increased perspective of what can be done with new applications of tensor mathematics, via geometric algebra. My opinion is that building the kind of higher-dimensional power that Versor has into Julia would be an added beneficial feature of Julia. The title of the youTube video was "Generic programming of Generic Spaces: Compile-Time Geometric Algebra with C++11". I don't see why that couldn't be Julia instead of C++. Then again, I am not a mathematician or computer scientist.
@esd100 Geometric algebras are interesting, but are not the most general thing this issue covers. The geometric algebra we would primarily be interested in is the real Clifford algebra Cl(R^n, I); however, we would also be interested in other Clifford algebras which are _not_ geometric algebras, such as Clifford algebras over complex vectors Cl(C^n, I) or even algebras over arbitrary fields or noncommutative rings Cl(F^n, I). Furthermore, we do not even want to restrict ourselves necessarily to Clifford algebras, but want to consider how linear algebra generalizes to the more general tensor algebra setting where quadratic forms (inner products) are not necessarily defined.
In the tensor algebra setting, the "v' is a no-op" proposal in the OP makes perfect sense, since it generalizes consistently to the reversal antiautomorphism. However, this is only one of at least three proposals on the table. One could also consider bialgebraic extensions that lead to tensor coalgebras where the "v' produces a covector" proposal is very natural. I'm not sure how the "no true vector" proposal generalizes into a tensor algebra though.
@jiahao Thanks for your very informative response. It is nice to have someone with some mastery of the subject shed some light. I'll be interested in looking more into those topics to get a better understanding. I am curious about why super optimized code for BLAS exists, but super optimized code for more complex algebras, like Clifford Algebra doesn't.
I'm not sure if this discussion is still ongoing, but I wanted to share my opinion because these issues are some of the most important (and annoying) parts of me using Julia. Some of the above I agree with and some I do not.
Basically, I think we need to realize this: All Julia users will want fast, multidimensional arrays (for data storage) as already implemented in Julia. Many/most users will also be interested in linear algebra, and the aforementioned arrays are a convenient way of packaging the data contained in vectors and matrices. A few users will want to do generic tensor (multilinear) algebra.
The question is how to extend these "bare" arrays to the mathematical concepts of linear algebra? How do you make the syntax look like normal mathematics? Will people be comfortable that v'' != v? Etc, etc.
The first thing is, we don't want to make Arrays any worse, just so we can do linear algebra. We don't want to add additional data to Vector or unnecessary template arguments to Array. We want to be as fast as C/C++ when not doing linear algebra (and hopefully as fast as C/fortran, when we are).
The second thing we should all realize is that full multi-dimensional tensor contraction requires significant amounts of extra information. For up-to-2-dimensional tensors you can write any multiplication like A*B'*C*D..., while for general tensors it's more natural to think of a graph to define the contraction (a tensor network diagram, or a factor graph - the same thing goes by many different names). For high dimensional tensors it makes sense to wrap the bare arrays and decorate them with vector spaces, etc to keep track of everything. This can be done in packages such as those which Jutho (and possibly others) are working on. There is no point considering the meaning of ' on 3+ dimensional tensors - nobody would be interested in using that function when permutedims exists, or if they are using a dedicated tensor package.
The core users will need to multiply matrices, add vectors and so forth. They will want to transpose matrices. They will also want an inner product and outer product. Like it or not, these are defined in conjunction with a dual space. We know what these are for real numbers and complex numbers, and have them pre-defined. And yes, it's true that the dual space of a real finite vector space feels like basically the same thing, and I believe this is what is clouding the whole issue. The biggest current problem is that we don't have a proper dual vector. Without it, we can't "write" equations in Julia as we do with pen and paper - a huge problem! Even more importantly, we don't have v'' = v. This is very unintuitive.
People will only want or need to create a dual vector for performing inner or outer products. A simple decorative wrapper for telling the compiler what to do when it next encounters * is a sensible solution and should have zero run-time overhead. I think these dual vectors/covectors should be indexible, addable/subtractable, multiplied by a scalar, multiplied by a vector (returning a scalar) and multiplied by a matrix (returning a covector) - and that's about it! I wouldn't even explicitly perform the complex conjugation for complex vectors - that could be built into the dot product, outer product, indexing, etc for speed/memory improvements.
I'm not concerned that we are adding complication to the type system. I believe this is necessary to encapsulate mathematics as the users learnt it in high school and university: to have associative * (where well-defined), to have v'' = v, to have both "bras" and "kets" (I'm using Dirac notation of linear algebra here), etc.
BTW if this stuff has been implemented in Julia 0.4, I'm oblivious! I'm still working on understanding 0.3.4...
@andyferris, I generally agree with all of this. I think that my proposal here could be modified slightly and work well: instead of M*v' being an error, have it produce a covector and instead of v*M being an error, have it produce a vector. To me, this conceptually amounts to M being agnostic about whether its dimensions are up or down – you can write v'*M*w or v*M*w' for an inner product and either one will work.
One thought that's been bouncing around is having row-major and col-major arrays and having M.' just change from one to the other and similarly have v' be the row-major variation on v – which has the same shape but is conceptually stored in the other order. Not sure about this though.
+1 to @andyferris. I also think we just want simple wrapper types Transpose and ConjTranspose that allow us to use the concise matrix algebra syntax for writing products of vectors, matrices and transposes thereof. With respect to point 2 of the proposal of @StefanKarpinski, I would not restrict these wrappers to be a container of AbstractArray objects and I would not make the types part of the AbstractArray type hierarchy itself. Transpose(x) should just be the type that results from writing x' in an expression and allows to defer its evaluation / have lazy evaluation, depending on the rest of the expression, by dispatching on Transpose for the rest of the operations in the expression (which will probably be the multiplication operator * in 99.9% of the cases). However, people should be able to also give this syntax a meaning for new types which might not be part of the AbstractArray hierarchy, such as the matrix factorization objects or other types for defining e.g. linear operators.
For the specific case of matrices and vectors, I don't really see the use case for M*v', but I am not intrinsically opposed to it. The important thing is that it fulfills the basic expectations (i.e. rules 2, 4, 5, 6 and 8 of the end of Stefan's proposal).
The main final point of discussion is probably the interplay with slicing. Should a row slice of a matrix automatically be a Transpose? My vote here is on APL slicing rules where this is not the case.
@Jutho, the motivation is to make * associative – A*(v'w) and (A*v')*w both work and produce the same results, modulo non-associativity of the underlying element type (e.g. floating-point).
In order for (A*v')*w to give the same result as A*(v'*w), A*v' would need to be an N=3 array. Is that what you had in mind? I completely missed this.
OK, very interesting, I have a few comments.
First we need to address the core users. Basically, first-and-foremost using Array{T,n} as n-dimensional storage is its primary function. It has to make sense to programmers who have no idea about linear algebra but who want to manipulate data in Julia. Because of this, under no circumstance can an array slice return a co-vector! This is a strictly linear algebra concept that applies only to certain data types T that can define a vector space and its dual (e.g. numerical data or "fields").
I could go either way with APL slicing. It seems intuitive enough. It is type stable and doesn't do anything surprising. Though I don't think it is necessary to make this change to make linear algebra self consistent... it might be nice (though it may be a breaking change). I find it unsettling that M[1,:] is size 1xn and M[:,1] is size n (not nx1)... it seems too asymmetric... but I guess it is a good reminder about how things are laid out in memory. Anyway, this change should only be made if it makes sense for abstract data storage and manipulation. To me this change is orthogonal to the linear algebra discussion.
So, even if we don't implement APL slicing rules, we can still salvage meaningful linear algebra quite straightforwardly. If you want the i-th column vector of M call colvec(M,i) and if you want the i-th row covector of M call rowvec(M,i). If i was a tuple or vector it could return a tuple or vector of (co)vectors (could this be useful for reasoning about parallelization in some cases?). If you would prefer a matrix, just use the normal indexing notation. If we use APL slicing rules, the same thing is very useful for distinguishing the action of row and column slices with the * symbol. (One needs to consider how complex conjugation behaves with rowvec.)
That way, if you are doing linear algebra everything will make sense and the user won't have to worry about what particular slicing rule Julia implements. The operator ' would only act on vectors and covectors (to change the decoration) and on matrices (either in a direct or lazy fashion). It may be easier to remember how to extract basis vectors from an eigendecomposition, which I always seem to forget.
For *, I think that matrix/vector algebra in Julia should work so it looks and feels like mathematics, with * as the multiply operator and never as the tensor product operator between two vectors, two covectors, two matrices, etc. We shouldn't allow M*v', because strictly in mathematics you need to write that equation with the "\otimes" symbol (the times sign inside a circle), not the standard multiply symbol. Using the multiply symbol its meaning is ill-defined! We can't have w*M*v' == v'*M*w because matrix multiplication is not commutative. Again, for M*v'*v it doesn't make sense to talk about associativity of *. You can either interpret this as innerproduct(outerproduct(M,v'),v), which requires two different multiply symbols, or as scalar multiplication M * innerproduct(v',v), where it _does_ make sense to use * for both. With this expression we might require bracketing, or perhaps the compiler could search for a meaningful order of operations (note here that the fastest order of evaluation is also what I consider the only valid order of evaluation).
With vectors, covectors and matrices we can have an algebraic system that is consistent with standard mathematics. Whenever you want to perform the outer product between two vectors, or two covectors, or two matrices, you are inherently moving to multi-linear algebra, or generic tensor algebra. Here you could possibly have matrices and vectors that multiply like kron(M,N) using a new symbol. Or you could require users to use a full multi-linear algebra package. Or whatever... it seems beyond the scope of the original question (which was 14 months ago, btw...)
So to summarize, I see four almost completely distinct things happening here, among them: Arrays of arbitrary data using APL slicing rules; and transpose, conj, etc. via wrapping types much like covector, but for matrices as well. Possibly only the first will be a breaking change? The others improve on or add to current functionality. They can all be implemented more-or-less independently, and they are probably all worthwhile things to do. (PS if you do want APL slicing, I recommend doing it soon, before Julia gets too "big" and it breaks too many packages, etc).
Yeah, sorry, my suggestion of allowing M*v' doesn't really make any sense since the different ways of associating products produce completely different shapes. So, I think if we're going to change this my original proposal is the way to go. So far that seems to be the best combination with APL slicing behavior. Of course, whether we want APL slicing behavior or not is a higher level consideration.
I just wanted to say a few things and please take them with a grain of salt. They are just opinion and I know mine don't carry much weight. However, one thing struck me as I was reading @andyferris comment. I don't think I agree with the comment that it has to make sense to programmers who have no idea about linear algebra but who want to manipulate data in Julia. I don't think programmers are the audience that mathematical programs should be written for. If a programmer wants to manipulate data, there are many languages that will allow them to do that. Julia is a language for computing and it should allow sophisticated technical computing.
@esd100 True, I did think of that. But with Julia, I thought we wanted to be _greedy_... wanted to cater to many purposes, and to excel at many things. There may be genuine scientific reasons for manipulating non-numerical data. There may be current or ex-scientists who are Julia users that want to do more general programming. My earlier point was that we shouldn't make Array slower or take more memory by giving it an extra field to talk about transpose/conjugation.
If one does APL slicing, then either row or column slices should return the same object. The question was: without APL slicing, what is the best thing for M[1,:] to return? Well, if M[1,:,:,:,:] returns an Array{T,n}, I feel it would confuse everyone if M[1,:] returned a covector{T}. What if we let M = zeros(3,3,3) and call M[1,:], which currently returns a 1x9 matrix? Should we make this a covector too? What if M was Array{String,3}?
To me this would seem quite surprising and confusing. It's also not consistent with APL slicing changes, if that were to happen in the future. Thus, I was suggesting adding new functions for linear algebra purposes, rowvec(M,i) and colvec(M,i). It's not ideal, it adds new functions... but at least it is clear what they should return for linear algebra purposes. It was the only thing I could think of that had nice properties for generic arrays and nice properties for linear algebra (along with a covector type), while not attempting to cater to multi-linear algebra. If anyone has a nicer notation for extracting covectors from matrices that would be nice too!
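A sketch of what those (hypothetical) helpers might look like, reusing the Covector wrapper sketched earlier; whether the extracted row should be conjugated is exactly the question raised further down:

```julia
# Sketch only; not proposed API.  colvec returns an ordinary Vector,
# rowvec wraps the literal row values in the hypothetical Covector type.
colvec(M::AbstractMatrix, i::Integer) = M[:, i]
rowvec(M::AbstractMatrix, i::Integer) = Covector(vec(M[i, :]))
```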
@esd100: I am pretty sure that programmers are the audience that Julia is designed for. Scientific computing is Julia's strength but there are many Julia users that use it just as a general purpose programming language.
I agree with @andyferris that one has to split the tensor requirements from the container requirements.
Without having read the entire thread, my personal issue is that if I have a 3D array A (e.g. tomographic data) and do imagesc( data[1,:,:] ) this does not work. IMHO data[1,:,:], data[:,1,:], and data[:,:,1] should be 2D arrays (or subarrays). Currently I use a self-defined squeeze function to overcome this issue.
Again, this is just my opinion and a very unimportant one, given that I am far from the action. I feel like when an application is developed it should have a set of design principles which guide it. The builders must send a clear, unified message about its purpose and its identity. If developing a fast, intuitive, sophisticated, technical computing platform is the purpose and that is Julia's strength, then stick to it. Don't send mixed signals by trying to make it's purpose fit to the audience of general programmers.
The point is that even for purely scientific or mathematical applications, there are several conflicting interpretations, e.g. there are several mathematical objects that you could represent using vectors and matrices, and hence one shouldn't be making too specific choices.
It's better to have dedicated packages to satisfy the needs of specific disciplines following a specific set of conventions.
@jutho My gut feeling is that you are right, but I wonder what the basis is of your conclusion that packages are "better" and in comparison to what alternatives?
This is quite simple. Functionality that is important for most of the Julia users belongs in Base. Functionality that is more special belongs into a package.
Of course it is not very easy to draw a line here. But the best way to get something into base is to create a package so that many people can test this. Various core functionality that has landed in Base was first developed in a package.
@esd100, I don’t fully understand your question. In the end, what a scientific language should provide is the data structures and methods to represent and compute with typical objects used in science. But certain data structures might be useful to represent different mathematical structures, and sometimes different representation might be convenient for objects of the same type. Hence, associating a fixed mathematical structure to a data type might be too restrictive for some disciplines, and too overly complex for other disciplines. So this should not be pursued in Julia base, but by specific packages that only try to serve the needs of one discipline.
With respect to the current discussion, everybody working with vectors and matrices will expect the possibility to transpose a vector, and to have v^T * w = scalar. But these vectors and matrices might be used to represent a number of different things, and this will probably depend on the field / discipline. I gave examples of where v^T might not be an actual covector in posts above.
This is quite simple. Functionality that is important for most of the Julia users belongs in Base. Functionality that is more special belongs into a package.
I don't think it's quite so simple. For instance, there are Bessel functions in Base, and it's highly unlikely they'll be important for most Julia users. The reason they can be in Base is that there's one reasonably universal standard for what the Bessel functions are, and nobody's likely to use those names for something else, or expect them to do something different.
Through careful (some might say verbose) naming conventions, Mathematica is able to put over 4000 functions in the core language, and I find it extremely convenient to almost never have to load a package. In contrast, when I'm using Python/Sage, it's not unusual for one file/notebook to explicitly import 200 functions at the top (avoiding the more untraceable "from ___ import *" syntax). Every time I need to use a function that's not a built-in, I have to go through a bunch of extra steps:
(1) Try to remember if I've used it already in that file, and/or do a text search to confirm (restricting to whole words if the name's a substring of something else)
(2) Either: (a) add the import right away, losing my train of thought and finding it again; or (b) try to remember to add the import after I've done whatever I was doing, keeping another thing in memory
(3) For less common functions, possibly have to look up which package it's in
This can get annoying and debilitating. So I think that, like the Bessel functions, anything that's considered standard---and thus won't cause name conflicts or confusion---should be in Base, regardless of whether lots of people use it. Certainly linear algebra.
Back to topic, we've been talking about several layers of functionality---going from arrays as semantically bare containers (as implemented in Base.AbstractArray), to Cartesian tensor objects with up/down semantics (as described by my AbstractTensorArray proposal), to more general tensor objects with indices mapped to vector spaces (as in @Jutho's TensorToolbox)---of increasing generality and decreasing optimization potential.
I think it's important for users to be able to transition easily between these layers, not least because users _may not know_ what level of generality they actually need when they start out. As a simple example, @Jutho pointed out that in @jdbates's image processing example, users might very well want to associate different subsets of indices to distinct geometries, so as to process image coordinates in one way and color space in another way; but if they had started out using just one of these geometrically, it should be convenient to upcast to a more general representation that allows using each with its own geometry. If the additional functions (or function call patterns) that kick in at each level of generality use sensible defaults---for instance, upcasting up/down indices and neutral indices in AbstractTensorArrays to single default Cartesian vector spaces and "sequence" spaces, respectively---then it becomes almost seamless. Users of a lower layer of functionality need not know or care about the higher layers, until they need them.
Any part of this for which it's clear and predictable---to the relevant users---what the hierarchy should be, could reasonably go in Base. The real question should be---whether for array operations, linear algebra, or tensor algebra---how likely is it that a user of this type of functionality would expect or want it a different way?
Plenty of people dislike the ridiculous granularity of how many imports you need to get anything done in Python-land, I don't think anyone is proposing to go to quite those lengths.
But there is an important argument that large numbers of dependencies and library bloat are undesirable for some use cases, Julia the language really shouldn't require having Fortran runtime libraries installed on your computer but at the moment the Julia standard library does, whether or not the code you want to run is doing any linear algebra (or Bessel functions, or FFT's, etc). This is all well-covered by #5155 and other issues though - "installed by default" and "minimum necessary for any functionality" should eventually be separated out into different sets of modules.
Thanks Tony, I second that. Further, with our package system it is really no problem to have "toolbox" like packages which group several together. With the @reexport macro this works well.
But there is another point. Within a package it is much easier to push an idea forward. If one wants to change something one simply does it. Within Base one is much more restricted.
It seems that while having a patchwork of packages does allow for greater independence, it also creates a fragmented platform which is more difficult to navigate. It seems that using a more unified, generalized structure would streamline and facilitate cross discipline interaction and exploration of new domains.
What's currently the most idiomatic way to do vector * matrix?
E.g. say I've got X with shape (K, N) and b with shape (K,), and I want to multiply by b on the left to get a vector of length (N,).
Do I call BLAS.gemv('T',1.0,X,b) or reshape(b'*X, size(X,2))? Both are a bit ugly.
I guess you could do X'b
Yeah, I had thought that would trigger a copy of X.
But @time seems to indicate that there's no copy, based on memory allocation
julia> X = rand(1000,1000); y = rand(1000);
julia> @time y'X;
elapsed time: 0.00177384 seconds (15 kB allocated)
julia> @time X'y;
elapsed time: 0.000528808 seconds (7 kB allocated)
We do fancy parsing of ' so X'y ends up as Ac_mul_B(X,y) and makes the same BLAS call you suggested.
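For later readers, a few equivalent spellings of that product, assuming X is K-by-N and b has length K with real element types (illustrative only; module paths as of the Julia versions current in this thread):

```julia
X = rand(5, 3); b = rand(5)

y1 = X'b                                    # parsed as Ac_mul_B(X, b); no copy of X
y2 = vec(b'X)                               # b' builds a 1-by-K row, so flatten the 1-by-N result
y3 = Base.LinAlg.BLAS.gemv('T', 1.0, X, b)  # the explicit BLAS call from the question
```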
Believe it or not, a writeup of this entire thread is getting closer to materializing.
One final point: has anyone considered that .' could actually be a totally different animal from '? The latter has the right structure for being a dual, but .' does not if the underlying field isn't the real numbers. Is it crazy to assign .' the "reverse all indices" notion of transpose and reserve all notions of duality to ', which produces a "funny row vector"/Hermitian conjugate?
I think that in any case conj(transpose(x)) should be equivalent to ctranspose(x).
As argued above, I hope that none of the wrapper types that transpose and ctranspose will create will be given a name containing a specific mathematical concept such as dual. A scientific language such as Julia should provide a set of useful data structures and should implement common operations, but I think it would be a mistake to associate a strict mathematical structure to it, as this will not be appropriate for some applications and too complex for others. It is up to the user to use these data structures and operations to implement the mathematical operations in his/her application. There are certainly applications for z.' * A * z returning a scalar where z is a complex vector and A some matrix, even though this has nothing to do with an inner product and/or dual vectors.
@jutho could you give a use case for z.' * A * z?
If z1 and z2 are the representations of two complex vectors Z1 and Z2 (e.g. tangent vectors in the holomorphic part of the tangent space of a Kähler manifold) and a is the matrix representation of a tensor A with two covariant indices (e.g. a complex (2,0)-form of the Kähler manifold), then A(Z1,Z2) = z1.' * a * z2.
Note that I stress here that the Julia objects z1, z2 and a only form a _representation_ of certain mathematical objects (with respect to a chosen basis/coordinization) and therefore operations on data structures cannot be uniquely associated to mathematical operations without knowing what these data structures represent.
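A tiny runnable illustration of the distinction (the values are arbitrary): .' gives the bilinear pairing with no conjugation, while ' conjugates the first argument.

```julia
z1 = [1.0 + 2.0im, 3.0im]
z2 = [0.5 + 0.0im, 1.0 - 1.0im]
a  = [1.0 0.0; 0.0 2.0]

bilin  = first(z1.' * a * z2)   # sum_{ij} z1[i]*a[i,j]*z2[j]        -- A(Z1,Z2)
sesqui = first(z1'  * a * z2)   # sum_{ij} conj(z1[i])*a[i,j]*z2[j]  -- inner-product-like
```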
@jutho thanks. Your point about representations is well taken, and has been represented many times in this discussion. I'm trying to find the minimal interface of vectors and matrices, and see whether that minimal interface is in anyway fundamentally incompatible with the minimal interface of arrays, and what we can offload to user-defined abstract data types.
At this point I'm totally in favor of the @StefanKarpinski proposal, also mostly echoed by @andyferris above. In particular:
APL style indexing
v' gives some kind of Covector or Transpose type
Everything else is a detail. It might be nice to add row(M,i) and col(M,i) functions. Otherwise, to extract a row as a covector I guess you would need M[i,:].'? IIUC, M[i,:]' would do a conj not wanted in this case?
Yes, I should add that I didn’t fully appreciate the niceness of APL-style indexing last time I wrote, but now I am in full support of making that change. This in turn makes the dual vector stuff even more compelling, and the row/col functions.
Since then I tried playing around with an implementation and the single-most confusing thing was whether the covector extracted from a matrix was conjugated or not…
I think you want to take the literal values of the matrix. Using Dirac notation, expand (any) matrix as M = sum_{ij} |i⟩ M_{ij} ⟨j|; then row(U,i)' = col(U’,i), which makes perfect sense for extracting left and right eigenvectors from an eigendecomposition.
Any volunteers to start working on this? e.g. @mbauman, @jakebolewski who I'm surprised to see aren't in this thread already :)
Finding everything that needs to change might be tedious, but the basic change to indexing behavior should not be too bad. Probably @jiahao and @andreasnoack can say more about how Covectors should be incorporated, e.g. what their supertype should be if anything.
We need 9, no 8 more comments before we can go ahead with this.
I can help with that.
we are pretty close
As a relevant comment, if there will be a Transpose and CTranspose wrapper type, should there also be a plain Conjugate wrapper type, such that conj(A) is also lazy? For multiplying matrices with BLAS this is not really useful since there is no special support for that (and C in BLAS means hermitian conjugate), but if there ever is a full Julia BLAS implementation it would be great to also support conj(A)*B without explicit conjugation.
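A sketch of what such a lazy wrapper could look like (hypothetical, Julia 0.3-era syntax): conjugation is deferred until elements are read, so a pure-Julia kernel could compute conj(A)*B without materializing conj(A).

```julia
immutable Conjugate{T, P<:AbstractMatrix}
    parent::P
end
Conjugate{T}(A::AbstractMatrix{T}) = Conjugate{T, typeof(A)}(A)

Base.size(C::Conjugate, dims...) = size(C.parent, dims...)
Base.getindex(C::Conjugate, i::Integer, j::Integer) = conj(C.parent[i, j])
```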
I, for one, feel like I now take vector transposes much more seriously than I did before.
Perhaps @andreasnoack and @simonbyrne can tell us if #6837 ought to be revisited.
I agree with @simonbyrne that we should not have Transpose{Array} <: AbstractArray.
Other general thoughts:
1) If u has size (n,), then u * u' is a computation involving (n,) x (1, n). This product cannot be computed using the ordinary rules of matrix multiplication _unless_ we automatically reshape the first argument to be (n, 1).
2) Arrays of shape (n,) are semantically identical to reshaped arrays of shape (n,1) and shape (n,1,1), etc. (under the trailing-singleton rule). But note that if transpose reverses all dimensions, then the resulting arrays have shape (n,), (1, n), and (1,1,n), which _cannot_ be equivalent if you are only allowed to add trailing singletons. Taking this to the logical extreme, the transpose of an array can have an arbitrary number of _leading_ singletons and therefore has ambiguous shape, which is logically inconsistent.
I also did a literature survey and uncovered some interesting APL history. What we have referred to as the APL indexing rule was not in Iverson's 1962 book, but did exist in APL\360 (1968; the first implementation of APL). However, APL\360 conflated scalars and 1-vectors, an inconsistency that was not recognized in the literature until (Haegi, 1976). A discussion of the formal semantics of multidimensional arrays first appeared in Brown's PhD thesis (1972; he later architected APL2), and spurred a line of work formalizing their semantics.
An excellent survey of developments leading to APL2 is:
Notable in the literature for the attention paid to indexing rules are:
Also, without the "can add trailing singleton dimensions" rule, we do not obey the identity tr(Avv') = tr(v'Av) = v'Av, since the left hand side is a scalar (by definition of trace) and the right hand side has shape (1, n) x (n, n) x (n,) = (1,). Conversely, we can take this identity as a guiding principle for selecting vector transposition semantics. The main point is the cyclic property of the trace operation that defines the first identity. The quantity inside the trace must be either a scalar or a matrix (the trace of a vector is not defined). Avv' is already unambiguously a matrix: (Av)v' and A(vv') produce the same result. But also from the second quantity, v'Av must be a matrix _or_ a scalar. (If scalar, the second identity also holds.) v'Av can be a 1-vector only if the "can add trailing singleton dimensions" rule is active, in which case it can be transparently reshaped to a 1x1 matrix.
Thus if we want the trace to be defined, then it necessarily imposes restrictions on the allowable shapes of outer products vv' and of quadratic forms v'Av.
@jiahao: I'm not quite sure what you are arguing for or against. Can you state the "can add trailing singleton dimensions" rule a bit more clearly? When does it apply? Do you think that we should support it?
I think that some of your arguments can be taken to strengthen the position that we cannot think of transposition as reversing all dimensions (a column vector would be transposed into a column vector, or possibly an array with arbitrary number of leading singleton dimensions). To remain consistent with matrix algebra, I think it has to be taken as swapping the two first dimensions instead. Then I believe that some of the contradictions you state disappear. Hopefully, the rest disappear if we don't add any singleton dimensions when transposing (the absent first dimension of covector doesn't count).
My general comments above are not to advocate these rules, but rather to help delineate the design space of array indexing rules and their interaction with linear algebra identities.
The "can add trailing singleton dimensions" rule is used by MATLAB and (according to @alanedelman) was introduced to support multidimensional arrays. The index-to-offset calculation used in MATLAB arrays is defined by sub2ind
, which returns the same linear index regardless of how many trailing 1s you throw at it. Furthermore, MATLAB's matrix indexing documentation states that for indexing operations:
The number of subscripts specified for B, not including trailing subscripts equal to 1, does not exceed ndims(B).
More formally, I think it can be stated as:
For operations taking arrays as input, (n,)-arrays, (n,1)-arrays, and (n,1,1...)-arrays are all in the same equivalence class with regard to being valid arguments to these operations. (If n=1, then the equivalence class also includes scalars.)
Examples:
A*b and A\b where A is a matrix and b can be a vector or a matrix (identical semantics for n x 1 matrices),
hcat(A, b) where A is a matrix and b can be a vector or a matrix.
For the most part, Julia does not have this "can add trailing singleton dimensions" rule, except for the outer product example. Possibly there are others as well but I cannot think of them right now.
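A quick check of those examples in Julia (shapes in comments): the operations accept either form, but unlike MATLAB the result keeps track of which one you passed.

```julia
A = rand(3, 3)
b = rand(3)

size(A*b)                   # (3,)  -- vector in, vector out
size(A*reshape(b, 3, 1))    # (3,1) -- one-column matrix in, matrix out
size(A\b)                   # (3,)
size(hcat(A, b))            # (3,4) -- hcat accepts the (n,) form directly
```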
As long as Transpose{A<:AbstractArray} is not a subtype of AbstractArray, I think I don't see a problem (but probably I am overlooking something as I haven't given this as much thought as you guys). With
typealias AbstractVectorTranspose{A<:AbstractVector} Transpose{A}
typealias AbstractMatrixTranspose{A<:AbstractMatrix} Transpose{A}
typealias AbstractTMatrix Union(AbstractMatrix, AbstractMatrixTranspose)
(and similarly for CTranspose) we can have:
AbstractVectorTranspose * AbstractVector = Number
AbstractVector * AbstractVectorTranspose = AbstractMatrix
AbstractVectorTranspose * AbstractTMatrix = AbstractVectorTranspose
AbstractTMatrix * AbstractVector = AbstractVector
AbstractTMatrix * AbstractTMatrix = AbstractTMatrix
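A rough sketch of how a couple of these methods might be written with such a wrapper (the Transpose type and the method bodies here are purely illustrative, not a worked-out design):

import Base: *

immutable Transpose{A<:AbstractArray}
    data::A
end
typealias AbstractVectorTranspose{A<:AbstractVector} Transpose{A}

# covector * vector -> scalar (non-conjugating transpose, so a plain sum)
*(vt::AbstractVectorTranspose, w::AbstractVector) = sum(vt.data .* w)
# vector * covector -> matrix (outer product)
*(v::AbstractVector, wt::AbstractVectorTranspose) =
    [v[i] * wt.data[j] for i = 1:length(v), j = 1:length(wt.data)]
# covector * matrix -> covector, computed as (A.' * v).'
*(vt::AbstractVectorTranspose, A::AbstractMatrix) = Transpose(A.' * vt.data)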
The only open question is whether AbstractVector * AbstractTMatrix
should be supported when AbstractTMatrix
has 1 as first size, or whether AbstractVector * AbstractVectorTranspose
is sufficient.
Also, the new type system might help with expressing some of those typealiases and unions a bit more carefully.
Also, if e.g v.'*A
is calculated as (A.'*v).'
, then the need for a Conjugate
wrapper pops up if A
itself is e.g. A=B'
.
I agree with @simonbyrne that we should not have
Transpose{Array} <: AbstractArray
.
Can you elaborate there? I thought the opinion in https://github.com/JuliaLang/julia/issues/4774#issuecomment-59428215 was that CoVector should not be a subtype of AbstractVector, but it would seem a little strange to me not to have Transpose{Matrix} <: AbstractArray
.
For what it's worth, I think that CoVector
should mostly behave like Vector
except that Vector
converts to Matrix
as a column matrix while CoVector
converts to Matrix
as a row matrix.
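A tiny sketch of that asymmetry (CoVector and tomatrix are hypothetical names used only to make the point concrete):

immutable CoVector{T}
    data::Vector{T}
end
# a Vector converts to a column matrix, a CoVector to a row matrix
tomatrix(v::Vector) = reshape(v, length(v), 1)
tomatrix(c::CoVector) = reshape(c.data, 1, length(c.data))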
I guess that would mean indexing into a covector should work the same as into a row matrix?
That's an interesting thought. Do things get easier or more complicated if only _leading_ singleton dimensions are dropped in transpose/covector types?
(I've been following this issue with interest but my linear algebra is rusty enough that I've not felt qualified to contribute).
@mbauman:
Do things get easier or more complicated if only leading singleton dimensions are dropped in transpose/covector types?
If you allow an arbitrary number of leading singleton dimensions, then array indexes are no longer well-ordered and there is no such thing as "the" first dimension, which is bizarre enough that we might as well take well-ordering as an axiom.
@tkelman:
I thought the opinion in #4774 (comment) was that CoVector should not be a subtype of AbstractVector, but it would seem a little strange to me not to have Transpose{Matrix} <: AbstractArray.
It's become clear to me that this issue is entirely about separating array semantics (c.f. #10064) from linear algebra semantics, and looking for places where they are intermixed.
If we define the cat
functions to be part of the essential interface of an AbstractArray
, then Transpose{<:AbstractArray}
is clearly not an AbstractArray
because their concatenation behaviors are different. If we consider only shape and indexing to be part of the essential interface then the situation is less clear.
If we require concatenation as part of the essential interface of an AbstractArray
, then it is also easier to justify why types like SymTridiagonal
are not AbstractArray
s, since concatenation operations over SymTridiagonal
s like [SymTridiagonal(randn(5), randn(4)) randn(5)]
are currently undefined.
@toivoh:
I guess that would mean indexing into a covector should work the same as into a row matrix?
There are identities that suggest that indexing behaviors of Transpose{Vector}
should be the same as ordinary Vector
s. Consider that for numeric types, v[1]
produces the same result as v' * e₁ = v ⋅ e₁
and v[1:2]
produces the same result as v' * [e₁ e₂]
, where e₁
is the canonical basis Vector{Int}
[1, 0, 0, ...]
and e₂
is [0, 1, 0, 0, ...]
. If we require these identities relating indexing, inner products and transposition to hold, then one could claim that
(v')[1] == (e₁' * v'') == (v' * e₁)' == (v ⋅ e₁)' == conj(v ⋅ e₁) == conj(v[1])
(where the first step is a new axiom and the fourth makes use of the fact that transposing a scalar is a no-op) so that indexing into Transpose{Vector}
would essentially ignore the transposition, and indexing into CTranspose{Vector}
would conjugate the indexed elements.
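A quick numerical illustration of the values involved, written against current behavior (dot conjugates its first argument):

v = [1.0 + 2.0im, 3.0 - 1.0im]
e1 = [1.0, 0.0]
v[1] == dot(e1, v)        # true: v[1] agrees with e₁' * v
conj(v[1]) == dot(v, e1)  # true: the value the proposed (v')[1] would return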
It's become clear to me that this issue is entirely about separating array semantics (c.f. #10064) from linear algebra semantics, and looking for places where they are intermixed.
- array semantics are defined by functions like size, length, getindex, setindex, hcat, vcat, reshape, rot90...
- linear algebra semantics are defined by functions like +, -, *, /, \, ', trace...
+1 to that point of view and to not have Transpose <: AbstractArray
. Also, when indexing a covector, it should be with a single index, since otherwise the result of covector * vector (contracting over a single index) could not result in a scalar (an object with zero indices).
@jihao: I'm not sure I see why we need or want
(v')[1] == (e₁' * v'')
as a new axiom. Though even if a covector would index as a row matrix, I think we would get the same result for the above due to linear indexing.
And +1 to seeing covectors as concerned with linear algebra.
But there's no reason that concatenation with a SymTridiagonal
should not be defined, right?
@toivoh linear indexing for an array is defined by first reshaping an array into a vector, then indexing the new vector. So for linear indexing to give the same result for transposed arrays, the indexing rules for transposed vectors must first be defined so that we may derive the linear indexing rules from that. (Also, you're pinging someone else.)
I thought linear indexing was about storage order, and there's really no other sensible order for linear indexing of a vector anyway, right? (And sorry about the misspelling.)
There is no unique sensible traversal order in memory. Even for Fortran arrays you could choose to store the elements in column major, row major, or even reverse column major order (which is exactly what the original IBM Fortran I compiler did). Furthermore there are other data structures (see #10064) like tries which can be used as arrays and have even more options for traversal order.
The same could be said about vectors and matrices. Since linear indexing accesses the elements in the same order for a column matrix and its transpose (and the same as for a column vector), why should a covector be different? If it should be different, I think it should be that we wouldn't define indexing for covectors at all.
@toivoh yes the same definitions hold for ordinary arrays. That does not change the fact that linear indexing is derived from ordinary (tuple?) indexing in either case.
Indexing Transpose
objects could be disallowed. I'm not saying they have to be indexable, but if they are, they don't necessarily have to have the same indexing behavior. To be conservative, we can leave indexing undefined for now and see if any use cases show up.
Many pure Julia implementations of linear algebra functions will want to index into a Transpose, no? Writing pure Julia matrix multiplication (for non-BLAS number types) becomes easy if you do not have to distinguish between any possible case (normal, transpose, ctranspose, conj?) for the two matrices involved but just treat them as ordinary matrices. Cache-oblivious methods could be tried to obtain a somewhat local memory access pattern.
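For example, a naive generic multiply only needs size and getindex from its arguments, so any wrapper exposing those would work with it unchanged (a sketch that ignores cache behavior entirely):

function naive_matmul(A, B)
    m, k = size(A)
    k2, n = size(B)
    k == k2 || throw(DimensionMismatch("inner dimensions must match"))
    C = zeros(promote_type(eltype(A), eltype(B)), m, n)
    for j = 1:n, i = 1:m
        s = zero(eltype(C))
        for l = 1:k
            s += A[i, l] * B[l, j]
        end
        C[i, j] = s
    end
    return C
end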
Right, duh.
@Jutho: I agree. And to fit in there, covectors should index like row matrices, right?
@toivoh, if you mean that they should have an extra index 1 in front, I disagree and don't see how that is implied by my statement. I was only talking about matrix-matrix products. Matrix * vector or covector * matrix are different methods that require different function definitions, not only because they have a different memory access pattern, but also because they have a different return type (matrix * vector = vector, covector * matrix = covector), so there is a very practical reason in Julia not to mix these things.
In general, I am not a big fan of the ability to add extra indices 1 when indexing an N-dimensional array, or of e.g. the VecOrMat type alias. This allows for sloppy programming, which makes it easier to make mistakes and slower to detect them. I only see two useful ways to index an N-dimensional array: with exactly N indices, in case you are using it as a multilinear object, or with a linear index, when you are treating it as a vector in a tensor product space (e.g. for adding two such arrays or multiplying by a scalar). While that is sufficient for my use cases, I can accept that it is too limited for others.
@Jutho: Ok, I agree that it probably doesn't matter since the return type has to be different anyway.
Here's an attempt at describing what we are trying to do and give some axioms:
I think we have landed on a pretty clear consensus that the starting point is matrix algebra (which is based solely on matrices). For most operations, we know how they behave in the pure matrix setting.
I believe that what we are trying to do is to extend and refine the pure matrix setting in a consistent manner to have also true scalars and vectors, and because it seems to be needed for consistency, covectors.
Here's my view of scalars and vectors (as seen from the pure matrix point of view): A scalar is a matrix, which by its type is constrained to be 1 x 1. A vector is a matrix, which by its type is constrained to be n x 1. (I will argue below that a covector is a matrix which by its type is constrained to be 1 x n.) From this point of view we can give two axioms: (not very formally described below)
If we agree with the above, the transpose of a vector must be the kind of covector described above: transposition of an n x 1 matrix yields a 1 x n matrix. If we encode the fact that the size along the first dimension of the result is always one, we get the covector as described above.
If you start from the matrix algebra point of view, then what you say is correct. This is the model that MATLAB probably implements rather perfectly. Everything is a matrix. It's a closed system; all operations on matrices produce new matrices.
I certainly had the impression that the point of this issue (taking vector transposes seriously, since they are not just matrices) was to get away from that matrix algebra model, because it starts showing inconsistencies if you want to separate numbers from 1x1 matrices for reasons of efficiency. The alternative is then to follow the model of linear algebra, where there is a clear distinction between the field (scalars), the vector space (and its corresponding dual), and the space of linear operators/transformations (matrices).
I think that linear algebra operations in Julia are pretty strongly rooted in the matlab tradition, with some notable exceptions like having scalars and vectors, and not trying to second-guess the user. Anything that moves too far away from that is likely a very big disruption.
I believe that my axioms above should go a long way to resolve the inconsistencies that show up when you want to separate scalars and vectors from matrices (for reasons of efficiency and correctness). But I'm definitely open to hearing about other possible systems.
I agree with @Jutho here; the entire point of this issue is to move away from MATLAB's "everything is a matrix" semantics. MATLAB's "can add trailing singleton dimensions" rule is necessary for the universe to be closed under linear algebra operations, but that rule defines equivalence classes that contain members of type T
and others of type Array{T,N}
for all N
, and is the main reason why MATLAB's type system is undecidable. (See Theorem 1 of Joisha and Banerjee, 2006 - although the result is stated in terms of shapes, the problem really is about how changing array rank can change program semantics.)
But I still think we had pretty good consensus on that multiplication of scalars, vectors, covectors, and matrices should be associative (with exception for things like (v'*v)*v
where two non-scalars multiply to produce a scalar), and that e.g. v'*M*v
should produce a scalar, M*v
a vector and v'*M
a covector. I'm not sure how much farther it is possible to stray from everything-is-a-matrix semantics while still fulfilling these conditions.
What more is it that we are trying to avoid, and which properties would we want to gain instead?
How much worse is it if we would only conflate T
and Array{T,0}
in some cases? (E.g. indexing: M[:,2]
produces an Array{T,1}
, but M[2,2]
doesn't produce an Array{T,0}
)
We can still maintain valid semantics with Transpose{Vector}
.
I reread all the comments about associativity and concluded that most of that discussion is not actually about associativity per se, but rather the semantics of inner products and outer products. If you do find an expression that isn't about either, please point it out.
The problem with Matlab-type semantics is that M*v'
and v*M
sometimes work even though they shouldn't. If M
is m x 1
then M*v'
is valid and returns an outer product-like quantity (since v'
is 1 x n
). Similarly, if M
is 1 x m
and we have the "can add trailing singletons" rule then v*M
can be evaluated as the product of n x 1
and 1 x m
matrices, again yielding an outer product-like quantity.
The question of conflating T
and Array{T,0}
was also raised in the APL literature - in APL, multidimensional arrays are recursively nested, which raises the question of whether Array{T,0}
and T
are distinguishable. If not, they are "grounded arrays" (which recurse down to T
), otherwise they are "floating arrays" (which recurse down to Array{T,0}
only). I believe More, 1973 actually proved that either choice is axiomatically consistent. I'm not sure if APL ever resolved the question of which to use before most practitioners retired or moved on to something else.
@jiahao: I didn't realize how fundamental your observation was that
v[i] = e_i' * v
ties together linear algebra and indexing semantics. But then you must also consider
M[i,j] = e_i' * M * e_j
which indicates that an inner product with a basis vector from the right corresponds to indexing along the second dimension. Thus, I would maintain that the i
th entry of a covector v'
should be indexed as
v' * e_i = v'[1, i]
where of course we would like to write something else than 1
as the first index, but what?
Anyway, since we allow
e_i' * v = v[i] = v[i, 1]
then 1
should be allowed as a placeholder also in the above case.
v' * e_i
is a scalar though, so e_1' * (v' * e_i)
is a covector, not a scalar. So the dimensions don't match to think of v'[1, i] = e_1' * v' * e_i
edit: I think this might be an argument against allowing trailing singletons in indexing?
Yes, matrix indexing is the next logical step, and e_i' * M * e_j
is actually one of the expressions where the associativity axiom becomes useful, since
(e_i' * M) * e_j = m_i' * e_j
e_i' * (M * e_j) = e_i' * m_j
should be equal. Matrix indexing can be derived from a vector indexing rule and a covector indexing rule.
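Numerically, with dot standing in for the covector-vector product:

M = reshape(collect(1.0:9.0), 3, 3)
ei = [0.0, 1.0, 0.0]   # e_2
ej = [0.0, 0.0, 1.0]   # e_3
dot(ei, M * ej) == M[2, 3]   # true, matching e_i' * M * e_j == M[i, j]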
I believe that a consistent resolution of this issue may require expressions like v[i, 1]
to be disallowed, since the rule that allows this indexing behavior
a) should cause spurious cases for A*v'
and v*A
to work (the former does but the latter doesn't because we apply the trailing singleton rule inconsistently), and
b) if we consider the equality of v[i] = v[i, 1, 1, 1]
then the corresponding covector indexing rule would look like (v')[1, 1, 1, i]
and one has to have the corresponding "allow arbitrary number of leading singletons" rule for covectors for consistency. I find the lack of a uniquely defined first dimension very disturbing.
I don't think this index reasoning is really going anywhere. Indexing is a general property of N
-dimensional arrays, and that they can be written as simple matrix expressions for N=1
or N=2
is rather trivial and does not contain any fundamental information. But this doesn't generalize to higher-rank arrays and should thus not be used to motivate whether a covector needs a leading 1 or not when being indexed. This quickly leads to inconsistencies, as observed in the previous posts.
As stated above, I have never been a fan of relying on trailing one indices and could not think of a single situation where it is necessary or even really useful. But I can accept that my point of view is too limited.
I have not fully digested all of this but my main question is simply: is size(covector)
equal to (n,)
or (1,n)
?
If they are not part of the AbstractArray
family, it is not even strictly required that size
is defined.
Though for practical reasons I guess it will be defined (like it is for numbers etc), and my vote goes to (n,)
. From the storage/container point of view, I would say there is no distinction between vectors and covectors. So there is also no need to use covectors as containers and therefore they do not belong in the AbstractArray
hierarchy. It's just a simple wrapper type to express that they behave differently than vectors with respect to linear algebra operations.
"As long as algebra and geometry have been separated, their progress have been slow and their uses limited; but when these two sciences have been united, they have lent each mutual forces, and have marched together towards perfection." - Joseph Louis Lagrange
I wish I could contribute more, but I thought I would just say that I am in favor of having a system which favors making the sophisticated mathematical tools employed by physicists and engineers intuitive to use and accurate. Perhaps this resource from MIT might be of use...
http://ocw.mit.edu/resources/res-8-001-applied-geometric-algebra-spring-2009/lecture-notes-contents/
Actually, come to think about it, it's more fair to say that
e_i' * x = x[i, :] # x is a vector or matrix
x * e_j = x[:, j] # x is a covector or matrix
Then for matrix indexing we would have
e_i' * M * e_j = e_i' * (M * e_j) = e_i' * M[:, j] = M[:, j][i, :] = M[i, j]
e_i' * M * e_j = (e_i' * M) * e_j = M[i, :] * e_j = M[i, :][:, j] = M[i, j]
Currently this does not quite hold in Julia, e.g. v[i, :]
currently produces a 1x1 array and not a scalar. (But maybe it shouldn't)
The associativity of the matrix multiplication in e_i' * M * e_j
corresponds to the commutativity of slicing along different dimensions M[:, j][i, :] = M[i, :][:, j]
, which seems like a desirable feature to have.
By the above line of reasoning, we should have
v'[:,i] = conj(v[i])
@Jutho: I think this "indexing as repeated slicing" paradigm does generalize to higher dimensional arrays/tensors: You apply one slicing for each dimension to index, in any order. This corresponds to a series of contractions with first order tensors corresponding to e_i
, etc, also in any order (as long as you keep track of which dimensions match up).
I think this "indexing as repeated slicing" paradigm does generalize to higher dimensional arrays/tensors: You apply one slicing for each dimension to index, in any order. This corresponds to a series of contractions with first order tensors corresponding to e_i, etc, also in any order (as long as you keep track of which dimensions match up).
Yes, I am very much familiar with tensor contractions etc., and indeed getting a matrix element / tensor element does correspond to taking an expectation value in the standard computational basis, i.e. contracting with some basis vectors (as long as you have an orthogonal basis, at least), but there is no unique association as to whether an index corresponds to a contraction with e_i
or with e_i'
. That corresponds to deciding whether the corresponding index appears in a covariant or contravariant position, and there is no unique decision. All possible combinations of upper and lower indices have useful applications. But contracting with e_i
and e_i'
is not even the correct way of posing this question from the mathematical point of view, because, as I've stated above, there is actually no such mathematical mapping as the transpose of a vector. The transpose is an operation defined for linear maps (matrices), and even there it only corresponds to the matrix transpose if you also write dual vectors as column vectors. The transpose of a vector is just a convenience trick that has been introduced in matrix algebra (where indeed vectors are n x 1
matrices) and it only works because you are in a Euclidean space where there is a canonical mapping from the (conjugate) vector space to the dual space.
In particular, if you want the properties you describe above, you should have that M[i,:]
returns a different object (either a 1xn
matrix or a covector) than M[:,i]
(which should be a nx1
matrix or vector). The fact that this does not cleanly generalize to higher dimensions is exactly one of the main discussion points of this issue, and it seems like most people are in favor of APL indexing, where dimensions indexed with a number are dropped (i.e. both M[:,i]
and M[i,:]
produce a rank 1 array, hence, a vector). Indexing is a property of arrays, and it is this mixing of indexing behavior with linear algebra operations that result in all the confusions / inconsistencies in the first place. It can be consistent as long as you stay within the closed ecosystem of rank N=2
objects, i.e. everything is a matrix, also vectors and numbers, and you never consider higher dimensional arrays.
I just also realized that M[i,:]
would have to produce a covector by my reasoning above, as you say. So it seems that having covectors is at some level fundamentally inconsistent with APL indexing. If we favor APL indexing (and I do like it) the question becomes where to draw the line between that world and world where covectors live. (I was hoping that it would be possible to reconcile the two, maybe the rest of you had already realized that we would have to give up on that?)
This conflict is maybe not so surprising if you think about it: if the singleton dimension were simply dropped from v', then we would have v' = conj(v). The covector instead can be seen as keeping track of a missing first dimension.

I guess a good start would be to restrict covectors as much as possible, basically just defining multiplication, addition/subtraction, and left division on them.
The idea seems to be to not make them <: AbstractArray
. Would it make sense to have them be subtypes of a new LinearOperator
type? (Which I know has been up for discussion before, but don't quite remember the conclusions.)
I think that the absence of multiple inheritance or a language construction for interfaces (both of which have been under discussion) requires that certain concepts are only implemented by an 'implicit' interface, i.e. a set of methods that need to be defined. Iterators are one such concept; linear operators would probably be another that needs to be specified by a set of methods rather than a type hierarchy. It would be strange to have an abstract type LinearOperator
and then not have Matrix
to be a subtype of it.
@toivoh That is an important consideration, which is the reason I suggested new functions like row(M,i)
and col(M,i)
to extract the i
th vector or covector from a matrix. These functions would just be for two-dimensional arrays M
and for people who are interested in matrix algebra. It might seem at first slightly less obvious than MATLAB-style indexing but, overall, conceptually separating the idea of a vector from its dual/transpose/covector helps make things clear. In the case of quantum mechanics, a whole field jumped to Dirac's bra-ket notation simply because this distinction between vector and covector is so critical for complex vectors, and Dirac's notation makes it obvious. It's my hope that Julia will do this too, as well as being a good tool for higher-dimensional storage arrays and higher-dimensional linear algebra! (Because we're greedy, right?)
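A sketch of what those accessors might look like (row, col and the Covector wrapper are hypothetical names, not an agreed API):

immutable Covector{T}
    data::Vector{T}
end
col(M::AbstractMatrix, j::Integer) = M[:, j]                  # j-th column as a Vector
row(M::AbstractMatrix, i::Integer) = Covector(vec(M[i, :]))   # i-th row wrapped as a covector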
I have to say that a person coming with experience in MATLAB and some familiarity with mostly real matrices (but is not a hard-core linear algebra fanatic) may not at first realize why what a few of us are arguing for is important, but I believe it is.
I've said this earlier, but I will repeat it: from my point of view we have this list of priorities, with causalities flowing downwards:
(1) Arrays are fundamentally storage containers which all Julia users will use, and we want the best semantics for this. APL-style rules seem, to me anyway, a very good solution. On the other hand, MATLAB-style minimum-two-dimensional arrays with assumed trailing singleton dimensions just seem unnatural and confusing, and as Jutho said they can even lead to sloppy programming where you don't properly keep track of the dimension of your array. I would go so far as to say that for a vector v
, the code v[i,:]
should throw an error since v
is one dimensional. Element-wise operations like +
and .*
are useful for any container class, not just in matrix or multilinear algebra. The only concession storage-container people make for the stuff below are the extra dots on .*
etc.
(2) Most, but not all, Julia users will use matrix algebra, so we add some syntactic sugar for some vector and matrix operations, mostly with the *
symbol (but we may have matrix division, etc). In this system we take one and two-dimesional arrays are column vectors and matrices, respectively. For a fully-functional matrix system we also need row vectors (covectors) and a transpose operation '
. A row vector is in every possible sense a one-dimensional array, and I assert that it should be indexed as such (and certainly not like covec[1,i]
!!) With the APL-rules, we are forced to make some distinction between vectors and covectors, and the type system is ideal for this (remember we were lucky that we could already re-use matrix and vector as typealiases of one and two-dimensional arrays rather than a wrapper type... in principle we could wrap them as well, but I don't see the point). With the type system the compiler can figure out that CoVector * Vector
is scalar and Vector * CoVector
is a matrix, and so on. As toivoh said, _from a MATLAB point-of-view, the CoVector is exactly keeping track of the "missing" first dimension._ Casual users won't need to construct these objects; they'll just input vectors and matrices and use the *
and '
operations. People who care will notice and appreciate the distinction. The _biggest_ change for users is the necessity of changing M[i,:]
to using a new function row(M,i)
or converting to the wrapper class with something like Transpose(M[i,:])
or M[i,:]'
- one can think of this as an advantage, because like Dirac notation you never forget which objects are vectors and which are covectors, and assigning to objects with type-asserts will raise errors where appropriate. I think it's worth debating this notation, though. In the future, we may even extend the system of wrappers to allow for efficient delayed evaluation. It's a win-win for everyone, as far as I can see.
(3) Some of us are interested in multi-linear algebra, and had to learn the difference between a dual space and transposition, etc. Jutho has touched on this. Julia's arrays are already great storage devices for higher-dimensional tensors, and additional packages (that may or may not find their way into Base) will be used by people like us. We do not need, nor want, operations like '
or *
defined on arrays of dimensionality greater than two. I can't see a storage-oriented use case for '
that can't be done more cleanly by explicitly reordering the indices. The whole idea of trailing one-dimensional indices only makes the concept less appealing... so please keep '
for one- and two-dimensional arrays only - like MATLAB :) (see, I do have nice things to say about MATLAB...)
For a fully-functional matrix system we also need row vectors (covectors) and a transpose operation '.
This statement really cuts to the heart of the problem. Indexing, transposition and *
products are all intermixed. Furthermore, it is not possible to reconcile the full semantics of row vectors with those of covectors without further introducing a function like row(A, i)
to return the i
th row of A
as a covector rather than a one-dimensional array. Currently A[1, :]
wants to mean both "take the first row of A" and "take the first slice along the second dimension of A", but these notions are fundamentally incompatible in the covector proposal.
A row vector is in every possible sense a one-dimensional array, and I assert that it should be indexed as such.
I believe you are being a little too glib with this statement. The preceding discussion has established quite clearly that if you want full linear algebra semantics (with the correct products and transposes), then a row vector cannot have the same type as a one-dimensional array. First, indexing into a covector should return complex conjugated values, (v')[1] = conj(v[1])
, for consistency wrt the inner product dot
. Second, covectors are not vectors, they are linear functionals waiting to produce an inner product with a vector. Third, vectors and covectors have different array concatenation behaviors under hcat
and vcat
. For all these reasons a row vector cannot be "in every possible sense a one-dimensional array".
I would be in favour of getting rid of trailing singletons: I don't think I've ever seen Julia code that has made use of it. I was originally going to say we would need it for 0-dimensional arrays, but I see that X[]
works just fine.
I think APL indexing, along with row(M,i)
function for row vectors makes the most sense, and seems like a good compromise. I'm not sure about row vector indexing, but I _don't_ like (v')[1,i]
.
The other big decision, which we haven't even touched on yet, is types. My one finding from having had a go at this earlier is that it is really hard to have v'
be an AbstractVector
, as it makes dispatch a mess.
Two possible options:
1. Transpose <: AbstractMatrix, with CoVector <: Any kept outside the array hierarchy (treated as its own kind of object, much like Factorization objects).
2. Have X' _not_ be an AbstractMatrix, i.e. Transpose <: Any.
For conjugation, we can either
a. define ConjugateTranspose
(along with ConjugateCoVector
if we go with option 1 above)
b. use a Conjugate
wrapper type, and nest appropriately: we would need some convention as to whether we use Transpose{Conjugate{T}}
or Conjugate{Transpose{T}}
.
I like having Transpose{Matrix}
not a subtype of AbstractMatrix
. I think the closest analogue we have in base is a special matrix type like Symmetric
, which behaves algebraically like a matrix but not in its indexing semantics. (#987 established that without multiple inheritance or Holy traits, the type hierarchy has to respect container semantics over algebraic semantics.)
The problem of composing types that are basically "semantic tags" also showed up in #8240. I think Transpose{Conjugate{T}}
would be preferable, since the conjugation is a notion that falls through to the underlying field of elements.
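Roughly, the wrappers would then compose like this (a sketch; the names are only illustrative and a real design would need methods on these types):

immutable Conjugate{A<:AbstractArray}
    data::A
end
immutable Transpose{A}
    data::A
end
# ' would build Transpose{Conjugate{...}}, keeping the conjugation innermost,
# next to the underlying elements; .' would skip the Conjugate layer
ctranspose_wrap(A::AbstractMatrix) = Transpose(Conjugate(A))
transpose_wrap(A::AbstractMatrix) = Transpose(A)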
Here are examples that sometimes follow the same trailing singleton rule as MATLAB, and at other times don't:
julia> (1:5)[5,1,1,1,1,1,1]
5
julia> (1:5)[5,:]
ERROR: BoundsError()
in getindex at abstractarray.jl:451
julia> A=zeros(2,2,2); A[1,2]=5; A #Same as A[1,2,1]=5
2x2x2 Array{Float64,3}:
[:, :, 1] =
0.0 5.0
0.0 0.0
[:, :, 2] =
0.0 0.0
0.0 0.0
julia> A[:,:]=3; A
2x2x2 Array{Float64,3}:
[:, :, 1] =
3.0 3.0
3.0 3.0
[:, :, 2] =
3.0 3.0
3.0 3.0
julia> A=reshape(1:8,2,2,2); A[:,1]
2-element Array{Int64,1}:
1
2
julia> A[:,1,1]
2-element Array{Int64,1}:
1
2
julia> A=reshape(1:8,2,2,2); A[1,:]
1x4 Array{Int64,2}:
1 3 5 7
julia> A=reshape(1:8,2,2,2); A[1,:,:]
1x2x2 Array{Int64,3}:
[:, :, 1] =
1 3
[:, :, 2] =
5 7
julia> A=zeros(1); A[1] = randn(1,1)
ERROR: `convert` has no method matching convert(::Type{Float64}, ::Array{Float64,2})
You might have used a 2d row vector where a 1d column vector was required.
Note the difference between 1d column vector [1,2,3] and 2d row vector [1 2 3].
You can convert to a column vector with the vec() function.
in setindex! at array.jl:307
julia> A=zeros(1,1); A[1,1] = randn(1)
ERROR: `convert` has no method matching convert(::Type{Float64}, ::Array{Float64,1})
in setindex! at array.jl:308
julia> 1/[1.0,] #In MATLAB, interpreted as the inverse of a 1x1 matrix
ERROR: `/` has no method matching /(::Int64, ::Array{Float64,1})
julia> [1:5]*[1:5]' # Shapes are (5,) and (1,5) - promoting to (5,1) x (1,5) works
5x5 Array{Int64,2}:
1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25
It seems that these things could deserve a bit of cleanup, once we have concluded how we would want them to behave. The current behavior when indexing with too few indices where the last one is a slice seems particularly fishy.
Wow. The examples that Jiahao puts up are extraordinarily disturbing... I don't like trailing singletons in indexing operations and indexing assignment operations because of their implicitness and ambiguity. If you are not aware of these behaviors in advance, then you could end up doing one thing when you are actually trying to do another. I am in favor of committing to an exact, clear usage of the language and avoiding ambiguous shortcuts.
To actually implement this, we're going to need some sort of triangular dispatch, otherwise I'm not sure how you would express things like "a matrix with Complex64
or Complex128
entries, its transpose, or its conjugate transpose". Especially if we use options 2+b above.
@esd100, which of these are ambiguous or dangerous? These are all either convenient – when "virtual singletons" are imagined for you and you wanted them – or inconvenient – when you wanted them and they weren't. None of these has two plausible meanings with different behaviors.
A row vector is in every possible sense a one-dimensional array, and I assert that it should be indexed as such.
I believe you are being a little too glib with this statement.
True! Sorry @jiahao , I'll blame the jetlag. The word I meant to write is "tensor", and I was strictly referring to the behaviour of indexing []
. You are correct that concatenation is an important consideration, and I think the natural way of doing that (to construct a matrix) is reasonably obvious (by thinking of it as 1 x n sized in this instance). Clearly a covector is not "in every sense" a 1D Julia Array... that's the entire point...
The examples of jiahao are also disturbing to me. I would be in favor of removing any possible trailing singleton behaviour. Linearizing later dimensions might be useful, but it could also encourage a kind of laziness where you forget how many dimensions your array has... (I think this is what is implied by "ambiguous" and "dangerous"... I really want Julia to throw an error when I treat a 16-dimensional array like a 15-dimensional array, for instance).
which of these are ambiguous or dangerous?
The fourth one seems both ambiguous and dangerous to me.
julia> A=zeros(2,2,2); A[:,:]=3; A
2x2x2 Array{Float64,3}:
[:, :, 1] =
3.0 3.0
3.0 3.0
[:, :, 2] =
3.0 3.0
3.0 3.0
Clearly (to me) this should be an error. A is rank-3 and the 3rd dimension was not referenced at all. Most likely the third dimension was left off by mistake, and a hard-to-find bug has now been introduced into the code. I would be grateful to get an error at this point (as it is in both IDL and Fortran).
I think it is best if arrays only allow indexing with the correct number of indices or the special case of 1D linear indexing (since it is extremely handy). This would include disallowing trailing singletons as well.
or the special case of 1D linear indexing (since it is extremely handy)
How easily our principles are sold! :)
I've never liked linear indexing all that much; it strikes me as a bit of a hack. Often we just want the fastest way to iterate over an array. And for all but simple dense arrays linear indexing can be really slow. That said, we might not be able to eliminate linear indexing entirely (for performance reasons, and for breaking too much code).
Yes, true enough. But at least 1D indexing is visually distinct. Whereas indexing a 6D array with 5D is much less so. In any case, I would rather have stricter indexing rules and give up 1D linear indexing than go the other direction. Especially since one can easily get a reshaped reference to the array that shares memory.
Can we use {}
brackets (or some other) for linear indexing, and keep []
for multidimensional indexing? Just an idea...
On 23 Mar 2015, at 2:36 pm, Bob Portmann [email protected] wrote:
Yes, true enough. But at least 1D indexing is visually distinct. Whereas indexing a 6D array with 5D is much less so. In any case, I would rather have stricter indexing rules and give up 1D linear indexing than go the other direction. Especially since one can easily get a reshaped reference to the array that shares memory.
—
Reply to this email directly or view it on GitHub https://github.com/JuliaLang/julia/issues/4774#issuecomment-84805310.
I also think that it would be nice if we were able to separate the syntax for linear indexing from the regular indexing syntax, but I agree that it might be a hugely breaking change.
While this issue seems to attract a lot of debate, most of the actual work in improving indexing has happened in quite a large number of pull requests throughout the 0.4 cycle. For example, now that we have CartesianIndex
and friends, I don't see why one would need to separate the syntax for linear and cartesian indexing---indeed you can now combine them (#10524). I'd like to get rid of linear indexing too, but sometimes it is more performant (probably largely due to #9080; enumerate
and zip
suffer from the same problem). We probably should implement fastindex
as a wrapper around eachindex
, as in #10507.
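Such a wrapper could be as small as the following (a guess at its shape, not the actual #10507 implementation):

# generic arrays: iterate with whatever eachindex gives (e.g. CartesianIndex)
fastindex(A::AbstractArray) = eachindex(A)
# plain dense arrays have fast linear indexing, so hand back a cheap range
fastindex(A::Array) = 1:length(A)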
If you're interested in indexing rules, instead of pounding this issue to death let's focus on the most interesting frontier. At this moment, that's unquestionably #10525. In particular https://github.com/JuliaLang/julia/pull/10525#issuecomment-84597488 needs some kind of resolution.
@timholy I haven't really followed all the developments on fast cartesian indexing, and having just had a look at base/multidimensional.jl, I see there is a lot of metaprogramming going on. Is there any chance you will have time to write up (or give a JuliaCon talk) on how this all works?
In some ways, there's not much to know: while there's a fair amount going on under the hood, the idea is to make it dirt-simple to use. So quite literally
k = 0
for I in eachindex(A)
    B[k+=1] = A[I]  # B is being linearly-indexed, A is being cartesian-indexed
end
might be all you need to know. (In other words, the help on eachindex
might be all the documentation needed.) However, it turns out that you can write quite a large number of algorithms using a few extensions of this basic paradigm; what those extensions are and why they are powerful may, indeed, be less obvious.
I do plan to write this down sometime in the coming months, but if people really do want to know the details perhaps a JuliaCon talk would be reasonable. @Jutho or @mbauman could give that talk just as well as I.
I would add more about ambiguity, however I think some of the trailing discussion summarized certain key points. Perhaps I am more sensitive to this kind of ambiguity as an outsider or out of fear, at the risk of sounding too dramatic. In my humble opinion, any number of simple, mission critical, thought exercises could lead you down a path where the cost / benefit analysis of a simple error is not worth it.
We probably should implement fastindex as a wrapper around eachindex, as in #10507
@timholy Is there any reason not to have eachindex
return a UnitRange for fast linear arrays? It'd still be type-stable, and if the caller ever wants to ensure they get a CartesianIndex, they can manually construct a CartesianRange (we could also add a method for eachindex(size(A))
since CartesianRange isn't exported).
That's what it did at first, but I think @Jutho changed it. (If that's not true, then I probably changed it without realizing.) I assume the change was motivated for consistency sake (so you can count on getting a CartesianIndex
), which does have a certain amount of sense to it. But as you point out, there are alternative means to ensuring this.
@Jutho, any thoughts?
I don't remember changing eachindex
, but it could be. I certainly don't remember a good reason for it, other than having a consistent result independent of the type of array. But if the difference between arrays with and without efficient linear indexing is clearly explained in the docs, I see no reason why eachindex
could not return a linear range in the case of arrays with efficient linear indexing.
@timholy, in response to your previous post, I am unable to attend JuliaCon so feel free to talk about the CartesianIndex
stuff (I anyway only contributed some final touches). I am looking forward to the reports and videos of the conference already.
A whimsical thought: for an array A
of any dimension (including dimension > 2), one could have A'
cyclically permute the indices of A
. So, A'[i, j, k] == A[j, k, i]
. This would reduce to the usual matrix transpose for 2d arrays, as well as the usual transpose for row and column "vectors" when construed as in MATLAB (i.e. as [n, 1] and [1, n] 2d arrays respectively). Thus it would never map a 2d column "vector" to a true covector or a 2d row "vector" to a true vector. (I think this property is good, but others may disagree.) It would give the identities A'' == A
and v'' == v
_for matrices and vectors construed as array objects_. Given the sort of type hierarchy as discussed above (where true vectors and covectors are sufficiently distinguished from abstract arrays), '
could still be given an entirely different method for true vectors and covectors where it corresponds to the linear algebraic concept and need not satisfy v'' == v
(but could if that's what people decided they want).
To be clear: I don't for a second think that '
_needs_ to have any method for arrays of dimension > 2, but I hadn't seen this proposal above (unless that's what was meant by "reversing indices") and thought I'd just mention it. The most harm I see it doing (prima facie) is conceptual: conflating a (slightly arbitrary) combinatorial operation with an operator normally taken as belonging to the domain of linear algebra. To this one may reply that at least some degree of such conflation is inevitable when we try to extend linear algebraic concepts into purely data-centric concepts (as evidenced by this whole discussion). As such, we may as well reap a convenient shorthand for cyclic permutations of multi-dimensional arrays as a byproduct of determining what '
ought to do in general for multidimensional arrays, as long as it reduces to the expected case in 2 dimensions. One could even add a method that takes an integer argument so that A'(k)[I]
cyclically permutes the indices of A[I]
k times, giving A'(ndims(A))[I] == A[I]
and A'(-k)[I]
permutes the indices in the opposite direction.
Just a thought.
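For what it's worth, the basic operation is easy to express with permutedims (a sketch; cycle_transpose is a made-up name, not a proposal to overload '):

cycle_transpose(A::AbstractArray) = permutedims(A, circshift(collect(1:ndims(A)), 1))

A = reshape(collect(1:24), 2, 3, 4)
B = cycle_transpose(A)
size(B) == (4, 2, 3)       # true
B[1, 2, 3] == A[2, 3, 1]   # true: B[i, j, k] == A[j, k, i]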
If transpose
is defined for multidimensional arrays, I would prefer that it still satisfies A''=A
, i.e. it is its own inverse. This is in direct conflict with the previous proposal.
That's fair. As I said, my suggestion is only (potentially) attractive if one is comfortable letting a rift grow between the linear algebraic significance of transpose and whatever array-centric significance one imposes in the d > 2 case. My reasoning was that as long as that rift is already (kind of) happening, and if the methods would just be totally unused otherwise, it could be neat to have a shorthand for permuting indices -- as long as it doesn't require any special treatment to make the d = 2 case work. As you (@Jutho) mentioned, it's something of a convenient coincidence that transposing dimensions in the 2d case has the linear algebraic significance it does (and only after identifying (co)vectors as 2d arrays, for that matter), so maybe we don't need to be picky about transpose
's mathematical properties (e.g. requiring A'' == A
) for d > 2. If one wishes to go this route, there are a lot of potentially useful methods that could be assigned, e.g.: A'
cyclically permutes once, A'(k)
cyclically permutes k times, A'(I)
for I::Array{Int, 1}
with length <= ndims(A)
cyclically permutes the indices listed in I
, and A'(p)
for p::Permutation
of length <= ndims(A)
permutes indices according to p
. But I suppose maybe the best thing to do is make a package and see if it's useful enough to catch on.
I support two changes/clarifications that have been mentioned, e.g., by StefanKarpinski on Oct 16, 2014, and simonbyrne on Mar 22, 2015, but with one stipulation:
- transpose should only be used for vectors and 2-dimensional matrices, not general tensors.
- * and transpose should delete trailing singleton dimensions at the end of the computation. Otherwise, trailing singleton dimensions may be preserved.

These two changes would provide many conveniences and resolve many ambiguities within traditional linear algebra work. They intentionally say nothing about general array indexing or tensor algebra, which can be considered separately. (In particular, tensor transpose should be considered to be an entirely separate operation from matrix/vector transpose.)
Under this proposal, when working with matrix multiplication and transpose, there would be essentially no distinction between a vector and a one-column matrix, nor would there be any meaningful distinctions among a scalar, a vector of length 1, and a 1-by-1 matrix. However, x'
would be a 1-by-n matrix, not a vector.
On the other hand, for the purpose of holding data, an array of any size could be constructed. No singleton dimensions would be deleted because the linear algebra concepts of multiplication and transpose would not be involved.
Below is a hypothetical transcript of a Julia session with the proposed changes.
First, we define a scalar, two vectors, and a matrix.
julia> alpha = 2.0
2.0
julia> x = [1.0; 2.0; 3.0]
3-element Array{Float64,1}:
1.0
2.0
3.0
julia> y = [4.0; 5.0; 6.0]
3-element Array{Float64,1}:
4.0
5.0
6.0
julia> A = [1.0 2.0 3.0; 4.0 5.0 6.0; 7.0 8.0 9.0]
3x3 Array{Float64,2}:
1.0 2.0 3.0
4.0 5.0 6.0
7.0 8.0 9.0
Scalar-vector multiplication works even if extraneous dimensions are present, and the result is always a vector.
julia> alpha*x
3-element Array{Float64,1}:
2.0
4.0
6.0
julia> alpha*x[:,[1]]
3-element Array{Float64,1}:
2.0
4.0
6.0
Transpose is an involution.
julia> x'
1x3 Array{Float64,2}:
1.0 2.0 3.0
julia> x''
3-element Array{Float64,1}:
1.0
2.0
3.0
julia> x==x''
true
julia> x'''
1x3 Array{Float64,2}:
1.0 2.0 3.0
Multiplying a matrix by a one-column matrix produces a result identical to the matrix-vector product.
julia> A*x
3-element Array{Float64,1}:
14.0
32.0
50.0
julia> A*x[:,[1]]
3-element Array{Float64,1}:
14.0
32.0
50.0
A one-row matrix times a matrix equals a one-row matrix.
julia> x'*A
1x3 Array{Float64,2}:
30.0 36.0 42.0
The inner product is a scalar and is produced by the more general rules for matrix-vector and matrix-matrix products.
julia> x'*y
32.0
julia> x'*y[:,[1]]
32.0
The outer product is nothing special.
julia> x*y'
3x3 Array{Float64,2}:
4.0 5.0 6.0
8.0 10.0 12.0
12.0 15.0 18.0
julia> x[:,[1]]*y'
3x3 Array{Float64,2}:
4.0 5.0 6.0
8.0 10.0 12.0
12.0 15.0 18.0
A vector times a vector is an error.
julia> x*y
ERROR: `*` has no method matching *(::Array{Float64,1}, ::Array{Float64,1})
A vector times a matrix is an error.
julia> x*A
ERROR: DimensionMismatch("*")
in gemm_wrapper! at linalg/matmul.jl:270
in * at linalg/matmul.jl:74
Matrix multiplication is associative.
julia> (x*x')*y
3-element Array{Float64,1}:
32.0
64.0
96.0
julia> x'*y
32.0
julia> x*(x'*y)
3-element Array{Float64,1}:
32.0
64.0
96.0
julia> norm((x*x')*y-x*(x'*y))
0.0
EDIT: Deleted two examples involving demotion of inner dimension in a product. They didn't have much bearing on the current discussion.
I'm afraid this would lose the type stability of the involved operations, since discarding trailing singleton dimensions is not a type stable operation.
The way I see it, the whole covector business is about having a singleton dimension which it is known based on the type that it will be removed. Also, we have been trying pretty hard to preserve the distinction between a vector and a matrix that just happens to have one column, but this suggestion would remove that distinction in some cases, such as x''
.
Heretical suggestion: what if we dropped the dimensionality from the type parameters, and did all dimensionality/size checks at runtime? (We end up doing a fair amount of that anyway.) And leave the dimensionality parameter for types that also encode the size as a type parameter. (Ducks, hides out for two weeks, and changes his github account to @SomeoneWhoCertainlyIsntTimHolyUhUhNoWay.)
(Of course, I would really complain at myself on account of #10525 alone.)
FWIW, I'd say that isn't so crazy an idea: it's how Torch works, for example, and the Torch developers have voiced unhappiness with Julia's encoding of dimensionality inside the type system.
@timholy Question: could we use this to improve semantics regarding linear algebra and the transposes of vectors?
If one was still interested in using CoVector
/ DualVector
/ TransposeVector
these would still have to wrap our new TimHolyArray{DataType}
_and_ we would have to either make sense of the transpose of an array of dimension greater than two (or one), or else forbid the construction TransposeVector(tharray)
when tharray
has dimension greater than two (or one)... In fact all sorts of things will have to give errors at the run-time level which are currently compile-time errors (like multiplications which are currently undefined and thus forbidden by the type system).
On the other hand, implementing a transpose flag inside this new class may be bad... it adds more complexity to what should be an efficient and lightweight container. I would tend to rule out that option, and leave the hard work to the compiler/type system.
I'm not necessarily against your idea - but it seems to be an additional issue that could be done in parallel to vector transposes.
@timholy: I am not really sure if this is the way to go or not. There are situations where I find it very useful to dispatch on the dimension.
The suggestion was to make this apply to all arrays. But I'm against my own proposal now, simply because for many multidimensional cases you need to generate N
loops for N
dimensional arrays. We couldn't do that any more if N
weren't a type parameter.
Yes, isn't this the entire point of your Cartesian macros (or the staged functions I am still not used to :-) )?
I did similar things in C++ where it is a real pain to have the dimension as a template parameter. But I had situations where the dynamic variant was limiting because one needed big if statements for specializing the different array dimensions.
Proposal:
- Rename the current (*) behavior to timesfast and the current transpose behavior to transposefast.
- Redefine (*) and transpose to truncate trailing singleton dimensions as in the earlier comment. For example, u'*v would become a scalar, v'' would become a vector, and (v*v')/(v'*v) would be defined.

The existing behavior is type stable. The proposed behavior follows the conventions of many linear algebra texts. They are both valuable. Perhaps Julia should have both.
I want to use Julia in the classroom, so I vote for the default behavior to value convenience over efficiency.
Thanks to several contributors for getting me up to speed on this very long thread!
@briansutton: I think you should really reconsider what you're asking for. I would encourage you to wait until you have a deeper understanding of how Julia works before proposing that we redefine the meaning of multiplication.
This issue has been about how we can avoid as many special cases as possible.
Rules that allow fudging of singleton dimensions break down the moment one of the singleton dimensions is one you actually care about. With the "truncate [all] trailing singleton dimensions" rule, then if v
is a 1-element vector, then v'
is a scalar, and so v''
is also a scalar. So even in this proposal the property that v == v''
does not always hold.
You could try to modify the rule to "truncate only the last trailing singleton dimension if the array has dimensionality 2". But even with this modified rule, the outer product v * w'
does not follow automatically from the definition of matrix multiplication, but instead has to be its own special-cased definition of Vector * Matrix
, and the definition must be "throw an error unless the shapes are (N) x (1, M)".
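A sketch of that special case, written as a standalone helper rather than as a method of * (the name is hypothetical):

function outer_with_rowmatrix(v::Vector, M::Matrix)
    size(M, 1) == 1 || throw(DimensionMismatch("right operand must be a 1-row matrix"))
    return [v[i] * M[1, j] for i = 1:length(v), j = 1:size(M, 2)]
end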
@jiahao:
When doing linear algebra with vectors and matrices, I don't want to draw any distinctions among a scalar, a 1-element vector, and a 1-by-1 matrix. Currently, various operations produce any of the three objects. My suggestion is, for the operations of matrix multiplication and transpose, to pick one of the three to be the preferred representation.
Regarding your first example (v==v''
), I would like to avoid constructing a 1-element vector v
in the first place.
Regarding your second example, yes, I think v*w'
should be handled just as you describe. When working with matrix multiplication and transpose, I want the vector v
and the N-by-1 matrix v[:,[1]]
to denote the same mathematical object, even if they have different internal representations. This would require special code to handle N-vector times 1-by-M matrix.
I guess you're still not convinced that type stability is important.
@briansutton: I think you would find it very informative to try implementing the machinery required to make your proposal run as fast as the current type system allows Julia code to run. I, for one, believe that your stated goals are unachievable given the Julian goals of delivering predictable performance and C memory-layout compatibility.
I'm not sure if I understand the arguments going back and forth, however one thing caught my eye. Perhaps it is wishful thinking, but it would be nice if the computer were able to think as quickly as the human brain. What I mean is that when Brian Sutton talks about the program "picking a preferred representation", I envision a program that can think, just like we can, when we do math. Perhaps, it is not feasible with today's technology and will slow things down too much. But, wouldn't it be nice ...
I am a physicist and an active user of Julia.
I no longer have a strong preference on whether to keep or drop all the trailing singleton dimensions.
But here I would like to raise a closely related issue.
The current Julia implementation:
Let V be a 3-dim rank-1 tensor (a vector)
V[1] will give us a scalar, not a 1-dim rank-1 tensor
Let A be a 3x4x5 rank-3 tensor
B=A[1,:,:] will give us a 1x4x5 rank-3 tensor.
The above two behaviours are not quite consistent.
I strongly prefer the following indexing/slicing feature:
Let A be a 3x4x5 rank-3 tensor
B=A[1,:,:] will give us a 4x5 rank-2 tensor.
C=A[1:1,:,:] will give us a 1x4x5 rank-2 tensor.
(currently the above two give the same result)
Let V be a rank-1 tensor
B=V[1] will give us a scalar
C=V[1:1] will give us a 1-dim rank-1 tensor.
This feature will help us to change the shape of tensor more easily, and will be helpful when we allow trailing singleton indices.
Best
Xiao-Gang
I mean: C=A[1:1,:,:] will give us a 1x4x5 rank-3 tensor.
Xiao-Gang
What you are asking for is how slice currently works. My sense is that A[stuff] will become a synonym for slice(A, stuff), and so you'll probably get your wish.
Dear Tim:
Thank you for the tip. I have tried slice. It does not fit my needs: slice produces a new data type, SubArray, which I cannot use in my other code, which requires the ::Array data type.
Maybe I can change my other code so that it accepts the SubArray data type.
Xiao-Gang
Is there anything really dependent on using the concrete Array types instead of AbstractArray in your code? It may require nothing more than a search/replace of Array with AbstractArray to make things work.
Dear Scott
Thank you very much for the tip.
Xiao-Gang
Did that do the trick for you? I'm happy to help!
@wenxgwen, alternatively, you could call copy(slice(A,...)) in order to get a copy of the slice in a new Array that should then work natively with your already existing functions.
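An era-specific usage sketch (slice was later renamed view); the copy materializes the lazy slice as a plain Array that can be passed to functions requiring ::Array:
A = rand(3, 4, 5)
s = slice(A, 1, :, :)   # a 4x5 SubArray view, no data copied
B = copy(s)             # a 4x5 Array{Float64,2}, usable wherever ::Array is required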
Fixing this issue has now become a lucrative endeavor
Since fixing this issue is now critical on my path to 3 commas ...
But in all seriousness. I tried to read through the discussion completely but I can't guarantee this hasn't been suggested before:
Can we possibly expand the AbstractArray definition to have an additional trait (similar to LinearIndexing?) which defines whether the underlying data is row-based or column-based storage? This addition would open up the following properties (please don't focus on names... just concepts):
v --> length-2 Vector{Col}
[ 1
2 ]
v' --> length-2 Vector{Row}
[ 1 2 ]
m --> 2x2 Matrix{Col}
[ 1 3
2 4 ]
m' --> 2x2 Matrix{Row}
[ 1 2
3 4 ]
Some operations:
v'' --> length-2 Vector{Col}
v'' == v
v*v or v'*v' --> either error, or do element-wise multiplication
v' * v --> scalar
v * v' --> 2x2 Matrix (could be Row or Col??)
v' * m --> 2-length Vector{Row}
v * m --> either error or broadcasting operation
m * v --> either error or broadcasting operation
m * v' --> 2-length Vector{Col}
Indexing:
v[2] --> 2
v[1,2] --> error
v'[1,2] --> 2
m[1,2] --> 3
m'[1,2] --> 2
Size:
length(v) --> 2
length(v') --> 2
size(v) --> (2,)
size(v') --> (2,)
length(m) --> 4
size(m) --> (2,2)
Obviously this proposal is missing a lot of definitions, but maybe it can start the discussion. Can we have support for both column-index and row-index storage at the same time, and "fix" some issues along the way?
I very much wish that Julia was row-based storage, as that's just naturally how I think about loops and many other operations. (and I don't think I'm the only one that thinks this way) Comments please!
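A minimal sketch of what such a storage-order trait could look like, in the spirit of Base.LinearIndexing (all names here are hypothetical, nothing from Base):
abstract StorageOrder
immutable ColumnOrder <: StorageOrder end
immutable RowOrder    <: StorageOrder end

storageorder(::AbstractArray) = ColumnOrder()   # Julia's Arrays are column-major today

# An algorithm can choose its loop order from the trait:
fillsum!(A::AbstractMatrix) = fillsum!(storageorder(A), A)
function fillsum!(::ColumnOrder, A)
    for j in 1:size(A, 2), i in 1:size(A, 1)    # inner loop runs down a column
        A[i, j] = i + j
    end
    A
end
function fillsum!(::RowOrder, A)
    for i in 1:size(A, 1), j in 1:size(A, 2)    # inner loop runs along a row
        A[i, j] = i + j
    end
    A
end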
It is an interesting thought, having the constructor also take in row or column. I think it has been discussed before somewhere deep in the GitHub issue system. I could be wrong!
I personally prefer column-based storage because most of my textbooks use columns for their math, so in Julia I don't need to change everything around to use rows instead. I also found it strange in the beginning, but it quickly became a non-issue for my work. There are algorithms that are more easily expressed in rows, however, which is why it could be nice to have the option of non-standard storage when required. I would hope the convention would be that exported functions always return a column-major matrix or a column vector, so that there is never a question about which type a function returns. Otherwise it could get quite messy and become unpleasant to use if you always have to look up what type is returned.
Column-based vs row-based is not within scope of this issue.
@tbreloff I very much like the idea. It would be great to be able to interface more easily with languages/libraries that are row-major.
Jiahao is right that preferring row-major vs column-major is off topic. It's just a nice side effect that solving the transpose issue in a "julian" way (parametric types and staged functions) gives people more flexibility in storage format.
If you don't like the idea of adding a row/column trait to arrays, then the exact same thing can be accomplished with a TransposeView{T,N}, but I suspect it will be more complicated to implement well.
The biggest downside to the additional trait is confusion for new users, and that is something I have a hard time reconciling.
Does call-overloading change the design space at all? At times in the above there seemed to be a kind of ambiguity in the meaning of *. In particular, when we want v::Covector * w::Vector to return a scalar, are we taking * actually to be "map w under v" instead of matrix multiplication? If so, then couldn't one similarly demand that w::Vector * v::Covector return a scalar, since vectors are themselves linear maps over covectors?
Perhaps it would be helpful instead to overload call and write v(w) to denote the "map under v" operation on w, and similarly for w(v) -- both would return scalars. Would this allow more leeway in accommodating semantics that array operations and linear algebraic operations share in the case of 2d arrays?
If so, then couldn't one similarly demand that w::Vector * v::Covector return a scalar, since vectors are themselves linear maps over covectors?
I think we are trying to follow the matrix and vector conventions that many people see in first-year university, which are non-commutative and have vector * transposed vector -> matrix (of rank 1, i.e. having just one nonzero singular value). What you wrote sounds more like an abstract linear algebra concept, where vectors and co/dual-vectors are maps from each other to some scalar, which is fine but a little different to what (IMHO) is attempted in Julia and MATLAB. I think we could interpret what you wrote as the inner product, dot() or ⋅, acting on two covectors (which we _should_ probably define for covectors, I feel), but sometimes we also want the outer product, and currently the non-commutative * allows us to write and express both behaviours.
As for the call convention v(w), we could equally say that for the dot product we want the amount of v in the direction of w, which suggests we use the indexing operator v[w]. (BTW, I'm not saying this is a good language design option - just an observation!)
we could equally say for the dot product we want the amount of v in the direction of w, which suggests we use the indexing operator v[w].
This suggestion nearly, but does not quite, conform to the semantics of indexing. It also clashes with our current support for vector indexing if v is a Vector{<:Integer}.
Consider that for v :: Vector{T<:Real}, v[1] is equivalent to the dot product v⋅e₁, where e₁ is the canonical basis vector along the first axis. Therefore, simple indexing is really the function
v[n] : n :: Integer --> y = (v ⋅ eₙ) :: T
For vector indexing, v[[1, 2]] produces [v⋅e₁, v⋅e₂], which is the result of projecting v into the subspace spanned by {e₁, e₂}, or equivalently it is the result of v' * [e₁ e₂].
So vector indexing is the function
v[I] : I :: Vector{<:Integer} --> y = v' * [eₙ for n in I] :: Vector{T}
The proposal to make v[w] = (w ⋅ v) v is inconsistent with this definition because it eliminates the implicit mapping from a collection of indices n (as specified by w) to a collection of canonical basis vectors eₙ, which is necessary for our current indexing rules to work.
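A small numeric check of the identification described above (the basis vectors are written out explicitly only for illustration):
v = [10.0, 20.0, 30.0]
e₁ = [1.0, 0.0, 0.0]
e₂ = [0.0, 1.0, 0.0]
v[1] == dot(v, e₁)                      # true: simple indexing is a dot product with eₙ
v[[1, 2]] == [dot(v, e₁), dot(v, e₂)]   # true: vector indexing projects onto span{e₁, e₂}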
Limiting the scope to just the vector transpose for now, I think we have two options. We can either make it an error or we can introduce the special covector type. Given the first-class nature of the vector in Julia I think the former would be a very tough sell… and to do it consistently we should probably completely disallow vectors from participating in the algebra of matrices entirely.
So I took a shot at the covector. It's a lot of work, and unfortunately I will not be able to spend more time on it. I post it here in the hopes that someone will either run with it or that we will learn from the difficulties and decide to run away. But I have a branch building with transposes as views, and matrix multiplication implemented with covectors. Some selected tests pass, but only without depwarn=error (e.g., ./julia -e 'using Base.Test; include("test/matmul.jl")'). https://github.com/JuliaLang/julia/compare/mb/transpose
I defined two new view types to be used for transpose and ctranspose of matrices and vectors:
immutable MatrixTranspose{C,T,A} <: AbstractArray{T,2}
data::A # A <: AbstractMatrix{T}
end
immutable Covector{C,T,V}
data::V # V <: AbstractVector{T}
end
The C parameter is a simple boolean representing whether the transpose is a conjugate transpose or not. Note that the Covector is _not_ a subtype of AbstractArray; this is a pretty strong requirement for getting dispatch to work reasonably.
… RowVectors. Regardless, the very first difficulty you hit when trying to explain this behavior is how you talk about the shape of a covector (size is undefined). If (m,n) is the shape of a matrix, then (m,) can represent the shape of a vector… and if we abuse tuple notation, we can slangily describe a covector to have the shape (,n). This allows us to express mixed vector/matrix algebra rules with a "missing" dimension that combines and propagates somewhat sensibly:
(m,n) × (n,) → (m,)
(,m) × (m,n) → (,n)
(n,) × (,n) → (n,n)
(,n) × (n,) → α (scalar)
The number of binary operations and types here leads to a huge combinatorial explosion in the number of methods that need to be defined, just for multiplication alone.
While some of these operations do align nicely with multiple dispatch and fallback behaviors, it's still a _huge_ number of methods for each operator. Entirely removing mixed matrix/vector support here would definitely help simplify things.
We could make A[1,:] return a covector, but it doesn't generalize well and would be quite odd to return a non-AbstractArray from a slice. It'd be great if folks could try out #13612 and specifically look for cases where the conjugate transpose of a vector causes trouble — it would be equally bad with the current vector transpose semantics and does not require Covectors to expose.
Dispatching on these wrapper types instead of the Ax_mul_Bx functions is a huge win. It composes very nicely with BLAS' API. In general you want to do the conjugation at the innermost levels of the algorithms, so it makes much more sense to keep this information with the arguments. For BLAS calls, a simple function can be used to look up the transpose character (ntc).
With the Covector, v'v returns a scalar. You can now evaluate v'v*v from left to right and avoid forming the matrix.
The ordering of the type parameters C and T is a wart. I'm not sure if doing it the other way would be any better in terms of the number of placeholder typevars you'd need to define, but at the very least it'd match AbstractArray{T}.
Dispatching * is quite a challenge with types like ::Union{DenseArray{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},2},MatrixTranspose{C,T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},A<:Union{DenseArray{T,1},DenseArray{T,2},SubArray{T,1,A<:DenseArray{T,N},I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD},SubArray{T,2,A<:DenseArray{T,N},I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}},SubArray{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},2,A<:DenseArray{T,N},I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}. Sure, I've written it as a typealias, but even just managing all these different type alias names is a pain (StridedMatOrTrans, QRCompactWYQorTranspose, etc.).
What should indexing a covector r mean for r[:] or r[1:end]? Is it a vector or covector? I think I'd try to get away without defining this if we can. Interestingly, Matlab has very special rules for indexing row-vectors with other vectors — they try very hard to maintain their row-ness at the cost of some strange corner cases (r((1:end)') is r(1:end) is r(:)'). I have scalar indexing defined in my branch for now, but perhaps that should be removed, too. That would make it clear that the Covector is only to be used with linear algebra operations that know about it.
Finally, I just want to point out that most of the advantages I see here are just as applicable to the MatrixTranspose type by itself. I think that this demonstrates that the Covector can work quite well without delving too deep into abstract algebras to enable mixed Vector/Matrix operations. It definitely adds a nice feature (the ability to use Vectors consistently with linear algebra), but I'm not sure it's worth the additional complexity.
Out of curiosity, if you still remember @mbauman, what motivated you to introduce the boolean parameter C? Can't you just have (in pseudotraits)
transpose(::Matrix) -> MatrixTranspose
transpose(::MatrixTranspose) -> Matrix
? I'm not doubting your (exceptional) judgement here, just wanting to understand what realizations forced you to do this.
Presumably one could generalize this to a PermutedDimensionArray type, with a tuple parameter encoding the permutation. The Covector type obviously has a different purpose and needs to be a separate thing.
You need some way to handle conjugate transposes (and also un-transposed conjugate arrays for (A').'). I see three obvious ways to do this:
A boolean field like T.isconjugate in the MatrixTranspose type. Upon reflection, I think this is definitely the best option for interfacing with external BLAS. In terms of native JuliaBLAS, though, I'd want to make sure Julia/LLVM is able to hoist T.isconjugate ? conj(T[i,j]) : T[i,j] out of loops.
A boolean type parameter, which is what I did. It maps directly onto Ac_mul_Bt and friends. It also has stronger guarantees about the branch removal optimizations Julia is able to do. But I didn't think hard about it… I just wanted to get a sketch implementation started, and I was more concerned about the Covector.
Two separate wrapper types, MatrixTranspose and MatrixCTranspose. Isomorphic to a type parameter, but I find two separate wrapper types annoying. Abstract supertypes and union aliases can help, but I'd still choose the parameter over this option.
I suppose there's now a fourth option with typed functions, storing any transformation function within the transpose type… but that also requires a type parameter for the function to be fast, and it's then a type parameter that isn't easily dispatched upon, either.
I wonder if we could have a ConjugateView wrapper, with the rule that conj and transpose ensure the ConjugateView gets placed inside the MatrixTranspose wrapper. I.e., for A a Matrix,
A' = conj(transpose(A)) = transpose(conj(A))
all produce a MatrixTranspose{ConjugateView{Matrix}} (dropping uninformative type parameters).
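A hypothetical sketch of that ordering rule; ConjugateView and MatrixTranspose here are illustrative single-parameter wrappers, not types from Base, and the lazy* names are made up:
immutable ConjugateView{A}
    data::A
end
immutable MatrixTranspose{A}
    data::A
end

lazyconj(A::Matrix)             = ConjugateView(A)
lazyconj(T::MatrixTranspose)    = MatrixTranspose(lazyconj(T.data))   # push conj inside the transpose
lazytranspose(A::Matrix)        = MatrixTranspose(A)
lazytranspose(C::ConjugateView) = MatrixTranspose(C)

A = rand(Complex128, 2, 2)
lazyconj(lazytranspose(A))   # MatrixTranspose wrapping a ConjugateView wrapping A
lazytranspose(lazyconj(A))   # the same nesting, as the rule requires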
Ah, yes, that's sensible, too. I tend to think of the conjugate transpose as one atomic "thing," so I missed that option. I was thinking about how non-conjugate transposes could be represented by a special subarray index type, just like reshapes.
I'm glad you guys are still working on this! This is an epic thread! Three cheers!!!
Cross-link https://github.com/JuliaLang/julia/pull/6837#issuecomment-213123832
Is this planned for 1.0?
In two issues (#18056,#18136) I was pointed to this giant thread.
So I'll give it a try.
Now Julia has true 1-dimensional vectors, _e.g._ mx[row,:] is not a 1xn matrix anymore. This is a welcome change!
But as a side effect, some existing problems became more obvious. I was bitten by the fact that v*mx does not work for 1-dimensional vectors. Mathematically it should naturally work and return a 1-d vector, when the general a*b product is defined by contracting the last index of the first term with the first index of the second term.
At present the method signature of the Vector-Matrix product is:
(*)(A::AbstractVector, B::AbstractMatrix) = reshape(A,length(A),1)*B
and this method is used up for the v*v' case and not for v*mx. (Thanks @andreasnoack for pointing this out.) Obviously, a single method cannot be used for both.
It seems that Julia is struggling with some Matlab-like conventions. But in Matlab there are no 1-d vectors, just 1xn and nx1 matrices, so many things that are natural there can be a problem here. Julia has true 1-d vectors, which should be a big advantage. It would be nice to reach a really consistent state of its own. Fortran is a much better and more consistent example to follow in this case. The transpose operation is defined only for matrices in Fortran; creating a 1xn matrix from a true 1-d vector is simply not a transpose. For matmul, see the excerpt of the Metcalf book quoted in #18056.
I think that most of @alanedelman's original points were correct.
So here is a suggestion that would simply cure some existing problems,
while respecting the current state as much as possible:
keep v' as is for creating a 1xn matrix from a true 1-d vector v (rowmx and colmx functions would be better, but v' is too widespread to change it)
keep the vec function for creating a true 1-d vector (the inverse of v' is not v'' but vec(v'); we can live with it)
the a*b product should always contract the last index of a with the first index of b (see the sketch below)
the * operator should not be used either for inner or for outer products
inner products stay with the dot function
the outer-product use of * should be eliminated (_i.e._ the v*v' syntax)
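A hedged sketch of the proposed contraction rule, written as a separate contract function (a hypothetical name) rather than committing it to the * operator:
function contract(a::AbstractVector, b::AbstractMatrix)
    length(a) == size(b, 1) || throw(DimensionMismatch())
    vec(reshape(a, 1, length(a)) * b)   # contract the single index of a with the first index of b
end
contract(a::AbstractMatrix, b::AbstractVector) = a*b
contract(a::AbstractMatrix, b::AbstractMatrix) = a*b

v = [1.0, 2.0, 3.0]
A = rand(3, 4)
contract(v, A)   # a true 4-element vector: the v*mx case from above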
Losing [concise mathematical syntax for] outer products would be unfortunate. I'd rather give up vec*mat personally.
Can the existing PermutedDimsArray already handle the case where parent and view have different numbers of dimensions? If so, maybe that's already usable as the non-conjugate transpose wrapper type even for vector parents.
I did not find PermutedDimsArray in the documentation.
But I feel that instead of inventing even more data types,
only the three types of array products should be separated clearly.
Inner products are already separate.
We only need to separate normal and outer products.
Outer products would not be lost, only their syntax would change.
Please consider the v*mx case only as a symptom of the deeper problem.
Aside from the "missing methods problem," it wouldn't be a type you'd have to worry about; it would be an implementation detail that .' returns a lazy wrapper type (and ' a conjugate version thereof). Otherwise I don't think we could have vec*mat and vec*vec' both using the same * operator. If I saw vec*mat in a paper it would look wrong to me, but I see vec*vec' pretty frequently.
Thinking more about it though, I believe PermutedDimsArray does not recursively transpose its elements for arrays of arrays, so it's not quite suitable as the wrapper type to use here.
The other alternative is to disallow vector transpose altogether. I think the discussion here has been overly thorough already, and we're just waiting on a comprehensive implementation of one or both options to evaluate what the impact will look like.
I do appreciate that you were ready for a discussion while this thread is near its end.
While v*mx might look strange to you, it is heavily used in crystallographic code. It is also well handled by Fortran's matmul. (See #18056)
Back to the u*v' product. When u and v are both nx1 matrices, this is a normal matrix product, which turns out to provide the same result as the outer product. But this is just using the matrix product to emulate the outer product. This is seamless in Matlab's world where everything is a matrix. In Julia we have true 1-d vectors that are closer to Fortran's world.
In Julia the transposed vector v' is already a problem. Fortran's choice is to forbid it, as was already suggested for Julia by others. Fortran has no intrinsic function for outer products.
Julia readily transforms v' to a 1xn matrix, and performs the * operation on a 1-d vector and a 1xn matrix. Due to multiple dispatch it is able to do this, but here * is surely not a matrix product anymore.
The point where Fortran and Julia behave similarly is that they both use a separate dot function for inner products. There was a discussion to bundle dot also into the * operator, but luckily it did not happen.
So we have to handle the three products of linear algebra: normal matrix product, inner product, and outer product. At present the * operator is a combined syntax for normal and outer products. All I have suggested is to separate these and reach a more consistent state. (Oh, I left out the cross product, but it is already well separated.)
I don't think inner products, outer products, and matrix multiplication are a mathematically sound way of dividing/classifying products. The following is certainly a repetition of things that have been stated above by various people, but given the length of this topic, I hope it is ok to include a summary every now and then. This is my personal summary and I am certainly not an expert, so correct me where I am wrong.
In an abstract linear algebra setting, the main players are vectors v (living in a vector space V), linear maps (acting on one vector space V and mapping to a possibly different space W) and compositions of these maps, linear forms or covectors (living in a dual space V* and mapping from V to scalars), inner products (from V × V to scalars), and tensor products between vectors (i.e. kron). The outer product I prefer to think of as a tensor product between a vector and a covector.
Of particular interest for this issue is the isomorphism between vectors and linear forms if an inner product is defined, i.e. for every f mapping vectors v to scalars, there exists a w such that f(v) = dot(w,v) for any v. Covectors can, however, exist without reference to inner products (e.g. the gradient of a multidimensional function).
To represent all of those objects on a computer, you typically choose a basis and can then represent most of these things using matrix algebra. That is, you only need matrix multiplication and transposition if you are willing to be flexible with trailing singleton dimensions (vectors as nx1 matrices, scalars as 1x1 matrices, etc). This has been Matlab's take, as well as the way many books and papers are written, especially in numerical linear algebra.
In that case, the aforementioned isomorphism from vectors to covectors (implicitly assuming the Euclidean inner product) corresponds to taking the transpose (Hermitian conjugate in complex case) of the matrix representation of a vector. However, in the abstract sense, there is no notion of the transpose of a vector, which is probably why it is not defined in Fortran, as stated by @GaborOszlanyi . Only the transpose of a linear map is defined (this does not require an inner product and its matrix representation corresponds to the transposed matrix even in the complex case), as well as the adjoint of a linear map (this does require an inner product).
Because of the importance of type stability in Julia code, Matlab's "only matrices" (with flexible trailing singleton dimensions) approach doesn't work well. But we also don't want to transition to the fully abstract setting; we want to keep working in the typical setting (Euclidean inner products, trivial mapping from vectors to covectors, ...). Since in the end we are writing code and want to use ASCII characters, we need to squeeze as much as possible out of the symbols *, ' and .'. Luckily, this is where multiple dispatch comes in handy and leads to the various proposals stated above. I made a table about this, i.e. how abstract linear algebra operations would translate to specific julia methods (not functions).
As a final note to @GaborOszlanyi, I still find no place for v*A in all of this. This might be standard in fields where vectors are by default denoted as row matrices, but personally I think this is an odd choice. If the linear maps f and g act as f(v) = v*A and g(v) = v*B, then this means that g(f(v)) = (g ◦ f)(v) = v*A*B, which is odd since the composition order is interchanged. If anything, I could interpret this as an incomplete inner product, as in the second-to-last row of the linked table.
Your summary is deep and convincing.
Thank you for explaining it so well.
I have just two remaining questions:
What about v*mx in matmul?
There are two problems exposed by this issue:
A. Applied mathematicians are wedded to Householder notation, which implicitly makes use of two natural isomorphisms: between vectors and single-column matrices, and between scalars and 1x1 matrices.
The problem is that neither of these isomorphisms is natural to express in Julia's types, and both involve run-time checks of the array shape. MATLAB's trailing singleton dimension rules can be thought of as an implementation of both of these isomorphisms.
B. A two-dimensional array can be defined recursively as an array of arrays. However, a matrix is a row of columns or column of rows, but never a row of rows or column of columns. The fact that matrices cannot be defined recursively highlights the different tensor structure of matrices and n-dimensional arrays. Properly speaking, multidimensional arrays are a very limited kind of tensor and lack the full machinery necessary to implement the latter. Because a contraction strictly speaking is an operation over a pairing of a vector space with its dual, it is a misnomer to talk about contracting the indices of multidimensional arrays, which never invoke the concept of dual vector space. Most people do _not_ want the full machinery, which requires worrying about co-/contravariant or up/down indexes with row/column nature. Instead most people want arrays to be plain old containers, fully contravariant in all dimensions, _except in two dimensions_, whereby most users want to think of two-dimensional arrays as having the algebra of matrices (which are down-up tensors), and never the down-down, up-up, or up-down tensors. In other words, two-dimensional arrays want to be special cased.
I could still be persuaded otherwise, but here is my current 3-part proposal:
a) Disallow transposing vectors entirely, requiring users to explicitly convert vectors to column matrices in order to write Householder-style expressions like u'v, u*v', u'*A*v and u'*A*v/u'v. All of these expressions can be built from just three elementary unary and binary operators: matmul, matrix transpose, and matrix division. In contrast, if u and v are true vectors, then it is not possible to give subexpressions like u' or u'*A meaning without introducing a special TransposedVector type, which as a logical consequence requires multidimensional arrays to worry about up/down indices in their tensor structure, which seems like too high a price to pay.
A limitation of (a) would be that all the Householder-style expressions will generate 1x1 matrices instead of true scalars (which is still a huge improvement over u'v returning a 1-vector), so an expression like (u'*v)*w would still not work. In practice I do not think that "triple product" expressions like this occur often.
b) Introduce an alternate notation for the analogous operations on vectors, such as
u ⋅ v = scalarize(columnify(u)'*columnify(v)) for the inner (dot) product
u ⊗ v = columnify(u)*columnify(v)' for the outer (Kronecker) product
A(u, v) = scalarize(columnify(u)'*A*columnify(v)), an old notation for the bilinear form
A(u) = A(u, u) for the quadratic form
The expressions in (b) differ from their counterparts in (a) by automatically scalarizing the expressions for inner product and bilinear/quadratic forms, thus avoiding the formation of 1x1 matrices.
c) Make 1x1 matrices convertible to true scalars, so that code like
M = Array(Int, 1, 1, 1)
a::Int = M
could work. All that would be needed is to do the run time check for the array size with something like:
function convert{T,N}(::Type{T}, A::Array{T,N})
    if length(A) == 1
        return A[1]
    else
        error()
    end
end
This proposal is a small modification of what was proposed by Folkmar Bornemann about two years ago. When we tried it, the cost of the run time check was not very great, and it would only be invoked on type-coerced assignment (which is actually a convert call in disguise), not general assignment.
@jiahao, the readability of this post is limited by the hard-to-render characters. Doesn't even look right on OS X, which usually has pretty completely Unicode in its fonts.
@jiahao , I could certainly agree with most/all of that, though I would like to challenge two points:
without introducing a special TransposedVector type, which as a logical consequence requires multidimensional arrays to worry about up/down indices in their tensor structure, which seems like too high a price to pay.
This only seems a logical consequence to me if you want to make TransposedVector a subtype of the AbstractArray hierarchy, which I assumed was not the plan (unlike some LazyTranspose type which truly is just another kind of AbstractMatrix).
u ⊗ v for the outer (Kronecker) product
If the goal is to not rely on implicit isomorphisms and to cleanly separate matrix operations from vector operations and more abstract products, I believe this fails for the following reason (I am not sure about the universal agreement on the following mathematical definitions):
A ⊗ B is an operation defined on matrices, not on vectors.
If you interpret u ⊗ v as the tensor product of two vectors, this would give rise to a two-dimensional tensor of the down-down type. The only way to get a proper 'matrix' would be to take the tensor product of a vector with a covector. But since you want to avoid introducing these objects, it seems that it is impossible to obtain the result of this operation before columnifying the two vectors involved.
u ⊗ v as the outer product is typically sloppily defined, and I am not sure if there is an accepted rigorous definition in terms of up and down. Some sources state it is equivalent to the tensor product of two vectors, hence the previous point. If instead you accept a definition where 'outer product' also means implicitly mapping the second vector to a covector so as to obtain a down-up tensor, then there is no problem.
Instead most people want arrays to be plain old containers, fully contravariant in all dimensions, except in two dimensions,
Have we considered the nuclear option? We start linear algebra over, completely removing the typealias for AbstractVector and AbstractMatrix and replacing them with:
abstract AbstractVector{T} <: AbstractArray{T,1}
abstract AbstractMatrix{T} <: AbstractArray{T,2}
# and we could introduce:
abstract AbstractCoVector{T} <: AbstractArray{T,1}
I understand there is a lot of fallout, but we might end up with a clean separation between multidimensional storage arrays and linear algebra. We don't _have_ to implement the full multidimensional tensor algebra, dual vector spaces, etc. We just have to implement the bits many people want: vectors, covectors and matrices. We get inner products, outer products, covector-matrix products, matrix-vector products and matrix-matrix products. Transpose is defined on all of the above. We can have implementations of AbstractMatrix which really use a 1D array of 1D arrays (not the default implementation, of course). We don't have to use Householder notation (which IMHO is one of MATLAB's weaknesses!) but we can still get all of the conveniences of linear algebra.
I'm a bit nervous to suggest this, but I believe it will allow Julia to mimic the "correct" model of what most people learn in e.g. first-year university level linear algebra, with no need to fall back on Householder conventions. Making a clear distinction may also make it easier to move Base.LinAlg into a "standard library" package, which I believe is a long-term goal for Julia? It also meshes well with the idea that there will be a new 1D resizeable list-like thing that will come with the new Buffer changes and native implementation of Array, so this generic "list" type can replace Vector for many of the core parts of Julia and let us load the LinAlg package quite late while having Array and "list" defined quite early.
There exist many algorithms that have already been "dumbed down", and which are expressed in terms of Fortran arrays and not in terms of proper linear algebra. Julia needs to be able to implement these algorithms without people having to re-discover (or coerce) a linear algebra structure on top of multi-dimensional arrays.
In my line of work, a proper linear algebra that maps tensors and co/contravariant indices etc. onto Fortran arrays is probably best. To make this work -- and to not confuse people -- I would leave alone the terms "vector" and "matrix" and "array", keeping them at the low level, and use other (fancier?) terms for everything higher level. Said higher level should also abstract storage options, either via abstract types or via parametrization. Maybe a prefix LA is the way to go to express this:
LA.Vector
LA.CoVector
LA.Tensor{... describing co/contravariance of indices ...}
Vectors and matrices are then purely used for storage. At the low level, transposition is managed manually (same as in BLAS); at the high level, it's handled automatically.
This is close to @andyferris's suggestion, except it doesn't break backward compatibility and doesn't break Matlab/Fortran/numpy convert's expectations.
@eschnett I think earlier in this thread it was decided that multi-linear algebra was going to be left for packages, rather than base Julia. I think a lot of the ideas you suggest could be fleshed out in a new package which treats tensors, vector spaces, co/contra-variance and so on within the type system, somewhat different to the _TensorOperations.jl_ package, which provides convenience functions for multiplying arrays as if they were tensors. I think it would be challenging to develop but potentially a worthwhile abstraction!
As for leaving Matrix and Vector alone - well, perhaps the current definitions are perfect, or perhaps there is something better we can do for the folks that want to multiply matrices and transpose vectors using standard mathematical notation. I guess it may add a small learning curve for new users, though I had hoped that if the system was obvious and eloquent and an _improvement_ on other languages then it should be easy to learn.
Note that a general non-Euclidean complex vector space V has 4 associated spaces: V, conj(V), dual(V) and conj(dual(V)). So a general tensor then has 4 kinds of indices (typically denoted as up or down, barred or not barred). In a complex Euclidean space (e.g. quantum mechanics), dual(V) ≡ conj(V) and conj(dual(V)) = V. In a real (non-Euclidean) space (e.g. general relativity), V ≡ conj(V). In both of the latter cases, only up and down indices are needed.
In a real Euclidean space (most applications?), all of them are equivalent, and julia's plain arrays do suffice to represent tensors. So at least for real numbers, the following two operations allow to construct a complete and consistent tensor algebra.
A contraction product ∙, which could be generalized to arbitrary arrays of rank N using the rule: contract the last index of the first array with the first index of the second (or numpy's dot convention, though I find that less intuitive), such that A ∙ B returns an array of rank M+N-2 if A and B had rank M and rank N.
A tensor product ⊗: A ⊗ B returns an array of rank M+N.
Specialised down to vectors v, w and matrices A, B, this allows one to write all of the following:
v ∙ w -> exceptionally returns a scalar instead of a rank-0 array
A ∙ v
A ∙ B
v ⊗ w
v ∙ A
While one might want to use * for ∙, the pitfall is that the above definition of ∙ is not associative: (A ∙ v) ∙ w ≠ A ∙ (v ∙ w), as the latter would not even be defined.
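A rough sketch of the two operations above for plain real arrays; the function names here are illustrative only:
function contractlastfirst(A::Array, B::Array)
    sa, sb = size(A), size(B)
    sa[end] == sb[1] || throw(DimensionMismatch())
    m = prod(sa) ÷ sa[end]
    n = prod(sb) ÷ sb[1]
    C = reshape(A, m, sa[end]) * reshape(B, sb[1], n)
    ndims(A) == 1 && ndims(B) == 1 && return C[1]   # v ∙ w exceptionally gives a scalar
    reshape(C, (sa[1:end-1]..., sb[2:end]...))      # otherwise a rank M+N-2 array
end

tensorprod(A::Array, B::Array) = reshape(vec(A) * vec(B).', (size(A)..., size(B)...))   # rank M+N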
But then again, that's only if you are willing to ignore complex arrays.
As for a completely general implementation of tensors, I started (and abandoned) this a long time ago, before there were stack allocated tuples, generated functions and all these other goodies that would probably make it more feasible today.
Interesting approach, @Jutho. Seems intuitive enough. I'm _guessing_ that these definitions could be added regardless of what we do with * and ', and I would support that.
But then again, that's only if you are willing to ignore complex arrays.
Manually inserted conj() fixes that. And I don't think that Julia wants to ignore complex matrices, does it?
I noticed a couple more things about having AbstractMatrix as a special subtype rather than a type alias:
matrix[:,i] -> AbstractVector{T}
matrix[i,:] -> AbstractCoVector{T}
array_2d[:,i] -> AbstractArray{T,1}
array_2d[i,:] -> AbstractArray{T,1}
This is pretty cool - we can get column and row vectors from indexing a matrix. If it's just a 2D storage container we get a 1D storage array. Should cover all use cases! And it still obeys the AbstractArray interface with APL slicing rules, since both AbstractVector{T} <: AbstractArray{T,1} and AbstractCoVector{T} <: AbstractArray{T,1}.
One "interesting" fact is
array_3d[:,:,i]
-> AbstractArray{T,2}
matrix(array_3d[:,:,i])
-> `AbstractMatrix{T}
so you would have to manually wrap it as a matrix if you want to do matrix multiplication with the result. Is this a plus or a minus, I don't know? But this seems to be one potential complication and touches on what @eschnett mentioned about making it easy for users from other languages which conflate storage and linear algebra.
This may be a silly question, but @jutho mentioned earlier that writing code was limited by ASCII. Why in the world are we still limiting ourselves to a 7-bit set of characters developed in 1963, and last updated on 1986 (30 years ago)? This was an era when the famed 640KB was the maximum RAM available on a PC in 1981. In today's computer market, we now commonly have 64 bit processors with 32GB of RAM on sale (50,000X the previous max) and we are nowhere close to the theoretical limit for 64 bit processors. So, why are we still limiting ourselves to a character set developed 40 years ago?
We can use Unicode; just keep in mind that unfortunately the size of the keyboard didn't grow by a similar factor in the past 40 years, and neither did the number of fingers a normal person has (in the past 10000+ years, afaict).
IMHO, ASCII should be put to rest. A special, mathematical keyboard for fast, math symbol programming is actually a good idea! Is there a good reason not to use more symbols that come with UTF? Can you justify the pains of trying to juice everything you possibly can out of ASCII?
@esd100: Please do not use this GitHub issue for such highly speculative discussions.
@Johnmyleswhite. I'm not sure what you mean by "speculative". Are you sure that's the word you meant to use? I thought the use of ASCII vs something else was relevant to the discussion, since it seems that the language, operators and functionality are tied to the symbols used. I'm not an expert by any means, but it seems like an issue worth discussing, or at least being clear on, in relation to the functionality being addressed.
I mean is there any reason we can't define the language the way we want it (with the foundational goals in mind: ease of use, speed)? Wouldn't the use of special symbols/ words make the language more rich and beautiful?
@esd100 please note that e.g. @Jutho's most recent comment (using 2 unicode symbols) was well received and @yuyichao agrees that we can use unicode. There are other proposals about for introducing more unicode-only operators (e.g. the compose symbol to compose functions, #17184). I don't think people are disagreeing with you (though we do have practical considerations to keep in mind) but if you have _specific_, on-topic suggestions please let us know.
I'm a bit confused by this new behavior in v0.5, which I believe arose out of this discussion:
julia> [ x for x in 1:4 ]' # this is fine
1×4 Array{Int64,2}:
1 2 3 4
julia> [ Symbol(x) for x in 1:4 ]' # but this isn't? What is special about symbols?
WARNING: the no-op `transpose` fallback is deprecated, and no more specific
`transpose` method for Symbol exists. Consider `permutedims(x, [2, 1])` or writing
a specific `transpose(x::Symbol)` method if appropriate.
How do I make a (non-numeric) row vector out of a list comprehension? It has to be a row vector. (Would this be better as a separate issue or discussed elsewhere? Wasn't sure where to post...)
You can use reshape: reshape(v, 1, length(v)). Maybe this should also be mentioned in the deprecation warning? I think the idea is that transpose is a mathematical operation and thus should only be defined for mathematical vectors/matrices.
It's mentioned in the depwarn: permutedims(x, [2, 1]).
permutedims does not work for vectors. See this brand new issue: #18320
I think the idea is that transpose is a mathematical operation and thus should only be defined for mathematical vectors/matrices.
This makes no sense to me. Is this up for discussion? I would really prefer transpose to work on non-numeric arrays unless there is a really good justification.
I think it is relatively complex to satisfy what you want _and_ make sense for mathematics. In a mathematical sense, transpose is often defined by swapping the vector spaces and their dual spaces of a vector or matrix (well, there's some complication with transpose vs conjugate transpose, as well). For a matrix M of regular numbers, we have the transpose as permutedims(M, (2,1)), but in general you may define e.g. a "block" matrix in terms of sub-matrices like this:
M = [A B;
     C D]
where A etc. are matrices themselves. My understanding is that Julia likes to have
M' = [A' C';
      B' D']
which is what you would do using pen-and-paper mathematics and is therefore "desirable syntax".
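A quick check of that recursive behaviour in the Julia of the time (0.5-era syntax); the blocks here are small integer matrices:
A = [1 2; 3 4]; B = [5 6; 7 8]; C = [9 10; 11 12]; D = [13 14; 15 16]
M = reshape(Matrix{Int}[A, C, B, D], 2, 2)   # the block matrix [A B; C D]
M'   # swaps the blocks *and* transposes each block, i.e. [A' C'; B' D']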
This means the elements of a matrix have to accept ' and .'. For numbers these are defined as complex conjugation and no-ops respectively. IMHO I think this is a pun of convenience, but it works and, most importantly, it is implemented with _simple_ rules ("transpose is recursive" - as opposed to needing BlockMatrix classes and so on). But this pun on transpose has been removed from non-number types in 0.5, because it doesn't make much sense. What's the transpose of a Symbol?
If you have _data_, not numbers, then using permutedims is perfectly well defined in its meaning and behaviour: it is non-recursive. Using a mathematical pun like .' may save you a few characters of typing - but it will make a _lot_ more sense for somebody else (who might not be extremely familiar with mathematics or MATLAB) to read your code if you are using reshape and permutedims as necessary. _To me, this is the_ most _important point_.
IMHO this very long issue is all about making a mathematically consistent interface for linear algebra, and it seems natural to me that the result will be less mathematical puns for arrays of data.
Thanks for the thoughtful response. I still disagree pretty strongly
But this pun on transpose has been removed from non-number types in 0.5, because it doesn't make much sense. What's the transpose of a Symbol?
I'm not sure defining the transpose of a Symbol as a no-op makes any less sense than defining transpose on a real number. I get that the "pun" is convenient, but it seems like the "proper" thing to do would be to have transpose only defined for vectors and matrices and have transpose(A::Matrix{Complex}) apply conj as part of its execution.
Using a mathematical pun like .' may save you a few characters typing - but it will make a lot more sense for somebody else (who might not be extremely familiar with mathematics or MATLAB) to read your code if you are using reshape and permutedims as necessary. To me, this is the most important point.
I agree that .' should be used judiciously and that a more explicit call to transpose is often better. I think reshape and permutedims can require a non-trivial amount of cognitive overhead to read and to code in the first place. Be honest, which is quicker to parse:
transpose([ f(x) for x = 1:length(A) ])
reshape([ f(x) for x = 1:length(A) ], 1, length(A))
Even in simple cases like this you need to bounce from the start of the line (to read reshape) all the way to the end (to read length(A)) to understand what is going on. (You then likely work your way back to the middle to understand why length(A) is there in the first place.)
To me, the bigger problem is that if I'm a new Julia user (meaning I've likely used numpy and MATLAB before) and I see that this works:
[ x for x = 1:10 ]'
I'm naturally going to assume that the same thing will work for arrays of strings, symbols, etc. I'm not going to read through long online discussions to figure out the semantics of .' -- I'm going to try some things out and generalize/infer from past experience. Perhaps having a better error message would help here.
Overall, I don't see how keeping the no-op transpose on Symbol and other non-numeric inputs interferes with the nice mathematical framework proposed here (a worthy goal!). But keeping the no-op seems harmless.
But keeping the no-op seems harmless.
You can't really define anything meaningfully for Any because everything is a subtype of Any. The definition transpose(x)=x is plain wrong for any kind of matrix, and to avoid some of the silent errors we had to add these definitions. So it is a tradeoff between the convenience of allowing the relatively weird syntax ' for a completely non-mathematical operation and avoiding silent errors.
I'm not sure defining the transpose of a Symbol as a no-op makes any less sense than defining transpose on a real number. I get that the "pun" is convenient, but it seems like the "proper" thing to do would be to have transpose only defined for vectors and matrices and have transpose(A::Matrix{Complex}) apply conj as part of its execution.
I don't completely disagree, but we would need to implement some kind of BlockMatrix or have special methods defined for Matrix{M<:Matrix}. I'm not sure if either is a popular idea or not? (This is a serious question to those who have been following, as it could simplify some of these related issues.)
Be honest, which is quicker to parse:
transpose([ f(x) for x = 1:length(A) ])
reshape([ f(x) for x = 1:length(A) ], 1, length(A))
The second, because I don't relate to/like Julia's current transposition of vectors (clearly, _I take vector transposes_ too _seriously_ :) ). If I had to do the second I would write it verbosely as rowvec = reshape(colvec, (1, n)), or probably just use [f(x) for _ = 1:1, x = 1:n] to force the comprehension to make the correct shape to begin with; or if you really like .' then map(f, (1:n).') and f.((1:n).') currently work too.
it is a tradeoff between the convenience of allowing the relative weird syntax ' for a completely non-mathematical operation and avoiding silent errors
If this was causing silent errors and other problems, then I guess I'm probably going to lose this argument. (I don't see why it would cause errors -- but I believe you.) On the other hand....
we would need to implement some kind of BlockMatrix or have special methods defined for Matrix{M<:Matrix}
I might be wrong, but I think you'd just need to do the second, which seems like a reasonable option to me. But this is starting to get above my paygrade.
[f(x) for _ = 1:1, x = 1:n]
I forgot about this! This is probably what I'll end up doing. Overall, I still disagree with your tastes for readable code, but to each his/her own! ¯\_(ツ)_/¯
clearly, I take vector transposes too seriously
Yes. I think so. 😉
This (https://github.com/JuliaLang/julia/issues/16790) would also make using reshape in place of transpose slightly more palatable to me.
I might be wrong, but I think you'd just need to do the second (edit: have special transpose methods defined for Matrix{M<:Matrix}), which seems like a reasonable option to me.
Unfortunately now we get back to the distinction between data and linear algebra yet again. You want linear algebra block matrices to have recursive transpose but generic 2D data arrays of 2D data arrays to not have recursive transpose... but while Matrix{T} and Array{T,2} are the same thing we can't do that.
This (#16790) would also make using reshape in place of transpose slightly more palatable to me.
True!!
You want linear algebra block matrices to have recursive transpose but generic 2D data arrays of 2D data arrays to not have recursive transpose
This doesn't seem like something I'd obviously want... Why not just have two different functions to deal with these different kinds of transposes? I guess I missed the memo on there being a very strict distinction between linear algebra objects and data objects.
Reading through this whole thread seems like a daunting task, but perhaps I should do so before commenting further. I don't want to add uninformed noise.
I guess I missed the memo on there being a very strict distinction between linear algebra objects and data objects.
There is not a strict distinction between linear algebra objects and data objects. Not in the current implementation, nor in ideal use.
If there were, then modifying the size (with push!) or even the values of linear algebra objects would not be supported (such users would be using StaticArrays.jl etc.), and broadcast would only be supported on linear algebra objects. Data objects would be modifiable, expandable, and would support map (and reduce, and filter), but not broadcast.
But we don't live in a world where people think in a binary way about data objects vs linear algebra objects. Thus this 2.5-year, 340-comment thread.
Re the no-op transpose: we could add, very high up the type hierarchy, Scalar and Nonscalar abstract types. Scalars all fall back to a no-op transpose. Nonscalars have no fallback (but for right now fall back to a deprecation warning). Numbers, characters, strings and maybe even tuples would be Scalar and would have the no-op transpose defined. Some scalars, e.g. complex numbers, would overwrite this definition of transpose. Arrays (matrices, vectors and others) would be subtypes of Nonscalar and would have recursive transpose.
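A hedged sketch of the idea, written here as a trait rather than as abstract supertypes (all names are hypothetical, nothing from Base):
abstract TransposeBehaviour
immutable IsScalar    <: TransposeBehaviour end
immutable IsNonscalar <: TransposeBehaviour end

transposebehaviour(::Any)            = IsNonscalar()
transposebehaviour(::Number)         = IsScalar()
transposebehaviour(::Char)           = IsScalar()
transposebehaviour(::AbstractString) = IsScalar()

mytranspose(x) = mytranspose(transposebehaviour(x), x)
mytranspose(::IsScalar, x)    = x                                       # no-op for scalar-like values
mytranspose(::IsNonscalar, x) = error("transpose not defined for $(typeof(x))")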
I'm not sure if I like that, or not.
The recent spate of posts rehashes discussions about _matrix transpose_ and "scalar" types in #18320 #13171 #13157 #8974.
I really would like to dispel the notion that the transpose(x)=x no-op fallback is harmless. In my mind, a fallback should still produce the correct result, just more slowly than an optimized algorithm. It is dangerous when the fallback method silently computes the wrong answer, because it means that the fallback doesn't have the correct semantics: transpose(x) means different things depending on the type of x.
The no-op transpose fallback is not only wrong for matrices of matrices, but it is wrong for all matrix-like objects that are not subtypes of AbstractMatrix (which by #987 means that they have explicitly stored entries, _not_ that they have the algebra of matrices). Here is one poster-child example (which we have quite a few of):
julia> A = rand(5,5); F = qrfact(A); R = F[:R]; Q = F[:Q] #Q is an example of a matrix-like object
5x5 Base.LinAlg.QRCompactWYQ{Float64,Array{Float64,2}}:
-0.518817 0.0315127 0.749223 0.410014 -0.0197446
-0.613422 -0.16763 -0.609716 0.33472 -0.3344
-0.0675866 0.686142 0.0724006 -0.302066 -0.654336
-0.582362 -0.0570904 0.010695 -0.735632 0.341065
-0.104062 0.704881 -0.248103 0.295724 0.585923
julia> norm(A - Q*R) #Check an identity of the QR factorization
8.576118402884728e-16
julia> norm(Q'A - R) #Q'A is actually an Ac_mul_B in disguise
8.516860792899701e-16
julia> Base.ctranspose(Q::Base.LinAlg.QRCompactWYQ)=Q; #Reintroduce no-op fallback
julia> norm(ctranspose(Q)*A - R) #silently wrong
4.554067975428161
This example shows that just because something is a subtype of Any doesn't mean that you can assume it is a scalar and has a no-op transpose. It also illustrates why the parent issue in the OP is so hard to resolve. For a matrix-like non-array object Q, Q' has no real meaning as an array operation but it has an unambiguous algebraic meaning: expressions like Q'A are perfectly well-defined. Other people who work with arrays and not linear algebra simply want non-recursive permutedims and don't care about matrix-like non-arrays. Nevertheless, the fact remains that you cannot have a consistent spelling of both the algebraic and axis-swapping semantics for all types.
Maybe I'm being dense, but I would have thought the comparison would be this:
julia> A = rand(5,5); F = qrfact(A); R = F[:R]; Q = F[:Q]
julia> Base.ctranspose(Q::Any) = Q;
WARNING: Method definition ctranspose(Any) in module Base at operators.jl:300 overwritten in module Main at REPL[6]:1.
julia> norm(ctranspose(Q)*A - R) # still works fine
4.369698239720409e-16
Overwriting transpose in this way seems to allow transpose([ :x for _ = 1:4 ]) - i.e. what I posted about. I would have thought that as long as you implement transpose / ctranspose correctly for everything that needs it (e.g. QRCompactWYQ) the fallback would never be called (since a more specific call can be made).
Your code is not calling the ctranspose method you wrote (you can verify this with @which). It's calling a different fallback method in Julia v0.5 (which doesn't exist in v0.4) which is essentially doing ctranspose(full(Q)). This other fallback is correct, but defeats the very reason why we have this fancy Q-type (so that multiplying with it can be done accurately). My comment that fallbacks should be correct still stands.
Yes, as long as you implement transposition correctly for everything that needs it, the fallback of course does no harm. The harm is that you don't get a no-method error if you forget to do that, but silently get the wrong result.
Thanks @toivoh that made it click for me. I really appreciate the explanations.
But if you define fallback transpose(X::AbstractMatrix) and transpose(X::AbstractVector) methods, then presumably you would always get the correct result (even if it was slow... calling full, for example), no? And you could always write a more specialized method to do it better/faster. Then transpose(::Any) shouldn't ever cause silent errors other than the "puns" mentioned earlier (i.e. when handling Complex numbers... maybe other scalar use cases that I don't know about?)
For historical reasons QRCompactWYQ <: AbstractMatrix, but according to #987 and #10064 this subtyping relation is not correct and should be removed.
Another example of a matrix-like non-array is IterativeSolvers.AbstractMatrixFcn, which is not a subtype of AbstractMatrix. For this type the fallbacks you reference would never be dispatched on, which again was my main point.
We should continue this discussion in https://github.com/JuliaLang/julia/issues/13171. The issue is actually mainly about vectors with number like elements.
Someone from "team linear algebra" needs to step up and commit to doing something about this for 0.6 or it's going to get bumped yet again.
So to reboot, what is the actual plan?
My line of thinking brings me to: transpose(v::AbstractVector) = TransposedVector(v)
where TransposedVector <: AbstractVector
. The only semantic thing that will distinguish TransposedVector
from AbstractVector
will be how it behaves under *
(and all the A_mul_B
s, \
, /
, ...). I.e. it's a decorator to determine which indices to contract over under *
(etc...). Transposition would be a linear algebra concept and if you want to rearrange an array of "data", reshape
and permutedims
should be encouraged.
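To make this concrete, here is a rough sketch of the sort of decorator I mean (hypothetical and incomplete - ignoring conjugation, printing, ambiguities and most methods):

import Base: size, getindex, transpose, *

# Sketch only: a thin wrapper whose sole job is to change what `*` contracts over.
immutable TransposedVector{T, V<:AbstractVector} <: AbstractVector{T}
    vec::V
end
TransposedVector(v::AbstractVector) = TransposedVector{eltype(v), typeof(v)}(v)

size(tv::TransposedVector) = size(tv.vec)
getindex(tv::TransposedVector, i::Int) = tv.vec[i]

transpose(v::AbstractVector) = TransposedVector(v)
transpose(tv::TransposedVector) = tv.vec                                     # so v.'.' === v

*(tv::TransposedVector, v::AbstractVector) = dot(tv.vec, v)                  # row * column -> scalar
*(tv::TransposedVector, A::AbstractMatrix) = TransposedVector(A.' * tv.vec)  # row * matrix -> row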
The only semantic thing that will distinguish TransposedVector from AbstractVector will be how it behaves under *
So v'==v
but v'*v != v*v'
? While this may make sense, it also looks potentially confusing.
So v'==v but v'*v != v*v'? While this may make sense, it also looks potentially confusing.
IMO this is unavoidable (though possibly unfortunate).
The dual of a vector is also a vector. The transpose is also still a one-dimensional data structure with well defined elements which should be able to get, mutate, map, reduce, etc.
Unless we separate linear algebra from arrays (e.g. make Matrix{T}
both a subtype of and an immutable wrapper of Array{T,2}
, with more (linear algebra specific) methods defined), then I'm not sure there is much choice which is consistent with both its array and linear algebra properties.
The one confusing question (to me) is whether it broadcasts like a vector, or over its second dimension. If its size is (1, n) then this whole change is not really doing much except asserting it's like a matrix where the first dimension is known to be length 1
. The transpose of a Vector{T}
would have to remain an Array{T,2}
(i.e. Matrix
...), however the transpose of that could be a Vector
again (e.g. we could have v'' === v
).
Is that a better idea? It would be less breaking and still improve semantics w.r.t. linear algebra. (EDIT: and it behaves differently w.r.t. ==, as @martinholters brings up.)
To me, having TransposedVector{T <: AbstractVector{Tv}} <: AbstractMatrix{Tv}
*) with size(::TransposedVector, 1)==1
but transpose(::TransposedVector{T})::T
sounds like the sanest approach, but there has been so much debate that there probably is some good argument against this?
*) I know this is syntactically invalid, but I hope the intended semantics are clear.
Yes, upon playing with ideas in code I find I agree with you @martinholters.
I made a start at https://github.com/andyferris/TransposedVectors.jl. This can all be achieved with only a small amount of type-piracy and method overrides from a package outside of base, and I made a package compatible with Julia 0.5. If it turns out well, we could maybe port it to Base? (Or else learn some lessons).
One big question is whether we can live without a CTransposedVector
type for complex conjugation, or however that will be handled.
I wouldn't use a separate CTransposedVector
, because that combines two concepts (conjugation and reshaping) into one object. It's much more composable to keep those two concepts implemented as separate entities. For example, with MappedArrays.jl it's as easy as
conjview(A) = mappedarray((conj,conj), A)
and you get great performance too.
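For example (assuming the MappedArrays package; just to illustrate that the view reads and writes through conj):

using MappedArrays

conjview(A) = mappedarray((conj, conj), A)  # as above

v = [1+2im, 3-4im]
cv = conjview(v)      # lazy; no copy is made
cv[1]                 # 1 - 2im
cv[1] = 5+6im         # stores conj(5+6im) back into the parent
v[1]                  # 5 - 6im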
Right, Thanks Tim. I was thinking of/hoping for something like that - my only concern was how you might ensure the two wrappers always appear in the "correct" order, but I think what I've implemented so far can handle that. Also we would need to support BLAS for all these combinations, which I've managed to avoid touching so far.
One beautiful thing is that a separate change which defaults conj to conjview would be independent of this change (I think they could both be tinkered with in independent packages and they would compose together correctly). Is there any appetite to make conj a view? Are we waiting for inline non-isbits immutables to make such views free? Or no need to wait?
@andyferris: I think your approach is a good one – thanks for pushing it forward. If the implementation works out well, we can just use that code in Base. One thing to keep in mind is that we may also want TransposedMatrix as well for https://github.com/JuliaLang/julia/issues/5332. Beyond 1 and 2 dimensions, I don't think transposes make sense so it's a finite set of types.
Now that we have continued the conversation I'd mention again that we should try to avoid a transposed vector type. It has been mentioned a couple of times above but I doubt that anyone will ever again be able to read through all comments of this issue. Introducing a transpose type for vectors really complicates things for almost no benefit. If you really think that a vector has a direction then make it a matrix and things will just work. It doesn't make much sense to have linear algebra vectors that are different from matrices and then introduce a new wrapper type to make the vectors behave like 1xn
matrices.
Right now the asymmetry is that x'
promotes x
to a matrix whereas A*x
doesn't promote the x
to a matrix. If linear algebra operations were just for 2D then A*x
should promote and x'x
would be a 1x1
matrix. Alternatively, we could let vectors be vectors and therefore also A*x
be a vector but then x'
should be a noop or an error. Introducing a transposed vector will require a lot of new method definitions and the only benefit seems to be that x'x
becomes a scalar.
I think the matrix transpose is very different. It will allow us to get rid of all the Ax_mul_Bx
methods and the behavior is not debatable in the same way the behavior of vector transposes is.
I don't really see how introducing a TransposedVector type causes more problems than introducing a TransposedMatrix type.
Yes, the strong consensus somewhere in the middle of this opus was that the vector transpose is weird, and that, if it gets implemented it definitely shouldn't be a subtype of AbstractArray
— I'd even disallow size
entirely. My summary above enumerates some of the nuts and bolts challenges of including a covector — not least of which is notation (note that I'd recommend taking Tim's approach of mapping conj
over the array instead of including it as a type parameter).
But there was an even stronger call for the vector transpose to simply be an error. If that were the case, then I think it could actually live in a package.
Having vector transpose be an error seems like throwing the baby out with the bathwater. Not being able to write v'
to transpose a vector would be really, really annoying. I don't really get what's so bad about having a TransposedVector type that behaves like a row matrix aside from how it multiplies with vectors and matrices.
It depends upon how many behaviors you're wanting to address with a vector transpose. If you simply want v'' == v
, then it's ok for typeof(v') <: AbstractMatrix
. But if you want the other things we've talked about in this thread like typeof(v'v) <: Scalar
or typeof(v'A) <: AbstractVector
, then it needs to be a different, more complicated, beast.
The idea of having TransposedVector be a kind of Vector seems to be at the root of many problems. It seems like it would be much less disruptive to have it be a kind of matrix and then have size(v.') == (1,length(v))
just like we do now. The only difference would be that a double transpose gives you a vector back and v'v
would produce a scalar. If one wants to treat a transpose as a vector, one can since length(v')
gives the right answer and linear indexing works as expected.
I agree 110% with @StefanKarpinski. I toyed with making it a type of vector but it doesn't make too much sense and is rather breaking, for the kinds of reasons which were discussed earlier in this thread.
The approach of making it have size (1, n)
means in terms of Array
behaviour it behaves exactly as it does now. The _only_ operations which will distinguish it from a 1-by-N Matrix
is behaviour under '
, .'
, *
, \
. and /
.
None of those are "array like" operators. They are there purely to implement linear algebra between matrices and vectors, and come directly from MATLAB ("matrix laboratory"). This final refinement simply asserts size(tvec, 1) == 1 statically to the compiler _and_ that I want v'' to be v. (I kind of think of it as a bit like a StaticArray where one dimension is fixed and the other is dynamically sized).
If you simply want v'' == v, then it's ok for typeof(v') <: AbstractMatrix.
Right.
But if you want the other things we've talked about in this thread like typeof(v'v) <: Scalar or typeof(v'A) <: AbstractVector, then it needs to be a different, more complicated, beast.
Why? We aren't breaking any of its array-like properties, so it can still be an array. Any dot-call will work just like before, and other vectorized ops are being removed. I think that *
is "specialized" on knowledge of the various shapes of Vector
and Matrix
and that this is because we are trying to emulate the behaviour of written linear algebra. Making v' * v
be a scalar is _very_ standard mathematics and it is my interpretation that the only reason it hasn't been so from the beginning is because it's not trivial to implement in a consistent way (and the inspiration Julia takes from MATLAB), but Stefan could clarify on that. Making the inner product a scalar is the only breaking change here to speak of (there are other very minor things) - I don't see why that alone would make it unsuitable to be an Array
(there are many types of Array
that don't have valid '
, .'
, *
, \
and /
defined _at all_, notably rank 3+)
One of the issues I ran into last year were ambiguities; IIRC they were much simpler to resolve when the vector transpose wasn't an array.
Yes, I'm a little worried how deep that will become before I'm finished... :)
As a general note, it would be wonderful if/when we make Base
less monolithic and factor it out into standard libraries. Having a self-contained package or module that just defines AbstractArray
and Array
would make this kind of development simpler.
What was exactly the problem with @Jutho's proposal? Couldn't *
automatically (conjugate) transpose the vector on the left?
If we have *
the matrix multiplication, '
the matrix (conjugate) transposition (leaves the vectors unchanged), and two operators promote
that adds a trailing singleton and demote
that removes a trailing singleton, then we can build a right application *: (n,m) -> (m,) -> (n,)
, a left application *: (n,) -> (n,m) -> (m,)
, a scalar product *: (n,) -> (m,) -> (,)
and an outer product ×: (n,) -> (m,) -> (n,m)
such that:
(*)(A::AbstractMatrix, v::AbstractVector) = demote(A*promote(v))
(*)(u::AbstractVector, A::AbstractMatrix) = demote((promote(u)'*A)')
(*)(u::AbstractVector, v::AbstractVector) = demote(demote(promote(u)'*promote(v)))
(×)(u::AbstractVector, v::AbstractVector) = promote(u)*promote(v)'
I am not sure I would ever need to transpose a vector with these operators. Am I missing something?
Edit: Not exactly @Jutho's proposal.
@Armavica , I am not quite sure that it was exactly my proposal to assign this meaning to *
but rather to a new (unicode) operator ∙
, which is currently equivalent to dot
. Furthermore, I don't think you would be able to implement the promote
and demote
approach in a type stable manner.
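For instance, naive versions of those two helpers might look like the sketch below (hypothetical code, renamed so as not to clash with Base.promote). The output rank depends on the input rank in a way that inference (at least as it stands) can't track through the tuple slicing, and a variant that only drops the trailing dimension when it happens to be a singleton would make the return type depend on runtime values:

# Hypothetical sketch of the proposed promote/demote pair.
addone(A::AbstractArray)  = reshape(A, (size(A)..., 1))    # append a trailing singleton
dropone(A::AbstractArray) = reshape(A, size(A)[1:end-1])   # drop the trailing singleton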
The purist in me agrees with @andreasnoack that transpose of a vector itself should be avoided (I personally prefer writing dot(v,w)
over v'w
or v'*w
anytime), but I can also appreciate the flexibility of having it work in the way proposed by @andyferris . Therefore I am not going to add anything further to this discussion and support any sane attempt at getting an actual implementation running.
Right, @Jutho. Interestingly the two aren't mutually exclusive: having no transpose on vector defined by default means this functionality can legitimately live in a separate package, module, or "standard library" (i.e. without "type piracy").
(I personally prefer writing dot(v,w) over v'w or v'*w anytime)
Jutho - on the other hand, I bet you don't write |ψ> ⋅ |ψ> on a piece of paper or a whiteboard though, but rather <ψ|ψ> (and the same analogy holds for how we draw inner products with tensor network diagrams).
Of course I would love to see Dirac's braket notation as the standard formulation of linear algebra in all of mathematics or even science, and by extension in julia, but probably it's not going to happen.
Now you force me to digress, which I was not going to do anymore. I certainly agree to preferring <ψ|ψ> for the inner product, but that's independent of the fact that I don't think of <ψ| as the Hermitian conjugate of |ψ>. Hermitian conjugation is defined for operators. The natural isomorphism between vectors and their associated dual vectors (covectors, linear functionals,...) is known as Riesz representation theorem, though the former is of course equivalent to Hermitian conjugating when interpreting length n
vectors as linear maps from C to C^n, i.e. as matrices of size (n,1)
.
Of course I would love to see Dirac's braket notation as the standard formulation of linear algebra in all of mathematics or even science, and by extension in julia, but probably it's not going to happen.
Lol
Now you force me to digress, which I was not going to do anymore.
Sorry... I really shouldn't have mentioned it... :)
I don't think of <ψ| as the Hermitian conjugate of |ψ>. Hermitian conjugation is defined for operators.
True, I hadn't considered that.
The natural isomorphism between vectors and their associated dual vectors (covectors, linear functionals,...) is known as Riesz representation theorem, though the former is of course equivalent to Hermitian conjugating when interpreting length n vectors as linear maps from C to C^n, i.e. as matrices of size (n,1).
I'm trying not to digress (but I'm bad at that), but I think we get a bit of this last bit since we are associating AbstractVector
with a 1D Array
. In other words AbstractVector
doesn't mean "abstract vector" in a mathematical sense, but instead means "the numerical elements of a vector w.r.t. some predefined basis". MATLAB got out of this easily because vectors were of size (n,1)
.
As food for thought, what do you call the (antilinear) operator which takes all tensors of the form |a>|b><c|
and maps them to |c><a|<b|
? I always bundled this together with the vector dual and standard operator conjugation, as "Hermitian conjugation" but perhaps that's too blasé.
Had a conversation with @alanedelman which crystallized a few things about my position on #4774:
- a Covector or RowVector type;
- for a certain set of functions (*, ', .', /, \, maybe norm? others?) this type acts as a covector in linear algebra;
- size(v') of a covector is (1, length(v)).

This approach requires classifying all generic functions in Base as linear algebraic or not. In particular, size
is an array function and doesn't make sense in the domain of linear algebra, which essentially only has vectors (which have no "shape", just a cardinality of dimensions), linear transformations (which may happen to be represented by 2-dimensional arrays), and scalars. One particular example that came up was cumsum
, which currently operates on 2-dimensional arrays as if a dimension argument of 1 were supplied. This is inconsistent with sum
, which rather than defaulting to a dimension argument of 1 returns the sum of the whole array. It also prevents cumsum
from operating on covectors in a corresponding way to how it operates on vectors. I think that cumsum
should operate on general arrays by cumulatively summing in N-dimensional column-major order. More generally, dimension arguments should not default to 1.
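For example, the current behaviour being referred to (as of v0.5, if I'm not mistaken):

A = [1 2; 3 4]

sum(A)      # 10 -- sums the whole array, no implicit dimension
cumsum(A)   # behaves like cumsum(A, 1), giving [1 2; 4 6]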
I would be slightly worried if there were no interface for identifying covectors. Most operations, e.g. size and ndims, would say they are just like other 2d or 1d arrays. Only by attempting e.g. a dot product do you see a difference. AFAICT you'd have to check whether the object is a RowVector, but it's very rare to require an object to have a specific type in order to have a certain general property.
@StefanKarpinski I agree with all of that. Especially that functions are either "array" operations or "linear algebra" operations.
I made a start at a size = (1,n) TransposedVector
here which is exactly as you say. It's taken me a while to get good coverage of tests and all combinations *
, \
, /
for each possible c
and t
method and the mutating methods, while avoiding ambiguities with other methods in base. It's slow but systematic work, and from that I thought we could pull it to base when it's ready (possibly renaming things).
There is a distinction with Covector in that it really should be the conjugate transpose (or an even more general transformation, in the general case!), while a RowVector
or TransposedVector
is conceptually simpler.
@JeffBezanson Is there something we can do with indices()
to have a "singleton" dimension?
but it's very rare to require an object to have a specific type in order to have a certain general property.
Right, it would be cool if this were a trait or something. I was hoping that we could disentangle linear algebra from arrays more generally, but that is a huge change, and probably couldn't be done neatly without nice trait syntax implemented in Julia. I think here the issue is that we are mapping 3 things (vectors, covectors and matrices) onto two types (AbstractArray{1}
and AbstractArray{2}
), and so one of them (covectors) becomes a special subtype of the other.
I would have put AbstractTransposedVector
into the package if I could have thought for a place where someone would need anything other than the basic "wrapper" implementation.
@JeffBezanson: I don't get your concern. The idea is that it behaves just like a 1×n 2d abstract array except for linear algebra operations, for which it behaves like the dual space of column vectors (which is isomorphic to 1×n 2d abstract arrays). If you want to check for a covector, you can check the type e.g. by dispatching on it.
An update on my attempts at tackling this:
TransposedVectors.jl is now, I believe, "feature complete". It contains all the machinery necessary to do what @StefanKarpinski has spoken of here - the transpose of a vector is a wrapper of a vector which behaves as a 2-dimensional, 1xn sized abstract array - but for linear algebra operations (*
, /
, \
, '
, .'
and norm
) it behaves as a row vector (or dual vector).
You can check it out like so:
julia> Pkg.clone("https://github.com/andyferris/TransposedVectors.jl")
...
julia> using TransposedVectors
WARNING: Method definition transpose(AbstractArray{T<:Any, 1}) in module Base at arraymath.jl:416 overwritten in module TransposedVectors at /home/ferris/.julia/v0.5/TransposedVectors/src/TransposedVector.jl:28.
WARNING: Method definition ctranspose(AbstractArray{#T<:Any, 1}) in module Base at arraymath.jl:417 overwritten in module TransposedVectors at /home/ferris/.julia/v0.5/TransposedVectors/src/TransposedVector.jl:29.
WARNING: Method definition *(AbstractArray{T<:Any, 1}, AbstractArray{T<:Any, 2}) in module LinAlg at linalg/matmul.jl:86 overwritten in module TransposedVectors at /home/ferris/.julia/v0.5/TransposedVectors/src/mul.jl:9.
WARNING: Method definition At_mul_B(AbstractArray{#T<:Real, 1}, AbstractArray{#T<:Real, 1}) in module LinAlg at linalg/matmul.jl:74 overwritten in module TransposedVectors at /home/ferris/.julia/v0.5/TransposedVectors/src/mul.jl:37.
WARNING: Method definition Ac_mul_B(AbstractArray{T<:Any, 1}, AbstractArray{T<:Any, 1}) in module LinAlg at linalg/matmul.jl:73 overwritten in module TransposedVectors at /home/ferris/.julia/v0.5/TransposedVectors/src/mul.jl:64.
julia> v = [1,2,3]
3-element Array{Int64,1}:
1
2
3
julia> vt = v'
1×3 TransposedVectors.TransposedVector{Int64,Array{Int64,1}}:
1 2 3
julia> vt*v
14
julia> vt*eye(3)
1×3 TransposedVectors.TransposedVector{Float64,Array{Float64,1}}:
1.0 2.0 3.0
There may be some performance degradations - benchmarking would be useful. In some cases it will be making fewer copies (the transpose is lazy) and in other cases it makes more copies (sometimes a conjugated copy of a vector (never a matrix) is created in Ac_mul_Bc
for example). And the wrapper itself has a slight cost until we inline mutable fields in immutables (fortunately for me, this already works well with StaticArrays.jl :smile:). There is also the issue of making sure it is some kind of StridedArray
when appropriate.
If people like this approach we can see about making a PR soon to migrate code to Base
(the package already has unit tests and all ambiguities resolved). (also, @jiahao had mentioned wanting to experiment with a 1-dimensional abstract array version of a dual vector, not sure if any progress was made?)
Do people think that such a PR would make any sense in v0.6 without a wrapper type for transposed matrices? While the transposed vectors are a semantic change, the transposed matrices are more of an optimization, so I suppose they could go in separately. Of course, it might seem "odd" if some transposes are views and some are copies - opinions? Another point to consider is that once both transposed matrices and vectors are in, a lot of work would need to be done to remove all the At_mul_Bt
functions, replaced by methods on *
, to complete the transition (though the simplification will be rewarding!) - I doubt anyone is able or willing to do all of that by the end of this month...
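To illustrate the flavour of that transition, a schematic sketch only (hypothetical names, restricted to the BLAS-eltype case) of how one `*` method on a lazy wrapper could replace At_mul_B:

import Base: size, getindex, *

# Hypothetical lazy wrapper; in the real transition A.' itself would construct this.
immutable TransposedMatrix{T, M<:AbstractMatrix} <: AbstractMatrix{T}
    mat::M
end
lazytranspose(A::AbstractMatrix) = TransposedMatrix{eltype(A), typeof(A)}(A)

size(At::TransposedMatrix) = reverse(size(At.mat))
getindex(At::TransposedMatrix, i::Int, j::Int) = At.mat[j, i]

# Replaces At_mul_B(A, x); BLAS applies the transpose without ever copying A.
# (Assumes BLAS-compatible element types for the purpose of this sketch.)
*(At::TransposedMatrix, x::StridedVector) = Base.LinAlg.BLAS.gemv('T', At.mat, x)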
Great work, @andyferris! Did you experiment with a generic lazy Conjugate
wrapper at all?
@andyferris I kicked the tires, and like this quite a bit. Seems strictly an improvement, and I am hoping performance issues can be handled as we find them. Let's see what others have to say.
Thanks, guys :)
Did you experiment with a generic lazy
Conjugate
wrapper at all?
Not yet, though it might be possible without too much effort, but to be honest I was worried about not finishing this much by the end of December. Looking to the future, I was considering that in the end we might have the following "unionall" types to dispatch on for linear algebra:
- AbstractVector
- Conjugate{V where V <: AbstractVector} (or Conj{V}, maybe even ConjArray? Of course it would accept arrays of any dimension)
- TransposedVector (or RowVector?)
- TransposedVector{Conjugate{V where V <: AbstractVector}}
- AbstractMatrix
- Conjugate{M where M <: AbstractMatrix}
- TransposedMatrix (or simply Transpose{M}?)
- TransposedMatrix{Conjugate{M where M<:AbstractMatrix}}
(plus all the given traits of the underlying storage, such as what we now call DenseArray
and StridedArray
- I'm hoping #18457 might make it slightly easier to express these too).
Apart from naming, does that sound like a reasonable plan?
As for the immediate bikeshedding, for a PR to Base of what is currently in the package, do we prefer TransposedVector
, RowVector
, a generic Transpose
wrapper for both vectors and matrices, or something else?
I think the behavior provided by @andyferris' implementation is pretty good, and an improvement. But let me try to express my concern a bit better.
The problem is with defining new array types. For example DArray
is a subtype of AbstractArray
, so a DArray
can also be an AbstractVector
or an AbstractMatrix
. Ideally, we would just extend the Vector/Matrix distinction to Row/Column/Matrix, so a DArray
could be an AbstractRow
, AbstractCol
, or AbstractMatrix
. The type hierarchy would be something like
AbstractArray
    AbstractVector
        AbstractRowVector
        AbstractColumnVector
    AbstractMatrix
    ...
Was talking with @jiahao about this yesterday.
There may be some precedent here, in that not all AbstractVector
s can be used as columns, e.g.:
julia> isa(1.0:10.0, AbstractVector)
true
julia> randn(10,10) * 1.0:10.0
ERROR: MethodError: no method matching colon(::Array{Float64,2}, ::Float64)
which could address the issues I had back in https://github.com/JuliaLang/julia/issues/4774#issuecomment-59428215
That is just missing parentheses around the range?
That's just a precedence issue:
julia> randn(10,10) * (1.0:10.0)
10-element Array{Float64,1}:
-22.4311
⋮
I agree that the optimal solution here seems to require traits, but I don't think that should hold up @andyferris' work.
ah, good point. Would love to have incorrect whitespace precedence be a syntax error, similar to Fortress.
The MIT group pointed out that one way to implement the type hierarchy I described above is to add a type parameter to the array hierarchy:
abstract AbstractArray{T,N,row}
type Array{T,N,row} <: AbstractArray{T,N,row}
end
typealias AbstractVector{T} AbstractArray{T,1}
typealias AbstractRowVector{T} AbstractArray{T,1,true}
typealias AbstractColVector{T} AbstractArray{T,1,false}
typealias AbstractMatrix{T} AbstractArray{T,2}
typealias Vector{T} Array{T,1,false}
typealias Matrix{T} Array{T,2,false}
I haven't worked out all the implications of this --- probably the worst of it is that Array{Int,1}
becomes an abstract type --- but this seems to get the type hierarchy and needed abstractions pretty much exactly right.
For what it's worth, this is exactly a use-case for incompletely parameterized supertypes; they can become implicit default parameters.
abstract AbstractArray{T,N,row}
type Array{T,N} <: AbstractArray{T,N}
end
isrow{T,N,row}(::AbstractArray{T,N,row}) = row
isrow{T,N}(::AbstractArray{T,N}) = false
julia> isrow(Array{Int,2}())
false
Of course, it makes dispatch a little messy… and it may not be worth supporting. It's just a spit-ball.
To me that seems equivalent to defining
type Array{T,N} <: AbstractArray{T,N,false}
end
What we might want here is some kind of "default type parameters" though, such that making the type Array{X,Y}
where X and Y have no free variables actually gives Array{X,Y,false}
.
Another idea: preserve Householder notation for inner products v'v
and left covector multiply v'A
by making infix '
its own operator instead of parsing these as v'*v
and v'*A
. Coupled with making v'
the identity, v*v
an error, and v*A
an error, this could give much of what is desired. You would have to write outer products as outer(v,v)
.
In such a scheme, v'
would be a noop or even an error – since there's no reason to transpose a vector in such a scheme, it would only make sense for a matrix.
@JeffBezanson I agree having an AbstractRowVector
would be best (but I honestly can't think of a use case, so I didn't implement it in the package). I'd like to also point out this could live as a subtype of AbstractMatrix
. The reason I've moved in this direction is that broadcast
seems to be more of a core concept to Julia than linear algebra is, and people will expect a row vector to be, well, a row!
Of course having RowVector <: AbstractMatrix
is an unfortunate use of terminology! I think this results from having 2D arrays and abstract matrices being given the same name.
I've said this before, far far above, but since this issue is so long I'll restate it: in a generic language like Julia, the "array of data" property has to be the primary consideration for AbstractArray
. Answering how you want broadcast
to behave for a "transposed" vector tells you whether it is 1D or 2D. If you want to think in terms of rows and columns, then 1 x n
row vectors makes most sense. If you want to consider dual vectors, then 1D makes most sense - and I'm happy with that too! But taking the dual of a vector is more complicated than reshaping the data (e.g. we have to at least support conjugation).
I would guess the "row" picture would match the expectations of "typical" programmers, while people with advanced mathematical training would possibly get better use out of dual vectors as it is a better abstraction (though I'm sure they would be able to sympathize with those with only basic linear algebra knowledge). So - which audience does Julia target - people with more mathematical finesse than many typical MATLAB users, or people with less?
(My take was that as Julia was aiming to be a "generic" programming language, the latter audience was the target).
Since we are discussing possible type trees - in the future, with Buffer
and resizeable "lists", I'm imagining an abstract tree of interfaces that goes something along the lines of
AbstractArray{T,N} # interface includes broadcast, Cartesian, getindex, setindex!, etc.
    AbstractArray{T,1}
        AbstractList{T} # resizeable, overloaded with `push!` and so-on
        AbstractVector{T} # non-resizeable, overloaded for *, /, \, ', .', etc
    AbstractArray{T,2}
        AbstractRowVector{T} # non-resizeable, overloaded for *, /, \, ', .', etc
        AbstractMatrix{T} # non-resizeable, overloaded for *, /, \, ', .', etc
Of course, we could equally have AbstractDualVector{T} <: AbstractArray{T,1}
instead of AbstractRowVector
.
Having a flexible, concrete Array
type to fit all of these would be difficult (and perhaps unnecessary). Traits would certainly let us express these differences in supported interfaces more easily.
Having AbstractVector
being both a C++ std::vector
and a linear algebra vector seemed a little cheeky to me :) (especially as the array interface means that it can never truly be an "abstract vector" in the linear algebra sense (e.g. basis free))
Yes, it would be good to separate the resizing behavior. I'm on board for that.
That type hierarchy seems to imply that we'd have separate "matrix" and "2-d array" concrete types in Base. We could try to get around it e.g. by having the constructor for Array{T,2}
actually return some other matrix type, but it seems pretty ugly. Maybe I'm misunderstanding though.
That type hierarchy seems to imply that we'd have separate "matrix" and "2-d array" concrete types in Base.
Right... I think to do this nicely, we'd need traits. Keep in mind I called it an "abstract tree of interfaces". Attempting this at the moment would necessitate something ugly like you said. But we could probably insert AbstractList <: AbstractVector
(and move associated methods) as soon as Buffer
is done.
At the moment the connection between supported interfaces and the type tree is pretty loose. It's hard to figure out at dispatch if an (abstract) array is mutable (i.e. supports setindex!
), whether it is resizeable, whether it is strided (StridedArray only covers types defined in Base)
, etc. This makes it hard for users to provide a function with methods that work and are efficient with a wide range of inputs (this is my experience from StaticArrays
).
A thought for the conversation about mathy regions of the type hierarchy and reasonable roles for parameterized abstractions. When attainable, it is good policy for Julia to simplify and lighten any separation of how code is written to do what expertise intends from what is expertly expressed intention.
One convention we might choose to use makes common practice using an abstract type other than Any to fashion an ontotopological region amidst which concomitant concrete types find shared repose easy. That would bring, at very little cost, greater multi-contextual sensibleness -- understanding some part of the type hierarchy helps one approach other parts.
As I see this, it is a generalized rapport bringer. An elucidative abstraction illuminates as a neighborhood cointendening concretions, the generative alikenesses that construct. With parameterizations carrying intuition's simple clarity, Julia gets many accomplishing more with less nonsense time.
OK, any more comments on TransposedVectors would be appreciated. I feel it's getting quite solid and ready to be turned into a PR, but there are some relevant issues that I've listed here.
(for instance: Is RowVector
a better name than TransposedVector
? Is [1 2 3]
a RowVector
or a Matrix
? What is indices(row, 1)
?)
+1 for RowVector
I would really like to keep the row-/column- convention out of it, and I think it sounds weird to have Vector
and RowVector
(or even ColVector
and RowVector
). +1 for TransposedVector
or DualVector
@felixrehren I feel that DualVector
should have a different semantic to what I've implemented, primarily being 1-dimensional (mathematically, the dual of a vector is a vector) and have other duality properties (e.g. complex conjugation). Which is fine, but I found it a bit harder to implement and a bit less backward compatible.
The name TransposedVector
is ok. But it is a little long. Also, it kind of suggests that an object of that type can only be obtained by transposing a Vector
. But I presume you could make a TransposedVector
by other means as well, say, by extracting one row of a matrix?
I think RowVector
would be a good name -- it is concise, accurate, and intuitive. @felixrehren, I think the row/column picture is helpful. It's probably even unavoidable, since whenever you do concatenation or other common array operations (other than multiplication) you need to think about which way the vector is oriented.
DualVector
isn't bad either, but CoVector
would sound less formal.
The PR I threatened earlier is now submitted at #19670. I went with RowVector
for now.
But I presume you could make a TransposedVector by other means as well, say, by extracting one row of a matrix?
This is actually one sticking point - I haven't thought of great syntax for that yet. Any ideas?
But I presume you could make a TransposedVector by other means as well, say, by extracting one row of a matrix?
This is actually one sticking point - I haven't thought of great syntax for that yet. Any ideas?
While at first it did seem appealing to have matrix[scalar,range]
(and other similar constructs) yield a row vector, that would be a significant departure from the current indexing semantics for multidimensional arrays, and special cases make me leery.
Instead I'd like to see RowVector
(and Vector
for that matter) convert any iterable type into the appropriate kind of vector. Then you could do something like RowVector(matrix[scalar,range])
which would be quite clear and not disturb the current behavior of array indexing.
Right, the i
th row can be extracted in a row-shape by A[i,:].'
, currently in v0.5 and would continue to do so in the future (to make a RowVector
or Transpose{V<:AbstractVector}
or whatever we settle on eventually (see here for ongoing discussion)). Maybe that's the answer.
What about just adding two new functions to Base?
row(A,i)
and col(A,i)
The latter is not needed, but just for symmetry. It's concise, clear, and as many characters as A[i,:].'
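(In terms of the wrapper being discussed these would just be convenience definitions, roughly:)

# Rough sketch, assuming a RowVector/TransposedVector wrapper as proposed above.
row(A::AbstractMatrix, i::Integer) = transpose(A[i, :])   # row-shaped wrapper
col(A::AbstractMatrix, j::Integer) = A[:, j]              # an ordinary Vector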
@benninkrs that makes sense. In contrast my intuitive interpretation is the linear algebra one, where a vector has no orientation at all. All the points of view on this are reasonable, I just don't like Vector
and RowVector
together because the naming looks like it mixes an abstract and a concrete paradigm.
At this point, I think we need to either go with something based on @andyferris's implementation or decide not to take vector transposes seriously. As an example of an alternative, here's an approach which treats the worst symptoms: leave v'
as it currently is – i.e. producing a row matrix – but parse a'b
as an infix operator with precedence just below multiplication. In other words we would have the following behaviors:
1. v' is a row matrix (ctranspose)
2. **v'v is a scalar** (dot product)
3. v'*v is a 1-element vector (row matrix * vector)
4. v*v' is an outer product matrix (vector * row matrix)
5. **v'A is a row matrix** ("vecmat")
6. v'*A is a row matrix (matmat)
7. **v'A*v is a scalar** (matvec A*v then dot product)
8. (v'A)*v is a 1-element vector (vecmat then matvec)
9. v'*A*v is a 1-element vector (matmat then matvec)
10. v'' is a column matrix (vector ctranspose, then matrix ctranspose)

In the current setup 2 and 3 are equivalent and 7, 8 and 9 are equivalent, which this change breaks. But crucially, the bold items are the ones people will commonly reach for since they're the shortest and most convenient of the similar forms – and they all do what we'd like them to. No new types, just one new infix operator. The main drawback is 10 – v'' is still a column matrix. Arguably, this could be seen as a benefit since postfix '' is the operator for turning a vector into a column matrix.
Taking a step back, I think what we've learned is that without additional up-down or dimension labeling functionality or treating ≤ 2 dimensions as fungible like Matlab does, multidimensional arrays can't really be made to dovetail smoothly with linear algebra. So what we're left with is a question of convenience – can we let people express common linear algebra operations on vectors and matrices conveniently without overly complicating arrays? I'm not dead set on this approach, but I think we should seriously consider this and other approaches that address syntactic convenience without mucking up our array type hierarchy too much.
Another approach would be to parse infix a'b
specially (just below *
) and have v'
return a conjugated vector. More generally, postfix A'
could conjugate an array and lazily reverse its dimensions while A.'
would just lazily reverse the dimensions of an array thus acting as the identity on vectors. In this scheme the list of properties could be the following:
1. v' is a vector (conjugated)
2. v'v is a scalar (dot product)
3. v'*v is an error (recommend v'v for inner product and outer(v,v) for outer product)†
4. v*v' is an error (recommend v'v for inner product and outer(v,v) for outer product)†
5. v'A is a vector (vecmat)
6. v'*A is an error (recommend v'A for vecmat)
7. v'A*v is a scalar (matvec A*v then dot product)
8. (v'A)*v is an error (recommend v'v for inner product and outer(v,v) for outer product)†
9. v'A'v is a scalar (v'(A'v) – conjugate matvec then inner product)
10. v'' is a vector (v'' === v and v.' === v)

Now that I write out all of these properties, this may be the preferable option: all the error cases actually serve to help people discover and use preferred syntaxes, and of course it has the desirable v'' === v property (and dovetails nicely with .' being a generic dimension reversal operator). Having very similar syntaxes that are only subtly different is probably more confusing.
† We could potentially catch these at parse time for more precise errors at the risk of giving errors for cases where user code has overloaded '
and *
so that these operations work. I believe having a lazy conjugate wrapper may be necessary to make these recommendations correct, but we need that anyway for #5332.
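(For the vector case such a wrapper could be as small as the following sketch, names hypothetical:)

import Base: size, getindex

# A lazy elementwise-conjugate view; v' could return this instead of an eager conj copy.
immutable ConjVector{T, V<:AbstractVector} <: AbstractVector{T}
    parent::V
end
ConjVector(v::AbstractVector) = ConjVector{eltype(v), typeof(v)}(v)

size(c::ConjVector) = size(c.parent)
getindex(c::ConjVector, i::Int) = conj(c.parent[i])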
Taking a step back, I think what we've learned is that without additional up-down or dimension labeling functionality or treating ≤ 2 dimensions as fungible like Matlab does, multidimensional arrays can't really be made to dovetail smoothly with linear algebra. So what we're left with is a question of convenience – can we let people express common linear algebra operations on vectors and matrices conveniently without overly complicating arrays? I'm not dead set on this approach, but I think we should seriously consider this and other approaches that address syntactic convenience without mucking up our array type hierarchy too much.
:100:
Explicitly making postfix '
and .'
generic array (rather than linear algebra) operations nicely sidesteps doubling down on conflation of storage and linear algebra types, and leaves the door open for frameworks involving fewer compromises. In the interim, those operations' ability to simulate Householder notation in most common cases should provide most of the desired convenience. Also less code and complexity. Best!
One issue with v.'
being a no-op is that A .+ v.'
would change meaning from adding the values of v
to each column of A
to adding the values to the rows of A
. This would generally make it harder to do row-like operations with vectors and it would definitely need a full deprecation cycle to avoid silently making code do the wrong thing (in cases like this where A
happens to be square).
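Concretely (current v0.5 behaviour in the first comment; the second line shows what A .+ v.' would silently turn into if v.' were a no-op):

A = zeros(Int, 2, 2); v = [1, 2]

A .+ v.'   # today: v.' is 1×2, so v[j] is added down column j, giving [1 2; 1 2]
A .+ v     # what A .+ v.' would mean with a no-op transpose, giving [1 1; 2 2]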
At this point, I think we need to either go with something based on @andyferris's implementation or decide not to take vector transposes seriously.
I realize the deadline for v0.6 is almost upon us, but I would caution against throwing out the baby with the bathwater. At this stage, it appears that the mentioned RowVector
plus views for matrix transpose and array conjugation will let us get:
- v'' === v
- v1'v2 is a dot product
- size(v') is unchanged, but we have v'' as 1-dimensional
- removal of the Ac_mul_Bc functions in favour of * and A_mul_B! only (and similarly for \, /).
honestly would not take too much effort (for me the problem is finding time at this time of year, and chasing down all the other types we have in our linear algebra suite). And the last point would be a sigh of relief.
On the flip side - IMHO those rules seem to complicate parsing slightly and might be a little confusing and/or surprising in how they compose '
with *
(3, 4, 6 and 8 would work with RowVector
).
And yes we'd have to deprecate v.'
or something in order to highlight potential bugs, at which point it almost seems better to make v.'
a missing method error (we simply won't support row/dual vectors, but won't stop a package doing so if they wish)
BAM
Woot.
Was this our longest issue thread?
No, #11004 has more
Sorry. You're right, I should have specified open issue thread.