Julia: Taking matrix transposes seriously

Created on 10 Mar 2017  ·  141Comments  ·  Source: JuliaLang/julia

Currently, transpose is recursive. This is pretty unintuitive and leads to this unfortunateness:

julia> A = [randstring(3) for i=1:3, j=1:4]
3×4 Array{String,2}:
 "J00"  "oaT"  "JGS"  "Gjs"
 "Ad9"  "vkM"  "QAF"  "UBF"
 "RSa"  "znD"  "WxF"  "0kV"

julia> A.'
ERROR: MethodError: no method matching transpose(::String)
Closest candidates are:
  transpose(::BitArray{2}) at linalg/bitarray.jl:265
  transpose(::Number) at number.jl:100
  transpose(::RowVector{T,CV} where CV<:(ConjArray{T,1,V} where V<:(AbstractArray{T,1} where T) where T) where T) at linalg/rowvector.jl:80
  ...
Stacktrace:
 [1] transpose_f!(::Base.#transpose, ::Array{String,2}, ::Array{String,2}) at ./linalg/transpose.jl:54
 [2] transpose(::Array{String,2}) at ./linalg/transpose.jl:121

For some time now, we've been telling people to do permutedims(A, (2,1)) instead. But I think we all know, deep down, that's terrible. How did we get here? Well, it's pretty well-understood that one wants the ctranspose or "adjoint" of a matrix of matrices to be recursive. A motivating example is that you can represent complex numbers using 2x2 matrices, in which case the "conjugate" of each "element" (actually a matrix), is its adjoint as a matrix – in other words, if ctranspose is recursive, then everything works. This is just an example but it generalizes.

The reasoning seems to have been the following:

  1. ctranspose should be recursive
  2. ctranspose == conj ∘ transpose == conj ∘ transpose
  3. transpose therefore should also be recursive

I think there are a few problems here:

  • There's no reason that ctranspose == conj ∘ transpose == conj ∘ transpose has to hold, although the name makes this seem almost unavoidable.
  • The behavior of conj operating elementwise on arrays is kind of an unfortunate holdover from Matlab and isn't really a mathematically justifiable operations, much as exp operating elementwise isn't really mathematically sound and what expm does would be a better definition.
  • It is actually taking the conjugate (i.e. adjoint) of each element that implies that ctranspose should be recursive; in the absence of conjugation, there's no good reason for transpose to be recursive.

Accordingly, I would propose the following changes to remedy the situation:

  1. Rename ctranspose (aka ') to adjoint – that is really what this operation does, and it frees us from the implication that it must be equivalent to conj ∘ transpose.
  2. Deprecate vectorized conj(A) on arrays in favor of conj.(A).
  3. Add a recur::Bool=true keyword argument to adjoint (née ctranspose) indicating whether it should call itself recursively. By default, it does.
  4. Add a recur::Bool=false keyword argument to transpose indicating whether it should call itself recursively. By default, it does not.

At the very minimum, this would let us write the following:

julia> A.'
4×3 Array{String,2}:
 "J00"  "Ad9"  "RSa"
 "oaT"  "vkM"  "znD"
 "JGS"  "QAF"  "WxF"
 "Gjs"  "UBF"  "0kV"

Whether or not we could shorten that further to A' depends on what we want to do with conj and adjoint of non-numbers (or more specifically, non-real, non-complex values).

[This issue is the second of an ω₁-part series.]

breaking decision linear algebra

Most helpful comment

(OT: I'm already looking forward to "Taking 7-tensors seriously," the next installment in the highly-successful 6-part miniseries...)

All 141 comments

The logical successor to a previous issue... 👍

The behavior of conj operating elementwise on arrays is kind of an unfortunate holdover from Matlab and isn't really a mathematically justifiable operations

This is not true at all, and not at all analogous to exp. Complex vector spaces, and conjugations thereof, are a perfectly well-established mathematical concept. See also https://github.com/JuliaLang/julia/pull/19996#issuecomment-272312876

Complex vector spaces, and conjugation thereof, are a perfectly well-established mathematical concept.

Unless I'm mistaken, the correct mathematical conjugation operation in that context is ctranspose rather than conj (which was precisely my point):

julia> v = rand(3) + rand(3)*im
3-element Array{Complex{Float64},1}:
 0.0647959+0.289528im
  0.420534+0.338313im
  0.690841+0.150667im

julia> v'v
0.879291582684847 + 0.0im

julia> conj(v)*v
ERROR: DimensionMismatch("Cannot multiply two vectors")
Stacktrace:
 [1] *(::Array{Complex{Float64},1}, ::Array{Complex{Float64},1}) at ./linalg/rowvector.jl:180

The problem with using a keyword for recur is that, last I checked, there was a big performance penalty for using keywords, which is a problem if you end up recursively calling these functions with a base case that ends up on scalars.

We need to fix the keyword performance problem in 1.0 anyway.

@StefanKarpinski, you're mistaken. You can have complex conjugation in a vector space without having adjoints — adjoints are a concept that requires a Hilbert space etc., not just a complexified vector space.

Furthermore, even when you do have a Hilbert space, the complex conjugate is distinct from the adjoint. e.g. the conjugate of a complex column vector in ℂⁿ is another complex vector, but the adjoint is a linear operator (a "row vector").

(Complex conjugation does not imply that you can multiply conj(v)*v!)

Deprecating vectorized conj is independent of the rest of the proposal. Can you provide some references for the definition of complex conjugation in a vector space?

https://en.wikipedia.org/wiki/Complexification#Complex_conjugation

(This treatment is rather formal; but if you google "complex conjugate matrix" or "complex conjugate vector" you will find zillions of usages.)

If mapping a complex vector to the conjugate vector (what conj does now) is an important operation distinct from mapping it to is conjugate covector (what ' does), then we can certainly keep conj to express that operation on vectors (and matrices, and higher arrays, I guess). That provides a distinction between conj and adjoint since they would agree on scalars but behave differently on arrays.

In that case, should conj(A) call conj on each element or call adjoint? The representation of complex numbers as 2x2 matrices example would suggest that conj(A) should actually call adjoint on each element, rather than calling conj. That would make adjoint, conj and conj. all different operations:

  1. adjoint: swap i and j indices and recursively map adjoint over elements.
  2. conj: map adjoint over elements.
  3. conj.: map conj over elements.

conj(A) should call conj on each element. If you are representing complex numbers by 2x2 matrices, you have a different complexified vector space.

For example, a commonplace usage of conjugation of vectors is in the analysis of eigenvalues of real matrices: the eigenvalues and eigenvectors come in complex-conjugate pairs. Now, suppose you have a block matrix represented by a 2d array A of real 2x2 matrices, acting on blocked vectors v represented by 1d arrays of 2-component vectors. For any eigenvalue λ of A with eigenvector v, we expect a second eigenvalue conj(λ) with eigenvector conj(v). This won't work if conj calls adjoint recursively.

(Note that the adjoint, even for a matrix, might be different from a conjugate-transpose, because the adjoint of a linear operator defined in the most general way depends also on the choice of inner product. There are lots of real applications where some kind of weighted inner product is appropriate, in which case the appropriate way to take the adjoint of a matrix changes! But I agree that, given a Matrix, we should take the adjoint corresponding to the standard inner product given by dot(::Vector,::Vector). However, it's entirely possible that some AbstractMatrix (or other linear-operator) types would want to override adjoint to do something different.)

Correspondingly, there is also some difficulty in defining an algebraically reasonable adjoint(A) for e.g. 3d arrays, because we haven't defined 3d arrays as linear operators (e.g. there is no built-in array3d * array2d operation). Maybe adjoint should only be defined by default for scalars, 1d, and 2d arrays?

Update: Oh, good: we don't define ctranspose now for 3d arrays either. Carry on.

(OT: I'm already looking forward to "Taking 7-tensors seriously," the next installment in the highly-successful 6-part miniseries...)

If you are representing complex numbers by 2x2 matrices, you have a different complexified vector space.

I'm not following this – the 2x2 matrix representation of complex numbers should behave exactly like having complex scalars as elements. I would think, e.g. that if we define

m(z::Complex) = [z.re -z.im; z.im z.re]

and we have an arbitrary complex vector v then we'd want this identity to hold:

conj(m.(v)) == m.(conj(v))

I would spell the example out more and make a comparison with ' which is supposed to already commute with m but I can't because of https://github.com/JuliaLang/julia/issues/20979, which accidentally breaks that commutation. Once that bug is fixed then m.(v)' == m.(v') will hold, but if I'm understanding you correctly, @stevengj, then conj(m.(v)) == m.(conj(v)) should not?

conj(m(z)) should == m(z).

Your m(z) is an isomorphism to complex numbers under addition and multiplication, but it is not the same object in other ways for linear algebra, and we shouldn't pretend it is for conj. For example, if z is a complex scalar, then eigvals(z) == [z], but eigvals(m(z)) == [z, conj(z)]. When the m(z) trick is extended to complex matrices, this doubling of the spectrum has tricky consequences for e.g. iterative methods (see e.g. this paper).

Another way of putting it is that z is already a complex vector space (a vector space over the complex numbers), but the 2x2 real matrix m(z) is a vector space over the real numbers (you can multiply m(z) by a real number) … if you "complexify" m(z) by multiplying it by a complex number e.g. m(z) * (2+3im) (not m(z) * m(2+3im)) then you get a different complex vector space not isomorphic anymore to the original complex vector space z.

This proposal looks like it will produce a real improvement :+1: I'm thinking about the mathematical side:

  • as I understand, conjugation is an action on the ring (read: numbers) which provide the coefficients for our vector space/vectors. This action induces another action on the vector space (and on its mappings, read:matrices), also called conjugation, by the linearity of the construction of the vector space over the ring. Hence conjugation necessarily works by elementwise application to the coefficients. For Julia, that means that at heart for a vector v, conj(v) = conj.(v), and it's a design choice whether conj has methods also for arrays or only for scalars, both of which seem ok? (Steven's point about the complex-numbers-as-2x2-matrices example is that this is a vector space whose coefficients are formally/actually real, so conjugation has no effect here.)
  • adjoint has an algebraic meaning that's fundamental in lots of important domains intimately involving linear algebra (see 1 and 2 and 3, all with big applications in physics too). If adjoint is added, it should allow keyword arguments for the action under consideration -- in a vector spaces/functional analysis, this action is typically induced by the inner product, hence the keyword argument can be the form instead. Whatever is designed for Julia's transposes should avoid conflicting with these applications, so I think adjoint is a bit highly-charged?

@felixrehren, I don't think you would use a keyword argument to specify the inner product that induces the adjoint. I think you would just use a different type instead, the same as if you wanted to change the meaning of dot for a vector.

My preference would be a bit simpler:

  • Keep conj as-is (elementwise).
  • Make a' correspond to adjoint(a), and make it recursive always (and hence fail for arrays of strings etc where adjoint is not defined for the elements).
  • Make a.' correspond to transpose(a), and make it never recursive (just a permutedims), and hence ignore the type of the elements.

I really don't see a use case for a non-recursive adjoint. As a stretch, I can sort of imagine use-cases for a recursive transpose, but in any application where you have to change a.' to transpose(a, recur=true) it seems just as easy to call another function transposerecursive(a). We could define transposerecursive in Base, but I think the need for this will be so rare that we should wait to see if it actually comes up.

I personally feel that keeping this simple will be the most straightforward for users (and for implementation), and still be quite defensible on the linear algebra front.

So far (most) of our linear algebra routines are pretty strongly rooted to standard array structures and the standard inner product. In each slot of your matrix or vector, you put an element of your "field". I will argue that for elements of fields, we care about +, *, etc, and conj, but not transpose. If your field was a special complex representation which looks a bit like a 2x2 real matrix but changes under conj, then that's OK - it's a property of conj. If it's just a 2x2 real AbstractMatrix that doesn't change under conj, then arguably such elements of a matrix shouldn't change under the adjoint (@stevengj said it better than I can describe - there might be an isomorphism to complex numbers but that doesn't mean it behaves the same in all ways).

Anyway, the 2x2 complex example feels a little bit of a red herring to me. The real cause of the recursive matrix behavior was as a shortcut to doing linear algebra on block matrices. Why don't we treat this special case with proper care, and simplify the underlying system?

So my "simplified" suggestion would be:

  • Keep conj as-is (or make it a view for AbstractArrays)
  • Make a' a non-recursive view such that (a')[i,j] == conj(a[j,i])
  • Make a.' a non-recursive view such that (a.')[i,j] == a[j,i]
  • Introduce a BlockArray type for handling block-matrices and so-on (in Base, or possibly in a well-supported package). Arguably, this would be much more powerful and flexible than a Matrix{Matrix} for this purpose, but equally efficient.

I think these rules would be simple enough for users to embrace, and build upon.

PS - @StefanKarpinski For practical reasons, a boolean keyword argument for recursion won't work with view's for transposition. The type of the view may depend on the boolean value.

Also, I mentioned elsewhere, but I'll add it here for completeness: recursive transpose views have the annoying property that the element type might change compared to the array it is wrapping. E.g. transpose(Vector{Vector}) -> RowVector{RowVector}. I haven't thought of a way of getting this element type for RowVector without a run-time penalty or by invoking inference to compute the output type. I'm guessing the current behavior (invoking inference) is undesirable from a language standpoint.

NB: there's also nothing stopping users from defining conj to return a different type - so the ConjArray view also suffers from this problem, whether transposition is recursive or not.

@stevengj – you point about "complex" 2x2 matrices being a formally real vector space rather than a complex vector space makes sense to me, but then that point calls into question for me the original motivation for recursive adjoint, which leads me to wonder if @andyferris's proposal wouldn't be better (non-recursive transpose and adjoint). I guess the fact that both the complex 2x2 example and the block matrix representation "want" adjoint to be recursive is suggestive, but given your comments about that first example, I have to wonder if there aren't other cases where non-recursive adjoint is more correct/convenient.

If the adjoint isn't recursive, it isn't an adjoint. It's just wrong.

Can you give a little bit more of a justification for that when you've got a moment?

The adjoint of a vector must be a linear operator mapping it to a scalar. That is, a'*a must be a scalar if a is a vector. And this induces a corresponding adjoint on matrices, since the defining property is a'*A*a == (A'*a)'*a.

If a is a vector of vectors, this implies that a' == adjoint(a) must be recursive.

OK, I think I follow this.

We also have recursive inner products:

julia> norm([[3,4]])
5.0

julia> dot([[3,4]], [[3,4]])
25

Clearly, the "adjoint" or "dual" or whatever should be similarly recursive.

I think the central question then is, do we require that a' * b == dot(a,b) for all vectors a, b?

The alternative is to say that ' doesn't necessarily return the adjoint - it is just an array operation that transposes the elements and passes them through conj. This just happens to be the adjoint for the real or complex elements.

There's one and only one reason that we have a special name and special symbols for adjoints in linear algebra, and that is the relationship to inner products. That's the reason why "conjugate transpose" is a important operation and "conjugate rotation of matrices by 90 degrees" is not. There's no point in having something that is "just an array operation that swaps rows and columns and conjugates" if it is not connected to dot products.

One can define a similar relationship between transpose and an unconjugated "dot product", which would argue for a recursive transpose. However, unconjugated "dot products" for complex data, which aren't really inner products at all, don't show up nearly as often as true inner products — they arise from bi-orthogonality relations when when an operator is written in complex-symmetric (not Hermitian) form — and don't even have a built-in Julia function or a common terminology in linear algebra, whereas wanting to swap rows and columns of arbitrary non-numeric arrays seems far more common, especially with broadcasting. This is why I can support making transpose non-recursive while leaving adjoint recursive.

I see. So renaming ' to adjoint would be part of the change, to make it obvious that it is not conj ∘ transpose?

The thing which always confused me with recursive arrays is: what is a scalar in this context? In all the cases I'm familiar with, it's said the elements of a vector are scalars. Even in the case where a mathematician writes a block-vector/matrix structure on a bit of paper, we still know this is just shorthand for a larger vector/matrix where the elements are probably real or complex numbers (i.e. the BlockArray). You expect the scalars to be able to multiply under *, and the type of the scalar doesn't typically differ between the vector and it's dual.

@andyferris, for a general vector space, the scalars are a ring (numbers) that you can multiply vectors by. I can do 3 * [[1,2], [3,4]], but I can't do [3,3] * [[1,2], [3,4]]. So, even for Array{Array{Number}}, the correct scalar type is Number.

Agreed - but the elements are usually said to be (the same) scalars, no?

The treatments I've seen, anyway, start with a ring, and build a vector space out of them. The ring supports +, *, but I haven't seen a treatment that requires it to support adjoint, dot, or whatever.

(Sorry, I'm just trying to understand the underlying mathematical object we are modelling).

It depends on what you mean by "shorthand" and what you mean by "elements".

For example, it is quite common to have finite-size vectors of functions, and "matrices" of infinite dimensional operators. e.g. consider the following form of the macroscopic Maxwell equations:

image

In this case, we have a 2x2 matrix of linear operators (e.g. curls) acting on 2-component vectors whose "elements" are 3-component vector fields. These are vectors over the scalar field of complex numbers. In some sense, if you drill down far enough the "elements" of the vectors are complex numbers — individual components of the fields at individual points in space — but that is rather murky.

Or perhaps more precisely, can't we use always the ring to describe the coefficients of the vector in some basis (given the array nature of things in Julia, we aren't going basis-free)?

In any case, the consequence is still that adjoints have to be recursive, no? I don't understand what point you are getting at.

(We absolutely can go basis-free, I think, since the objects or the elements of the arrays could be symbolic expressions ala SymPy or some other data structure representing an abstract mathematical object, and the adjoint could return another object that when multiplied computes an integral, for example.)

I'm just trying to understand better, not making a particular point :)

Certainly, in the above, the adjoint is recursive. I was wondering if e.g. in the above it is best to think of [E, H] as a BlockVector whose (infinitely?) many elements are complex, or as a 2-vector whose elements are vector fields. I think in this case, the BlockVector approach would be unworkable.

Anyway, thanks.

So, perhaps I can distil my thoughts in this form:

For a vector v, I kind of expect that length(v) describes the dimension of the vector space, i.e. the number of scalar coefficients I need to (fully) describe an element of the vector space, and also that v[i] returns the ith scalar coefficient. To date, I haven't thought of AbstractVector as an "abstract vector", but rather an object that holds the coefficients of an element of some vector space given some basis that the programmer knows about.

It seemed like a simple & useful mental model, but perhaps this is too restrictive/impractical.

(EDIT: It's restrictive because in Vector{T}, T must behave like a scalar for linear algebra operations to function.)

but I can't do [3,3] * [[1,2], [3,4]]

We could fix that. I'm not sure whether it's a good idea or not, but it could certainly be made to work.

I'm not sure it's desirable... in this case [3,3] actually isn't a scalar. (We also already have the ability to do [3,3]' * [[1,2], [3,4]] - which I really don't know how to interpret in a linear algebra sense).

This shows an interesting corner-case: we seem to be saying that v1' * v2 is the inner product (and thus returns a "scalar"), but [3,3]' * [[1,2], [3,4]] == [12, 18]. A contrived example, I suppose.

In that case [3,3] is a scalar: scalars are elements in the underlying ring, and there are plenty of good ways to make elements of the form [a,b] into a ring (and thus defining vectors/matrices above it). The fact that they are written as vectors, or the fact that this ring itself forms a vector space over another underlying ring, doesn't change that. (You could take any other notation that may hide the vectors! Or make it look totally different.) As @andyferris was getting at, what is a scalar depends on context.

Formally, recursion is not part of the adjoint. Instead, the conjugation of the scalar elements does that part of the work -- and often, if a scalar s is represented as a matrix, conjugating s will involve transposing the matrix we use to denote it. But that's not always the case, it depends on the user-given structure to the ring of scalars.

I think forcing adjoint recursion can be OK practical compromise, typical for matlab-type applications. But for me, taking it super seriously would mean non-recursive adjoint and using typed scalars + dispatch to let conj work the necessary magic on whatever the underlying ring's structure happens to be.

In that case [3,3] is a scalar: scalars are elements in the underlying ring

Except such a scalar isn't an element of the ring defined by +, *. It doesn't support *; I can't do [3,3] * [3,3].

I think forcing adjoint recursion can be OK practical compromise, typical for matlab-type applications. But for me, taking it super seriously would mean non-recursive adjoint and using typed scalars + dispatch to let conj work the necessary magic on whatever the underlying ring's structure happens to be.

I agree, if we want to do a practical comprise this is completely fine and it's what we've done until now, but it seems to me that the underlying scalars are those of the fully flattened array and that is what linear algebra is defined on. We have the technology to make a BlockVector and a BlockMatrix (and BlockDiagonal, etc) which present an efficient, flattened view of the fully expanded array, so the "super serious" approach should, at minimum, be doable. I also happen to think it would be just as user-friendly as the recursive approach (a few extra characters to type BlockMatrix([A B; C D]) in exchange for what might be a nicer semantic and potentially more readable code - I'm sure these points are up for debate). It would, however, be more work to implement all of that.

@andyferris you're right, they aren't a ring because * isn't defined -- but I guess we're on the same page that * easily could be defined for it, and there are many different ways to do it that'll give a ring structure. So I suppose we're on the same page: typing is the more extensible way of solving this.

(About scalars: the scalars are not necessarily the elements of the fully-flattened structure! The reason is this. A 1-dimensional complex vector space is 1-dimensional over the complex numbers and 2-dimensional over the real numbers; neither representation is special. Quaternions are 2-dimensional over complex numbers. Octonions are 2-dimensional over quarternions, 4-dimensional over complex numbers, and 8-dimensional over the reals. But there's no reason to insist that octonions are "8-dimensional" any more than to insist that their scalars are the real numbers not complex; those are choices not forced by logic. It is purely a question of representation, the user should be allowed this freedom as the linear algebra is the same in each case. And you get much longer chains of such spaces with [truncated] multipolynomial rings, or algebraic extensions of number fields, both of which have lively applications with matrices. Julia doesn't have to go this far down the rabbit hole -- but we should remember it exists, so as not to block it off. Maybe a discourse topic? :) )

@felixrehren, a vector of vectors over a ring R is best understood as a direct-sum space, which also a vector space over R. It is nonsensical to call the vectors themselves "the underlying ring", because in general they are not a ring. That reasoning applies perfectly to [[1,2], [3,4]].

Conjugation and adjoints are normally separate concepts; saying that "conjugation of the scalar elements" should do the work here seems totally incorrect to me — the opposite of "serious" — except in the special case noted below.

Consider the case of "2-component column vectors" (|u⟩,|v⟩) of two elements |u⟩ and |v⟩ in some Hilbert space H over some ring (say the complex numbers ℂ), in Dirac notation: "vectors of vectors." This is a direct-sum Hilbert space H⊕H with the natural inner product ⟨(|u⟩,|v⟩), (|w⟩,|z⟩)⟩ = ⟨u|w⟩+⟨v|z⟩. The adjoint should therefore produce the linear operator consisting of the "row vector" (⟨u| ⟨v|), whose elements are themselves linear operators: the adjoint is
"recursive"
. This is totally different from the complex conjugate, which produces an element of the same Hilbert space H⊕H, induced by conjugation of the underling ring.

You are confusing the matter by referring to vectors over quaternions, etcetera. If the "elements" of the vector are also elements of the underlying "complexified" ring, then of course the adjoint and the conjugate of these elements is the same thing. However, that is not applicable to all direct-product spaces.

Put another way, the object type needs to indicate two different pieces of information:

  • The scalar ring (hence the conjugate, which produces an element of the same space).
  • The inner product (hence the adjoint, which produces an element of the dual space).

For Vector{T<:Number}, we should read it as indicating that the scalar ring is T (or complex(T) for the complexified vector space), and that the inner product is the usual Euclidean one.

If you have a vector of vectors, then it is a direct-sum Hilbert space and the ring is the promotion of the scalar rings of the individual vectors, and the dot product is the sum of the dot products of the elements. (If the scalars can't be promoted or the dot products can't be summed, then it isn't a vector/Hilbert space at all.)

If you have a scalar T<:Number, then conj(x::T) == adjoint(x::T).

So, if you try represent complex numbers z::Complex{T} by the 2x2 matrices m(z)::Array{T,2}, then your type indicates both a different ring T and a different inner product compared to z, and hence you shouldn't expect either conj or adjoint to give equivalent results.

I kind-of see what you are saying @felixrehren. If the scalar is real, then an octernian can be thought of as an 8 dimensional vector space. But if the scalar coefficient was an octernian, then there is trivial dimension-1 basis that represents all octernians. I'm not sure if this is different to what I said (or what I meant :smile:), if we have an AbstractVector{T1} it might be isomorphic to an AbstractVector{T2} of a different dimensionality (length). But T1 and T2 should both be rings under the Julia operators + and * (and the isomorphism might preserve the behavior of + but not *, or the inner product or adjoint).

@stevengj I've always thought of the direct sum exactly like my BlockVector. You have two (disjoint) incomplete bases. Within each basis you might be able to form superpositions like c₁|X₁⟩ + c₂|X₂⟩ in the first basis or c₃|Y₁⟩ + c₄|Y₂⟩ in the second. The "direct sum" represents the space of states that look like (c₁|X₁⟩ + c₂|X₂⟩) + (c₃|Y₁⟩ + c₄|Y₂⟩). It seems to me that the only special thing that separates this from a basis of dimension four over the complex numbers (i.e. states like c₁|X₁⟩ + c₂|X₂⟩ + c₃|Y₁⟩ + c₄|Y₂⟩) is the brackets - to me, it seems notational; the direct sum is just a convenient way to write these on paper or with arrays on a computer (and might be useful to e.g. to let us catalogue and take advantage of symmetries, or to split up the problem (perhaps to parallelize it), etc). In this example X⊕Y is still a four-dimensional vector space where the scalars are complex.

This is what makes me want eltype(v) == Complex{...} and length(v) == 4 rather than eltype(v) == Vector{Complex{...}} and length(v) == 2. It helps me to see and reason about the underlying ring and vector space. Is it not also this "flattened" space where you can do "global" transposition and elementwise conj in order to compute the adjoint?

(you got in two posts while I was writing just one, @stevengj :smile:)

Of course, if we prefer to use nested arrays as convention for the direct sum (as we do now) rather than BlockArray, this can be nice and consistent too! We might benefit from some extra functions to determine the scalar field (a kind of recursive eltype) and what the effective flattened dimensionality might be (a kind of recursive size) to make it a bit easy to reason about what linear algebra we are doing.

To be clear, I'm happy with both approaches and I've learned a lot from this discussion. Thanks.

@andyferris, just because two spaces are isomorphic doesn't mean they are the "same" space, and arguing about which one of them is "really" the vector space (or whether they are "really" the same) is a metaphysical quagmire from which we will never escape. (But the case of direct sums of infinite-dimensional vector spaces illustrates a limitation of your "flattened" concept of a direct sum as being "all the elements of one followed by all the elements of the other".)

Again, I'm not sure what your point is. Are you seriously proposing that eltype(::Vector{Vector{Complex}}) in Julia should be Complex, or that length should return the sum of the lengths? That everyone in Julia should be forced to adopt your "flattened" isomorphism for direct-sum spaces? If not, then adjoint has to be recursive.

And if you are "just trying to understand better, not making a particular point", can you take it to a different forum? This issue is confusing enough without metaphysical arguments about the meaning of direct-sum spaces.

just because two spaces are isomorphic doesn't mean they are the "same" space

I definitely wasn't suggesting that, at all.

Again, I'm not sure what your point is

I am seriously suggesting that if you want to do linear algebra with an AbstractArray, then the eltype better be a ring (support +, * and conj) which rules out e.g. eltypes of Vector because [1,2] * [3,4] doesn't work. And the length of a vector would actually represent the dimensionality of your linear algebra. Yes, this rules out infinitely-dimensional vector spaces, but generally in Julia AbstractVector can't be infinitely sized. I feel this would make it easier to reason about how to implement and to use the linear algebra functionality on Julia. However, users would have to explicitly denote a direct sum via a BlockArray or similar rather than use nested array structures.

This is quite a breaking suggestion, and does have notable downsides (some you've mentioned), so I'm not going to be unhappy if we say "that's a nice idea, but it's not going to be practical". However, I have also been trying to point out that the nested array approach makes some things about the underlying linear algebra a bit more opaque.

All this seems directly relevant to the OP. With an enforced flattened approach we would ditch recursive transpose/adjoints, and if we ditched recursive transpose/adjoints, only flattened structures could be viable for linear algebra. If we don't enforce a flattened approach, we must keep a recursive adjoint. To me, it seemed to be a linked decision.

Are you seriously proposing that eltype(::Vector{Vector{Complex}}) in Julia should be Complex, or that length should return the sum of the lengths?

No, sorry, maybe what I wrote there wasn't clear - I was merely suggesting that two new convenience functions for this purpose might come in handy in certain circumstances. Sometimes you might want the zero or one of that ring, for instance.

Are you saying that everyone in Julia should be forced to adopt your "flattened" isomorphism for direct-sum spaces?

I was suggesting that as a possibility, yes. I'm not 100% convinced it's the best idea, but I feel there are some advantages (and some disadvantages).

Returning the realm of actionable changes here, it seems that @stevengj's simplified version of my proposal is the way to go here, i.e.:

  • Keep conj as-is (elementwise).
  • Make a' correspond to adjoint(a), and make it recursive always (and hence fail for arrays of strings etc where adjoint is not defined for the elements).
  • Make a.' correspond to transpose(a), and make it never recursive (just a permutedims), and hence ignore the type of the elements.

The main actual "todos" that can be extracted from this are:

  1. [ ] Rename ctranspose to adjoint.
  2. [ ] Change transpose to be non-recursive.
  3. [ ] Figure out how this fits in with row vectors.

I suspect that relying on recursive transpose is sufficiently rare that we can just change this in 1.0 and list it as breaking in NEWS. Changing ctranspose to adjoint can go through a normal deprecation (however we do this for 1.0). Is there anything else that needs to be done?

@StefanKarpinski, we'll also need to make the corresponding change to RowVector. One possibility would be to split RowVector into Transpose and Adjoint types for lazy non-recursive transpose and lazy recursive adjoint, respectively, and to get rid of Conj (lazy conj).

A few more at least tangentially related changes

  • Some view types for transpose and adjoint of AbstractMatrix
  • Make adjoint(::Diagonal) recursive (arguably we should fix this "bug" in Diagonal for transpose and ctranspose before Julia v0.6.0)
  • Use the proper "scalar" type in things like: v = Vector{Vector{Float64}}(); dot(v,v) (currently an error - it tries to return zero(Vector{Float64}))

OK, I've been trying to take recursive block-matrices more seriously, and to back this up I'm also trying to improve the behavior of Diagonal here (there are already some methods which anticipate a block-diagonal structure, such as getindex, so it seems natural to make transpose recursive in this case).

Where I get stuck, and what directly motivated all of my discussion points above, is how to do linear algebra operations on such a structure, including inv, det, expm, eig, and so on. For instance, there is a method for det(::Diagonal{T}) where T which just takes the product of all the diagonal elements. Works brilliantly for T <: Number, and it also works if all the elements are square matrices of the same size. (Of course, if they are square matrices of the same size, then the elements form a ring, and it's all quite sensible - also recursive transpose (edit: adjoint) is the correct thing).

However, if we think of a general block-diagonal matrix Diagonal{Matrix{T}}, or any block-matrix Matrix{Matrix{T}}, then we can generally speak of its determinant as a scalar T. I.e. if the size were (3 ⊕ 4) × (3 ⊕ 4) (a 2×2 matrix with diagonal elements which are matrices of dimensions 3×3, 4×4, and matching off-diagonal elements), should det return the determinant of of the "flattened" 7×7 structure, or should it simply try and multiply the 2×2's elements as they are (and error-out, in this case)?

(edit: I changed the dimensions above to be all different, which should avoid ambiguous language)

I don't have a problem with a' being recursive, but I personally would find the new a.' notation extremely confusing. If you showed me the syntax a.' I would tell you, based on my mental model of Julia, that it performs transpose.(a) i.e. elementwise transpose of a, and I would be completely wrong. Alternatively, if you showed me that a' and a.' were both options for a transpose, and that one of them also recursed elementwise, I would tell you that a.' must have the elementwise recursion, and I would again be wrong.

Of course, my mental model of what . means is just wrong in this case. But I suspect I'm not the only one who would interpret that syntax in exactly the wrong way. To that end, I would suggest using something other than .' for the non-recursive transpose.

@rdeits I'm afraid this syntax derives from MATLAB. This became a little unfortunate once the (quite wonderful) dot-call broadcast syntax was introduced for v0.5.

We could forge our own path separate from MATLAB and change this - but I believe there was a separate discussion somewhere about that (anyone remember where?).

Ah, that's unfortunate. Thanks!

Another todo:

  • [ ] Fix issymmetric and ishermitian to match transpose and adjoint – the former of each pair being non-recursive, the latter of each pair being recursive.

The Matlab .' syntax is definitely quite unfortunate in the context of our new . syntax. I would not be against changing this, but then we need a new syntax for transpose, and I'm not sure we have one available. Anyone have any suggestions for transpose?

Let's move the transpose syntax discussion over here: https://github.com/JuliaLang/julia/issues/21037.

I don't have any strong opinion about transpose/ctranspose/adjoint being recursive, but I'd rather not treat A::Matrix{Matrix{T}} like block matrix in the sense of a lazy hvcat, which @andyferris seems to imply at least in part. That is, if A's elements were all square matrices of the same size (i.e. form a ring), I'd expect det(A) to return a square of that size again. If they are rectangular or of different sizes, I'd expect an error.

That said, a block matrix/lazy cat type might be useful, but it should go all the way and also define e.g. getindex to work on the flattened data. But I definitely would not like to see this concept absorbed by Matrix or any other of the existing matrix types like Diagonal.

This is relieving, @martinholters. What I was panicking about earlier in this thread was the idea that we should somehow be able to apply Julia's entire linear algebra framework to Matrix{Matrix} for arbitrary block matrices (where the submatrices have differing sizes).

What I was arguing for with the "flattening" is not doing any fancy introspection of what the elements are, and just treat the elements on their own merits as elements of a ring. However, recursive ctranspose/adjoint expands this to allowing elements which act like linear operators, which seems correct and useful. (The other case is elements of a vector which act like a vector, where a recursive adjoint is also correct).

To go back a little further - what was the original motivation for removing default no-op behavior for transpose(x) = x and ctranpsose(x) = conj(x)? These always seemed quite useful to me.

To go back a little further - what was the original motivation for removing default no-op behavior for transpose(x) = x and ctranpsose(x) = conj(x)? These always seemed quite useful to me.

That was motivated by custom linear-operator types (which can't be subtypes of AbstractArray) that failed to specialize ctranspose. This meant that they inherited the wrong no-op behavior from a fallback. We've tried to structure dispatch such that fallbacks are never silently incorrect (but they may have a pessimistic complexity). https://github.com/JuliaLang/julia/issues/13171

We seem to have two options – in both, transpose becomes non-recursive, otherwise:

  1. Non-recursive ctranspose.
  2. Recursive ctranspose.

@stevengj, @jiahao, @andreasnoack – what are your preferences here? Others?

I've been thinking a lot about this since @jiahao's JuliaCon 2017 talk.

To me, I still feel linear algebra should be defined with respect to a "scalar" field as an element type T. If T is a field (supports +, * and conj (also -, /, ...)) then I don't see why methods in Base.LinAlg should fail.

OTOH it is quite common (and valid) to talk about taking the transpose of a block matrix, for example. I think we can learn from "mathematical" type theory here, which for instance tries to deal with strange statements arising from discussing sets of sets by having "first order" sets of scalars, "second order" sets of sets of scalars, "third order" sets of sets of sets of scalars, and so-on. I think we have the same problem (and opportunity) here when deal with arrays of arrays - we can potentially use Julia's type system to describe "first order" arrays of "true" scalars, "second order" arrays of arrays of scalars, etc. I think the long discussions between @stevengj, myself, and others, resulted from the fact that, yes, you can come up with a self-consistent set of operations on arrays of arbitrary "order", and in this framework (c)transpose is clearly and correctly recursive, but however, you don't have to do that. As Jiahao's talk pointed out clearly, programming languages/frameworks make semantic choices about distinguishing scalars from arrays (or not), distinguishing vectors from matrices (or not) and so-on, and I think this is a related choice we can make.

To me, the crucial point here is that to make the "arbitrary order" array work you have to teach your "scalars" some operations which are normally only defined on vectors and matrices. The current Julia method which does this is transpose(x::Number) = x. However, I would really prefer if we extend our semantic distinction between arrays and scalars by removing this method - and I think it may be quite feasible.

The next step would be to teach Base.LinAlg the difference between a first-order matrix and a second-order matrix and so-on. I have thought of two viable options, there may be more, I really don't know:

  • Have some sort of opt-in trait to "arrayness", so that transpose(::AbstractMatrix{T}) is recursive when T is an AbstractMatOrVec, and not when T is a Number, and make it somehow configurable (opt-in, opt-out, whatever) otherwise. (In some ways this was and currently is done by controlling the behavior of the transpose method for the elements, but I think it's cleaner to rather control the behavior of the transpose method of the (outer) array).
  • Alternatively, assert that most AbstractArrays including Arrays are first order (their elements are scalar fields), and use a different AbstractArray type (say NestedArray) to wrap arrays and "mark" that their elements are to be treated as arrays not scalars. I tried playing around with this a little, but it seemed the above option is probably less of a burden for users.

I feel that Julia makes a strong semantic separation between arrays and scalars, and that the current situation seems (to me) a little bit inconsistent with that, since scalars had to be "taught" about the array property transpose. A user can clearly tell that Matrix{Float64} is a matrix of scalars (first-order array) and Matrix{Matrix{Float64}} is a block matrix (second-order array). It may be more consistent for us to use the type system or traits to determine whether an array is "first order" or whatever. I also think the first option I listed would make Julia more user friendly (so I can do ["abc", "def"].') yet retain the flexibility to do some sensible/useful default thing with block matrices.

Sorry to harp on so long, but after speaking with @jiahao at JuliaCon, I felt that we could do a little better here. I'm sure that we can make the other options mentioned by @StefanKarpinski above "work", but to me it would be "working" just like matrix/vector algebra was "working" before the introduction of RowVector.

The TLDR was - don't make (c)transpose be either recursive or not - let it do both (i.e. be configurable).

Those are good points but just want to point out that almost all (if not all) of the complaints about (c)transpose being recursive are related to non-mathy use of ' and .' when reshape e.g. Vector{String} and Vector{PyObject} into 1xn matrices. It is easy to fix type by type but I can see that it is annoying. However, the thinking was that the recursive definition was the mathematically correct one and that "correct but annoying in many cases" was better than "convenient but wrong in rare cases".

A possible solution could be your suggestion in the first bullet, i.e. only to have recursive transposes for array-like elements. I believe T<:AbstractVecOrMat would cover most of the cases changing the status to "convenient but wrong in very rare cases". The reason why it might still be wrong is that some operator-like types wouldn't fit in the AbstractMatrix category because the AbstractArray interface is mainly about array (getindex) semantics, not linear algebra.

@andyferris, the adjoint and dual of a scalar are perfectly well defined, and ctranspose(x::Number) = conj(x) is correct.

My feeling is that most uses of transpose are "non-mathy", and most uses of ctranspose (i.e. uses where the conjugating behavior is desired) are mathy (since there's no other reason to conjugate). Hence, I would tend to be in favor of non-recursive transpose and recursive ctranspose.

Personally, I think trying to look at block arrays as nested Arrays becomes complicated for the reasons here and it is perhaps better to have a dedicated type à la https://github.com/KristofferC/BlockArrays.jl for this.

Looks nice, @KristofferC.

There is one very valid point that @stevengj makes above, and it is that sometimes your elements might e.g. be vectors or linear operators in the mathematical sense, but not AbstractArrays. We definitely need a way of doing recursive (c)transpose for these - not sure if you've thought about this or not, but I thought I'd mention it.

Highlights of a slack/#linalg conversation on this topic. Recaps some of the above thread. Focuses on semantics, avoids spelling. (Thanks to all who participated in that conversation! :) )

Three semantically distinct operations exist:
1) "mathematical adjoint" (recursive and lazy)
2) "mathematical transpose" (ideally recursive and lazy)
3) "structural transpose" (ideally non-recursive and ?eager?)

Present situation: "mathematical adjoint" maps to ctranspose, "mathematical transpose" to transpose, and "structural transpose" to permutedims(C, (2, 1)) for two-dimensional C and reshape(C, 1, length(C)) for one-dimesional C. The issue: "structural transpose" is a common operation, and the permutedims/reshape story is somewhat confusing/unnatural/annoying in practice.

How that arose: Previously "structural transpose" was conflated with "mathematical adjoint"/"mathematical transpose" via generic no-op fallbacks like transpose(x::Any) = x, ctranspose(x::Any) = conj(x), and conj(x::Any) = x. These fallbacks made [c]transpose and the associated postfix operators '/.' serve for "structural transpose" in most cases. Great. But they also made operations involving [c]transpose on some user-defined numerical types fail silently (return incorrect results) absent definition of [c]transpose specializations for those types. Ouch. Hence those generic no-op fallbacks were removed, yielding the present situation.

The question is what to do now.

Ideal outcome: Provide a unified, natural, and compact incantation for "structural transpose". Simultaneously support semantically correct mathematical adjoint and transpose. Avoid introducing tricky corner cases in use or implementation.

Two broad proposals exist:

(1) Provide mathematical adjoint, mathematical transpose, and structural transpose as three syntactically and semantically distinct operations. Upsides: Makes everything work, cleanly separates concepts, and avoids tricky corner cases. Downsides: Three operations to explain and implement.

(2) Shoe-horn the three operations into two. Three forms of this proposal exist:

(2a) Make transpose semantically "structural transpose", also serving as "mathematical transpose" in common cases. Upsides: Two operations. Works as expected for containers with unambiguously scalar elements. Downsides: Anyone expecting "mathematical transpose" semantics will silently receive incorrect results on objects more complex than containers with unambiguously scalar elements. If the user catches that issue, achieving "mathematical transpose" semantics requires defining new types and/or methods.

(2b) Introduce a trait indicating a type's "mathiness". When applying adjoint/transpose to an object, if the container/element types are "mathy", recurse; if not, don't recurse. Upsides: Two operations. Could cover a variety of common cases. Downsides: Suffers from the same issue as the generic no-op [c]transpose fallbacks, i.e. could silently yield incorrect results for user-defined numeric types lacking the necessary trait definitions. Does not enable application of structural transpose to a "mathy" type or mathematical transpose to a "non-mathy" type. Not clear how to decide whether an object is "mathy" in all cases. (E.g., what if the element type is abstract? Is the recursive/non-recursive decision then runtime and elementwise, or runtime and requiring a first sweep over all elements in the container to check their collective type?)

(2c) Keep adjoint (ctranspose) "mathematical adjoint" and transpose "mathematical transpose", and introduce specialized (not generic fallback) methods for adjoint/transpose for non-numeric scalar types (e.g. adjoint(s::AbstractString) = s). Upsides: Two operations. Correct mathematical semantics. Downsides: Requires piecemeal definition of adjoint/transpose for non-numeric types, impeding generic programming. Does not enable application of structural transpose to "mathy" types.

Proposals (1) and (2a) enjoyed substantially broader support than (2b) and (2c).

Please find more discussion in #19344, #21037, #13171, and slack/#linalg. Best!

Thanks for the nice write-up!

I'd like to put another possibility on the table that would go well with option 1: Have different container types for mathematical matrices and tabular data. Then the meaning of A' could be decided by the type of A (note: not its element type as discussed above). Cons here are that this might require a lot of converts between the two (would it?) and of course, would be highly disruptive. Admittedly, I'm more than skeptical whether the benefits would justify this disruption, but still wanted to mention it.

Thanks, excellent writeup. I vote for (1).

Great write up. Note that for (1) the spellings could be:

  • Adjoint: a' (recursive, lazy)
  • Mathematical transpose: conj(a') (recursive, lazy)
  • Structural transpose: a.' (non-recursive, eager)

So there wouldn't necessarily need to be any new operator introduced – mathematical transpose would just be a (lazy and therefore efficient) composition of adjoint and conjugation. The biggest issue is that it makes two similar-seeming operators, i.e. postfix ' and .', quite distinct semantically. However, I would posit that in most correct generic code whether someone has used ' or .' is a 99% accurate indicator of whether they meant "give me the adjoint of this thing" or "swap the dimensions of this thing". Moreover, in cases where someone has used ' and actually meant "swap the dimensions of this thing", their code would already be incorrect for any matrix with elements whose scalar adjoint is non-trivial, e.g. complex numbers. In the few remaining cases where someone actually meant "give me the conjugate of the adjoint of this", I would argue that writing conj(a') makes that meaning much clearer since in practice people actually use a.' to mean "swap the dimensions of a".

Working out all the underlying generic types we would need for this remains to be determined, but @andyferris and @andreasnoack already have some thoughts on the matter and it seems possible.

I thought I should also update, since this discussion went on elsewhere. I admit that there was one (obvious?) thing about this proposal that I had completely missed which was that I assumed we would continue to use .' for linear algebra, but I should have realized that this is not the case!

For instance, vector.' will no longer need to be a RowVector (since RowVector is a linear algebra concept, our first attempt at a "dual" vector) - it can simply be a Matrix. When I want the "non-conjugating transpose" in linear algebra, what I'm really doing is taking conj(adjoint(a)), and that's what we'll use and recommend all linear algebra users to use (up until now I have had a long-standing "bad" habit since MATLAB to just use a.' rather than a' for transposing any matrix (or vector) which I knew was real, when what I really wanted was the adjoint (or dual) - the change in name will be a big help here).

I'll also briefly point out that this leaves open an interesting space. Previously vector' and vector.' had to satisfy the dual needs of doing a "data transpose" and taking something like the dual vector. Now that .' is for manipulating array sizes and ' is a linear algebra concept, we might be able to change RowVector into a 1D DualVector or something else entirely. (perhaps we should not discuss this here - if anyone has an appetite for this, let's make a separate issue.)

Finally, I'll copy a proposed action plan from Slack:

1) move transpose from LinAlg to Base and add depwarns (0.7 only) about it no longer making a RowVector or being recursive (if that's possible)
2) rename ctranspose to adjoint everywhere
3) make sure Vector, ConjVector and RowVector work with combinations of ' and conj and *, possibly rename RowVector. (here we also make conj(vector) lazy).
4) introduce a lazy matrix adjoint that also interacts well with conj and ConjMatrix
5) remove A_mul_Bc etc. Rename A_mul_B! to mul! (or *!).
6) profit

@stevengj wrote:

@andyferris, the adjoint and dual of a scalar are perfectly well defined, and ctranspose(x::Number) = conj(x) is correct.

For the record, I definitely agree with this.

My feeling is that most uses of transpose are "non-mathy", and most uses of ctranspose (...) are mathy

So the idea will be to formalize this: all uses of transpose become "non-mathy" and the only use of adjoint will be "mathy".

For instance, vector.' will no longer need to be a RowVector (since RowVector is a linear algebra concept, our first attempt at a "dual" vector) - it can simply be a Matrix.

We still want .' to be lazy however. e.g. X .= f.(x, y.') should still be non-allocating.

Here's a question: How should we implement issymmetric(::AbstractMatrix{<:AbstractMatrix})? Would it be a non-recursive check, to match transpose? I'm looking at the implementation; it assumes that the elements can transpose. OTOH it seems quite valid to check if issymmetric(::Matrix{String})...

It currently is not recursive if wrapped in Symmetric btw

julia> A = [rand(2, 2) for i in 1:2, j in 1:2]; A[1, 2] = A[2, 1]; As = Symmetric(A);

julia> issymmetric(A)
false

julia> issymmetric(As)
true

julia> A == As
true

Yep, I'm working on creating a PR for this and there's lots of these kinds of inconsistencies. (I came across that one not long ago while searching for every instance of transpose, adjoint and conj in the array code).

Unless otherwise directed I will implement behavior such that issymmetric(a) == (a == a.') and ishermitian(a) == (a == a'). These seem pretty intuitive in themselves and AFAICT existing usage of issymmetric is for Number element types (or else frequently has other mistakes/assumptions that won't make a whole lot of sense for nested arrays).

(The alternative implementation is issymmetric(a) == (a == conj(adjoint(a)))... not as "pretty" nor will it work for "data" arrays (of String, etc), but it coincides on arrays of Number)

Unless otherwise directed I will implement behavior such that issymmetric(a) == (a == a.') and ishermitian(a) == (a == a')

Seems like the right solution to me. issymmetric(Matrix{Matrix{Complex{}}}) wouldn't give the expected result but at this point, it is probably the smallest price possible.

Update: Changed my mind. Symmetric is probably mainly for linear algebra

Just a small suggestion: it is often useful to do a sum over the product of two vectors (dotu ), and x.’y returning a scalar is a convenient way to do this. So I would be in favour of not return a Matrix from transpose(::Vector)

Yes, it will be a RowVector so you should get a scalar. (Note that the transpose won't be recursive).

Sorry for the possibly tangential comment, but many people in this thread keep talking about a "vector space's ring of scalars". By definition, a vector space's scalars must form a field, not just any ring. An algebraic structure that is very similar to a vector space but whose scalars only form a ring rather than a field is called a "module", but I think that modules are a little too esoteric to be handled in the Base ... er ... module.

Since we support integer arrays, we effectively support modules, not just vector spaces. Of course, we could also view integer arrays as embedded in floating-point arrays behaviorally, so they're a partial representation of a vector space. But we can also create arrays of modular integers (for example), and with a non-prime modulus we'd be working with a ring that isn't naturally embedded in any field. In short, we do actually want to consider modules in general rather than just vector spaces, but I don't think there's any significant difference for our purposes (we're generally only talking about + and *) so "vector space" serves as more familiar shorthand for "module" for our purposes.

Yes, Julia certainly does (and should) support algebraic operations on general modules as well as on true vector spaces. But as I understand it, the community's general philosophy is that functions in Base should be designed with "'ordinary' generic linear algebra routines" in mind - so, for example, exact linear-algebra calculations on integer matrices don't belong in Base - so when making fundamental design decisions, we should just assume that the scalars form a field. (Although as you said, practically speaking, it doesn't really matter.)

Crossposting https://github.com/JuliaLang/julia/pull/23424#issuecomment-346678279

I much appreciate the effort to move #5332 and #20978 forward that this pull request represents, and am on board with most of the associated plan. I would much like to also be on board with the terminology choices and downstream impact that this pull request advances, and so regularly try to convince myself that those choices yield the best set of tradeoffs available.

Each time I try to convince myself of that position, I run aground on the same set of misgivings. One of those misgivings is the substantial implementation complexity this choice forces upon LinAlg: Conflating transpose and array-flip forces LinAlg to handle both, requiring LinAlg support a much larger set of type combinations in common operations.

To illustrate, consider mul(A, B) where A and B are bare, adjoint-wrapped, transpose-wrapped, or array-flip-wrapped Matrixs. Without conflating transpose and array-flip, A can be a Matrix, an adjoint-wrapped Matrix, or a transpose-wrapped Matrix (and likewise B). So mul(A, B) need support nine type combinations. But conflating transpose and array-flip, A can be a Matrix, an adjoint-wrapped Matrix, a transpose-wrapped Matrix, or an array-flip-wrapped Matrix (and likewise B). So now mul(A, B) need support sixteen type combinations.

This problem worsens exponentially with the number of arguments. For example, without conflation mul!(C, A, B) need support twenty-seven type combinations, whereas with conflation mul!(C, A, B) need support sixty-four type combations. And of course adding Vectors and non-Matrix matrix/operator types to the mix further complicates things.

This side effect seems worth considering before moving forward with this change. Best!

This post consolidates/reviews recent discussion on github, slack, and triage, and analyzes possible paths forward. (The spiritual successor to https://github.com/JuliaLang/julia/issues/20978#issuecomment-315902532 born of thinking about how to get to 1.0.)

Three semantically distinct operations are at issue:

  • adjoint (linear algebraic, recursive, ideally lazy by default)
  • transpose (linear algebraic, recursive, ideally lazy by default)
  • array-flip (abstract-array-ic, non-recursive, ideally ?lazy? by default)

Status of these operations on master

  • Adjoint is called adjoint/' (but is eager apart from '-involving expressions specially lowered to A[c|t]_(mul|rdiv|ldiv)_B[c|t][!] calls, which avoid intermediate eager adjoints/transposes).

  • Transpose is called transpose/.' (with the same caveat as adjoint).

  • Array-flip is called permutedims(C, (2, 1)) for two-dimensional C and reshape(C, 1, length(C)) for one-dimensional C.

The relevant issues

  1. 5332: The special lowering to A[c|t]_(mul|rdiv|ldiv)_B[c|t] and associated combinatorial collection of method names should disappear by 1.0. Removing that special lowering / the associated methods names requires lazy adjoint and transpose.

  2. 13171: Adjoint (née ctranspose) and transpose were formerly conflated with array-flip via generic no-op fallbacks like transpose(x::Any) = x, ctranspose(x::Any) = conj(x), and conj(x::Any) = x. These fallbacks made operations involving [c]transpose on some user-defined numerical types fail silently (return incorrect results) absent [c]transpose specializations for those types. Silently returning incorrect results is bad news, so those fallbacks were removed (#17075). Not introducing more silent failures would be great.

  3. 17075, #17374, #19205: Removing the preceding fallbacks made array-flip less convenient. And coupled with (apologies, my fault) less-than-great associated deprecation warnings, that removal (transiently?) caused complaints. A more convenient incantation for array-flip would be nice.

  4. 21037: Removing the now-confusing .' syntax for 1.0 would be lovely. The special lowering mentioned above must be removed to enable this change.

Design objectives for resolving these issues

Remove the special lowering and associated method names, avoid introducing silent failures, provide intuitive and convenient incantations for transpose and array flip, and remove .'. Achieve the preceding with minimal additional complexity and breakage.

Design proposals

The field has been whittled down to two design proposals:

  1. Call adjoint adjoint, transpose conjadjoint ("conjugate adjoint"), and array-flip transpose. adjoint and conjadjoint live in LinAlg, and transpose lives in Base.

  2. Call adjoint adjoint, transpose transpose, and array-flip flip. adjoint and transpose live in LinAlg, and flip lives in Base.

At first blush these proposals seem only superficially different. But further examination reveals deep practical differences. Eschewing discussion of the relative superficial merits of these naming schemes, let's have a look at those deep practical differences.

Differences, high-level view

  1. Complexity:

    Proposal one, by calling array-flip transpose, forces LinAlg to handle array-flip in addition to transpose and adjoint. Consequently, LinAlg must substantially expand the set of type combinations supported by common operations; https://github.com/JuliaLang/julia/pull/23424#issuecomment-346678279 illustrates this additional complexity by way of example, and the following discussion implicitly affirms that additional complexity's existence.

    Proposal two requires LinAlg support only transpose and adjoint, which LinAlg does now.

  2. Breakage:

    Proposal one changes the basic semantics of existing operations: transpose becomes array-flip rather than transpose, and all transpose-related functionality must change accordingly. (For example, all multiplication and left-/right-division operations in LinAlg associated with the name transpose would require semantic revision.) Depending on how this change is realized, this change causes silent breakage wherever the present semantics are relied upon (purposefully or inadvertently).

    Proposal two retains the basic semantics of all existing operations.

  3. Coupling:

    Proposal one brings a linear algebraic concept (transpose) into Base, and an abstract-array concept (array-flip) into LinAlg, strongly coupling Base and LinAlg.

    Proposal two cleanly separates things abstract array and linear algebra, allowing the former to live solely in base and the latter solely in LinAlg, with no new coupling.

  4. Silent vs. loud failure:

    Proposal one leads to silently incorrect result when one calls transpose expecting transpose semantics (but instead gets array-flip semantics).

    The analog under proposal two is calling transpose on a non-numeric array expecting array-flip semantics. In this case, transpose can throw an error helpfully pointing the user to flip.

  5. .': The primary argument for not deprecating .' is transpose's length. Proposal one replaces .' with the names transpose and conjadjoint, which does not improve that situation. In contrast, proposal two provides the names flip and transpose, improving that situation.

Differences in the paths to 1.0

What does getting to 1.0 take under each proposal? The path to 1.0 is simpler under proposal two, so let's start there.

The path to 1.0 under proposal two

  1. Introduce lazy adjoint and transpose wrapper types in LinAlg, say Adjoint and Transpose. Introduce mul[!]/ldiv[!]/rdiv[!] methods dispatching on those wrapper types and containing the corresponding A[c|t]_{mul|ldiv|rdiv}_B[c|t][!] methods' code. Reimplement the latter methods as short children of the former methods.

    This step breaks nothing, and immediately enables both removal of the special lowering and deprecation of .':

  2. Remove the special lowering that yields A[c|t]_{mul|ldiv|rdiv}_B[c|t], instead simply lowering '/.' to Adjoint/Transpose; formerly-specially-lowered expressions yielding A[c|t]_{mul|ldiv|rdiv}_B[c|t] calls then instead become equivalent mul/ldiv/rdiv calls. Deprecate A[c|t]_{mul|ldiv|rdiv}_B[c|t][!] to corresponding mul[!]/ldiv[!]/rdiv[!] methods. Deprecate .'.

    These steps could be done in 0.7. They breaks two things only: (1) Code relying upon the special lowering to hit A[c|t]_{mul|ldiv|rdiv}_B[c|t] methods for non-Base/LinAlg types will break. Such code will emit explicit MethodErrors indicating what the new lowering yields / what the broken code need migrate to. (2) Code relying upon isolated 's/.'s behaving strictly eagerly will break. The common failure mode should also be explicit MethodErrors. All around, breakage is confined and loud.

    And that's it for changes strictly necessary for 1.0.

    At this point, Adjoint(A)/Transpose(A) would yield lazy adjoint and transpose, and adjoint(A)/transpose(A) would yield eager adjoint and transpose. The latter names could remain indefinitely or, if desired, be deprecated to some other spelling in 0.7, for example eagereval(Adjoint(A))/eagereval(Transpose(A)) modulo spelling of eagereval or eageradjoint(A)/eagertranspose(A). In the deprecation case, adjoint/transpose could then be repurposed in 1.0 (though with Adjoint(A)/Transpose(A) around, I am not certain whether that would be necessary).

    Finally...

  3. Introduce flip and/or Flip in Base. Being a feature addition, this change can happen in 1.x if necessary.

The path to 1.0 under proposal one

What about proposal one? The below outlines two possible paths. The first path consolidates changes in 0.7 but involves silent breakage. The second path avoids silent breakage, but involves more change 0.7->1.0. Both outlines continuously evolve the codebase; less continuous equivalents might consolidate/avoid some work/churn, but likely would be more challenging and error prone. The ongoing work seemingly follows the first path.

First path under proposal one (with silent breakage)
  1. Change transpose's semantics from transpose to array-flip, and necessarily also the semantics of all transpose-related functionality. For example, all A[c|t]_{mul|ldiv|rdiv}_B[c|t][!] methods beginning At_... or ending ..._Bt[!] potentially need semantic revision. Also must change e.g. the definitions and behavior of Symmetric/issymmetric. Move transpose itself into Base.

    These changes would be silently and broadly breaking.

  2. Introduce conjadjoint in LinAlg. This step requires restoring all the methods touched in the preceding step, but in their original semantic form, and now with different names associated with conjadjoint (say Aca_... and ..._Bca[!] names). Also requires adding methods for the additional type combinations that simultaneously supporting array-flip (now transpose), transpose (now conjadjoint), and adjoint in LinAlg requires (for example the ca variants among A[c|t|ca]_{mul|ldiv|rdiv}_B[c|t|ca][!]).

  3. Introduce lazy adjoint and transpose (called conjadjoint) in LinAlg, say Adjoint and ConjAdjoint. Introduce a lazy array-flip (called transpose) wrapper type in Base, say Transpose. Introduce mul[!]/ldiv[!]/rdiv[!] methods dispatching on those wrapper types and containing the corresponding A[c|t|ca]_{mul|ldiv|rdiv}_B[c|t|ca][!] methods' code. Reimplement the latter methods as short children of the former methods.

  4. Remove the special lowering that yields A[c|t]_{mul|ldiv|rdiv}_B[c|t], instead simply lowering '/.' to Adjoint/Transpose; formerly-specially-lowered expressions yielding A[c|t]_{mul|ldiv|rdiv}_B[c|t] calls then instead become equivalent mul/ldiv/rdiv calls (though recall that semantics will have silently changed). Deprecate A[c|t]_{mul|ldiv|rdiv}_B[c|t][!] to corresponding mul[!]/ldiv[!]/rdiv[!] methods. Similarly remove the recently introduced Aca_.../...Bca[!] methods in favor of mul[!]/ldiv[!]/rdiv[!] equivalents. Deprecate .'.

These changes would need to go in 0.7. The semantic changes to existing operations would yield broad, silent breakage. The special lowering removal would yield the same confined, loud breakage described above.

At this point Adjoint(A)/Transpose(A)/ConjAdjoint(A) would respectively yield lazy adjoint, array-flip, and transpose, and adjoint(A)/transpose(A)/conjadjoint(A) would respectively yield eager adjoint, array-flip, and transpose. The latter names could remain indefinitely or, if desired, be deprecated to some other spelling also in 0.7 (ref. above).

One could introduce ConjAdjoint earlier in this process to consolidate/avoid some work/churn, though chances are that approach would be more challenging and error prone.

Second path under proposal one (avoiding silent breakage)
  1. Introduce eager conjadjoint in LinAlg. Migrate all functionality and methods presently associated with transpose (including e.g. A[c|t]_{mul|ldiv|rdiv}_B[c|t][!] methods involving t) to conjadjoint and derived names. Make all transpose-related names short children of the new conjadjoint equivalents. Deprecate all transpose-related names to conjadjoint equivalents.

  2. Introduce lazy adjoint and transpose (called conjadjoint) wrapper types in LinAlg, say Adjoint and ConjAdjoint. Introduce mul[!]/ldiv[!]/rdiv[!] methods dispatching on those wrapper types and containing the corresponding A[c|ca]_{mul|ldiv|rdiv}_B[c|ca][!] methods' code. Reimplement the latter methods as short children of the former methods.

  3. Introduce a lazy array-flip wrapper type in Base, called Transpose (somewhat confusingly, as simultaneously transpose is deprecated to conjadjoint with transpose semantics). Add all general methods necessary for array-flip (Transpose) to function to Base. Then add methods to LinAlg for all the additional type combinations that simultaneously supporting array-flip (now Transpose, but not transpose), transpose (now conjadjoint and ConjAdjoint), and adjoint in LinAlg requires (e.g. the mul[!]/rdiv[!]/ldiv[!] equivalents of A[c|t|ca]_{mul|ldiv|rdiv}_B[c|t|ca][!] that do not presently exist).

  4. Remove the special lowering that yields A[c|t]_{mul|ldiv|rdiv}_B[c|t], instead simply lowering '/.' to Adjoint/Transpose; formerly-specially-lowered expressions yielding A[c|t]_{mul|ldiv|rdiv}_B[c|t] calls then instead become equivalent mul/ldiv/rdiv calls. Deprecate A[c|t]_{mul|ldiv|rdiv}_B[c|t][!] to corresponding mul[!]/ldiv[!]/rdiv[!] methods. Deprecate .'.

The preceding changes would need to go in 0.7. No silent breakage, only the loud breakage from the special lowering removal as described above.

At this point Adjoint(A)/Transpose(A)/ConjAdjoint(A) would respectively yield lazy adjoint, array-flip, and transpose. adjoint(A)/conjadjoint(A) would respectively yield eager adjoint and transpose. transpose(A) would be deprecated to conjadjoint; transpose could be repurposed in 1.0 if desired. adjoint could remain indefinitely or be deprecated in favor of another spelling in 0.7. Another way to spell conjadjoint could go directly into 0.7.

One could somewhat consolidate steps 1 and 2, avoiding some work/churn, though chances are that approach would be more challenging and error prone.

#

Thanks for reading!

Thanks for the great writeup. I generally also agree with proposal 2, with as additional reason that conjugate adjoint is not at all standard terminology and I personally find it very confusing (since adjoint is sometimes used as terminology for plain transpose in certain branches of mathematics, and the extra adjective conjugate seems to imply that conjugation takes place).

I tried reading the elaborate discussions above, but failed to see at what point it was decided/motivated that a mathematical transpose should be recursive. This seemed to follow from some private discussion (somewhere between July 14 and 18), after which suddenly a mathematical and structural transpose was introduced.
@Sacha0 describes it as

How that arose: Previously "structural transpose" was conflated with "mathematical adjoint"/"mathematical transpose"

I agree (structural) transpose was conflated with "adjoint", but here "mathematical transpose" appears out of nowhere? For completeness/self-containedness, could that concept and why it should be recursive be briefly reiterated/summarised ?

In relation to that, I think that nobody is really using transpose in the abstract mathematical sense, as an operation on linear maps, for the simple reason that the transpose of a matrix would be an object that does not act on vectors but on RowVectors, i.e. it would map RowVector to RowVector. Given the lacking argumentation for recursive transpose, I really see no distinction between the plain matrix transpose that is typically defined as a (basis dependent) concept, and the newly proposed flip operation.

Thanks @Sacha0 for the great write up. I favor proposal 2 (Call adjoint adjoint, transpose transpose, and array-flip flip. adjoint and transpose live in LinAlg, and flip lives in Base.). To me this seems to give the best end result (which should be the primary concern), and also allows for a cleaner way there for v1.0. I agree with your points above so I will not repeat those, but here are some extra comments.

One argument in favor of non-recursive transpose is that the .'-syntax is very convenient, and can thus be used for e.g. Matrix{String}. Assuming that syntax is going away (#21037) and one have to use transpose(A) instead of A.', then I think that a flip-function would be a welcome addition (shorter and clearer than transpose(A), and definitely nicer than permutedims(A, (2,1))).

The decoupling between LinAlg and Base that proposal 2 gives is also very nice. Not only since LinAlg only need to care about how to handle Transpose and Adjoint, but also that it makes it clear that we consider transpose and adjoint to be LinAlg operations, and flip to be a generic AbstractMatrix thing that just flips the dimensions of a matrix.

Lastly, I have not seen much complaints about transpose(::Matrix{String}) etc not working. Is it really a common use case? In most cases it should be better to construct the flipped matrix from the start anyway.

This naming scheme (proposal 2) might indeed be better. Adjoint and transpose are pretty standard terms in linear algebra, and it is less breaking...

But failed to see at what point it was decided/motivated that a mathematical transpose should be recursive

@Jutho It's the same argument as the adjoint (the adjoint of a complex number is it's conjugate (since complex numbers are valid rank-one vector spaces and linear operators, the inner product and adjoint concepts apply), and the adjoint of a block matrix is recursive...). People may want to do linear algebra with nested, real matrices, for example. I still see a lot of code using .' for the "adjoint of a real matrix or vector", and I also used to do this. Here, ' would not only be valid but also be more generic (working for arrays which might be complex valued and/or nested). More genuine usage of recursive transpose on nested complex matrices happens when your equations give you conj(adjoint(a)), which is relatively rarer but still happens (it's rare enough that I'm happy if there is no function associated with it - in the discussions above and elsewhere various people started calling it different things like "mathematical transpose", I chose conjadoint, but none of it is great terminology IMO, except maybe transpose itself). In linear algebra the non-recursive operation which rearranges the blocks of a block matrix but doesn't do anything to the blocks themselves generally doesn't appear.

Lastly, I have not seen much complaints about transpose(::Matrix{String}) etc not working. Is it really a common use case? In most cases it should be better to construct the flipped matrix from the start anyway.

I think this type of thing is quite desirable with Julia's broadcasting operations. For example, it's pretty common to want to take the outer (Cartesian) product of some data and perhaps map it through a function. So instead of something like map(f, product(a, b)) we can use broadcast(f, a, transpose(b)) or simply f.(a, b.'). It's a lot of power in few characters.

My thoughts: to avoid adding even more function names to this space (i.e. flip), I'm wondering if we could have a default value for the permutation in permutedims(a) = permutedims(a, (2,1)). Unfortunately this is not as short as flip, but significantly less obnoxious than the full permutedims(a, (2,1)) form (and has the side effect of teaching users about the more general function).

(@Sacha0 Thank you very much for your writeup here. FYI in the choice between Adjoint and Transpose wrappers, or RowVector + MappedArray + whatever for flipped matrices as planned earlier (maybe just PermutedDimsArray), I still tend to favor the latter...)

More genuine usage of recursive transpose on nested complex matrices happens when your equations give you conj(adjoint(a))

Once you obtain conj(adjoint(a)) in your equations, how do you know that its actually conj that you wanted to apply to your matrix entries and maybe not adjoint, i.e. maybe you actually want adjoint.(adjoint(a)).

This will probably get me banned/expelled, but I am not a big fan of the whole recursive idea. I am not necessarily opposed to it either, I just don’t think there is a mathematically rigorous reason for this, as vectors and operators/matrices are not defined recursively, they are defined over fields of scalars (and by extension rings if you want to also work with modules instead of vector spaces). And so Base.LinAlg should just work in that case. The main argument

The adjoint of a vector must be a linear operator mapping it to a scalar. That is, a'*a must be a scalar if a is a vector.

is incompatible with how Julia Base works: Make a=[rand(2,2),rand(2,2)] and observe a'*a. It is not a scalar. So is a a matrix now, because it's a vector filled with matrices? The above is only a scalar if indeed you intended to use the ring of 2x2 matrices as scalars over which you defined a module. But in those contexts, it's not clear to me the above result is the intended one. Euclidean inner product is anyway rarely used in the context of modules etc, so I don’t think Base should even bother to define correct behaviour for it.

So really, even though the above statement might be an attempt to extend the motivation beyond interpreting matrices filled with matrices as block matrices, in the end I think that block matrix interpretation was the only true motivation for making adjoint and now even transpose (which is never used in the mathematical sense but always in the flip matrix indices sense) recursive in the first place. If you define a new scalar field type yourself, it’s just a strange burden that you need to define adjoint and transpose for it in addition to conj, before you can use it in a matrix.

So why, with such a powerful type system as in Julia, not just simply have a dedicated type for block matrices. And thus, playing the devil's advocate:

  • How many people are actually using/relying on the current recursive behaviour of adjoint for any practical work?
  • Are there any other languages using such a recursive approach? (Matlab, using cells of matrices, does not)

As said, I am not necessarily opposed, just questioning the validity (even after having read all of the above discussion), especially for the transpose case.

This will probably get me banned/expelled

I realize this is a joke but having an unpopular opinion absolutely will not get anyone banned. Only bad behavior that goes on for a long period can lead to banning. Even then the ban is not permanent.

I just don’t think there is a mathematically rigorous reason for this, as vectors and operators/matrices are not defined recursively, they are defined over fields of scalars (and by extension rings if you want to also work with modules instead of vector spaces).

This statement makes no sense to me. A vector of vectors, or a vector of functions, or a vector of matrices, still forms a vector space (over an underlying scalar field) with + and * scalar defined recursively, and similarly it forms an inner product space with the inner product defined recursively. The idea that vectors can only be "columns of scalars" defies both the definitions and common usage in mathematics, physics, and engineering.

Make a=[rand(2,2),rand(2,2)] and observe a'*a. It is not a scalar.

In this case the "scalar ring" of a is the ring of 2x2 matrices. So this is a linguistic problem, not a problem with the way adjoint works.

Arguing about how a'*a should work from how it currently works (or doesn't work) in Julia seems rather circular.

If you define a new scalar field type yourself, it’s just a strange burden that you need to define adjoint and transpose for it in addition to conj, before you can use it in a matrix

I think this burden is more aesthetical than practical. If you are okay with making it a subtype of Number you don't have to define anything and even if you believe that Number is not the right thing for you, the definitions are so trivial that adding them is hardly a burden.

So why, with such a powerful type system as in Julia, not just simply have a dedicated type for block matrices.

https://github.com/KristofferC/BlockArrays.jl/ Contributors welcome :)

My apologies for the "ban" joke. This will be my last reaction in order not to derail this discussion any further. It's clear that recursive adjoint (but maybe not transpose) is settled, and I am not trying to undo that decision. I am just trying to point out some inconsistencies.

@StefanKarpinski : The example of the scalar ring is exactly in contradiction with the recursion: are the 2x2 matrices themselves the scalars, or is the definition recursive and is the underlying Number type the scalar?

In the former case, you actually have a module instead of a vector space and it probably depends on the use case whether you want conj or adjoint on those 'scalars', if you would even be using inner products and adjoints at all.

I am fine with accepting the latter definition. I am using arbitrary elements in group-invariant subspaces of graded tensor products of super vector spaces in my job, so I am not really confined to columns of numbers in my definition of vectors. But vectors of matrices, are they just pure vectors or should they also inherit some matrix behavior. In the former case, dot(v,w) should produce a scalar, probably by recursively calling vecdot on its elements. In the latter case, at least vecdot(v,w) should produce a scalar by acting recursively. So that would be a minimal fix to be consistent with recursive adjoint.

But it does seem simpler not to have a recursive transpose and just allow it to be used for non-math applications as well, since I think even in typical matrix equations it has very little to do with the actual mathematical transpose of a linear map.

I think dot should call dot recursively, not vecdot, so it would error for a vector of matrices since we don't define a canonical inner product for matrices.

I’m always surprised that transpose is recursive .. and I can’t imagine that anyone is not. (I’m guessing that the views on “transpose of a linear map” Wikipedia article have at least tripled because of this discussion.)

I too have often found the recursive definitions unsettling, and have generally shared the view expressed above by @Jutho (but then again we have had a very similar training).

I think I've figured out the inconsistency that's been bothering me here - we keep using rings and fields such as the 2x2 matrix embedding of complex numbers as an example here, but it doesn't actually work out nor motivate recursive transpose (and adjoint).

Let me explain. The things I broadly agree with

  1. Scalars are valid rank-1 linear operators. Given Julia's powerful type system there is no reason why LinAlg concepts like adjoint wouldn't apply, e.g. we should define adjoint(z::Complex) = conj(z). (Beyond vectors and matrices of scalars, LinAlg concepts can also be extended (by users, I presume) to other objects - @stevengj mentioned e.g. infinite-sized vector spaces (Hilbert spaces)).
  2. We should be able to somehow deal with scalars with different representations. The prototypical example here is a complex number z = x + y*im can be modelled as Z = x*[1 0; 0 1] + y*[0 1; -1 0] and the operations +, -, *, / and \ are preserved in this isomorphism. (Note however that conj(z) goes to adjoint(Z) / transpose(Z) / flip(Z) - more on that later).
  3. Block matrices should be possible somehow (The current approach relies on recursive adjoint by default, etc).

It seems reasonable that Base.LinAlg be compatible with 1 and 2, but IMO 3 should only be done in Base if it fits in naturally (otherwise I would tend to defer to external packages like https://github.com/KristofferC/BlockArrays.jl).

I now realize we have been conflating 2 and 3 and this leads to some inconsistencies (@Jutho pointed this out too). Below, I wish to show that 3. using recursive adjoint and other operations as per block matrices does not give us 2. The easiest way is by example. Let's define a 2x2 matrix of complex numbers as m = [z1 z2; z3 z4], the isomorphic representation M = [Z1 Z2; Z3 Z4], and a standard 2x2 block matrix b = [m1 m2; m3 m4] where m1 etc are equally sized square matrices of Number. The semantically correct answers for common operations are listed below:

| operation | z | Z | m | M | b |
| - | - | - | - | - | - |
| +, -, * | recursive | recursive | recursive | recursive | recursive |
| conj | conj(z) | Z' or Z.' or flip(Z) | conj.(m) | adjoint.(M) (or transpose.(M)) | conj.(b) |
| adjoint | conj(z) | Z' or Z.' or flip(Z) | flip(conj.(m)) or recursive | flip(transpose.(m)) or recursive | recursive |
| trace | z | Z | z1 + z4 | Z1 + Z4 | trace(m1) + trace(m4) |
| det | z | Z | z1*z4 - z2*z3 | Z1*Z3 - Z2*Z3 | det(m1) * det(m4 - m2*inv(m1)*m3) (if m1 invertible, see e.g. Wikipedia) |

Considering operations like trace and det which return scalars, I think it's pretty clear that our Julia type system for LinAlg couldn't possibly handle our 2x2 matrix embedding of Complex in an "automatic" way, inferring what we happen to mean by "scalar" in any particular moment. A stark example is trace(Z) where Z = [1 0; 0 1] is 2, while this is our representation of 1. Similarly for rank(Z).

One way to recover consistency is to explictly define our 2x2 representation as scalar, e.g. by subtyping Number like below:

struct CNumber{T <: Real} <: Number
    m::Matrix{T}
end
CNumber(x::Real, y::Real) = CNumber([x y; -y x])

+(c1::CNumber, c2::CNumber) = CNumber(c1.m + c2.m)
-(c1::CNumber, c2::CNumber) = CNumber(c1.m - c2.m)
*(c1::CNumber, c2::CNumber) = CNumber(c1.m * c2.m)
/(c1::CNumber, c2::CNumber) = CNumber(c1.m / c2.m)
\(c1::CNumber, c2::CNumber) = CNumber(c1.m \ c2.m)
conj(c::CNumber) = CNumber(transpose(c.m))
zero(c::CNumber{T}) where {T} = CNumber(zero(T), zero(T))
one(c::CNumber{T}) where {T} = CNumber(one(T), one(T))

With these definition, generic LinAlg methods will probably work fine.

The conclusion I draw from this: recursive adjoint is a convenience for block matrices and vectors of vectors. It should not be motivated by mathematical correctness for the types of "scalar" 2x2 matrices I labelled Z above. I see it as our choice whether we support block matrices or not by default, with the following pros and cons:

Pros

  • Convenience for block array users
  • Vectors of vectors are valid vector spaces, and block matrices are valid linear operators, under +, *, conj, etc. If it is possible to make this a natural isomorphism (unlike the Z example above, which requires CNumber), then why not?

Cons

  • Significant additional complexity to our implementation of LinAlg (that could potentially live in another package).
  • It's a bit difficult (but not impossible) to support things like eig(block_matrix). If we say LinAlg supports eig and LinAlg supports block matrices, then I consider this a bug until it is fixed. Given the large amount of functionality provided by LinAlg it's hard to see that we'll ever be "finished".
  • Inconvenience of data users wanting to use operations like transpose in a non-recursive way,

To me the question is - do we choose to say that by default LinAlg expects the elements of AbstractArrays to be "scalar" (subtypes or duck-typings of Number) and farm out the complexity of block arrays to external packages? Or do we embrace the complexity in Base and LinAlg?

The plan until a day ago had been the latter.

@Sacha0 I've been trying to complete the necessary RowVector work to support the changes here (the earlier ones: non-recursive transpose, RowVector supporting strings and other data) and now I'm wondering what you had in mind in terms of design.

From the recent discussions, I see these cases with a guess of what might be desired

| | Vector | Matrix |
|-|-|-|
| adjoint | RowVector with recursive adjoint | AdjointMatrix or TransposedMatrix with recursive adjoint |
| transpose | RowVector with recursive transpose | TransposeMatrix with recursive transpose |
| flip | a copy or PermutedDimsArray? | a copy or PermutedDimsArray? |
| conj of AbstractArray | lazy or eager? | lazy or eager? |
| conj of RowVector or TransposedMatrix | lazy | lazy |

(The above attempts to separate linear algebra concerns from permuting dimensions of arrays of data.)

So a few basic questions to get me unstuck:

  • Are we doing recursive transpose or not? What about adjoint?
  • If so, will we continue to assume conj(transpose(array)) == adjoint(array)?
  • It seems at least some complex conjugations will be lazy, to support e.g. all the current BLAS operations with no extra copies. Do we make conj lazy for all arrays?
  • If we introduce flip, is it lazy or eager?

FYI I tried a low-friction approach to "flipping" the dimensions of a matrix at #24839, using a shorter form for permutedims.

I am strongly in favour of @Sacha0's proposal 2. We have not fully sorted out the theoretical underpinnings of recursive vs non-recursive adjoint, but independently from that: how it is now has turned out to be useful, at least to me, and a last minute change away from that is perhaps ill advised. Transpose should follow adjoint (=conj∘adjoint) in this regard, if it is at all needed.

FWIW, Mathematica does not do recursive Transpose nor ConjugateTranspose:

untitled

I tried reading the elaborate discussions above, but failed to see at what point it was decided/motivated that a mathematical transpose should be recursive. [...] For completeness/self-containedness, could that concept and why it should be recursive be briefly reiterated/summarised?

I understand and appreciate this sentiment, and would like to address it insofar as time allows. Accessibly and lucidly explaining why adjoint and transpose should be recursive by definition is unfortunately challenging at best and likely not possible in brief. Coherent, comprehensive writeups like that above take an enormous amount of time to construct. Time being short, over the course of the day I will instead try to address some of the confusion in the preceding few posts by responding to particular points/questions therein; please bear with me while I do so piecemeal :). Best!

explaining why adjoint... should be recursive by definition…

I'm pretty everyone who has contributed to this discussion agrees that adjoint(A) should be recursive. The only point of dispute is whether transpose(A) should also be recursive (and whether a new function flip(A) should be introduced for non-recurrsive transpose).

I'm pretty everyone who has contributed to this discussion agrees that adjoint(A) should be recursive.

See, e.g., https://github.com/JuliaLang/julia/issues/20978#issuecomment-347777577 and earlier comments :). Best!

The argument for recursive adjoint is pretty basic to the definitions. dot(x,y) should be an inner product, i.e. produce a scalar, and the definition of adjoint is that dot(A'*x, y) == dot(x, A*y). Recursion (for both dot and adjoint) follows from these two properties.

(The relationship to dot products is the whole reason why adjoint and transpose are important operations in linear algebra. It's why we have a name for them, and not for other operations like rotating matrices by 90 degrees (rot90 in Matlab).)

Note that one problem with using flip for non-recursive transpose is that both Matlab and Numpy use flip for what we call flipdim.

I think that nobody is really using transpose in the abstract mathematical sense, as an operation on linear maps, for the simple reason that the transpose of a matrix would be an object that does not act on vectors but on RowVectors, i.e. it would map RowVector to RowVector.

But it does seem simpler not to have a recursive transpose and just allow it to be used for non-math applications as well, since I think even in typical matrix equations it has very little to do with the actual mathematical transpose of a linear map.

transpose (which is never used in the mathematical sense but always in the flip matrix indices sense)

This will seem a bit roundabout, so bear with me :).

Julia's adjoint refers specifically to Hermitian adjoint. In general, for U and V complete normed vector spaces (Banach spaces) with respective dual spaces U* and V, and for linear map A : U -> V, then the adjoint of A, typically denoted A, is a linear map A* : V* -> U*. That is, in general the adjoint is a linear map between dual spaces, just as in general the transpose A^t is a linear map between dual spaces as mentioned above. So how does one reconcile these definitions with the familiar notion of Hermitian adjoint? :)

The answer lies in the additional structure possessed by the spaces one typically works in, namely complete inner product spaces (Hilbert spaces). An inner product induces a norm, so a complete inner product space (Hilbert) is a complete normed space (Banach), and thereby supports the concept of (Hermitian) adjoint. Here's the key: Having an inner product rather than merely a norm, one of the most beautiful theorems in linear algebra applies, namely the Riesz representation theorem. In a nutshell, Riesz representation theorem states that a Hilbert space is naturally isomorphic to its dual space. Consequently, when working with Hilbert spaces, we generally identify the spaces and their duals and drop the distinction. Making that identification is how you arrive at the familiar notion of Hermitian adjoint as A* : V -> U rather than A* : V* -> U*.

And the same identification is generally made for transpose when considering Hilbert spaces, such that also A^t : V -> U, yielding the familiar notion of transpose. So to clarify, yes, the common notion of transpose is the general notion of transpose applied to the most familiar (Hilbert) setting, just as the common notion of Hermitian adjoint is the general notion of adjoint applied to that setting. Best!

Apologies for beating a dead horse, but I thought I'd briefly summarize the issues of mathematical confusion in a different way. The key point of disagreement is how to mathematically interpret an object Vector{T} when T is not just a subtype of Number with no array-like substructure.

One school of thought accepts @stevengj's claim that

a vector of vectors over a ring R is best understood as a direct-sum space, which also a vector space over R.

Modulo certain subtleties regarding infinite-dimensional vector spaces, etc., this basically just means that we should think of vectors of vectors as block vectors. So a Vector{Vector{T<:Number}} should be mentally "flattened out" into a simple Vector{T}. Within this paradigm, linear operators that are represented as matrices of matrices should be similarly thought of as block matrices, and adjoint should certainly be recursive, as should transpose if we are using the word in the mathematical sense. Please correct me if I'm wrong, but I believe that the people who take this point of view think that something like Vector{Matrix{T}} has no natural-enough interpretation that we should design for it. (In particular, we should not simply assume that the inner Matrix is a matrix representation of a complex number, because as @stevengj said,

If you are representing complex numbers by 2x2 matrices, you have a different complexified vector space.

)

The other school of thought is that a Vector{T} should always be thought of a representation of a vector in an abstract vector space over scalars (in the linear algebra sense of the word) of type T, regardless of the type T. In this paradigm, a Vector{Vector{T'}} should not be thought of as an element of a direct sum space, but instead as a vector over the scalars Vector{T'}. In this case, transpose(Matrix{T}) should not be recursive, but should simply flip the outer matrix. One issue with this interpretation is that in order for the elements of type T to form a valid ring of scalars, there must be a well-defined notion of (commutative) addition and of multiplication. For a vector like Vector{Vector{T'}}, we would need a rule for multiplying two "scalars" Vector{T'} into another Vector{T'}. While one could certainly come up with such a rule (e.g. elementwise multipication, which punts the issue down to how to multiply T'), there is no universal natural way to do so. Another issue is with how adjoint would work under this interpretation. The Hermitian adjoint is only defined on linear operators over a Hilbert space, whose scalar field by definition must be either the real or complex numbers. So if we want to apply adjoint to a matrix Matrix{T}, we must assume that T is some representation of the complex numbers (or real numbers, but I'll just stick with complex because that case is more subtle). In this interpretation, adjoint should not be recursive but should flip the outer matrix and then apply conjugate. But there are more problems here, because the correct action of conjugate(T) depends on the nature of the representation. If it's the 2x2 representation described on Wikipedia, then conjugate should flip the matrix. But for reasons described above, conjugate definitely shouldn't always flip matrices. So this approach would be a bit of a mess to implement.

Here are my own thoughts: there is no "objectively correct" answer as to whether transpose should be recursive when applied to arrays whose elements have further complicated substructure. It depends on the how exactly the user is choosing to represent their abstract algebraic structure in Julia. Nevertheless, supporting completely arbitrary rings of scalars seems like it would be very difficult, so I think for practicality's sake we shouldn't try to be that ambitious and should forget about the esoteric math of modules over non-standard rings. We should certainly have a function in Base (with simpler syntax than permutedims(A, (2,1))) that has nothing to do with the linear algebra concept of transposition and simply flips matrices and does nothing recursive, regardless of whether it's called transpose or flip or what. It would be nice if adjoint and a separate transpose function (possibly with a different name) in LinAlg were recursive, because then they could handle block vectors/matrices and the simple implementation of the direct sum as vectors of vectors, but this isn't required by "objective mathematical correctness," and it would be fine to make that decision purely on the grounds of ease of implementation.

Despite my previous promise not to comment anymore, I unfortunately have to respond to this one.

Julia's adjoint refers specifically to Hermitian adjoint. In general, for U and V complete normed vector spaces (Banach spaces) with respective dual spaces U* and V, and for linear map A : U -> V, then the Hermitian adjoint of A, typically denoted A, is a linear map A* : V* -> U*. That is, in general the Hermitian adjoint is a linear map between dual spaces, just as in general the transpose A^t is a linear map between dual spaces as mentioned above. So how do you reconcile these definitions with the familiar notion of Hermitian adjoint? :)

Really that is not true. What you are describing here is really the transpose, but (as I already mentioned), in some fields this is referred to as the adjoint (without Hermitian) and denoted as A^t or A^* (never A^dagger). In fact, it extends well beyond vector spaces, and in category theory such a concept exists in any monoidal category (e.g. the cateogory Cob of n-dimensional oriented manifolds with cobordisms as linear maps), where it is referred to as adjoint mate (in fact there can be two different ones A and A, since left and right dual space are not necessarily the same). But note, this does never involve complex conjugation. The elements of V* are indeed the linear maps f:V->Scalar , and for a linear map A:U->V and a vector v from U, we have f(Av) = (A^t f)(v). Since the action of f does not involve complex conjugation, neither does the definition of A^t.

The answer lies in the additional structure possessed by the spaces you typically work in, namely complete inner product spaces (Hilbert spaces). An inner product induces a norm, so a complete inner product space (Hilbert) is a complete normed space (Banach), and thereby supports the concept of Hermitian adjoint. Here's the key: Having an inner product rather than merely a norm, one of the most beautiful theorems in linear algebra applies, namely the Riesz representation theorem. In a nutshell, Riesz representation theorem states that a Hilbert space is naturally isomorphic to its dual space. Consequently, when working with Hilbert spaces, we generally identify the spaces and their duals and drop the distinction. Making that identification is how you arrive at the familiar notion of Hermitian adjoint as A* : V -> U rather than A* : V* -> U*.

Again, I don't think that is fully correct. In inner product spaces, the inner product is a sesquilinear form dot from conj(V) x V -> Scalar (with conj(V) the conjugate vector space). This allows indeed to establish a map from V to V* (or technically from conj(V) to V*), which is indeed the Riesz representation theorem. However, we don't need that to introduce Hermitian adjoint. Really, the inner product dot is itself sufficient, and the Hermitian adjoint of a linear map A is such that
dot(w, Av) = dot(A' w, v). This involves complex conjugation.

Really that is not true. What you are describing here is really the transpose, but (as I already mentioned), in some fields this is referred to as the adjoint (without Hermitian) and denoted as A^t or A^* (never A^dagger). [...]

@Jutho, please see, for example, the Wikipedia page on Hermitian adjoint.

Maybe there are inconsistenties between different fields of mathematics, but:
https://en.wikipedia.org/wiki/Transpose_of_a_linear_map
and in particular
https://en.wikipedia.org/wiki/Transpose_of_a_linear_map#Relation_to_the_Hermitian_adjoint
and uncountably many references in category theory, e.g.
https://arxiv.org/pdf/0908.3347v1.pdf

https://en.wikipedia.org/wiki/Transpose_of_a_linear_map
and in particular
https://en.wikipedia.org/wiki/Transpose_of_a_linear_map#Relation_to_the_Hermitian_adjoint

@Jutho, I see no inconsistency between that page section and the definitions given on the page I linked above, nor do I see any inconsistency with what I posted above. Best!

I'll also sign on to @Sacha0 's proposal 2. I'm ok with a 1-argument permutedims for now as well; I think that's better than flip.

@Sacha0 , then we have a different way of interpreting that. I read this as
For a given A:U->V ,
transpose(A)=dual(A)=(sometimes also) adjoint(A): V* -> U*
hermitian adjoint(A)=dagger(A)=(typically just) adjoint(A): V->U
and the relation between both is exactly obtained by using the map from space to dual space (i.e. Riesz ...), which involves complex conjugation. Hence, hermitian adjoint involves conjugation, transpose does not.

If you also want to call the first one hermitian adjoint, then what do you call transpose? You didn't really define what that was in your description, you just mentioned

Hermitian adjoint of A, typically denoted A, is a linear map A : V* -> U*. That is, in general the Hermitian adjoint is a linear map between dual spaces, just as in general the transpose A^t is a linear map between dual spaces

So are transpose and Hermitian adjoint then two different ways of transforming A:U->V into a map from V->U ? I'd really be happy to discuss further about this, but I guess we better do this elsewhere. But really, contact me as I am actually very interested to learn more about this.

Also see http://staff.um.edu.mt/jmus1/banach.pdf for a reference that adjoint as used in the context of Banach spaces is really transpose, and not Hermitian adjoint (in particular its a linear and not an antilinear transformation). Wikipedia (and other references) are really conflating these two concepts, using the notion of Hermitian adjoint in Hilbert spaces as a motivation for a generalized definition of adjoint in Banach spaces. However, the latter is really transpose (and does not need an inner product, nor a norm). But that's the transpose that I was talking about, that nobody is really using in computer code.

For Julia Base: I am not opposed to recursive Hermitian conjugation; I agree it will often be the correct thing to do. I am just not sure Base should try to do clever things when the element type is not Number. Even with T is number, there is no support in Base for the much more common usage of non-Euclidean inner products (and associated modified definitions of adjoint), nor do I think there should be. So I think the main motivation was block matrices, but there I just think a special purpose type (in Base or in a package) is much more Julian, and, as @andyferris also mentioned, it's not like all the rest of LinAlg supports that notion of block matrices, even simple things like inv (let alone matrix factorizations etc).

But if recursive Hermitian conjugation is here to stay (fine by me), then I think for consistency dot and vecdot should act recursively on the elements. Currently, this is not the case: dot calls x'y on the elements (which is not the same when the elements are matrices) and vecdot calls dot on the elements. So for a vector of matrices, there is actually no way to get a scalar result. I'd be happy to prepare a PR if people agree that the current implementation is not really inconsistent with recursive adjoint.

As for transpose, it just seems simpler to make it non-recursive and also let it be used by those that do not work with numerical data. I think most people working with matrices know the term transpose and will look for it. There is still conj(adjoint()) for those that need a recursive transpose.

Triage thinks @Sacha0 should move ahead with proposal 2 so we can try it out.

I strongly agree with @ttparker that recursive adjoint is a choice, and not the only mathematically consistent option. For example, we could simply state:

1 - to LinAlg, an AbstractVector v is a length(v)-dimensional vector with (scalar) basis weights v[1], v[2], ..., v[length(v)].

(and similarly for AbstractMatrix).

This would probably be the assumption many people would make coming from other libraries, and having such simple definition of basis vectors, rank, etc, helps keep implementation simple. (Many would probably say that numerical linear algebra is feasible to implement on a computer precisely because we have a nice basis set to work in.)

Our current approach is more like:

2 - to LinAlg, an AbstractVector v is a direct sum of length(v) abstract vectors. We also include enough definitions on scalar types like Number so that to LinAlg they are valid one-dimensional linear operators / vectors.

and similarly for (block) matrices. This is wildly more generalized than the linear algebra implementations in MATLAB, numpy, eigen, etc, and it's a reflection of Julia's powerful type/dispatch system that this is even feasible.

The overarching reason I see option 2 as being desirable is again that Julia's type/dispatch system allows us to have a much broader goal, which vaguely goes like:

3 - In LinAlg, we are attempting to create a generic linear algebra interface that works for objects which satisfy linearity (etc) under +, *, conj (etc), treating such objects as linear operators / members of a Hilbert space / whatever as appropriate.

Which is a really cool goal (surely much beyond any other programming language/library that I'm aware of), completely motivates recursive adjoint and 2 (because +, * and conj are themselves recursive) and is why @Sacha0's proposal 2 and the triage decision is a good choice :)

I'd really be happy to discuss further about this, but I guess we better do this elsewhere. But really, contact me as I am actually very interested to learn more about this.

Cheers, let's do! I look forward to chatting further offline :). Best!

Nice summary Andy! :)

Fully agreed Andy, at least for adjoint (which was the topic of your comment).

However, one final plea for a non-recursive transpose, before I forever hold my peace (hopefully).
I see a non-recursive transpose having the following advantages:

  • It can be used by all people working with matrices, even containing non-numerical data. That's also how they will likely know this operation and look for it, from other languages and from basic math extrapolated to their non-numerical use case
  • No need to write extra code to have the lazy flip type or PermutedDimsArray interact with LinAlg. What if I have a numerical matrix which I flipped instead of transposed; will I still be able to multiply it with other matrices (preferably using BLAS)?

  • With a non-recursive transpose and a recursive adjoint, we can easily have a non-recursive adjoint as conj(transpose(a)) and a recursive transpose conj(adjoint(a)). And still everything will interact nicely with LinAlg.

So what are the disadvantages. I see none. I still stand by my point that really nobody is using transpose in its mathematical sense. But instead of trying to argue any further, can anybody give me an concrete example where a recursive transpose is needed or useful, and where really it is a mathematical tanspose? This excludes any example where you actually intended to use adjoint but are just having real numbers. So an application where there is a mathematical reason to transpose a vector or matrix filled with more matrices which are itself complex valued.

I can say that at least Mathematica (which one would expect to have devoted considerable thought to this) does not do recursive transpose:

A = Array[1 &, {2, 3, 4, 5}];
Dimensions[A]  # returns {2, 3, 4, 5}
Dimensions[Transpose[A]] # returns {3, 2, 4, 5}

EDIT: Ooops, this was also commented above, sorry

So I'm confused. There seemed to be pretty solid consensus that transpose should be made non-recursive - e.g. https://github.com/JuliaLang/julia/issues/20978#issuecomment-285865225, https://github.com/JuliaLang/julia/issues/20978#issuecomment-285942526, https://github.com/JuliaLang/julia/issues/20978#issuecomment-285993057, https://github.com/JuliaLang/julia/issues/20978#issuecomment-348464449, and https://github.com/JuliaLang/julia/pull/23424. Then @Sacha0 gave two proposals, one of which would leave transpose recursive but introduce a non-recursive flip function, which got strong support despite (as far as I can tell) not really having been raised as a possibility before. Then @JeffBezanson suggested that we don't need flip after all if we give permutedims a default second argument, which also got strong support.

So now the consensus seems to be that the only real changes to transpose should be "behind the scenes": regarding special lowering and lazy vs. eager evaluation, which the typical end user probably won't know or care about. The only really visible changes are essentially just spelling changes (depreciating .' and giving permutedims a default second argument).

So the community consensus seems to have changed almost 180 degrees in a very short time (around the time of @Sacha0's post https://github.com/JuliaLang/julia/issues/20978#issuecomment-347360279). Was Sacha's post so eloquent that it just changed everyone's mind? (That's fine if so, I just want to understand why we seem to moving forward on a path that just a few days ago we'd all seemed to agree was the wrong one.)

I forget if anyone's suggested this, but could we just make transpose(::AbstractMatrix{AbstractMatrix}) (and possibly transpose(::AbstractMatrix{AbstractVector}) as well) recursive and transpose non-recursive otherwise? That seems like it would cover all the bases and I can't think of any other use case where you'd want tranpose to be recursive.

So the community consensus seems to have changed almost 180 degrees in a very short time (around the time of @Sacha0's post #20978 (comment)). Was Sacha's post so eloquent that it just changed everyone's mind? (That's fine if so, I just want to understand why we seem to moving forward on a path that just a few days ago we'd all seemed to agree was the wrong one.)

If only I were so eloquent 😄. What you are seeing is that consensus had not actually formed. Rather, (1) participants that favor the status quo, but had withdrawn from discussion due to attrition, returned to express an opinion; and (2) other parties that had not considered what moving away from the status quo would entail in practice (and how that might play with release considerations) formed a stronger opinion in favor of the status quo and expressed that opinion.

Please consider that this discussion has been ongoing in one form or another on github since 2014, and likely earlier offline. For long-term participants, such discussions become exhausting and cyclic. There being meaningful work to do other than engage in this discussion --- like writing code, which is more enjoyable --- the result is attrition among those long-term participants. Consequently, the conversation appears lopsided during one period or another. Personally, I am about at that attrition threshold, so I am going to focus on writing code now rather than continue engaging in this discussion. Thanks all and best! :)

I'll cast a small vote in favor of non-recursive transpose and ctranspose for AbstractArrays, with both being recursive on AbstractArray{T} where T<:AbstractArray.

I agree that recursive behavior is 'correct' in some cases, and I see the question as how do we achieve the correct behavior with the least amount of surprise for those using and developing packages.
In this proprosal, recursive transpose behavior for custom types is opt-in: you opt-in by making your type an AbstractArray or by defining the appropriate method
Base.transpose(AbstractArray{MyType}) or Base.transpose(AbstractArray{T}) where T<: MyAbstractType.
I think the duck typing strategy for recursive transposes (just recursing without asking) produces a few surprises as documented above. If you introduce distinct ctranspose and adjoint, or more complicated proposals like conjadjoint and flip, users will encounter these and try to use them, and package maintainers will try to support them all.

As an example of something that would be hard to support under the new proposals: normal, transpose, ctranspose, and conj arrays all should be able to have views (or lazy evaluation) which interop with ReshapedArray and SubArray views. (I'm agnostic about whether these produce views by default or only when using @view. ) This connects to the work on the A*_mul_B* lowering and on lower level BLAS calls with flags 'N', 'T', and 'C' for dense arrays, as has been noted elsewhere. These would be easier to reason about if they treated normal, transpose, ctranspose, and conj
on equal footing. Note that BLAS itself only supports 'N' for normal, 'T' for transpose, and 'C' for ctranspose and has no flag for conj, which I think is a mistake.

Finally, for consistency with higher dimensional Arrays and reshapes, I believe the appropriate generalization of transpose and ctranspose is to reverse all of the dimensions, i.e.
transpose(A::Array{T, 3}) = permutedims(A, (3, 2, 1)).

Cheers!

I very much appreciate the people actually doing the work. What has been discussed at way too much length is vector adjoints/transpose (but never the recursive aspect of it), until @andyferris stepped up and implemented this, and it works wonderfully well. Similarly, I also greatly appreciate the ongoing redesign of array constructors. Thumbs up for all of that.

That being said, matrix transpose and adjoint/ctranspose never got much discussion, especially not the recursive aspect of it, which was almost silently introduced in https://github.com/JuliaLang/julia/pull/7244 with as single motivation block matrices. Various reasons and motivations for recursive adjoint have been given (after the facts), and most people can agree on that being a good (but not the only) choice. Transpose however is lacking even a single motivation or actual use case.

There's a few separate things going on in these discussions, and it happens right now that we need a plan that can be implemented quickly.

  • We've discussed whether supporting block matrices (and more exotic structures) is worthwhile in LinAlg. Implementation choices are: no recursive stuff at all (except +, * and conj because that's the nature of generic functions in Julia), recursive everything (the status quo), or trying for some kind of type-check or trait for whether an element should do recursive linear algebra or be treated as scalar.
  • We want a nice way for users to permute the dimensions of a 2D array of data. We've got non-recursive transpose, flip, shortened permutedims syntax (that PR was submitted first purely because it is the fewest characters to implement, and probably makes sense even if we do something else as well), some kind of type-check or trait for whether an element should do recursive transpose (perhaps even reintroducing transpose(x::Any) = x...).
  • The Julia parser has strange behavior like x' * y -> Ac_mul_B(x, y) which is a bit of a wart, which ideally won't exist in v1.0. This hasn't been seen as feasible until we can support fast BLAS (no extra copies) without it, thus lazy matrix transpose & adjoint.
  • The code in LinAlg is quite large and built up over a number of years. Many things like matrix multiplication could be refactored to be more trait-friendly, perhaps with a dispatch system more like the new broadcast. I think this is where we can make it easier to send the right arrays (I'm thinking PermuteDimsArray of conjugated reshaped ajdointed views of strided matrices) to BLAS. However, this won't make v1.0 and we are also trying to avoid performance regressions without making the code much worse. As Sacha pointed out (and I'm discovering) having transpose views with a wide range of behaviors on the elements (recursive adjoint, recursive transpose, conjugation, nothing) makes for additional complexity and a bunch of new methods to keep things working as they are.

If we think about v1.0 as somewhat stabilizing the language, then in some senses the biggest priority to make a change in behavior is the third one. I'd say: the language (including parser) should be most stable, followed by Base, followed by the stdlib (which may or may not include LinAlg, but I think almost definitely will include BLAS, Sparse, etc one day). It's a change that doesn't really affect users (mostly library developers), so I wouldn't be surprised if people's opinions differ here.

Spot on Andy! :)

I think the only thing left to do here is make adjoint and transpose lazy by default?

Can this be closed now?

Next up: "Taking scalar transposes seriously"

But seriously though, can we have a good interface for specifying the different 3D transposes and tensor multiplications which are used in PDE solvers? Kind of serious, but I'm not sure if I could handle being the OP to the next iteration of this madness.

no

:)

can we have a good interface for specifying the different 3D transposes and tensor multiplications which are used in PDE solvers

Definitely seems like a good subject for a package.

a good interface for specifying the different 3D transposes and tensor multiplications

Does TensorOperations.jl not do what you need here? (Note that at this level "a good interface" means something like a tensor network diagram, which is slightly challenging to write in code any more succinctly than the syntax of TensorOperations).

Yeah, TensorOperations.jl looks good. I was slightly kidding, but I got what I needed out of it 👍 .

Was this page helpful?
0 / 5 - 0 ratings

Related issues

musm picture musm  ·  3Comments

felixrehren picture felixrehren  ·  3Comments

iamed2 picture iamed2  ·  3Comments

StefanKarpinski picture StefanKarpinski  ·  3Comments

i-apellaniz picture i-apellaniz  ·  3Comments