Julia: implement Ac_mul_B(Vector, Matrix) to avoid temporary

Created on 16 Aug 2016 · 35 comments · Source: JuliaLang/julia

Let mx be a 3x3 matrix, and vv be an nx3 matrix.

Up to now v=vv[row,:] was a row vector, i.e. a 1x3 matrix,
so the matrix multiplication v*mx worked, and provided a 1x3 row vector.

In Julia 0.5.0 this has changed:
rows such as v=vv[row,:] are now true 1-dimensional vectors.

This is a welcome change.
But as a side effect, it has become more obvious and painful
that v*mx does not work for true 1-dimensional vectors.

I know that transpose(mx)*v is a workaround,
but I think this code rewrite is an unnecessary requirement.

Considering the original definition of matrix multiplication,
summing inner indices should naturally work in this case.
A good example is Fortran's matmul(v,mx): it just works fine.
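To illustrate the situation concretely (a minimal sketch; the sizes are arbitrary):

```julia
using LinearAlgebra   # transpose for matrices (in 0.5 this lived in Base)

mx = rand(3, 3)   # 3x3 matrix
vv = rand(5, 3)   # 5x3 matrix
v  = vv[1, :]     # with 0.5 slicing, a true 1-dimensional Vector of length 3

# v * mx              # errors: no method matching *(::Vector, ::Matrix)
y = transpose(mx) * v # the workaround: contracts v with the first index of mx
```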

I would greatly appreciate your opinion,
and any pointers to your future plans.

Labels: help wanted, linear algebra


All 35 comments

What about either doing

v'mx

or

v=vv[row:row,:]

?

Yes, these are all working tricks,
but conceptually they are not clean.

I think Julia is still struggling with some Matlab-like behaviour.
Having true 1-dimensional vectors is very important,
and with 0.5.0 Julia made a big step in this direction.
It would be nice for it to reach a really consistent state of its own.

How does Fortran handle complex conjugation in matmul(Vector,Matrix)? Is x*A the same as A'x or conj(A'x)?

Personally I don't see v'mx as a trick; it seems like the sensible way to go about this. If you think of two vectors in some space, the proper way to multiply them would be to multiply one by the transpose of the other, as vectors from a purely mathematical standpoint don't necessarily have an "orientation" (i.e. row or column).

I think Julia is still struggling with some Matlab-like behaviour.

Actually, what we had prior to APL-style slicing was more similar to Matlab, which makes a distinction between column and row vectors.

@andreasnoack

As far as I see, there is no implicit complex conjugation involved.

This is what the Metcalf-book says:
matmul(matrix_a,matrix_b) performs matrix multiplication.
For numeric arguments three cases are possible:

i) matrix_a has shape (n,m) and matrix_b has shape (m,k).
The result has shape (n,k) and element (i,j) has the value
sum( matrix_a(i,:) * matrix_b(:,j) )

ii) matrix_a has shape (m) and matrix_b has shape (m,k).
The result has shape (k) and element (j) has the value
sum( matrix_a * matrix_b(:,j) )

iii) matrix_a has shape (n,m) and matrix_b has shape (m).
The result has shape (n) and element (i) has the value
sum( matrix_a(i,:) * matrix_b )
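Case (ii) is the operation under discussion. As a sketch, Fortran's semantics can be mirrored in Julia with a plain loop (fortran_matmul is a hypothetical name, not an actual function):

```julia
# contract the vector with the matrix's first index, no implicit conjugation,
# exactly as Fortran's matmul(vector, matrix) does
fortran_matmul(v, mx) =
    [sum(v[i] * mx[i, j] for i in 1:length(v)) for j in 1:size(mx, 2)]
```

For real inputs this agrees with transpose(mx) * v.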

Hm. Thanks. That was a surprising choice. In any case, I think this is very much #4774 territory.

Also note, that in Fortran transpose is defined only for an array of rank two.
In _Fortran-speak_ creating a one-column matrix from a 1-dimensional vector is not a transpose.

@ararslan
I agree that from the current options v'mx seems to be the best.
But is there a performance penalty?

Performance penalty compared to what?

Compared to storing predefined mxT=mx' and using mxT*v,
that also returns a nice 1-dimensional vector.

I wouldn't think so offhand (though I'm certainly no expert), but you could check with @time. :)

No performance penalty, no.

I have just tested.
There is a significant performance penalty.

n = Int(10e6)
vv = rand(n,3)
mx = rand(3,3)
mxT = mx'

f1(vv,mx) = (@time for i=1:n; y=vv[i,:]'*mx; end)
f2(vv,mxT) = (@time for i=1:n; y=mxT*vv[i,:]; end)

f1(vv,mx)
  4.843726 seconds (120.00 M allocations: 5.215 GB, 8.36% gc time)
  4.836846 seconds (120.00 M allocations: 5.215 GB, 8.33% gc time)

f2(vv,mxT)
  2.714492 seconds (80.00 M allocations: 3.278 GB, 9.10% gc time)
  2.709794 seconds (80.00 M allocations: 3.278 GB, 9.07% gc time)

Just checked and v'A actually calls the fallback in operators.jl so it allocates an unnecessary temporary. That could easily be avoided though.
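For illustration only, one way such a specialization could avoid the temporary (ac_mul_b is a hypothetical stand-in for an Ac_mul_B method, not the actual patch):

```julia
using LinearAlgebra   # transpose/adjoint for matrices (in 0.5 this lived in Base)

# compute v' * A in one pass, without materializing the transposed row vector;
# conj handles the complex case and is effectively a no-op for real inputs
ac_mul_b(v::AbstractVector, A::AbstractMatrix) =
    reshape(transpose(A) * conj(v), 1, size(A, 2))
```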

I have made a mistake.
Of course n should also be passed as a function argument.

f1(n,vv,mx) = (@time for i=1:n; y=vv[i,:]'*mx; end)
f2(n,vv,mxT) = (@time for i=1:n; y=mxT*vv[i,:]; end)

It brings a speedup, but the performance penalty becomes even worse:

f1(n,vv,mx)
  3.529541 seconds (100.00 M allocations: 4.768 GB, 11.86% gc time)
  3.434869 seconds (100.00 M allocations: 4.768 GB, 8.80% gc time)

f2(n,vv,mxT)
  1.490498 seconds (60.00 M allocations: 2.831 GB, 11.13% gc time)
  1.507337 seconds (60.00 M allocations: 2.831 GB, 10.67% gc time)

Dunno if it'll make a difference but you should try moving @time outside of the functions and using @inbounds on the loops.

@ararslan
I have done it. The results are basically the same.

@andreasnoack
Does it mean that you can cure the performance penalty while keeping the current syntax?

If by y=vv[i,:]'*mx you mean the current syntax, then the answer is yes.

Yes.
At present your suggestion seems to be the best,
and if there is no performance penalty involved, then I would happily use it.

Hm. Maybe I spoke too soon. It seems like OpenBLAS' GEMM is slower than GEMV

julia> A = randn(3,3);

julia> x = randn(3);

julia> xt = x';

julia> let A = A, x = x
       @time for i in 1:10^6; BLAS.gemv('T', 1.0, A, x); end
       end
  0.113836 seconds (2.00 M allocations: 122.070 MB, 8.78% gc time)

julia> let A = A, xt = xt
       @time for i in 1:10^6; BLAS.gemm('N', 'N', 1.0, xt, A); end
       end
  0.188009 seconds (2.00 M allocations: 137.329 MB, 7.56% gc time)

which I don't think is reasonable.

Even so, your test code is instructive for me.

I've updated the title. I think the remaining aspects of this discussion are fully covered by #4774.

@andreasnoack @StefanKarpinski
My original issue (and title) was about making v*mx work
when v is a real 1-dimensional vector and not a 1xn matrix,
and returning the result as a real 1-dimensional vector.

Is this your present target?

Or is it just making v'mx work efficiently (without creating a temporary)
but still returning the result as a 1xn matrix?

It seems that this simple code does what I am missing.
It is fast and it returns a real 1-dimensional vector.

import Base.*
*(v::Vector{Float64},mx::Matrix{Float64}) = BLAS.gemv('T',mx,v)

v = rand(3)
mx = rand(3,3)

v*mx
3-element Array{Float64,1}:
 0.786774
 0.821454
 0.444747

Update:
Unfortunately, this is not the general solution.
Integer vectors and matrices are not handled by BLAS.gemv.
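One way around that limitation could be a generic fallback alongside the BLAS fast path (a sketch; vecmat is a hypothetical name, and the two-method dispatch is only an illustration):

```julia
using LinearAlgebra   # BLAS bindings (in 0.5, BLAS lived in Base)

# hypothetical helper: BLAS fast path for Float64, generic loop for
# everything else (Integer, Rational, ...), at the cost of speed
vecmat(v::Vector{Float64}, mx::Matrix{Float64}) = BLAS.gemv('T', mx, v)
vecmat(v::AbstractVector, mx::AbstractMatrix) =
    [sum(v[i] * mx[i, j] for i in 1:length(v)) for j in 1:size(mx, 2)]
```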

Or is it just making v'mx work efficiently (without creating a temporary) but still returning the result as a 1xn matrix?

Yes. The idea was that this issue tracks the problem with the temporary. For your other suggestion, please refer to #4774.

I don't understand this issue. v*mx is not standard mathematical notation, right? Why not just write it the way you suggest implementing it: BLAS.gemv('T',mx,v) is obtained by writing mx.'*v, or just mx'*v if you're working over the reals.

I respectfully disagree.

v*mx is standard mathematical notation that contracts
the last (and only) index of v and the first index of mx.

Otherwise we should also leave mx*v undefined
in the case when v is a true 1-d vector.

I don't think there is one standard mathematical definition for *; it's highly context dependent. But you are right to suggest other definitions than the one currently adopted in Julia, which I agree is indeed a bit hard to capture in one "mathematical" operation.

Can I assume that your suggestion generalises to using it in a kind of multilinear algebra context as some default tensor contractions operator: contract the last index of the first object with the first index of the second, irrespective of the rank of the two tensors involved?

I did not dare to go as far as multilinear algebra.
I have just suggested to clearly separate the three types of products in linear algebra.
See #4774.

What are the three types of product in linear algebra? I still don't see where vector times matrix is some standard mathematical product in any given context.

Either one works in MATLAB style, where everything is a matrix and * always denotes matrix multiplication without discussion, or one works in some more abstract setting and indeed needs to define what * means. In abstract linear algebra, I know of inner and outer products, but most of the time * in code is neither of those two.

In an abstract setting, matrices correspond to linear mappings from vectors to vectors. So in mx*v I read * as function application. Composing such linear mappings gives rise to matrix-matrix multiplication. So in m1*m2 I read * as function composition ∘. Given that these two different operations are rather close, I think it is ok to use the same symbol for both.

There is also the concept of linear forms / covectors, which map vectors to numbers. These can be represented as one-dimensional objects and are isomorphic to vectors, namely via the inner product: to any linear form f there corresponds a vector w such that f(v) = dot(w,v) for any v. You can also compose such a linear form with a linear mapping (represented by a matrix), which I think is the operation you are referring to. But I don't see why this would be a standard use of * in mathematics. Such linear forms are isomorphic to vectors, but not identical, and thus Julia currently forces you to act with the transpose on a vector in order to get the corresponding covector (well, not exactly, but that is indeed the point of issue #4774).
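For instance (a minimal sketch of the covector view):

```julia
using LinearAlgebra   # dot and adjoint (in 0.5 these lived in Base)

w = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]
f = w'                      # the linear form (covector) corresponding to w
@assert f * v == dot(w, v)  # applying the form is the inner product with w
```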

I am not defending that this choice is ideal or optimal, but I also don't see how v*mx would in any way be more standard. If anything, I would argue that you actually need the dot product for this operation, so that it should be written as dot(v,mx). Maybe that's actually a possibility which is open, since currently dot is not defined for a combination of a vector and a matrix argument.

My apologies, wrong button.

I find all that you wrote instructive.
Maybe it should be copied to #4774, keeping the present issue to track
just the problem with the temporary, as suggested by @andreasnoack.

Yes indeed. My apologies for leading the original discussion astray. I'll make a similar post there, and I agree that, for now, v'mx should work efficiently and produce a 1xn matrix.

