Two related questions:

```julia
dot(x, M, y) = dot(x, M * y)
```

which I don't want to hard-code; I'd prefer the ability to overload it. Is this functionality sufficiently general (I think it is) that it should be in Base by default for the standard arrays?

EDIT: the second question is deleted; it needs more thinking.
I think this would be useful to have in Base and I've thought about adding it several times. One of the benefits is that you can completely avoid a temporary `M*y`.
Yes - that was exactly the origin of this - I should have said so, sorry.
Would be nice to get some benefit from BLAS here, though it's only a BLAS2 operation. @xianyi, do you have a suggestion on the most efficient way to use BLAS for `x'*M*y`?
One option would be to break `x'*M` into chunks of, say, 16 elements (big enough to exceed a cache line but small enough that the memory allocation of a length-16 temporary vector is not an issue), compute `x'*M` on that chunk via BLAS2 using a subset of the columns of `M` (since Julia is column-major, this doesn't require any copying), then accumulate that chunk of the dot product with `y`. However, this is suboptimal because it doesn't fully exploit multi-threading: each of the chunk dot products could ideally be computed in a parallel thread.
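To make that concrete, here is a rough sketch of the chunked approach, assuming real `Float64` data; `dot3_chunked` and the chunk width of 16 are illustrative, not an existing API:

```julia
using LinearAlgebra

# Sketch: compute x'*M sixteen columns at a time with a transposed gemv
# (BLAS2) into a small reusable temporary, then dot each chunk against the
# matching slice of y. Views avoid copying any columns of M.
function dot3_chunked(x::Vector{Float64}, M::Matrix{Float64}, y::Vector{Float64})
    m, n = size(M)
    (length(x) == m && length(y) == n) || throw(DimensionMismatch())
    tmp = Vector{Float64}(undef, 16)   # small fixed temporary, reused per chunk
    s = 0.0
    j = 1
    while j <= n
        w = min(16, n - j + 1)
        # tmp[1:w] = M[:, j:j+w-1]' * x
        BLAS.gemv!('T', 1.0, view(M, :, j:j+w-1), x, 0.0, view(tmp, 1:w))
        s += dot(view(tmp, 1:w), view(y, j:j+w-1))
        j += w
    end
    return s
end
```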
Ideally you'd also want a specialized method for the common case where `M` is a diagonal matrix, especially a diagonal real matrix. And tridiagonal "mass" matrices are also common, as well as general sparse matrices `M`, in problems arising from discretized PDEs.
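The diagonal case in particular reduces to a single O(n) loop with no temporary at all; a minimal sketch (`dot3` is a hypothetical name, not an existing method):

```julia
using LinearAlgebra

# Hypothetical specialization for Diagonal M: x'*M*y in one pass, no temporary.
function dot3(x::AbstractVector, D::Diagonal, y::AbstractVector)
    length(x) == length(y) == size(D, 1) || throw(DimensionMismatch())
    s = zero(promote_type(eltype(x), eltype(D), eltype(y)))
    for i in eachindex(D.diag)
        s += conj(x[i]) * D.diag[i] * y[i]
    end
    return s
end
```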
I was mostly thinking of sparse cases for my own use. These should be relatively easy to implement in Julia?
For the matvec, it seems that multithreading is almost the only speedup you'll get from BLAS. For a 500x500 problem and a Julia double loop with `@inbounds` and `@simd` I get
```julia
julia> blas_set_num_threads(1);

julia> @time for i = 1:10000; Ac_mul_B!(y, A, x); end  # OpenBLAS, 1 thread
  0.484342 seconds

julia> blas_set_num_threads(4);

julia> @time for i = 1:10000; Ac_mul_B!(y, A, x); end  # OpenBLAS, 4 threads
  0.184681 seconds

julia> @time for i = 1:10000; mymul(y, A, x); end      # pure Julia
  0.509110 seconds
```
so if/when the `@threads` overhead is gone, a pure Julia version should probably be okay.
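Presumably `mymul` is something like the following sketch, assuming it computes `y = A'x` (the same operation as `Ac_mul_B!`):

```julia
# Presumed shape of the benchmarked kernel: y = A'x with a column-major
# inner loop so @simd can vectorize the reduction.
function mymul(y::Vector{Float64}, A::Matrix{Float64}, x::Vector{Float64})
    m, n = size(A)
    @inbounds for j = 1:n
        s = 0.0
        @simd for i = 1:m
            s += A[i,j] * x[i]
        end
        y[j] = s
    end
    return y
end
```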
I'm surprised there isn't some SIMD speedup. Probably also worthwhile to special-case `x === y` and `issym(A)`.
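That symmetric, `x === y` case roughly halves the memory traffic by visiting only one triangle; a sketch for real matrices (`quadform_sym` is a hypothetical name):

```julia
# Hypothetical x'*A*x for real symmetric A: visit only the upper triangle,
# doubling the off-diagonal contributions.
function quadform_sym(A::Matrix{Float64}, x::Vector{Float64})
    n = size(A, 1)
    length(x) == n || throw(DimensionMismatch())
    s = 0.0
    @inbounds for j = 1:n
        t = 0.0
        @simd for i = 1:j-1
            t += A[i,j] * x[i]
        end
        s += (2t + A[j,j] * x[j]) * x[j]
    end
    return s
end
```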
@stevengj I don't understand. Both OpenBLAS and Julia use SIMD, so where would you have expected to see a speedup?
In OpenBLAS the SIMD is hard-coded (I thought), whereas in Julia we rely on the compiler to do its magic, which doesn't always work as well. Maybe this is one of the cases where the compiler does a good job of exploiting SIMD.
In my experiments with mat-vec and mat-mat, Julia has done a good job exploiting SIMD. For mat-vec my implementation was asymptotically as fast as OpenBLAS, although they use a better algorithm for mat-mat, so theirs was faster in that case (about 3x IIRC).
To follow up on this: when everything is sparse we can also do much better by interleaving the computations. E.g., in a recent project I did:
```julia
function mul_ut_X_v(u::SparseVector, X::SparseMatrixCSC, v::SparseVector)
    u.n == X.m && X.n == v.n || throw(DimensionMismatch())
    z = zero(promote_type(eltype(u), eltype(X), eltype(v)))
    # For each stored entry of v, take a sparse dot product of u with the
    # corresponding column of X, scaled by that entry's value.
    for (vi, vv) in zip(v.nzind, v.nzval)
        X_ptr_lo = X.colptr[vi]
        X_ptr_hi = X.colptr[vi+1] - 1
        if X_ptr_lo <= X_ptr_hi
            z += Base.SparseArrays._spdot(dot, 1, length(u.nzind), u.nzind, u.nzval,
                                          X_ptr_lo, X_ptr_hi, X.rowval, X.nzval) * vv
        end
    end
    z
end
```
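A hypothetical call, just to show the shapes involved (sizes and densities are arbitrary, assuming the same Julia version as the snippet above):

```julia
u = sprand(1000, 0.01)           # SparseVector
X = sprand(1000, 1000, 0.001)    # SparseMatrixCSC
v = sprand(1000, 0.01)           # SparseVector
mul_ut_X_v(u, X, v)              # the scalar u'*X*v, without ever forming X*v
```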
Once we have lazy transposes, we could overload the ternary multiplication operator `u' * X * v` for this. Until then, I would be fine with overloading `dot`, or adding a `bilinear` function.
As pointed out by @andyferris, we can now overload ternary `*` for this, e.g.
```julia
function Base.:*(u::RowVector{T}, X::Matrix{T}, v::Vector{T}) where {T}
    m, n = size(X)
    length(u) == m && length(v) == n || throw(DimensionMismatch())
    z = zero(T)
    for j = 1:n
        zz = zero(T)
        @simd for i = 1:m
            @inbounds zz += u[i] * X[i,j]
        end
        z += zz * v[j]
    end
    z
end
```
This gives a 2x speedup on my machine.
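For context, with that method defined, an expression like the following should dispatch straight to it (assuming the `RowVector` era, where `x'` produces a `RowVector`):

```julia
x = rand(500); A = rand(500, 500); y = rand(500)
x' * A * y   # parses as *(x', A, y), so no A*y temporary is formed
```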
Credit to @ViralBShah also, I think? (Note we need to make the mechanics of `Ac_mul_B` a bit different (nuke it from orbit) to make this work.)
I still think it would be nice to have `dot(x, A, y)`, even if ternary `*` calls this.
@andyferris Why would we need to change `Ac_mul_B`?
I just mean the lowering. You want it to come out as `*(adjoint(x), A, y)` rather than `Ac_mul_B(x, A) * y`.
Overloading the ternary `*` seems to do that already.
Yeah, as long as you use an explicit `*`. It's fiddly, though:
```julia
julia> expand(Main, :(x'*A*y))
:(adjoint(x) * A * y)

julia> expand(Main, :(x'A*y))
:(Ac_mul_B(x, A) * y)

julia> expand(Main, :(x'*A))
:(Ac_mul_B(x, A))

julia> expand(Main, :(x'A))
:(Ac_mul_B(x, A))
```
This would be fixed if we settle on a definition of (and implement) a matrix transpose, because then all the `Ax_mul_Bx`s can go away.
Have there been any regrets about implementing a vector transpose?
I think that people are generally happy with the vector transpose. There have been a few comments where people have been surprised/confused/annoyed but I think that is to be expected given the length of #4774.
Yes, I have to say the experience with RowVectors, which behave as row matrices except for linear-algebraic operations, has been lovely. They generally just do what you want.
The generalized inner product would be nice to have without creating a temporary.
What is the current thinking on bringing this to Julia? This type of quadratic form `x'Qy` (often with `x = y` and `Q` symmetric) is ubiquitous in mathematical optimization (I'm surprised there is no BLAS function for it). I see there is some implementation in PDMats.jl for the case where `Q` is PSD (which is restrictive). Brief related discussion: https://discourse.julialang.org/t/matrix-multiplication-a-b-a
@chriscoey I suspect this might be an issue waiting for someone to contribute a pull request to `LinearAlgebra`? The discussion above seems pretty receptive to the idea.
So, we do have a ternary `*` for `Diagonal`s, and I quickly wrote up one for `SparseMatrixCSC` and `AbstractVector`s:
```julia
function *(x::Adjoint{<:Any,<:AbstractVector}, A::SparseMatrixCSC, y::AbstractVector)
    length(x) == A.m || throw(DimensionMismatch())
    A.n == length(y) || throw(DimensionMismatch())
    T = promote_type(eltype(x), eltype(A), eltype(y))
    out = T(0)
    rvals = rowvals(A)
    nzvals = nonzeros(A)
    # Accumulate column by column: ((x'*A)[col]) * y[col]
    @inbounds for col = 1:A.n
        temp = T(0)
        for k in nzrange(A, col)
            temp += x[rvals[k]] * nzvals[k]
        end
        out += temp * y[col]
    end
    return out
end
```
Would we insert this into `SparseArrays` as-is (perhaps together with @simonbyrne's all-sparse version above), or should we introduce a generic `dot(x, A, y)` first, make it throw in general, but have more specific implementations and make the ternary `*`s call those?
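For concreteness, that layering might look like the following sketch; `three_arg_dot` is a stand-in name and the method placements are hypothetical, not an existing API:

```julia
using LinearAlgebra

# Generic fallback: correct for anything, but allocates the temporary A*y.
# Specialized methods (Diagonal, SparseMatrixCSC, ...) would override this.
three_arg_dot(x::AbstractVector, A::AbstractMatrix, y::AbstractVector) =
    dot(x, A * y)

# Ternary * forwards to the dot-style kernel; dot conjugates its first
# argument, which matches what x' * A * y means for complex element types.
Base.:*(x::Adjoint{<:Any,<:AbstractVector}, A::AbstractMatrix, y::AbstractVector) =
    three_arg_dot(parent(x), A, y)
```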
I would have the `dot(x, A, y)` first.