Julia: make I(n) behave as `Diagonal` / `eye`?

Created on 6 Aug 2017  ·  33 Comments  ·  Source: JuliaLang/julia

It might be nice if we made I callable so that you can write I(10) and get a 10x10 Diagonal matrix object. Of course it's a bit weird to call one kind of object and get another, but we do that with functions all the time so maybe that's no problem. Possible definition:

(σ::UniformScaling)(n::Integer) = Diagonal(collect(Iterators.repeated(σ.λ, n)))

With this definition, we could potentially deprecate eye(n) (which is usually quite inefficient) in favor of I(n) when one doesn't need to mutate any off-diagonals or full(I(n)) in the cases where one does need to mutate an off-diagonal.
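As a sketch, the proposed overload and its effect would look like the following (note that Julia ≥ 1.2 ships an equivalent method in LinearAlgebra, so defining it in a script merely mirrors that behavior):

```julia
using LinearAlgebra

# The proposed definition: calling a UniformScaling with a size yields a Diagonal.
(σ::UniformScaling)(n::Integer) = Diagonal(collect(Iterators.repeated(σ.λ, n)))

I(3)       # 3×3 Diagonal identity
(2.5I)(2)  # Diagonal([2.5, 2.5]), i.e. a sized version of 2.5I
```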

Labels: linear algebra, speculative

Most helpful comment

Alternate issue title: An I for an eye.

All 33 comments

What advantage does this have? Functions are pretty special (sure, they are objects, but they are very special ones), so I don't think functions being callable is a good argument for this. If anything, it's a good argument for not implementing algebra or other operations on functions. The syntax will also be really confusing with juxtaposed multiplication when you have both I(2) and 2I in the code.

eye is also a well-known function name from Python and MATLAB, so I don't think removing it is a good idea. We should certainly make more efficient things easier to do (x * I sounds simple enough, though), but I don't think we should make doing less efficient but sometimes necessary things much harder. It's also unclear to me what would be the new syntax to get an eye(Int8, 3) or eye(Bool, 2).

You can do copy!(Matrix{Int8}(n,n), I). eye is a pun that I've never really liked that much.
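In post-1.0 syntax that recipe reads as follows (copy! became copyto!, uninitialized constructors take undef, and this assumes the copyto!(::AbstractMatrix, ::UniformScaling) method available in recent LinearAlgebra):

```julia
using LinearAlgebra

n = 3
M = Matrix{Int8}(undef, n, n)  # Matrix{Int8}(n, n) in the pre-0.7 syntax used above
copyto!(M, I)                  # writes ones on the diagonal, zeros elsewhere
```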

As in similar places: What problem is solved by this?

When I heard eye for the first time in the MATLAB context, I typed I. I thought somebody was kidding when they told me it was spelled eye. So I am conditioned for I.

We currently have eye, I and Diagonal (not to mention speye); it would be nice to have one less function for making a linear algebra identity operator. There's almost never any reason to construct a dense one, so it's actively bad to have eye around since using I would be better in 99% of cases. Since one wants to use it in situations like A + I and [A I; I B] already, it seems much more natural to use I(n) for constructing a particular size of identity matrix instead of explaining "oh, there's this other thing with a really punny name to do that, but don't actually use it since it's not very efficient, instead do Diagonal(collect(Iterators.repeated(1, n)))". I'm proposing that we provide a more convenient spelling of that last expression – and I is a pretty natural name for it.

I kind of think we should double down on the whole I thing and allow it to produce rectangular matrices and make Z a binding for 0I that lets us write things like [A Z; Z B]. Either that or allow a scalar to implicitly broadcast in concatenation expressions so you can write [A 0; 0 B]. Writing [A zeros(size(A,1), size(B,2)); zeros(size(B,1), size(A,2)) B] is such a drag.
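For what it's worth, the block-diagonal layout spelled out above can already be had from cat with dims = (1, 2), which zero-fills the off-diagonal blocks:

```julia
A = [1 2; 3 4]       # 2×2
B = [5 6 7; 8 9 10]  # 2×3

# 4×5 block-diagonal matrix: A in the top-left, B in the bottom-right,
# zeros in the off-diagonal blocks.
C = cat(A, B; dims = (1, 2))
```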

Diagonal(ones(n)) isn't that long. Also a different type than UniformScaling, odd to use callable I to construct something of a different type.

I'm also fine with Diagonal(ones(n)) and Array(Diagonal(ones(n))) (for a full eye replacement). It's pretty clear, and as Stefan said you don't really want to be constructing big mutable dense identity matrices if you can avoid it.

@StefanKarpinski "I kind of think we should double down … and make Z a binding for 0I "
Please, please, don't do this. Exporting single-character variables from the base library is rarely wise. (In fact, it's almost always foolish, but I'll save that for another day.) Those of us who actually use linear algebra employ Z for lots of obvious purposes, among which your proposal does not figure.

0I seems like a reasonable way to express 0I, so there's no need to spell zeros in full.

This issue as stated is not breaking, so doesn't belong on the milestone. However, it has been brought up previously that we may want to get rid of eye since it is inefficient and never really the best alternative. Deprecating it would force people to consider better alternatives. So perhaps that should be the real question at hand here. Should we deprecate eye and if so, do we have sufficient facilities to cover all use cases of it?

I would deprecate eye to something like Array(Diagonal(ones(n))) --- is there any way to do that without allocating the vector of ones? I see copy!(zeros(3,3), I) works, but that depends on mutation so isn't ideal.

Note that eye definitely belongs to the LinAlg module so could go in stdlib rather than Base.

@Sacha0 has agreed to give replacing eye with better things a try and seeing what we need.

@Sacha0 has agreed to give replacing eye with better things a try and seeing what we need.

To start I skimmed through all eye/speye calls in base, and then sketched a pull request deprecating speye (#24356). Observations from that deprecation sketch follow below, most of which also apply to eye.

speye takes the following forms: [sp]eye([T::Type], m::Integer[, n::Integer]) and speye(A::AbstractMatrix).

Where speye(m::Integer[, n::Integer]) cannot be replaced with something more clever (e.g. I alone), sparse(1.0I, m[, n]) works, as would SparseMatrixCSC{Float64}(I, m, n) (when that constructor exists). Fine so far, though having to specify the Float64 in SparseMatrixCSC{Float64} due to eltype(I) defaulting to Int may be slightly cumbersome.

Replacing speye(T, m[, n]) is trickier: Where typeof(one(T) * one(eltype(I))) == T, sparse(one(T)I, m[, n]) or some equivalent works reasonably well. But where typeof(one(T) * one(eltype(I))) != T, e.g. in speye(Bool, m), life is a bit trickier: Seemingly one must either write sparse(UniformScaling(one(T)), m[, n]), which is a bit cumbersome due to the UniformScaling(one(T)), or SparseMatrixCSC{T}(I, m, n) (when that constructor exists). A compact way to construct a UniformScaling with specified eltype might smooth that wrinkle out. Similarly, e.g. speye(Real, m) is tricky, as sparse(UniformScaling{Real}(1), m) yields a SparseMatrixCSC{Int} rather than a SparseMatrixCSC{Real}. That either seems like a bug in sparse that should be fixed, or calls for a method to handle SparseMatrixCSC{Real}(I, m, n).
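Concretely, the replacements discussed above look like this (assuming the sparse(::UniformScaling, m, n) method and the eltype-specifying SparseMatrixCSC constructor, both of which landed around this time):

```julia
using LinearAlgebra, SparseArrays

S  = sparse(1.0I, 3, 3)              # speye(3): 3×3 sparse Float64 identity
Sb = SparseMatrixCSC{Bool}(I, 3, 3)  # speye(Bool, 3): eltype given explicitly

# The more cumbersome speye(Int8, 3) spelling via UniformScaling(one(T)):
St = sparse(UniformScaling(one(Int8)), 3, 3)
```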

speye(A) is similarly tricky: The best rewrite I have so far is sparse(UniformScaling(one(eltype(A))), size(A)...), which is a bit cumbersome. Here again a compact way to construct a UniformScaling with specified eltype might be nice. Supporting array constructors that accept an iterable as their first argument, and another AbstractArray from which eltype and shape are taken as the second argument, might also smooth this bump out (e.g. SparseMatrixCSC(I, A)).

Methods for comparison against identity (foo == I, foo ≈ I) could be useful here and there too. Best!

To start I skimmed through all eye/speye calls in base

I'm curious, how often did you encounter calls like speye([T, ]m, n) where m != n? Since it does not create an identity matrix, it is rather weird that it is even supported. Perhaps we can just discontinue that construction? That would make this simpler, I imagine.

I'm curious, how often did you encounter calls like speye([T, ]m, n) where m != n? Since it does not create an identity matrix, it is rather weird that it is even supported. Perhaps we can just discontinue that construction? That would make this simpler, I imagine.

Fantastic question! In the speye deprecation and excluding tests specific to the corresponding methods, I found only one instance where m != n (and in that instance, the matrix's contents could just as well be zero or junk). In other words, at least the speye([T, ]m, n) (and so sparse(I, m, n)) methods are likely superfluous.

Similarly, I found only three uses of speye(A). Those three uses are essentially identical, occurring in definitions of arithmetic between SparseMatrixCSCs and UniformScalings. From the speye deprecation diff:

-(+)(A::SparseMatrixCSC, J::UniformScaling) = A + J.λ * speye(A)
-(-)(A::SparseMatrixCSC, J::UniformScaling) = A - J.λ * speye(A)
-(-)(J::UniformScaling, A::SparseMatrixCSC) = J.λ * speye(A) - A
+(+)(A::SparseMatrixCSC, J::UniformScaling) = A + sparse(J, size(A)...)
+(-)(A::SparseMatrixCSC, J::UniformScaling) = A - sparse(J, size(A)...)
+(-)(J::UniformScaling, A::SparseMatrixCSC) = sparse(J, size(A)...) - A

As shown, those uses are perhaps better written sparse(J, size(A)...). But these methods could be written more efficiently without the sparse(J, size(A)...)/speye(A) construction altogether. Moreover, the same arithmetic operations between AbstractMatrixes and UniformScalings throw if the AbstractMatrix is not square. So one could argue that the uses in the copied methods are incorrect. Altogether, the speye(A) methods might also be superfluous.

So perhaps deprecating sparse(I, m, n) (as opposed to sparse(I, n)) alongside speye is reasonable, which would simplify life a bit.

I will keep this question in mind while reviewing uses of eye again! Thank you! :)

On second thought, with the future SparseMatrixCSC(I, m, n) around, perhaps sparse(I, m, n) should remain around as well? Thoughts? Best!

Another observation:

About a third of eye invocations in test/ are in spirit either res == eye(...) or res ≈ eye(...). Where the intention of such tests is to check that the contents of res match identity, res == I and res ≈ I are superior rewrites. But such tests also implicitly test res's shape, and whether that shape check was intended / is an important part of the test is rarely (if ever) clear.

So while most of those tests could be nicely rewritten as res == I or res ≈ I, strictly they should be (res = ...; res == I && size(res) == ...). On the one hand, where necessary the latter rewrite might sometimes be a bit verbose. On the other hand, the latter rewrite makes what the test actually tests explicit, which is of tremendous benefit to parties reading/modifying the tests. Best!
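A sketch of the two rewrites, with the shape check made explicit in the stricter one:

```julia
using LinearAlgebra, Test

res = Matrix{Float64}(I, 3, 3)  # stand-in for some computed result

@test res == I                         # contents only: any n×n identity would pass
@test res == I && size(res) == (3, 3)  # contents and shape, stated explicitly
```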

Additional observations. Most eye calls serve one of the following purposes, ordered roughly by prevalence:

  1. Comparison (==, ≈) against identity, discussed in the preceding comment.
  2. Concise array construction, where frequently the contents do not much matter.
  3. Construction of a dense identity matrix for subsequent in-place multiplication.
  4. Construction of a dense identity for subsequent out-of-place multiplication (with some implicitly represented operator).

For purposes (3) and (4), Matrix(I, m, n) for eye(m[, n]) seems like a good rewrite. The catch: (3) requires eltype(Matrix(I, m, n)) <: AbstractFloat and (4) usually expects the same, while eltype(Matrix(I, m, n)) would be Int64 presently (or Bool with #24396) due to the corresponding eltype(I). Matrix(1.0I, m, n) and Matrix{Float64}(I, m, n) do the trick but are somewhat less pleasant. Consequently, special-casing Matrix(I, m, n) to return a Matrix{Float64} might be worthwhile given that's likely the most useful result. (And likewise with future Array([zeros|ones], dims) methods, matching existing zeros/ones behavior.) Best!
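For illustration, with the behavior that eventually shipped (eltype(I) is Bool, per #24396):

```julia
using LinearAlgebra

B = Matrix(I, 2, 3)           # 2×3 Matrix{Bool}, ones on the main diagonal
F = Matrix{Float64}(I, 3, 3)  # the usual replacement for eye(3)
G = Matrix(1.0I, 3, 3)        # same result via a Float64 UniformScaling
```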

Consequently, special-casing Matrix(I, m, n) to return a Matrix{Float64} might be worthwhile given that's likely the most useful result.

For computations of eigen and singular vectors you'd like to be able to specify the element type. It could be Float32, BigFloat, or Complex{<:AbstractFloat} as well so you'd typically use the generic version anyway. Hence it might not be too important to avoid Int elements by default. I think Matrix{T}(I, n) would be fine in those applications.

Sounds good on this end! :)

Alternate issue title: An I for an eye.

With #24438 and the other pull requests linked above in, this issue should be set :).

I think we should consider having I(n) alongside Matrix(I, n, n). It would be a nice simplification for a common teaching situation where you'd like the identity to have a size and it shouldn't really get in the way of anybody. In practice I doubt that 2I vs I(2) will be a real concern.

I agree. (says the teacher of large linear algebra classes)

It's a little weird that calling an instance of UniformScaling creates an instance of Matrix, but 🤷‍♂️. It does seem quite convenient and intuitive, so 👍.

This keeps coming up—let's do it.

Also useful for tensor products kron(A,I(4)) where the size of the identity isn't determined from A.
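With the proposed I(n) (available as of Julia 1.2, returning a Diagonal), that reads:

```julia
using LinearAlgebra

A = [0 1; 1 0]
K = kron(A, I(4))  # I(4) is a 4×4 Diagonal identity; its size is not deducible from A
size(K)            # (8, 8)
```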

Re: implementation, how about adding a lazy non-allocating representation of x * ones(n) (say, UniformVector; something similar to FillArrays.Ones), and define (λ * I)(n) via

(σ::UniformScaling)(n::Integer) = Diagonal(UniformVector(σ.λ, n))

UniformVector is useful in other places like nzval of sparse arrays.
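A minimal sketch of such a hypothetical UniformVector (the name and design here are illustrative, not an existing API; a lazy_eye helper stands in for the call overload to keep the example self-contained):

```julia
using LinearAlgebra

# Hypothetical lazy constant vector, similar in spirit to FillArrays.Ones:
# stores one value and a length, allocating no n-element storage.
struct UniformVector{T} <: AbstractVector{T}
    λ::T
    n::Int
end
Base.size(v::UniformVector) = (v.n,)
Base.getindex(v::UniformVector, i::Int) = (checkbounds(v, i); v.λ)

# The proposed (λ * I)(n) behavior, built on the lazy vector:
lazy_eye(σ::UniformScaling, n::Integer) = Diagonal(UniformVector(σ.λ, n))

lazy_eye(2.5I, 3)  # 3×3 Diagonal with 2.5 on the diagonal
```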

Some comments:

eye may be a bad pun, but it is an excellent mnemonic.

My guess is that having eye taken away without a clearly documented (documented anywhere?), simple replacement is upsetting to many users. I have seen this anecdotally several times. I don't think they care whether the identity matrix is lazy or not, just that they have something that is not too much more difficult to type and to remember than eye, and don't have to think about it anymore. This may be related to Julia becoming less of an application and more of a language (i.e. moving away from MATLAB), and the need for a more application-like layer.

Allocation is not always slower than a lazily evaluated object (https://github.com/BBN-Q/QuantumInfo.jl/pull/3):

import FillArrays
import LinearAlgebra
using BenchmarkTools

eye(n::Integer) = Matrix{Float64}(LinearAlgebra.I, (n, n))

ikron_dense(n, mat) = kron(eye(n), mat)
ikron_lazy(n, mat) = kron(FillArrays.Eye(n), mat)

rikron_dense(n, mat) = kron(mat, eye(n))
rikron_lazy(n, mat) = kron(mat, FillArrays.Eye(n))

const nsmall = 2
const nbig = 100

const mat_small = rand(nsmall, nsmall)
const mat_big = rand(nbig, nbig);

macro sbtime(expr)
    quote
        println($(expr |> string))
        @btime $expr
    end
end

@sbtime ikron_dense(nbig, mat_big);
@sbtime ikron_lazy(nbig, mat_big);

println()

@sbtime rikron_dense(nbig, mat_big);
@sbtime rikron_lazy(nbig, mat_big);

println()

@sbtime ikron_dense(nsmall, mat_small);
@sbtime ikron_lazy(nsmall, mat_small);

println()

@sbtime ikron_dense(nsmall, mat_big);
@sbtime ikron_lazy(nsmall, mat_big);

println()

@sbtime rikron_dense(nsmall, mat_small);
@sbtime rikron_lazy(nsmall, mat_small);

println()

@sbtime rikron_dense(nsmall, mat_big);
@sbtime rikron_lazy(nsmall, mat_big);

println()

@sbtime ikron_dense(nbig, mat_small);
@sbtime ikron_lazy(nbig, mat_small);

ikron_dense(nbig, mat_big)
  157.041 ms (4 allocations: 763.02 MiB)
ikron_lazy(nbig, mat_big)
  209.153 ms (2 allocations: 762.94 MiB)

rikron_dense(nbig, mat_big)
  304.093 ms (4 allocations: 763.02 MiB)
rikron_lazy(nbig, mat_big)
  358.267 ms (2 allocations: 762.94 MiB)

ikron_dense(nsmall, mat_small)
  101.824 ns (2 allocations: 320 bytes)
ikron_lazy(nsmall, mat_small)
  61.975 ns (1 allocation: 208 bytes)

ikron_dense(nsmall, mat_big)
  41.494 μs (3 allocations: 312.69 KiB)
ikron_lazy(nsmall, mat_big)
  41.486 μs (2 allocations: 312.58 KiB)

rikron_dense(nsmall, mat_small)
  105.991 ns (2 allocations: 320 bytes)
rikron_lazy(nsmall, mat_small)
  68.820 ns (1 allocation: 208 bytes)

rikron_dense(nsmall, mat_big)
  60.869 μs (3 allocations: 312.69 KiB)
rikron_lazy(nsmall, mat_big)
  76.319 μs (2 allocations: 312.58 KiB)

ikron_dense(nbig, mat_small)
  64.835 μs (4 allocations: 390.78 KiB)
ikron_lazy(nbig, mat_small)
  59.712 μs (2 allocations: 312.58 KiB)

Stefan said above to do this, so I fixed the milestone.

Since there is no PR for this yet I think it needs to be moved to the next milestone.
