Zig: Builtin Matrix type

Created on 6 Apr 2020 · 9 Comments · Source: ziglang/zig

LLVM 10 introduced nice Matrix intrinsics.

Possible syntax:

@Matrix(rows, cols, type)
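For concreteness, usage might look something like this (entirely hypothetical, since no such builtin exists; the name and signature are just the proposal's):

```zig
// Hypothetical sketch of the proposed builtin; this is not valid Zig today.
const Mat4 = @Matrix(4, 4, f32);

var transform: Mat4 = undefined;
```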

Related issue: #903

Label: proposal

All 9 comments

Would @Vector(len, T) be equivalent to @Matrix(len, 1, T) (or @Matrix(1, len, T))? If their code generation and memory layout are the same, then we might as well drop @Vector in favor of @Matrix.

Would @Vector(len, T) be equivalent to @Matrix(len, 1, T) (or @Matrix(1, len, T))?

Yes.

we might as well drop @Vector in favor of @Matrix

Oh, no! Vector's operators are already perfect. :)

Perhaps, if we keep both Vector and Matrix, it would be a good opportunity to consider operations between them.

Do they have anything other than transpose, multiply, load, and store? Is that useful to add to a language? How much magic would that hide?

To be honest, I'm not a big fan. It's already not clear how @Matrix(rows, cols, type) would be laid out in memory without reading the LLVM documentation.

The RFC [1] referenced by the LLVM commit [2] has this to say:

The main motivation for the matrix support on the Clang side is to give users a way to

  • Guarantee generating high-quality code for matrix operations and trees of matrix operations. For isolated operations, we can guarantee vector code generation suitable for the target. For trees of operations, the proposed value type helps with eliminating temporary loads & stores.
  • Make use of specialized matrix ISA extensions, like the new matrix instructions in ARM v8.6 or various proprietary matrix accelerators, in their C/C++ code.
  • Move optimisations from matrix wrapper libraries into the compiler. We use it internally to simplify an Eigen-style matrix library, by relying on LLVM for generating tiled & fused loops for matrix operations.

Clearly the members of the LLVM community (or at least the ones backing this extension) believe that the optimizer can perform better here with the additional information about matrix representation, which to me seems like a valid argument that this should be included in the language. As long as we don't care about being bound more tightly to LLVM (which we don't seem to, given zig c++), I don't see a strong reason not to expose this.

But that still leaves a lot of free space in terms of how it should be exposed. At the LLVM level, there is no Matrix type [3]; the matrix intrinsics operate on Vectors with additional information supplied at the call site to describe the matrix dimensions. I do think that there would be concrete benefits to having a Matrix type abstraction for these intrinsics in Zig though. It would make it much easier to specify the dimensions in one place, and would allow for dimension inference when the compiler determines the result type of a matrix multiply. As long as the language supports a cast between matrices of the same size but different dimensions (which could just be @bitCast), and between matrix types and vector types of the same size, I think a specialized Matrix type is a net win. This also mirrors the decision made by the clang authors, who exposed these intrinsics via typedef float m4x4_t __attribute__((matrix_type(4, 4)));
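As a sketch of what that abstraction could look like in userspace today (my construction, not part of the proposal): a matrix type backed by a single @Vector, which is the representation the LLVM intrinsics use.

```zig
// Sketch only: a userspace stand-in for the proposed abstraction, backed by
// a single @Vector, the same representation the LLVM intrinsics operate on.
// All names here are illustrative.
fn Matrix(comptime rows: usize, comptime cols: usize, comptime E: type) type {
    return struct {
        // Flat column-major storage: element (r, c) sits at index c * rows + r.
        data: @Vector(rows * cols, E),
    };
}

const M4x4 = Matrix(4, 4, f32); // backing vector is @Vector(16, f32)
```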

It's already not clear how @Matrix(rows, cols, type) would be laid out in memory without reading the LLVM documentation.

I agree that this is a potential issue. We could make it easier by documenting the layout in the Zig documentation of the @Matrix intrinsic. The LLVM notes seem to suggest that they are considering adding support for multiple layouts, so we could alternatively change the builtin to specify layout explicitly, e.g. @Matrix(rows, cols, type, .COL_MAJOR).
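For illustration, this is all a .COL_MAJOR layout would need to pin down (plain Zig over a flat array, purely my sketch):

```zig
const std = @import("std");

// Column-major: element (r, c) of a rows x cols matrix lives at flat index
// c * rows + r in the backing storage.
fn at(comptime rows: usize, comptime cols: usize, data: [rows * cols]f32, r: usize, c: usize) f32 {
    std.debug.assert(r < rows and c < cols);
    return data[c * rows + r];
}
```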

Since a * b is more complex than element-wise multiply, operates on inputs of different types, and may return a third type, I would advise introducing an intrinsic @matrixMultiply(a, b) instead of overloading the * operator. This would also give us a place to specify the other information that can be attached to the LLVM intrinsic, like fast-math flags.
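To show the type relationship such an intrinsic would encode, here is a plain-Zig sketch over column-major flat arrays (a stand-in for the proposed type, not an implementation of the builtin):

```zig
// (m x k) * (k x n) -> (m x n): the result type differs from both input
// types, which is why a dedicated builtin fits better than overloading `*`.
fn matMul(
    comptime E: type,
    comptime m: usize,
    comptime k: usize,
    comptime n: usize,
    a: [m * k]E, // column-major
    b: [k * n]E, // column-major
) [m * n]E {
    var out = [_]E{0} ** (m * n);
    var c: usize = 0;
    while (c < n) : (c += 1) {
        var r: usize = 0;
        while (r < m) : (r += 1) {
            var i: usize = 0;
            while (i < k) : (i += 1) {
                out[c * m + r] += a[i * m + r] * b[c * k + i];
            }
        }
    }
    return out;
}
```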

Perhaps, if we keep both Vector and Matrix, it would be a good opportunity to consider operations between them.

Looking at the LLVM documentation, the Matrix type is internally backed by a Vector type, so @bitCast support (or some specialized cast) definitely makes sense. But for the same reasons I stated above, I don't think we should implement matrix * vector. Since @Vector is semantically a SIMD type, not a mathematical vector type, I also don't think we should make @matrixVectorMultiply(matrix, vector), unless LLVM makes a specialized intrinsic for this specific operation. Instead, if this is needed, @matrixMultiply(matrix, @bitCast(@Matrix(4, 1, f32), vector)) should give all of the code generation benefits without introducing an operator to the language that has unexpected nontrivial cost, or encouraging treating simd vectors like mathematical vectors.

Overall I think our investment in this feature should be parallel to LLVM's. If they start making large improvements to the codegen from these intrinsics, or supporting more new hardware with them, it becomes more worthwhile for us to add support.

[1] RFC: Matrix Math Support http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html
[2] LLVM Code review for Matrix intrinsics https://reviews.llvm.org/D70456
[3] LLVM documentation, matrix intrinsics https://llvm.org/docs/LangRef.html#matrix-intrinsics

* We use it internally to simplify an Eigen-style matrix library, by relying on LLVM for generating tiled & fused loops for matrix operations.

Ehm... ok. Not sure what to think of this. This is heading in the direction of Fortran. That doesn't mean it's bad, but I'm also not sure whether implementing matrix multiplication algorithms is a compiler's job. Maybe I'm overestimating the extent of tiled & fused loops?
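For reference, my understanding of "tiled" here is blocking the loops so each small sub-block of the operands stays in cache; the transformation the compiler would automate looks roughly like this hand-written version (illustrative sketch, assuming n is a multiple of the tile size):

```zig
const std = @import("std");

const TILE = 8; // tile size chosen for illustration; real compilers pick it per target

// Assumes n is a multiple of TILE, to keep the sketch short.
fn tiledMatMul(comptime n: usize, a: [n][n]f32, b: [n][n]f32) [n][n]f32 {
    var out = std.mem.zeroes([n][n]f32);
    var ii: usize = 0;
    while (ii < n) : (ii += TILE) {
        var kk: usize = 0;
        while (kk < n) : (kk += TILE) {
            var jj: usize = 0;
            while (jj < n) : (jj += TILE) {
                // Work on one TILE x TILE block at a time so it stays cache-resident.
                var i = ii;
                while (i < ii + TILE) : (i += 1) {
                    var k = kk;
                    while (k < kk + TILE) : (k += 1) {
                        var j = jj;
                        while (j < jj + TILE) : (j += 1) {
                            out[i][j] += a[i][k] * b[k][j];
                        }
                    }
                }
            }
        }
    }
    return out;
}
```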

* Make use of specialized matrix ISA extensions, like the new matrix instructions in ARM v8.6 or various proprietary matrix accelerators, in their C/C++ code.

This is a valid argument for specialized matrix operations.

* For trees of operations, the proposed value type helps with eliminating temporary loads & stores.

I don't know enough here to have an opinion.

The LLVM notes seem to suggest that they are considering adding support for multiple layouts, so we could alternatively change the builtin to specify layout explicitly, e.g. @Matrix(rows, cols, type, .COL_MAJOR).

Seems like a good solution.

Since a * b is more complex than element-wise multiply, operates on inputs of different types, and may return a third type, I would advise introducing an intrinsic @matrixMultiply(a, b) instead of overloading the * operator.

Agreed. That already makes it a lot clearer than I originally imagined. 👍

Given the variation in matrix memory layout between architectures (row-major or column-major? Is [c]*[r]T a matrix? Do we allow multiplies between different index orderings? If so, what's the index order of the result? Where is it stored?), and the implicit memory management inherent to some of them, I really don't think a separate matrix type is wise. If processors ever implement dedicated matrix multiply instructions (not just SIMD instructions to make matrix multiply easier), this can be revisited; until then, I think the best course of action is to tighten the guarantees around auto-tiling and -fusing of loops.

If processors ever implement dedicated matrix multiply instructions (not just SIMD instructions to make matrix multiply easier), this can be revisited

Intel AMX is a new addition to x86 to support matrix operations; see the Intel Instruction Set Reference (PDF), chapter 3.

Personally, I think this kind of thing is an edge case and should wait until the rest of the language is finished. Also, with the rise of Arm CPUs, perhaps a saner way of dealing with vector and matrix data will become more common. We can only hope, at any rate.

One final comment: to be fair to Intel, AMX is a lot more sane than the ever-changing set of SIMD instructions from MMX to AVX-512. But, wow, is that a lot of state. Task switching is going to be painful with the addition of that much state.
