Deeplearning4j: ND4J/DL4J: Deprecate L1/L2/L3 BLAS methods; update Nd4j.gemm to use op not direct BLAS calls

Created on 8 Apr 2020 · 4 Comments · Source: eclipse/deeplearning4j

Why: https://github.com/eclipse/deeplearning4j/issues/8734#issuecomment-601565911

We should also check for any direct BLAS calls in DL4J/ND4J and switch them over to the op.

Needs: https://github.com/eclipse/deeplearning4j/issues/8797

BlaLapack Enhancement

AlexDBlack

All 4 comments

Note we have Nd4j.gemm calls in BaseLayer, ConvolutionLayer, LSTM, etc.
Any/all of these can cause problems in heavily multi-threaded environments on CUDA.

AlexDBlack on 8 Apr 2020

There's a bit more than GEMM to BLAS/LAPACK/MKL...

saudet on 8 Apr 2020

Yep, we're aware :)
GEMM threading issues are a problem right now though.
The plan is to have proper op coverage for most (ideally all) of BLAS/LAPACK in the namespaces, using external libraries in libnd4j for the actual implementation where applicable. That way the ops can be used in both ND4J and SameDiff (usually with a nicer API), and will be properly documented and findable in the autogen docs, etc.

AlexDBlack on 8 Apr 2020

And data types... We support f16/bf16, and for BLAS it's not that simple: regular BLAS doesn't support anything besides f32/f64. cuBLAS supports f16. MKLDNN supports int/bf16 etc. So we'll be centralizing all this stuff in one place, and Java will get valid results no matter which backend is used.
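
To make the data type point concrete, here is a minimal plain-Java sketch of what choosing a GEMM implementation "in one place" could look like. It is not libnd4j or ND4J code: the class `GemmDispatch`, the enums `DType` and `Impl`, the `choose` method, and the `GENERIC_FALLBACK` path are all hypothetical names invented for illustration. The support sets only encode what the comment above states, plus the assumptions that cuBLAS also handles f32/f64 and that "int" means int8.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in ND4J/libnd4j.
class GemmDispatch {
    public enum DType { FLOAT16, BFLOAT16, FLOAT32, FLOAT64, INT8 }
    public enum Impl { CPU_BLAS, CUBLAS, MKLDNN, GENERIC_FALLBACK }

    // From the thread: regular BLAS supports nothing besides f32/f64.
    static final Set<DType> CPU_BLAS_TYPES = EnumSet.of(DType.FLOAT32, DType.FLOAT64);
    // From the thread: cuBLAS supports f16 (f32/f64 added here as an assumption).
    static final Set<DType> CUBLAS_TYPES =
            EnumSet.of(DType.FLOAT16, DType.FLOAT32, DType.FLOAT64);
    // From the thread: MKLDNN supports "int/bf16 etc." (INT8 is an assumption).
    static final Set<DType> MKLDNN_TYPES = EnumSet.of(DType.INT8, DType.BFLOAT16);

    // Single decision point: callers never pick a BLAS library themselves.
    public static Impl choose(DType type, boolean cudaBackend) {
        if (cudaBackend) {
            return CUBLAS_TYPES.contains(type) ? Impl.CUBLAS : Impl.GENERIC_FALLBACK;
        }
        if (CPU_BLAS_TYPES.contains(type)) {
            return Impl.CPU_BLAS;
        }
        if (MKLDNN_TYPES.contains(type)) {
            return Impl.MKLDNN;
        }
        // No external library covers this type: use a generic implementation
        // so the caller still gets a valid result.
        return Impl.GENERIC_FALLBACK;
    }

    public static void main(String[] args) {
        for (DType t : DType.values()) {
            System.out.println(t + " cpu=" + choose(t, false) + " cuda=" + choose(t, true));
        }
    }
}
```

The design point is that the fallback lives next to the library selection, so a Java caller issuing the same op gets a valid result for every supported data type regardless of backend, rather than hitting an unsupported direct BLAS call.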