Try this:

using LinearAlgebra

x = Float16[2.3 4.3; 3.4 5.6]
svd(x)
The returned singular vectors and singular values are in Float32. I think they should be in Float16.
Compare to the behavior of svd(x) where x is Float32 or Float64. In those cases the result has the same element type as x.
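A quick way to see the mismatch (a minimal check; the variable names are just for illustration, and it assumes Julia 0.7+ where svd lives in the LinearAlgebra stdlib):

```julia
using LinearAlgebra

A16 = Float16[2.3 4.3; 3.4 5.6]
A32 = Float32.(A16)

F16 = svd(A16)
F32 = svd(A32)

# Element types of the returned factors: the Float32 input round-trips,
# while the Float16 input is silently promoted (the behavior reported above).
@show eltype(F16.U) eltype(F16.S)   # Float32 under the reported behavior
@show eltype(F32.U) eltype(F32.S)   # Float32, matching the input
```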
I think this is a duplicate of #5942.
@simonbyrne It's related. But that is too general and doesn't mention svd in particular.
Right now we use LAPACK for computing the SVD, and LAPACK only supports Float32 and Float64 as well as their complex versions. Supporting Float16 would therefore require a generic implementation of the SVD algorithm. Such implementations exist, but many people don't think that computing in Float16 is a good idea, which is why #5942 is relevant here.
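For context, the LAPACK wrappers in the stdlib are restricted to the four BLAS element types, so Float16 cannot take that code path; a quick check (LinearAlgebra.BlasFloat is an internal union type, used here only to illustrate the restriction):

```julia
using LinearAlgebra

# LAPACK/BLAS wrappers only accept single/double precision real and complex matrices.
Float32 <: LinearAlgebra.BlasFloat   # true
Float16 <: LinearAlgebra.BlasFloat   # false, so svd on a Float16 matrix cannot call LAPACK directly
```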
@andreasnoack I think whether to use Float16, Float32, or Float64 internally for the computations is a different question from whether the _return type_ should match the argument type, even if the computations were carried out in higher precision.
See also #5429 re: generic linear algebra.
@ViralBShah Why was this closed? svd of Float16 still returns Float32 on v1.1.1.
lu returns Float16 factors, but cholesky and eigen return Float32 results as well. I would argue that perhaps the right thing to do is to promote all Float16 matrices to Float32 for linear algebra, do the operations in Float32, and return them. Alternatively, we could convert the results back to Float16. Doing all the operations in Float16 does not sound like it is the right thing.
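To make the inconsistency concrete, here is a minimal check; the element types in the comments are the ones reported in this thread, not guaranteed on every Julia version:

```julia
using LinearAlgebra

A = Float16[4 1; 1 3]    # symmetric positive definite, so all factorizations below apply

eltype(lu(A).L)          # Float16 (reported): lu keeps the input precision
eltype(cholesky(A).U)    # Float32 (reported): cholesky promotes
eltype(eigen(A).values)  # Float32 (reported): eigen promotes
eltype(svd(A).U)         # Float32 (reported): svd promotes
```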
Given that #5942 was resolved in favour of Float16 as a real computational type, perhaps the right thing to do here is to figure out what is causing the promotions and fix those.
The right way is to model our generic linear algebra libraries along the lines of the multi-precision LAPACK libraries that allow you to have different input, working, and output precisions.
Doing operations in Float32 and returning Float16 seems like a good option since Float16 operations are not native most of the time. On some hardware, however, Float16 will be native and it might make sense to do native Float16 ops there, which will lead to inconsistent results, but I think that's just the nature of the Float16 beast for now. Documentation warnings about that should suffice.
I agree that is a good immediate way to resolve this specific issue.
This is why we want a longer term solution of having generic linear algebra parametrized by input, working and output precisions. Then, we can pick reasonable defaults, while still allowing people to override them.
https://www.netlib.org/xblas/
http://mplapack.sourceforge.net/
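A rough sketch of what "parametrized by input, working, and output precision" could look like for the SVD; the function name and keyword arguments are made up for illustration, and the defaults shown are just one possible choice:

```julia
using LinearAlgebra

# Hypothetical mixed-precision SVD in the spirit of XBLAS / MPLAPACK:
# the caller picks the working and output precisions independently of the input.
function svd_mixed(A::AbstractMatrix{<:AbstractFloat};
                   working::Type{<:AbstractFloat} = Float32,
                   output::Type{<:AbstractFloat}  = eltype(A))
    F = svd(working.(A))                                   # factorize in the working precision
    return SVD(output.(F.U), output.(F.S), output.(F.Vt))  # convert factors to the output precision
end

# Reasonable defaults for Float16 input: compute in Float32, return Float16 factors...
svd_mixed(Float16[2.3 4.3; 3.4 5.6])
# ...but callers can override either choice, e.g. keep the Float32 factors:
svd_mixed(Float16[2.3 4.3; 3.4 5.6]; output = Float32)
```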
There is then also the question of whether the GPU implementations should have the same behaviour or not. On CPUs a working precision of Float32 is a good default, but on GPUs we might not want to do that.
I think that's going to be a somewhat unavoidable consequence of the Float16 data type being the odd duck that it is. If people really need the 32-bit intermediate precision on GPUs, they can do the compute in 32-bits and then convert to 16-bit at the end. Most people will want the implementation that is the fastest on their hardware and will care less about consistency across different hardware. CPU users would be really annoyed at having a slower, less accurate implementation because of lots of pointless conversions to and from Float16. GPU users would be super annoyed if their computations are many times slower because of using Float32 for Float16 inputs. So I think that's the only way.
A note in the doc saying that:

Precision of intermediate calculations depends on whether your hardware supports Float16 operations natively or not:

- On hardware not supporting Float16 (most CPUs), intermediate computations are done in Float32 and results are converted back to Float16 at the end;
- On hardware supporting Float16 (most GPUs), intermediate computations are done in Float16.

This gives the best performance on each kind of hardware, but can produce significantly different results depending on the platform. If you need identical results across platforms, perform computations using a floating-point type that is natively supported on all the hardware your code will run on. Float32 is natively supported on almost all CPUs and GPUs with good performance.
Is there a way to guarantee that the results are the same, regardless of platform? And still get Float16 output with Float16 input?
That won't even be the case for normal BLAS types (Float32/Float64) due to differences like threading and SIMD handling.
It's also a function of how ill-conditioned the matrix is, and of what "same" means. If anything, the opposite is true: it will almost never be the same.
> Is there a way to guarantee that the results are the same, regardless of platform? And still get Float16 output with Float16 input?
Yes, but you won't like it. This would be done by using generic linalg code (no BLAS) on Float16 values. This means converting each scalar value to Float32, doing an operation (+ or *) on it, then converting that result back to Float16. There also won't be any blocked or cache-optimized version of this, so it will be super slow compared to a well-tuned BLAS for Float32 or Float64. It will make these operations orders of magnitude slower on hardware without native Float16 support.
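To make that concrete, this is roughly the shape of such generic (non-BLAS) code; a toy sketch, not anything in the stdlib:

```julia
# Naive matrix-vector product over Float16. Without native Float16 hardware,
# every scalar * and + widens to Float32 and truncates back to Float16,
# and there is no blocking or cache optimization as in a tuned BLAS.
function naive_matvec(A::AbstractMatrix{Float16}, x::AbstractVector{Float16})
    y = zeros(Float16, size(A, 1))
    for j in axes(A, 2), i in axes(A, 1)
        y[i] += A[i, j] * x[j]
    end
    return y
end
```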
And as @simonbyrne said, the results will already differ across runs due to differences in SIMD (different on different CPUs) and threading (can be different even on the same hardware).
Basically, you can either have fast linear algebra or you can have consistent results. Pick one.
Please also see https://github.com/JuliaLang/julia/issues/5942#issuecomment-35949938.
Just to clarify, what's to be done here: implement Float16 versions of these linalg operations by promoting to Float32, doing the operations on those types and then converting back to Float16 at the end. It's slow and annoying but the best way to do it absent native Float16 support.
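A minimal sketch of that plan for svd, with Float32 fixed as the working precision (illustrative only, not the actual method added to LinearAlgebra):

```julia
using LinearAlgebra

# Promote to Float32, reuse the existing LAPACK-backed svd, demote the factors.
function LinearAlgebra.svd(A::AbstractMatrix{Float16}; kwargs...)
    F = svd(Float32.(A); kwargs...)
    return SVD(Float16.(F.U), Float16.(F.S), Float16.(F.Vt))
end
```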