Try this:

using LinearAlgebra

x = Float16[2.3 4.3; 3.4 5.6]
svd(x)
The returned singular vectors and singular values are in Float32. I think they should be in Float16.
Compare to the behavior of svd(x) where x is Float32 or Float64. In those cases the result has the same element type as x.
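A quick way to see the mismatch (a minimal check; the variable names are just for illustration, and it assumes Julia 0.7+ where svd lives in the LinearAlgebra stdlib):

```julia
using LinearAlgebra

A16 = Float16[2.3 4.3; 3.4 5.6]
A32 = Float32.(A16)

F16 = svd(A16)
F32 = svd(A32)

# Element types of the returned factors: the Float32 input round-trips,
# while the Float16 input is silently promoted (the behavior reported above).
@show eltype(F16.U) eltype(F16.S)   # Float32 under the reported behavior
@show eltype(F32.U) eltype(F32.S)   # Float32, matching the input
```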
I think this is a duplicate of #5942.
@simonbyrne It's related. But that is too general and doesn't mention svd in particular.
Right now we use LAPACK for computing the SVD, and LAPACK only supports Float32 and Float64 as well as their complex versions. Supporting Float16 would therefore require a generic implementation of the SVD algorithm. Such implementations exist, but many people don't think that computing in Float16 is a good idea, which is why #5942 is relevant here.
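For context, the LAPACK wrappers in the stdlib are restricted to the four BLAS element types, so Float16 cannot take that code path; a quick check (LinearAlgebra.BlasFloat is an internal union type, used here only to illustrate the restriction):

```julia
using LinearAlgebra

# LAPACK/BLAS wrappers only accept single/double precision real and complex matrices.
Float32 <: LinearAlgebra.BlasFloat   # true
Float16 <: LinearAlgebra.BlasFloat   # false, so svd on a Float16 matrix cannot call LAPACK directly
```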
@andreasnoack I think whether to use Float16, Float32, or Float64 internally for the computations is a different question from whether the _return type_ should match the argument type, even if the computations were carried out in higher precision.
See also #5429 re: generic linear algebra.
@ViralBShah Why was this closed? svd of Float16 still returns Float32 on v1.1.1.
lu returns Float16 factors, but cholesky and eigen return Float32 results as well. I would argue that perhaps the right thing to do is to promote all Float16 matrices to Float32 for linear algebra, do the operations in Float32, and return them. Alternatively, we could convert the results back to Float16. Doing all the operations in Float16 does not sound like it is the right thing.
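To make the inconsistency concrete, here is a minimal check; the element types in the comments are the ones reported in this thread, not guaranteed on every Julia version:

```julia
using LinearAlgebra

A = Float16[4 1; 1 3]    # symmetric positive definite, so all factorizations below apply

eltype(lu(A).L)          # Float16 (reported): lu keeps the input precision
eltype(cholesky(A).U)    # Float32 (reported): cholesky promotes
eltype(eigen(A).values)  # Float32 (reported): eigen promotes
eltype(svd(A).U)         # Float32 (reported): svd promotes
```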
Given that #5942 was resolved in favour of Float16 as a real computational type, perhaps the right thing to do here is to figure out what is causing the promotions and fix those.
The right way is to model our generic linear algebra libraries along the lines of the multi-precision LAPACK libraries that allow you to have different input, working, and output precisions.
Doing operations in Float32 and returning Float16 seems like a good option since Float16 operations are not native most of the time. On some hardware, however, Float16 will be native and it might make sense to do native Float16 ops there, which will lead to inconsistent results, but I think that's just the nature of the Float16 beast for now. Documentation warnings about that should suffice.
I agree that is a good immediate way to resolve this specific issue.
This is why we want a longer term solution of having generic linear algebra parametrized by input, working and output precisions. Then, we can pick reasonable defaults, while still allowing people to override them.
https://www.netlib.org/xblas/
http://mplapack.sourceforge.net/
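A rough sketch of what "parametrized by input, working, and output precision" could look like for the SVD; the function name and keyword arguments are made up for illustration, and the defaults shown are just one possible choice:

```julia
using LinearAlgebra

# Hypothetical mixed-precision SVD in the spirit of XBLAS / MPLAPACK:
# the caller picks the working and output precisions independently of the input.
function svd_mixed(A::AbstractMatrix{<:AbstractFloat};
                   working::Type{<:AbstractFloat} = Float32,
                   output::Type{<:AbstractFloat}  = eltype(A))
    F = svd(working.(A))                                   # factorize in the working precision
    return SVD(output.(F.U), output.(F.S), output.(F.Vt))  # convert factors to the output precision
end

# Reasonable defaults for Float16 input: compute in Float32, return Float16 factors...
svd_mixed(Float16[2.3 4.3; 3.4 5.6])
# ...but callers can override either choice, e.g. keep the Float32 factors:
svd_mixed(Float16[2.3 4.3; 3.4 5.6]; output = Float32)
```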
There is then also the question of whether the GPU implementations should have the same behaviour or not. On CPUs a working precision of Float32 is a good default, but on GPUs we might not want to do that.
I think that's going to be a somewhat unavoidable consequence of the Float16 data type being the odd duck that it is. If people really need the 32-bit intermediate precision on GPUs, they can do the compute in 32-bits and then convert to 16-bit at the end. Most people will want the implementation that is the fastest on their hardware and will care less about consistency across different hardware. CPU users would be really annoyed at having a slower, less accurate implementation because of lots of pointless conversions to and from Float16. GPU users would be super annoyed if their computations are many times slower because of using Float32 for Float16 inputs. So I think that's the only way.
A note in the doc saying that:

Precision of intermediate calculations depends on whether your hardware supports Float16 operations natively or not:

- On hardware not supporting Float16 (most CPUs), intermediate computations are done in Float32 and results are converted back to Float16 at the end;
- On hardware supporting Float16 (most GPUs), intermediate computations are done in Float16.

This gives the best performance on each kind of hardware, but can produce significantly different results depending on the platform. If you need identical results across platforms, perform computations using a floating-point type that is natively supported on all the hardware your code will run on. Float32 is natively supported on almost all CPUs and GPUs with good performance.
Is there a way to guarantee that the results are the same, regardless of platform? And still get Float16 output with Float16 input?
That won't even be the case for normal BLAS types (Float32/Float64) due to differences like threading and SIMD handling.
It's also a function of how ill-conditioned the matrix is, and of what "same" means. If anything, the opposite is true: it will almost never be the same.
> Is there a way to guarantee that the results are the same, regardless of platform? And still get Float16 output with Float16 input?
Yes, but you won't like it. This would be done by using generic linalg code (no BLAS) on Float16 values. This means converting each scalar value to Float32, doing an operation (+ or *) on it, then converting that result back to Float16. There also won't be any blocked or cache-optimized version of this, so it will be super slow compared to a well-tuned BLAS for Float32 or Float64. It will make these operations orders of magnitude slower on hardware without native Float16 support.
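To make that concrete, this is roughly the shape of such generic (non-BLAS) code; a toy sketch, not anything in the stdlib:

```julia
# Naive matrix-vector product over Float16. Without native Float16 hardware,
# every scalar * and + widens to Float32 and truncates back to Float16,
# and there is no blocking or cache optimization as in a tuned BLAS.
function naive_matvec(A::AbstractMatrix{Float16}, x::AbstractVector{Float16})
    y = zeros(Float16, size(A, 1))
    for j in axes(A, 2), i in axes(A, 1)
        y[i] += A[i, j] * x[j]
    end
    return y
end
```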
And as @simonbyrne said, the results will already differ across runs due to differences in SIMD (different on different CPUs) and threading (can be different even on the same hardware).
Basically, you can either have fast linear algebra or you can have consistent results. Pick one.
Please also see https://github.com/JuliaLang/julia/issues/5942#issuecomment-35949938.
Just to clarify, what's to be done here: implement Float16 versions of these linalg operations by promoting to Float32, doing the operations on those types and then converting back to Float16 at the end. It's slow and annoying but the best way to do it absent native Float16 support.
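A minimal sketch of that plan for svd, with Float32 fixed as the working precision (illustrative only, not the actual method added to LinearAlgebra):

```julia
using LinearAlgebra

# Promote to Float32, reuse the existing LAPACK-backed svd, demote the factors.
function LinearAlgebra.svd(A::AbstractMatrix{Float16}; kwargs...)
    F = svd(Float32.(A); kwargs...)
    return SVD(Float16.(F.U), Float16.(F.S), Float16.(F.Vt))
end
```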