When converting an SArray of type string to float, small values are added to or subtracted from the correct value.
The following code:
import turicreate as tc
sa = tc.SArray(['1.2', '3.14', '1.66'])
print(sa.astype(float))
Produces:
dtype: float
Rows: 3
[1.2000000476837158, 3.140000104904175, 1.659999966621399]
Expected:
dtype: float
Rows: 3
[1.2, 3.14, 1.66]
That's somewhat expected behavior. Floats are stored in base 2, and most decimal fractions have no exact base-2 representation, so you get artifacts like that. The stored value is the closest representable binary number to the one given.
In Python:
In [8]: print("{0:.20f}".format(3.14))
3.14000000000000012434
However, it appears that those numbers are being converted through a 32-bit float type instead of a 64-bit double, which is what should be used:
In [13]: print("{0:.20f}".format(numpy.float32(3.14)))
3.14000010490417480469
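To check the 32-bit hypothesis against all three values from the report, here is a minimal sketch that uses only the standard library. The helper name through_float32 is made up for illustration, and the round trip through struct is an assumption about what the conversion effectively does, not a description of the actual SArray code path.

```python
import struct

def through_float32(x):
    """Round a Python float (a 64-bit double) to 32-bit precision and back."""
    # "f" packs the value as a C float (32 bits); unpacking widens it again.
    return struct.unpack("f", struct.pack("f", x))[0]

strings = ["1.2", "3.14", "1.66"]

# Plain Python parsing yields 64-bit doubles.
as_double = [float(s) for s in strings]

# Forcing each value through 32 bits introduces the extra error.
as_single = [through_float32(v) for v in as_double]

print(as_double)
print(as_single)
```

If the hypothesis is right, the second list should match the values in the "Produces" output above, while the first matches the "Expected" output.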
In addition, we should respect Python's default precision when printing numbers in SArray, possibly by using Python's own float formatting internally.
From https://docs.python.org/2/tutorial/floatingpoint.html:
"In current versions, Python displays a value based on the shortest decimal fraction that rounds correctly back to the true binary value, resulting simply in ‘0.1’."
To sum up, two bugs:
[ ] astype(float) invokes a 32-bit floating point path instead of a 64-bit double conversion somewhere.
[ ] SArray printing should invoke Python's shortest round-trip float formatting so values print cleanly.
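For the printing item, the difference between Python's default float display and a fixed-precision dump can be sketched as follows. This only illustrates Python's built-in behavior; how SArray would hook into it is left open.

```python
x = 3.14

# repr() gives the shortest decimal string that parses back to the same double.
shortest = repr(x)

# A fixed 20-digit format exposes the binary approximation instead.
fixed = "{0:.20f}".format(x)

print(shortest)
print(fixed)

# The short form loses nothing: it round-trips to the identical value.
print(float(shortest) == x)
```

Both strings describe the same stored double, so printing the short form is a display choice and does not change the data.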