Turicreate: Converting SArray from dtype string to float is not accurate

Created on 15 Jan 2020 · 2 Comments · Source: apple/turicreate

When converting an SArray of dtype string to float, the results differ from the correct values by small amounts.

The following code:

import turicreate as tc
sa = tc.SArray(['1.2', '3.14', '1.66'])
print(sa.astype(float))

Produces:

dtype: float
Rows: 3
[1.2000000476837158, 3.140000104904175, 1.659999966621399]

Expected:

dtype: float
Rows: 3
[1.2, 3.14, 1.66]

bug engine

TobyRoseman

All 2 comments

That's somewhat expected behavior. It's stored as a base-2 number, which doesn't map directly to base 10, so you get artifacts like that. Those numbers are the closest ones in base 2 to the ones given.

In python:

In [8]: print("{0:.20f}".format(3.14))
3.14000000000000012434

However, those numbers appear to be converted through a 32-bit float type instead of a 64-bit double, which is what should be used:

In [13]: print("{0:.20f}".format(numpy.float32(3.14)))
3.14000010490417480469
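The float32 hypothesis can be checked without numpy or turicreate. The following is a minimal sketch using only the standard library; `through_float32` is a hypothetical helper name, and it only mimics a 32-bit round trip, it is not turicreate's actual conversion path.

```python
import struct


def through_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    # "f" is the struct format code for an IEEE 754 single-precision float.
    return struct.unpack("f", struct.pack("f", value))[0]


strings = ["1.2", "3.14", "1.66"]
doubles = [float(s) for s in strings]            # 64-bit parse of each string
singles = [through_float32(d) for d in doubles]  # same values squeezed through 32 bits

for s, d, f in zip(strings, doubles, singles):
    # Column 2 is the 64-bit result, column 3 the 32-bit round trip.
    print(s, repr(d), repr(f))
```

If the third column matches the values printed by `sa.astype(float)` in the report, that supports the idea that a 32-bit conversion is involved somewhere.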

In addition, we should respect Python's default precision when printing numbers in SArray, possibly by using Python's own float formatting internally.

From https://docs.python.org/2/tutorial/floatingpoint.html:

"In current versions, Python displays a value based on the shortest decimal fraction that rounds correctly back to the true binary value, resulting simply in ‘0.1’."
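As a small illustration of the quoted behavior, the sketch below compares Python's default float display with a fixed 20-digit format. It uses plain Python floats only and makes no assumption about how SArray formats values internally.

```python
value = 3.14

# repr() gives the shortest decimal string that round-trips to the same double.
shortest = repr(value)

# A fixed 20-digit format exposes the binary approximation instead.
expanded = "{0:.20f}".format(value)

print(shortest)
print(expanded)

# Both strings parse back to exactly the same 64-bit value.
assert float(shortest) == value
assert float(expanded) == value
```

The short form loses no information for a 64-bit double, which is why it would be a reasonable default for display.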

hoytak on 15 Jan 2020

To sum up, two bugs:

[ ] astype(float) invokes a 32-bit floating-point path somewhere instead of a 64-bit double conversion.
[ ] SArray printing should invoke Python's number truncation logic so that values print more cleanly.

hoytak on 15 Jan 2020

👍2
