If a pandas DataFrame has an object column that contains a NaN value, we cannot convert it to an SFrame, and the error message we get is unhelpful.
Turicreate version: 4.3.2
Python version: 3.6.5 (this is likely a bug only in Python 3)
Simple repro code with stack trace:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: import turicreate as tc
In [4]: temp = pd.DataFrame({'a': ['test', np.NaN]})
In [5]: tc.SFrame(temp)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
turicreate/cython/cy_flexible_type.pyx in turicreate.cython.cy_flexible_type.infer_flex_type_of_sequence()
turicreate/cython/cy_flexible_type.pyx in turicreate.cython.cy_flexible_type.flex_type_from_array_typecode()
ValueError: Type 'O' does not appear to be a valid array type code.
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/turicreate/data_structures/sframe.py in __init__(self, data, format, _proxy)
771 for c in data.columns.values:
--> 772 self.add_column(SArray(data[c].values), str(c), inplace=True)
773 elif (_format == 'sframe_obj'):
~/anaconda3/lib/python3.6/site-packages/turicreate/data_structures/sarray.py in __init__(self, data, dtype, ignore_cast_failure, _proxy)
354 # we need to get a bit more fine grained than that
--> 355 dtype = infer_type_of_sequence(data)
356 if len(data.shape) == 2:
turicreate/cython/cy_flexible_type.pyx in turicreate.cython.cy_flexible_type.infer_type_of_sequence()
turicreate/cython/cy_flexible_type.pyx in turicreate.cython.cy_flexible_type.infer_type_of_sequence()
turicreate/cython/cy_flexible_type.pyx in turicreate.cython.cy_flexible_type.infer_flex_type_of_sequence()
turicreate/cython/cy_flexible_type.pyx in turicreate.cython.cy_flexible_type._infer_common_type_of_listlike()
turicreate/cython/cy_flexible_type.pyx in turicreate.cython.cy_flexible_type.infer_common_type()
TypeError: sequence item 0: expected str instance, bytes found
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-5-4164fbdc309e> in <module>()
----> 1 tc.SFrame(temp)
~/anaconda3/lib/python3.6/site-packages/turicreate/data_structures/sframe.py in __init__(self, data, format, _proxy)
812 pass
813 else:
--> 814 raise ValueError('Unknown input type: ' + format)
815
816 @staticmethod
~/anaconda3/lib/python3.6/site-packages/turicreate/cython/context.py in __exit__(self, exc_type, exc_value, traceback)
47 if not self.show_cython_trace:
48 # To hide cython trace, we re-raise from here
---> 49 raise exc_type(exc_value)
50 else:
51 # To show the full trace, we do nothing and let exception propagate
TypeError: sequence item 0: expected str instance, bytes found
Same problem here. I had to use df = df.replace({pd.np.nan: None}) to work around this issue.
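For anyone who wants to limit the workaround above to the affected columns, here is a minimal sketch. The helper name nan_to_none_in_object_columns is made up for illustration and is not part of pandas or Turi Create. It only touches object-dtype columns and swaps float NaN for None. The final SFrame call is left as a comment because it assumes turicreate is installed.

```python
import numpy as np
import pandas as pd


def nan_to_none_in_object_columns(df):
    """Return a copy of df with NaN replaced by None in object-dtype columns.

    Hypothetical helper for illustration; numeric columns are left untouched.
    """
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == object:
            # NaN is the only float that is not equal to itself.
            cleaned = [
                None if isinstance(v, float) and v != v else v
                for v in out[col]
            ]
            out[col] = pd.Series(cleaned, index=out.index, dtype=object)
    return out


# Same data as the repro above, with the object dtype made explicit.
temp = pd.DataFrame({'a': pd.Series(['test', np.nan], dtype=object)})
clean = nan_to_none_in_object_columns(temp)

# Assumes turicreate is installed:
# import turicreate as tc
# sf = tc.SFrame(clean)
```

Restricting the replacement to object columns keeps float columns as plain floats, so only the columns that trigger the type-inference failure are changed.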
This is still an issue in TuriCreate 6.4
I had the same error but a different cause. My pandas dataframe was the result of a groupby and aggregating with pd.Series.mode to find the most frequent value. Even though running df.dtypes gave the expected output and there were no NaN values, somehow the dataframe from this operation was not accepted by SFrame. I just explicitly converted every column to its proper datatype, and now all is good.
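A rough sketch of the explicit conversion described above, on made-up data. The column names, the aggregation, and the target types are all assumptions for illustration; the original dataframe was not posted. The SFrame call is left as a comment because it assumes turicreate is installed.

```python
import pandas as pd

# Hypothetical data standing in for the original dataframe.
df = pd.DataFrame({
    'user': ['a', 'a', 'b', 'b', 'b'],
    'item': ['x', 'x', 'y', 'z', 'y'],
    'qty': [1, 1, 2, 3, 2],
})

# Most frequent value per group; .iloc[0] picks the first mode if there are ties.
modes = df.groupby('user').agg(lambda s: s.mode().iloc[0]).reset_index()

# Explicitly cast every column to the type it is supposed to have.
typed = modes.astype({'user': str, 'item': str, 'qty': 'int64'})

# Assumes turicreate is installed:
# import turicreate as tc
# sf = tc.SFrame(typed)
```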