When I copy and paste this sample code to read a tabular dataset in Azure ML, it doesn't work, and it is really frustrating. I spent two hours in the documentation and tried a lot of things, and all of the documentation seems to be out of date. ... I also checked my version of azureml-core (1.0.76).
# azureml-core of version 1.0.72 or higher is required
from azureml.core import Workspace, Dataset
subscription_id = '<subscription_id>'
resource_group = '<resource_group>'
workspace_name = '<workspace_name>'
workspace = Workspace(subscription_id, resource_group, workspace_name)
dataset = Dataset.get_by_name(workspace, name='csv_file')
dataset.to_pandas_dataframe()
And I got this error message:
2020-02-07 10:05:10.961893 | ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Failure, Duration=703.32 [ms], Info = {'activity_id': '99688bb1-8e33-4f48-85b5-15185a3c6563', 'activity_name': 'to_pandas_dataframe', 'activity_type': 'PublicApi', 'app_name': 'TabularDataset', 'source': 'azureml.dataset', 'version': '1.0.76', 'completionStatus': 'Success', 'durationMs': 0.05}, Exception=DatasetExecutionError; 'MultiIndex' object has no attribute 'labels'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/data/dataset_error_handling.py in _try_execute(action, **kwargs)
82 else:
---> 83 return action()
84 except Exception as e:
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/_loggerfactory.py in wrapper(*args, **kwargs)
130 try:
--> 131 return func(*args, **kwargs)
132 except Exception as e:
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/dataflow.py in to_pandas_dataframe(self, extended_types, nulls_as_nan)
679 try:
--> 680 return get_dataframe_reader().complete_incoming_dataframe(random_id)
681 except _InconsistentSchemaError as e:
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/_dataframereader.py in complete_incoming_dataframe(self, dataframe_id)
245 from pyarrow import feather
--> 246 df = pyarrow.feather.concat_tables(partitions_dfs).to_pandas(use_threads=True)
247 return df
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.to_pandas()
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, memory_pool, categories)
620 # ARROW-1751: flatten a single level column MultiIndex for pandas 0.21.0
--> 621 columns = _flatten_single_level_multiindex(columns)
622
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pyarrow/pandas_compat.py in _flatten_single_level_multiindex(index)
751 levels, = index.levels
--> 752 labels, = index.labels
753
AttributeError: 'MultiIndex' object has no attribute 'labels'
During handling of the above exception, another exception occurred:
DatasetExecutionError Traceback (most recent call last)
<ipython-input-17-7c5efbb67be1> in <module>
----> 1 dataset.take(4).to_pandas_dataframe()
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/data/_loggerfactory.py in wrapper(*args, **kwargs)
76 with _LoggerFactory.track_activity(logger, func.__name__, activity_type, custom_dimensions) as al:
77 try:
---> 78 return func(*args, **kwargs)
79 except Exception as e:
80 if hasattr(al, 'activity_info') and hasattr(e, 'error_code'):
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/data/tabular_dataset.py in to_pandas_dataframe(self)
138 """
139 dataflow = get_dataflow_for_execution(self._dataflow, 'to_pandas_dataframe', 'TabularDataset')
--> 140 df = _try_execute(dataflow.to_pandas_dataframe)
141 return df
142
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/data/dataset_error_handling.py in _try_execute(action, **kwargs)
83 return action()
84 except Exception as e:
---> 85 raise DatasetExecutionError(str(e))
DatasetExecutionError: 'MultiIndex' object has no attribute 'labels'
Hi Anis,
It looks like you have incompatible versions of the pandas and pyarrow libraries.
pandas deprecated MultiIndex.labels (in favor of MultiIndex.codes) in version 0.24, and pyarrow stopped using it as of version 0.12.0. Your traceback shows pyarrow still reading index.labels, so the installed pyarrow is probably older than the installed pandas expects.
Could you run pip list and share your versions of pandas and pyarrow so we can investigate further?
Also, could you try updating those packages by running pip install --upgrade pyarrow pandas and then re-run the sample code?
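If it is easier than reading through the pip list output, here is a minimal sketch that prints the relevant versions and flags the likely mismatch. The helper names (installed_versions, major_minor, likely_labels_mismatch) are made up for this example and are not part of azureml, pandas, or pyarrow. The thresholds are assumptions: pyarrow 0.12 comes from the note above, and pandas 1.0 is the release that removed MultiIndex.labels entirely. It uses importlib.metadata, which needs Python 3.8 or newer, so on the Python 3.6 environment in the traceback pip list is still the simpler route.

```python
import re
from importlib import metadata  # standard library, Python 3.8+


def installed_versions(packages=("pandas", "pyarrow", "azureml-core")):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def major_minor(version):
    """Extract (major, minor) as integers from a string like '0.24.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def likely_labels_mismatch(pandas_version, pyarrow_version):
    """Heuristic only: True when pandas no longer has MultiIndex.labels
    (assumed: pandas >= 1.0) while pyarrow may still read it
    (assumed: pyarrow < 0.12, per the reply above)."""
    pandas_removed_labels = major_minor(pandas_version) >= (1, 0)
    pyarrow_still_uses_labels = major_minor(pyarrow_version) < (0, 12)
    return pandas_removed_labels and pyarrow_still_uses_labels


if __name__ == "__main__":
    found = installed_versions()
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
    if found["pandas"] and found["pyarrow"]:
        if likely_labels_mismatch(found["pandas"], found["pyarrow"]):
            print("pandas and pyarrow look mismatched; try: "
                  "pip install --upgrade pyarrow pandas")
```

This only compares version numbers, so treat a False result as "no obvious mismatch" rather than proof that the two packages are compatible.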
@myshylin
Awesome, the pip upgrade command worked. I'm closing the issue. Thank you!