Machinelearningnotebooks: dataset.download() Unsupported Linux distribution

Created on 10 Jun 2020 · 19 Comments · Source: Azure/MachineLearningNotebooks

I am trying to download an AzureML dataset on Ubuntu 20.04 using the azureml.core library. However, when I run it I get the following error:

```
Traceback (most recent call last):
File "/home/bartek/.local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 169, in attemp_get_deps
blob_deps_to_file()
File "/home/bartek/.local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 161, in blob_deps_to_file
blob = request.urlopen(deps_url, context=ssl_context)
File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/usr/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "setup/get_datasets.py", line 27, in <module>
dataset.download(target_path=f'{path}/../.datasets/{dataset_name}', overwrite=True)
File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 106, in wrapper
return func(*args, **kwargs)
File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/file_dataset.py", line 123, in download
for p in self._to_path(activity='download.to_path')]
File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/file_dataset.py", line 98, in _to_path
dataflow, portable_path = _add_portable_path_column(self._dataflow)
File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 106, in wrapper
return func(*args, **kwargs)
File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/abstract_dataset.py", line 203, in _dataflow
dataprep().api._datastore_helper._set_auth_type(self._registration.workspace)
File "/home/bartek/.local/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py", line 136, in _set_auth_type
get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(AuthType.DERIVED, json.dumps(auth)))
File "/home/bartek/.local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py", line 18, in get_engine_api
_engine_api = EngineAPI()
File "/home/bartek/.local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py", line 55, in __init__
self._message_channel = launch_engine()
File "/home/bartek/.local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/engine.py", line 300, in launch_engine
dependencies_path = runtime.ensure_dependencies()
File "/home/bartek/.local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 181, in ensure_dependencies
if not attemp_get_deps():
File "/home/bartek/.local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 175, in attemp_get_deps
raise NotImplementedError('Unsupported Linux distribution {0} {1}.{2}'.format(dist, version[0], version[1]))
NotImplementedError: Unsupported Linux distribution ubuntu 20.04
The terminal process terminated with exit code: 1
```

Are you planning to support Ubuntu 20.04? Is there a roadmap? I found this issue from 6 months ago and would really appreciate hearing whether anything has changed since then.

Right now I am using the workaround from here to make it work.

Warm regards

Data4ML SDK awaiting-product-team-response cxp product-question triaged

All 19 comments

@Radeju
Thanks for the feedback! We are currently investigating and will update you shortly.

Thanks for the feedback. We have recorded it and added it as a feature request on our roadmap.

@MayMSFT Hi May, are we good to close this, or do you want me to keep it open? Thanks.

Hi! I'm planning to switch our pipeline from 18.04 to 20.04 soon as well.
Looks like this may be a blocking issue.
Do we have a timeline for the fix?

Based on the log, it seems the distro version is checked against a whitelist. IMHO this is a bad design that will probably affect a lot of less popular distros like Arch or Mint.
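
For context, the name and version in the error message (`ubuntu 20.04`) look like the `ID` and `VERSION_ID` fields of `/etc/os-release`. Here is a minimal sketch of reading those fields, assuming a standard os-release file. The helper name `parse_os_release` is hypothetical, and nothing in this thread confirms that `dotnetcore2` detects the distribution this way.

```python
from pathlib import Path


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip("\"'")
    return fields


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        info = parse_os_release(path.read_text())
        print(info.get("ID"), info.get("VERSION_ID"))
```

Running this on the affected machine shows which distribution ID and version a whitelist-style check would have to recognise.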

Unfortunately, it depends on legal approval. @tot0 to share more details

@xkszltl Hi, unfortunately I don't have a concrete timeline for official support of new Linux distros. The legal processes involved in distributing open source packages so that Datasets normally 'just works' require care and aren't moving as fast as we'd hope.

Datasets will only return saying 'Unsupported Distro' if the required dependencies for .NET Core 2.1 are not present on default library paths AND a pre-prepared dependency set doesn't exist.
We are working on improving the error message to link out to the official .NET Core documentation on how to install the correct dependencies for supported distributions.
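
The condition described above can be sketched as follows. This is an illustration only, not the actual `dotnetcore2` code. The function names and the example library names are assumptions, and `ctypes.util.find_library` stands in for however the package really probes the default library paths.

```python
from ctypes.util import find_library


def missing_libraries(names):
    """Return the library names that cannot be found on the default paths."""
    return [name for name in names if find_library(name) is None]


def is_unsupported(missing, prepared_deps_available):
    """True only when libraries are missing AND no pre-prepared set exists."""
    return bool(missing) and not prepared_deps_available


if __name__ == "__main__":
    # Example names only; the real list of .NET Core 2.1 dependencies may differ.
    missing = missing_libraries(["ssl", "icuuc"])
    print("missing:", missing)
    print("unsupported:", is_unsupported(missing, prepared_deps_available=False))
```

The check has two parts, so installing the missing system libraries is enough to avoid the error even when no pre-prepared dependency set exists for the distribution.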

@xkszltl Would you be able to try the first command here to install .NET Core's dependencies for Ubuntu 20.04 and see if you're able to use dataset.download()?
https://docs.microsoft.com/en-us/dotnet/core/install/linux-ubuntu#2004-

Of course, if it is just a matter of installing .NET it's totally fine for us.
Actually we will do that regardless of the use of AML Datasets.

Is 2.1 an exact or a minimum requirement?
Can we use a later version, namely 2.2 or 3+?

Currently Datasets requires .NET Core 2.1.

@Radeju
We will now proceed to close this thread. If there are further questions regarding this matter, please respond here and @YutongTie-MSFT and we will gladly continue the discussion.

I'm getting the same issue when trying to use "from azureml.opendatasets import Diabetes", with the error "Unsupported Linux distribution ubuntu 20.04". I tried the steps suggested by @tot0, but they didn't resolve it:
https://docs.microsoft.com/en-us/dotnet/core/install/linux-ubuntu#2004-

@YutongTie-MSFT

I hit this error again when trying to access my own dataset in a storage account blob; the error is shown below. The code is run in a local Jupyter notebook on Ubuntu 20.04 and comes from the "day1-part4-data" notebook:
https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/get-started-day1/day1-part4-data.ipynb

which fails on the line:

    dataset = Dataset.File.from_files(path=(datastore, 'datasets/cifar10'))

```
HTTPError Traceback (most recent call last)
~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/runtime.py in attemp_get_deps()
198 try:
--> 199 blob_deps_to_file()
200 success = True

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/runtime.py in blob_deps_to_file()
190 ssl_context = ssl.create_default_context(cafile=cafile)
--> 191 blob = request.urlopen(deps_url, context=ssl_context)
192 with open(deps_tar_path, 'wb') as f:

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
221 opener = _opener
--> 222 return opener.open(url, data, timeout)
223

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in open(self, fullurl, data, timeout)
530 meth = getattr(processor, meth_name)
--> 531 response = meth(req, response)
532

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in http_response(self, request, response)
639 if not (200 <= code < 300):
--> 640 response = self.parent.error(
641 'http', request, response, code, msg, hdrs)

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in error(self, proto, *args)
568 args = (dict, 'default', 'http_error_default') + orig_args
--> 569 return self._call_chain(*args)
570

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
501 func = getattr(handler, meth_name)
--> 502 result = func(*args)
503 if result is not None:

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649 raise HTTPError(req.full_url, code, msg, hdrs, fp)
650

HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:

NotImplementedError Traceback (most recent call last)
in <module>
----> 1 dataset = Dataset.File.from_files(path=(datastore, 'datasets/cifar10'))

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/data/_loggerfactory.py in wrapper(*args, **kwargs)
124 with _LoggerFactory.track_activity(logger, func.__name__, activity_type, custom_dimensions) as al:
125 try:
--> 126 return func(*args, **kwargs)
127 except Exception as e:
128 if hasattr(al, 'activity_info') and hasattr(e, 'error_code'):

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/data/dataset_factory.py in from_files(path, validate)
702 from azureml.data import FileDataset
703
--> 704 dataflow = dataprep().api.dataflow.Dataflow._path_to_get_files_block(_validate_and_normalize_path(path))
705 if validate:
706 _validate_has_data(dataflow, 'Cannot load any data from the specified path. '

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/dataflow.py in _path_to_get_files_block(path, archive_options)
2423 try:
2424 if _is_datapath(path) or _is_datapaths(path):
-> 2425 return datastore_to_dataflow(path)
2426 except ImportError:
2427 pass

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py in datastore_to_dataflow(data_source, query_timeout)
25 datastore_values = []
26 for source in data_source:
---> 27 datastore, datastore_value = get_datastore_value(source)
28 if not _is_fs_datastore(datastore):
29 raise NotSupportedDatastoreTypeError(datastore)

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py in get_datastore_value(data_source)
78
79 workspace = datastore.workspace
---> 80 _set_auth_type(workspace)
81 return (datastore, DatastoreValue(
82 subscription=workspace.subscription_id,

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py in _set_auth_type(workspace)
141 get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(AuthType.SERVICEPRINCIPAL, json.dumps(auth)))
142 else:
--> 143 get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(AuthType.DERIVED, json.dumps(auth)))
144
145

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py in get_engine_api()
17 global _engine_api
18 if not _engine_api:
---> 19 _engine_api = EngineAPI()
20
21 from .._dataset_resolver import register_dataset_resolver

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py in __init__(self)
66 pass
67
---> 68 self._message_channel = launch_engine()
69 connect_to_requests_channel()
70

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/engine.py in launch_engine()
331 engine_path = _get_engine_path()
332 try:
--> 333 dependencies_path = runtime.ensure_dependencies()
334 except Exception as e:
335 _LoggerFactory.trace(log, 'Failed to ensure dependencies' + str(e))

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/runtime.py in ensure_dependencies()
211 return success
212
--> 213 if not attemp_get_deps():
214 # Failed accessing blob, likely an interrupted connection. Try again once more.
215 if not attemp_get_deps():

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/runtime.py in attemp_get_deps()
205 err_msg = 'Unsupported Linux distribution {0} {1}.{2}'.format(dist, version[0], version[1])
206 log_event('ensure_dependencies', error=err_msg, missing_pkgs=list(missing_pkgs))
--> 207 raise NotImplementedError(err_msg)
208 except Exception as e:
209 logger.debug("Exception when accessing blob: " + str(e))

NotImplementedError: Unsupported Linux distribution ubuntu 20.04
```

Hi @corticalstack, could you try running the Python snippet below in your Ubuntu 20.04 environment?

    from dotnetcore2 import runtime
    runtime._enable_debug_logging()
    runtime.ensure_dependencies()

This should reveal which dependencies are missing for Datasets.

When you installed .NET Core 2.1 ahead of time, did you install dotnet-runtime-3.1 or dotnet-runtime-2.1?

Cheers.
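
When running this check from a script, it may be convenient to catch the failure rather than let it propagate. This is a minimal sketch, assuming `runtime.ensure_dependencies()` raises `NotImplementedError` on failure and returns a dependencies path otherwise, as the tracebacks above suggest. The wrapper `check_dotnet_deps` and its `ensure` parameter are hypothetical, added so the logic can be exercised without `dotnetcore2` installed.

```python
def check_dotnet_deps(ensure=None):
    """Return (ok, detail) describing whether the dependency check passed."""
    if ensure is None:
        # Imported lazily so this module loads even without dotnetcore2.
        from dotnetcore2 import runtime
        ensure = runtime.ensure_dependencies
    try:
        # On success, detail is whatever the check returned (a path).
        return True, ensure()
    except NotImplementedError as exc:
        # Raised with a message such as "Unsupported Linux distribution ...".
        return False, str(exc)
```

Calling `check_dotnet_deps()` with no arguments uses the real runtime check; passing a callable is only intended for testing.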

@tot0 Wrt .NET Core 2.1, I believe it was 3.1 as per:
https://docs.microsoft.com/en-us/dotnet/core/install/linux-ubuntu#2004-

Within a Jupyter notebook I added the 3 lines as requested, then executed:

    dataset = Dataset.File.from_files(path=(datastore, 'datasets/cifar10'))

And got what seems like multiple errors trying to log in DEBUG mode:

    DEBUG - Could not load run context RunEnvironmentException: Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run. InnerException None ErrorResponse { "error": { "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run." } }, switching offline: False
    DEBUG - Could not load the run context and allow_offline set to False
    DEBUG - Could not load run context RunEnvironmentException: Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run. InnerException None ErrorResponse { "error": { "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run." } }, switching offline: False
    DEBUG - Could not load the run context and allow_offline set to False
    DEBUG - Could not load run context RunEnvironmentException: Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run. InnerException None ErrorResponse { "error": { "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run." } }, switching offline: False
    DEBUG - Could not load the run context and allow_offline set to False
    DEBUG - Created a static thread pool for ServiceContext class
    DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - Authority:Performing instance discovery: ...
    DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - Authority:Performing static instance discovery
    DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - Authority:Authority validated via static instance discovery
    DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - TokenRequest:Getting token from cache with refresh if necessary.
    DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:finding with query keys: {'_clientId': '...', 'userId': '...'}
    DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:Looking for potential cache entries: {'_clientId': '...', 'userId': '...'}
    DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:Found 2 potential entries.
    DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:Resource specific token found.
    DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:Returning token from cache lookup, AccessTokenId: b'ji5H/ccIOfhlbO6LhVa6SPJm1T+uGkOaz40LghSXBzc=', RefreshTokenId: b'WKAoyST6eg+Go79SJMjKcyHKHQ1z1tWx146fEyzlv8M='
    DEBUG - Could not load run context RunEnvironmentException: Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run. InnerException None ErrorResponse { "error": { "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run." } }, switching offline: False
    DEBUG - Could not load the run context and allow_offline set to False

@corticalstack Hmmm, those RunContext debug logs make sense coming from the Dataset calls, but they shouldn't have happened during the runtime.ensure_dependencies() call. What version of dotnetcore2 is installed in your environment?
Would it be possible to just see the outcome of running the 3 lines I shared, without the from_files call? Thanks!

Unfortunately the .NET Core docs don't have any specific 2.1 advice anymore. The package dotnet-runtime-2.1 does exist though and I recommend installing that instead of dotnet-runtime-3.1.

@tot0 version installed is 2.1.15 of dotnetcore2

The only Jupyter output from the 3 lines you shared is as follows:
'/home/jp/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/bin/deps'

Thanks

OK, so if runtime.ensure_dependencies() returns a path like the one you shared, that means all the dependencies .NET Core needs to run exist locally.
dotnetcore2==2.1.17 is the newest version. It upgrades the underlying .NET Core runtime to support the newer OpenSSL versions installed on newer Linux distros (Ubuntu 20 included). It does not yet add full support for all the dependencies required on Ubuntu 20 (so the pre-install steps via apt-get are still required), but using the newer version of dotnetcore2 should enable Datasets to run on Ubuntu 20.
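
To check whether an environment already has a new enough dotnetcore2, something like the following may help. It is a minimal sketch, assuming plain dotted version strings such as `2.1.17`. The helper names are hypothetical, and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata  # standard library from Python 3.8


def version_tuple(version):
    """Convert a simple dotted version such as '2.1.17' to a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_new_enough(installed, minimum="2.1.17"):
    """True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_dotnetcore2_version():
    """Return the installed dotnetcore2 version string, or None if absent."""
    try:
        return metadata.version("dotnetcore2")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_dotnetcore2_version()
    if found is None:
        print("dotnetcore2 is not installed")
    else:
        print(found, "OK" if is_new_enough(found) else "older than 2.1.17")
```

If the reported version is older than 2.1.17, upgrading with `pip install --upgrade dotnetcore2` is the step described above.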

@tot0 I pip-uninstalled dotnetcore2 2.1.15 and installed the latest version; all good. Thanks!

    from dotnetcore2 import runtime
    runtime._enable_debug_logging()
    runtime.ensure_dependencies()

    NotImplementedError: Unsupported Linux distribution ubuntu 20.10

    pip install dotnetcore2
    Collecting dotnetcore2
    Using cached dotnetcore2-2.1.19-py3-none-manylinux1_x86_64.whl (28.7 MB)
    Requirement already satisfied: distro>=1.2.0 in ./.conda/envs/p8/lib/python3.8/site-packages (from dotnetcore2) (1.5.0)
    Installing collected packages: dotnetcore2
    Successfully installed dotnetcore2-2.1.19

Any ideas?

Issue solved

    sudo apt install dotnet-runtime-2.1
    The following packages have unmet dependencies:
    dotnet-runtime-deps-2.1 : Depends: libicu but it is not installable or
    libicu66 but it is not installable or
    libicu65 but it is not installable or
    libicu63 but it is not installable or
    libicu60 but it is not installable or
    libicu57 but it is not installable or
    libicu55 but it is not installable or
    libicu52 but it is not installable
    E: Unable to correct problems, you have held broken packages.

1) Install libicu

    wget http://ftp.us.debian.org/debian/pool/main/i/icu/libicu63_63.2-3_amd64.deb
    sudo dpkg -i libicu63_63.2-3_amd64.deb

2) sudo apt install dotnet-runtime-2.1

I don't know if there is a better way to do this.
