Short description
When trying to load a dataset I get an error "Problem with the SSL CA cert (path? access rights?)" and a subsequent error when six.reraise is called
Environment information
tensorflow-datasets version: 3.2.1
tensorflow version: 2.2.0
Reproduction instructions
tfds.builder("imagenet2012").info
Link to logs
Traceback (most recent call last):
File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 399, in try_reraise
yield
File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/registered.py", line 244, in builder
return builder_cls(name)(**builder_kwargs)
File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/api_utils.py", line 69, in disallow_positional_args_dec
return fn(*args, **kwargs)
File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 206, in __init__
self.info.initialize_from_bucket()
File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_info.py", line 423, in initialize_from_bucket
data_files = gcs_utils.gcs_dataset_info_files(self.full_name)
File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/gcs_utils.py", line 71, in gcs_dataset_info_files
return gcs_listdir(posixpath.join(GCS_DATASET_INFO_DIR, dataset_dir))
File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/gcs_utils.py", line 64, in gcs_listdir
if is_gcs_disabled() or not tf.io.gfile.exists(root_dir):
File "/sw/installed/TensorFlow/2.1.0-fosscuda-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/lib/io/file_io.py", line 280, in file_exists_v2
pywrap_tensorflow.FileExists(compat.as_bytes(path))
tensorflow.python.framework.errors_impl.AbortedError: All 10 retry attempts failed. The last failure: Unavailable: Error executing an HTTP request: libcurl code 77 meaning 'Problem with the SSL CA cert (path? access rights?)', error details: error setting certificate verify locations:
CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: none
when reading metadata of gs://tfds-data/dataset_info/imagenet2012/5.0.0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "hvd_dnn_benchmark.py", line 231, in <module>
run() #pylint: disable=no-value-for-parameter
File "/sw/installed/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/sw/installed/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/sw/installed/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/sw/installed/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "hvd_dnn_benchmark.py", line 105, in run
dataset = get_dataset(dataset, synthetic=synthetic_data)
File "/home/h3/s3248973/git/tensorflow_tests/benchmark/datasets.py", line 87, in get_dataset
return _AVAIL[name](synthetic)
File "/home/h3/s3248973/git/tensorflow_tests/benchmark/datasets.py", line 77, in _imagenet
return TFDS_Dataset('imagenet2012', synthetic)
File "/home/h3/s3248973/git/tensorflow_tests/benchmark/datasets.py", line 54, in __init__
info = tfds.builder(name).info
File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/registered.py", line 244, in builder
return builder_cls(name)(**builder_kwargs)
File "/sw/installed/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 401, in try_reraise
reraise(*args, **kwargs)
File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 392, in reraise
six.reraise(exc_type, exc_type(msg), exc_traceback)
TypeError: __init__() missing 2 required positional arguments: 'op' and 'message'
Expected behavior
No error
Hi @Flamefire,
Could you run this code snippet on your machine
import tensorflow as tf
tf.io.gfile.exists("gs://tfds-data/dataset_info/mnist/3.0.1")
and share the results?
(Similar issue https://github.com/tensorflow/datasets/issues/2190)
Good idea, that fails with the first exception above. Found the cause here: https://github.com/tensorflow/tensorflow/issues/40065
So that's on TensorFlow.
However, it might still be worth looking into the second issue, where the reraise fails; that one looks like it is caused by TFDS.
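The second error in the traceback comes from rebuilding the exception as exc_type(msg): that only works when the exception class accepts a single message argument. Below is a minimal sketch of the failure and of one possible tolerant fallback. FakeOpError, naive_reraise, tolerant_reraise and demo are hypothetical names used for illustration; this is not the TFDS code, and not necessarily how the linked PR addresses it.

```python
import sys


class FakeOpError(Exception):
    """Hypothetical stand-in for an exception whose constructor needs 3 arguments."""

    def __init__(self, node_def, op, message):
        super().__init__(message)
        self.node_def = node_def
        self.op = op
        self.message = message


def naive_reraise(prefix):
    """Mimics the failing pattern: rebuild the exception from a message only."""
    exc_type, exc_value, exc_tb = sys.exc_info()
    # Raises TypeError here if exc_type needs more than one argument.
    raise exc_type(prefix + str(exc_value)).with_traceback(exc_tb)


def tolerant_reraise(prefix):
    """Falls back to RuntimeError when the original type cannot be rebuilt."""
    exc_type, exc_value, exc_tb = sys.exc_info()
    msg = prefix + str(exc_value)
    try:
        new_exc = exc_type(msg)
    except TypeError:
        # Keep the original type name in the message so it is not lost.
        new_exc = RuntimeError("{}: {}".format(exc_type.__name__, msg))
    raise new_exc.with_traceback(exc_tb) from exc_value


def demo(reraise_fn):
    """Raise a FakeOpError, re-raise it with reraise_fn, report what surfaces."""
    try:
        try:
            raise FakeOpError(None, None, "SSL CA cert problem")
        except FakeOpError:
            reraise_fn("Failed to construct dataset: ")
    except Exception as e:  # broad on purpose: we only inspect the result
        return type(e).__name__, str(e)


if __name__ == "__main__":
    print(demo(naive_reraise))
    print(demo(tolerant_reraise))
```

With the naive variant the original error is masked by a TypeError about the missing 'op' and 'message' arguments, as in the log above; the tolerant variant keeps the original message visible.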
Note that the fix seems to be for a different issue.
This issue is not on Windows but on RHEL, and the cause is that TensorFlow misconfigures the bundled libcurl, causing a certificate error at runtime. So I doubt that PR fixes anything related to this.
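To check whether a machine is affected, one can test whether the CA file named in the error message actually exists and is readable. The sketch below is only a diagnostic under assumptions: the first path is taken from the traceback, the two ca-bundle.crt locations are assumed candidates for RHEL-like systems, and readable_ca_files is a hypothetical helper name.

```python
import os

# Path reported as "CAfile" in the error message of the traceback.
TF_CA_FILE = "/etc/ssl/certs/ca-certificates.crt"

# Assumed candidate locations of a CA bundle; adjust for your distribution.
CANDIDATE_CA_FILES = (
    TF_CA_FILE,
    "/etc/ssl/certs/ca-bundle.crt",
    "/etc/pki/tls/certs/ca-bundle.crt",
)


def readable_ca_files(candidates=CANDIDATE_CA_FILES):
    """Return the candidate paths that are existing, readable regular files."""
    return [p for p in candidates if os.path.isfile(p) and os.access(p, os.R_OK)]


if __name__ == "__main__":
    found = readable_ca_files()
    if TF_CA_FILE in found:
        print("Found the CA file reported in the error:", TF_CA_FILE)
    else:
        print("Missing or unreadable:", TF_CA_FILE)
        print("Other CA bundles found:", found)
```

If the reported path is missing while another bundle is found, the certificate error above is the expected result.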
The reraise error message should be fixed by https://github.com/tensorflow/datasets/pull/2377. However, it seems my fix has a bug with pytest, and I haven't had time to investigate further.
Here is a workaround for this problem: downgrade tensorflow-datasets:
pip install tensorflow-datasets==3.0.0
None of the other solutions worked for me; only this one did.
I'm not sure which solutions you are talking about, but I'll include two:
/etc/ssl/certs/ca-certificates.crt (likely) to ca-bundle.crt (make sure this exists, TF looks for /etc/ssl/certs/ca-certificates.crt only)
$TF_SYSTEM_LIBS. So in this case set export TF_SYSTEM_LIBS='curl' before building TF
I would recommend you try tfds-nightly, as we might have sent an update about this recently.
Would you be able to tell us more about that update? How do you work around the issue that the curl bundled with TF uses the wrong certificate path? Don't you use TF for downloading anymore?
Reading from GCS is optional, so it should skip GCS rather than fail.
If it doesn't work in tfds-nightly, could you try adding tf.errors.AbortedError in https://github.com/tensorflow/datasets/blob/adb320ff04b6e93c561dacb2b647c8fcbfea92f3/tensorflow_datasets/core/utils/gcs_utils.py#L43
and send us a PR?
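The "skip GCS rather than fail" idea can be sketched as follows: wrap the existence check and treat a known set of "unreachable" errors as "not available". This is an illustration only, not the contents of gcs_utils.py. FakeAbortedError, GCS_UNAVAILABLE_ERRORS, exists_or_skip and failing_exists are hypothetical names, and the stand-in exception lets the sketch run without TensorFlow.

```python
class FakeAbortedError(Exception):
    """Hypothetical stand-in for tf.errors.AbortedError."""


# Hypothetical tuple of errors that mean "GCS cannot be reached".
GCS_UNAVAILABLE_ERRORS = (FakeAbortedError,)


def exists_or_skip(exists_fn, path, skip_errors=GCS_UNAVAILABLE_ERRORS):
    """Return exists_fn(path), or False if GCS turns out to be unreachable."""
    try:
        return bool(exists_fn(path))
    except skip_errors:
        # GCS is optional: report "not there" instead of aborting.
        return False


def failing_exists(path):
    """Simulates the certificate failure from the traceback."""
    raise FakeAbortedError("All 10 retry attempts failed")


if __name__ == "__main__":
    print(exists_or_skip(failing_exists, "gs://tfds-data/dataset_info/mnist/3.0.1"))
```

With TensorFlow installed, one would presumably pass tf.io.gfile.exists as exists_fn and a tuple containing tf.errors.AbortedError as skip_errors; errors outside that tuple still propagate.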
My first attempt at installing it failed because it seems to require Bazel. Are you going to make Bazel an actual requirement of TFDS? If so, please don't! Using Bazel on HPC systems is a nightmare.
After getting past this, it does indeed work without further changes.
TFDS does not require Bazel. Are you perhaps confusing it with TensorFlow?
Oh, you are right. This is from a dependency added in https://github.com/tensorflow/datasets/commit/6e2540f85bbfd0312d5611c2612cc38de98084f3, which adds dm-tree, whose output I confused with that of TFDS because I did not expect a new library to be installed. That one requires Bazel, so TFDS requires Bazel indirectly. Example output:
running build_ext
bazel build //tree:_tree --symlink_prefix=build/temp.linux-ppc64le-3.7/bazel- --compilation_mode=opt
unable to execute 'bazel': No such file or directory
As this is only required in 2 places and at least the "flatten" is rather trivial to implement, maybe it is possible to avoid it?
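As a rough idea of how small such a replacement could be, here is a minimal sketch of a flatten for nested dicts, lists and tuples. It is an assumption-laden illustration, not a drop-in replacement for dm-tree: it ignores namedtuples and other container types, and visiting dict keys in sorted order is a choice made here to get a deterministic result.

```python
def flatten(structure):
    """Flatten nested dicts/lists/tuples into a flat list of leaves."""
    if isinstance(structure, dict):
        leaves = []
        # Sorted keys give a deterministic order regardless of insertion order.
        for key in sorted(structure):
            leaves.extend(flatten(structure[key]))
        return leaves
    if isinstance(structure, (list, tuple)):
        leaves = []
        for item in structure:
            leaves.extend(flatten(item))
        return leaves
    # Anything else is treated as a leaf.
    return [structure]


if __name__ == "__main__":
    print(flatten({"b": [1, 2], "a": (3, {"c": 4})}))  # [3, 4, 1, 2]
```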
When are you seeing this error (which command are you running)?
The only dependency tree requires is six (https://github.com/deepmind/tree/blob/master/requirements.txt).
You shouldn't need to compile tree from source. Why not pip install?
I did use pip install tfds-nightly, which basically reduces to pip install dm-tree. I guess the reason I see this is that I ran it on POWER, where no prebuilt binary exists: https://pypi.org/project/dm-tree/#files
The same will happen on ARM clusters, or when this is to be installed as a module for all users of an HPC system, which requires (or at least strongly prefers) source builds.
It seems that TF exposes the same pre-built packages as dm-tree: https://pypi.org/project/tensorflow/#files
So I'm a little confused why the problem exists with dm-tree but not with TensorFlow. Or am I missing something?
We already build TensorFlow from source. This is required for our POWER (and likely ARM) nodes (no wheel) and is better in general, as performance really matters there. So compiling with arch-specific optimizations and GPU-specific architectures enabled has advantages. However, their use of Bazel is a constant source of problems. Basically, every TF release requires new patches to make it work, some of which take multiple days to come up with. Ultimately, their use of Bazel is what led to this issue: hardcoding paths during the build to avoid using the native configure or the CMake integration of the dependencies (in this case cURL).
So to summarize: using Bazel is a huge pain on HPC systems, and hence requiring it for building Python packages is a major disadvantage. I already opened an issue with dm-tree saying that the advantages of (their use of) Bazel are small compared to the trouble they get for it, and the same applies here: the advantages of dm-tree (2 functions) are small compared to having another dependency, especially one that is hard to get in some environments.
Thank you for the explanations.
The reason we added dm-tree is that we would like to gradually layer our dependencies better. Ultimately, we would like to have a "core" library which doesn't depend on TF at all, but instead uses smaller independent libs (dm-tree, gfile, ...).
This would allow building different front-ends (e.g. for Jax users) which don't need the full TF package.
But this is more of a long-term plan. We can remove the dm-tree dependency in the meantime. Please send a PR if you'd like.
I think the issues here have been fixed. Please open a new issue otherwise.