I am using tfds-nightly and downloaded the wikipedia dataset using:
python -m tensorflow_datasets.scripts.download_and_prepare --datasets=wikipedia/20190301.en
Now I am trying to access it using:
ds, info = tfds.load('wikipedia/20190301.en:1.0.0', download=False, shuffle_files=True, with_info=True)
But I receive this error:
ERROR:absl:Failed to construct dataset wikipedia
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
in
1 # Construct a tf.data.Dataset
----> 2 ds, info = tfds.load('wikipedia/20190301.en:1.0.0', download=False, shuffle_files=True, with_info=True)
~\Anaconda3\envs\docBert\lib\site-packages\tensorflow_datasets\core\api_utils.py in disallow_positional_args_dec(fn, instance, args, kwargs)
50 ismethod = instance is not None
51 _check_no_positional(fn, args, ismethod, allowed=allowed)
---> 52 _check_required(fn, kwargs)
53 return fn(*args, **kwargs)
54
~\Anaconda3\envs\docBert\lib\site-packages\tensorflow_datasets\core\registered.py in load(name, split, data_dir, batch_size, in_memory, shuffle_files, download, as_supervised, decoders, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
295 [the guide](https://github.com/tensorflow/datasets/tree/master/docs/decode.md)
296 for more info.
--> 297 read_config: `tfds.ReadConfig`, Additional options to configure the
298 input pipeline (e.g. seed, num parallel reads,...).
299 with_info: `bool`, if True, tfds.load will return the tuple
~\Anaconda3\envs\docBert\lib\site-packages\tensorflow_datasets\core\registered.py in builder(name, **builder_init_kwargs)
167 elif class_dict.get("IN_DEVELOPMENT"):
168 _IN_DEVELOPMENT_REGISTRY[name] = builder_cls
--> 169 else:
170 _DATASET_REGISTRY[name] = builder_cls
171 return builder_cls
~\Anaconda3\envs\docBert\lib\site-packages\tensorflow_datasets\core\api_utils.py in disallow_positional_args_dec(fn, instance, args, kwargs)
50 ismethod = instance is not None
51 _check_no_positional(fn, args, ismethod, allowed=allowed)
---> 52 _check_required(fn, kwargs)
53 return fn(*args, **kwargs)
54
~\Anaconda3\envs\docBert\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in __init__(self, data_dir, config, version)
178 `builder_config`s will have their own subdirectories and versions.
179 version: `str`. Optional version at which to load the dataset. An error is
--> 180 raised if specified version cannot be satisfied. Eg: '1.2.3', '1.2.*'.
181 The special value "experimental_latest" will use the highest version,
182 even if not default. This is not recommended unless you know what you
~\Anaconda3\envs\docBert\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in _pick_version(self, requested_version)
209 def __setstate__(self, state):
210 self.__init__(**state)
--> 211
212 @utils.memoized_property
213 def canonical_version(self):
AssertionError: Dataset wikipedia cannot be loaded at version 1.0.0, only: 0.0.3.
@dvirginz you are providing the wrong config. Replace 20190301 with 20200301; see here.
But this is what tfds downloaded: tensorflow_datasets\wikipedia\20190301.en\1.0.0
Try again after reinstalling tfds-nightly. The wikipedia dataset is currently at version 1.0.0, but according to your stack trace your installed tfds can only load 0.0.3.
Also, use only the latest config, 20200301.
Edit: It works fine with the 20190301 config too (see this colab), but that is an older config, so using the latest one is recommended. I also think that after reinstalling tfds-nightly, or simply cloning this repo, it will work for you with 20190301.
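To confirm the mismatch described above, it can help to compare the version that was prepared on disk with the version named in the AssertionError. The following is only a rough sketch using the standard library: `prepared_versions` is a hypothetical helper (not part of tfds), and it assumes the data sits in the usual `~/tensorflow_datasets` directory with the `<dataset>/<config>/<version>` layout shown in the path above.

```python
from pathlib import Path


def prepared_versions(data_dir, dataset="wikipedia", config="20190301.en"):
    """Hypothetical helper: list version folders under <data_dir>/<dataset>/<config>.

    Assumes the on-disk layout <data_dir>/<dataset>/<config>/<version>,
    as in tensorflow_datasets\\wikipedia\\20190301.en\\1.0.0 from this thread.
    """
    config_dir = Path(data_dir) / dataset / config
    if not config_dir.is_dir():
        # Nothing has been prepared for this dataset/config pair.
        return []
    return sorted(p.name for p in config_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Assumption: the default tfds data directory is used.
    default_dir = Path.home() / "tensorflow_datasets"
    print("Prepared on disk:", prepared_versions(default_dir))
```

If this lists 1.0.0 while the error says "only: 0.0.3", the tfds that prepared the data and the tfds that is imported in the notebook are probably different installs. Printing `tfds.__version__` in the failing environment is a quick way to check that.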
@dvirginz it seems that your issue is solved, so please close the issue.