Librispeech is missing a unit test and fake data.
Please follow:
https://github.com/tensorflow/datasets/blob/master/docs/add_dataset.md#testing-mydataset
@cyfra I want to start working on this issue. Please assign this to me.
Thanks @ChanchalKumarMaji!
When I run this code on my fake_dataset
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow_datasets import testing
from tensorflow_datasets.audio import librispeech


class LibrispeechTest(testing.DatasetBuilderTestCase):
  DATASET_CLASS = librispeech.Librispeech
  SPLITS = {
      "train": 1,
      "test": 1,
  }


if __name__ == "__main__":
  testing.test_main()
I get this error:
.Testing config clean100_plain_text
Downloading / extracting dataset librispeech (?? GiB) to /tmp/librispeech_testufs4ovss/tmp_6wchvyh/librispeech/clean100_plain_text/0.1.0...
E..s
======================================================================
ERROR: test_download_and_prepare_as_dataset (__main__.LibrispeechTest)
test_download_and_prepare_as_dataset (__main__.LibrispeechTest)
Run the decorated test method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/chanchal/anaconda3/lib/python3.6/site-packages/tensorflow_datasets/testing/test_utils.py", line 167, in decorated
    f(self, *args, **kwargs)
  File "/home/chanchal/anaconda3/lib/python3.6/site-packages/tensorflow_datasets/testing/dataset_builder_testing.py", line 192, in test_download_and_prepare_as_dataset
    self._download_and_prepare_as_dataset(builder)
  File "/home/chanchal/anaconda3/lib/python3.6/site-packages/tensorflow_datasets/testing/dataset_builder_testing.py", line 209, in _download_and_prepare_as_dataset
    builder.download_and_prepare(download_config=download_config)
  File "/home/chanchal/anaconda3/lib/python3.6/site-packages/tensorflow_datasets/core/api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "/home/chanchal/anaconda3/lib/python3.6/site-packages/tensorflow_datasets/core/dataset_builder.py", line 219, in download_and_prepare
    max_examples_per_split=download_config.max_examples_per_split)
  File "/home/chanchal/anaconda3/lib/python3.6/site-packages/tensorflow_datasets/core/dataset_builder.py", line 650, in _download_and_prepare
    for split_generator in self._split_generators(dl_manager):
  File "/home/chanchal/anaconda3/lib/python3.6/site-packages/tensorflow_datasets/audio/librispeech.py", line 195, in _split_generators
    self._vocab_text_gen(extracted_dirs[tfds.Split.TRAIN]))
TypeError: string indices must be integers
----------------------------------------------------------------------
Ran 5 tests in 0.190s
FAILED (errors=1, skipped=1)
I think I am missing something. @rsepassi @cyfra @Conchylicultor please help.
@ChanchalKumarMaji You need to pass a dictionary for the file paths. This can be done by assigning a dictionary to DL_EXTRACT_RESULT.
Its keys are tfds.Split.TRAIN, tfds.Split.TEST, and tfds.Split.VALIDATION, and its values are the paths to the directories holding the fake samples for the respective splits.
Reference: https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/image/mnist_test.py
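The TypeError in the traceback is what Python raises when a plain string is indexed with a non-integer key, which is consistent with the extraction result being a single path instead of a dictionary. A minimal sketch of that behaviour, independent of TFDS; the names `extracted_as_str`, `extracted_as_dict`, `lookup` and the paths are hypothetical and used only for illustration:

```python
# Hypothetical values for illustration only; not taken from TFDS itself.
extracted_as_str = "/tmp/fake_data"  # a single path, i.e. a plain string
extracted_as_dict = {"train": "train-clean-100", "test": "test-clean"}


def lookup(extracted, split_name):
    """Index the extraction result by split name."""
    return extracted[split_name]


# Indexing a dict by split name works.
train_dir = lookup(extracted_as_dict, "train")

# Indexing a str with a str key raises TypeError.
try:
    lookup(extracted_as_str, "train")
    raised = False
except TypeError:
    raised = True
```

Here `train_dir` ends up as the directory name stored in the dictionary, while the string case fails with a TypeError.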
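import tensorflow_datasets as tfds  # needed for tfds.Split below
from tensorflow_datasets import testing
from tensorflow_datasets.audio import librispeech


class LibrispeechTest(testing.DatasetBuilderTestCase):
  DATASET_CLASS = librispeech.Librispeech
  # Expected number of examples per split.
  SPLITS = {
      "train": 1,
      "test": 1,
      "dev": 1,
  }
  # Fake-data directory for each split.
  DL_EXTRACT_RESULT = {
      tfds.Split.TRAIN: "train-clean-100",
      tfds.Split.TEST: "test-clean",
      tfds.Split.VALIDATION: "dev-clean",
  }


if __name__ == "__main__":
  testing.test_main()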
gives
Traceback (most recent call last):
  File "/home/chanchal/anaconda3/lib/python3.6/site-packages/tensorflow_datasets/testing/test_utils.py", line 167, in decorated
    f(self, *args, **kwargs)
  File "/home/chanchal/anaconda3/lib/python3.6/site-packages/tensorflow_datasets/testing/dataset_builder_testing.py", line 192, in test_download_and_prepare_as_dataset
    self._download_and_prepare_as_dataset(builder)
  File "/home/chanchal/anaconda3/lib/python3.6/site-packages/tensorflow_datasets/testing/dataset_builder_testing.py", line 212, in _download_and_prepare_as_dataset
    self._assertAsDataset(builder)
  File "/home/chanchal/anaconda3/lib/python3.6/site-packages/tensorflow_datasets/testing/dataset_builder_testing.py", line 236, in _assertAsDataset
    self.assertLen(examples, expected_examples_number)
  File "/home/chanchal/anaconda3/lib/python3.6/site-packages/absl/testing/absltest.py", line 784, in assertLen
    container_repr, len(container), expected_len), msg)
  File "/home/chanchal/anaconda3/lib/python3.6/site-packages/absl/testing/absltest.py", line 1609, in fail
    return super(TestCase, self).fail(self._formatMessage(prefix, msg))
AssertionError: [] has length of 0, expected 1.
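The assertion says that zero examples were produced where one was expected. One quick sanity check is to count the files under each split directory of the fake data. This is only a sketch: the helper name `count_files_per_split` and the root path in the usage note are hypothetical, and it does not check whether the files match the layout the builder expects.

```python
import os


def count_files_per_split(root, split_dirs):
    """Return {split_dir: number of files found recursively under root/split_dir}."""
    counts = {}
    for split_dir in split_dirs:
        total = 0
        # os.walk yields nothing for a missing directory, so the count stays 0.
        for _dirpath, _dirnames, filenames in os.walk(os.path.join(root, split_dir)):
            total += len(filenames)
        counts[split_dir] = total
    return counts
```

For example, `count_files_per_split("path/to/fake_data", ["train-clean-100", "test-clean", "dev-clean"])` returns a count per directory; a zero points at a split directory that is missing or empty.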
CHAPTERS.TXT is
; Some pipe(|) separated metadata about the audio chapters included in the corpus.
;
; The meaning of the fields in left-to-right order is as follows:
;
; chapter_id: the ID of the chapter in the LibriVox's database
; reader_id: the ID of the reader in the LibriVox's database
; duration: how many minutes of this chapter are used in the corpus
; subset: the corpus subset to which this chapter is assigned
; project_id: the LibriVox project ID
; book_id: the Project Gutenberg's ID for the book on which the LibriVox project is based
; chapter_title: the title of the chapter on LibriVox
; project_title: the title of the LibriVox project
;
;ID |READER|MINUTES| SUBSET | PROJ.|BOOK ID| CH. TITLE | PROJECT TITLE
01 | 11 | 19.77 | dev-clean | 53 | 2 | In Chancer | Bleak House
02 | 12 | 10.30 | dev-clean | 53 | 3 | In Fashion | Bleak House
03 | 13 | 7.67 | dev-other | 68 | 7 | Letter XXV | Unbeaten Tracks in Japan
04 | 14 | 8.42 | dev-other | 219 | 9 | Chapter 01 | Northanger Abbey
05 | 15 | 11.68 | test-clean | 219 | 1 | Chapter 02 | Northanger Abbey
06 | 16 | 11.25 | test-clean | 219 | 5 | Chapter 03 | Northanger Abbey
07 | 17 | 7.57 | test-other | 219 | 9 | Chapter 04 | Northanger Abbey
08 | 18 | 12.76 | test-other | 219 | 3 | Chapter 07 | Northanger Abbey
09 | 19 | 12.82 | train-clean-100 | 219 | 4 | Chapter 08 | Northanger Abbey
10 | 20 | 18.33 | train-clean-100 | 219 | 6 | Chapter 10 | Northanger Abbey
11 | 21 | 12.95 | train-clean-360 | 219 | 8 | Chapter 11 | Northanger Abbey
12 | 22 | 8.20 | train-clean-360 | 219 | 1 | Chapter 12 | Northanger Abbey
13 | 23 | 12.09 | train-other-500 | 219 | 4 | Chapter 15 | Northanger Abbey
14 | 24 | 6.19 | train-other-500 | 219 | 5 | Chapter 17 | Northanger Abbey
I created just one sample of each type.
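To check which subsets the fake CHAPTERS.TXT actually covers, the pipe-separated format described in its header can be parsed with a few lines of Python. This is a sketch, not the parser TFDS uses: the function name `parse_chapters` is hypothetical, the field names follow the header comments, and it assumes titles contain no `|` character.

```python
def parse_chapters(text):
    """Parse pipe-separated chapter metadata, skipping ';' comment lines."""
    fields = ["chapter_id", "reader_id", "duration", "subset",
              "project_id", "book_id", "chapter_title", "project_title"]
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment
        values = [v.strip() for v in line.split("|")]
        if len(values) != len(fields):
            continue  # skip malformed lines rather than guess
        rows.append(dict(zip(fields, values)))
    return rows


# Two rows copied from the fake file above, plus its header comment.
sample = """\
;ID |READER|MINUTES| SUBSET | PROJ.|BOOK ID| CH. TITLE | PROJECT TITLE
09 | 19 | 12.82 | train-clean-100 | 219 | 4 | Chapter 08 | Northanger Abbey
05 | 15 | 11.68 | test-clean | 219 | 1 | Chapter 02 | Northanger Abbey
"""
rows = parse_chapters(sample)
subsets = sorted({row["subset"] for row in rows})
```

`subsets` then lists the distinct subset names found in the file, which can be compared against the directory names used in the test.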
Please help @rsepassi @cyfra @Conchylicultor
OK, now when I run librispeech_test.py I get no errors. Thanks @captain-pool for the help. @rsepassi @cyfra @Conchylicultor I am opening a pull request.