It seems that some tests are failing for TFDS used with tf-nightly:
PlantVillage:
tensorflow_datasets/testing/dataset_builder_testing.py:263: in test_download_and_prepare_as_dataset
    self._download_and_prepare_as_dataset(self.builder)
tensorflow_datasets/testing/dataset_builder_testing.py:302: in _download_and_prepare_as_dataset
    self._assertNumSamples(builder)
tensorflow_datasets/testing/dataset_builder_testing.py:343: in _assertNumSamples
    expected_num_examples,
E   AssertionError: 35 != 38
The300wLpTest:
num_examples = 0, number_of_shards = 1
    def _get_shard_boundaries(num_examples, number_of_shards):
      if num_examples == 0:
>       raise AssertionError("No examples were yielded.")
E       AssertionError: No examples were yielded.
See full logs at: https://source.cloud.google.com/results/invocations/80b640f7-df77-44bb-a034-0ddab98db139/targets
To reproduce: pytest -n auto --disable-warnings path/to/dataset_test.py (with pytest-xdist installed)
TF 1.15 and TF 2.1 work fine, which seems to indicate a regression in TensorFlow.
@Conchylicultor Changing the train split to 35 in plant_village_test.py solves the problem.
@Eshan-Agarwal But in that case, the test will fail for TF 2.1. The question is why fewer examples are generated with TF nightly than with TF 2.1. This indicates a bug somewhere, either in the generation or in the reading pipeline.
So is the problem in the TF nightly pipeline, or in the generation and splitting of this particular data?
@Eshan-Agarwal I don't know what the actual issue is, nor why it only happens for this particular dataset. Finding out is the goal of this issue.
@Conchylicultor In dataset_builder_testing.py, in the _assertNumSamples function, the num_examples generated by plant_village_test.py is 38 (as already given) and the function expects it to be 35. This expected_num_examples comes from for split_name, expected_num_examples in self.SPLITS.items():. So self.SPLITS.items() yields 35, and it is not clear why.
@Eshan-Agarwal Can you try to investigate why those two numbers are different? E.g. is builder.info.splits[split_name].num_examples printing the correct values? How many examples are actually written to the TFRecord files? Are all examples yielded properly in _generate_examples? ...
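The checks suggested above could be scripted along these lines. This is only a sketch: `split_counts` and `FakeBuilder` are hypothetical names that are not part of TFDS, and the stand-in builder exists only so the snippet runs without TFDS installed. With a real builder, it assumes that `builder.as_dataset(split=...)` returns something that can be iterated directly (as in eager mode).

```python
from types import SimpleNamespace


def split_counts(builder, split_names):
    """Return {split: (recorded, read_back)} for each requested split.

    recorded:  the count stored in builder.info.splits[name].num_examples
    read_back: the number of examples obtained by iterating the split
    """
    counts = {}
    for name in split_names:
        recorded = builder.info.splits[name].num_examples
        read_back = sum(1 for _ in builder.as_dataset(split=name))
        counts[name] = (recorded, read_back)
    return counts


class FakeBuilder:
    """Hypothetical stand-in that mimics only the attributes used above."""

    def __init__(self, recorded, examples):
        self.info = SimpleNamespace(
            splits={
                name: SimpleNamespace(num_examples=n)
                for name, n in recorded.items()
            }
        )
        self._examples = examples

    def as_dataset(self, split):
        return iter(self._examples[split])


# Illustrative numbers only: 35 recorded and 35 readable examples.
fake = FakeBuilder({"train": 35}, {"train": range(35)})
report = split_counts(fake, ["train"])
print(report)
```

If the two numbers for a split disagree, that would suggest examples are lost when reading; if they agree but are both below the count expected by the test, that would point at generation or writing instead.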
@Conchylicultor yeah sure, I am looking into it.
@Conchylicultor I observed a few things:
builder.info.splits[split_name].num_examples does not print the correct value: it gives 35, while 38 was expected as specified in plant_village_test.py. Digging further into the code, I found that 35 examples are written to the TFRecord file where 38 are expected. For more information about all the variables, please have a look at this colab notebook. Please look only at the output of !python ./tensorflow_datasets/image/plant_village_test.py
I think the problem is in _generate_examples. Also, when I ran the test in mnist_test.py it passed, until I changed the train split to 11 (greater than 10) in mnist_test.py, which gives this error:
File "/content/datasets/tensorflow_datasets/image/mnist.py", line 365, in _extract_mnist_images
).reshape(num_images, _MNIST_IMAGE_SIZE, _MNIST_IMAGE_SIZE, 1)
ValueError: cannot reshape array of size 7840 into shape (11,28,28,1)
But values smaller than 10, like 5, run fine. So why is this happening?
Edit: For mnist_test.py, the traceback is in the last lines of the output. I also printed the records, so the output is much longer.
Also, for the300w_lp_test.py, num_examples is zero, hence total_size is also zero, so maybe something is wrong with _generate_examples here.
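On the mnist_test.py question: the numbers in the ValueError are consistent with a fake data file that holds exactly 10 images, since 7840 = 10 × 28 × 28. The sketch below illustrates this under two assumptions that are not confirmed in this thread: that the test file contains 10 images, and that the loader reads at most num_images × 28 × 28 bytes before reshaping. `bytes_read` and `reshape_would_succeed` are hypothetical helpers, not TFDS functions.

```python
import io

_MNIST_IMAGE_SIZE = 28  # same constant name as in the traceback
PIXELS_PER_IMAGE = _MNIST_IMAGE_SIZE * _MNIST_IMAGE_SIZE  # one channel


def bytes_read(images_in_file: int, requested_images: int) -> int:
    """Simulate reading image bytes from a file holding `images_in_file` images.

    A read never returns more bytes than the file contains.
    """
    fake_file = io.BytesIO(bytes(images_in_file * PIXELS_PER_IMAGE))
    return len(fake_file.read(requested_images * PIXELS_PER_IMAGE))


def reshape_would_succeed(buffer_size: int, requested_images: int) -> bool:
    """A reshape to (n, 28, 28, 1) needs exactly n * 28 * 28 elements."""
    return buffer_size == requested_images * PIXELS_PER_IMAGE


# Assumption: the fake test file holds 10 images (7840 / 784 = 10).
for requested in (5, 10, 11):
    size = bytes_read(images_in_file=10, requested_images=requested)
    print(requested, size, reshape_would_succeed(size, requested))
```

Under those assumptions, asking for 5 images reads 3920 bytes and reshapes cleanly, while asking for 11 can only read the 7840 bytes that exist, which cannot fill 11 images. If so, the MNIST error would be a limit of the fake data rather than part of the tf-nightly regression.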
@Conchylicultor
Update: I found the problem. It is in _generate_examples in both files, plant_leaves.py and The300wLpTest: some of the examples are generated but not passed on. I will fix them soon and send a PR.
The problem is in tf.io.gfile.glob in _generate_examples for both files. It does not match the provided pattern correctly.
Awesome! Thank you for investigating this! Indeed both datasets are using tf.io.gfile.
This is not something we can fix in TFDS so we need to report this to the TF team and wait for them to fix the bug.
@Conchylicultor I just created this issue in TF, please have a look.
@Eshan-Agarwal For people to debug an issue, it is really helpful to provide a small self-contained code snippet which reproduces it. Could you provide a colab which demonstrates the issue without TFDS and without any dependencies, just with tf.io.gfile?
@Conchylicultor Thanks for the suggestion. Yes, sure, I can make a small demo which shows the problem and the solution with Python's glob.
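A self-contained comparison of the kind requested could look like the sketch below. The directory layout, file names, and glob pattern are made up for illustration; the thread does not state which pattern failed to match. The TensorFlow part is skipped when TensorFlow is not installed.

```python
import glob
import os
import tempfile


def make_fake_tree(root):
    """Create a small, hypothetical directory tree of empty .jpg files."""
    paths = []
    for class_dir in ("class_a", "class_b"):
        os.makedirs(os.path.join(root, class_dir))
        for i in range(3):
            path = os.path.join(root, class_dir, f"img_{i}.jpg")
            with open(path, "wb"):
                pass  # An empty file is enough for glob matching.
            paths.append(path)
    return sorted(paths)


def compare_globs(root):
    """Match the same pattern with Python's glob and, if available, tf.io.gfile."""
    pattern = os.path.join(root, "*", "*.jpg")
    python_matches = sorted(glob.glob(pattern))
    try:
        import tensorflow as tf
    except ImportError:
        tf_matches = None  # TensorFlow not installed: nothing to compare.
    else:
        tf_matches = sorted(tf.io.gfile.glob(pattern))
    return python_matches, tf_matches


with tempfile.TemporaryDirectory() as root:
    expected = make_fake_tree(root)
    python_matches, tf_matches = compare_globs(root)

print("files created:       ", len(expected))
print("glob.glob matches:   ", len(python_matches))
if tf_matches is None:
    print("tf.io.gfile.glob:     skipped (TensorFlow not installed)")
else:
    print("tf.io.gfile.glob:    ", len(tf_matches))
    print("same result:         ", tf_matches == python_matches)
```

On a TensorFlow build affected by the bug, the two lists would be expected to differ for the failing pattern; on a fixed build they should be identical.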
I see that this issue has been fixed as tests are back to green :smile:
Great \o/ !! Thank you @Eshan-Agarwal for helping fix this.
@Conchylicultor @vijayphoenix Yes, tests are green. Thanks to the TF team.
Thank you