Datasets: curated_breast_imaging_ddsm dataset creates unequal patch shapes.

Created on 10 Apr 2020 · 3 Comments · Source: tensorflow/datasets

Short description
cbis_ddsm.py creates unequal patch shapes.

Environment information

  • Operating System: Ubuntu 18.04.3 LTS
  • Python version: Python 3.7.7. [GCC 7.3.0] :: Anaconda, Inc. on linux
  • tensorflow-datasets/ version: 1.2.0
  • tensorflow version: 2.1.0

Reproduction instructions

import os
import tensorflow_datasets as tfds
import tensorflow as tf



# Getting data directory from environment variable
data_directory = os.environ['CBIS_DDSM']

# Changing manual_dir from default (~/tensorflow_datasets/manual/) to environment path.
tfds.download.DownloadManager.manual_dir = data_directory

ds, ds_info = tfds.load('curated_breast_imaging_ddsm',
                data_dir = data_directory,
                with_info = True)
ds_test    = ds['test']

for element in ds_test.take(10):
    image = element['image']
    print(image.shape)

Output
2020-04-10 23:16:23.055173: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-04-10 23:16:23.064116: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-04-10 23:16:23.064148: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (hakon-PC): /proc/driver/nvidia/version does not exist
2020-04-10 23:16:23.064512: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-04-10 23:16:23.089820: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3593415000 Hz
2020-04-10 23:16:23.092120: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55d0adba9c70 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-10 23:16:23.092169: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
WARNING:absl:Warning: Setting shuffle_files=True because split=TRAIN and shuffle_files=None. This behavior will be deprecated on 2019-08-06, at which point shuffle_files=False will be the default for all splits.
(224, 224, 1)
(224, 82, 1)
(224, 224, 1)
(224, 224, 1)
(224, 189, 1)
(224, 224, 1)
(224, 224, 1)
(224, 224, 1)
(224, 178, 1)
(224, 224, 1)

Expected behavior
I expected all shapes to be (224, 224, 1)

bug

All 3 comments

This is not a bug.
The original dataset does not have a fixed shape.

However, you could use a preprocessing step such as tf.image.resize. To integrate it with tf.data, you can refer to the guide here
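As a rough illustration of the tf.image.resize suggestion, here is a minimal sketch. It does not load the real dataset: `make_patches`, `WIDTHS`, and `resize_patch` are hypothetical names, and the synthetic patches only mimic the shapes printed in the report. It also assumes TensorFlow 2.4 or newer for the `output_signature` argument.

```python
import numpy as np
import tensorflow as tf

PATCH_SIZE = 224
# Assumption: widths copied from the shapes printed in the report.
WIDTHS = [224, 82, 224, 189, 178]

def make_patches():
    # Synthetic stand-in for the dataset: one-channel uint8 patches
    # with a fixed height and a varying width.
    for width in WIDTHS:
        yield {"image": np.zeros((PATCH_SIZE, width, 1), dtype=np.uint8)}

ds = tf.data.Dataset.from_generator(
    make_patches,
    output_signature={
        "image": tf.TensorSpec(shape=(PATCH_SIZE, None, 1), dtype=tf.uint8)
    },
)

def resize_patch(element):
    # Bilinear resize to a fixed size. This stretches narrow patches
    # (the aspect ratio is not preserved) and returns float32 values.
    image = tf.image.resize(element["image"], (PATCH_SIZE, PATCH_SIZE))
    return {**element, "image": image}

resized = ds.map(resize_patch)

for element in resized:
    print(element["image"].shape)
```

If stretching the narrow patches is not acceptable, padding or filtering may be a better fit than resizing.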

From what I understand from the source code, the patch may be truncated if the boundaries are outside of the image.
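To make the truncation idea concrete, here is a small NumPy sketch. It is not the code from cbis_ddsm.py; `extract_patch` is a hypothetical helper that only shows how slicing a fixed-size window near the image border silently yields a smaller patch.

```python
import numpy as np

def extract_patch(image, center_y, center_x, size=224):
    # Hypothetical helper: cut a size x size window around a center point.
    half = size // 2
    # Clamp the start indices at 0; NumPy clips the end indices itself.
    top = max(center_y - half, 0)
    left = max(center_x - half, 0)
    return image[top:center_y + half, left:center_x + half]

# Assumption: a 1000 x 1000 single-channel image, for illustration only.
image = np.zeros((1000, 1000, 1), dtype=np.uint8)

inside = extract_patch(image, 500, 500)     # window fully inside the image
near_edge = extract_patch(image, 500, 950)  # window crosses the right border

print(inside.shape)     # (224, 224, 1)
print(near_edge.shape)  # (224, 162, 1)
```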

You could add padding with ds.map or ds.padded_batch, or filter out those images with ds.filter, so that all images have the same shape.
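The three options could look roughly like the sketch below. It runs on synthetic patches rather than the real dataset; `make_patches`, `is_full_patch`, and `pad_patch` are hypothetical names, zero-padding on the bottom and right is an arbitrary choice, and `output_signature` assumes TensorFlow 2.4 or newer.

```python
import numpy as np
import tensorflow as tf

PATCH_SIZE = 224
# Assumption: widths copied from the shapes printed in the report.
WIDTHS = [224, 82, 224, 189, 178]

def make_patches():
    # Synthetic stand-in for the dataset (one-channel uint8 patches).
    for width in WIDTHS:
        yield {"image": np.zeros((PATCH_SIZE, width, 1), dtype=np.uint8)}

ds = tf.data.Dataset.from_generator(
    make_patches,
    output_signature={
        "image": tf.TensorSpec(shape=(PATCH_SIZE, None, 1), dtype=tf.uint8)
    },
)

def is_full_patch(element):
    # True only when height and width both equal PATCH_SIZE.
    height_width = tf.shape(element["image"])[:2]
    return tf.reduce_all(tf.equal(height_width, [PATCH_SIZE, PATCH_SIZE]))

def pad_patch(element):
    # Zero-pad at the bottom and right up to PATCH_SIZE x PATCH_SIZE.
    image = element["image"]
    pad_h = PATCH_SIZE - tf.shape(image)[0]
    pad_w = PATCH_SIZE - tf.shape(image)[1]
    image = tf.pad(image, [[0, pad_h], [0, pad_w], [0, 0]])
    image = tf.ensure_shape(image, (PATCH_SIZE, PATCH_SIZE, 1))
    return {**element, "image": image}

# Option 1: drop truncated patches.
full_only = ds.filter(is_full_patch)

# Option 2: pad each patch individually.
padded = ds.map(pad_patch)

# Option 3: pad while batching; padded_shapes needs an entry for every key.
batched = ds.padded_batch(
    4, padded_shapes={"image": [PATCH_SIZE, PATCH_SIZE, 1]}
)

for batch in batched:
    print(batch["image"].shape)
```

Filtering discards data, while padding keeps every patch but adds blank regions, so the right choice depends on the model being trained.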

@jpuigcerver added this dataset so may have more context.

Ah ok I see. Many thanks!
