Datasets: [data request] Horses and Zebras

Created on 20 Jan 2019 · 20 Comments · Source: tensorflow/datasets

Folks who would also like to see this dataset in tensorflow/datasets, please +1/thumbs-up so the developers can know which requests to prioritize.

dataset request

All 20 comments

Can you please tell me what exactly needs to be done here? Is knowledge of GANs necessary to contribute to this issue?
I can see that the dataset contains testA (120 images), testB (140 images), trainA (1067 images), and trainB (1334 images).

Hi @ChanchalKumarMaji, thanks for your interest! No, knowledge of GANs is not necessary. I'd recommend having a single CycleGAN DatasetBuilder that has a BuilderConfig for each of the listed datasets. You can start with a single BuilderConfig just for horse2zebra. And you'd include all the splits trainA, trainB, testA, testB. Does that help you get started?

This is our guide on adding a dataset.

Thanks @rsepassi for your response.

This is what I need to do to solve this issue (please correct me if I am wrong):

  1. Create a new folder named CycleGAN under datasets/tensorflow_datasets/
  2. In this folder, I need a BuilderConfig for each of the following datasets:
    -> apple2orange
    -> summer2winter_yosemite
    -> horse2zebra
    -> monet2photo
    -> cezanne2photo
    -> ukiyoe2photo
    -> vangogh2photo
    -> maps
    -> cityscapes
    -> facades
    -> iphone2dslr_flower
    -> ae_photos
  3. I will first implement BuilderConfig for horse2zebra.
  4. Will my implementation be very similar to cats_vs_dogs.py and cats_vs_dogs_test.py in the image folder? I am planning to reuse code from them.
  5. Lastly, I will create a pull request when I have completed horse2zebra.

A bit about myself: I have been working with Python for 2 years and have a basic understanding of TensorFlow (I have created some models in it). This year I am planning to contribute to TensorFlow and to take part in GSoC 2019. I am a beginner and may need more help.

How can I discuss this further if I run into more difficulties?

Let's create a file tensorflow_datasets/image/cycle_gan.py and put the dataset in there. You'll definitely want to carefully read through and follow the guide for adding a new dataset. You'll be implementing "heavy" configuration (you'll see that referenced in the guide) for the various sub datasets (horse2zebra, etc.). Start with just 1. cats_vs_dogs.py is a good example to follow, though it does not use BuilderConfigs so you may want to see another dataset.

The guide is your friend, so first always see if it has the answer you need. But please do come back here and ask questions and suggest things that should be added to the guide if things were unclear or not addressed.

class horse2zebraConfig(tfds.core.BuilderConfig):
  """BuilderConfig for horse2zebra.""" 

  @api_utils.disallow_positional_args
  def __init__(self, name, no_of_samples, **kwargs):
    super(horse2zebraConfig, self).__init__(**kwargs)
    self.name = name,
    self.no_of_samples = no_of_samples


class horse2zebra(tfds.core.GeneratorBasedBuilder):
  """Horses to Zebra dataset."""

  VERSION = tfds.core.Version('2.0.0')

  BUILDER_CONFIGS = [
      horse2zebraConfig(
          name="trainA",
          description="train horses",
          no_of_samples=_TRAIN_A_EXAMPLES
      ),
      horse2zebraConfig(
          name="trainB",
          description="train zebras",
          no_of_samples=_TRAIN_B_EXAMPLES
      ),
      horse2zebraConfig(
          name="testA",
          description="test horses",
          no_of_samples=_TEST_A_EXAMPLES
      ),
      horse2zebraConfig(
          name="testB",
          description="test zebras",
          no_of_samples=_TEST_B_EXAMPLES
      )
  ]  

  def _info(self):
    return tfds.core.DatasetInfo(
        builder=self,
        description=_DESCRIPTION,
        features=tfds.features.FeaturesDict({
            "image": tfds.features.Image(shape=_IMAGE_SHAPE),
            "image/filename": tfds.features.Text(),
            "label": tfds.features.ClassLabel(num_classes=2),
        }),
        supervised_keys=("image", "label"),
        urls=["https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/"],
    )

@rsepassi Can you please tell me if I am going in the right direction? I think there may be errors; please point out any you see. Once I get feedback, I will proceed further. Thanks.

You'll want to have the DatasetBuilder class be called CycleGAN.
Then have a CycleGANConfig.

The splits go in _split_generators, not in BUILDER_CONFIGS. The CycleGANConfig objects will go in BUILDER_CONFIGS.

Are you following the guide?

Yes, I am following the guide. Some misunderstandings might have crept in. I will revise my code.

You mean that horse2zebraConfig should be renamed to CycleGANConfig, and that BUILDER_CONFIGS should list the different datasets. I also need to create a subclass of tfds.core.BuilderConfig named CycleGAN. Is that correct?

Dataset: class CycleGAN(tfds.core.DatasetBuilder)

Config: class CycleGANConfig(tfds.core.BuilderConfig)
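
To make the naming split concrete, here is a minimal, library-free sketch of the shape described above: one config class, one builder class, with the sub-datasets listed as configs and the splits listed separately. The classes are plain-Python stand-ins, not the real tfds.core.BuilderConfig or tfds.core.DatasetBuilder, and the SPLIT_NAMES attribute and split_dirs helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleGANConfig:
    """Stand-in for a tfds.core.BuilderConfig subclass (hypothetical fields)."""
    name: str          # sub-dataset name, e.g. "horse2zebra"
    description: str


class CycleGAN:
    """Stand-in for the dataset builder; not the real tfds base class."""

    # One config per sub-dataset, not one per split.
    BUILDER_CONFIGS = [
        CycleGANConfig(name="horse2zebra", description="Horses and zebras."),
        CycleGANConfig(name="apple2orange", description="Apples and oranges."),
    ]

    # Splits are listed with the builder, not as configs (hypothetical attribute).
    SPLIT_NAMES = ("trainA", "trainB", "testA", "testB")

    def __init__(self, config_name):
        by_name = {c.name: c for c in self.BUILDER_CONFIGS}
        self.config = by_name[config_name]

    def split_dirs(self):
        """Map each split name to its folder under the sub-dataset directory."""
        return {s: f"{self.config.name}/{s}" for s in self.SPLIT_NAMES}


builder = CycleGAN("horse2zebra")
print(builder.split_dirs()["trainA"])
```

In the real builder, the per-split work would presumably happen in _split_generators, as noted earlier in the thread.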

OK @rsepassi, understood. Can you please assign the issue to me?

Great. Let me know when you've accepted the collaborator invite and I'll assign you.

Thanks, I have accepted the collaborator invite.

dl_manager = tfds.download.DownloadManager(download_dir=path)
data_dirs = dl_manager.download_and_extract(_DOWNLOAD_URL)
path_to_dataset = data_dirs + "/" + os.listdir(data_dirs)[0]

trainA_files = tf.io.gfile.glob(os.path.join(path_to_dataset, 'trainA'))
trainB_files = tf.io.gfile.glob(os.path.join(path_to_dataset, 'trainB'))
testA_files = tf.io.gfile.glob(os.path.join(path_to_dataset, 'testA'))
testB_files = tf.io.gfile.glob(os.path.join(path_to_dataset, 'testB'))

Suppose I define my _split_generators(self, dl_manager) as above. What should I do after this? I think I cannot use tfds.core.SplitGenerator, as it uses shards to split.

Should I use tfds.load or something similar? If so, how?

You should return the list of SplitGenerators now (see MNIST, https://github.com/tensorflow/datasets/blob/da58964668f4491092406b1265bac05892ec5e3d/tensorflow_datasets/image/mnist.py#L105, as an example). num_shards tells the writer how many shards the examples should be put into.

And remember that the guide is your friend.
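
Before one split generator per split can be returned, each split needs its own list of files. The sketch below shows one way to collect them using only the standard library; it assumes the extracted archive contains the four folders named in the thread, and list_split_files is a hypothetical helper, not part of tfds. Note that with the standard-library glob, a pattern without a wildcard matches the folder itself rather than the images inside it, hence the trailing "*".

```python
import glob
import os
import tempfile

SPLIT_NAMES = ("trainA", "trainB", "testA", "testB")


def list_split_files(dataset_dir):
    """Return {split_name: sorted list of file paths} for one sub-dataset."""
    files = {}
    for split in SPLIT_NAMES:
        # The trailing "*" is what makes the pattern match files in the folder.
        pattern = os.path.join(dataset_dir, split, "*")
        files[split] = sorted(glob.glob(pattern))
    return files


# Tiny fake archive layout so the sketch runs without downloading anything.
root = tempfile.mkdtemp()
for split in SPLIT_NAMES:
    os.makedirs(os.path.join(root, split))
    for i in range(2):
        open(os.path.join(root, split, f"img_{i}.jpg"), "w").close()

split_files = list_split_files(root)
print({name: len(paths) for name, paths in split_files.items()})
```

Each list could then be handed to the matching split generator as its arguments.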

Dataset: class CycleGAN(tfds.core.DatasetBuilder)

Config: class CycleGANConfig(tfds.core.BuilderConfig)

@rsepassi, can I use tfds.core.GeneratorBasedBuilder instead of tfds.core.DatasetBuilder? Most of the datasets seem to use it, and I can see that tfds.core.GeneratorBasedBuilder inherits from tfds.core.DatasetBuilder.

@ChanchalKumarMaji - Yes you can - the GeneratorBasedBuilder takes care of some common functionality.

I think I have completed most of the code, but I am facing some issues in _generate_examples(self, filesA, filesB). I think the problem is in generating the data. Please suggest changes. My code is here.
@rsepassi, @cyfra, @Conchylicultor, can you please review my code?
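
For readers following along, a generator of the kind discussed here might look roughly like the sketch below. It is not the code that was eventually merged; it is a standalone function written under two assumptions: the feature keys match the "image", "image/filename", and "label" features from the earlier draft, and domain A is labelled 0 while domain B is labelled 1.

```python
import os


def generate_examples(files_a, files_b):
    """Yield one feature dict per image; domain A gets label 0, domain B gets 1."""
    for label, paths in enumerate((files_a, files_b)):
        for path in paths:
            yield {
                "image": path,  # assumed: the image feature is given a file path
                "image/filename": os.path.basename(path),
                "label": label,
            }


examples = list(
    generate_examples(["trainA/h1.jpg"], ["trainB/z1.jpg", "trainB/z2.jpg"])
)
print(len(examples), examples[-1]["label"])
```

Keeping the two domains in one generator means a single split can carry both horses and zebras, with the label telling them apart.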

Some datasets (like "ae_photos") have only testA and testB. Should I remove such cases?

While going through some datasets, I encountered some TODOs, like this and this.

Does more work need to be done?

@rsepassi @cyfra @Conchylicultor @dynamicwebpaige, I can see that my tests on cycle_gan are passing, so I am opening a pull request.

Merged in #212
