Datasets: [data request] Horses and Zebras

Created on 20 Jan 2019 · 20 Comments · Source: tensorflow/datasets

Name of dataset: Horses and Zebras
URL of dataset: https://github.com/junyanz/CycleGAN/blob/master/datasets/download_dataset.sh
License of dataset:
Short description of dataset and use case(s): Image-to-image translation examples used in CycleGAN.

Folks who would also like to see this dataset in tensorflow/datasets, please +1/thumbs-up so the developers can know which requests to prioritize.

dataset request

joel-shor

👍3

All 20 comments

Can you please tell me what exactly needs to be done here? Is knowledge of GANs necessary to contribute to this issue?
I can see that the dataset contains testA (120 images), testB (140 images), trainA (1067 images), and trainB (1334 images).

ChanchalKumarMaji on 28 Feb 2019

Hi @ChanchalKumarMaji, thanks for your interest! No, knowledge of GANs is not necessary. I'd recommend having a single CycleGAN DatasetBuilder that has a BuilderConfig for each of the listed datasets. You can start with a single BuilderConfig just for horse2zebra. And you'd include all the splits trainA, trainB, testA, testB. Does that help you get started?

This is our guide on adding a dataset.

rsepassi on 28 Feb 2019

Thanks @rsepassi for your response.

This is what I need to do to solve this issue (please correct me if I am wrong):

Create a new folder named CycleGAN under datasets/tensorflow_datasets/
In this folder, I need to have a BuilderConfig for each of the following datasets:
-> apple2orange
-> summer2winter_yosemite
-> horse2zebra
-> monet2photo
-> cezanne2photo
-> ukiyoe2photo
-> vangogh2photo
-> maps
-> cityscapes
-> facades
-> iphone2dslr_flower
-> ae_photos
I will first implement the BuilderConfig for horse2zebra.
Will my implementation be very similar to cats_vs_dogs.py and cats_vs_dogs_test.py in the image folder? I am planning to reuse code from them.
Lastly, I will create a pull request when I have completed horse2zebra.

A bit about myself: I have been working with Python for 2 years, and I have a basic understanding of TensorFlow (I have created some models in TensorFlow). This year I am planning to contribute to TensorFlow as well as take part in GSoC 2019. So, I am a beginner and may need more help.

Where should I ask if I run into further difficulties with this?

ChanchalKumarMaji on 1 Mar 2019

Let's create a file tensorflow_datasets/image/cycle_gan.py and put the dataset in there. You'll definitely want to carefully read through and follow the guide for adding a new dataset. You'll be implementing "heavy" configuration (you'll see that referenced in the guide) for the various sub datasets (horse2zebra, etc.). Start with just 1. cats_vs_dogs.py is a good example to follow, though it does not use BuilderConfigs so you may want to see another dataset.

The guide is your friend, so first always see if it has the answer you need. But please do come back here and ask questions and suggest things that should be added to the guide if things were unclear or not addressed.

rsepassi on 1 Mar 2019

👍1

class horse2zebraConfig(tfds.core.BuilderConfig):
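  """BuilderConfig for horse2zebra."""

  @api_utils.disallow_positional_args
  def __init__(self, name, no_of_samples, **kwargs):
    # Pass the name to the base class; assigning `self.name = name,` with a
    # trailing comma would have stored a one-element tuple instead.
    super(horse2zebraConfig, self).__init__(name=name, **kwargs)
    self.no_of_samples = no_of_samples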

class horse2zebra(tfds.core.GeneratorBasedBuilder):
  """Horses to Zebra dataset."""

  VERSION = tfds.core.Version('2.0.0')

  BUILDER_CONFIGS = [
      horse2zebraConfig(
          name="trainA",
          description="train horses",
          no_of_samples=_TRAIN_A_EXAMPLES
      ),
      horse2zebraConfig(
          name="trainB",
          description="train zebras",
          no_of_samples=_TRAIN_B_EXAMPLES
      ),
      horse2zebraConfig(
          name="testA",
          description="test horses",
          no_of_samples=_TEST_A_EXAMPLES
      ),
      horse2zebraConfig(
          name="testB",
          description="test zebras",
          no_of_samples=_TEST_B_EXAMPLES
      )
  ]  

  def _info(self):
    return tfds.core.DatasetInfo(
        builder=self,
        description=_DESCRIPTION,
        features=tfds.features.FeaturesDict({
            "image": tfds.features.Image(shape=_IMAGE_SHAPE),
            "image/filename": tfds.features.Text(),
            "label": tfds.features.ClassLabel(num_classes=2),
        }),
        supervised_keys=("image", "label"),
        urls=["https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/"],
    )

@rsepassi Can you please tell me whether I am going in the right direction? I think there may be errors, so please point out any you see. Once I get the feedback, I will proceed further. Thanks.

ChanchalKumarMaji on 1 Mar 2019

You'll want to have the DatasetBuilder class be called CycleGAN.
Then have a CycleGANConfig.

The splits go in _split_generators, not in BUILDER_CONFIGS. The CycleGANConfig objects will go in BUILDER_CONFIGS.

Are you following the guide?

rsepassi on 1 Mar 2019

Yes, I am following the guide. Some misunderstandings may have crept in. I will revise my code.

So you mean that horse2zebraConfig should be renamed to CycleGANConfig, and that BUILDER_CONFIGS should list the different datasets. I also need to create a subclass of tfds.core.BuilderConfig named CycleGAN. Is that correct?

ChanchalKumarMaji on 1 Mar 2019

Dataset: class CycleGAN(tfds.core.DatasetBuilder)

Config: class CycleGANConfig(tfds.core.BuilderConfig)

rsepassi on 1 Mar 2019

👍1
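
As a rough illustration of the layout settled on in this exchange: one config per sub-dataset, with trainA/trainB/testA/testB treated as splits rather than configs. The sketch below uses plain-Python stand-ins only. ConfigSketch, make_configs, and split_dirs are hypothetical names and are not part of tfds; in the real builder their roles would be played by CycleGANConfig, BUILDER_CONFIGS, and _split_generators. The "data" root path is also made up.

```python
import os
from dataclasses import dataclass

# Sub-dataset names as listed earlier in the thread.
SUB_DATASETS = [
    "apple2orange", "summer2winter_yosemite", "horse2zebra", "monet2photo",
    "cezanne2photo", "ukiyoe2photo", "vangogh2photo", "maps", "cityscapes",
    "facades", "iphone2dslr_flower", "ae_photos",
]
SPLITS = ("trainA", "trainB", "testA", "testB")


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical stand-in for a BuilderConfig: names one sub-dataset."""
    name: str
    description: str


def make_configs(names):
    """Build one config per sub-dataset (the role of BUILDER_CONFIGS)."""
    return [ConfigSketch(name=n, description="CycleGAN sub-dataset " + n)
            for n in names]


def split_dirs(dataset_root):
    """Map each split name to its directory (the role of _split_generators)."""
    return {split: os.path.join(dataset_root, split) for split in SPLITS}


configs = make_configs(SUB_DATASETS)
dirs = split_dirs(os.path.join("data", "horse2zebra"))
```

The point of the separation is that choosing a config selects which archive to download, while the splits describe how that one archive is partitioned.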

Ok, @rsepassi I understood. Can you please assign the issue to me ?

ChanchalKumarMaji on 1 Mar 2019

Great. Let me know when you've accepted the collaborator invite and I'll assign you.

rsepassi on 1 Mar 2019

Thanks, I have accepted the collaborator invite.

ChanchalKumarMaji on 1 Mar 2019

dl_manager = tfds.download.DownloadManager(download_dir=path)
data_dirs = dl_manager.download_and_extract(_DOWNLOAD_URL)
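# Join path components with os.path.join rather than string concatenation.
path_to_dataset = os.path.join(data_dirs, os.listdir(data_dirs)[0])

# The trailing '*' makes glob match the files inside each split directory,
# not just the directory itself.
trainA_files = tf.io.gfile.glob(os.path.join(path_to_dataset, 'trainA', '*'))
trainB_files = tf.io.gfile.glob(os.path.join(path_to_dataset, 'trainB', '*'))
testA_files = tf.io.gfile.glob(os.path.join(path_to_dataset, 'testA', '*'))
testB_files = tf.io.gfile.glob(os.path.join(path_to_dataset, 'testB', '*'))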

Suppose I define my _split_generators(self, dl_manager) as above. What should I do after this? I think I cannot use tfds.core.SplitGenerator, as it uses shards to split.

Should I use tfds.load or something similar? If so, how?

ChanchalKumarMaji on 2 Mar 2019

You should return the list of SplitGenerators now (see MNIST as an example: https://github.com/tensorflow/datasets/blob/da58964668f4491092406b1265bac05892ec5e3d/tensorflow_datasets/image/mnist.py#L105). num_shards tells the writer how many shards the examples should be put into.

rsepassi on 2 Mar 2019
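
To make the reply above concrete, here is a minimal standard-library sketch of the two pieces a builder needs after download: a listing of files per split directory, and a generator that turns one split's files into examples. list_split_files and generate_examples are hypothetical helper names, and the directory tree and file names are made up. In an actual tfds builder each split would instead be wrapped in a tfds.core.SplitGenerator whose gen_kwargs are passed to _generate_examples, and tf.io.gfile would typically replace os.

```python
import os
import tempfile


def list_split_files(dataset_root,
                     splits=("trainA", "trainB", "testA", "testB")):
    """Return {split: sorted file paths} for each split directory."""
    files = {}
    for split in splits:
        split_dir = os.path.join(dataset_root, split)
        files[split] = sorted(
            os.path.join(split_dir, name) for name in os.listdir(split_dir)
        )
    return files


def generate_examples(paths, label):
    """Yield one example dict per image path, tagged with its domain label."""
    for path in paths:
        yield {
            "image": path,
            "image/filename": os.path.basename(path),
            "label": label,
        }


# Usage on a throwaway directory tree standing in for the extracted archive.
with tempfile.TemporaryDirectory() as root:
    for split in ("trainA", "trainB", "testA", "testB"):
        os.makedirs(os.path.join(root, split))
        open(os.path.join(root, split, "img_0.jpg"), "w").close()
    split_files = list_split_files(root)
    examples = list(generate_examples(split_files["trainA"], label="A"))
```

Keeping the generator free of download logic means the same function can serve every split; only the arguments change.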

And remember that the guide is your friend.

rsepassi on 2 Mar 2019
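
On the num_shards remark above: sharding only controls how many output files a split's examples are spread across; it does not decide which examples belong to which split. The sketch below shows one simple way to picture this. assign_to_shards is a hypothetical helper, and the round-robin assignment is an assumption made for illustration, not necessarily how the tfds writer distributes examples.

```python
def assign_to_shards(examples, num_shards):
    """Distribute examples across num_shards lists in round-robin order."""
    shards = [[] for _ in range(num_shards)]
    for index, example in enumerate(examples):
        shards[index % num_shards].append(example)
    return shards


# Ten examples spread over three shards.
shards = assign_to_shards(list(range(10)), num_shards=3)
shard_sizes = [len(shard) for shard in shards]
```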

Dataset: class CycleGAN(tfds.core.DatasetBuilder)

Config: class CycleGANConfig(tfds.core.BuilderConfig)

@rsepassi Can I use tfds.core.GeneratorBasedBuilder instead of tfds.core.DatasetBuilder? I can see that most of the datasets use it, and that tfds.core.GeneratorBasedBuilder is itself a subclass of tfds.core.DatasetBuilder.

ChanchalKumarMaji on 5 Mar 2019

@ChanchalKumarMaji - Yes you can - the GeneratorBasedBuilder takes care of some common functionality.

cyfra on 5 Mar 2019

👍1

I think I have completed most of the code, but I am facing some issues in _generate_examples(self, filesA, filesB); I think the problem is in how the data is generated. Please suggest changes. My code is here.
@rsepassi, @cyfra, @Conchylicultor, can you please review my code?

Some datasets (like "ae_photos") have only testA and testB. Should I leave such datasets out?

ChanchalKumarMaji on 9 Mar 2019
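
One possible way to handle sub-datasets that ship only some of the split directories (such as the testA/testB-only case raised above) is to build splits only for the directories that actually exist. This is a standard-library sketch under that assumption, not a description of what was eventually merged; available_splits is a hypothetical helper name.

```python
import os
import tempfile

ALL_SPLITS = ("trainA", "trainB", "testA", "testB")


def available_splits(dataset_root, candidates=ALL_SPLITS):
    """Return the candidate splits that exist as directories, in order."""
    return [
        split for split in candidates
        if os.path.isdir(os.path.join(dataset_root, split))
    ]


# Simulate an archive that contains only the two test directories.
with tempfile.TemporaryDirectory() as root:
    for split in ("testA", "testB"):
        os.makedirs(os.path.join(root, split))
    found = available_splits(root)
```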

While going through some datasets, I came across some TODOs, like this and this.

Does more work need to be done on them?

ChanchalKumarMaji on 11 Mar 2019

@rsepassi @cyfra @Conchylicultor @dynamicwebpaige, I can see that my tests on cycle_gan are passing, so I am opening a pull request.

ChanchalKumarMaji on 11 Mar 2019

Merged in #212

cyfra on 26 Mar 2019
