Pytorch-cyclegan-and-pix2pix: Regarding similarity between training and test datasets

Created on 1 May 2019 · 8 Comments · Source: junyanz/pytorch-CycleGAN-and-pix2pix

I have a question about the training and test data for CycleGAN.

Must the test data belong to the same domain as the training data (for a synthetic dataset)? By domain, I mean the backgrounds the two sets share. When I give the model a test image of the drone against a different background, the results are noticeably worse than for a test image whose background resembles those seen in the training dataset.
Is this behavior expected? Should we make sure the backgrounds stay as similar to each other as possible?
Please let me know.

Result when the background is the same:
[image: Drone_1002 (8)_fake_B]

Result when the background is different:
[image: file-1_fake_B]

Input synthetic image (trainA dataset):
[image: Drone_1003 (7)_real_A]

sudharavali

Most helpful comment

If you have a downstream task, you can evaluate your model's performance on that task. Otherwise, it requires either (1) manual inspection to choose the best model, or (2) standard GAN metrics (e.g., FID).

junyanz on 2 May 2019

👍2
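For readers unfamiliar with the metric mentioned above, FID (Fréchet Inception Distance) compares Gaussian fits to Inception-network features of real and generated images. The symbols below are introduced here for illustration and do not appear in the thread: $\mu_r, \Sigma_r$ are the feature mean and covariance for real images, and $\mu_g, \Sigma_g$ are the same statistics for generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the two feature distributions are closer, so one plausible way to use it for model selection is to compute it for each saved checkpoint and keep the one with the lowest score.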

All 8 comments

You are correct. The test data should be similar to the training data. I recommend that you collect additional training data or apply additional data augmentation (e.g., different kinds of cropping and scaling).

junyanz on 1 May 2019

👍1
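As an illustration of the cropping/scaling augmentation suggested above, here is a minimal sketch that only samples the augmentation parameters (resize to a square, take a random crop, optionally flip). The function name `sample_augmentation` and the sizes 286 and 256 are assumptions chosen for this example; applying the parameters to real images would need an image library and is left out.

```python
import random


def sample_augmentation(load_size, crop_size, rng, allow_flip=True):
    """Pick parameters for a resize -> random crop -> optional flip augmentation.

    The image is assumed to be resized to load_size x load_size first.
    The returned box is (left, top, right, bottom) inside that resized image.
    """
    if crop_size > load_size:
        raise ValueError("crop_size must not exceed load_size")
    # Random top-left corner such that the crop stays inside the image.
    left = rng.randint(0, load_size - crop_size)
    top = rng.randint(0, load_size - crop_size)
    # Flip horizontally about half of the time when flipping is allowed.
    flip = allow_flip and rng.random() < 0.5
    return {"box": (left, top, left + crop_size, top + crop_size), "flip": flip}


# Hypothetical sizes: resize to 286 x 286, then crop 256 x 256.
rng = random.Random(0)
for _ in range(3):
    print(sample_augmentation(load_size=286, crop_size=256, rng=rng))
```

Each call yields a different crop window, so the network sees the same drone at varying positions and framings. Augmentation of this kind varies framing and scale only; it does not add backgrounds the model has never seen, which is why collecting more training data is listed as the other option.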

Thank you so much for the prompt response, Professor. I have a follow-up question: when do you think I can stop the training process? Is human observation required, or are there other ways to decide, such as tracking a particular loss until it reaches its minimum, or other metrics?
Please let me know. Thank you.

sudharavali on 1 May 2019

junyanz on 2 May 2019

👍2

Thanks a lot, Professor. This clarifies a lot of questions I had.

sudharavali on 2 May 2019

Hi, I just started playing with the horse2zebra dataset, and I am fairly new to GANs. With the usual ML models, I am used to training data having a one-to-one correspondence. That is, if horse pictures imageA001.jpg to imageA009.jpg are in folder trainA, then imageB001.jpg to imageB009.jpg should be the corresponding zebra images with the same background and so on, with just the horse body replaced by an identical zebra body. But I don't see this kind of one-to-one labeling in the training data folders. Is such one-to-one labeling unimportant for GANs?

prannerta100 on 11 Feb 2020

It depends on the type of GAN model. For CycleGAN, we don't need one-to-one correspondence. For pix2pix, we do.

junyanz on 11 Feb 2020
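The difference between the two settings can be sketched in a few lines. This is an illustrative model only, not the repository's data-loading code: the function names `unpaired_pair` and `paired_pair` and the file names are hypothetical.

```python
import random


def unpaired_pair(a_paths, b_paths, index, rng=None):
    """CycleGAN-style sampling: A is picked by index, B independently of A."""
    a = a_paths[index % len(a_paths)]
    if rng is None:
        # Deterministic variant: walk through B in order, wrapping around.
        b = b_paths[index % len(b_paths)]
    else:
        # Random variant: any B image may accompany any A image.
        b = b_paths[rng.randrange(len(b_paths))]
    return a, b


def paired_pair(a_paths, b_paths, index):
    """pix2pix-style sampling: the i-th A must correspond to the i-th B."""
    if len(a_paths) != len(b_paths):
        raise ValueError("paired data needs the same number of A and B images")
    return a_paths[index], b_paths[index]


horses = ["horse_1.jpg", "horse_2.jpg", "horse_3.jpg"]  # hypothetical names
zebras = ["zebra_1.jpg", "zebra_2.jpg"]                 # sizes may differ

print(unpaired_pair(horses, zebras, 2, rng=random.Random(0)))
print(paired_pair(["in_1.jpg", "in_2.jpg"], ["out_1.jpg", "out_2.jpg"], 1))
```

In the unpaired case the two folders can even have different sizes, which is consistent with the horse and zebra folders having no matching file names.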

Thanks!

In your tutorial notebook (.ipynb), I ran the following command to train my model (I know n_epochs is too small, but this is more of a sanity check that things work, as each epoch takes 400 s on my single Google Colab GPU).

!python train.py --dataroot ./datasets/horse2zebra --name horse2zebra --model cycle_gan --gpu_ids 0 --n_epochs 1 --n_epochs_decay 1 --display_id 0

This is your horse2zebra dataset, nothing new from my side.

But this does not generate ./checkpoints/horse2zebra/latest_net_G_A.pth as expected. In fact, no .pth file is generated at all, whereas the pretrained folder ./checkpoints/horse2zebra_pretrained does contain this .pth file.

prannerta100 on 14 Feb 2020

By default, your models are saved every --save_latest_freq iterations (default 5000) or every --save_epoch_freq epochs (default 5). In your case, as you trained your model for only 2 epochs, no checkpoint was saved.

junyanz on 14 Feb 2020
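The save schedule described above can be modeled with a short simulation. This is a simplified sketch built only from the two defaults quoted in the answer, not the repository's training script; the function name `checkpoint_events` and the 1,000-image dataset size are assumptions made for illustration.

```python
import math


def checkpoint_events(dataset_size, n_epochs, n_epochs_decay,
                      batch_size=1, save_latest_freq=5000, save_epoch_freq=5):
    """List the (kind, epoch, total_iters) points at which a save would occur."""
    events = []
    total_iters = 0
    batches_per_epoch = math.ceil(dataset_size / batch_size)
    for epoch in range(1, n_epochs + n_epochs_decay + 1):
        for _ in range(batches_per_epoch):
            total_iters += batch_size
            # Iteration-based save: every save_latest_freq iterations.
            if total_iters % save_latest_freq == 0:
                events.append(("latest", epoch, total_iters))
        # Epoch-based save: every save_epoch_freq epochs.
        if epoch % save_epoch_freq == 0:
            events.append(("epoch", epoch, total_iters))
    return events


# Hypothetical 1,000-image dataset, trained with --n_epochs 1 --n_epochs_decay 1.
print(checkpoint_events(1000, 1, 1))                     # → []
print(checkpoint_events(1000, 1, 1, save_epoch_freq=1))  # → [('epoch', 1, 1000), ('epoch', 2, 2000)]
```

Under this model, a two-epoch sanity run reaches neither threshold with the defaults. Lowering --save_epoch_freq (for example to 1) or --save_latest_freq on the command line should make a short run write a checkpoint.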
