Currently, we offer our users the following list of tutorials, which can be run on Google Colab:
The goal is to make sure they run without exceptions caused by API changes in the latest release (v0.4.2) and do not use any deprecated API.
Note:
Training the CycleGAN tutorials (Training Cycle-GAN on Horses to Zebras with Nvidia/Apex, Another training Cycle-GAN on Horses to Zebras with Native Torch CUDA AMP) can take more than 12 hours. It is sufficient to verify that training completes a single epoch.
PyDataGlobal contributors: feel free to ask for any details you need, and leave a comment saying you would like to tackle the issue.
Please take a look at the CONTRIBUTING guide. This issue can be assigned to multiple people.
Take
Hey @vfdev-5, it's taking me more than 6 hours to train CycleGAN_with_ignite_and_torch_cuda_amp.ipynb for 200 epochs. Maybe we should consider reducing it to 20 or 50; I think 200 epochs is a bit too much for a tutorial.
Running for 20 or 50 epochs won't produce a proper model that correctly translates one image to another. The idea of the tutorial, however, is to reproduce the same training as the original CycleGAN. Maybe we could add a note that it can run for more than 12 hours. For this issue, it is sufficient to run for 2-5 epochs as a smoke test.
@vfdev-5 great approach
Hey, I'm facing some issues with TensorBoard in the Cloud TPU notebook; I think nothing is being uploaded to "/tmp/tb_logs".
I'm not familiar with TensorBoard, so I'll try to learn it and fix the issue, which might take me until this weekend.
@abdulelahsm thanks for the feedback! Which TPU notebook are you running? I'll also check it on my side to understand the issue.
@vfdev-5 examples/notebooks/MNIST_TPU.ipynb