Pytorch-cyclegan-and-pix2pix: CUDA Error: Out of Memory

Created on 4 Nov 2018  ·  20 Comments  ·  Source: junyanz/pytorch-CycleGAN-and-pix2pix

Hi Team,

I'm in the process of trying to train a pix2pix model on an AtoB set (edges) where I've already structured these in a montage (A on one side, B on the other side, collated into one image). I have roughly 12,000 images in my training set that I'd like to use. Batch_size is already 1, so I can't reduce that further. I've turned off the visualizer but still have the error.

From nvidia-smi, I find that GPU utilization spikes just after the Network was initialized (54.414M and 2.769M parameters for Network G and Network D respectively).

This is the error:

File "C:\Users\acn.kiosk\Anaconda3\envs\pix2pix-pytorch\lib\site-packages\torch\nn\modules\conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory


I'm running Windows 10 with a Quadro M6000 (24GB of GPU memory), Python 3.5.5, CUDA 9.2, and PyTorch 0.4.1 (built for CUDA 9.2).

Any ideas? I'm at a loss...

Brian

Most helpful comment

I've added Apex support and a checkpointing mechanism (https://pytorch.org/docs/stable/checkpoint.html) to my fork to reduce the memory footprint: https://github.com/seovchinnikov/pytorch-CycleGAN-and-pix2pix
You can run it with --checkpointing --opt_level "O2" and an increased input crop size (I was able to run with up to 896 on my RTX 2080).
Please note that it was tested on a PyTorch 1.7 nightly build, and Apex behaves unstably on older versions.

All 20 comments

What is the size of your training image?

Hi JunYanz,

Thanks for the note. :)

The images are coming out of the webcam at 1920 x 1080, and I'm saving them as 480x360 sets (1/4 scale). I'm then joining these together to form 960x360 images with an A/B pair.

Brian

It seems that 24GB can fit 480x360 images. Maybe you can further reduce the size of training images (to 256x256).

Thank you.

So would this imply 512x256, given that A and B should be collated in one image?

Or should I have A and B in two separate images?

Yeah, 512x256 for two images; 256x256 for each one.
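
For a rough sense of why the smaller size helps: activation memory in a convolutional network grows roughly in proportion to the number of input pixels, so the two sizes can be compared by pixel count. The snippet below is only a back-of-the-envelope sketch. The helper name is made up for illustration, and it does not measure real GPU usage.

```python
def pixel_ratio(size_a, size_b):
    """Return how many times more pixels size_a has than size_b.

    Each size is a (width, height) tuple.
    """
    (wa, ha), (wb, hb) = size_a, size_b
    return (wa * ha) / (wb * hb)


# Per-image sizes discussed in the thread (one half of the A/B pair).
original = (480, 360)
reduced = (256, 256)

ratio = pixel_ratio(original, reduced)
print(f"{original} has {ratio:.2f}x the pixels of {reduced}")

# The combined A|B training image is twice as wide as a single half.
paired = (2 * reduced[0], reduced[1])
print(f"paired image size: {paired[0]}x{paired[1]}")
```

The ratio only indicates the trend. Actual memory use also depends on the network architecture, the framework version, and the optimizer state.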

[Attached image: test_fake_b]

This was the output from edges to image. I only trained it on one image for the base settings, so I would have imagined it would be fairly accurate. Any sense of why the BtoA didn't generate the image exactly?

Would I see more fidelity in terms of painting in colors? Canny generates a black background with white edges, but I notice Edges2Cats uses black edges on white. Should I invert my edge images to match?
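
If inverting turns out to be worth trying, the operation itself is simple: each 8-bit value v becomes 255 - v. The sketch below uses plain nested lists so it runs without extra packages. The function name and the tiny sample image are illustrative only, and the thread does not establish whether inversion actually improves results.

```python
def invert_edges(image):
    """Invert an 8-bit grayscale image given as a list of rows of ints (0-255)."""
    return [[255 - value for value in row] for row in image]


# Tiny stand-in for a Canny result: white (255) edges on a black (0) background.
canny_like = [
    [0, 255, 0],
    [0, 255, 0],
    [0, 0, 0],
]

# After inversion the edges are black (0) on a white (255) background.
inverted = invert_edges(canny_like)
for row in inverted:
    print(row)
```

With an edge map held in a NumPy array (as OpenCV returns), `255 - edges` or `cv2.bitwise_not(edges)` should do the same thing in one step.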

It seems that 24GB can fit 480x360 images. Maybe you can further reduce the size of training images (to 256x256).

This is correct. Even on an NVIDIA® Tesla® V100 32GB, it is hard to work with images larger than 700 by 700. I converted the code to mixed precision (using NVIDIA Apex), which allows training on 1200 x 1200 images, and I am working on gradient checkpointing and possibly model parallelism, with the goal of reaching 2000 x 2000 training (training at low resolution and then generating large images does not seem to work well).

When everything is tested and working, I can make a pull request if you think that might be helpful.
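
As an illustration of the gradient checkpointing idea mentioned above, here is a minimal sketch using `torch.utils.checkpoint`. It is a toy convolution stack, not the pix2pix generator and not the code from any fork. The class name and layer sizes are made up, and the `use_reentrant=False` argument assumes a reasonably recent PyTorch release.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedBlocks(nn.Module):
    """Toy conv stack that can recompute block activations during backward."""

    def __init__(self, channels=8, num_blocks=3, use_checkpoint=True):
        super().__init__()
        self.use_checkpoint = use_checkpoint
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(num_blocks)
        ])

    def forward(self, x):
        for block in self.blocks:
            if self.use_checkpoint and self.training:
                # Intermediate activations inside the block are not stored;
                # they are recomputed when gradients are needed.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


torch.manual_seed(0)
net = CheckpointedBlocks()
x = torch.randn(1, 8, 16, 16)
loss = net(x).sum()
loss.backward()  # gradients flow as usual, at the cost of extra compute
```

The trade-off is extra forward computation in exchange for lower peak activation memory, which is why it tends to be combined with mixed precision when the goal is larger training resolutions.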

That would be excellent Ismail!

Running nvidia-smi shows that while I’m using the GPU at 80+%, the effective memory use hovers around 3 of 24 GB. Not sure what is reserving the rest.

In any case, I downsampled everything to 256x256 and it worked. I'm now around epoch 50, so I will let you know how it goes when done (30 mins/epoch).
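
A note on the nvidia-smi reading above: GPU utilization (how much of the time kernels were running) and memory use are separate numbers, so 80+% utilization with about 3 GB allocated is not contradictory. Memory can be queried on its own with `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`, which prints one "used, total" line per GPU in MiB. The parser below is a small sketch; the function name and the sample line are hypothetical, not output captured from the poster's machine.

```python
def parse_memory_line(line):
    """Parse one 'used, total' CSV line (values in MiB) into a usage summary."""
    used_mib, total_mib = (int(field.strip()) for field in line.split(","))
    return {
        "used_mib": used_mib,
        "total_mib": total_mib,
        "fraction_used": used_mib / total_mib,
    }


# Hypothetical sample line, roughly matching the "3 of 24 GB" reading.
sample = "3072, 24576"
stats = parse_memory_line(sample)
print(f"{stats['used_mib']} / {stats['total_mib']} MiB "
      f"({stats['fraction_used']:.1%} used)")
```

In a real script the line could come from `subprocess.check_output` running the command above, polled during training to see how close a given image size gets to the limit.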

Thank you both for the help.

I've got this mostly working, and I can test it using the script provided in this repo.

Question: Is there a way to run this on an image coming from an OpenCV webcam? Currently the test needs to be run from a .sh script with a number of arguments and parameters that are spread across several files (test, test_options, base_options, visualizer, etc.), and I'm not sure how to pull out everything required to run the .pth model that has been created on a real-time feed.

I assume this is possible, just not sure how.

I think it is possible. You would need to rewrite test.py and add some flags to test_options. You don't need the visualizer; you can write your own I/O code.
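
One piece of custom I/O code that is easy to get wrong is the value range. The sketch below assumes the repository's default preprocessing, which normalizes each channel with mean 0.5 and standard deviation 0.5 so that inputs and outputs lie in [-1, 1]. That assumption is worth confirming in data/base_dataset.py for your checkout. The function names are made up for illustration, and the code works on single values to stay dependency-free.

```python
def to_model_range(pixel):
    """Map an 8-bit value (0-255) to the [-1, 1] range."""
    return (pixel / 255.0 - 0.5) / 0.5


def to_uint8(value):
    """Map a model output in [-1, 1] back to 0-255, clamping any overshoot."""
    scaled = (value + 1.0) / 2.0 * 255.0
    return int(round(min(255.0, max(0.0, scaled))))


# Round trip for a few sample pixel values.
for pixel in (0, 64, 128, 255):
    normalized = to_model_range(pixel)
    print(pixel, "->", round(normalized, 4), "->", to_uint8(normalized))
```

For webcam frames, keep in mind that OpenCV delivers images in BGR channel order, while the training images are typically loaded as RGB, so a channel swap is likely needed before normalizing.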

@TheRevanchist : Does using mixed precision (with NVIDIA Apex) hurt performance?

@John1231983 , not really. I didn't do any quantitative evaluation (such as the Inception score), but judging visually, the results are as good as those from models trained with fp32. However, if the images become too big (thousands of pixels in both directions), the results are not that good, but that is a matter of network architecture, not mixed precision. If you want big images, you should consider a progressive GAN type of architecture.

Also, I trained other nets for different problems with mixed precision (always using Apex), and it works like a charm.

Hi!
I know that this is kind of late, but I would be very interested in the Apex version of the code. I've just started using Apex and it seems rather straightforward for many cases, but I can't figure out how to initialize it for a CycleGAN, which has 4 networks (networks.define_G and networks.define_D are each called twice) and 2 optimizers (each takes the parameters of its two networks chained together, e.g. itertools.chain(self.netG_A.parameters(), self.netG_B.parameters()) for the generators, and likewise for the discriminators), whereas the amp API calls for:

model, optimizer = amp.initialize(model, optimizer)

so I am unsure how to fit these together.

Thank you for your time!

@junyanz new a

could somebody reopen the issue ? @junyanz

Apex accepts a list of torch.nn.Module objects and a list of optimizers, so for the four CycleGAN networks and two optimizers you can do this:
[netG_A, netG_B, netD_A, netD_B], [optimizer_g, optimizer_d] = amp.initialize([netG_A, netG_B, netD_A, netD_B], [optimizer_g, optimizer_d])

I've added Apex support and a checkpointing mechanism (https://pytorch.org/docs/stable/checkpoint.html) to my fork to reduce the memory footprint: https://github.com/seovchinnikov/pytorch-CycleGAN-and-pix2pix
You can run it with --checkpointing --opt_level "O2" and an increased input crop size (I was able to run with up to 896 on my RTX 2080).
Please note that it was tested on a PyTorch 1.7 nightly build, and Apex behaves unstably on older versions.

Good work!

Would you like to send a PR? If you are busy, I can add apex to the official repo.

@junyanz thanks, I will send a PR. I just need to test it a little more locally to be sure everything is OK.
