When using the classic SGD optimizer with momentum together with sparse embeddings, memory is repeatedly allocated and garbage collected, which slows training down and eventually causes an out-of-memory error. Here is a minimal example to reproduce the issue.
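The original reproduction script is not included above; the following is a minimal sketch of the setup being described (a sparse `nn.Embedding` trained with `torch.optim.SGD` and momentum). The layer sizes, learning rate, and momentum value are placeholders, not values from the original report.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Sparse embedding: backward() produces sparse gradients for emb.weight.
emb = nn.Embedding(10000, 128, sparse=True)

# SGD with momentum; the momentum buffer is created from the (sparse) gradient.
opt = optim.SGD(emb.parameters(), lr=0.1, momentum=0.9)

idx = torch.randint(0, 10000, (32,))
for step in range(100):
    opt.zero_grad()
    loss = emb(idx).sum()
    loss.backward()      # sparse gradient for the embedding weight
    opt.step()           # momentum update on that sparse gradient
```

With `momentum=0` or `sparse=False`, the reported memory growth does not occur.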
The issue disappears when momentum is not used, or when the embeddings are not sparse.
I'm using the latest PyTorch version on conda: '0.2.0_4'.
I tried out your script with momentum 0.1 on master; it takes roughly 10800 MB of GPU memory at peak. This is caused by the use of a sparse momentum buffer. I'm sending out a PR for this.