PyTorch: Sparse SGD + momentum = CUDA memory issue.

Created on 16 Oct 2017 · 1 comment · Source: pytorch/pytorch

When using the classic SGD optimizer with momentum together with sparse embeddings, memory keeps being garbage collected / allocated, leading to a slowdown and eventually an out-of-memory error. Here is a minimal example to reproduce the issue:

[Screenshots: training slowdown, followed by a CUDA out-of-memory error]
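
(The original screenshots are not preserved in this mirror. A minimal reproduction along the same lines might look like the sketch below; the model, tensor sizes, and step count are assumptions, not the poster's exact script.)

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical sizes; the poster's exact values are not preserved.
num_embeddings, embedding_dim, batch_size = 1_000_000, 128, 512

# sparse=True makes the embedding emit sparse gradients.
embedding = nn.Embedding(num_embeddings, embedding_dim, sparse=True).cuda()

# SGD with momentum on sparse gradients: this is the combination that
# slows down and eventually runs out of GPU memory.
optimizer = optim.SGD(embedding.parameters(), lr=0.01, momentum=0.9)

for step in range(100_000):
    indices = torch.randint(num_embeddings, (batch_size,), device="cuda")
    loss = embedding(indices).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```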

The issue disappears when momentum is not used ([screenshot: no momentum]) or when the embeddings are not sparse ([screenshot: dense embedding]); a sketch of both workarounds follows below.
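
(A hedged sketch of those two workarounds, reusing the hypothetical names from the reproduction sketch above:)

```python
# Workaround 1: keep the sparse embedding but drop momentum.
optimizer = optim.SGD(embedding.parameters(), lr=0.01)

# Workaround 2: keep momentum but use a dense (non-sparse) embedding.
embedding = nn.Embedding(num_embeddings, embedding_dim, sparse=False).cuda()
optimizer = optim.SGD(embedding.parameters(), lr=0.01, momentum=0.9)
```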

I'm using the latest PyTorch version on conda: '0.2.0_4'.

All comments

I tried out your script with momentum 0.1 on master; it takes roughly 10800 MB of GPU memory at peak. This is caused by using a sparse buffer. I'm sending out a PR for this.
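
(For context, and not part of the original comment: one plausible way a sparse momentum buffer blows up memory is that adding uncoalesced sparse tensors simply concatenates their indices and values, so the buffer's storage grows at every step even when the same rows are updated. A small illustration of that effect, independent of the optimizer code:)

```python
import torch

# Illustration (not the PR itself): a sparse "momentum buffer" updated as
# buf = momentum * buf + grad keeps accumulating duplicate entries, even
# though only the same two rows of a length-10 vector are ever touched.
indices = torch.tensor([[0, 1]])
values = torch.ones(2)
grad = torch.sparse_coo_tensor(indices, values, (10,))

buf = grad.clone()
for _ in range(1_000):
    buf = buf * 0.9 + grad

print(buf._nnz())              # ~2002 stored values for a length-10 vector
print(buf.coalesce()._nnz())   # 2 after coalescing
```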
