PyTorch: sparse SGD + momentum = CUDA memory issue.

Created on 16 Oct 2017 · 1 Comment · Source: pytorch/pytorch

When using the classic SGD optimizer with momentum together with sparse embeddings, memory keeps being garbage-collected and reallocated, leading to a slowdown and eventually an out-of-memory error. Here is a minimal example to reproduce the issue:

[screenshots: reproduction script, progressive slowdown, and out-of-memory error]
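Since the original reproduction script was attached as screenshots, here is a hypothetical reconstruction of such a setup, assuming a sparse `nn.Embedding` trained with `torch.optim.SGD` and non-zero momentum; the sizes and names are illustrative assumptions, not the author's exact script.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the reported setup (assumed, the original was a
# screenshot): a sparse embedding optimized by SGD with momentum. In the
# report, running a loop like this on the GPU caused repeated allocation
# of the momentum buffer and eventually an out-of-memory error.
torch.manual_seed(0)

embedding = nn.Embedding(1000, 16, sparse=True)   # sparse gradients
optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1, momentum=0.9)

for step in range(100):
    optimizer.zero_grad()
    indices = torch.randint(0, 1000, (32,))
    loss = embedding(indices).pow(2).sum()
    loss.backward()       # embedding.weight.grad is a sparse tensor
    optimizer.step()      # momentum buffer is built from the sparse gradient
```

On a CUDA device the reported behaviour would come from the same loop moved to the GPU with much larger embedding tables.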

The issue disappears when momentum is not used
[screenshot: no momentum]

or when the embeddings are not sparse.
[screenshot: non-sparse embeddings]
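The two workarounds above can be sketched as follows (a hypothetical setup assuming a sparse `nn.Embedding` trained with SGD; the original variants were screenshots):

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(1000, 16, sparse=True)

# Workaround 1: keep sparse gradients but drop momentum.
opt_no_momentum = torch.optim.SGD(embedding.parameters(), lr=0.1, momentum=0.0)

# Workaround 2: keep momentum but use a dense embedding (dense gradients).
dense_embedding = nn.Embedding(1000, 16, sparse=False)
opt_dense = torch.optim.SGD(dense_embedding.parameters(), lr=0.1, momentum=0.9)
```

Either change avoids the sparse momentum buffer that triggers the repeated allocation.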

I'm using the latest PyTorch version on conda: '0.2.0_4'.

Most helpful comment

I tried out your script with momentum 0.1 on master; it takes roughly 10800 MB of GPU memory at peak. This is caused by the use of a sparse momentum buffer. I'm sending out a PR for this.



Related issues

szagoruyko · 3 Comments

ikostrikov · 3 Comments

SeparateReality · 3 Comments

keskarnitish · 3 Comments

a1363901216 · 3 Comments