In high-memory-pressure situations, a common workaround is to first load the checkpoint onto the CPU:
s = torch.load('my_file.pt', map_location=lambda storage, loc: storage)
And then load s into the model.
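For concreteness, a minimal sketch of that workaround (model is an assumed nn.Module that may already live on the GPU; load_state_dict copies each CPU tensor into the model's existing parameter storages):

import torch

# Pull every storage onto the CPU, regardless of where it was saved.
s = torch.load('my_file.pt', map_location=lambda storage, loc: storage)
# Copy the CPU tensors into the model's existing (possibly GPU) parameters.
model.load_state_dict(s)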
This is a very common scenario that we should be able to avoid, and it has some pitfalls: what happens with part-GPU/part-CPU models, what happens with multi-GPU models...
If load_state_dict took a filename directly, it could free its existing parameter storages and swap in the new ones on the fly, thereby requiring no extra memory.
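A rough, hypothetical sketch of what that could look like (load_state_dict does not accept a filename today; the helper name and the per-parameter swap are made up for illustration):

import torch

def load_state_dict_from_file(module, filename):
    # Load the checkpoint onto the CPU, then swap each parameter's storage
    # for the loaded one. The old GPU storage becomes collectible right away,
    # so the GPU never has to hold two full copies of the model at once.
    state = torch.load(filename, map_location=lambda storage, loc: storage)
    for name, param in module.named_parameters():
        if name in state:
            param.data = state[name].to(param.device)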
The same applies to optimizer state_dicts. For some optimizers, like Adagrad, the checkpoints are large, and we can hit the same memory-pressure situation. Optimizers don't even have a .cuda(), so we first have to load the state_dict onto the CPU and then manually copy parts of it over to the GPU.
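A minimal sketch of that manual copy, assuming optimizer is an already-constructed torch.optim optimizer whose parameters live on the GPU and 'optim.pt' is the saved optimizer state_dict:

import torch

# Load the optimizer checkpoint onto the CPU and restore it ...
opt_state = torch.load('optim.pt', map_location=lambda storage, loc: storage)
optimizer.load_state_dict(opt_state)
# ... then move the per-parameter state tensors (e.g. Adagrad's accumulators) to the GPU by hand.
for state in optimizer.state.values():
    for k, v in state.items():
        if torch.is_tensor(v):
            state[k] = v.cuda()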
I ran into this while helping @aszlam today.
If load_state_dict takes a filename, we should also allow the map_location param too. A common situation for me is to save a checkpoint on a cluster machine and then load it on my MacBook (so I need to load the params onto the CPU).
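For reference, a sketch of the map_location forms that cover the cluster-to-laptop case (the file name is assumed; the lambda pins every storage to the CPU, and the dict form remaps between GPU devices):

import torch

# Saved on a GPU cluster, loaded on a CPU-only laptop.
checkpoint = torch.load('cluster_checkpoint.pt', map_location=lambda storage, loc: storage)
# Saved on cuda:1, loaded on a machine that only has cuda:0.
checkpoint = torch.load('cluster_checkpoint.pt', map_location={'cuda:1': 'cuda:0'})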
@szagoruyko and I are fans of the HDF5 format for serialized models; it would be nice if that could get along with this proposal.
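A rough sketch of how a state_dict could map onto HDF5, purely as an illustration (using h5py, one dataset per tensor keyed by its state_dict name; the function names are made up):

import h5py
import torch

def save_state_dict_hdf5(state_dict, filename):
    # One dataset per tensor, named after its state_dict key.
    with h5py.File(filename, 'w') as f:
        for name, tensor in state_dict.items():
            f.create_dataset(name, data=tensor.cpu().numpy())

def load_state_dict_hdf5(filename):
    # Read each dataset back into a CPU tensor.
    with h5py.File(filename, 'r') as f:
        return {name: torch.from_numpy(f[name][...]) for name in f.keys()}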