In high-memory-pressure situations, a common workaround is to first load the checkpoint onto the CPU:
s = torch.load('my_file.pt', map_location=lambda storage, loc: storage)
And then load s into the model.
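For concreteness, a minimal sketch of that workaround (model is an assumed nn.Module that may already live on the GPU; load_state_dict copies each CPU tensor into the model's existing parameter storages):

import torch

# Pull every storage onto the CPU, regardless of where it was saved.
s = torch.load('my_file.pt', map_location=lambda storage, loc: storage)
# Copy the CPU tensors into the model's existing (possibly GPU) parameters.
model.load_state_dict(s)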
This is a very common scenario that we should be able to avoid, and it has some pitfalls: what happens with part-GPU/part-CPU models, what happens with multi-GPU models...
If load_state_dict took a filename directly, it could free its existing parameter storages and swap in the new ones on the fly, thereby requiring no extra memory.
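A rough, hypothetical sketch of what that could look like (load_state_dict does not accept a filename today; the helper name and the per-parameter swap are made up for illustration):

import torch

def load_state_dict_from_file(module, filename):
    # Load the checkpoint onto the CPU, then swap each parameter's storage
    # for the loaded one. The old GPU storage becomes collectible right away,
    # so the GPU never has to hold two full copies of the model at once.
    state = torch.load(filename, map_location=lambda storage, loc: storage)
    for name, param in module.named_parameters():
        if name in state:
            param.data = state[name].to(param.device)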
The same applies to optimizer state_dicts. For some optimizers, like Adagrad, the checkpoints are large, and we can hit the same memory-pressure situation. Optimizers don't even have a .cuda(), so we first have to load the state_dict onto the CPU and then manually copy parts of it over to the GPU.
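A minimal sketch of that manual copy, assuming optimizer is an already-constructed torch.optim optimizer whose parameters live on the GPU and 'optim.pt' is the saved optimizer state_dict:

import torch

# Load the optimizer checkpoint onto the CPU and restore it ...
opt_state = torch.load('optim.pt', map_location=lambda storage, loc: storage)
optimizer.load_state_dict(opt_state)
# ... then move the per-parameter state tensors (e.g. Adagrad's accumulators) to the GPU by hand.
for state in optimizer.state.values():
    for k, v in state.items():
        if torch.is_tensor(v):
            state[k] = v.cuda()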
I ran into this while helping @aszlam today.
If load_state_dict takes a filename, we should also allow the map_location param too. A common situation for me is to save a checkpoint on a cluster machine and then load it on my MacBook (so I need to load the params onto the CPU).
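For reference, a sketch of the map_location forms that cover the cluster-to-laptop case (the file name is assumed; the lambda pins every storage to the CPU, and the dict form remaps between GPU devices):

import torch

# Saved on a GPU cluster, loaded on a CPU-only laptop.
checkpoint = torch.load('cluster_checkpoint.pt', map_location=lambda storage, loc: storage)
# Saved on cuda:1, loaded on a machine that only has cuda:0.
checkpoint = torch.load('cluster_checkpoint.pt', map_location={'cuda:1': 'cuda:0'})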
@szagoruyko and I are fans of the HDF5 format for serialized models; it would be nice if that could get along with this proposal.
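A rough sketch of how a state_dict could map onto HDF5, purely as an illustration (using h5py, one dataset per tensor keyed by its state_dict name; the function names are made up):

import h5py
import torch

def save_state_dict_hdf5(state_dict, filename):
    # One dataset per tensor, named after its state_dict key.
    with h5py.File(filename, 'w') as f:
        for name, tensor in state_dict.items():
            f.create_dataset(name, data=tensor.cpu().numpy())

def load_state_dict_hdf5(filename):
    # Read each dataset back into a CPU tensor.
    with h5py.File(filename, 'r') as f:
        return {name: torch.from_numpy(f[name][...]) for name in f.keys()}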