Hi,
I downloaded the pre-trained transformer.wmt14.en-fr model from here and added the following snippet:
# build the generator from the loaded model and its target dictionary
generator = SequenceGenerator([model], model_dict, beam_size=args.beam_size)
# dummy batch: two random source sequences of 10 tokens, each followed by eos
src_tokens = torch.randint(3, 50, (2, 10)).long()
src_tokens = torch.cat((src_tokens, torch.LongTensor([[eos], [eos]])), -1)
src_lengths = torch.LongTensor([2, 10])
sample = {
    "net_input": {"src_tokens": src_tokens, "src_lengths": src_lengths}
}
generator(sample)  # Working
scripted_generator = torch.jit.script(generator)
scripted_generator(sample)  # Not Working
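For context, model, model_dict and eos above come from the downloaded checkpoint; a minimal, self-contained sketch of that setup (the from_pretrained call and its arguments are my assumption, not necessarily how it was loaded originally):
import torch
from fairseq.models.transformer import TransformerModel
from fairseq.sequence_generator import SequenceGenerator
# load the extracted archive; the dictionaries sitting next to model.pt are picked up automatically
hub = TransformerModel.from_pretrained(
    'wmt14.en-fr.joined-dict.transformer',
    checkpoint_file='model.pt',
)
model = hub.models[0]                     # the underlying TransformerModel
model_dict = hub.task.target_dictionary   # fairseq Dictionary used by the generator
eos = model_dict.eos()                    # end-of-sentence index appended to src_tokens above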
# the checkpoint stores the original training args; repoint args.data at the local
# directory containing the downloaded dictionaries before loading
model = torch.load('wmt14.en-fr.joined-dict.transformer/model.pt')
model["args"].data = directory_of_model  # e.g. the extracted wmt14.en-fr.joined-dict.transformer directory
torch.save(model, 'wmt14.en-fr.joined-dict.transformer/model_converted.pt')
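A quick sanity check that the rewrite took effect is just to reload the converted file (a hypothetical check, nothing fairseq-specific):
import torch
ckpt = torch.load('wmt14.en-fr.joined-dict.transformer/model_converted.pt')
print(ckpt["args"].data)  # should now print the local directory set above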
scripted_generator(sample) gives no error message! I cancelled it since it was stuck for minutes:
print(scripted_generator(sample))
File "/miniconda3/envs/test_ts/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
KeyboardInterrupt
How I installed fairseq (pip, source): pip
@jmp84 Any thoughts?
@cndn any idea about this?
@gvskalyan I'm wondering whether there is an incompatibility issue with the pre-trained model. Would you be able to retrain a transformer model for a couple of iterations and retry the repro?
I have re-trained a new model following these steps for a few updates and tried to run inference with the scripted version; it is still getting stuck.
I'm having the same or a similar issue as gvskalyan, and this is sort of a dump of what I dug up:
(For me at least it's not actually hanging.) Running
scripted_generator(sample)
actually finishes eventually. It just hammers a single CPU thread for 30 minutes before working as expected. My guess is it has something to do with TorchScript's lazy loading, but I'm not entirely sure. The time decreased substantially when I only ran it on the encoder and decoder, not SequenceGenerator as a whole.
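For reference, "only the encoder and decoder" means scripting the submodules directly instead of the whole generator, roughly like this (a sketch; model here is assumed to be the TransformerModel from the snippet above):
import torch
# script the encoder and decoder submodules individually instead of the full SequenceGenerator
scripted_encoder = torch.jit.script(model.encoder)
scripted_decoder = torch.jit.script(model.decoder)
The trade-off is that the beam-search loop itself then stays in Python rather than inside TorchScript.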
The main problem for me is that the amount of memory this uses is substantially higher than the non-TorchScript version of the same network. Because I need 100+ beams out per inference, it is hard to use this on a sequence where the target is longer than 15-20 steps.
The load time isn't as big a problem for me (though it isn't helped by saving a model that has already been loaded once, and it is incurred per instance), but does anyone know of a way to get the memory usage down in the TorchScript beam search? Are there some values that stay in memory that I could clear out for a basic encoder-decoder transformer?
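For anyone profiling alongside me, the generic PyTorch knobs look like this (a minimal sketch; neither of these shrinks the beam-search state itself, which is what I'm asking about):
import torch
# no autograd bookkeeping during generation; a scripted module still builds a graph otherwise
with torch.no_grad():
    out = scripted_generator(sample)
# return unused cached blocks to the driver between batches; this lowers the reserved memory
# reported by nvidia-smi but does not reduce the peak used inside the search
torch.cuda.empty_cache()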
This problem was resolved for me: I have a branch where the model loads quickly and uses GPU memory comparable to the non-TorchScript version while preserving the speedup.
What I did:
Changes in forward
I'm not sure as of yet which of these things solved the problem. If other people are having similar problems I can try to put something together that could be merged into prod (right now there are breaking changes to other workflows that would need to be sorted out). If there isn't any interest and someone just wants me to throw the messy branch up somewhere feel free to reach out.
I'm having the same issue as gvskalyan.