Fairseq: Infer a TorchScripted Module

Created on 16 Sep 2020 · 5 Comments · Source: pytorch/fairseq

โ“ Questions and Help

Hi,

What is your question?

  • Exported a pre-trained model using the gist, but cannot run inference with the TorchScripted module
  • Used the pre-trained model transformer.wmt14.en-fr from here
  • Am I running inference on the scripted module correctly?
  • Referred to the following sources: 1, 2

Code

Added the following to the snippet (model, model_dict, and eos are set up earlier in the gist):

    # model: the loaded translation model, model_dict: the target dictionary,
    # eos: the dictionary's end-of-sentence index
    generator = SequenceGenerator([model], model_dict, beam_size=args.beam_size)
    # a batch of 2 random source sequences of length 10, each followed by EOS
    src_tokens = torch.randint(3, 50, (2, 10)).long()
    src_tokens = torch.cat((src_tokens, torch.LongTensor([[eos], [eos]])), -1)
    src_lengths = torch.LongTensor([2, 10])
    sample = {
        "net_input": {"src_tokens": src_tokens, "src_lengths": src_lengths}
    }
    generator(sample)                                        # Working
    scripted_generator = torch.jit.script(generator)
    scripted_generator(sample)                               # Not working
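
For reference, a minimal sketch (my assumption, not the gist itself) of how model, model_dict, and eos could be obtained for transformer.wmt14.en-fr via torch.hub:

    import torch

    # load the pre-trained model through fairseq's torch.hub integration
    en2fr = torch.hub.load('pytorch/fairseq', 'transformer.wmt14.en-fr',
                           tokenizer='moses', bpe='subword_nmt')
    model = en2fr.models[0]                    # the underlying TransformerModel
    model_dict = en2fr.task.target_dictionary  # joined dictionary of the checkpoint
    eos = model_dict.eos()                     # end-of-sentence index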

What have you tried?

Before scripting

    # patch the checkpoint's stored data directory, then re-save it
    model = torch.load('wmt14.en-fr.joined-dict.transformer/model.pt')
    model["args"].data = directory_of_model
    torch.save(model, 'wmt14.en-fr.joined-dict.transformer/model_converted.pt')
Scripting

  • Able to save and load the JIT model, with and without quantization (a sketch of what I mean is below)
  • When the above code is added, it gets stuck at scripted_generator(sample)
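
A minimal sketch of the save/load and quantization steps (my reconstruction under assumptions, not the reporter's exact code; generator and model are as in the snippet above):

    import torch

    # script, save, and load back
    scripted_generator = torch.jit.script(generator)
    torch.jit.save(scripted_generator, 'generator_scripted.pt')
    scripted_generator = torch.jit.load('generator_scripted.pt')

    # optionally, dynamically quantize the model's Linear layers before scripting
    quantized_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )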
Error

No error message! I cancelled it since it was stuck for several minutes.

    print(scripted_generator(sample))
  File "/miniconda3/envs/test_ts/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
KeyboardInterrupt

What's your environment?

  • fairseq Version (e.g., 1.0 or master): Installed from Master
  • PyTorch Version (e.g., 1.0): 1.6.0
  • OS (e.g., Linux): Linux
  • How you installed fairseq (pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.7.9
  • CUDA/cuDNN version: 10.2
  • GPU models and configuration: Quadro RTX 8000

@jmp84 Any thoughts?

Labels: needs triage, question

All 5 comments

@cndn any idea about this?
@gvskalyan I'm wondering whether there is an incompatibility issue with the pre-trained model. Would you be able to retrain a transformer model for a couple of iterations and retry the repro?

I have re-trained a new model following these steps for a few updates and tried to run inference with the scripted version; it still gets stuck.

I'm having the same or a similar issue as gvskalyan, and this is sort of a dump of what I dug up:

  1. (For me at least, it's not actually hanging.) Running
    scripted_generator(sample)
    actually completes eventually. It just hammers a single CPU thread for 30 minutes before eventually working as expected (the timing sketch after this list shows one way to confirm that). My guess is it has something to do with TorchScript's lazy loading, but I'm not entirely sure. The time decreased substantially when I only scripted the encoder and decoder, not sequence_generator as a whole.

  2. The main problem for me is that the memory usage is substantially higher than the non-TorchScript version of the same network. Because I need 100+ beams out per inference, it is hard to use this on a sequence where the target is longer than 15-20 steps.
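
A minimal timing sketch (my own illustration, not from the thread) that separates the one-time warm-up from steady-state inference:

    import time

    t0 = time.time()
    scripted_generator(sample)               # first call: pays the warm-up cost
    print(f"first call:  {time.time() - t0:.1f}s")

    t0 = time.time()
    scripted_generator(sample)               # subsequent calls should be fast
    print(f"second call: {time.time() - t0:.1f}s")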

The load time isn't as big a problem for me (though it isn't helped by the fact that it is incurred once per instance, even after saving a model that has already been loaded once), but does anyone know of a way to get the memory usage down in the TorchScript beam search? Are there some values that stay in memory that I could clear out for a basic encoder-decoder transformer?

This problem was resolved for me. I have a branch where the model loads quickly and uses GPU memory comparable to the non-TorchScript version while preserving the speedup.

What I did:

  1. Only called torch.jit.script on the encoder and decoder, not on the whole of SequenceGenerator (see the sketch after this list)
  2. Removed EnsembleModel and used the single model directly
  3. Combined re-ordering the incremental state and getting the probabilities into TransformerDecoder's forward method:
    Changes in forward
  4. Pasted code from the search object into sequence_generator directly so I didn't have to call it
  5. Made various changes to returning and not returning IncrementalState
  6. In general, hard-coded values
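
A minimal sketch of item 1 (illustrative only; items 2-6 required changes inside fairseq itself that aren't reproduced here, and names follow the snippet from the original question):

    import torch

    # script only the submodules; SequenceGenerator itself stays eager Python
    model.encoder = torch.jit.script(model.encoder)
    model.decoder = torch.jit.script(model.decoder)

    generator = SequenceGenerator([model], model_dict, beam_size=args.beam_size)
    generator(sample)  # beam search runs eagerly but calls the scripted encoder/decoder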

I'm not sure yet which of these things solved the problem. If other people are having similar problems, I can try to put something together that could be merged into prod (right now there are breaking changes to other workflows that would need to be sorted out). If there isn't any interest and someone just wants me to throw the messy branch up somewhere, feel free to reach out.

I'm having the same issue as gvskalyan.
