Fairseq: Infer a TorchScripted Module

Created on 16 Sep 2020 · 5 Comments · Source: pytorch/fairseq

โ“ Questions and Help

Hi,

What is your question?

  • Exported a pre-trained model using the gist, but cannot run inference with the TorchScripted module
  • Used the pre-trained model transformer.wmt14.en-fr from here
  • Am I running inference on the scripted module correctly?
  • Referred to the following sources: 1, 2

Code

Added the following to the snippet (model, model_dict, and eos are set up earlier in the gist):

    # model: the loaded translation model, model_dict: the target dictionary,
    # eos: the dictionary's end-of-sentence index
    generator = SequenceGenerator([model], model_dict, beam_size=args.beam_size)
    # a batch of 2 random source sequences of length 10, each followed by EOS
    src_tokens = torch.randint(3, 50, (2, 10)).long()
    src_tokens = torch.cat((src_tokens, torch.LongTensor([[eos], [eos]])), -1)
    src_lengths = torch.LongTensor([2, 10])
    sample = {
        "net_input": {"src_tokens": src_tokens, "src_lengths": src_lengths}
    }
    generator(sample)                                        # Working
    scripted_generator = torch.jit.script(generator)
    scripted_generator(sample)                               # Not working
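
For reference, a minimal sketch (my assumption, not the gist itself) of how model, model_dict, and eos could be obtained for transformer.wmt14.en-fr via torch.hub:

    import torch

    # load the pre-trained model through fairseq's torch.hub integration
    en2fr = torch.hub.load('pytorch/fairseq', 'transformer.wmt14.en-fr',
                           tokenizer='moses', bpe='subword_nmt')
    model = en2fr.models[0]                    # the underlying TransformerModel
    model_dict = en2fr.task.target_dictionary  # joined dictionary of the checkpoint
    eos = model_dict.eos()                     # end-of-sentence index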

What have you tried?

Before scripting

    # patch the checkpoint's stored data directory, then re-save it
    model = torch.load('wmt14.en-fr.joined-dict.transformer/model.pt')
    model["args"].data = directory_of_model
    torch.save(model, 'wmt14.en-fr.joined-dict.transformer/model_converted.pt')
Scripting

  • Able to save and load the JIT model, with and without quantization (a sketch of what I mean is below)
  • When the above code is added, it gets stuck at scripted_generator(sample)
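
A minimal sketch of the save/load and quantization steps (my reconstruction under assumptions, not the reporter's exact code; generator and model are as in the snippet above):

    import torch

    # script, save, and load back
    scripted_generator = torch.jit.script(generator)
    torch.jit.save(scripted_generator, 'generator_scripted.pt')
    scripted_generator = torch.jit.load('generator_scripted.pt')

    # optionally, dynamically quantize the model's Linear layers before scripting
    quantized_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )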
Error

No error message! I cancelled it since it was stuck for several minutes.

    print(scripted_generator(sample))
  File "/miniconda3/envs/test_ts/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
KeyboardInterrupt

What's your environment?

  • fairseq Version (e.g., 1.0 or master): Installed from Master
  • PyTorch Version (e.g., 1.0): 1.6.0
  • OS (e.g., Linux): Linux
  • How you installed fairseq (pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.7.9
  • CUDA/cuDNN version: 10.2
  • GPU models and configuration: Quadro RTX 8000

@jmp84 Any thoughts?

Labels: needs triage, question

All 5 comments

@cndn any idea about this?
@gvskalyan I'm wondering whether there is an incompatibility issue with the pre-trained model. Would you be able to retrain a transformer model for a couple of iterations and retry the repro?

I have re-trained a new model following these steps for a few updates and tried to run inference with the scripted version; it still gets stuck.

I'm having the same or a similar issue as gvskalyan, and this is sort of a dump of what I dug up:

  1. (For me at least, it's not actually hanging.) Running
    scripted_generator(sample)
    actually completes eventually. It just hammers a single CPU thread for 30 minutes before eventually working as expected (the timing sketch after this list shows one way to confirm that). My guess is it has something to do with TorchScript's lazy loading, but I'm not entirely sure. The time decreased substantially when I only scripted the encoder and decoder, not sequence_generator as a whole.

  2. The main problem for me is that the memory usage is substantially higher than the non-TorchScript version of the same network. Because I need 100+ beams out per inference, it is hard to use this on a sequence where the target is longer than 15-20 steps.
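
A minimal timing sketch (my own illustration, not from the thread) that separates the one-time warm-up from steady-state inference:

    import time

    t0 = time.time()
    scripted_generator(sample)               # first call: pays the warm-up cost
    print(f"first call:  {time.time() - t0:.1f}s")

    t0 = time.time()
    scripted_generator(sample)               # subsequent calls should be fast
    print(f"second call: {time.time() - t0:.1f}s")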

The load time isn't as big a problem for me (though it isn't helped by the fact that it is incurred once per instance, even after saving a model that has already been loaded once), but does anyone know of a way to get the memory usage down in the TorchScript beam search? Are there some values that stay in memory that I could clear out for a basic encoder-decoder transformer?

This problem was resolved for me. I have a branch where the model loads quickly and uses GPU memory comparable to the non-TorchScript version while preserving the speedup.

What I did:

  1. Only called torch.jit.script on the encoder and decoder, not on the whole of SequenceGenerator (see the sketch after this list)
  2. Removed EnsembleModel and used the single model directly
  3. Combined re-ordering the incremental state and getting the probabilities into TransformerDecoder's forward method:
    Changes in forward
  4. Pasted code from the search object into sequence_generator directly so I didn't have to call it
  5. Made various changes to returning and not returning IncrementalState
  6. In general, hard-coded values
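
A minimal sketch of item 1 (illustrative only; items 2-6 required changes inside fairseq itself that aren't reproduced here, and names follow the snippet from the original question):

    import torch

    # script only the submodules; SequenceGenerator itself stays eager Python
    model.encoder = torch.jit.script(model.encoder)
    model.decoder = torch.jit.script(model.decoder)

    generator = SequenceGenerator([model], model_dict, beam_size=args.beam_size)
    generator(sample)  # beam search runs eagerly but calls the scripted encoder/decoder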

I'm not sure yet which of these things solved the problem. If other people are having similar problems, I can try to put something together that could be merged into prod (right now there are breaking changes to other workflows that would need to be sorted out). If there isn't any interest and someone just wants me to throw the messy branch up somewhere, feel free to reach out.

I'm having the same issue as gvskalyan.
