Is there any update on using TorchScript annotations (nn.Module -> torch.jit.ScriptModule, script_method, and trace) to load a transformer model without a Python interpreter, with end-to-end inference including beam search?
@myleott?
Something like this, end-to-end: https://twitter.com/Thom_Wolf/status/1151169470498582529
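For reference, a minimal sketch of the workflow being asked about, using the stock `nn.TransformerEncoder` as a stand-in (the actual fairseq model would need its own scripting/tracing work): trace the module, serialize it, and the saved artifact can then be loaded from C++ via `torch::jit::load` with no Python interpreter.

```python
import torch
import torch.nn as nn

# Toy encoder standing in for a real transformer model; dimensions
# are arbitrary examples.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6
).eval()

example = torch.rand(10, 32, 512)  # (seq_len, batch, d_model)

# trace records the ops executed for this example input
traced = torch.jit.trace(model, example)

# the serialized module can be loaded from libtorch (C++),
# i.e. without a Python interpreter
traced.save("transformer_encoder.pt")
```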
@zhangguanheng66 I think the transformer is JIT-traceable, but does converting it actually decrease latency?
@gvskalyan Yes, the transformer module in the PyTorch core library is JIT-traceable, which should decrease latency, but I haven't benchmarked it yet.
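A rough micro-benchmark sketch for comparing eager vs. traced latency (toy shapes, CPU timing; the actual speedup depends on model size and hardware):

```python
import time
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8).eval()
src = torch.rand(10, 32, 512)  # (src_len, batch, d_model)
tgt = torch.rand(20, 32, 512)  # (tgt_len, batch, d_model)

traced = torch.jit.trace(model, (src, tgt))

def bench(fn, n=100):
    # warm up, then average n forward passes
    with torch.no_grad():
        for _ in range(10):
            fn(src, tgt)
        start = time.perf_counter()
        for _ in range(n):
            fn(src, tgt)
    return (time.perf_counter() - start) / n

print(f"eager:  {bench(model):.4f}s per forward")
print(f"traced: {bench(traced):.4f}s per forward")
```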
Another fork: https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Translation/Transformer#changelog
See the June 2019 entry in the changelog: JIT support was added.
@gvskalyan @zhangguanheng66 I have tried BERT with JIT and benchmarked it: it runs about 25% faster on GPU.
Still working on fairseq JIT; beam search may be a big issue.
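The difficulty is that search has data-dependent control flow (a variable-length loop with an early exit on EOS), which `torch.jit.trace` cannot capture; it has to go through `torch.jit.script` instead. A minimal sketch of the scriptable pattern, with a toy projection matrix `proj` standing in for a real decoder (hypothetical names; greedy search, i.e. beam size 1, shown for brevity):

```python
import torch

@torch.jit.script
def greedy_decode(proj: torch.Tensor, start: int, eos: int,
                  max_len: int) -> torch.Tensor:
    tokens = torch.full([1], start, dtype=torch.long)
    for _ in range(max_len):
        # one-hot embed the last token and project to vocabulary logits
        one_hot = torch.zeros([proj.size(0)])
        one_hot[int(tokens[-1])] = 1.0
        next_tok = torch.argmax(one_hot @ proj).unsqueeze(0)
        tokens = torch.cat([tokens, next_tok])
        if int(next_tok) == eos:  # data-dependent break: scriptable, not traceable
            break
    return tokens

print(greedy_decode(torch.rand(10, 10), 0, 3, 20))
```

A real beam search additionally keeps the top-k partial hypotheses per step, but the scripting obstacle is the same control-flow pattern shown here.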
@Meteorix do you know where one might find a simple JIT transformer with beam search?
https://github.com/pytorch/translate I used this repo a couple of months ago.
@Meteorix have you seen a JIT-able LM with beam search?