Hi,
I trained a translation model based on the transformer architecture in FP16 on a Turing GPU. The resulting model is half the size of the FP32 model I trained, which is good. But now I want to run inference for both models on CPU only. When I infer with the FP16 model without passing the --fp16 argument, I get the same inference time for both models. But when I pass the --fp16 flag, I get this error:
File "/home/fairseq_translation/fairseq/fairseq/sequence_generator.py", line 148, in generate
return self._generate(encoder_input, beam_size, maxlen, prefix_tokens)
File "/home/fairseq_translation/fairseq/fairseq/sequence_generator.py", line 174, in _generate
encoder_out = model.encoder(**encoder_input)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/fairseq_translation/fairseq/fairseq/models/transformer.py", line 314, in forward
x = self.embed_scale * self.embed_tokens(src_tokens)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/sparse.py", line 118, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1454, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: _th_index_select is not implemented for type torch.HalfTensor
Why am I getting this error? Did I miss something? Please help.
Thanks
I don't think half precision computation is supported on CPU. @myleott?
That's correct, you have to choose between FP16 and CPU. This is a limitation of PyTorch.
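A common workaround is to cast the FP16-trained weights back to FP32 before running on CPU. The sketch below is a minimal illustration, not fairseq's actual loading code: the `checkpoint_to_fp32` helper is hypothetical, and a tiny `nn.Embedding` stands in for the real translation model.

```python
import torch

def checkpoint_to_fp32(state_dict):
    """Hypothetical helper: cast every half-precision tensor in a
    state dict to float32 so the model can run on CPU."""
    return {
        k: v.float() if v.dtype == torch.float16 else v
        for k, v in state_dict.items()
    }

# Demo on a tiny embedding layer (the op that failed in the traceback).
# Save FP16 weights, cast them back to FP32, and load them into a
# CPU-friendly FP32 module.
emb_fp16 = torch.nn.Embedding(10, 4).half()
state = checkpoint_to_fp32(emb_fp16.state_dict())

emb_fp32 = torch.nn.Embedding(10, 4)
emb_fp32.load_state_dict(state)

# The embedding lookup now runs on CPU in full precision.
out = emb_fp32(torch.tensor([1, 2, 3]))
```

This keeps the smaller FP16 checkpoint on disk while doing the actual CPU computation in FP32, which is why the inference time matches the FP32 model: the arithmetic is the same once the weights are upcast.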
Hi @myleott, thanks for the reply. Is this a limitation of libraries like PyTorch and TensorFlow, or of the CPU architecture itself? Seeing this comment, I'm unsure whether it is a limitation of PyTorch, as you said, or of CPUs.