Is it possible to load a model that was trained on a TPU and saved using ModelCheckpoint onto a GPU for inference?
import pytorch_lightning as pl

model = LightModel(hparams)
trainer = pl.Trainer(resume_from_checkpoint=str(ckpt), gpus=1)
trainer.test(model)
I also tried loading the weights directly, the way I would for a GPU-trained checkpoint, but it throws an error (sketch below, after the environment details).
Kaggle GPU
torchvision==0.6.0a0+82fd1c8
torch==1.5.0
pytorch-lightning==0.8.1
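For reference, a minimal sketch of the direct load I tried; "tpu_model.ckpt" is a placeholder for my ModelCheckpoint file:

import torch

# Raises the RuntimeError below, because the checkpoint was written
# from a TPU run and contains XLA tensors.
ckpt = torch.load("tpu_model.ckpt")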
Have you tried loading it on the CPU?
Can you post the error?
RuntimeError: Could not run 'aten::empty_strided' with arguments from the 'XLATensorId' backend. 'aten::empty_strided' is only available for these backends: [CPUTensorId, CUDATensorId, BackendSelect, VariableTensorId].
@Laksh1997 I tried that, and it still gives the same RuntimeError.
This looks like a PyTorch issue, and this one looks similar. Your code looks fine; someone more senior should take a look, I guess. @Borda
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@lezwon mind having a look? :]
@ArthDh The fix for this issue is in progress here: https://github.com/PyTorchLightning/pytorch-lightning/pull/3044. The underlying problem is that Lightning currently saves the model as XLA tensors rather than CPU tensors, so when you try to load the checkpoint on a GPU it looks for an XLA device, can't find one, and fails.
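Until that PR lands, one possible manual workaround is to re-save the checkpoint with CPU tensors on a machine where torch_xla is available. A minimal sketch, assuming the standard Lightning checkpoint layout with a "state_dict" key; the file names are placeholders:

import torch

# Run this where torch_xla is installed (e.g. the Kaggle TPU kernel that
# produced the checkpoint), since unpickling XLA tensors requires the
# XLA backend to be present.
ckpt = torch.load("tpu_model.ckpt")

# Move every weight to CPU so the re-saved file can be deserialized anywhere.
ckpt["state_dict"] = {k: v.cpu() for k, v in ckpt["state_dict"].items()}

torch.save(ckpt, "cpu_model.ckpt")

After re-saving, loading with resume_from_checkpoint="cpu_model.ckpt" and gpus=1 should work as in your original snippet. Note that only the weights are remapped here; other checkpoint entries such as optimizer states also hold tensors, so resuming training would need the same treatment for those.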
@lezwon Thank you for the update!
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!