Pytorch-lightning: SIGSEGV when training on TPU

Created on 1 Nov 2020 · 7 Comments · Source: PyTorchLightning/pytorch-lightning

โ“ Questions and Help

Before asking:

  1. Try to find answers to your questions in the Lightning Forum!
  2. Search for similar issues.
  3. Search the docs.

What is your question?

I tried to apply the Reformer model to a sentiment analysis task and train it on a TPU. Training fails with:

ProcessExitedException: process X terminated with signal SIGSEGV

What did I do wrong?

Code


You can find my code in a Colab notebook here.

What have you tried?

I tried to follow notebook 1 for the general setup and notebook 2 for the TPU setup. I saw #1956 and #2124; however, it does not work with the latest version (1.0.4).

What's your environment?

  • OS: Linux
  • Packaging: pip
  • Version: 1.0.4
TPU question

All 7 comments

Hi! Thanks for your contribution! Great first issue!

cc @lezwon

This seems to be an XLA issue, and it is tracked here: https://github.com/pytorch/xla/issues/1775

@FabianBell Mind adding the following at the beginning of your notebook and trying?

```python
import os
os.environ['XLA_USE_32BIT_LONG'] = '1'
os.environ['TRIM_GRAPH_SIZE'] = '1000000'
```
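For anyone trying the same workaround: a minimal sketch of applying these two variables with a guard, assuming (this is not confirmed in the thread) that torch_xla reads them when it is first imported, so they need to be set before any `torch_xla` or `pytorch_lightning` import. The helper name `apply_xla_env` and the `XLA_ENV` dictionary are hypothetical; only the variable names and values come from the comment above.

```python
import os
import sys

# Variable names and values are taken from the suggestion above.
XLA_ENV = {
    "XLA_USE_32BIT_LONG": "1",
    "TRIM_GRAPH_SIZE": "1000000",
}


def apply_xla_env(env=None, modules=None):
    """Hypothetical helper: set the XLA variables in `env`.

    Returns False if torch_xla was already imported, in which case the
    settings may be too late to take effect (assumption, see lead-in).
    """
    env = os.environ if env is None else env
    modules = sys.modules if modules is None else modules
    already_imported = "torch_xla" in modules
    for name, value in XLA_ENV.items():
        env[name] = value
    return not already_imported


if not apply_xla_env():
    print("torch_xla is already imported; restart the runtime and run this cell first.")
```

In a Colab notebook this would go in the very first cell, before the cells that install or import the TPU libraries.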

@lezwon thank you for your help.

I changed the notebook but I still get the same error.

I get a different error: Notebook

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got XLAIntType instead (while checking arguments for embedding)

Will look into this.

@FabianBell I'm not able to figure out the root cause of the error mentioned above. I think it might be similar to this one: https://github.com/huggingface/transformers/issues/2952

@lezwon thank you for your help. I followed the notebook and ended up with the same error. I do not think this is a PyTorch Lightning problem, so I will close this issue.
