I tried to apply the Reformer model to a sentiment analysis task and train it on a TPU, but I get:
ProcessExitedException: process X terminated with signal SIGSEGV
What did I do wrong?
You can find my code in a Colab notebook here.
I tried to stick to notebook 1 for the general setup and notebook 2 for the TPU setup. I saw #1956 and #2124; however, it still does not work with the latest version (1.0.4).
Hi! Thanks for your contribution! Great first issue!
cc @lezwon
This seems to be an XLA issue; it is tracked here: https://github.com/pytorch/xla/issues/1775
@FabianBell Mind adding the following at the beginning of your notebook and trying again?
import os
os.environ['XLA_USE_32BIT_LONG'] = '1'
os.environ['TRIM_GRAPH_SIZE'] = '1000000'
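For anyone reproducing this, here is a minimal sketch of the same workaround wrapped in a small helper. It assumes the two variables only take effect if they are set before `torch_xla` is imported, which is presumably why they belong at the very beginning of the notebook. The helper name `apply_xla_workaround` is hypothetical and not part of any library; only the variable names and values come from the snippet above.

```python
import os
import sys
import warnings

# Values copied from the suggestion above; the helper itself is hypothetical.
XLA_WORKAROUND_ENV = {
    "XLA_USE_32BIT_LONG": "1",
    "TRIM_GRAPH_SIZE": "1000000",
}


def apply_xla_workaround(env=os.environ):
    """Set the workaround variables in `env`.

    Returns True if torch_xla had already been imported when this was called.
    Assumption: in that case the settings may come too late to have an effect.
    """
    already_imported = "torch_xla" in sys.modules
    if already_imported:
        warnings.warn(
            "torch_xla is already imported; restart the runtime and set "
            "these variables in the first cell."
        )
    env.update(XLA_WORKAROUND_ENV)
    return already_imported


# Call this in the first notebook cell, before any torch_xla import.
apply_xla_workaround()
```

If the helper warns, restarting the Colab runtime and running this cell first is probably the safest way to be sure the settings are picked up.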
@lezwon Thank you for your help.
I changed the notebook, but I still get the same error.
I get a different error: Notebook
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got XLAIntType instead (while checking arguments for embedding)
Will look into this.
@FabianBell I haven't been able to figure out the root cause of the error mentioned above. It might be similar to this one: https://github.com/huggingface/transformers/issues/2952
@lezwon Thank you for your help. I followed the notebook and ended up with the same error. I do not think this is a PyTorch Lightning problem, so I will close this issue.