Pytorch-lightning: Colab TPU : process terminated with signal SIGKILL

Created on 24 Apr 2020 · 4 comments · Source: PyTorchLightning/pytorch-lightning

🐛 Bug

I'm trying to train BART (with the transformers library) on a Colab TPU. I followed the TPU documentation of PyTorch Lightning, but before training can start, I receive the following error:

Exception: process 0 terminated with signal SIGKILL

To Reproduce

I'm using the official example for text summarization from the transformers library: https://github.com/huggingface/transformers/blob/master/examples/summarization/bart/finetune.py
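
For context, the TPU part of my setup boils down to roughly the following (a minimal sketch, not the exact finetune.py code; the module name is made up, and num_tpu_cores is the Trainer argument from the PyTorch Lightning 0.7.x docs):

import pytorch_lightning as pl

# LightningModule wrapping BART for summarization (hypothetical name)
model = SummarizationModule(args)

# PL 0.7.x spawns one process per TPU core via torch.multiprocessing;
# the SIGKILL below is raised from that spawn/join logic.
trainer = pl.Trainer(num_tpu_cores=8, max_epochs=1)
trainer.fit(model)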

Here is the full stack trace:

INFO:transformers.configuration_utils:loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/google/electra-base-discriminator/config.json from cache at /root/.cache/torch/transformers/9236d197566a7f1be2b2151f5afcc5a8e17f31e1e23c52f3cdf2340019986e78.3de31bca490b759d81268bc95fdc9ab61f970ee46716ae8b25a1f4f1aba766e7
INFO:transformers.configuration_utils:Model config ElectraConfig {
  "_num_labels": 2,
  "architectures": [
    "ElectraForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bad_words_ids": null,
  "bos_token_id": null,
  "decoder_start_token_id": null,
  "do_sample": false,
  "early_stopping": false,
  "embedding_size": 768,
  "eos_token_id": null,
  "finetuning_task": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "is_decoder": false,
  "is_encoder_decoder": false,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "layer_norm_eps": 1e-12,
  "length_penalty": 1.0,
  "max_length": 20,
  "max_position_embeddings": 512,
  "min_length": 0,
  "model_type": "electra",
  "no_repeat_ngram_size": 0,
  "num_attention_heads": 12,
  "num_beams": 1,
  "num_hidden_layers": 12,
  "num_return_sequences": 1,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pad_token_id": 0,
  "prefix": null,
  "pruned_heads": {},
  "repetition_penalty": 1.0,
  "task_specific_params": null,
  "temperature": 1.0,
  "top_k": 50,
  "top_p": 1.0,
  "torchscript": false,
  "type_vocab_size": 2,
  "use_bfloat16": false,
  "vocab_size": 30522
}

INFO:transformers.configuration_utils:loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/google/electra-base-discriminator/config.json from cache at /root/.cache/torch/transformers/9236d197566a7f1be2b2151f5afcc5a8e17f31e1e23c52f3cdf2340019986e78.3de31bca490b759d81268bc95fdc9ab61f970ee46716ae8b25a1f4f1aba766e7
INFO:transformers.configuration_utils:Model config ElectraConfig {
  "_num_labels": 2,
  "architectures": [
    "ElectraForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bad_words_ids": null,
  "bos_token_id": null,
  "decoder_start_token_id": null,
  "do_sample": false,
  "early_stopping": false,
  "embedding_size": 768,
  "eos_token_id": null,
  "finetuning_task": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "is_decoder": false,
  "is_encoder_decoder": false,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "layer_norm_eps": 1e-12,
  "length_penalty": 1.0,
  "max_length": 20,
  "max_position_embeddings": 512,
  "min_length": 0,
  "model_type": "electra",
  "no_repeat_ngram_size": 0,
  "num_attention_heads": 12,
  "num_beams": 1,
  "num_hidden_layers": 12,
  "num_return_sequences": 1,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pad_token_id": 0,
  "prefix": null,
  "pruned_heads": {},
  "repetition_penalty": 1.0,
  "task_specific_params": null,
  "temperature": 1.0,
  "top_k": 50,
  "top_p": 1.0,
  "torchscript": false,
  "type_vocab_size": 2,
  "use_bfloat16": false,
  "vocab_size": 30522
}

INFO:transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/google/electra-base-discriminator/vocab.txt from cache at /root/.cache/torch/transformers/ff085885d4c95651587af553adadd34a26de8a663f2cef709635b48b3bed2bbd.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
INFO:transformers.modeling_utils:loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/google/electra-base-discriminator/pytorch_model.bin from cache at /root/.cache/torch/transformers/3c8e97e5021532563898ceb491dbfbc068ab4cb9eaa31f555990b9993e3228b4.b7514d01ce5acfe02313470cce3175018852a5e8cbcb8784268ab87dc21daf4c
INFO:transformers.modeling_utils:Weights from pretrained model not used in ElectraModel: ['electra.embeddings_project.weight', 'electra.embeddings_project.bias']
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/utilities/warnings.py:18: RuntimeWarning: You have defined a `val_dataloader()` and have defined a `validation_step()`, you may also want to define `validation_epoch_end()` for accumulating stats.
  warnings.warn(*args, **kwargs)
=> Dataset loaded from cache (./cnndm/ElectraTokenizer_cache_train.pt)
=> Dataset loaded from cache (./cnndm/ElectraTokenizer_cache_val.pt)
INFO:lightning:training on 8 TPU cores
=> Dataset loaded from cache (./cnndm/ElectraTokenizer_cache_test.pt)
INFO:lightning:INIT TPU local core: 0, global rank: 0
INFO:lightning:INIT TPU local core: 2, global rank: 2
INFO:lightning:INIT TPU local core: 3, global rank: 3
INFO:lightning:INIT TPU local core: 1, global rank: 1
INFO:lightning:INIT TPU local core: 4, global rank: 4
INFO:lightning:INIT TPU local core: 5, global rank: 5
INFO:lightning:INIT TPU local core: 6, global rank: 6
INFO:lightning:INIT TPU local core: 7, global rank: 7
INFO:lightning:
    | Name                                              | Type              | Params
------------------------------------------------------------------------------------
0   | model                                             | ElectraModel      | 108 M 
1   | model.embeddings                                  | ElectraEmbeddings | 23 M  
2   | model.embeddings.word_embeddings                  | Embedding         | 23 M  
3   | model.embeddings.position_embeddings              | Embedding         | 393 K 
4   | model.embeddings.token_type_embeddings            | Embedding         | 1 K   
5   | model.embeddings.LayerNorm                        | LayerNorm         | 1 K   
6   | model.embeddings.dropout                          | Dropout           | 0     
7   | model.encoder                                     | BertEncoder       | 85 M  
8   | model.encoder.layer                               | ModuleList        | 85 M  
9   | model.encoder.layer.0                             | BertLayer         | 7 M   
10  | model.encoder.layer.0.attention                   | BertAttention     | 2 M   
11  | model.encoder.layer.0.attention.self              | BertSelfAttention | 1 M   
12  | model.encoder.layer.0.attention.self.query        | Linear            | 590 K 
13  | model.encoder.layer.0.attention.self.key          | Linear            | 590 K 
14  | model.encoder.layer.0.attention.self.value        | Linear            | 590 K 
15  | model.encoder.layer.0.attention.self.dropout      | Dropout           | 0     
16  | model.encoder.layer.0.attention.output            | BertSelfOutput    | 592 K 
17  | model.encoder.layer.0.attention.output.dense      | Linear            | 590 K 
18  | model.encoder.layer.0.attention.output.LayerNorm  | LayerNorm         | 1 K   
19  | model.encoder.layer.0.attention.output.dropout    | Dropout           | 0     
20  | model.encoder.layer.0.intermediate                | BertIntermediate  | 2 M   
21  | model.encoder.layer.0.intermediate.dense          | Linear            | 2 M   
22  | model.encoder.layer.0.output                      | BertOutput        | 2 M   
23  | model.encoder.layer.0.output.dense                | Linear            | 2 M   
24  | model.encoder.layer.0.output.LayerNorm            | LayerNorm         | 1 K   
25  | model.encoder.layer.0.output.dropout              | Dropout           | 0     
26  | model.encoder.layer.1                             | BertLayer         | 7 M   
27  | model.encoder.layer.1.attention                   | BertAttention     | 2 M   
28  | model.encoder.layer.1.attention.self              | BertSelfAttention | 1 M   
29  | model.encoder.layer.1.attention.self.query        | Linear            | 590 K 
30  | model.encoder.layer.1.attention.self.key          | Linear            | 590 K 
31  | model.encoder.layer.1.attention.self.value        | Linear            | 590 K 
32  | model.encoder.layer.1.attention.self.dropout      | Dropout           | 0     
33  | model.encoder.layer.1.attention.output            | BertSelfOutput    | 592 K 
34  | model.encoder.layer.1.attention.output.dense      | Linear            | 590 K 
35  | model.encoder.layer.1.attention.output.LayerNorm  | LayerNorm         | 1 K   
36  | model.encoder.layer.1.attention.output.dropout    | Dropout           | 0     
37  | model.encoder.layer.1.intermediate                | BertIntermediate  | 2 M   
38  | model.encoder.layer.1.intermediate.dense          | Linear            | 2 M   
39  | model.encoder.layer.1.output                      | BertOutput        | 2 M   
40  | model.encoder.layer.1.output.dense                | Linear            | 2 M   
41  | model.encoder.layer.1.output.LayerNorm            | LayerNorm         | 1 K   
42  | model.encoder.layer.1.output.dropout              | Dropout           | 0     
43  | model.encoder.layer.2                             | BertLayer         | 7 M   
44  | model.encoder.layer.2.attention                   | BertAttention     | 2 M   
45  | model.encoder.layer.2.attention.self              | BertSelfAttention | 1 M   
46  | model.encoder.layer.2.attention.self.query        | Linear            | 590 K 
47  | model.encoder.layer.2.attention.self.key          | Linear            | 590 K 
48  | model.encoder.layer.2.attention.self.value        | Linear            | 590 K 
49  | model.encoder.layer.2.attention.self.dropout      | Dropout           | 0     
50  | model.encoder.layer.2.attention.output            | BertSelfOutput    | 592 K 
51  | model.encoder.layer.2.attention.output.dense      | Linear            | 590 K 
52  | model.encoder.layer.2.attention.output.LayerNorm  | LayerNorm         | 1 K   
53  | model.encoder.layer.2.attention.output.dropout    | Dropout           | 0     
54  | model.encoder.layer.2.intermediate                | BertIntermediate  | 2 M   
55  | model.encoder.layer.2.intermediate.dense          | Linear            | 2 M   
56  | model.encoder.layer.2.output                      | BertOutput        | 2 M   
57  | model.encoder.layer.2.output.dense                | Linear            | 2 M   
58  | model.encoder.layer.2.output.LayerNorm            | LayerNorm         | 1 K   
59  | model.encoder.layer.2.output.dropout              | Dropout           | 0     
60  | model.encoder.layer.3                             | BertLayer         | 7 M   
61  | model.encoder.layer.3.attention                   | BertAttention     | 2 M   
62  | model.encoder.layer.3.attention.self              | BertSelfAttention | 1 M   
63  | model.encoder.layer.3.attention.self.query        | Linear            | 590 K 
64  | model.encoder.layer.3.attention.self.key          | Linear            | 590 K 
65  | model.encoder.layer.3.attention.self.value        | Linear            | 590 K 
66  | model.encoder.layer.3.attention.self.dropout      | Dropout           | 0     
67  | model.encoder.layer.3.attention.output            | BertSelfOutput    | 592 K 
68  | model.encoder.layer.3.attention.output.dense      | Linear            | 590 K 
69  | model.encoder.layer.3.attention.output.LayerNorm  | LayerNorm         | 1 K   
70  | model.encoder.layer.3.attention.output.dropout    | Dropout           | 0     
71  | model.encoder.layer.3.intermediate                | BertIntermediate  | 2 M   
72  | model.encoder.layer.3.intermediate.dense          | Linear            | 2 M   
73  | model.encoder.layer.3.output                      | BertOutput        | 2 M   
74  | model.encoder.layer.3.output.dense                | Linear            | 2 M   
75  | model.encoder.layer.3.output.LayerNorm            | LayerNorm         | 1 K   
76  | model.encoder.layer.3.output.dropout              | Dropout           | 0     
77  | model.encoder.layer.4                             | BertLayer         | 7 M   
78  | model.encoder.layer.4.attention                   | BertAttention     | 2 M   
79  | model.encoder.layer.4.attention.self              | BertSelfAttention | 1 M   
80  | model.encoder.layer.4.attention.self.query        | Linear            | 590 K 
81  | model.encoder.layer.4.attention.self.key          | Linear            | 590 K 
82  | model.encoder.layer.4.attention.self.value        | Linear            | 590 K 
83  | model.encoder.layer.4.attention.self.dropout      | Dropout           | 0     
84  | model.encoder.layer.4.attention.output            | BertSelfOutput    | 592 K 
85  | model.encoder.layer.4.attention.output.dense      | Linear            | 590 K 
86  | model.encoder.layer.4.attention.output.LayerNorm  | LayerNorm         | 1 K   
87  | model.encoder.layer.4.attention.output.dropout    | Dropout           | 0     
88  | model.encoder.layer.4.intermediate                | BertIntermediate  | 2 M   
89  | model.encoder.layer.4.intermediate.dense          | Linear            | 2 M   
90  | model.encoder.layer.4.output                      | BertOutput        | 2 M   
91  | model.encoder.layer.4.output.dense                | Linear            | 2 M   
92  | model.encoder.layer.4.output.LayerNorm            | LayerNorm         | 1 K   
93  | model.encoder.layer.4.output.dropout              | Dropout           | 0     
94  | model.encoder.layer.5                             | BertLayer         | 7 M   
95  | model.encoder.layer.5.attention                   | BertAttention     | 2 M   
96  | model.encoder.layer.5.attention.self              | BertSelfAttention | 1 M   
97  | model.encoder.layer.5.attention.self.query        | Linear            | 590 K 
98  | model.encoder.layer.5.attention.self.key          | Linear            | 590 K 
99  | model.encoder.layer.5.attention.self.value        | Linear            | 590 K 
100 | model.encoder.layer.5.attention.self.dropout      | Dropout           | 0     
101 | model.encoder.layer.5.attention.output            | BertSelfOutput    | 592 K 
102 | model.encoder.layer.5.attention.output.dense      | Linear            | 590 K 
103 | model.encoder.layer.5.attention.output.LayerNorm  | LayerNorm         | 1 K   
104 | model.encoder.layer.5.attention.output.dropout    | Dropout           | 0     
105 | model.encoder.layer.5.intermediate                | BertIntermediate  | 2 M   
106 | model.encoder.layer.5.intermediate.dense          | Linear            | 2 M   
107 | model.encoder.layer.5.output                      | BertOutput        | 2 M   
108 | model.encoder.layer.5.output.dense                | Linear            | 2 M   
109 | model.encoder.layer.5.output.LayerNorm            | LayerNorm         | 1 K   
110 | model.encoder.layer.5.output.dropout              | Dropout           | 0     
111 | model.encoder.layer.6                             | BertLayer         | 7 M   
112 | model.encoder.layer.6.attention                   | BertAttention     | 2 M   
113 | model.encoder.layer.6.attention.self              | BertSelfAttention | 1 M   
114 | model.encoder.layer.6.attention.self.query        | Linear            | 590 K 
115 | model.encoder.layer.6.attention.self.key          | Linear            | 590 K 
116 | model.encoder.layer.6.attention.self.value        | Linear            | 590 K 
117 | model.encoder.layer.6.attention.self.dropout      | Dropout           | 0     
118 | model.encoder.layer.6.attention.output            | BertSelfOutput    | 592 K 
119 | model.encoder.layer.6.attention.output.dense      | Linear            | 590 K 
120 | model.encoder.layer.6.attention.output.LayerNorm  | LayerNorm         | 1 K   
121 | model.encoder.layer.6.attention.output.dropout    | Dropout           | 0     
122 | model.encoder.layer.6.intermediate                | BertIntermediate  | 2 M   
123 | model.encoder.layer.6.intermediate.dense          | Linear            | 2 M   
124 | model.encoder.layer.6.output                      | BertOutput        | 2 M   
125 | model.encoder.layer.6.output.dense                | Linear            | 2 M   
126 | model.encoder.layer.6.output.LayerNorm            | LayerNorm         | 1 K   
127 | model.encoder.layer.6.output.dropout              | Dropout           | 0     
128 | model.encoder.layer.7                             | BertLayer         | 7 M   
129 | model.encoder.layer.7.attention                   | BertAttention     | 2 M   
130 | model.encoder.layer.7.attention.self              | BertSelfAttention | 1 M   
131 | model.encoder.layer.7.attention.self.query        | Linear            | 590 K 
132 | model.encoder.layer.7.attention.self.key          | Linear            | 590 K 
133 | model.encoder.layer.7.attention.self.value        | Linear            | 590 K 
134 | model.encoder.layer.7.attention.self.dropout      | Dropout           | 0     
135 | model.encoder.layer.7.attention.output            | BertSelfOutput    | 592 K 
136 | model.encoder.layer.7.attention.output.dense      | Linear            | 590 K 
137 | model.encoder.layer.7.attention.output.LayerNorm  | LayerNorm         | 1 K   
138 | model.encoder.layer.7.attention.output.dropout    | Dropout           | 0     
139 | model.encoder.layer.7.intermediate                | BertIntermediate  | 2 M   
140 | model.encoder.layer.7.intermediate.dense          | Linear            | 2 M   
141 | model.encoder.layer.7.output                      | BertOutput        | 2 M   
142 | model.encoder.layer.7.output.dense                | Linear            | 2 M   
143 | model.encoder.layer.7.output.LayerNorm            | LayerNorm         | 1 K   
144 | model.encoder.layer.7.output.dropout              | Dropout           | 0     
145 | model.encoder.layer.8                             | BertLayer         | 7 M   
146 | model.encoder.layer.8.attention                   | BertAttention     | 2 M   
147 | model.encoder.layer.8.attention.self              | BertSelfAttention | 1 M   
148 | model.encoder.layer.8.attention.self.query        | Linear            | 590 K 
149 | model.encoder.layer.8.attention.self.key          | Linear            | 590 K 
150 | model.encoder.layer.8.attention.self.value        | Linear            | 590 K 
151 | model.encoder.layer.8.attention.self.dropout      | Dropout           | 0     
152 | model.encoder.layer.8.attention.output            | BertSelfOutput    | 592 K 
153 | model.encoder.layer.8.attention.output.dense      | Linear            | 590 K 
154 | model.encoder.layer.8.attention.output.LayerNorm  | LayerNorm         | 1 K   
155 | model.encoder.layer.8.attention.output.dropout    | Dropout           | 0     
156 | model.encoder.layer.8.intermediate                | BertIntermediate  | 2 M   
157 | model.encoder.layer.8.intermediate.dense          | Linear            | 2 M   
158 | model.encoder.layer.8.output                      | BertOutput        | 2 M   
159 | model.encoder.layer.8.output.dense                | Linear            | 2 M   
160 | model.encoder.layer.8.output.LayerNorm            | LayerNorm         | 1 K   
161 | model.encoder.layer.8.output.dropout              | Dropout           | 0     
162 | model.encoder.layer.9                             | BertLayer         | 7 M   
163 | model.encoder.layer.9.attention                   | BertAttention     | 2 M   
164 | model.encoder.layer.9.attention.self              | BertSelfAttention | 1 M   
165 | model.encoder.layer.9.attention.self.query        | Linear            | 590 K 
166 | model.encoder.layer.9.attention.self.key          | Linear            | 590 K 
167 | model.encoder.layer.9.attention.self.value        | Linear            | 590 K 
168 | model.encoder.layer.9.attention.self.dropout      | Dropout           | 0     
169 | model.encoder.layer.9.attention.output            | BertSelfOutput    | 592 K 
170 | model.encoder.layer.9.attention.output.dense      | Linear            | 590 K 
171 | model.encoder.layer.9.attention.output.LayerNorm  | LayerNorm         | 1 K   
172 | model.encoder.layer.9.attention.output.dropout    | Dropout           | 0     
173 | model.encoder.layer.9.intermediate                | BertIntermediate  | 2 M   
174 | model.encoder.layer.9.intermediate.dense          | Linear            | 2 M   
175 | model.encoder.layer.9.output                      | BertOutput        | 2 M   
176 | model.encoder.layer.9.output.dense                | Linear            | 2 M   
177 | model.encoder.layer.9.output.LayerNorm            | LayerNorm         | 1 K   
178 | model.encoder.layer.9.output.dropout              | Dropout           | 0     
179 | model.encoder.layer.10                            | BertLayer         | 7 M   
180 | model.encoder.layer.10.attention                  | BertAttention     | 2 M   
181 | model.encoder.layer.10.attention.self             | BertSelfAttention | 1 M   
182 | model.encoder.layer.10.attention.self.query       | Linear            | 590 K 
183 | model.encoder.layer.10.attention.self.key         | Linear            | 590 K 
184 | model.encoder.layer.10.attention.self.value       | Linear            | 590 K 
185 | model.encoder.layer.10.attention.self.dropout     | Dropout           | 0     
186 | model.encoder.layer.10.attention.output           | BertSelfOutput    | 592 K 
187 | model.encoder.layer.10.attention.output.dense     | Linear            | 590 K 
188 | model.encoder.layer.10.attention.output.LayerNorm | LayerNorm         | 1 K   
189 | model.encoder.layer.10.attention.output.dropout   | Dropout           | 0     
190 | model.encoder.layer.10.intermediate               | BertIntermediate  | 2 M   
191 | model.encoder.layer.10.intermediate.dense         | Linear            | 2 M   
192 | model.encoder.layer.10.output                     | BertOutput        | 2 M   
193 | model.encoder.layer.10.output.dense               | Linear            | 2 M   
194 | model.encoder.layer.10.output.LayerNorm           | LayerNorm         | 1 K   
195 | model.encoder.layer.10.output.dropout             | Dropout           | 0     
196 | model.encoder.layer.11                            | BertLayer         | 7 M   
197 | model.encoder.layer.11.attention                  | BertAttention     | 2 M   
198 | model.encoder.layer.11.attention.self             | BertSelfAttention | 1 M   
199 | model.encoder.layer.11.attention.self.query       | Linear            | 590 K 
200 | model.encoder.layer.11.attention.self.key         | Linear            | 590 K 
201 | model.encoder.layer.11.attention.self.value       | Linear            | 590 K 
202 | model.encoder.layer.11.attention.self.dropout     | Dropout           | 0     
203 | model.encoder.layer.11.attention.output           | BertSelfOutput    | 592 K 
204 | model.encoder.layer.11.attention.output.dense     | Linear            | 590 K 
205 | model.encoder.layer.11.attention.output.LayerNorm | LayerNorm         | 1 K   
206 | model.encoder.layer.11.attention.output.dropout   | Dropout           | 0     
207 | model.encoder.layer.11.intermediate               | BertIntermediate  | 2 M   
208 | model.encoder.layer.11.intermediate.dense         | Linear            | 2 M   
209 | model.encoder.layer.11.output                     | BertOutput        | 2 M   
210 | model.encoder.layer.11.output.dense               | Linear            | 2 M   
211 | model.encoder.layer.11.output.LayerNorm           | LayerNorm         | 1 K   
212 | model.encoder.layer.11.output.dropout             | Dropout           | 0     
213 | dropout                                           | Dropout           | 0     
214 | classifier                                        | Linear            | 769   
Validation sanity check: 0%
0/5 [00:00<?, ?it/s]
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-22-ab45ae238b6a> in <module>()
     41 args = parser.parse_args(cmd_args)
     42 
---> 43 main(args)

5 frames
/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py in join(self, timeout)
    106                 raise Exception(
    107                     "process %d terminated with signal %s" %
--> 108                     (error_index, name)
    109                 )
    110             else:

Exception: process 0 terminated with signal SIGKILL

Environment

* CUDA:
    - GPU:
    - available:         False
    - version:           None
* Packages:
    - numpy:             1.18.2
    - pyTorch_debug:     False
    - pyTorch_version:   1.6.0a0+b889e0d
    - pytorch-lightning: 0.7.3
    - tensorboard:       2.2.1
    - tqdm:              4.38.0
* System:
    - OS:                Linux
    - architecture:
        - 64bit
        - 
    - processor:         x86_64
    - python:            3.6.9
    - version:           #1 SMP Wed Feb 19 05:26:34 PST 2020

Additional context

I saw issue #996, but I don't think that's the problem here, because my RAM does not appear to be full:

(Screenshot: Colab resource monitor showing RAM usage well below the limit.)

Labels: bug / fix, help wanted

All 4 comments

You need the memory crash note at the top of the Colab, no?
@srush @dlibenzi

That message is almost certainly the Linux kernel OOM killer getting rid of the fattiest process.
You can take a look at this thread, where I posted some notebooks that try to overcome the limited RAM in Colab/Kaggle notebooks:

https://github.com/pytorch/xla/issues/1870

Unfortunately, those don't follow our usual way of doing things, which is what Lightning adopts at the moment.
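
If you want to confirm it really was the OOM killer, the kernel log usually records it. A minimal check from a Colab cell (a sketch, assuming dmesg is readable in the VM, which it typically is):

import subprocess

# Dump the kernel ring buffer and keep only OOM-killer related lines.
log = subprocess.check_output(["dmesg"]).decode(errors="ignore")
hits = [line for line in log.splitlines()
        if "out of memory" in line.lower() or "killed process" in line.lower()]
print("\n".join(hits) if hits else "no OOM-killer entries found")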

I could solve my specific problem by specifying num_workers=1 in my data loaders (instead of 4).
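
Roughly, that means changing the dataloader hooks to something like this (a sketch from my setup, not the shared example; self.train_dataset and self.hparams.train_batch_size are placeholder names):

from torch.utils.data import DataLoader

def train_dataloader(self):
    # Fewer worker processes -> much less host RAM per TPU core,
    # which keeps the Colab VM below the OOM-killer threshold.
    return DataLoader(self.train_dataset,
                      batch_size=self.hparams.train_batch_size,
                      shuffle=True,
                      num_workers=1)  # was num_workers=4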

What is different between Colab and Kaggle? I had this problem only with Colab...
