Flair: "BrokenPipeError: [Errno 32] Broken pipe" during corpus.make_label_dictionary()

Created on 20 Sep 2019 · 5 Comments · Source: flairNLP/flair

While trying to build a sentiment classifier using the sentiment140 corpus, I encountered an error.
After (successfully?) loading the corpus, flair throws a 'BrokenPipeError' while creating the label dictionary.
I am running Python 3.6 and flair 0.4.3 on Windows 10.

The code:

from flair.data import Corpus
from flair.datasets import CSVClassificationCorpus

data_folder = './twitterData'

column_name_map = {5: "text", 0: "label_topic"}

corpus: Corpus = CSVClassificationCorpus(data_folder,
                                         column_name_map,
                                         skip_header=False,
                                         delimiter=',')

label_dict = corpus.make_label_dictionary()

The output:

2019-09-20 10:14:09,107 Reading data from twitterData
2019-09-20 10:14:09,107 Train: twitterData\training.1600000.processed.noemoticon.csv
2019-09-20 10:14:09,107 Dev: None
2019-09-20 10:14:09,107 Test: twitterData\testdata.manual.2009.06.14.csv
2019-09-20 10:14:18,221 Computing label dictionary. Progress:
Traceback (most recent call last):

File "", line 14, in
label_dict = corpus.make_label_dictionary()

File "C:\Anaconda3\lib\site-packages\flair\data.py", line 948, in make_label_dictionary
for batch in Tqdm.tqdm(iter(loader)):

File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 278, in __iter__
return _MultiProcessingDataLoaderIter(self)

File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 682, in __init__
w.start()

File "C:\Anaconda3\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)

File "C:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)

File "C:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)

File "C:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)

File "C:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)

BrokenPipeError: [Errno 32] Broken pipe

bug wontfix

stelehm

Most helpful comment

The problem is that multiprocessing on Windows uses spawn instead of fork (Windows has no fork mechanism), and that seems to be incompatible with the current implementation of the dataloader. You can test this yourself on Linux: if you force multiprocessing to use spawn, it will fail there too.

To reproduce on Linux, add this inside your __main__ guard:

from torch.multiprocessing import Pool, Process, set_start_method
try:
    # Use the same start method that Windows uses
    set_start_method('spawn')
except RuntimeError:
    # The start method has already been set
    pass
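
For context on why spawn is stricter than fork: under spawn, the parent pickles the process object and writes it to a pipe that the child reads, which is the `reduction.dump` / `ForkingPickler` step at the bottom of the traceback above. The standard-library sketch below only illustrates that pickling requirement. The helper name `can_send_to_spawned_child` is hypothetical and not part of flair or PyTorch, and the sketch does not claim to identify the cause of this particular error: a failed pickle normally raises a different exception, while a broken pipe usually suggests the child exited while the parent was still writing.

```python
import pickle


def can_send_to_spawned_child(obj):
    """Return True if obj can be pickled, which the spawn start method requires.

    Hypothetical helper for illustration; not part of flair or PyTorch.
    """
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    # Plain data (made-up example rows) can be pickled.
    print(can_send_to_spawned_child([("4", "great day"), ("0", "bad day")]))
    # A lambda cannot be pickled by the standard pickle module.
    print(can_send_to_spawned_child(lambda text: text.split()))
```

With fork, the child inherits the parent's memory, so this pickling step is not needed for the process object, which may explain why the same script can behave differently on Linux.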

mrbungie on 21 Sep 2019

👍2

All 5 comments

In linux with python 3.6 and the latest flair, it works well.
Maybe you can try to clone the latest flair from github.

eurekaqq on 20 Sep 2019

Thanks for reporting this. @mrbungie I am trying to reproduce the error on my Ubuntu setup, but everything seems to work, even when setting the start method to 'spawn'. Could you paste a minimal code example that reproduces the error?

alanakbik on 23 Sep 2019

I found this doc, and I think it might describe the issue and help solve it:

https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection

import torch

def main():
    # `dataloader` is assumed to be a torch.utils.data.DataLoader defined elsewhere
    for i, data in enumerate(dataloader):
        pass  # do something here

if __name__ == '__main__':
    main()

I'm not sure where to add the code, though.

If anybody can help, thank you.
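
On the question of where the guard goes: it generally belongs in the script that is run directly, wrapped around whatever ends up starting worker processes. For the script in the original report, that would presumably be the `CSVClassificationCorpus(...)` construction and the `corpus.make_label_dictionary()` call, though this is an untested assumption for this issue. The sketch below shows the pattern with the standard library only, forcing spawn so it behaves like Windows on any platform. The names `count_labels` and `main` and the sample rows are made up for illustration.

```python
import multiprocessing as mp


def count_labels(rows, queue):
    """Worker: count labels and report the result through the queue."""
    counts = {}
    for label, _text in rows:
        counts[label] = counts.get(label, 0) + 1
    queue.put(counts)


def main():
    # Force "spawn" so the behaviour matches Windows on any platform.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    rows = [("0", "bad day"), ("4", "great day"), ("4", "love it")]  # made-up data
    worker = ctx.Process(target=count_labels, args=(rows, queue))
    worker.start()
    # Fetch the result before joining; the timeout avoids waiting forever
    # if the child fails to start.
    print(queue.get(timeout=60))
    worker.join()


if __name__ == "__main__":
    # Only the process started by the user runs main(). A spawned child
    # re-imports this file under a different __name__ and skips this block.
    main()
```

Everything at module level (imports and function definitions) runs again in each spawned child, so code that starts processes has to live inside `main()` or under the guard rather than at the top level of the script.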

jewl123 on 9 Dec 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.