Flair: "BrokenPipeError: [Errno 32] Broken pipe" during corpus.make_label_dictionary()

Created on 20 Sep 2019 · 5 comments · Source: flairNLP/flair

While trying to build a sentiment classifier using the sentiment140 corpus, I encountered an error.
After (successfully?) loading the corpus, flair throws a 'BrokenPipeError' while creating the label dictionary.
I am running Python 3.6 and flair 0.4.3 on Windows 10.

The code:

from flair.data import Corpus
from flair.datasets import CSVClassificationCorpus

data_folder = './twitterData'

column_name_map = {5: "text", 0: "label_topic"}

corpus: Corpus = CSVClassificationCorpus(data_folder,
                                         column_name_map,
                                         skip_header=False,
                                         delimiter=',')

label_dict = corpus.make_label_dictionary()

The output:

2019-09-20 10:14:09,107 Reading data from twitterData
2019-09-20 10:14:09,107 Train: twitterData\training.1600000.processed.noemoticon.csv
2019-09-20 10:14:09,107 Dev: None
2019-09-20 10:14:09,107 Test: twitterData\testdata.manual.2009.06.14.csv
2019-09-20 10:14:18,221 Computing label dictionary. Progress:
Traceback (most recent call last):
  File "", line 14, in
    label_dict = corpus.make_label_dictionary()
  File "C:\Anaconda3\lib\site-packages\flair\data.py", line 948, in make_label_dictionary
    for batch in Tqdm.tqdm(iter(loader)):
  File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 682, in __init__
    w.start()
  File "C:\Anaconda3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

bug wontfix

All 5 comments

On Linux with Python 3.6 and the latest flair, it works well.
Maybe you can try cloning the latest flair from GitHub.

> On Linux with Python 3.6 and the latest flair, it works well.
> Maybe you can try cloning the latest flair from GitHub.

The problem is that multiprocessing on Windows uses spawn instead of fork (Windows has no fork mechanism), and that seems to be incompatible with the current implementation of the dataloader. You can test it yourself on Linux: if you force multiprocessing to use spawn, it fails there too.

To reproduce on Linux, add this inside your __main__ guard:

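from torch.multiprocessing import set_start_method

try:
    # Force 'spawn' (the only start method on Windows) instead of the Linux default, 'fork'
    set_start_method('spawn')
except RuntimeError:
    # Raised if the start method has already been set
    pass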
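
To confirm which start method a machine is actually using, before or after an override like the one above, a small check can help. This is a minimal sketch using only the standard-library multiprocessing module, which torch.multiprocessing wraps; the function name start_method_report is made up for illustration.

```python
import multiprocessing as mp
import sys


def start_method_report():
    """Return the platform, the active start method, and all supported ones."""
    return {
        "platform": sys.platform,
        # Note: get_start_method() fixes the default if none has been set yet,
        # so call this after any set_start_method(...) override.
        "active": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    print(start_method_report())
```

On Windows the only available method is 'spawn', so the report should show it as active there; on Linux the list also includes 'fork'.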

Thanks for reporting this. @mrbungie I am trying to reproduce the error on my Ubuntu setup, but everything seems to work, even when the start method is set to 'spawn'. Could you paste a minimal code example that reproduces the error?

I found this doc, and I think it might describe the issue and help solve it:

https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection

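import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Placeholder data (an assumption); substitute your own dataset
    dataset = TensorDataset(torch.arange(8.0))
    dataloader = DataLoader(dataset, batch_size=2, num_workers=2)
    for i, data in enumerate(dataloader):
        pass  # do something here

if __name__ == '__main__':
    # Worker processes re-import this module; the guard keeps them from re-running main()
    main()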

I'm not sure where to add the code, though.

If anybody can help, thank you.
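
One place the guard could go is around the script from the original report, so that nothing which may start DataLoader workers runs at import time. The following is a sketch under that assumption, not a confirmed fix: nobody in the thread verified that it resolves the BrokenPipeError on flair 0.4.3. The script name build_label_dict.py, the main function, and the command-line handling are made up for illustration; the flair calls are the ones from the report.

```python
import sys

# Column mapping taken from the original report.
COLUMN_NAME_MAP = {5: "text", 0: "label_topic"}


def main(data_folder):
    # Imported lazily so this module can be imported without flair installed;
    # a top-level import works equally well.
    from flair.datasets import CSVClassificationCorpus

    corpus = CSVClassificationCorpus(
        data_folder,
        COLUMN_NAME_MAP,
        skip_header=False,
        delimiter=",",
    )
    # Per the traceback, this call iterates a DataLoader, which starts
    # worker processes; under spawn those workers re-import this module.
    return corpus.make_label_dictionary()


if __name__ == "__main__":
    # Hypothetical command-line handling: pass the data folder as an argument.
    if len(sys.argv) != 2:
        print("usage: python build_label_dict.py DATA_FOLDER")
    else:
        print(main(sys.argv[1]))
```

With this layout the script would be run as `python build_label_dict.py ./twitterData`. Running the same statements cell by cell in an interactive console would not go through the guard.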

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
