Flair: Any way to parallelize embedding extraction on CPU cores?

Created on 2 Aug 2019 · 2 comments · Source: flairNLP/flair

Hey guys

I tried to extract embeddings from a corpus on the GPU, but GPU utilisation sits at around 7%, so I thought there might be a way to use all CPU cores to extract the embeddings instead and save time. I have a corpus of around 3 million tokens, and the embeddings are to be saved in a Python dictionary. Below is how I used flair to get embeddings in parallel via Python's Pool(), but it isn't working. I think each process would need its own StackedEmbeddings object, and that takes a lot of memory.

import os
from multiprocessing import Pool, Manager
from flair.embeddings import WordEmbeddings, StackedEmbeddings, PooledFlairEmbeddings
from flair.data import Sentence

def procces_sents(chunk):
    sentences = [Sentence(s) for s in chunk]
    stacked_embeddings.embed(sentences)

    for sent in sentences:
        for token in sent:
            # membership test on the token text, not the Token object
            if token.text in my_dict:
                pass
            else:
                pass

stacked_embeddings = StackedEmbeddings([
                                        WordEmbeddings('glove'),
                                        PooledFlairEmbeddings('news-forward-fast'),
                                        PooledFlairEmbeddings('news-backward-fast'),
                                       ])

my_dict = Manager().dict()
my_list = ["1.txt"]
n = 32
filepath = "/some/path/"
for filename in my_list:
    with open(os.path.join(filepath, filename), "r") as m:
        data = m.readlines()

    all = [x.strip() for x in data]

    all_chunks = [all[i * n:(i + 1) * n] for i in range((len(all) + n - 1) // n)]

    p = Pool()
    p.map(procces_sents, all_chunks)
    p.close()
    p.join()

I expect each core to take a 32-sentence chunk and process it.

"all" is the list of all sentences.

"all_chunks" is "all" split into chunks of 32 sentences each.
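The chunking expression in the snippet above can be checked in isolation; this is a minimal sketch using dummy sentence strings in place of real corpus lines:

```python
# Minimal check of the chunking logic: split a list into chunks of size n,
# with the last chunk holding the remainder.
n = 32
all_sentences = ["sentence %d" % i for i in range(100)]

all_chunks = [all_sentences[i * n:(i + 1) * n]
              for i in range((len(all_sentences) + n - 1) // n)]

print(len(all_chunks))      # 4 chunks: 32 + 32 + 32 + 4
print(len(all_chunks[-1]))  # 4 (the remainder)
```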

Labels: question, wontfix

Most helpful comment

Hello @ctrboundary, did you try to parallelize your task with joblib?

from joblib import Parallel, delayed
from tqdm import tqdm_notebook
from multiprocessing import cpu_count

def parallelize(iterable, func):
    return Parallel(n_jobs=cpu_count() - 1, prefer="threads")(delayed(func)(i) for i in tqdm_notebook(iterable))

parallelize(all_chunks, procces_sents)

The prefer="threads" parameter should avoid duplicating the stacked_embeddings object in memory.
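A quick way to see what prefer="threads" buys you is a toy run in which the workers all mutate one shared object; with threads there is a single instance rather than pickled per-process copies. The shared list and work function below are made up for illustration and are not part of flair:

```python
from joblib import Parallel, delayed

# Shared object: with prefer="threads" every worker sees this same
# instance, so nothing has to be pickled into child processes.
shared = []

def work(chunk):
    shared.append(sum(chunk))  # in-place mutation is visible to all workers
    return sum(chunk)

chunks = [[1, 2], [3, 4], [5, 6]]
results = Parallel(n_jobs=2, prefer="threads")(delayed(work)(c) for c in chunks)

print(results)      # [3, 7, 11] (results keep the input order)
print(len(shared))  # 3 -- all workers appended to the same list
```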

All 2 comments


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

