Flair: Stacked Embedding Classification

Created on 13 Jan 2019 · 3 Comments · Source: flairNLP/flair

Hi,

Is there a way to pass a stacked embedding into the classification model ?

My intention is to embed each sentence as a series of concatenated word embeddings rather than a single document embedding.
The error I run into when placing the stacked embedding variable directly into the TextClassifier is:

RuntimeError: size mismatch, m1: [1 x 0], m2: [7268 x 2] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:266

Is it better, in general, to classify Twitter text with document embeddings rather than word embeddings?

Thank You

question

All 3 comments

Yes, you can embed documents using one of the DocumentEmbeddings classes, either DocumentPoolEmbeddings or DocumentLSTMEmbeddings. Both of these classes take a list of word embeddings as input that they use to create the document embedding.

For instance, if you want to combine FlairEmbeddings and WordEmbeddings as a pooled document embedding you can do this:

from flair.data import Sentence
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentPoolEmbeddings

# initialize the word-level embeddings
glove_embedding = WordEmbeddings('glove')
flair_embedding_forward = FlairEmbeddings('news-forward')
flair_embedding_backward = FlairEmbeddings('news-backward')

# initialize the document embeddings by pooling over the word embeddings
document_embeddings = DocumentPoolEmbeddings([glove_embedding,
                                              flair_embedding_backward,
                                              flair_embedding_forward])

The list of word embeddings you pass can be arbitrarily long, so you could add BERT or ELMo embeddings to the list as well.

You can then use the resulting document embedding to train a classifier, as sketched below.
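
A minimal sketch of that training step, assuming a flair Corpus has already been loaded into a corpus variable (the corpus, output path, and epoch count below are placeholders, not part of the original comment):

from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# build the label dictionary from your corpus (assumed to be loaded already)
label_dict = corpus.make_label_dictionary()

# the classifier consumes the document embeddings defined above
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, multi_label=False)

# train the classifier; the base path and epoch count are illustrative
trainer = ModelTrainer(classifier, corpus)
trainer.train('resources/taggers/tweet-classifier', max_epochs=10)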

For more info, check out the tutorial.

Hope this helps!

@alanakbik Thank you for the quick response!

Just to clarify my original question: do I have to use document embeddings to pass to a classification model?

Here is a snippet of my code:

from flair.embeddings import (WordEmbeddings, FlairEmbeddings, ELMoEmbeddings,
                              StackedEmbeddings)
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

label_dict = corpus.make_label_dictionary()

# stack several word-level embeddings
stacked_embeddings = StackedEmbeddings([
                                        WordEmbeddings('glove'),
                                        FlairEmbeddings('news-forward'),
                                        FlairEmbeddings('news-backward'),
                                        ELMoEmbeddings('original')
                                       ])

# pass the word-level stack directly to the classifier (this raises the error above)
classifier = TextClassifier(stacked_embeddings, label_dictionary=label_dict, multi_label=False)

trainer = ModelTrainer(classifier, corpus)

As you can see, I would like to use a stacked embedding rather than a document embedding.

My intention is to pick up and classify a more nuanced representation of the individual words in a sentence (tweet).
I would like the classifier to primarily map the relationships of words to one another and only secondarily the overall document.

Is this a naive idea?

Thanks again :)

Hello @jewl123, currently you must use a DocumentEmbeddings class over the word stack because this lets the classifier know how to combine the word embeddings for the classification task: DocumentPoolEmbeddings will simply average them, while DocumentLSTMEmbeddings will train an LSTM over them.

Unfortunately, it is not possible to embed sentences as concatenations of all word embeddings because sentences have different lengths. A tweet with 2 words would have a shorter embedding than a tweet with 10 words, but for downstream tasks all embeddings must have the same length. So you need some method of creating a single fixed-size embedding from sentences with different numbers of words.
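
A minimal sketch of what that means for the snippet above: wrap the same word embeddings in one of the DocumentEmbeddings classes before handing them to the classifier (the specific embeddings, and the reuse of label_dict from the snippet above, are illustrative):

from flair.embeddings import (WordEmbeddings, FlairEmbeddings, ELMoEmbeddings,
                              DocumentPoolEmbeddings, DocumentLSTMEmbeddings)
from flair.models import TextClassifier

word_embeddings = [WordEmbeddings('glove'),
                   FlairEmbeddings('news-forward'),
                   FlairEmbeddings('news-backward'),
                   ELMoEmbeddings('original')]

# option 1: mean-pool the word embeddings into one fixed-size vector
document_embeddings = DocumentPoolEmbeddings(word_embeddings)

# option 2: train an LSTM over the word embeddings, so relationships between
# words are modelled before the document representation is produced
# document_embeddings = DocumentLSTMEmbeddings(word_embeddings)

classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, multi_label=False)

The LSTM variant is closer to the word-to-word relationship goal described above, since the recurrent layer sees the words in order rather than only their average.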

Hope this helps!
