I trained my own model using Flair embeddings, and I used DocumentPoolEmbeddings to calculate the similarity between two sentences.
How can I use ELMo and BERT from Flair to solve a sentence similarity task?
@fatimabs
Calculate embeddings for each sentence and then use cosine similarity to calculate distance between them.
https://pytorch.org/docs/stable/_modules/torch/nn/modules/distance.html
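The comparison suggested above can be sketched with the `CosineSimilarity` module from the linked PyTorch page. The 4-dimensional vectors below are made-up stand-ins for the (much longer) pooled sentence embeddings that `DocumentPoolEmbeddings.embed()` would produce:

```python
import torch

# Toy stand-ins for two pooled sentence embeddings produced by
# DocumentPoolEmbeddings (real Flair vectors have thousands of dimensions).
emb_a = torch.tensor([[1.0, 0.0, 1.0, 0.0]])
emb_b = torch.tensor([[1.0, 0.0, 0.5, 0.0]])

# nn.CosineSimilarity compares along dim=1; a value near 1.0 means
# the two sentence vectors point in nearly the same direction.
cos = torch.nn.CosineSimilarity(dim=1)
similarity = cos(emb_a, emb_b).item()
print(round(similarity, 4))  # → 0.9487
```

In practice you would replace the toy tensors with `sentence.embedding.unsqueeze(0)` for each embedded `Sentence`.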
This approach is unsupervised and will give you the similarity of sentences based on the average of the word embeddings of each sentence. As @krzynio writes, you can then use cosine distance over the embedding vectors to get a similarity.
Another unsupervised option is Word Mover's Distance over word embeddings. We haven't implemented this in Flair, but there are libraries out there that do it for you. I could imagine that this approach works better than simple averaging of word vectors.
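Word Mover's Distance itself solves an optimal-transport problem (gensim, for example, exposes it on word vectors as `wmdistance`). As a minimal pure-Python illustration of the idea, the sketch below computes the "relaxed" lower bound of WMD, where each word simply travels to its nearest neighbour in the other sentence; the 2-dimensional word vectors are made-up toy values, not real embeddings:

```python
import math

# Toy word vectors (assumption: in practice these come from trained embeddings).
vectors = {
    'cat':    [1.0, 0.1],
    'feline': [0.9, 0.2],
    'sits':   [0.1, 1.0],
    'rests':  [0.2, 0.9],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def relaxed_wmd(doc1, doc2):
    # Average, over the words of doc1, of the distance each word must
    # "travel" to reach its closest word in doc2. This is the relaxed
    # (nearest-neighbour) lower bound of true Word Mover's Distance.
    return sum(min(euclidean(vectors[w], vectors[v]) for v in doc2)
               for w in doc1) / len(doc1)

d = relaxed_wmd(['cat', 'sits'], ['feline', 'rests'])
print(round(d, 4))  # → 0.1414 (small distance: the sentences are close)
```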
Yet another way would be to learn semantic similarity in a supervised way with training data, for instance to address a question-answering task. We are in the process of preparing a new module for Flair that will let users learn similarity between embeddings, and we hope to contribute this in the near future.
@alanakbik Thanks for your response. What is the difference between averaging the word embeddings of each sentence with this method and averaging word embeddings based on word2vec?
@alanakbik I have a problem: when I try a new training run, I get this error:
IndexError                                Traceback (most recent call last)
      3     sequence_length=250,
      4     mini_batch_size=100,
----> 5     max_epochs=10)

/usr/local/lib/python3.6/dist-packages/flair/models/language_model.py in generate_text(self, prefix, number_of_characters, temperature, break_on_suffix)
    313
    314         # print(word_idx)
--> 315         prob = decoder_output[word_idx]
    316         log_prob += prob
    317

IndexError: too many indices for tensor of dimension 0
Please, how can I resolve it?
I've used:

from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentPoolEmbeddings

glove_embedding = WordEmbeddings('ar')
flair_embedding_forward = FlairEmbeddings('ar-forward')
flair_embedding_backward = FlairEmbeddings('ar-backward')
document_embeddings = DocumentPoolEmbeddings([glove_embedding,
                                              flair_embedding_backward,
                                              flair_embedding_forward])

It works for me, but I need to fine-tune the word embedding model. Could you please share your code?
@khaledrefai Thanks a lot for your reply. I'm doing the same thing, but I don't get very interesting results. Can @alanakbik help with fine-tuning the word embedding model?
@khaledrefai Hi,
can you please help me fine-tune the model?
@fatimabs
Sure,

from flair.data import Sentence
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentPoolEmbeddings

glove_embedding = WordEmbeddings('ar')
# point FlairEmbeddings at the fine-tuned language model checkpoints
flair_embedding_forward = FlairEmbeddings('/content/gdrive/My Drive/AI/best-lm.pt')
flair_embedding_backward = FlairEmbeddings('/content/gdrive/My Drive/AI/back/best-lm.pt')
document_embeddings = DocumentPoolEmbeddings([glove_embedding,
                                              flair_embedding_backward,
                                              flair_embedding_forward])

query = Sentence(text)  # text is your query string

# embed everything
document_embeddings.embed(query)
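Once the query and each candidate sentence have pooled vectors like this, comparing them is a nearest-neighbour search by cosine similarity. A toy sketch, using made-up 3-dimensional vectors and hypothetical names (`query_vec`, `doc_a`, `doc_b`) standing in for real `sentence.embedding` outputs:

```python
import math

def cosine(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for document_embeddings.embed(...) results.
query_vec = [0.9, 0.1, 0.0]
candidates = {
    'doc_a': [1.0, 0.0, 0.1],
    'doc_b': [0.0, 1.0, 0.2],
}

# Rank candidates by similarity to the query, most similar first.
ranked = sorted(candidates,
                key=lambda k: cosine(query_vec, candidates[k]),
                reverse=True)
print(ranked[0])  # → doc_a
```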
You can contact me at
khaled.[email protected]
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.