I trained my own model using Flair embeddings, and I used DocumentPoolEmbeddings to calculate the similarity between two sentences.
How can I use ELMo and BERT from Flair to solve a sentence similarity task?
@fatimabs
Calculate embeddings for each sentence and then use cosine similarity to calculate distance between them.
https://pytorch.org/docs/stable/_modules/torch/nn/modules/distance.html
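The comparison suggested above can be sketched with the `CosineSimilarity` module from the linked PyTorch page. The 4-dimensional vectors below are made-up stand-ins for the (much longer) pooled sentence embeddings that `DocumentPoolEmbeddings.embed()` would produce:

```python
import torch

# Toy stand-ins for two pooled sentence embeddings produced by
# DocumentPoolEmbeddings (real Flair vectors have thousands of dimensions).
emb_a = torch.tensor([[1.0, 0.0, 1.0, 0.0]])
emb_b = torch.tensor([[1.0, 0.0, 0.5, 0.0]])

# nn.CosineSimilarity compares along dim=1; a value near 1.0 means
# the two sentence vectors point in nearly the same direction.
cos = torch.nn.CosineSimilarity(dim=1)
similarity = cos(emb_a, emb_b).item()
print(round(similarity, 4))  # → 0.9487
```

In practice you would replace the toy tensors with `sentence.embedding.unsqueeze(0)` for each embedded `Sentence`.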
This approach is unsupervised and will give you the similarity of sentences based on the average of the word embeddings of each sentence. As @krzynio writes, you can then use cosine distance over the embedding vectors to get a similarity.
Another unsupervised option is Word Mover's Distance over word embeddings. We haven't implemented this in Flair, but there are libraries out there that do it for you. I could imagine that this approach works better than simple averaging of word vectors.
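Word Mover's Distance itself solves an optimal-transport problem (gensim, for example, exposes it on word vectors as `wmdistance`). As a minimal pure-Python illustration of the idea, the sketch below computes the "relaxed" lower bound of WMD, where each word simply travels to its nearest neighbour in the other sentence; the 2-dimensional word vectors are made-up toy values, not real embeddings:

```python
import math

# Toy word vectors (assumption: in practice these come from trained embeddings).
vectors = {
    'cat':    [1.0, 0.1],
    'feline': [0.9, 0.2],
    'sits':   [0.1, 1.0],
    'rests':  [0.2, 0.9],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def relaxed_wmd(doc1, doc2):
    # Average, over the words of doc1, of the distance each word must
    # "travel" to reach its closest word in doc2. This is the relaxed
    # (nearest-neighbour) lower bound of true Word Mover's Distance.
    return sum(min(euclidean(vectors[w], vectors[v]) for v in doc2)
               for w in doc1) / len(doc1)

d = relaxed_wmd(['cat', 'sits'], ['feline', 'rests'])
print(round(d, 4))  # → 0.1414 (small distance: the sentences are close)
```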
Yet another way would be to learn semantic similarity in a supervised way with training data, for instance to address a question-answering task. We are in the process of preparing a new module for Flair that will let users learn similarity between embeddings, and we hope to contribute this in the near future.
@alanakbik Thanks for your response. What is the difference between averaging the word embeddings of each sentence with this method and averaging word embeddings based on word2vec?
@alanakbik I have a problem: when I try a new training run, I get this error:
IndexError                                Traceback (most recent call last)
      3     sequence_length=250,
      4     mini_batch_size=100,
----> 5     max_epochs=10)

/usr/local/lib/python3.6/dist-packages/flair/models/language_model.py in generate_text(self, prefix, number_of_characters, temperature, break_on_suffix)
    313
    314         # print(word_idx)
--> 315         prob = decoder_output[word_idx]
    316         log_prob += prob
    317

IndexError: too many indices for tensor of dimension 0
Please, how can I resolve it?
I've used:

from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentPoolEmbeddings

glove_embedding = WordEmbeddings('ar')
flair_embedding_forward = FlairEmbeddings('ar-forward')
flair_embedding_backward = FlairEmbeddings('ar-backward')
document_embeddings = DocumentPoolEmbeddings([glove_embedding,
                                              flair_embedding_backward,
                                              flair_embedding_forward])

It works for me, but I need to fine-tune the word embedding model. Could you please share your code?
@khaledrefai Thanks a lot for your reply. I'm doing the same thing, but I don't get very interesting results. Can @alanakbik help with fine-tuning the word embedding model?
@khaledrefai Hi,
can you please help me fine-tune the model?
@fatimabs
Sure,

from flair.data import Sentence
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentPoolEmbeddings

glove_embedding = WordEmbeddings('ar')
# point FlairEmbeddings at the fine-tuned language model checkpoints
flair_embedding_forward = FlairEmbeddings('/content/gdrive/My Drive/AI/best-lm.pt')
flair_embedding_backward = FlairEmbeddings('/content/gdrive/My Drive/AI/back/best-lm.pt')
document_embeddings = DocumentPoolEmbeddings([glove_embedding,
                                              flair_embedding_backward,
                                              flair_embedding_forward])

query = Sentence(text)  # text is your query string

# embed everything
document_embeddings.embed(query)
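Once the query and each candidate sentence have pooled vectors like this, comparing them is a nearest-neighbour search by cosine similarity. A toy sketch, using made-up 3-dimensional vectors and hypothetical names (`query_vec`, `doc_a`, `doc_b`) standing in for real `sentence.embedding` outputs:

```python
import math

def cosine(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for document_embeddings.embed(...) results.
query_vec = [0.9, 0.1, 0.0]
candidates = {
    'doc_a': [1.0, 0.0, 0.1],
    'doc_b': [0.0, 1.0, 0.2],
}

# Rank candidates by similarity to the query, most similar first.
ranked = sorted(candidates,
                key=lambda k: cosine(query_vec, candidates[k]),
                reverse=True)
print(ranked[0])  # → doc_a
```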
You can contact me at
khaled.[email protected]
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.