I tried to use BERT models to do similarity comparisons of words/sentences, but I found that the cosine similarities are all very high, even for words/sentences that are very different in meaning. Why?
Are all the vectors located in a small portion of the vector space?
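For reference, a minimal sketch of the kind of comparison being described, assuming Hugging Face `transformers` with `bert-base-uncased` and mean pooling (these specifics are assumptions, not stated in the question):

```python
# Sketch: cosine similarity between mean-pooled BERT embeddings.
# Assumes `transformers` and `torch` are installed; model choice and pooling
# strategy are illustrative, not from the original question.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["The cat sat on the mat.", "Quantum computers factor large integers."]

with torch.no_grad():
    encoded = tokenizer(sentences, padding=True, return_tensors="pt")
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, hidden)
    # Mean-pool over non-padding tokens to get one vector per sentence
    mask = encoded["attention_mask"].unsqueeze(-1).float()
    sentence_embeddings = (token_embeddings * mask).sum(1) / mask.sum(1)

cos = torch.nn.functional.cosine_similarity(
    sentence_embeddings[0], sentence_embeddings[1], dim=0
)
print(f"cosine similarity: {cos.item():.3f}")  # typically surprisingly high
```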
BERT was not designed to produce useful word / sentence embeddings that can be used with cosine similarity. Cosine similarity treats all dimensions equally, which places high requirements on the created embeddings.
BERT was not intended for this. See this post by Jacob Devlin:
https://github.com/UKPLab/sentence-transformers/issues/80#issuecomment-565388257
If you want to use BERT with cosine similarities, you need to fine-tune it on suitable data. You can find data, code and examples in our repository:
https://github.com/UKPLab/sentence-transformers
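As a sketch of what that looks like with the repository's library (assuming a recent version of `sentence-transformers` where `util.cos_sim` is available; the model name below is just one example of a model fine-tuned for cosine similarity):

```python
# Sketch: sentence embeddings from a fine-tuned sentence-transformers model.
# The model name is illustrative; any model from the repository works the same way.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["The cat sat on the mat.", "Quantum computers factor large integers."]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the two sentence embeddings
score = util.cos_sim(embeddings[0], embeddings[1])
print(f"cosine similarity: {score.item():.3f}")  # low for unrelated sentences
```

With a model fine-tuned this way, the scores spread out over the range rather than clustering near 1.0 as they do with raw BERT outputs.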
@nreimers I have read your paper, it's great. Thanks for the answer!