Transformers: Why is the cosine similarity of BERT, ALBERT, and RoBERTa embeddings so high, almost 1.0?

Created on 24 Dec 2019 · 3 comments · Source: huggingface/transformers

โ“ Questions & Help


I tried to use BERT models to do similarity comparisons of words/sentences, but I found that the cosine similarities are all very high, even for words/sentences with very different meanings. Why?

Are all the vectors located in a small region of the vector space?
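
Roughly what I am doing, as a minimal sketch (the model name, pooling choice, and example sentences below are just illustrations, assuming a recent transformers + PyTorch setup):

```python
# Minimal sketch of the comparison: mean-pool BERT's last hidden states and
# compare two sentences with cosine similarity. Model name and sentences are
# illustrative; assumes a recent transformers + PyTorch install.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_size)
    return hidden.mean(dim=1).squeeze(0)            # mean over tokens

a = embed("The cat sat on the mat.")
b = embed("Quarterly revenue exceeded expectations.")
# Cosine similarity between the two sentence vectors; even for unrelated
# sentences this comes out surprisingly close to 1.0.
print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
```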

wontfix


All 3 comments

BERT was not designed to produce useful word/sentence embeddings that can be used with cosine similarity. Cosine similarity treats all dimensions equally, which places high demands on the created embeddings.

BERT was not intended for this. See this post by Jacob Devlin:
https://github.com/UKPLab/sentence-transformers/issues/80#issuecomment-565388257

If you want to use BERT with cosine similarity, you need to fine-tune it on suitable data. You can find data, code, and examples in our repository:
https://github.com/UKPLab/sentence-transformers
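
As a rough illustration, a model fine-tuned with sentence-transformers can be used like this (a minimal sketch; the model name is just one of the pretrained options, and the code assumes a recent sentence-transformers version):

```python
# Minimal sketch of the suggested approach: use a sentence-transformers model
# that was fine-tuned so that cosine similarity is meaningful. The model name
# is an illustrative choice; assumes a recent sentence-transformers version.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("bert-base-nli-mean-tokens")

sentences = [
    "The cat sat on the mat.",
    "A kitten is resting on a rug.",
    "Quarterly revenue exceeded expectations.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pairwise cosine similarities: related sentences should now score clearly
# higher than unrelated ones, instead of everything being close to 1.0.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```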

@nreimers I have read your paper, it's great and thanks for the answer!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
