I tried to use BERT models to do similarity comparisons of words/sentences, but I found that the cosine similarities are all very high, even for words/sentences that are very different in meaning. Why?
Are all the vectors located in a small portion of the vector space?
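For reference, a minimal sketch of the kind of comparison being described, assuming Hugging Face `transformers` with `bert-base-uncased` and mean pooling (these specifics are assumptions, not stated in the question):

```python
# Sketch: cosine similarity between mean-pooled BERT embeddings.
# Assumes `transformers` and `torch` are installed; model choice and pooling
# strategy are illustrative, not from the original question.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["The cat sat on the mat.", "Quantum computers factor large integers."]

with torch.no_grad():
    encoded = tokenizer(sentences, padding=True, return_tensors="pt")
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, hidden)
    # Mean-pool over non-padding tokens to get one vector per sentence
    mask = encoded["attention_mask"].unsqueeze(-1).float()
    sentence_embeddings = (token_embeddings * mask).sum(1) / mask.sum(1)

cos = torch.nn.functional.cosine_similarity(
    sentence_embeddings[0], sentence_embeddings[1], dim=0
)
print(f"cosine similarity: {cos.item():.3f}")  # typically surprisingly high
```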
BERT was not designed to produce useful word / sentence embeddings that can be used with cosine similarity. Cosine similarity treats all dimensions equally, which places high requirements on the created embeddings.
BERT was not intended for this. See this post by Jacob Devlin:
https://github.com/UKPLab/sentence-transformers/issues/80#issuecomment-565388257
If you want to use BERT with cosine similarities, you need to fine-tune it on suitable data. You can find data, code and examples in our repository:
https://github.com/UKPLab/sentence-transformers
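As a sketch of what that looks like with the repository's library (assuming a recent version of `sentence-transformers` where `util.cos_sim` is available; the model name below is just one example of a model fine-tuned for cosine similarity):

```python
# Sketch: sentence embeddings from a fine-tuned sentence-transformers model.
# The model name is illustrative; any model from the repository works the same way.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["The cat sat on the mat.", "Quantum computers factor large integers."]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the two sentence embeddings
score = util.cos_sim(embeddings[0], embeddings[1])
print(f"cosine similarity: {score.item():.3f}")  # low for unrelated sentences
```

With a model fine-tuned this way, the scores spread out over the range rather than clustering near 1.0 as they do with raw BERT outputs.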
@nreimers I have read your paper, it's great. Thanks for the answer!