Bert: BERT regression gives me same scores

Created on 28 Feb 2019 · 9 Comments · Source: google-research/bert

I am trying to make this switch from classification to regression as discussed here: https://github.com/google-research/bert/issues/74

with BERT, but I basically get the same output no matter what (e.g. I'm trying to predict scores on a range from 1-10, and everything is given 5.5). Does anybody know why this may be happening?

xaviergonzalez

👍8
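
One thing worth checking, offered as a hedged observation rather than a diagnosis: 5.5 is the midpoint of the 1-10 range, and a model that ignores its input minimizes mean squared error by predicting the mean of the training targets. The sketch below uses made-up targets (not data from this issue) and assumes an MSE loss; it shows that the constant 5.5 beats every other constant on a grid, which is why a collapsed regressor can settle there.

```python
from statistics import mean


def mse(preds, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


# Hypothetical targets spread evenly over 1..10 (illustration only).
targets = [float(t) for t in range(1, 11)]

# Under MSE, the best input-independent prediction is the target mean.
best_constant = mean(targets)


def constant_mse(c):
    """MSE of a model that predicts the constant c for every example."""
    return mse([c] * len(targets), targets)


# Search constants 1.0, 1.1, ..., 10.0 to confirm the mean wins.
candidates = [c / 10 for c in range(10, 101)]
best_on_grid = min(candidates, key=constant_mse)

print(best_constant, best_on_grid, constant_mse(best_constant))
```

If the predictions match the target mean this closely, comparing the training loss against `constant_mse(mean(targets))` is a quick way to tell whether the model has learned anything beyond the constant baseline.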

All 9 comments

Hello, have you solved it? I have the same problem as you.

fudanchenjiahao on 15 Mar 2019

Try my PR.

fancyerii on 20 Mar 2019

Try my PR.

Could you tell me what causes this result? Any help would be appreciated.

fudanchenjiahao on 21 Mar 2019

👍1

I have the same issue. Does anyone have a solution?

FrankICT on 24 Jun 2019

I am trying to make this switch from classification to regression as discussed here: #74

with BERT, but I basically get the same output no matter what (e.g. I'm trying to predict scores on a range from 1-10, and everything is given 5.5). Does anybody know why this may be happening?

Have you solved it?

XuJianzhi on 24 Feb 2020

Try my PR.

Could you tell me what causes this result? Any help would be appreciated.

How did you end up solving it?

XuJianzhi on 24 Feb 2020

Hi, I had the same problem, but I finally fixed it by freezing all layers except the head layers.

amlarraz on 6 Apr 2020

👍2
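
A minimal sketch of how the freezing suggested above could be expressed, assuming TF 1.x style variable names. The head names `output_weights` and `output_bias` follow `run_classifier.py` in google-research/bert; a custom regression head may use different names, so treat them as an assumption. The helper `select_trainable` is hypothetical and only picks which variable names keep receiving gradients; wiring the result into a training script is not shown.

```python
import re

# Head variable names as used in run_classifier.py; adjust if your
# regression head is named differently (assumption).
HEAD_PREFIXES = ("output_weights", "output_bias")


def select_trainable(var_names, unfrozen_encoder_layers=()):
    """Return the variable names that should still be trained.

    Everything under `bert/` is frozen unless its encoder layer index
    is listed in `unfrozen_encoder_layers`.
    """
    keep = []
    for name in var_names:
        if name.startswith(HEAD_PREFIXES):
            keep.append(name)
            continue
        match = re.match(r"bert/encoder/layer_(\d+)/", name)
        if match and int(match.group(1)) in unfrozen_encoder_layers:
            keep.append(name)
    return keep


# Illustrative names only; a real checkpoint has many more variables.
names = [
    "bert/embeddings/word_embeddings:0",
    "bert/encoder/layer_0/attention/self/query/kernel:0",
    "bert/encoder/layer_11/output/dense/kernel:0",
    "bert/pooler/dense/kernel:0",
    "output_weights:0",
    "output_bias:0",
]

head_only = select_trainable(names)
head_plus_last = select_trainable(names, unfrozen_encoder_layers={11})
print(head_only)
print(head_plus_last)
```

In TF 1.x code, the selected names could be used to filter the list returned by `tf.trainable_variables()` before gradients are computed, so that only those variables are updated.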

I ran into the same problem. It is a subtle issue and worth investigating.
In my case, if I fine-tune all layers, the predictions fall in a very narrow range, about 1% of the target distribution's span. Freezing the BERT embeddings helped a bit, but the distribution of predictions is still not as wide as that of the targets. Finally, I just extracted embeddings and ran Ridge regression on top (I find this pipeline easier to extend with handcrafted features), and it worked perfectly, though I understand that it is equivalent to freezing the BERT layers and fine-tuning only the regression head.

Yorko on 12 May 2020

👍1
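
The pipeline described above (fixed embeddings, extra handcrafted features, ridge regression on top) can be sketched as follows. This is an illustration under stated assumptions: the "embeddings" are random stand-ins rather than real BERT outputs, the data is synthetic, and ridge is implemented in closed form with NumPy rather than with the library the commenter used, which the thread does not name.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weights.
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


rng = np.random.default_rng(0)

# Stand-in for precomputed sentence embeddings; 32 dimensions instead
# of BERT-base's 768 to keep the example small (assumption).
X = rng.normal(size=(200, 32))
true_w = rng.normal(size=32)
y = X @ true_w + 0.1 * rng.normal(size=200)

# Handcrafted features are simply concatenated as extra columns.
extra = rng.normal(size=(200, 2))
X_all = np.hstack([X, extra])

w, b = ridge_fit(X_all, y, alpha=1.0)
preds = X_all @ w + b

# Compare the spread of predictions with the spread of targets.
std_ratio = preds.std() / y.std()
print(f"std(preds) / std(targets) = {std_ratio:.3f}")
```

Because the embeddings are computed once and never updated, this is the same idea as freezing the encoder and training only a linear head, with the L2 penalty `alpha` as the one hyperparameter to tune.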

I'm trying a token-level regression task, and I also found the model quickly collapsing onto a very small range of predictions (0.1% of the target span). As @Yorko mentions, freezing all of BERT helps expand the range a little (to 10% of the target span). I'll try freezing all layers except the last. I'd love to hear about others' relevant experiences :)
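
The "fraction of target span" figures quoted in these comments can be computed with a small helper like the one below. It is a sketch with made-up numbers, not results from this thread, and `span_ratio` is a hypothetical name.

```python
def span_ratio(preds, targets):
    """Width of the prediction range as a fraction of the target range."""
    target_span = max(targets) - min(targets)
    if target_span == 0:
        raise ValueError("targets are constant; span ratio is undefined")
    return (max(preds) - min(preds)) / target_span


# Made-up numbers for illustration only.
targets = [1.0, 3.0, 5.5, 8.0, 10.0]
collapsed = [5.49, 5.50, 5.50, 5.51, 5.50]  # predictions stuck near the mean
healthy = [1.5, 3.2, 5.0, 7.6, 9.1]         # predictions that track the targets

print(round(span_ratio(collapsed, targets), 4))
print(round(span_ratio(healthy, targets), 4))
```

Tracking this ratio on a validation set during training makes it easier to compare freezing strategies, since a collapse shows up as a ratio close to zero even when the loss looks acceptable.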