Bert: BERT regression gives me the same scores

Created on 28 Feb 2019 · 9 Comments · Source: google-research/bert

I am trying to make the switch from classification to regression, as discussed here: https://github.com/google-research/bert/issues/74

with BERT, but I basically get the same output no matter what (e.g. I'm trying to predict scores on a range from 1-10, and everything is given 5.5). Does anybody know why this may be happening?

All 9 comments

Hello, have you solved it? I have the same problem.

Try my PR.

Could you tell me what causes this kind of result? Please help.

I have the same issue. Does anyone have a solution?

Have you solved it?

How did you end up solving it?

Hi, I had the same problem, but I finally fixed it by freezing all layers except the head layers.
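
For anyone wanting to try this, here is a minimal sketch of choosing which variables stay trainable by name prefix. The helper `select_trainable` is hypothetical (it is not part of the BERT repo). The variable names are an illustrative subset in the style of the released checkpoints, plus the `output_weights` / `output_bias` head from `run_classifier.py`, and `layer_11` being the last encoder layer assumes a 12-layer BERT-Base model. Wiring the result into the optimizer is left out and would need checking against your own training script.

```python
def select_trainable(var_names, trainable_prefixes):
    """Return the variable names that should remain trainable.

    Hypothetical helper: every name that does not start with one of
    the given prefixes is treated as frozen.
    """
    prefixes = tuple(trainable_prefixes)
    return [name for name in var_names if name.startswith(prefixes)]


# Illustrative subset of variable names (assumed, BERT-Base style).
all_vars = [
    "bert/embeddings/word_embeddings",
    "bert/encoder/layer_0/attention/self/query/kernel",
    "bert/encoder/layer_11/output/dense/kernel",
    "bert/pooler/dense/kernel",
    "output_weights",
    "output_bias",
]

# Freeze everything under "bert/" and train only the regression head.
head_only = select_trainable(all_vars, ["output_"])

# Variant: also leave the last encoder layer trainable.
head_and_last = select_trainable(
    all_vars, ["output_", "bert/encoder/layer_11/"]
)

print(head_only)
print(head_and_last)
```

Matching on prefixes means a trailing `:0` on TensorFlow variable names does not affect the selection.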

I ran into the same problem. It is a subtle issue and worth investigating.
In my case, if I fine-tune all layers, the predictions fall in a very narrow range, about 1% of the span of the target distribution. Freezing the BERT embeddings helped a bit, but the distribution of predictions is still not as wide as that of the targets. Finally, I just extracted embeddings and ran Ridge regression on top (I find this pipeline easier to extend with handcrafted features), and it worked perfectly, though I understand that it is equivalent to freezing the BERT layers and fine-tuning only the regression head.
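
A rough sketch of the "extract embeddings, then fit Ridge" pipeline described above, for illustration only. The random vectors stand in for extracted sentence embeddings (how you pool BERT's outputs into one vector per example is not shown and is up to you). `fit_ridge` and `predict` are hypothetical helpers that use the closed-form ridge solution in NumPy; the regularization strength `alpha=1.0` is an arbitrary assumption, not a value from this thread.

```python
import numpy as np


def fit_ridge(features, targets, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    X = np.asarray(features, dtype=np.float64)
    y = np.asarray(targets, dtype=np.float64)
    # Center the data so the intercept is not regularized.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc.
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, Xc.T @ yc)
    bias = y_mean - x_mean @ weights
    return weights, bias


def predict(features, weights, bias):
    """Apply the fitted linear model to new feature vectors."""
    return np.asarray(features, dtype=np.float64) @ weights + bias


# Stand-in data: random vectors play the role of frozen BERT embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))
true_w = rng.normal(size=16)
scores = embeddings @ true_w + 5.5 + 0.1 * rng.normal(size=200)

weights, bias = fit_ridge(embeddings, scores, alpha=1.0)
preds = predict(embeddings, weights, bias)

print("prediction span:", np.ptp(preds))
print("target span:", np.ptp(scores))
```

Handcrafted features can be added by concatenating extra columns onto `embeddings` before fitting.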

I'm trying a token-level regression task, and I also found the model quickly collapsing onto a very small range of predictions (0.1% of the target span). Again, as @Yorko mentions, freezing all of BERT helps expand the range a little (10% of the target span). I'll try freezing all layers except the last. I'd love to hear others' relevant experiences :)
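
To compare runs, the "percent of target span" figure quoted in this thread can be computed as below. This is a small sketch with a hypothetical helper `span_ratio`, and the numbers are made up to mimic a collapsed model on a 1-10 scale.

```python
def span_ratio(predictions, targets):
    """Width of the prediction range as a fraction of the target range."""
    pred_span = max(predictions) - min(predictions)
    target_span = max(targets) - min(targets)
    if target_span == 0:
        raise ValueError("targets are constant; span ratio is undefined")
    return pred_span / target_span


# Made-up example: targets cover 1-10, predictions hover around 5.5.
targets = [1.0, 3.0, 5.5, 8.0, 10.0]
collapsed = [5.49, 5.50, 5.50, 5.51, 5.50]

ratio = span_ratio(collapsed, targets)
print(f"predictions cover {ratio:.2%} of the target span")
```

A ratio far below 1 on held-out data is the collapse described here; a ratio near 1 does not by itself mean the predictions are accurate.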
