BERT: what is synthetic self-training?

Created on 9 Mar 2019 · 8 Comments · Source: google-research/bert

I noticed that the best model on SQuAD 2.0 uses BERT plus a "synthetic self-training" trick, but I can't find any description of it. Could anyone explain how this trick works?
Thanks!

lxylxyoo

All 8 comments

I have been wondering about the same thing. I can't find any further details on how these methods can be used along with BERT.
Any details on this would be very helpful.

Vibha111094 on 9 Mar 2019

Same question.

ZhuoranLyu on 11 Mar 2019

Here is the answer, on page 26 of this PDF:

https://nlp.stanford.edu/seminar/details/jdevlin.pdf?fbclid=IwAR2TBFCJOeZ9cGhxB-z5cJJ17vHN4W25oWsjI8NqJoTEmlYIYEKG7oh4tlY

f-dx on 15 Mar 2019

Thank you so much for sharing!

lxylxyoo on 15 Mar 2019

So how about N-Gram Masking? Does anyone know?
Even the single model beats the others, lol.

ecchochan on 18 Mar 2019

Baidu recently open-sourced a model called ERNIE, which masks n-grams instead of single tokens (1-grams) during pre-training. I think this is similar to BERT + N-Gram Masking.
https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE
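
To make the difference from 1-gram masking concrete, here is a rough sketch of what masking contiguous spans could look like. This is only an illustration under my own assumptions: the function name `mask_ngrams`, the 15% budget, the maximum span length of 3, and the uniform choice of span positions are all hypothetical, and this is not the actual ERNIE or BERT implementation (ERNIE, for instance, selects phrases and entities rather than random spans).

```python
import random

MASK = "[MASK]"


def mask_ngrams(tokens, mask_prob=0.15, max_n=3, seed=None):
    """Mask random contiguous spans of 1..max_n tokens.

    Hypothetical helper for illustration only. Returns the masked token
    list and a dict mapping each masked position to its original token.
    """
    rng = random.Random(seed)
    # Number of tokens to mask, clamped to the sequence length.
    budget = min(len(tokens), max(1, round(len(tokens) * mask_prob)))
    masked = set()
    attempts = 0
    while len(masked) < budget and attempts < 10 * len(tokens):
        attempts += 1
        # Pick a span length, but never exceed the remaining budget.
        n = min(rng.randint(1, max_n), budget - len(masked))
        start = rng.randrange(0, len(tokens) - n + 1)
        span = range(start, start + n)
        if any(i in masked for i in span):
            continue  # skip spans that overlap an earlier one
        masked.update(span)
    out = [MASK if i in masked else tok for i, tok in enumerate(tokens)]
    labels = {i: tokens[i] for i in sorted(masked)}
    return out, labels


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog near the old river bank"
    masked_tokens, targets = mask_ngrams(sentence.split(), seed=0)
    print(" ".join(masked_tokens))
    print(targets)
```

With `max_n=1` this reduces to ordinary single-token masking, so the only change is that neighbouring tokens can be hidden together and the model has to predict the whole span from the surrounding context.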

lxylxyoo on 19 Mar 2019

Is synthetic self-training included in this repo?

Dogy06 on 30 Mar 2019

What is N-Gram Masking? Would anyone care to translate the README in Baidu's repo into English?