Transformers: Fine-tuning pretrained BERT model using own dataset but with same training task

Created on 8 Jan 2020  ·  7 Comments  ·  Source: huggingface/transformers

❓ Questions & Help

I would like to fine-tune a pretrained model on the same task the original model was trained on, i.e. I want the model to predict masked words and do next sentence prediction. Is there a code snippet somewhere that achieves this, or that gives an idea of how I could implement it?
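(For reference: the library's `BertForPreTraining` / `TFBertForPreTraining` classes expose both pretraining heads, the masked-LM logits and the next-sentence `seq_relationship` logits. Below is a minimal sketch, assuming TensorFlow 2 and a transformers 2.x-style tuple output; the example sentences and local variable names are only illustrative.)

import numpy as np
from transformers import BertTokenizer, TFBertForPreTraining

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForPreTraining.from_pretrained('bert-base-uncased')

# Encode a sentence pair; token_type_ids mark where sentence B starts.
enc = tokenizer.encode_plus('people lost their jobs to ai',
                            'automation changed the labour market')
input_ids = np.array([enc['input_ids']])
token_type_ids = np.array([enc['token_type_ids']])

prediction_scores, seq_relationship_scores = model(input_ids,
                                                   token_type_ids=token_type_ids)

print(prediction_scores.shape)        # (1, seq_len, vocab_size): masked-LM head
print(seq_relationship_scores.shape)  # (1, 2): is-next vs. not-next

Training on both objectives then amounts to adding a sparse cross-entropy over the masked positions of prediction_scores and a two-class cross-entropy over seq_relationship_scores, along the lines of the MLM-only loop in the comment below.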

wontfix

Most helpful comment

Here is a very barebone but working example. It does not include next sentence prediction, but it works for masked language modelling:

import numpy as np
import tensorflow as tf
from transformers import TFDistilBertForMaskedLM, DistilBertTokenizer

MODEL = 'distilbert-base-uncased'
model = TFDistilBertForMaskedLM.from_pretrained(MODEL)
tokenizer = DistilBertTokenizer.from_pretrained(MODEL)

sent = tokenizer.encode('people lost their jobs to ai')
sent = np.array([sent])                 # labels: the original token ids
inpx = sent.copy()
inpx[0][1] = tokenizer.vocab['[MASK]']  # replace 'people' with the mask token

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# Try to overfit the model on a single example
for _ in range(10):
    with tf.GradientTape() as g:
        out, = model(inpx)              # MLM logits: (batch, seq_len, vocab_size)
        loss_value = loss_object(y_true=sent, y_pred=out)
    gradients = g.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    print(loss_value.numpy())
    print('>', tokenizer.decode(model(inpx)[0].numpy()[0].argmax(-1)))

You will still have to handle proper loss masking and other details such as learning-rate warmup.
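(For the loss masking mentioned above, one option is to compute the per-token loss without reduction and average it only over the masked positions. This is a rough sketch that reuses inpx, sent, out and tokenizer from the snippet, meant to replace the loss_object(...) call inside the GradientTape; it is not part of the transformers API.)

# Per-token loss instead of a single averaged scalar.
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

# 1.0 at positions that were replaced by [MASK], 0.0 everywhere else.
mask_positions = tf.cast(inpx == tokenizer.vocab['[MASK]'], tf.float32)

per_token_loss = loss_object(y_true=sent, y_pred=out)   # shape (batch, seq_len)
loss_value = (tf.reduce_sum(per_token_loss * mask_positions)
              / tf.reduce_sum(mask_positions))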

All 7 comments

(The first of the 7 comments is the most helpful comment quoted above.)

@stefanknegt I have the same question... I am now trying to implement this following the "Language model fine-tuning" tutorial based on run_lm_finetuning.py in https://github.com/huggingface/transformers/blob/master/examples/README.md. Maybe it will work...

@JiangYanting Haha, I've seen you in other issues. So you've finished your exams? Can this model do NSP and MLM directly?

@TLCFYBJJHYYSND Haha, nice to meet you! It seems further pre-training still doesn't work... Using run_lm_finetuning.py and following the example in the examples README, I still get the error "ValueError: num_samples should be a positive integeral value, but got num_samples=0".

@JiangYanting I keep getting this error. Have you run into it?
RuntimeError: CUDA error: device-side assert triggered

@TLCFYBJJHYYSND I haven't run into that error, but you could take a look at this blog post, it might help: https://blog.csdn.net/Geek_of_CSDN/article/details/86527107

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
