Transformers: Fine-tuning pretrained BERT model using own dataset but with same training task

Created on 8 Jan 2020  ·  7 Comments  ·  Source: huggingface/transformers

❓ Questions & Help

I would like to fine-tune a pretrained model on the same task the original model was trained on, i.e. I want the model to predict masked words and do next sentence prediction. Is there a code snippet somewhere that achieves this, or that gives an idea of how I could implement it?
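(For reference: the library's `BertForPreTraining` / `TFBertForPreTraining` classes expose both pretraining heads, the masked-LM logits and the next-sentence `seq_relationship` logits. Below is a minimal sketch, assuming TensorFlow 2 and a transformers 2.x-style tuple output; the example sentences and local variable names are only illustrative.)

import numpy as np
from transformers import BertTokenizer, TFBertForPreTraining

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForPreTraining.from_pretrained('bert-base-uncased')

# Encode a sentence pair; token_type_ids mark where sentence B starts.
enc = tokenizer.encode_plus('people lost their jobs to ai',
                            'automation changed the labour market')
input_ids = np.array([enc['input_ids']])
token_type_ids = np.array([enc['token_type_ids']])

prediction_scores, seq_relationship_scores = model(input_ids,
                                                   token_type_ids=token_type_ids)

print(prediction_scores.shape)        # (1, seq_len, vocab_size): masked-LM head
print(seq_relationship_scores.shape)  # (1, 2): is-next vs. not-next

Training on both objectives then amounts to adding a sparse cross-entropy over the masked positions of prediction_scores and a two-class cross-entropy over seq_relationship_scores, along the lines of the MLM-only loop in the comment below.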

wontfix

Most helpful comment

Here is a very barebone but working example. It does not include next sentence prediction, but it works for masked language modelling:

import numpy as np
import tensorflow as tf
from transformers import TFDistilBertForMaskedLM, DistilBertTokenizer

MODEL = 'distilbert-base-uncased'
model = TFDistilBertForMaskedLM.from_pretrained(MODEL)
tokenizer = DistilBertTokenizer.from_pretrained(MODEL)

sent = tokenizer.encode('people lost their jobs to ai')
sent = np.array([sent])                 # labels: the original token ids
inpx = sent.copy()
inpx[0][1] = tokenizer.vocab['[MASK]']  # replace 'people' with the mask token

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# Try to overfit the model on a single example
for _ in range(10):
    with tf.GradientTape() as g:
        out, = model(inpx)              # MLM logits: (batch, seq_len, vocab_size)
        loss_value = loss_object(y_true=sent, y_pred=out)
    gradients = g.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    print(loss_value.numpy())
    print('>', tokenizer.decode(model(inpx)[0].numpy()[0].argmax(-1)))

You will still have to handle proper loss masking and other details such as learning-rate warmup.
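(For the loss masking mentioned above, one option is to compute the per-token loss without reduction and average it only over the masked positions. This is a rough sketch that reuses inpx, sent, out and tokenizer from the snippet, meant to replace the loss_object(...) call inside the GradientTape; it is not part of the transformers API.)

# Per-token loss instead of a single averaged scalar.
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

# 1.0 at positions that were replaced by [MASK], 0.0 everywhere else.
mask_positions = tf.cast(inpx == tokenizer.vocab['[MASK]'], tf.float32)

per_token_loss = loss_object(y_true=sent, y_pred=out)   # shape (batch, seq_len)
loss_value = (tf.reduce_sum(per_token_loss * mask_positions)
              / tf.reduce_sum(mask_positions))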

All 7 comments

(The first of the 7 comments is the most helpful comment quoted above.)

@stefanknegt I have the same question... I am now trying to implement this following the "Language model fine-tuning" tutorial based on run_lm_finetuning.py in https://github.com/huggingface/transformers/blob/master/examples/README.md. Maybe it will work...

@JiangYanting Haha, I've seen you in other issues. So you've finished your exams? Can this model do NSP and MLM directly?

@TLCFYBJJHYYSND Haha, nice to meet you! It seems further pre-training still doesn't work... Using run_lm_finetuning.py and following the example in the examples README, I still get the error "ValueError: num_samples should be a positive integeral value, but got num_samples=0".

@JiangYanting I keep getting this error. Have you run into it?
RuntimeError: CUDA error: device-side assert triggered

@TLCFYBJJHYYSND I haven't run into that error, but you could take a look at this blog post, it might help: https://blog.csdn.net/Geek_of_CSDN/article/details/86527107

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
