Bert: fine-tuned for a document task

Created on 12 Nov 2018 · 9 Comments · Source: google-research/bert

How do I use BERT to classify a document that contains several sentences? The labels are binary (0, 1).
I think I need to use BERT to encode every sentence, run an LSTM or another RNN over the sentence encodings to produce an article-level hidden state, and then classify from that hidden state.
Any better ideas?

Johnzdh

Most helpful comment

@daemon Your paper does not answer the above problem at all. You even argue the opposite of the problem here, that 512 tokens is more than enough for document classification. The exciting (apparently open) problem is how to utilize the ideas of BERT with token sets that are significantly longer than 512 tokens (documents).

msta on 9 May 2019

👍3

All 9 comments

To classify a document, just feed the entire document to BERT (i.e., treat all of the concatenated sentences as "Segment A"). You should be able to write your own DataProcessor in run_classifier.py and train the model without changing any TensorFlow code. So just set text_a to your document text and set text_b to None. You will probably want to set max_seq_length to a longer value depending on the length of your documents (up to 512).

jacobdevlin-google on 12 Nov 2018
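The suggestion above (one document per example, text_a set to the document, text_b set to None) could look roughly like the sketch below. It is only an illustration under stated assumptions: the InputExample class here is a stand-in that mirrors the fields of the one in run_classifier.py so the snippet runs on its own, DocumentProcessor and read_tsv are hypothetical names, and the "label, tab, document text" file layout is assumed. In the real script you would subclass DataProcessor and also implement get_train_examples, get_dev_examples and get_test_examples.

```python
import csv
import io


class InputExample(object):
    """Stand-in mirroring the fields of InputExample in run_classifier.py."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


class DocumentProcessor(object):
    """Hypothetical processor for binary document classification."""

    def get_labels(self):
        # Labels are kept as strings, as in the question: 0 and 1.
        return ["0", "1"]

    def create_examples(self, rows, set_type):
        """Turns (label, document_text) rows into single-segment examples."""
        examples = []
        for i, (label, document) in enumerate(rows):
            examples.append(InputExample(
                guid="%s-%d" % (set_type, i),
                text_a=document,  # the whole document is "Segment A"
                text_b=None,      # no second segment
                label=label))
        return examples


def read_tsv(handle):
    """Reads rows of: label <TAB> document text (an assumed layout)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [tuple(row) for row in reader]


# Small in-memory demo instead of a real data file.
demo = io.StringIO("1\tA long document. It has several sentences.\n")
demo_examples = DocumentProcessor().create_examples(read_tsv(demo), "train")
print(demo_examples[0].guid, demo_examples[0].label, demo_examples[0].text_b)
```

With a processor like this registered in the script, the remaining knob is the max_seq_length flag, which can be raised up to 512. Documents that tokenize to more than that are cut off, which is the limitation discussed in the rest of this thread.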

To classify a document, just feed the entire document to BERT (i.e., treat all of the concatenated sentences as "Segment A"). You should be able to write your own DataProcessor in run_classifier.py and train the model without changing any TensorFlow code. So just set text_a to your document text and set text_b to None. You will probably want to set max_seq_length to a longer value depending on the length of your documents (up to 512).

But what if most of my documents are longer than 512?

Johnzdh on 15 Nov 2018

I have the same question here. From what I read and understood, there is no way to feed documents longer than 512 tokens and do classification. The documents need some other preprocessing so that the maximum length is <= 512. Right?

wayfarerjing on 10 Dec 2018
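For the "some other preprocessing so that the maximum length is <= 512" part, two simple options are truncation and overlapping windows. The sketch below is not from the BERT repository; the function names, the stride value and the plain-list token representation are all assumptions for illustration. It works on an already tokenized document and reserves two positions per sequence for [CLS] and [SEP].

```python
def truncate_tokens(tokens, max_seq_length=512):
    """Keeps only the first tokens that fit once [CLS] and [SEP] are added."""
    budget = max_seq_length - 2
    return ["[CLS]"] + tokens[:budget] + ["[SEP]"]


def split_into_windows(tokens, max_seq_length=512, stride=256):
    """Splits tokens into overlapping windows of at most max_seq_length.

    Each window starts `stride` tokens after the previous one, so
    neighbouring windows share (max_seq_length - 2 - stride) tokens.
    """
    budget = max_seq_length - 2
    if budget <= 0:
        raise ValueError("max_seq_length must be greater than 2")
    if not 0 < stride <= budget:
        raise ValueError("stride must be between 1 and max_seq_length - 2")
    windows = []
    start = 0
    while True:
        piece = tokens[start:start + budget]
        windows.append(["[CLS]"] + piece + ["[SEP]"])
        if start + budget >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return windows


# Demo with placeholder tokens standing in for WordPiece output.
demo_tokens = ["tok%d" % i for i in range(1200)]
demo_windows = split_into_windows(demo_tokens)
print(len(demo_windows), [len(w) for w in demo_windows])
```

Truncation keeps a single example per document, so the label handling stays unchanged. Windows keep all of the text, but each window then needs either a copy of the document label or some way to combine the per-window predictions afterwards.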

We just released a preprint that describes BERT for document classification. If you guys are interested, the codebase is here.

daemon on 18 Apr 2019

❤1

msta on 9 May 2019

👍3

How about preprocessing the document to generate a summary, and then using the summary to fine-tune BERT? Just a thought.

sandeeppilania on 15 May 2019

@msta

Your paper does not answer the above problem at all.

My reply was a shameless plug. I never claimed to solve the problem of handling documents longer than 512.

The exciting (apparently open) problem is how to utilize the ideas of BERT with token sets that are significantly longer than 512 tokens (documents).

To my knowledge, that's indeed open. I'm not sure it matters much for document classification, but it's worth exploring.

daemon on 18 Jun 2019

What would be wrong with feeding each sentence or paragraph into BERT and then running a classifier on the pooled output of the document (or even a CNN over the concatenated BERT output of the document)? I can think of complications for back-propagation but would it be feasible?

Trying to use BERT to build a document classifier at the moment.
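
A minimal sketch of the "classifier on the pooled output" idea, assuming each sentence or paragraph has already been encoded into a fixed-size vector (for example the pooled output of BERT for that chunk). The encoder itself is left out; mean_pool and classify_document are hypothetical names, and the weights in the demo are made-up numbers rather than trained values.

```python
import math


def mean_pool(chunk_vectors):
    """Averages per-chunk vectors into a single document vector."""
    if not chunk_vectors:
        raise ValueError("need at least one chunk vector")
    dim = len(chunk_vectors[0])
    count = len(chunk_vectors)
    return [sum(vec[i] for vec in chunk_vectors) / count for i in range(dim)]


def classify_document(chunk_vectors, weights, bias):
    """Logistic classifier on the pooled vector; returns P(label = 1)."""
    doc_vector = mean_pool(chunk_vectors)
    logit = sum(w * x for w, x in zip(weights, doc_vector)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Two toy 2-dimensional "chunk encodings" for one document.
demo_vectors = [[0.2, 0.4], [0.6, 0.0]]
print(classify_document(demo_vectors, weights=[1.0, -1.0], bias=0.0))
```

If the encoder is kept frozen, only the small classifier is trained and the chunk vectors can be computed once and cached. Fine-tuning BERT end to end through the pooling step is where the back-propagation complications come from, since activations for every chunk of a document have to be held in memory at the same time.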