Bert: BERT for text summarization

Created on 11 Jan 2019 · 44 Comments · Source: google-research/bert

BERT is designed to solve 11 NLP problems, which include text summarization.

Is there an example of how we can use BERT to summarize a document? An approach would do, and example code would be really great.

Thanks in advance

ghost

👍42 🚀2 👀1 🎉1

Most helpful comment

Please see our paper using BERT for both extractive and abstractive summarization

https://arxiv.org/abs/1908.08345

With code and models released at https://github.com/nlpyang/PreSumm

nlpyang on 1 Sep 2019

🎉8 👍5 ❤2 🚀1

All 44 comments

I am also interested in an answer to the above question:
https://github.com/google-research/bert/issues/352#issue-398233998
Kindly reply.
Thanks

makamkkumar on 13 Jan 2019

Are these 11 tasks listed here?
https://ai.googleblog.com/2019/01/looking-back-at-googles-research.html?m=1

ghost on 23 Jan 2019

I know the eleven tasks but wanted to know if anyone has used this for abstractive text summarization?

makamkkumar on 26 Jan 2019

I did extractive summarization: after getting the sentence embeddings, I clustered them
and took one sentence from each cluster.
What are the steps for abstractive summarization? Let me give it a try.
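
A minimal sketch of the cluster-then-pick idea described above, assuming the sentence embeddings are already available as lists of numbers. The tiny k-means, the function names, and the toy vectors are illustrative assumptions, not the code used in this comment; in practice the vectors would come from BERT and the clustering from a library.

```python
import math

def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(vectors, k, iterations=20):
    """Plain k-means; centroids start at the first k vectors (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: _dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = _mean(members)
    return labels, centroids

def cluster_summary(sentences, embeddings, k):
    """Return one sentence per cluster: the one closest to its centroid."""
    labels, centroids = kmeans(embeddings, k)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(min(members,
                              key=lambda i: _dist(embeddings[i], centroids[c])))
    # Keep the original document order in the summary.
    return [sentences[i] for i in sorted(chosen)]

# Toy 2-D "embeddings" standing in for real BERT sentence vectors.
toy_sentences = ["s0", "s1", "s2", "s3", "s4", "s5"]
toy_embeddings = [[0, 0], [1, 0], [2, 0], [10, 10], [11, 10], [12, 10]]
print(cluster_summary(toy_sentences, toy_embeddings, k=2))
```

The number of clusters k sets the summary length, so it is the main knob to tune in this approach.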

ghost on 26 Jan 2019

👍2

https://github.com/santhoshkolloju/bert_summ
I have replaced the encoder with BERT and kept the Transformer decoder as it is. Let me know if it helps.

santhoshkolloju on 31 Jan 2019

👍4

I think you are using Google Colab.
I want to run the same code on a local machine with a P4000 GPU (8 GB of memory). It is modest, but I think it suffices for my requirements. However, I am unable to run it here.
Can you tell me how to work around this?
Thanks in advance

makamkkumar on 1 Feb 2019

What error do you get? By default, Texar places all tensors on the GPU.

santhoshkolloju on 1 Feb 2019

### While running this block i.e. the last block

#tx.utils.maybe_create_dir(model_dir)

logging_file = os.path.join(model_dir, 'logging.txt')

model_dir = "gs://bert_summ/models/"  # uncased_L-12_H-768_A-12/bert_model.ckpt
logging_file = "logging.txt"
logger = utils.get_logger(logging_file)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    sess.run(tf.tables_initializer())

    smry_writer = tf.summary.FileWriter(model_dir, graph=sess.graph)

    if run_mode == 'train_and_evaluate':
        logger.info('Begin running with train_and_evaluate mode')

        if tf.train.latest_checkpoint(model_dir) is not None:
            logger.info('Restore latest checkpoint in %s' % model_dir)
            saver.restore(sess, tf.train.latest_checkpoint(model_dir))

        iterator.initialize_dataset(sess)

        step = 5000
        for epoch in range(max_train_epoch):
            iterator.restart_dataset(sess, 'train')
            step = _train_epoch(sess, epoch, step, smry_writer)

    elif run_mode == 'test':
        logger.info('Begin running with test mode')

        logger.info('Restore latest checkpoint in %s' % model_dir)
        saver.restore(sess, tf.train.latest_checkpoint(model_dir))

        _eval_epoch(sess, 0, mode='test')

    else:
        raise ValueError('Unknown mode: {}'.format(run_mode))

### The error I am getting is:-

PermissionDeniedError Traceback (most recent call last)
in
10 sess.run(tf.tables_initializer())
11
---> 12 smry_writer = tf.summary.FileWriter(model_dir, graph=sess.graph)
13
14 if run_mode == 'train_and_evaluate':

~/anaconda3/envs/tf-1.8/lib/python3.6/site-packages/tensorflow/python/summary/writer/writer.py in __init__(self, logdir, graph, max_queue, flush_secs, graph_def, filename_suffix)
350
351 event_writer = EventFileWriter(logdir, max_queue, flush_secs,
--> 352 filename_suffix)
353 super(FileWriter, self).__init__(event_writer, graph, graph_def)
354

~/anaconda3/envs/tf-1.8/lib/python3.6/site-packages/tensorflow/python/summary/writer/event_file_writer.py in __init__(self, logdir, max_queue, flush_secs, filename_suffix)
65 self._logdir = logdir
66 if not gfile.IsDirectory(self._logdir):
---> 67 gfile.MakeDirs(self._logdir)
68 self._event_queue = six.moves.queue.Queue(max_queue)
69 self._ev_writer = pywrap_tensorflow.EventsWriter(

~/anaconda3/envs/tf-1.8/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py in recursive_create_dir(dirname)
372 """
373 with errors.raise_exception_on_not_ok_status() as status:
--> 374 pywrap_tensorflow.RecursivelyCreateDir(compat.as_bytes(dirname), status)
375
376

~/anaconda3/envs/tf-1.8/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
517 None, None,
518 compat.as_text(c_api.TF_Message(self.status.status)),
--> 519 c_api.TF_GetCode(self.status.status))
520 # Delete the underlying status object from memory otherwise it stays alive
521 # as there is a reference to status from this from the traceback due to

PermissionDeniedError: Error executing an HTTP request (HTTP response code 401, error code 0, error message ''), response '{
"error": {
"errors": [
{
"domain": "global",
"reason": "required",
"message": "Anonymous caller does not have storage.objects.get access to bert_summ/models/.",
"locationType": "header",
"location": "Authorization"
}
],
"code": 401,
"message": "Anonymous caller does not have storage.objects.get access to bert_summ/models/."
}
}
'
when reading metadata of gs://bert_summ/models/

makamkkumar on 2 Feb 2019

👍1

I received the same error.

CapitalZe on 2 Feb 2019

The problem is that I am writing to my own Google Cloud Storage bucket, which you do not have access to. Please change the location to your local filesystem (replace all gs:// file paths with local paths).
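
A small sketch of that path swap, assuming you want every gs:// path mirrored under one local folder. The helper name `to_local_path` and the default folder `./bert_summ_local` are made up for illustration and are not part of the notebook.

```python
import os

def to_local_path(path, local_root="./bert_summ_local"):
    """Map gs://bucket/key to local_root/bucket/key; leave other paths unchanged."""
    prefix = "gs://"
    if not path.startswith(prefix):
        return path
    relative = path[len(prefix):].strip("/")
    return os.path.join(local_root, *relative.split("/"))

# Hypothetical usage: the notebook's model_dir, redirected to local disk.
model_dir = to_local_path("gs://bert_summ/models/")
print(model_dir)
# Create the folder before handing it to tf.summary.FileWriter, e.g.:
# os.makedirs(model_dir, exist_ok=True)
```

With a local directory there is no authenticated request to the bucket, so the 401 PermissionDeniedError above should no longer be triggered by this path.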

santhoshkolloju on 2 Feb 2019

👍1

> https://github.com/santhoshkolloju/bert_summ
> I have replaced the encoder with BERT and kept the Transformer decoder as it is. Let me know if it helps.

Do you have any examples of generated summaries?

adelesiitova on 7 Feb 2019

👍5

I cannot share the results because it is my own data, but the results are good: the model was able to copy rare words as well. Initially I tried fine-tuning both the encoder (BERT) and the decoder, which disturbed the BERT weights. Then I froze the BERT weights and trained only the decoder.
After that it produced much more readable and grammatically correct sentences.

santhoshkolloju on 12 Feb 2019

@santhoshkolloju, can you share your experience?
When I use your code, 'hypotheses' always has the same value for every reference.
For example:
references: ['do', 'n', "'", 't', 'wear', 'rings', 'when', 'working', 'on', 'engine', 'internal', '##s', '.', '[PAD]', '[PAD]', '[PAD]', ...]
hypotheses: ['do', 'n', "'", 't', 'try', 'to', 'do', 'n', "'", 't', 'mix', '.', '', '', '', ...]
references: ['broke', 'the', 'elevators', 'at', 'work', ',', 'basically', 'shot', 'myself', 'in', 'the', 'foot', 'in', 'doing', 'so', 'because', 'all', 'our', 'heavy', 'shit', 'is', 'downstairs', '.', '[PAD]', '[PAD]', '[PAD]', ...]
hypotheses: ['do', 'n', "'", 't', 'try', 'to', 'do', 'n', "'", 't', 'mix', '.', '', '', '', ...]

I used the TIFU dataset from the paper "Abstractive Summarization of Reddit Posts with Multi-level Memory Networks".

akakakakakaa on 19 Feb 2019

There was a problem. Freeze the BERT weights and run again:
tf.trainable_variables()
Exclude all the variables whose names start with "bert", then pass the non-BERT variables to the optimizer.

santhoshkolloju on 19 Feb 2019

@santhoshkolloju Sorry for my question...
I tried hard to freeze the weights, but I do not know what to do based on this code.
Could you suggest how to freeze them?
I tried to modify run_pretraining.py to use export_savedmodel and removed all TPU-related code.
That gave me a saved_model.pb file, but loading the .pb file failed.

akakakakakaa on 19 Feb 2019

In the notebook I shared, replace this line of code as shown below and run again; it should work.
allvars = tf.trainable_variables()
nonBert = [v for v in allvars if 'bert' not in v.name]

train_op = tx.core.get_train_op(
    mle_loss,
    learning_rate=learning_rate,
    variables=nonBert,
    global_step=global_step,
    hparams=opt)
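
To make the filtering step concrete without needing TensorFlow, here is a sketch that uses stand-in objects with only a `.name` attribute. `FakeVariable`, `non_bert_variables`, and the variable names are illustrative assumptions; in TF 1.x the list would come from `tf.trainable_variables()`.

```python
from collections import namedtuple

# Stand-in for a TF variable: only the .name attribute matters here.
FakeVariable = namedtuple("FakeVariable", ["name"])

def non_bert_variables(variables, prefix="bert"):
    """Keep variables whose name does not start with the given scope prefix."""
    return [v for v in variables if not v.name.startswith(prefix)]

# Illustrative names; the decoder names are hypothetical.
all_vars = [
    FakeVariable("bert/embeddings/word_embeddings:0"),
    FakeVariable("bert/encoder/layer_0/attention/self/query/kernel:0"),
    FakeVariable("decoder/layer_0/self_attention/query/kernel:0"),
    FakeVariable("decoder/output_layer/bias:0"),
]
trainable = non_bert_variables(all_vars)
print([v.name for v in trainable])
```

The check has to look at the variable's name string rather than at the variable object itself. Matching on the prefix is slightly stricter than a substring test, which would also drop any decoder variable that happens to contain "bert" in its name.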

santhoshkolloju on 19 Feb 2019

Thank you for your advice. I finally got it trained. Freezing the BERT encoder gives much more readable and grammatically correct sentences, but it still cannot summarize well :'( Maybe we need more techniques, like Pointer-Generator, Bottom-Up Summarization, etc. :)

akakakakakaa on 22 Feb 2019

In my case the data is somewhat easy. The model was not copying the sentences as they are; it was rephrasing them while keeping the same meaning.
Try training for more iterations or passing entity information to the model.

santhoshkolloju on 22 Feb 2019

A pointer generator is useful when the data contains unknown (out-of-vocabulary) tokens. With subword tokenization there are hardly any unknowns.

But let me know if you were able to improve on this
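
A rough sketch of why subword tokenization leaves few unknowns: a greedy longest-match split in the style of WordPiece, run on a toy vocabulary. The function name and the five-entry vocabulary are assumptions for illustration; BERT's real vocabulary is far larger, so falling back to [UNK] is rare.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation marker
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk]
        pieces.append(piece)
        start = end
    return pieces

toy_vocab = {"internal", "##s", "elevator", "work", "##ing"}
for word in ["internals", "elevators", "working", "xyz"]:
    print(word, wordpiece(word, toy_vocab))
```

This mirrors the 'internal', '##s' pair visible in the references printed earlier in the thread: a word missing from the vocabulary is split into known pieces instead of becoming an unknown token.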

santhoshkolloju on 22 Feb 2019

👍1

I am looking to use the BERT model for abstractive text summarization. I checked out @santhoshkolloju's code and will run it and see. However, it would be really helpful if someone could point me to articles, papers, resources, or code for abstractive summarization with BERT.

aSquare14 on 2 Apr 2019

❤1

> I am looking to use the BERT model for abstractive text summarization. I checked out @santhoshkolloju's code and will run it and see. However, it would be really helpful if someone could point me to articles, papers, resources, or code for abstractive summarization with BERT.

Check out this paper: https://arxiv.org/pdf/1902.09243.pdf

They haven't released their code yet, but I'm currently working on reimplementing it in PyTorch and will make the code public once I'm done with it.

ajamjoom on 2 Apr 2019

👍9 🎉2

> > I am looking to use the BERT model for abstractive text summarization. I checked out @santhoshkolloju's code and will run it and see. However, it would be really helpful if someone could point me to articles, papers, resources, or code for abstractive summarization with BERT.
>
> Check out this paper: https://arxiv.org/pdf/1902.09243.pdf
>
> They haven't released their code yet, but I'm currently working on reimplementing it in PyTorch and will make the code public once I'm done with it.

I would be, to put it mildly, extremely interested in this!

HenryDashwood on 15 Apr 2019

👍8

I tried summarization on some wiki articles. I split the text into sentences, averaged the CLS vectors of the sentences to get a "whole text CLS vec", and then picked the few sentences most similar to that whole-text vector (cosine similarity). The results were interesting, but not good enough for anything serious (too simple and vague, I guess).
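
The selection step described here can be sketched as follows, assuming one vector per sentence has already been extracted. The function names and the toy 2-D vectors are illustrative assumptions; real CLS vectors would have 768 dimensions for BERT-Base.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid_summary(sentences, vectors, top_n=2):
    """Pick the top_n sentences most similar to the mean sentence vector."""
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vectors[i], mean),
                    reverse=True)
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(ranked[:top_n])]

# Toy vectors standing in for per-sentence CLS vectors.
toy_sentences = ["first", "second", "third", "fourth"]
toy_vectors = [[1, 0], [1, 1], [0, 1], [-1, 0]]
print(centroid_summary(toy_sentences, toy_vectors, top_n=2))
```

Because the mean vector is dominated by whatever most sentences share, this tends to select generic sentences, which may be one reason the results felt vague.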

NameBrez on 16 Apr 2019

For those interested, looks like we have an implementation! https://github.com/nayeon7lee/bert-summarization

HenryDashwood on 18 Apr 2019

👍1

> For those interested, looks like we have an implementation! https://github.com/nayeon7lee/bert-summarization

It is not complete... :(

alexferrari88 on 24 Apr 2019

It's been almost half a year since BERT was released. Does anybody know where to find a Colab notebook that shows a working summarization example?

qo4on on 6 May 2019

👍1

> Thank you for your advice. I finally got it trained. Freezing the BERT encoder gives much more readable and grammatically correct sentences, but it still cannot summarize well :'( Maybe we need more techniques, like Pointer-Generator, Bottom-Up Summarization, etc. :)

Could you share your experience with BERT encoder + Transformer decoder + pointer generator? I wonder whether it will summarize well with a pointer generator. Thanks

hzhmelody on 7 May 2019

Is there anyone alive?

qo4on on 11 May 2019

😄8

There is this paper

Fine-tune BERT for Extractive Summarization

https://arxiv.org/pdf/1903.10318.pdf

I would love a colab example as well.

Santosh-Gupta on 13 May 2019

@santhoshkolloju I think the result might have something to do with the batch size. I printed the batches generated by the iterator (FeedableDataIterator from Texar), and despite setting batch_size to 32, the size of each generated batch remained 4.

Edit:
Okay, I finally get why the batch size is always 4:
train_dataset = get_dataset(processor, tokenizer, "./", max_seq_length_src, max_seq_length_tgt, 4, 'train', "./")
eval_dataset = get_dataset(processor, tokenizer, "./", max_seq_length_src, max_seq_length_tgt, 4, 'eval', "./")
test_dataset = get_dataset(processor, tokenizer, "./", max_seq_length_src, max_seq_length_tgt, 4, 'test', "./")
These lines in the Colab example are the culprit: they pass a literal 4 instead of the batch_size setting.

betty35 on 14 May 2019

@Santosh-Gupta They have their code released here: https://github.com/nlpyang/BertSum, though it's not a colab example

betty35 on 14 May 2019

hmm, any idea how to use it to end up with a function like summary_result = BertSum.summarize("Text to be summarized") ?

Santosh-Gupta on 14 May 2019

> There is this paper
>
> Fine-tune BERT for Extractive Summarization

Not extractive. An abstractive example, please...

qo4on on 14 May 2019

👍1

> > There is this paper
> > Fine-tune BERT for Extractive Summarization
>
> Not extractive. An abstractive example, please...

So you're looking to generate a summary, not just extract the most important sentences?

Santosh-Gupta on 14 May 2019

I'm looking for both, thanks for the above link. I will look into it right away.

CapitalZe on 15 May 2019

> So you're looking to generate a summary, not just extract the most important sentences?

Yes

qo4on on 16 May 2019

I tried a BERT encoder with a similar Transformer decoder for generating summaries, and none of my attempts worked.
I believe there are a lot of fine-tuning tricks that I am not aware of.

KaiQiangSong on 18 May 2019

It looks like this repo is complete:

https://github.com/nlpyang/BertSum

Also, UNILM gives some great abstractive summarization scores, maybe the best

Santosh-Gupta on 18 Jun 2019

> UNILM gives some great abstractive summarization scores

I found only the UNILM paper. Do you know where we can download the model? Is there any code example?

qo4on on 18 Jun 2019

👍2

The authors say that they are preparing a release of the code and pretrained model

Santosh-Gupta on 20 Jun 2019

I found one useful paper which gave better performance than BERT for text summarization.
paper: https://arxiv.org/pdf/1905.02450.pdf
code: https://github.com/microsoft/MASS
The code is not fully ready yet, though.