Transformers: SpanBERT support

Created on 25 Jul 2019 · 10 comments · Source: huggingface/transformers

Hi,

I think the new SpanBERT model should also be supported in pytorch-transformers.

> We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text.

The paper can be found here: https://arxiv.org/abs/1907.10529

The model is not released yet; I'll update this issue whenever it becomes available :)

wontfix

Most helpful comment

You can have a look here, the official implementation has just been released: https://github.com/facebookresearch/SpanBERT

All 10 comments

Are we going to get this? :) Thanks :)

FYI, https://github.com/mandarjoshi90/coref#pretrained-coreference-models describes how to obtain the coreference models, which should contain SpanBERT.

@ArneBinder Thanks for that hint!

I downloaded the SpanBERT (base) model. Unfortunately, the TF checkpoint conversion throws the following error message:

INFO:pytorch_transformers.modeling_bert:Loading TF weight width_scores/output_weights/Adam_1 with shape [3000, 1]
INFO:pytorch_transformers.modeling_bert:Skipping antecedent_distance_emb
Traceback (most recent call last):
  File "/usr/local/bin/pytorch_transformers", line 11, in <module>
    load_entry_point('pytorch-transformers', 'console_scripts', 'pytorch_transformers')()
  File "/mnt/pytorch-transformers/pytorch_transformers/__main__.py", line 30, in main
    convert_tf_checkpoint_to_pytorch(TF_CHECKPOINT, TF_CONFIG, PYTORCH_DUMP_OUTPUT)
  File "/mnt/pytorch-transformers/pytorch_transformers/convert_tf_checkpoint_to_pytorch.py", line 36, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_bert(model, config, tf_checkpoint_path)
  File "/mnt/pytorch-transformers/pytorch_transformers/modeling_bert.py", line 111, in load_tf_weights_in_bert
    assert pointer.shape == array.shape
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 591, in __getattr__
    type(self).__name__, name))
AttributeError: 'BertForPreTraining' object has no attribute 'shape'

I think some variables must be skipped, so a debugging session is unavoidable.
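For what it's worth, one workaround might be to prune the coref checkpoint down to the plain BERT variables before running the conversion, so the coref-specific heads (width_scores, antecedent_distance_emb, ...) and the Adam slots never reach load_tf_weights_in_bert. This is only a sketch using TF 1.x; it assumes the encoder variables live under a `bert/` scope in that checkpoint, and the paths are placeholders:

```python
import tensorflow as tf  # TF 1.x, as used by the conversion script at the time

SRC_CKPT = "/path/to/spanbert_coref/model.ckpt"     # placeholder: coref checkpoint
DST_CKPT = "/path/to/spanbert_bert_only/bert.ckpt"  # placeholder: pruned output

tf.reset_default_graph()
kept_vars = []
for name, _ in tf.train.list_variables(SRC_CKPT):
    # Keep only the encoder weights; drop coref heads and optimizer slots.
    if not name.startswith("bert/") or "adam" in name.lower():
        print("Skipping {}".format(name))
        continue
    array = tf.train.load_variable(SRC_CKPT, name)
    kept_vars.append(tf.Variable(array, name=name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.Saver(kept_vars).save(sess, DST_CKPT)
```

The pruned checkpoint could then be fed to the same conversion command as above instead of the original one.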

Hi @stefan-it, the SpanBERT authors shared their (~pytorch-transformers-compatible) weights with us, so if you'd be interested we can send them your way so you can experiment/integrate them here.

Let me know!

@julien-c This would be awesome! I would really like to run some experiments (mainly NER and PoS tagging), so it would be great if you could share the weights (my mail is [email protected]) - thank you in advance :heart:

Hi @julien-c, I would also like to receive the SpanBERT pytorch-compatible weights for semantic tasks like coreference resolution. Could you send them to me too? My mail is [email protected]. Many thanks.

You can have a look here, the official implementation has just been released: https://github.com/facebookresearch/SpanBERT
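Since SpanBERT keeps the plain BERT architecture, the released (or locally converted) weights should load like any other BERT checkpoint. Here is a minimal sketch with pytorch-transformers; the directory name is a placeholder and is assumed to contain pytorch_model.bin, config.json and a vocab.txt:

```python
import torch
from pytorch_transformers import BertModel, BertTokenizer

model_dir = "./spanbert_base_cased"  # placeholder: converted/downloaded weights
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir)
model.eval()

# Tokenize a sentence and add the standard BERT special tokens.
tokens = ["[CLS]"] + tokenizer.tokenize("SpanBERT predicts spans of text.") + ["[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    sequence_output = model(input_ids)[0]  # (1, seq_len, hidden_size)
print(sequence_output.shape)
```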

Well, two preliminary experiments with SpanBERT (base) on CoNLL-2003 show a gap of ~7.8% compared to a BERT (base, cased) model. Maybe this has something to do with the named entity masking, but I'll investigate that further this weekend...

Update on that: I tried SpanBERT for PoS tagging and the results are pretty close to DistilBERT. Here's one run on Universal Dependencies v1.2:

| Model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| RoBERTa (large) | 97.80 | 97.75 |
| SpanBERT (large) | 96.48 | 96.61 |
| BERT (large, cased) | 97.35 | 97.20 |
| DistilBERT (uncased) | 96.64 | 96.70 |
| Plank et al. (2016) | - | 95.52 |
| Yasunaga et al. (2017) | - | 95.82 |

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
