Hi,
I think the new SpanBERT model should also be supported in pytorch-transformers
😅
We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text.
The paper can be found here.
Model is currently not released yet, I'll update this issue here whenever the model is available :)
Are we going to get this? :) Thanks :)
Fyi https://github.com/mandarjoshi90/coref#pretrained-coreference-models describes how to obtain the coreference models that should contain SpanBERT.
@ArneBinder Thanks for that hint!
I downloaded the SpanBERT (base) model. Unfortunately, the TF checkpoint conversion throws the following error message:
INFO:pytorch_transformers.modeling_bert:Loading TF weight width_scores/output_weights/Adam_1 with shape [3000, 1]
INFO:pytorch_transformers.modeling_bert:Skipping antecedent_distance_emb
Traceback (most recent call last):
File "/usr/local/bin/pytorch_transformers", line 11, in <module>
load_entry_point('pytorch-transformers', 'console_scripts', 'pytorch_transformers')()
File "/mnt/pytorch-transformers/pytorch_transformers/__main__.py", line 30, in main
convert_tf_checkpoint_to_pytorch(TF_CHECKPOINT, TF_CONFIG, PYTORCH_DUMP_OUTPUT)
File "/mnt/pytorch-transformers/pytorch_transformers/convert_tf_checkpoint_to_pytorch.py", line 36, in convert_tf_checkpoint_to_pytorch
load_tf_weights_in_bert(model, config, tf_checkpoint_path)
File "/mnt/pytorch-transformers/pytorch_transformers/modeling_bert.py", line 111, in load_tf_weights_in_bert
assert pointer.shape == array.shape
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 591, in __getattr__
type(self).__name__, name))
AttributeError: 'BertForPreTraining' object has no attribute 'shape'
I think some variables must be skipped, so a debugging session is unavoidable 😅
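For reference, a minimal sketch of what that skipping might look like — the coref-specific names (`width_scores`, `antecedent_distance_emb`) come from the log above, but treating them (plus the Adam optimizer slots) as the complete skip list is an assumption on my part, not the official fix:

```python
# Sketch only: list the checkpoint variables and drop everything that is not
# part of the BERT encoder before handing the rest to load_tf_weights_in_bert.
# The skip patterns are an assumption based on the names in the log above.
import re
import tensorflow as tf

SKIP_PATTERNS = (
    r"Adam",                     # optimizer slots, e.g. .../Adam_1
    r"global_step",
    r"antecedent_distance_emb",  # coref-specific embedding (seen in the log)
    r"width_scores",             # coref-specific scoring head
)

def bert_only_variables(tf_checkpoint_path):
    """Yield (name, numpy array) for the encoder weights only."""
    for name, _shape in tf.train.list_variables(tf_checkpoint_path):
        if any(re.search(pattern, name) for pattern in SKIP_PATTERNS):
            continue
        yield name, tf.train.load_variable(tf_checkpoint_path, name)
```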
Hi @stefan-it, the SpanBERT authors shared their (~pytorch-transformers-compatible) weights with us, so if you'd be interested we can send them your way so you can experiment with/integrate them here.
Let me know!
@julien-c this would be awesome 🤗 I would really like to do some experiments (mainly NER and PoS tagging) - it would be great if you could share the weights (my mail is [email protected]) - thank you in advance :heart:
Hi @julien-c, I would also like to receive the SpanBERT pytorch-compatible weights for semantic tasks like coref. Could you send them to me too? My mail is [email protected]. Many thanks.
You can have a look here, the official implementation has just been released: https://github.com/facebookresearch/SpanBERT
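Since SpanBERT keeps the plain BERT architecture, the released weights should load with the standard pytorch-transformers classes once converted to HuggingFace format — a quick sketch, where the local directory name is a placeholder of my own:

```python
# Sketch: load SpanBERT like any BERT checkpoint. "./spanbert_hf_base" is a
# placeholder for wherever the converted weights (pytorch_model.bin +
# config.json) live; SpanBERT reuses BERT's cased vocabulary.
import torch
from pytorch_transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("./spanbert_hf_base")
model.eval()

tokens = tokenizer.tokenize("SpanBERT represents and predicts spans of text.")
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
with torch.no_grad():
    last_hidden_state = model(input_ids)[0]  # (1, seq_len, hidden_size)
```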
Well, two preliminary experiments (SpanBERT base) on CoNLL-2003 show a difference of ~7.8% compared to a BERT (base, cased) model 😱 So maybe this has something to do with the named entity masking 🤔 But I'll investigate that further this weekend...
Update on that: I tried SpanBERT for PoS tagging and the results are pretty close to DistilBERT. Here's one run over the Universal Dependencies v1.2 (a bare-bones setup sketch follows the table):
| Model | Dev | Test
| ---------------------------------------------------------- | --------- | ---------
| RoBERTa (large) | 97.80 | 97.75
| SpanBERT (large) | 96.48 | 96.61
| BERT (large, cased) | 97.35 | 97.20
| DistilBERT (uncased) | 96.64 | 96.70
| Plank et al. (2016)                                        | -         | 95.52
| Yasunaga et al. (2017)                                     | -         | 95.82
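For anyone wanting to reproduce something similar, a minimal token classification setup could look like this — the directory path and the label count (17 UD universal PoS tags) are my assumptions, and the actual training loop is omitted:

```python
# Sketch of a PoS tagging head on top of SpanBERT weights. num_labels=17
# matches the UD universal PoS tag set; "./spanbert_hf_base" is a placeholder.
import torch
from pytorch_transformers import BertForTokenClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForTokenClassification.from_pretrained("./spanbert_hf_base",
                                                   num_labels=17)

tokens = tokenizer.tokenize("SpanBERT tags parts of speech")
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
logits = model(input_ids)[0]           # (1, seq_len, num_labels)
predicted_tag_ids = logits.argmax(-1)  # per-subtoken tag indices
```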
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.