Tensor2tensor: [Question]: NER with the Transformer

Created on 28 Sep 2018 · 8 Comments · Source: tensorflow/tensor2tensor

Description

After successfully using the Transformer for my own translation task, I was wondering whether this powerful model would also perform well on a NER (Named Entity Recognition) task. I was thinking of modelling this task as a seq2seq problem with sequences like:

input ---------------> output
This car is a Volvo --> O O O O ORGANISATION

Ideally, I also want to be able to recognise entities regardless of case. So, two questions:
1) Would this kind of approach intuitively work with the Transformer architecture? Is my thinking correct here?
2) Would it make sense to feed the model a lowercased copy of my dataset to account for lowercase input, or would this data duplication be harmful or useless?

Just wanting to hear some opinions!
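
The seq2seq framing described in this question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `to_parallel_lines` and `reconcile_length` are hypothetical and not part of tensor2tensor, and padding with "O" is an assumed fallback. The second helper is there because a decoder is not constrained to emit exactly one tag per input token.

```python
# Hypothetical helpers for illustration only; not a tensor2tensor API.

def to_parallel_lines(tagged_sentence):
    """Split [(token, tag), ...] into a source line and a target line."""
    tokens = [token for token, _ in tagged_sentence]
    tags = [tag for _, tag in tagged_sentence]
    return " ".join(tokens), " ".join(tags)


def reconcile_length(tokens, predicted_tags, pad_tag="O"):
    """Force a decoded tag sequence to have one tag per input token.

    Extra tags are dropped and missing tags are filled with `pad_tag`
    (an assumed default, chosen because "O" is the non-entity tag).
    """
    trimmed = list(predicted_tags[:len(tokens)])
    return trimmed + [pad_tag] * (len(tokens) - len(trimmed))


example = [("This", "O"), ("car", "O"), ("is", "O"),
           ("a", "O"), ("Volvo", "ORGANISATION")]
source_line, target_line = to_parallel_lines(example)
print(source_line)  # This car is a Volvo
print(target_line)  # O O O O ORGANISATION
```

Each source line and target line would go into the two sides of a parallel corpus, exactly as for a translation task.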

mabergerx

👍1

All 8 comments

The Transformer can be used for most seq2seq problems, including named entity recognition. I think it would be better not to lowercase the sentences, since most capitalised words have a high chance of being named entities.

lkluo on 2 Oct 2018

In NER, you need one output tag (e.g. in BIO encoding) for each input word. This usually means that your input is already tokenized.
If this is the case, you have three options:

1) Use seq2seq as suggested by @lkluo. In this case, it is not guaranteed that the number of output tags will equal the number of input tokens, but a well-trained model should learn this quite soon.
2) Use just a Transformer encoder with a softmax layer on top (and no decoder). You can disable T2T subword tokenization and provide your own word-based vocabulary with the replace_oov option.
3) As above, but use subwords and thus avoid the need for OOVs. In this case, you need to edit the output sequence in the training data to match the length of the source sequence, e.g. if _Volvo_ is split into subwords _Vol_ and _vo_, it will be: This/O car/O is/O a/O Vol_/ORG vo/CONTINUATION.
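
The label editing needed for the subword option can be sketched as follows. This is a minimal sketch, not the T2T implementation: `toy_subword_split` is a made-up stand-in for a real subword tokenizer (it simply chops words into three-character pieces), and the function names are hypothetical. The first piece of each word keeps the word's tag and every later piece gets CONTINUATION, as in the example above.

```python
# Hypothetical helpers for illustration only; not a tensor2tensor API.

def toy_subword_split(word, max_len=3):
    """Stand-in for a real subword tokenizer: fixed-size character chunks."""
    return [word[i:i + max_len] for i in range(0, len(word), max_len)]


def align_tags_to_subwords(tokens, tags, split=toy_subword_split,
                           continuation_tag="CONTINUATION"):
    """Expand word-level tags so there is exactly one tag per subword."""
    if len(tokens) != len(tags):
        raise ValueError("need exactly one tag per token")
    subwords, subword_tags = [], []
    for token, tag in zip(tokens, tags):
        pieces = split(token)
        if not pieces:
            raise ValueError("tokenizer returned no pieces for %r" % token)
        subwords.extend(pieces)
        # First piece carries the word's tag, the rest are continuations.
        subword_tags.extend([tag] + [continuation_tag] * (len(pieces) - 1))
    return subwords, subword_tags


def merge_subword_tags(subword_tags, continuation_tag="CONTINUATION"):
    """Recover word-level tags by dropping the continuation tags."""
    return [tag for tag in subword_tags if tag != continuation_tag]


tokens = ["This", "car", "is", "a", "Volvo"]
tags = ["O", "O", "O", "O", "ORG"]
subwords, subword_tags = align_tags_to_subwords(tokens, tags)
print(subwords)      # ['Thi', 's', 'car', 'is', 'a', 'Vol', 'vo']
print(subword_tags)  # ['O', 'CONTINUATION', 'O', 'O', 'O', 'ORG', 'CONTINUATION']
print(merge_subword_tags(subword_tags))  # ['O', 'O', 'O', 'O', 'ORG']
```

With a real tokenizer the pieces would differ, but the source and target sequences would still have equal length, which is what the encoder-plus-softmax setup requires.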

You can also try character-based models (it makes sense for NER).

Capitalization is one of the most important features for NER, so lowercasing everything is definitely a bad idea. Character-based models will surely learn the difference between lowercase and uppercase automatically (and even subword-based models, I think).

martinpopel on 3 Oct 2018

👍2

However, when I only use Transformer Encoder with a softmax layer on top, the model tends to set all labels to O.

yumath on 13 Mar 2019

👀6 👍2

However, when I only use Transformer Encoder with a softmax layer on top, the model tends to set all labels to O.

Have you solved your problem? I have the same problem.

wenhrui on 22 Jul 2019

@wenhrui no, I haven't.

yumath on 22 Jul 2019

How and where should the Transformer be used for an NER task? (I have implemented one using CNN+Bi-LSTM+CRF.)

Saichethan on 23 Jul 2019

I am having the same issue. @Saichethan, can you share the GitHub repository if you solved it?

niranjan8129 on 18 Sep 2019

TENER: Adapting Transformer Encoder for Named Entity Recognition
https://arxiv.org/pdf/1911.04474.pdf

qq547276542 on 13 Nov 2019
