Tensor2tensor: [Question]: NER with the Transformer

Created on 28 Sep 2018 · 8 Comments · Source: tensorflow/tensor2tensor

Description

After successfully using the Transformer for my own translation task, I was wondering whether this powerful model would also perform well on a NER (Named Entity Recognition) task. I was thinking of modelling the task as a seq2seq problem with sequences like:

input ---------------> output
This car is a Volvo --> O O O O ORGANISATION

Ideally, I also want to be able to recognise entities regardless of case. So, two questions:
1) Would this kind of approach intuitively work with the Transformer architecture? Is my thinking correct here?
2) Would it make sense to feed the model a lowercased copy of my dataset to account for lowercase input, or would this data duplication be harmful or useless?

Just wanting to hear some opinions!

All 8 comments

The Transformer can be used for most seq2seq problems, including named entity recognition. I think it would be better not to lowercase the sentences, since most capitalised words have a high chance of being named entities.

In NER, you need one output tag (e.g. in BIO encoding) for each input word. This usually means that your input is already tokenized.
If this is the case, you have three options:

  • Use seq2seq as suggested by @lkluo. In this case, it is not guaranteed that the number of output tags will equal the number of input tokens, but a well-trained model should learn this quite soon.
  • Use just a Transformer encoder with a softmax layer on top (and no decoder). You can disable T2T subword tokenization and provide your own vocabulary based on words with the replace_oov option.
  • As above, but use subwords and thus avoid the need for OOVs. In this case, you need to edit the output sequence in the training data to match the length of the source sequence, e.g. if _Volvo_ is split into the subwords _Vol_ and _vo_, it will be: This/O car/O is/O Vol_/ORG vo/CONTINUATION.
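For the third option, the tag expansion could look roughly like the sketch below. It is only an illustration: `toy_subword_split` and `TOY_SPLITS` are hypothetical stand-ins for whatever subword tokenizer is actually used (they are not part of T2T), and `align_tags` is a made-up helper name. Only the `CONTINUATION` tag and the example sentence come from the list above.

```python
from typing import Callable, List, Tuple

CONTINUATION = "CONTINUATION"  # tag name taken from the example above

# Hypothetical lookup table standing in for a real subword tokenizer.
TOY_SPLITS = {"Volvo": ["Vol_", "vo"]}


def toy_subword_split(word: str) -> List[str]:
    """Return the subword pieces of `word` (assumption: unknown words stay whole)."""
    return TOY_SPLITS.get(word, [word])


def align_tags(
    words: List[str],
    tags: List[str],
    split: Callable[[str], List[str]],
) -> Tuple[List[str], List[str]]:
    """Expand word-level tags so that there is exactly one tag per subword.

    The first piece of each word keeps the word's tag; every further piece
    gets the CONTINUATION tag.
    """
    if len(words) != len(tags):
        raise ValueError("need exactly one tag per word")
    subwords: List[str] = []
    subword_tags: List[str] = []
    for word, tag in zip(words, tags):
        pieces = split(word)
        if not pieces:
            raise ValueError(f"splitter returned no pieces for {word!r}")
        subwords.extend(pieces)
        subword_tags.append(tag)
        subword_tags.extend([CONTINUATION] * (len(pieces) - 1))
    return subwords, subword_tags


words = ["This", "car", "is", "Volvo"]
tags = ["O", "O", "O", "ORG"]
subwords, subword_tags = align_tags(words, tags, toy_subword_split)
print(" ".join(f"{p}/{t}" for p, t in zip(subwords, subword_tags)))
```

With a real tokenizer, the same loop would apply as long as it can split one word at a time, which keeps the source and tag sequences the same length by construction.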

You can also try character-based models (it makes sense for NER).

Capitalization is one of the most important features for NER, so lowercasing everything is definitely a bad idea. Character-based models will surely learn the difference between lowercase and uppercase automatically (and even subword-based models, I think).

However, when I only use Transformer Encoder with a softmax layer on top, the model tends to set all labels to O.

Have you solved your problem? I have the same problem.

@wenhrui no, I haven't.

How and where should the Transformer be used for the NER task? (I have implemented it using CNN+Bi-LSTM+CRF.)

I am having the same issue. @Saichethan, can you share the GitHub repository if you solved it?

TENER: Adapting Transformer Encoder for Named Entity Recognition
https://arxiv.org/pdf/1911.04474.pdf
