NLTK: Why is the word 'puppy' tagged 'JJ' in this sentence?

Created on 3 Sep 2019 · 2 Comments · Source: nltk/nltk

[('puppy', 'JJ'),
('is', 'VBZ'),
('walking', 'VBG'),
('on', 'IN'),
('the', 'DT'),
('street', 'NN'),
('to', 'TO'),
('meet', 'VB'),
('its', 'PRP$'),
('father', 'NN')]

'puppy is walking on the street to meet its father'

resolved tagger

CyberFork

Most helpful comment

Also, the suffixes -y and -py would be associated with JJ.

jnothman on 3 Sep 2019
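To make the suffix point concrete, here is a minimal sketch that counts which tags co-occur with a given word ending. The word list is a small hand-picked illustration, not corpus statistics, and `suffix_tag_counts` is a hypothetical helper that is not part of NLTK.

```python
from collections import Counter, defaultdict

# Hand-picked (word, Penn Treebank tag) pairs, for illustration only.
# A real tagger learns such associations from a large tagged corpus.
TOY_TAGGED_WORDS = [
    ("happy", "JJ"), ("sleepy", "JJ"), ("floppy", "JJ"), ("sloppy", "JJ"),
    ("puppy", "NN"), ("funny", "JJ"), ("city", "NN"), ("easy", "JJ"),
]


def suffix_tag_counts(tagged_words, suffix_len):
    """Map each word-final suffix of length `suffix_len` to a Counter of tags."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        if len(word) >= suffix_len:
            counts[word[-suffix_len:].lower()][tag] += 1
    return counts


by_suffix = suffix_tag_counts(TOY_TAGGED_WORDS, 2)
# In this toy list, most words ending in "py" are adjectives.
print(by_suffix["py"].most_common())
```

If a model sees mostly adjectives with a given ending and has little other context to go on, the ending alone can tip it towards JJ.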

All 2 comments

The POS tagger is stochastic and was trained on well-formed text.

Possible reasons for the failure are the non-standard English in the input, which the model wasn't trained to handle, viz.

(1) the lack of a determiner (e.g. The puppy vs puppy)
(2) non-standard capitalization (e.g. Puppy vs puppy)
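To illustrate why those two points matter, here is a rough sketch of the kind of context features a statistical tagger might look at for a single token. The feature names and the `context_features` helper are assumptions made up for this illustration; they are not NLTK's actual feature set.

```python
def context_features(tokens, i):
    """Return a few illustrative features for tokens[i].

    Hypothetical, simplified feature set; real taggers use many more features.
    """
    word = tokens[i]
    # With no preceding token, the model only sees a start-of-sentence marker.
    prev_word = tokens[i - 1].lower() if i > 0 else "<START>"
    return {
        "suffix": word[-3:].lower(),
        "prev_word": prev_word,
        "is_capitalized": word[:1].isupper(),
    }


for sentence in ("puppy is walking", "The puppy is walking", "Puppy is walking"):
    tokens = sentence.split()
    i = [t.lower() for t in tokens].index("puppy")
    print(sentence, "->", context_features(tokens, i))
```

Without a determiner, the only cues left for "puppy" are the start-of-sentence marker and the word's own shape, such as its ending and capitalization.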

Counterexample for (1), showing the expected NN tag when a determiner is added:

>>> from nltk import pos_tag

>>> pos_tag("The puppy is walking on the street to meet its father .".split())
[('The', 'DT'), ('puppy', 'NN'), ('is', 'VBZ'), ('walking', 'VBG'), ('on', 'IN'), ('the', 'DT'), ('street', 'NN'), ('to', 'TO'), ('meet', 'VB'), ('its', 'PRP$'), ('father', 'NN'), ('.', '.')]

>>> pos_tag("the puppy is walking on the street to meet its father .".split())
[('the', 'DT'), ('puppy', 'NN'), ('is', 'VBZ'), ('walking', 'VBG'), ('on', 'IN'), ('the', 'DT'), ('street', 'NN'), ('to', 'TO'), ('meet', 'VB'), ('its', 'PRP$'), ('father', 'NN'), ('.', '.')]

Counterexample for (2), where the first word is capitalized as for a proper noun or the start of a sentence:

>>> from nltk import pos_tag

>>> pos_tag("Puppy is walking on the street to meet its father .".split())
[('Puppy', 'NNP'), ('is', 'VBZ'), ('walking', 'VBG'), ('on', 'IN'), ('the', 'DT'), ('street', 'NN'), ('to', 'TO'), ('meet', 'VB'), ('its', 'PRP$'), ('father', 'NN'), ('.', '.')]

BTW, you'll get similar effects in spaCy:

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')

>>> text = "puppy is walking on the street to meet its father"
>>> [(token.text, token.pos_) for token in nlp(text)]
[('puppy', 'ADJ'), ('is', 'VERB'), ('walking', 'VERB'), ('on', 'ADP'), ('the', 'DET'), ('street', 'NOUN'), ('to', 'PART'), ('meet', 'VERB'), ('its', 'ADJ'), ('father', 'NOUN')]

>>> text = "The puppy is walking on the street to meet its father"
>>> [(token.text, token.pos_) for token in nlp(text)]
[('The', 'DET'), ('puppy', 'NOUN'), ('is', 'VERB'), ('walking', 'VERB'), ('on', 'ADP'), ('the', 'DET'), ('street', 'NOUN'), ('to', 'PART'), ('meet', 'VERB'), ('its', 'ADJ'), ('father', 'NOUN')]

>>> text = "Puppy is walking on the street to meet its father"
>>> [(token.text, token.pos_) for token in nlp(text)]
[('Puppy', 'PROPN'), ('is', 'VERB'), ('walking', 'VERB'), ('on', 'ADP'), ('the', 'DET'), ('street', 'NOUN'), ('to', 'PART'), ('meet', 'VERB'), ('its', 'ADJ'), ('father', 'NOUN')]
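If you want to repeat this kind of comparison on your own sentences, a small helper along these lines might be handy. `tag_of` and `compare_variants` are hypothetical names, and the sketch assumes a tagger that takes a list of tokens and returns (token, tag) pairs, the way `nltk.pos_tag` does.

```python
VARIANTS = [
    "puppy is walking on the street to meet its father",
    "The puppy is walking on the street to meet its father",
    "Puppy is walking on the street to meet its father",
]


def tag_of(tag_fn, sentence, target):
    """Return the tag `tag_fn` gives the first token equal to `target` (case-insensitive)."""
    for token, tag in tag_fn(sentence.split()):
        if token.lower() == target.lower():
            return tag
    return None  # target word not found in the sentence


def compare_variants(tag_fn, variants, target):
    """Map each sentence variant to the tag assigned to `target`."""
    return {sentence: tag_of(tag_fn, sentence, target) for sentence in variants}


# Example usage (requires NLTK and its tagger model to be installed):
#   from nltk import pos_tag
#   for sentence, tag in compare_variants(pos_tag, VARIANTS, "puppy").items():
#       print(tag, "<-", sentence)
```

The tagger is passed in as an argument so the same helper works with any function that has that shape. Note that the simple `split()` tokenization matches what the examples above use.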