spaCy: Dutch noun_chunks returns an empty list using nl_core_news_sm

Created on 14 Jun 2020 · 6 comments · Source: explosion/spaCy

The Dutch language model 'nl_core_news_sm' does not produce a noun_chunks list.

This issue seems to be closed already:
https://github.com/explosion/spaCy/issues/2574

but I downloaded the latest version of nl_core_news_sm and it seems that no chunks are found and no error appears.

  • Operating System: Windows
  • Python Version Used: Python 3.7.4
  • spaCy Version Used: 2.2.3
  • Environment Information: Anaconda


The code for copy and paste:
import spacy

nlp_nl = spacy.load('nl_core_news_sm')
nlp_en = spacy.load('en_core_web_sm')
nlp_de = spacy.load('de_core_news_sm')
nlp_fr = spacy.load('fr_core_news_sm')

doc_nl = nlp_nl('Dit is een tafel. Hij kocht een auto')
doc_en = nlp_en('This is a table. He bought a car')
doc_de = nlp_de('Der Tisch ist sehr schön. Er hat ein Auto gekauft')
doc_fr = nlp_fr("C'est une table. Il a acheté une voiture.")

print('NL',[chunk.text for chunk in doc_nl.noun_chunks]) # This returns an empty list.
print('EN',[chunk.text for chunk in doc_en.noun_chunks])
print('DE',[chunk.text for chunk in doc_de.noun_chunks])
print('FR',[chunk.text for chunk in doc_fr.noun_chunks])

print('NL',[token.text for token in doc_nl])
print('EN',[token.text for token in doc_en])
print('DE',[token.text for token in doc_de])
print('FR',[token.text for token in doc_fr])

Labels: enhancement, help wanted (easy), lang / nl

All 6 comments

Dutch indeed does not have a noun_chunks iterator implemented yet. It probably shouldn't be too hard to implement one, though: look at the English and German ones and make sure the dependency labels are adjusted to those used by the Dutch model. If you feel like contributing, a PR for this would be most welcome!
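For reference, here is a minimal sketch of what such an iterator could look like, modeled on the English version in spacy/lang/en/syntax_iterators.py (spaCy 2.x). The set of Dutch dependency labels below is an assumption and would need to be checked against what nl_core_news_sm actually produces (e.g. via nlp_nl.get_pipe("parser").labels):

# Hypothetical spacy/lang/nl/syntax_iterators.py, adapted from the English one.
from spacy.symbols import NOUN, PROPN, PRON


def noun_chunks(obj):
    """Detect base noun phrases from a dependency parse. Works on Doc and Span."""
    # ASSUMPTION: these labels must be verified against the Dutch model's
    # dependency scheme; they are only a plausible starting point.
    labels = ["nsubj", "nsubj:pass", "obj", "iobj", "obl", "appos", "ROOT"]
    doc = obj.doc
    if not doc.is_parsed:
        raise ValueError("Document has not been parsed.")
    np_deps = [doc.vocab.strings.add(label) for label in labels]
    conj = doc.vocab.strings.add("conj")
    np_label = doc.vocab.strings.add("NP")
    prev_end = -1
    for word in obj:
        if word.pos not in (NOUN, PROPN, PRON):
            continue
        # Prevent nested chunks from being produced
        if word.left_edge.i <= prev_end:
            continue
        if word.dep in np_deps:
            prev_end = word.i
            yield word.left_edge.i, word.i + 1, np_label
        elif word.dep == conj:
            head = word.head
            while head.dep == conj and head.head.i < head.i:
                head = head.head
            # If the head is part of an NP and we're coordinated to it, we are too
            if head.dep in np_deps:
                prev_end = word.i
                yield word.left_edge.i, word.i + 1, np_label


# Registered on the language defaults, mirroring how English and German do it.
SYNTAX_ITERATORS = {"noun_chunks": noun_chunks}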

@tolomaus: I don't suppose you have one implemented on your end, that you could share? ;-)

No, I haven't, unfortunately. Last time I worked on this there was no dependency parser for Dutch yet, so I used a simplistic approach based on POS tags (mostly adjectives and substantives) to decide whether to combine tokens into noun chunks.
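For context, such a POS-based fallback could look something like the sketch below (hypothetical, not the actual code referred to above): greedily group determiner/adjective/noun runs into chunks when no dependency parse is available.

# Hypothetical POS-tag-based noun chunking fallback.
import spacy

nlp = spacy.load("nl_core_news_sm")
doc = nlp("Dit is een tafel. Hij kocht een mooie rode auto.")

chunks = []
current = []
for token in doc:
    if token.pos_ in ("DET", "ADJ", "NOUN", "PROPN"):
        current.append(token)
    else:
        # Close the current run if it contains at least one noun
        if any(t.pos_ in ("NOUN", "PROPN") for t in current):
            chunks.append(doc[current[0].i : current[-1].i + 1])
        current = []
if current and any(t.pos_ in ("NOUN", "PROPN") for t in current):
    chunks.append(doc[current[0].i : current[-1].i + 1])

print([span.text for span in chunks])

This kind of heuristic is much cruder than a dependency-based iterator, but it gives usable chunks for languages without a parser.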

Can someone post a link/info here about how this is done for en/fr/de? I might have a look at it. Perhaps someone (me?) working with/in Dutch could have a look and decide if they are in a position to contribute. I personally don't know what it entails to create noun phrases from POS tags. Thanks

Here are the implementations for the German and English ones.

If you do implement this for Dutch, it would be good to also add some tests, like here.
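As a rough illustration of the kind of test that could accompany it (hypothetical; spaCy's own test suite builds Docs from tokenizer fixtures rather than loading full pretrained models):

# Hypothetical test sketch for Dutch noun_chunks.
import pytest
import spacy


@pytest.fixture(scope="module")
def nlp_nl():
    return spacy.load("nl_core_news_sm")


def test_nl_noun_chunks_not_empty(nlp_nl):
    doc = nlp_nl("Dit is een tafel. Hij kocht een auto.")
    chunks = [chunk.text for chunk in doc.noun_chunks]
    assert len(chunks) > 0
    # ASSUMPTION: exact chunk boundaries depend on the parser's output
    assert any("tafel" in c for c in chunks)
    assert any("auto" in c for c in chunks)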

Merging this with the master issue #3056 - I've added the idea of contributing a new noun chunker for your favourite language as a great way of contributing to the coverage of different languages in spaCy :-) So PRs for this would definitely be welcome!
