spaCy: Dutch noun_chunks returns an empty list using nl_core_news_sm

Created on 14 Jun 2020 · 6 comments · Source: explosion/spaCy

The Dutch language model 'nl_core_news_sm' does not produce a noun_chunks list.

This issue seems to be closed already:
https://github.com/explosion/spaCy/issues/2574

but I downloaded the latest version of nl_core_news_sm and it seems that no chunks are found and no error appears.

  • Operating System: Windows
  • Python Version Used: Python 3.7.4
  • spaCy Version Used: 2.2.3
  • Environment Information: Anaconda


The code for copy and paste:
import spacy

nlp_nl = spacy.load('nl_core_news_sm')
nlp_en = spacy.load('en_core_web_sm')
nlp_de = spacy.load('de_core_news_sm')
nlp_fr = spacy.load('fr_core_news_sm')

doc_nl = nlp_nl('Dit is een tafel. Hij kocht een auto')
doc_en = nlp_en('This is a table. He bought a car')
doc_de = nlp_de('Der Tisch ist sehr schön. Er hat ein Auto gekauft')
doc_fr = nlp_fr("C'est une table. Il a acheté une voiture.")

print('NL',[chunk.text for chunk in doc_nl.noun_chunks]) # This returns an empty list.
print('EN',[chunk.text for chunk in doc_en.noun_chunks])
print('DE',[chunk.text for chunk in doc_de.noun_chunks])
print('FR',[chunk.text for chunk in doc_fr.noun_chunks])

print('NL',[token.text for token in doc_nl])
print('EN',[token.text for token in doc_en])
print('DE',[token.text for token in doc_de])
print('FR',[token.text for token in doc_fr])

Labels: enhancement, help wanted (easy), lang / nl

All 6 comments

Dutch indeed does not have a noun_chunks iterator implemented yet. It probably shouldn't be too hard to implement one, though: look at the English and German ones and make sure the dependency labels are adjusted to those used by the Dutch model. If you feel like contributing, a PR for this would be most welcome!
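For reference, here is a minimal sketch of what such an iterator could look like, modeled on the English version in spacy/lang/en/syntax_iterators.py (spaCy 2.x). The set of Dutch dependency labels below is an assumption and would need to be checked against what nl_core_news_sm actually produces (e.g. via nlp_nl.get_pipe("parser").labels):

# Hypothetical spacy/lang/nl/syntax_iterators.py, adapted from the English one.
from spacy.symbols import NOUN, PROPN, PRON


def noun_chunks(obj):
    """Detect base noun phrases from a dependency parse. Works on Doc and Span."""
    # ASSUMPTION: these labels must be verified against the Dutch model's
    # dependency scheme; they are only a plausible starting point.
    labels = ["nsubj", "nsubj:pass", "obj", "iobj", "obl", "appos", "ROOT"]
    doc = obj.doc
    if not doc.is_parsed:
        raise ValueError("Document has not been parsed.")
    np_deps = [doc.vocab.strings.add(label) for label in labels]
    conj = doc.vocab.strings.add("conj")
    np_label = doc.vocab.strings.add("NP")
    prev_end = -1
    for word in obj:
        if word.pos not in (NOUN, PROPN, PRON):
            continue
        # Prevent nested chunks from being produced
        if word.left_edge.i <= prev_end:
            continue
        if word.dep in np_deps:
            prev_end = word.i
            yield word.left_edge.i, word.i + 1, np_label
        elif word.dep == conj:
            head = word.head
            while head.dep == conj and head.head.i < head.i:
                head = head.head
            # If the head is part of an NP and we're coordinated to it, we are too
            if head.dep in np_deps:
                prev_end = word.i
                yield word.left_edge.i, word.i + 1, np_label


# Registered on the language defaults, mirroring how English and German do it.
SYNTAX_ITERATORS = {"noun_chunks": noun_chunks}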

@tolomaus: I don't suppose you have one implemented on your end, that you could share? ;-)

No, I haven't, unfortunately. Last time I worked on this there was no dependency parser for Dutch yet, so I used a simplistic approach based on POS tags (mostly adjectives and substantives) to decide whether to combine tokens into noun chunks.
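For context, such a POS-based fallback could look something like the sketch below (hypothetical, not the actual code referred to above): greedily group determiner/adjective/noun runs into chunks when no dependency parse is available.

# Hypothetical POS-tag-based noun chunking fallback.
import spacy

nlp = spacy.load("nl_core_news_sm")
doc = nlp("Dit is een tafel. Hij kocht een mooie rode auto.")

chunks = []
current = []
for token in doc:
    if token.pos_ in ("DET", "ADJ", "NOUN", "PROPN"):
        current.append(token)
    else:
        # Close the current run if it contains at least one noun
        if any(t.pos_ in ("NOUN", "PROPN") for t in current):
            chunks.append(doc[current[0].i : current[-1].i + 1])
        current = []
if current and any(t.pos_ in ("NOUN", "PROPN") for t in current):
    chunks.append(doc[current[0].i : current[-1].i + 1])

print([span.text for span in chunks])

This kind of heuristic is much cruder than a dependency-based iterator, but it gives usable chunks for languages without a parser.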

Can someone post a link/info here about how this is done for en/fr/de? I might have a look at it. Perhaps someone (me?) working with/in Dutch could have a look and decide if they are in a position to contribute. I personally don't know what it entails to create noun phrases from POS tags. Thanks

Here are the implementations for the German and English ones.

If you do implement this for Dutch, it would be good to also add some tests, like here.
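As a rough illustration of the kind of test that could accompany it (hypothetical; spaCy's own test suite builds Docs from tokenizer fixtures rather than loading full pretrained models):

# Hypothetical test sketch for Dutch noun_chunks.
import pytest
import spacy


@pytest.fixture(scope="module")
def nlp_nl():
    return spacy.load("nl_core_news_sm")


def test_nl_noun_chunks_not_empty(nlp_nl):
    doc = nlp_nl("Dit is een tafel. Hij kocht een auto.")
    chunks = [chunk.text for chunk in doc.noun_chunks]
    assert len(chunks) > 0
    # ASSUMPTION: exact chunk boundaries depend on the parser's output
    assert any("tafel" in c for c in chunks)
    assert any("auto" in c for c in chunks)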

Merging this with the master issue #3056 - I've added the idea of contributing a new noun chunker for your favourite language as a great way of contributing to the coverage of different languages in spaCy :-) So PRs for this would definitely be welcome!
