I copied and pasted the updated files to ...\spacy\lang\el and got the error below. Does anyone have any idea why that happened? Thank you in advance.
import spacy
from spacy.lang.el import Greek
nlp = Greek()
doc = nlp('Χθες')
doc[0].lemma_
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "token.pyx", line 8, in spacy.tokens.token.Token.lemma_.__get__
TypeError: lookup() got an unexpected keyword argument 'orth'
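A TypeError like this usually means the copied lemmatizer code is calling a function with a keyword argument that the installed version of that function does not accept, i.e. a version mismatch between the edited files and the rest of the installation. A minimal, purely illustrative sketch of that failure mode (not actual spaCy code):

```python
# Illustrative only: an "older" lookup() without an 'orth' parameter,
# called the way newer lemmatizer code would call it.
def lookup(string):
    return string

try:
    lookup("Χθες", orth=123)
except TypeError as e:
    print(e)  # lookup() got an unexpected keyword argument 'orth'
```

The same mismatch in the other direction (old caller, new function) is what updating only some of the files can produce.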
Hi,
I copied and pasted the updated files to ...\spacy\lang\el
What exactly do you mean? Which files did you update, and how did you update them?
It looks like something's gone wrong with your spaCy installation, probably connected to the files you were copying. Did you install from source? Did it compile properly? Or did you install it via pip?
Hello,
I installed spaCy via pip. However, I needed to modify the lemmatizer for the Greek language, so I edited the following files as this user did: https://github.com/giannisdaras/spaCy/commit/fe94e696d3dc5abdfb846d152ebf489518419513
and then I addressed the following issue that occurred: https://github.com/explosion/spaCy/issues/4272. I tried all of the above in two versions of spaCy, 2.2.3 and 2.1.8. I'm also using a virtual environment.
Right, so spaCy 2.1.8 won't work for you, but in 2.2.3 this bug should be fixed. I just tried it out myself and didn't get any errors, and there's even a unit test ensuring that it works.
It's probably best to create an entirely clean environment, install 2.2.3, run your code WITHOUT making any changes to the files, and check whether that works. Please let me know whether it does. Then afterwards, you could try changing files and making sure nothing breaks in between.
Do note that the changes made by @giannisdaras should be included in the 2.2.3 release already.
Hello, I followed all of the above steps. There is no error, but lemmatization still doesn't work for the Greek language as it should. Instead, .lemma_ returns the word itself:
import spacy
from spacy.lang.el import Greek
nlp = Greek()
doc = nlp('αγόρασες')
doc[0].lemma_
'αγόρασες'
Are you installing spaCy with the lookups data, as described here? https://spacy.io/usage#pip
For example, pip install spacy[lookups]. Otherwise, it won't have the lemma rules and tables available.
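Conceptually, a lookup lemmatizer is just a large table mapping surface forms to lemmas, with a fallback to the word itself when a form is missing, which is exactly the behaviour seen above when the tables aren't installed. A sketch of that idea with a made-up one-entry table (the table, entry, and function name are illustrative, not spaCy's internals):

```python
# Hypothetical one-entry lemma table; spacy-lookups-data ships a much
# larger table per language.
LEMMA_LOOKUP = {"αγόρασες": "αγοράζω"}

def lookup_lemma(word: str) -> str:
    # Words missing from the table fall back to the surface form,
    # which is why .lemma_ returned the word itself above.
    return LEMMA_LOOKUP.get(word, word)

print(lookup_lemma("αγόρασες"))  # αγοράζω
print(lookup_lemma("Χθες"))      # Χθες (not in the table)
```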
Thank you very much for your help. It turns out that I had to install the lookups data as well as the 'el' model. Most words are lemmatized correctly, so I'm assuming the ones that are not are simply missing from the lookup table.