spaCy: Lexemes are unhashable (v0.101.0)

Created on 12 May 2016 · 6 Comments · Source: explosion/spaCy

When I try to add Lexemes to a set or dict, it fails since Lexemes are unhashable:

cat = nlp.vocab['cat']
dog = nlp.vocab['dog']
my_animals = {cat, dog}

Traceback (most recent call last):

  File "<ipython-input-30-8ffec97fae23>", line 1, in <module>
    my_animals = {cat, dog}

TypeError: unhashable type: 'spacy.lexeme.Lexeme'

Maybe lexeme.orth (together with lexeme.lang) could be used as the hash value?
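Until a __hash__ exists, one possible workaround along the same lines is to key a dict on (lang, orth) instead of putting the Lexemes themselves into a set. The snippet below is only a sketch: StubLexeme is a hypothetical stand-in so that it runs without spaCy, the numeric IDs are made up, and lexeme_key is an illustrative helper, not part of the spaCy API.

```python
class StubLexeme:
    """Hypothetical stand-in for spacy.lexeme.Lexeme: comparable but unhashable."""

    __hash__ = None  # mimic the unhashable behaviour reported above

    def __init__(self, orth, lang, text):
        self.orth = orth    # integer ID of the word form (made-up value here)
        self.lang = lang    # integer ID of the language (made-up value here)
        self.orth_ = text   # string form of the word

    def __eq__(self, other):
        return (self.orth, self.lang) == (other.orth, other.lang)


def lexeme_key(lex):
    """Illustrative helper: build a hashable key from the two integer attributes."""
    return (lex.lang, lex.orth)


cat = StubLexeme(orth=101, lang=1, text="cat")
dog = StubLexeme(orth=202, lang=1, text="dog")
cat_again = StubLexeme(orth=101, lang=1, text="cat")  # a second wrapper for the same word

# A dict keyed on (lang, orth) plays the role of the set and removes duplicates.
my_animals = {lexeme_key(lex): lex for lex in (cat, dog, cat_again)}
animal_words = sorted(lex.orth_ for lex in my_animals.values())
```

With a real vocab, the same pattern should apply by calling lexeme_key on the objects returned from nlp.vocab[word], assuming both attributes are available in your version.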

Another curious observation: looking up the same word several times through nlp.vocab[word] returns Lexeme objects at different addresses (although comparison works, thanks to the newly implemented rich comparison):

nlp.vocab['cat']
Out[17]: <spacy.lexeme.Lexeme at 0xe865401e10>

nlp.vocab['cat']
Out[18]: <spacy.lexeme.Lexeme at 0xe865401d80>

All 6 comments

To save memory, the Lexeme class is a wrapper around the LexemeC struct. So the Python objects are indeed created afresh each time. You can see the implementation here: https://github.com/spacy-io/spaCy/blob/master/spacy/lexeme.pyx#L31

Adding a __hash__ method is a good idea though. Will do.
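To illustrate why freshly created wrappers can still behave well in sets once they hash by value, here is a toy sketch. It is not spaCy's actual implementation: LexemeView and ToyVocab are hypothetical names, and hashing on orth alone is an assumption made for brevity.

```python
class LexemeView:
    """Hypothetical lightweight wrapper; NOT spaCy's actual Lexeme class."""

    __slots__ = ("orth",)

    def __init__(self, orth):
        self.orth = orth  # integer ID identifying the underlying entry

    def __eq__(self, other):
        if not isinstance(other, LexemeView):
            return NotImplemented
        return self.orth == other.orth

    def __hash__(self):
        # Hash on the same value that __eq__ compares, so equal wrappers collide.
        return hash(self.orth)


class ToyVocab:
    """Hypothetical vocab that hands out a new wrapper on every lookup."""

    def __init__(self):
        self._ids = {}

    def __getitem__(self, word):
        # Assign the next free integer ID the first time a word is seen.
        orth = self._ids.setdefault(word, len(self._ids))
        return LexemeView(orth)


vocab = ToyVocab()
first = vocab["cat"]
second = vocab["cat"]

same_object = first is second   # two distinct Python objects
equal_value = first == second   # but equal by value
animals = {first, second, vocab["dog"]}
```

The two lookups of "cat" give distinct objects that compare equal and collapse to a single set member, which is the behaviour the original report asks for.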

Sounds reasonable, thanks for the explanation!

Is there a workaround for this in the meantime? I'm new to NLP and trying to follow this guide, specifically the part where it mentions word vector representations.

@lylebrown
Replace the curly braces ({ }) with square brackets ([ ]) in the following line:

allWords = list({w for w in parser.vocab if w.has_vector and w.orth_.islower() and w.lower_ != "nasa"})

Btw the line should probably be:

allWords = [w for w in parser.vocab if w.has_vector and w.is_lower and w.lower_ != "nasa"]

The old .repvec property is now named .vector, too.

The __hash__ method will be there in the next release.
