Spacy: Bug: TypeError when trying to use TextCategorizer

Created on 15 Dec 2017 · 4 Comments · Source: explosion/spaCy

When running the following code (from the code example at https://spacy.io/api/textcategorizer):

import spacy
from spacy.pipeline import TextCategorizer
nlp = spacy.load('en')

textcat = TextCategorizer(nlp.vocab)
doc = nlp(u"This is a sentence.")
processed = textcat(doc)

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-42-f23a44f17862> in <module>()
      5 textcat = TextCategorizer(nlp.vocab)
      6 doc = nlp(u"This is a sentence.")
----> 7 processed = textcat(doc)

pipeline.pyx in spacy.pipeline.TextCategorizer.__call__()

pipeline.pyx in spacy.pipeline.TextCategorizer.predict()

TypeError: 'bool' object is not callable

Info about spaCy

  • spaCy version: 2.0.1
  • Platform: Linux-4.10.0-42-generic-x86_64-with-debian-stretch-sid
  • Python version: 3.6.3
  • Models: en
Labels: docs, usage

All 4 comments

Thanks – I think this is actually an error in the docs. Will fix this! The cause of the error is the same as described in #1702:

The model for the parser is not initialized until you either load the weights (with the .from_disk() or .from_bytes() methods) or initialize it with .begin_training(). You can also create a model with the parser.Model() class method.

Since the TextCategorizer has no model loaded in, calling it directly will result in an error. So in a real-world use case, you would load in the weights and then add it to the pipeline:

textcat.from_disk('/path/to/model')
nlp.add_pipe(textcat)
doc = nlp(u"This is a sentence.")
print(doc.cats)

The example in the API docs should probably show the component with weights loaded in, since it's supposed to show the more abstract and standalone use of the class.
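
To illustrate the failure mode described above, here is a minimal sketch in plain Python. It does not use spaCy. ToyCategorizer and try_call are hypothetical names invented for this illustration, and it assumes the uninitialized model is stored as a boolean placeholder, which is what the 'bool' object in the traceback suggests. Calling the component before the placeholder is replaced gives the same kind of TypeError; initializing it first makes the call succeed.

```python
class ToyCategorizer:
    """Hypothetical stand-in for a pipeline component; not spaCy's implementation."""

    def __init__(self, model=True):
        # Assumption: True is a placeholder meaning "create the model later".
        self.model = model

    def begin_training(self):
        # Swap the placeholder for a callable model (here a dummy function).
        if self.model is True:
            self.model = lambda text: {"LABEL": 0.5}
        return self

    def __call__(self, text):
        # Raises TypeError while self.model is still the boolean placeholder.
        return self.model(text)


def try_call(component, text):
    """Return the component's output, or the error message if the call fails."""
    try:
        return component(text)
    except TypeError as err:
        return str(err)


# Calling the component before any model exists reproduces the error type.
uninitialized = ToyCategorizer()
error_message = try_call(uninitialized, "This is a sentence.")
print(error_message)

# After initialization the same call returns category scores.
initialized = ToyCategorizer().begin_training()
scores = try_call(initialized, "This is a sentence.")
print(scores)
```

In the real library, the equivalent of the initialization step is loading weights with .from_disk() or .from_bytes(), or calling .begin_training(), as described in the comment above.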

@ines is there a pre-trained model that I can download and use, just like the language models?

I know I can train one myself as per https://github.com/explosion/spacy/blob/master/examples/training/train_textcat.py, but I'm wondering if there's one that I can readily use.

@safwank Not yet! Text classification is pretty specific, though, which makes it much harder to provide general-purpose models like the language models. However, it might be nice to offer an example model people can try out.

If you're interested in an end-to-end workflow of training a text classifier, check out this video tutorial we've recorded for our annotation tool Prodigy:
https://prodi.gy/docs/video-insults-classifier

The workflow focuses on collecting the annotations to train the classifier – but under the hood, it updates spaCy's TextCategorizer with the collected annotations, and saves out a spaCy model with the new category available via doc.cats.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
