Spacy: Bug: TypeError when trying to use TextCategorizer

Created on 15 Dec 2017 · 4 Comments · Source: explosion/spaCy

When running the following code (from the code example at https://spacy.io/api/textcategorizer):

import spacy
from spacy.pipeline import TextCategorizer
nlp = spacy.load('en')

textcat = TextCategorizer(nlp.vocab)
doc = nlp(u"This is a sentence.")
processed = textcat(doc)

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-42-f23a44f17862> in <module>()
      5 textcat = TextCategorizer(nlp.vocab)
      6 doc = nlp(u"This is a sentence.")
----> 7 processed = textcat(doc)

pipeline.pyx in spacy.pipeline.TextCategorizer.__call__()

pipeline.pyx in spacy.pipeline.TextCategorizer.predict()

TypeError: 'bool' object is not callable

Info about spaCy

spaCy version: 2.0.1
Platform: Linux-4.10.0-42-generic-x86_64-with-debian-stretch-sid
Python version: 3.6.3
Models: en

docs usage

shgidi

Most helpful comment

Thanks – I think this is actually an error in the docs. Will fix this! The cause of the error is the same as described in #1702:

The model for the parser is not initialized until you either load the weights (with the .from_disk() or .from_bytes() methods) or initialize it with .begin_training(). You can also create a model with the parser.Model() class method.

Since the TextCategorizer has no model loaded in, calling it directly will result in an error. So in a real-world use case, you would either load in the weights, or add it to the pipeline:

textcat.from_disk('/path/to/model')

nlp.add_pipe(textcat)
doc = nlp(u"This is a sentence.")
print(doc.cats)

The example in the API docs should probably show an example with weights loaded in, since it's supposed to show the more abstract and standalone use of the class.
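
The error message itself can be reproduced without spaCy. The following is a minimal sketch, assuming the component stores a boolean placeholder in its model attribute until weights are loaded or training begins. The ToyCategorizer class, its dummy scores and the POSITIVE label are hypothetical stand-ins for illustration, not spaCy's actual implementation.

```python
class ToyCategorizer:
    """Hypothetical stand-in for a pipeline component (not spaCy code)."""

    def __init__(self, model=True):
        # Assumption: `True` is a placeholder meaning "create the model later".
        self.model = model

    def begin_training(self):
        # Swap the placeholder for a callable model (here: fixed dummy scores).
        if self.model is True:
            self.model = lambda texts: [{"POSITIVE": 0.5} for _ in texts]

    def __call__(self, text):
        # Calls the model directly, so a leftover placeholder raises TypeError.
        return self.model([text])[0]


textcat = ToyCategorizer()

error_message = None
try:
    textcat("This is a sentence.")  # model is still the boolean placeholder
except TypeError as err:
    error_message = str(err)

textcat.begin_training()  # the model is now a real callable
scores = textcat("This is a sentence.")
```

Before begin_training() is called, the call fails with the same "'bool' object is not callable" message as in the traceback above. Afterwards it returns a dictionary of category scores, loosely analogous to doc.cats.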

ines on 16 Dec 2017

@ines is there a pre-trained model that I can download and use, just like the language models?

I know I can train one myself as per https://github.com/explosion/spacy/blob/master/examples/training/train_textcat.py, but I'm wondering if there's one that I can readily use.

safwank on 2 Jan 2018

@safwank Not yet! Text classification is pretty specific, though, which makes it much harder to provide general-purpose models like the language models. However, it might be nice to offer an example model people can try out.

If you're interested in an end-to-end workflow of training a text classifier, check out this video tutorial we've recorded for our annotation tool Prodigy:
https://prodi.gy/docs/video-insults-classifier

The workflow focuses on collecting the annotations to train the classifier – but under the hood, it updates spaCy's TextCategorizer with the collected annotations, and saves out a spaCy model with the new category available via doc.cats.

ines on 3 Jan 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.