When running the following code (from the code example here https://spacy.io/api/textcategorizer):
import spacy
from spacy.pipeline import TextCategorizer
nlp = spacy.load('en')

textcat = TextCategorizer(nlp.vocab)
doc = nlp(u"This is a sentence.")
processed = textcat(doc)
I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-42-f23a44f17862> in <module>()
5 textcat = TextCategorizer(nlp.vocab)
6 doc = nlp(u"This is a sentence.")
----> 7 processed = textcat(doc)
pipeline.pyx in spacy.pipeline.TextCategorizer.__call__()
pipeline.pyx in spacy.pipeline.TextCategorizer.predict()
TypeError: 'bool' object is not callable
Thanks – I think this is actually an error in the docs. Will fix this! The cause of the error is the same as described in #1702:
The model for the parser is not initialized until you either load the weights (with the .from_disk() or .from_bytes() methods), or initialize with .begin_training(). You can also create a model with the parser.Model() class method.
Since the TextCategorizer has no model loaded in, calling it directly will result in an error. So in a real-world use case, you would either load in the weights, or add it to the pipeline:
textcat.from_disk('/path/to/model')
nlp.add_pipe(textcat)
doc = nlp(u"This is a sentence.")
print(doc.cats)
The example in the API docs should probably show an example with weights loaded in, since it's supposed to show the more abstract and standalone use of the class.
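To illustrate why the call fails, here is a minimal pure-Python sketch of the mechanism. It does not use spaCy: SketchComponent, its begin_training() body and the POSITIVE score are hypothetical stand-ins. The sketch assumes that the component's model attribute holds a boolean placeholder until weights are loaded or training is initialized, which is what the 'bool' object is not callable message suggests.

```python
# Hypothetical stand-in for a pipeline component; this is NOT spaCy's code.
class SketchComponent:
    def __init__(self, model=True):
        # Assumption: True is only a placeholder meaning "create the model later".
        self.model = model

    def begin_training(self):
        # Swap the placeholder for a callable model (a dummy scorer here).
        if self.model is True:
            self.model = lambda texts: [{"POSITIVE": 0.5} for _ in texts]

    def __call__(self, text):
        # If self.model is still the boolean placeholder, this call raises TypeError.
        return self.model([text])[0]


component = SketchComponent()

# Calling the component before a model exists reproduces the error type.
try:
    component("This is a sentence.")
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)

# Once the placeholder has been replaced, the same call succeeds.
component.begin_training()
scores = component("This is a sentence.")
print(scores)
```

In the sketch, the first call fails because the placeholder is not callable, and the second call works once a real model has replaced it. Loading weights with .from_disk() or .from_bytes() would play the same role as begin_training() does here.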
@ines is there a pre-trained model that I can download and use, just like the language models?
I know I can train one myself as per https://github.com/explosion/spacy/blob/master/examples/training/train_textcat.py, but I'm wondering if there's one that I can readily use.
@safwank Not yet! Text classification is pretty specific, though, which makes it much harder to provide general-purpose models like the language models. However, it might be nice to offer an example model people can try out.
If you're interested in an end-to-end workflow of training a text classifier, check out this video tutorial we've recorded for our annotation tool Prodigy:
https://prodi.gy/docs/video-insults-classifier
The workflow focuses on collecting the annotations to train the classifier – but under the hood, it updates spaCy's TextCategorizer with the collected annotations, and saves out a spaCy model with the new category available via doc.cats.