Hi
I want to understand where I am going wrong with my pipeline.
Whenever I feed spaCy a clean text file, I get this Unicode error.
Any ideas where I'm going wrong?
Traceback (most recent call last):
  File "spacypipeline3.py", line 21, in <module>
    parsedData = parser(text)
  File "/usr/local/lib/python2.7/dist-packages/spacy/language.py", line 314, in __call__
    doc = self.make_doc(text)
  File "/usr/local/lib/python2.7/dist-packages/spacy/language.py", line 288, in <lambda>
    self.make_doc = lambda text: self.tokenizer(text)
TypeError: Argument 'string' has incorrect type (expected unicode, got str)
You need to convert the text to unicode before passing it to spaCy.
Call text.decode() on it. You can pass decode() a codec if the text is not ASCII, for example:
en_nlp('asçeptique is not a word'.decode('utf-8'))
This is something you really need to know about if you are using Python 2 rather than Python 3. The documentation is here:
https://docs.python.org/2/howto/unicode.html
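For the file case in the question, here is a minimal sketch of the same idea, assuming the file is UTF-8 encoded. The helper names to_unicode and read_text are made up for illustration and are not part of spaCy. Reading with io.open and an explicit encoding gives you unicode text on both Python 2 and Python 3, so no separate decode() call is needed afterwards.

```python
# -*- coding: utf-8 -*-
import io


def to_unicode(value, encoding="utf-8"):
    """Return value as unicode text (hypothetical helper, not a spaCy API)."""
    # On Python 2, `bytes` is an alias for `str`, so this catches the plain
    # byte strings that trigger the TypeError. Unicode text passes through.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def read_text(path, encoding="utf-8"):
    """Read a whole file as unicode text (hypothetical helper)."""
    # io.open decodes while reading; the UTF-8 default is an assumption
    # about the input file, so pass another codec if yours differs.
    with io.open(path, encoding=encoding) as f:
        return f.read()
```

With these in place, the failing line from the traceback would become something like parsedData = parser(read_text("input.txt")), where "input.txt" stands in for your own file.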