doc = nlp(text)
Traceback (most recent call last):
File "
doc = nlp(text)
File "/Applications/anaconda/lib/python2.7/site-packages/spacy/language.py", line 225, in __call__
tokens = self.tokenizer(text)
TypeError: Argument 'string' has incorrect type (expected unicode, got str)
spaCy expects unicode strings, not byte strings. We enforce this in the interface, because otherwise you'll get problems down the line on some minority of strings.
You can paper over the issue by doing something like doc = nlp(text.decode('utf8')). But this will likely bring you more bugs in future.
The correct solution is to make sure you have from __future__ import unicode_literals at the top of all your modules, and to make sure you read strings into unicode at the source.
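As a rough illustration of "read strings into unicode at the source", here is a minimal sketch that decodes byte strings once, at the boundary, before anything is passed to the pipeline. The helper name to_text is hypothetical and not part of spaCy, and the u"" prefixes are used only so the snippet runs unchanged on Python 2 and Python 3.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as a unicode string, decoding byte strings if needed.

    Hypothetical helper for illustration; not part of spaCy.
    """
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding)
    return value


# A byte string, such as what a plain open(...).read() returns on Python 2
raw = u"caf\u00e9".encode("utf-8")

# Decode once, where the data enters the program
text = to_text(raw)

print(type(raw).__name__, type(text).__name__)
# doc = nlp(text)  # `nlp` would be a loaded spaCy pipeline
```

Everything downstream of to_text then only ever sees unicode, so no further decode calls are needed later on.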
I am still having this issue:
from __future__ import unicode_literals
import spacy
nlp = spacy.load('en')
text = open('news2.txt').read()
doc = nlp(text)
TypeError Traceback (most recent call last)
<ipython-input-22-f03b20ead295> in <module>()
1 text = open('news2.txt').read()
----> 2 doc = nlp(text)
3
4
C:\ProgramData\Anaconda2\lib\site-packages\spacy\language.pyc in __call__(self, text, tag, parse, entity)
223 ('An', 'NN')
224 """
--> 225 tokens = self.tokenizer(text)
226 if self.tagger and tag:
227 self.tagger(tokens)
TypeError: Argument 'string' has incorrect type (expected unicode, got str)
from spacy.lang.en import English
text = open('flow.txt').read()
doc = nlp(text)
TypeError Traceback (most recent call last)
1 text = open('flow.txt').read() # open a document
----> 2 doc = nlp(text) # process it
c:\python27\lib\site-packages\spacy\language.pyc in __call__(self, text, disable)
327 ('An', 'NN')
328 """
--> 329 doc = self.make_doc(text)
330 for name, proc in self.pipeline:
331 if name in disable:
c:\python27\lib\site-packages\spacy\language.pyc in make_doc(self, text)
355
356 def make_doc(self, text):
--> 357 return self.tokenizer(text)
358
359 def update(self, docs, golds, drop=0., sgd=None, losses=None):
TypeError: Argument 'string' has incorrect type (expected unicode, got str)
I still get this error!
If you are working in Python 2, try reading files with codecs.open and setting the encoding to utf-8.
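For context, from __future__ import unicode_literals only changes string literals written in the source file. It does not change what open(...).read() returns, which on Python 2 is still a byte string, and that is why the snippets above keep failing. A minimal sketch of reading a file straight into unicode follows. It uses io.open rather than codecs.open, a deliberate swap: both accept an encoding argument, and io.open also exists on Python 3. The read_text helper and the temporary demo file are assumptions made for illustration.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a file and return its contents as a unicode string."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Demo setup: write a small UTF-8 file so the example is self-contained
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "news2.txt")
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"Caf\u00e9 prices rose.")

text = read_text(path)  # unicode on Python 2, str on Python 3
print(type(text).__name__)
# doc = nlp(text)  # `nlp` would be a loaded spaCy pipeline
```

If the file is not actually UTF-8, the read raises UnicodeDecodeError, which at least surfaces the encoding problem at the point where the data enters the program.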