Spacy: error in nlp(text)

Created on 15 May 2016 · 5 Comments · Source: explosion/spaCy

doc = nlp(text)
Traceback (most recent call last):

File "", line 1, in
doc = nlp(text)

File "/Applications/anaconda/lib/python2.7/site-packages/spacy/language.py", line 225, in __call__
tokens = self.tokenizer(text)

TypeError: Argument 'string' has incorrect type (expected unicode, got str)

Tracy2014

Most helpful comment

spaCy expects unicode strings, not byte strings. We enforce this in the interface, because otherwise you'll get problems down the line on some minority of strings.

You can paper over the issue by doing something like doc = nlp(text.decode('utf8')), but this will likely cause more bugs in the future.

The correct solution is to make sure you have from __future__ import unicode_literals at the top of all your modules, and to make sure you read strings into unicode at the source.

honnibal on 15 May 2016
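
A minimal sketch of reading text into unicode at the source, as the comment above recommends. It assumes the file is UTF-8 encoded, and read_text is a hypothetical helper name, not part of spaCy. io.open decodes while reading on both Python 2 and Python 3, whereas the built-in open on Python 2 returns a byte string.

```python
import io


def read_text(path, encoding="utf8"):
    """Return the contents of `path` as a unicode string.

    Hypothetical helper; the UTF-8 default is an assumption about the file.
    """
    # io.open decodes bytes to unicode while reading, on Python 2 and 3 alike.
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()
```

The returned value can be passed straight to the pipeline, for example doc = nlp(read_text(path)).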

👍4

All 5 comments

I am still having this issue:

from __future__ import unicode_literals
import spacy

nlp = spacy.load('en')
text = open('news2.txt').read()
doc = nlp(text)

TypeError                                 Traceback (most recent call last)
<ipython-input-22-f03b20ead295> in <module>()
      1 text = open('news2.txt').read()
----> 2 doc = nlp(text)
      3 
      4 

C:\ProgramData\Anaconda2\lib\site-packages\spacy\language.pyc in __call__(self, text, tag, parse, entity)
    223         ('An', 'NN')
    224         """
--> 225         tokens = self.tokenizer(text)
    226         if self.tagger and tag:
    227             self.tagger(tokens)

TypeError: Argument 'string' has incorrect type (expected unicode, got str)

shashwattrivedi on 17 Jan 2018
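
The snippet above still fails because unicode_literals only changes string literals written in the source file. On Python 2, open('news2.txt').read() still returns a byte string (str). The sketch below shows a defensive conversion to apply before calling nlp. ensure_unicode is a hypothetical helper, and the UTF-8 encoding is an assumption about the input.

```python
def ensure_unicode(text, encoding="utf8"):
    """Decode byte strings to unicode; return unicode strings unchanged."""
    # On Python 2, `bytes` is an alias for `str`, so this also catches the
    # value returned by the built-in open(...).read().
    if isinstance(text, bytes):
        return text.decode(encoding)
    return text
```

With this, doc = nlp(ensure_unicode(text)) accepts either kind of string, although decoding when the file is read remains the cleaner fix.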

from spacy.lang.en import English
text = open('flow.txt').read()
doc = nlp(text)

TypeError                                 Traceback (most recent call last)
in ()
      1 text = open('flow.txt').read() # open a document
----> 2 doc = nlp(text) # process it

c:\python27\lib\site-packages\spacy\language.pyc in __call__(self, text, disable)
    327         ('An', 'NN')
    328         """
--> 329         doc = self.make_doc(text)
    330         for name, proc in self.pipeline:
    331             if name in disable:

c:\python27\lib\site-packages\spacy\language.pyc in make_doc(self, text)
    355
    356     def make_doc(self, text):
--> 357         return self.tokenizer(text)
    358
    359     def update(self, docs, golds, drop=0., sgd=None, losses=None):

TypeError: Argument 'string' has incorrect type (expected unicode, got str)

I still get this error!