spaCy: TypeError: Unsupported type <class 'numpy.ndarray'>

Created on 16 Jan 2020 · 9 comments · Source: explosion/spaCy

How to reproduce the behaviour

Hi, I am using spaCy 2.2.3 with GPU enabled. When I tried to use the `displacy.render` function, I got an error.
The code I used is below:
```
import spacy
spacy.require_gpu()
import pandas as pd
import re
from bs4 import BeautifulSoup
import random
from spacy.util import minibatch, compounding
from spacy import displacy

spacy.util.use_gpu(0)
df = pd.read_json("data_point_section_dataset.json", lines=True)
print(len(df))
df = df[df["doc_type"] == "doc_type"]
df = df[df["user_role"] == "special"]
print(len(df))
model_path = "spacy_2_2_3"

def populate_train_data(df):
    train_data = []
    for d_index, row in df.iterrows():
        content = row["annotations"].replace("\\n", "\n").replace("\n", " ")
        content = re.sub(r"(?<=[:])(?=[^\s])", r" ", content)
        # Find the tags and entities and store the values in an entity list
        soup = BeautifulSoup(content, "html.parser")
        text = soup.get_text()
        entities = []
        for tag in soup.find_all():
            if tag.string is None:
                # failing silently for invalid tag
                print(f'Tagging is invalid: {row["_id"], tag.name}, skipping..')
                continue
            tag_index = content.split(str(tag))[0].count(tag.string)
            try:
                for index, match in enumerate(re.finditer(re.escape(tag.string), text)):
                    if index == tag_index:
                        entities.append((match.start(), match.end(), tag.name))
            except Exception as e:
                print(e)
                continue
        if entities:
            train_data.append((text, {"entities": entities}))
    return train_data

def _train(train_data):
    nlp = spacy.load("en_core_web_sm")
    if "ner" not in nlp.pipe_names:
        ner = nlp.create_pipe("ner")
        nlp.add_pipe(ner, last=True)
    else:
        ner = nlp.get_pipe("ner")
    for _, annotations in train_data:
        for ent in annotations.get("entities"):
            ner.add_label(ent[2])
    optimizer = nlp.begin_training()
    for i in range(20):
        random.shuffle(train_data)
        correct = 1
        batches = minibatch(train_data)
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer)
    return nlp

def predict(text, expected_dps):
    try:
        nlp = spacy.load(model_path)
    except OSError:
        # ModelNotFoundError comes from the surrounding project code
        raise ModelNotFoundError("Model not found.")
    text = text.replace("\n", " ")
    doc = nlp(text)
    entities = []
    for entity in doc.ents:
        if entity.label_ in expected_dps:
            data = {
                "label": entity.label_,
                "value": entity.text,
                "start_index": entity.start_char,
                "end_index": entity.end_char,
            }
            entities.append(data)
    return entities, doc

def train():
    train_data = populate_train_data(df)
    nlp = _train(train_data)
    nlp.to_disk(model_path)
    expected_dps = ["claim_date_claim_form", "injury_date_claim_form", "start_injury_claim_form", "end_injury_claim_form", "injuries_claim_form", "app_address_claim_form", "injury_report_date_claim_form"]
    dps, dps_for_html = predict("the text i want to predict", expected_dps)
    print(dps)
    return ""

train()

text = "the text i want to predict"  # raw text to visualize
soup = BeautifulSoup(text, "html.parser")
text = soup.get_text()
expected_dps = ["claim_date_claim_form", "injury_date_claim_form", "start_injury_claim_form", "end_injury_claim_form", "injuries_claim_form", "app_address_claim_form", "injury_report_date_claim_form"]
dps, dps_for_html = predict("the text i want to predict", expected_dps)
print("hello")
print(dps)
html = displacy.render(dps_for_html.sents, style="ent")
print(html)
```

but I am getting the following error:

```
Traceback (most recent call last):
  File "train_test.py", line 112, in <module>
    html = displacy.render(dps_for_html.sents, style="ent")
  File "/usr/local/lib/python3.6/site-packages/spacy/displacy/__init__.py", line 46, in render
    docs = [obj if not isinstance(obj, Span) else obj.as_doc() for obj in docs]
  File "/usr/local/lib/python3.6/site-packages/spacy/displacy/__init__.py", line 46, in <listcomp>
    docs = [obj if not isinstance(obj, Span) else obj.as_doc() for obj in docs]
  File "span.pyx", line 232, in spacy.tokens.span.Span.as_doc
  File "span.pyx", line 192, in __iter__
  File "cupy/core/core.pyx", line 948, in cupy.core.core.ndarray.__add__
  File "cupy/core/_kernel.pyx", line 886, in cupy.core._kernel.ufunc.__call__
  File "cupy/core/_kernel.pyx", line 90, in cupy.core._kernel._preprocess_args
TypeError: Unsupported type <class 'numpy.ndarray'>
```

This only happens with GPU enabled; if I remove the `require_gpu()` call, displacy works fine.
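
A possible workaround until the underlying bug is fixed: render the whole `Doc` instead of per-sentence `Span`s, which sidesteps the `Span.as_doc()` call that the traceback points to. A minimal sketch, assuming `dps_for_html` is the `Doc` returned by `predict`:

```
from spacy import displacy

# Rendering the Doc directly avoids Span.as_doc(), which is where the
# CuPy/NumPy type mismatch is raised on GPU.
html = displacy.render(dps_for_html, style="ent")
print(html)
```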

Your Environment

  • spaCy version: 2.2.3
  • Platform: Linux-4.4.0-1100-aws-x86_64-with-debian-stretch-sid
  • Python version: 3.6.0
Labels: bug, feat / visualizers, gpu, more-info-needed

All 9 comments

Hi @pyshahid, thanks for the report!

This certainly looks like an issue with spaCy.

Unfortunately your code snippet could not be run as-is, because it uses an external json file. Is there any chance you could reduce your code to a minimal running script that still exhibits the error? That would really help debugging on our side.

@svlandeg I tried with:

```
import spacy
spacy.require_gpu()
import random
from spacy.util import minibatch, compounding
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
train_data = [("Uber blew through $1 million", {"entities": [(0, 4, "ORG")]})]

other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.begin_training()
    for i in range(10):
        random.shuffle(train_data)
        batches = minibatch(train_data, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            text, annotation = zip(*batch)
            nlp.update(text, annotation, sgd=optimizer)
nlp.to_disk("model")

model = spacy.load("model")

doc = model(train_data[0][0])  # run the pipeline on the raw text, not the (text, annotations) tuple
sentence = list(doc.sents)
displacy.serve(sentence, style="ent")
```

but I wasn't able to replicate the issue with that snippet; with the previously posted code I still get it.

I will give an update once I create a JSON file and try again.
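
To make the original script self-contained, a tiny JSONL fixture with the fields it reads (`annotations`, `doc_type`, `user_role`, `_id`) could be used; all values below are hypothetical placeholders:

```
import json

# Hypothetical one-line JSONL fixture matching the columns the script filters on.
record = {
    "_id": "example-1",
    "doc_type": "doc_type",
    "user_role": "special",
    "annotations": "Claim date: <claim_date_claim_form>16 Jan 2020</claim_date_claim_form>",
}
with open("data_point_section_dataset.json", "w") as f:
    f.write(json.dumps(record) + "\n")
```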

Hi,

I am also seeing this issue.

Running this on Google Colab with a GPU runtime:

```
import spacy
spacy.prefer_gpu()

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
```

produces

```
NameError                                 Traceback (most recent call last)
<ipython-input-1-7caac881dafb> in <module>()
      3 
      4 nlp = spacy.load("en_core_web_sm")
----> 5 doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

20 frames
/usr/local/lib/python3.6/dist-packages/spacy/language.py in __call__(self, text, disable, component_cfg)
    400             if not hasattr(proc, "__call__"):
    401                 raise ValueError(Errors.E003.format(component=type(proc), name=name))
--> 402             doc = proc(doc, **component_cfg.get(name, {}))
    403             if doc is None:
    404                 raise ValueError(Errors.E005.format(name=name))

pipes.pyx in spacy.pipeline.pipes.Tagger.__call__()

pipes.pyx in spacy.pipeline.pipes.Tagger.predict()

/usr/local/lib/python3.6/dist-packages/thinc/neural/_classes/model.py in __call__(self, x)
    167             Must match expected shape
    168         """
--> 169         return self.predict(x)
    170 
    171     def pipe(self, stream, batch_size=128):

/usr/local/lib/python3.6/dist-packages/thinc/neural/_classes/feed_forward.py in predict(self, X)
     38     def predict(self, X):
     39         for layer in self._layers:
---> 40             X = layer(X)
     41         return X
     42 

/usr/local/lib/python3.6/dist-packages/thinc/neural/_classes/model.py in __call__(self, x)
    167             Must match expected shape
    168         """
--> 169         return self.predict(x)
    170 
    171     def pipe(self, stream, batch_size=128):

/usr/local/lib/python3.6/dist-packages/thinc/api.py in predict(seqs_in)
    308     def predict(seqs_in):
    309         lengths = layer.ops.asarray([len(seq) for seq in seqs_in])
--> 310         X = layer(layer.ops.flatten(seqs_in, pad=pad))
    311         return layer.ops.unflatten(X, lengths, pad=pad)
    312 

/usr/local/lib/python3.6/dist-packages/thinc/neural/_classes/model.py in __call__(self, x)
    167             Must match expected shape
    168         """
--> 169         return self.predict(x)
    170 
    171     def pipe(self, stream, batch_size=128):

/usr/local/lib/python3.6/dist-packages/thinc/neural/_classes/feed_forward.py in predict(self, X)
     38     def predict(self, X):
     39         for layer in self._layers:
---> 40             X = layer(X)
     41         return X
     42 

/usr/local/lib/python3.6/dist-packages/thinc/neural/_classes/model.py in __call__(self, x)
    167             Must match expected shape
    168         """
--> 169         return self.predict(x)
    170 
    171     def pipe(self, stream, batch_size=128):

/usr/local/lib/python3.6/dist-packages/thinc/neural/_classes/model.py in predict(self, X)
    131 
    132     def predict(self, X):
--> 133         y, _ = self.begin_update(X, drop=None)
    134         return y
    135 

/usr/local/lib/python3.6/dist-packages/thinc/api.py in uniqued_fwd(X, drop)
    377         )
    378         X_uniq = layer.ops.xp.ascontiguousarray(X[ind])
--> 379         Y_uniq, bp_Y_uniq = layer.begin_update(X_uniq, drop=drop)
    380         Y = Y_uniq[inv].reshape((X.shape[0],) + Y_uniq.shape[1:])
    381 

/usr/local/lib/python3.6/dist-packages/thinc/neural/_classes/feed_forward.py in begin_update(self, X, drop)
     44         callbacks = []
     45         for layer in self._layers:
---> 46             X, inc_layer_grad = layer.begin_update(X, drop=drop)
     47             callbacks.append(inc_layer_grad)
     48 

/usr/local/lib/python3.6/dist-packages/thinc/api.py in begin_update(X, *a, **k)
    161     def begin_update(X, *a, **k):
    162         forward, backward = split_backward(layers)
--> 163         values = [fwd(X, *a, **k) for fwd in forward]
    164 
    165         output = ops.xp.hstack(values)

/usr/local/lib/python3.6/dist-packages/thinc/api.py in <listcomp>(.0)
    161     def begin_update(X, *a, **k):
    162         forward, backward = split_backward(layers)
--> 163         values = [fwd(X, *a, **k) for fwd in forward]
    164 
    165         output = ops.xp.hstack(values)

/usr/local/lib/python3.6/dist-packages/thinc/api.py in wrap(*args, **kwargs)
    254 
    255     def wrap(*args, **kwargs):
--> 256         output = func(*args, **kwargs)
    257         if splitter is None:
    258             to_keep, to_sink = output

/usr/local/lib/python3.6/dist-packages/thinc/api.py in begin_update(X, *a, **k)
    161     def begin_update(X, *a, **k):
    162         forward, backward = split_backward(layers)
--> 163         values = [fwd(X, *a, **k) for fwd in forward]
    164 
    165         output = ops.xp.hstack(values)

/usr/local/lib/python3.6/dist-packages/thinc/api.py in <listcomp>(.0)
    161     def begin_update(X, *a, **k):
    162         forward, backward = split_backward(layers)
--> 163         values = [fwd(X, *a, **k) for fwd in forward]
    164 
    165         output = ops.xp.hstack(values)

/usr/local/lib/python3.6/dist-packages/thinc/api.py in wrap(*args, **kwargs)
    254 
    255     def wrap(*args, **kwargs):
--> 256         output = func(*args, **kwargs)
    257         if splitter is None:
    258             to_keep, to_sink = output

/usr/local/lib/python3.6/dist-packages/thinc/api.py in begin_update(X, *a, **k)
    161     def begin_update(X, *a, **k):
    162         forward, backward = split_backward(layers)
--> 163         values = [fwd(X, *a, **k) for fwd in forward]
    164 
    165         output = ops.xp.hstack(values)

/usr/local/lib/python3.6/dist-packages/thinc/api.py in <listcomp>(.0)
    161     def begin_update(X, *a, **k):
    162         forward, backward = split_backward(layers)
--> 163         values = [fwd(X, *a, **k) for fwd in forward]
    164 
    165         output = ops.xp.hstack(values)

/usr/local/lib/python3.6/dist-packages/thinc/api.py in wrap(*args, **kwargs)
    254 
    255     def wrap(*args, **kwargs):
--> 256         output = func(*args, **kwargs)
    257         if splitter is None:
    258             to_keep, to_sink = output

/usr/local/lib/python3.6/dist-packages/thinc/neural/_classes/hash_embed.py in begin_update(self, ids, drop)
     57         if ids.ndim >= 2:
     58             ids = self.ops.xp.ascontiguousarray(ids[:, self.column], dtype="uint64")
---> 59         keys = self.ops.hash(ids, self.seed) % self.nV
     60         vectors = self.vectors[keys].sum(axis=1)
     61         mask = self.ops.get_dropout_mask((vectors.shape[1],), drop)

ops.pyx in thinc.neural.ops.CupyOps.hash()

NameError: name 'gpu_ops' is not defined
```

Simply removing the line

```
spacy.prefer_gpu()
```

makes the code run perfectly.

Hi,

installing CUDA-enabled spaCy with

```
pip install spacy[cuda100]
```

solved the problem for me. Perhaps this could be made clearer in the documentation. Thanks!
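
For anyone checking their install, `spacy.prefer_gpu()` returns whether the GPU was actually activated, so a quick sanity check looks like this (a minimal sketch):

```
import spacy

# True only if the CUDA extras (CuPy) are installed and a GPU is visible.
print(spacy.prefer_gpu())
```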

Good to hear you got your issue resolved @josesho, but it does look like the original post may be related to something else.

@pyshahid: can you show the commands and output you used for installing spaCy? And can you provide a minimal running script that exhibits your error?

This issue has been automatically closed because there has been no response to a request for more information from the original author. With only the information that is currently in the issue, there's not enough information to take action. If you're the original author, feel free to reopen the issue if you have or find the answers needed to investigate further.

Hi,

Sorry @svlandeg, new error spotted! It seems that when there are out-of-vocabulary words, a TypeError is thrown?

```
import spacy
spacy.prefer_gpu()
nlp_core = spacy.load("en_core_web_lg")

text1 = "The effect of anxiogenic treatments on three rodent models of anxiety: \
         the open field test, the elevated plus-maze, and the light-dark box."

doc1 = nlp_core(text1)

doc1.vector
```

produces

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-531ef58ab65a> in <module>()
      1 doc1 = nlp_core(text1)
      2 
----> 3 doc1.vector

doc.pyx in __iter__()

cupy/core/core.pyx in cupy.core.core.ndarray.__add__()

cupy/core/_kernel.pyx in cupy.core._kernel.ufunc.__call__()

cupy/core/_kernel.pyx in cupy.core._kernel._preprocess_args()

TypeError: Unsupported type <class 'numpy.ndarray'>
```

Relevant package versions:

```
print(spacy.__version__)  # 2.2.3
print(cupy.__version__)   # 7.2.0
```
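
To confirm that out-of-vocabulary tokens are the trigger, one could inspect the standard `Token.is_oov` / `Token.has_vector` attributes; a sketch reusing `doc1` from above (run with the model on CPU if iteration also trips the GPU bug):

```
# Tokens with no vector in en_core_web_lg are the likely culprits
# (e.g. "anxiogenic" or "plus-maze").
print([t.text for t in doc1 if t.is_oov or not t.has_vector])
```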

Hi, this problem has been fixed (#4680) and should be available soon in version 2.2.4.
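
Once 2.2.4 is released, upgrading the CUDA build (e.g. `pip install -U spacy[cuda100]`) should pick up the fix; a quick check of the installed version (a minimal sketch):

```
import spacy

# The fix referenced above (#4680) is slated for 2.2.4.
print(spacy.__version__)
```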

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
