The exact example from the docs causes an error:
from spacy.tokens import DocBin
doc_bin = DocBin(attrs=["DEP", "HEAD"])
doc_bin_bytes = doc_bin.to_bytes()
ValueError Traceback (most recent call last)
<ipython-input-6-d51ca7c2f6fe> in <module>
----> 1 doc_bin_bytes = doc_bin.to_bytes()
~/anaconda3/envs/insights/lib/python3.7/site-packages/spacy/tokens/_serialize.py in to_bytes(self)
141 msg = {
142 "attrs": self.attrs,
--> 143 "tokens": numpy.vstack(self.tokens).tobytes("C"),
144 "spaces": numpy.vstack(self.spaces).tobytes("C"),
145 "lengths": numpy.asarray(lengths, dtype="int32").tobytes("C"),
<__array_function__ internals> in vstack(*args, **kwargs)
~/anaconda3/envs/insights/lib/python3.7/site-packages/numpy/core/shape_base.py in vstack(tup)
281 if not isinstance(arrs, list):
282 arrs = [arrs]
--> 283 return _nx.concatenate(arrs, 0)
284
285
<__array_function__ internals> in concatenate(*args, **kwargs)
ValueError: need at least one array to concatenate
numpy==1.18.1
Thanks for the report! PR #5148 should fix the issue and will be available in spaCy 3.0 onwards. Meanwhile, if this is blocking you, you can make the changes to your local install as shown here.
@svlandeg thank you! I have worked around this for now by subclassing the DocBin and overriding to_bytes in a similar way to how you implemented the fix.
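For readers hitting the same traceback: the failure comes from calling numpy.vstack on an empty list, which is what to_bytes does when no docs have been added to the DocBin. Below is a minimal sketch of the kind of guard an overridden to_bytes could apply before stacking. It is not the actual patch from PR #5148; stack_or_empty, n_cols and dtype are hypothetical names chosen for illustration, and the example uses only NumPy so it runs without spaCy.

```python
import numpy


def stack_or_empty(arrays, n_cols, dtype):
    """Stack 2D arrays row-wise; return a (0, n_cols) array if the list is empty.

    Hypothetical helper for illustration, not part of spaCy or NumPy.
    """
    if len(arrays) == 0:
        # numpy.vstack([]) raises "need at least one array to concatenate",
        # so build the empty result explicitly instead.
        return numpy.zeros((0, n_cols), dtype=dtype)
    return numpy.vstack(arrays)


# Empty input: no exception, and the serialized payload is empty bytes.
empty = stack_or_empty([], n_cols=2, dtype="uint64")
empty_bytes = empty.tobytes("C")

# Non-empty input behaves like a plain numpy.vstack call.
rows = [numpy.zeros((3, 2), dtype="uint64"), numpy.ones((1, 2), dtype="uint64")]
stacked = stack_or_empty(rows, n_cols=2, dtype="uint64")
```

In a DocBin subclass, the same check would sit in front of the two numpy.vstack calls shown in the traceback (self.tokens and self.spaces), assuming those attributes are plain lists as the traceback suggests.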