spaCy: DocBin.to_bytes fails for empty DocBin

Created on 12 Mar 2020 · 3 Comments · Source: explosion/spaCy

How to reproduce the behaviour

The exact example from the docs causes an error:

from spacy.tokens import DocBin

doc_bin = DocBin(attrs=["DEP", "HEAD"])
doc_bin_bytes = doc_bin.to_bytes()

ValueError                                Traceback (most recent call last)
<ipython-input-6-d51ca7c2f6fe> in <module>
----> 1 doc_bin_bytes = doc_bin.to_bytes()

~/anaconda3/envs/insights/lib/python3.7/site-packages/spacy/tokens/_serialize.py in to_bytes(self)
    141         msg = {
    142             "attrs": self.attrs,
--> 143             "tokens": numpy.vstack(self.tokens).tobytes("C"),
    144             "spaces": numpy.vstack(self.spaces).tobytes("C"),
    145             "lengths": numpy.asarray(lengths, dtype="int32").tobytes("C"),

<__array_function__ internals> in vstack(*args, **kwargs)

~/anaconda3/envs/insights/lib/python3.7/site-packages/numpy/core/shape_base.py in vstack(tup)
    281     if not isinstance(arrs, list):
    282         arrs = [arrs]
--> 283     return _nx.concatenate(arrs, 0)
    284 
    285 

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: need at least one array to concatenate

Info about spaCy

  • spaCy version: 2.2.4
  • Platform: Linux-4.15.0-88-generic-x86_64-with-debian-buster-sid
  • Python version: 3.7.6
  • Models: en, fr

numpy==1.18.1

bug feat / doc feat / serialize

All 3 comments

Thanks for the report! PR #5148 should fix the issue and will be available in spaCy 3.0 onwards. Meanwhile, if this is blocking you, you can make the changes to your local install as shown here.

@svlandeg thank you! I have worked around this for now by subclassing the DocBin and overriding to_bytes in a similar way to how you implemented the fix.
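
For anyone hitting the same traceback: the error comes from numpy.vstack being called on an empty list, so a workaround like the one described above needs to guard that call. Below is a minimal NumPy-only sketch of such a guard. The helper name stack_or_empty is hypothetical, and returning an empty array when there is nothing to stack is an assumption made for illustration; it is not necessarily what PR #5148 does.

```python
import numpy


def stack_or_empty(arrays):
    """Hypothetical helper: vstack the arrays, or return an empty array if the list is empty."""
    if arrays:
        return numpy.vstack(arrays)
    # Assumption: an empty array is an acceptable stand-in when nothing was added.
    return numpy.asarray([])


# Reproduce the root cause without spaCy: vstack rejects an empty list.
try:
    numpy.vstack([])
except ValueError as err:
    print("vstack on an empty list failed:", err)

# With the guard, the empty case serializes to an empty byte string instead of raising.
empty_payload = stack_or_empty([]).tobytes("C")

# Non-empty input is stacked row-wise as before.
stacked = stack_or_empty([numpy.zeros((2, 3)), numpy.ones((1, 3))])
```

In a DocBin subclass, an overridden to_bytes could apply a guard like this to self.tokens and self.spaces before calling .tobytes("C"). Whether from_bytes can read the resulting payload back depends on the installed spaCy version, so it is worth checking a round trip before relying on it.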

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
