spaCy: to_array(scalar) has bad error message

Created on 11 Oct 2017 · 5 Comments · Source: explosion/spaCy

text.to_array(spacy.attrs.LEMMA)

raises

ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

when it should either work or raise a clearer error: to_array expects a list of attributes, not a single attribute.

Info about spaCy

spaCy version: 1.9.0
Python version: 3.5.2
Platform: Darwin-16.7.0-x86_64-i386-64bit
Installed models: en

Labels: enhancement, help wanted, help wanted (easy), 🌙 nightly

jnothman

All 5 comments

Agreed, thanks.

I think we should make this work, i.e. LEMMA should be handled as [LEMMA]. We should handle "LEMMA" and "lemma" too.

honnibal on 11 Oct 2017
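
The normalization suggested in the comment above (treat a single attribute as a one-element list, and accept names in either case) could look roughly like the sketch below. It is only an illustration: `ATTR_IDS` and `normalize_attr_ids` are hypothetical names, and the numeric IDs are made up rather than taken from `spacy.attrs`.

```python
# Hypothetical stand-in for spaCy's name-to-ID table; the numbers are
# illustrative only and are NOT spaCy's real attribute IDs.
ATTR_IDS = {"ORTH": 1, "LEMMA": 2, "POS": 3}


def normalize_attr_ids(attr_ids):
    """Return (list_of_int_ids, was_scalar) for an attribute spec.

    Accepts a single int ID, a single name ('LEMMA' or 'lemma'),
    or a sequence mixing both.
    """
    was_scalar = isinstance(attr_ids, (int, str))
    specs = [attr_ids] if was_scalar else list(attr_ids)
    ids = []
    for spec in specs:
        if isinstance(spec, str):
            name = spec.upper()  # makes 'lemma' and 'LEMMA' equivalent
            if name not in ATTR_IDS:
                raise ValueError("Unknown attribute name: %r" % (spec,))
            ids.append(ATTR_IDS[name])
        elif isinstance(spec, int):
            ids.append(spec)
        else:
            raise TypeError("Attributes must be int IDs or string names")
    return ids, was_scalar
```

Keeping the `was_scalar` flag separate from the normalized list lets the caller decide the output shape later without guessing whether a one-element list was originally a scalar.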

If you want text.to_array(spacy.attrs.LEMMA) to work, I would return a 1d array rather than 2d for consistency with numpy semantics.

jnothman on 16 Oct 2017

Does this describe the behaviour you expect? I added the option of an out=None array for further numpy consistency.

def to_array(self, attr_ids, out=None):
    """Export given token attributes to a numpy `ndarray`.

    If `attr_ids` is a sequence of M attributes, the output array will
    be of shape `(N, M)`, where N is the length of the `Doc`
    (in tokens). If `attr_ids` is a single attribute, the output shape will
    be (N,). You can specify attributes by integer ID (e.g. spacy.attrs.LEMMA)
    or string name (e.g. 'LEMMA' or 'lemma').

    By default, a new numpy array of dtype uint64 is created for the output.
    You can instead pass in an array using the `out` keyword argument.
    """

honnibal on 16 Oct 2017
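
To make the shape rules in the proposed docstring concrete, here is a minimal sketch that does not use spaCy at all. It assumes a toy document represented as a list of dicts mapping integer attribute IDs to integer values; `to_array_sketch` and that representation are hypothetical, and string attribute names are left out for brevity.

```python
import numpy as np


def to_array_sketch(token_rows, attr_ids, out=None):
    """Toy version of the proposed semantics (not spaCy's implementation).

    token_rows: list of dicts {int attr ID: int value}, one per token.
    attr_ids:   a single int ID -> output shape (N,)
                a sequence of M int IDs -> output shape (N, M)
    out:        optional preallocated array to fill and return.
    """
    scalar = isinstance(attr_ids, int)
    ids = [attr_ids] if scalar else list(attr_ids)
    n = len(token_rows)
    shape = (n,) if scalar else (n, len(ids))
    if out is None:
        out = np.zeros(shape, dtype=np.uint64)
    elif out.shape != shape:
        raise ValueError("out has shape %r, expected %r" % (out.shape, shape))
    for i, row in enumerate(token_rows):
        if scalar:
            out[i] = row[ids[0]]
        else:
            for j, attr in enumerate(ids):
                out[i, j] = row[attr]
    return out


tokens = [{1: 10, 2: 20}, {1: 11, 2: 21}]
one_d = to_array_sketch(tokens, 2)        # single attribute
two_d = to_array_sketch(tokens, [1, 2])   # list of attributes
```

With this toy input, `one_d` has shape `(2,)` and `two_d` has shape `(2, 2)`, mirroring how indexing a numpy array with a scalar drops a dimension while indexing with a list keeps it.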

Sounds good

jnothman on 16 Oct 2017
