Spacy: to_array(scalar) has bad error message

Created on 11 Oct 2017 · 5 comments · Source: explosion/spaCy

text.to_array(spacy.attrs.LEMMA)

raises

ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

when it should either work or give a clearer error: it expects a list of attributes, not a single attribute.
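A clearer error could come from a check that runs before the array is built. The sketch below is hypothetical: `check_attr_ids` is not a spaCy function, and `73` is an arbitrary placeholder ID, not spaCy's real value for LEMMA.

```python
import numbers


def check_attr_ids(attr_ids):
    """Hypothetical validator: reject a bare attribute with a readable message."""
    # A lone int (e.g. spacy.attrs.LEMMA) or a lone string is a single
    # attribute, not a sequence of attributes, so fail early and explain why.
    if isinstance(attr_ids, (numbers.Integral, str)):
        raise TypeError(
            "to_array() expects a list of attribute IDs, e.g. [LEMMA], "
            f"not a single value of type {type(attr_ids).__name__}"
        )
    return list(attr_ids)
```

With this check, `check_attr_ids(73)` raises a `TypeError` naming the expected input instead of the low-level buffer error.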

Info about spaCy

  • spaCy version: 1.9.0
  • Python version: 3.5.2
  • Platform: Darwin-16.7.0-x86_64-i386-64bit
  • Installed models: en
Labels: enhancement, help wanted, help wanted (easy), 🌙 nightly

All 5 comments

Agreed, thanks.

I think we should make this work, i.e. LEMMA should be handled as [LEMMA]. We should handle "LEMMA" and "lemma" too.
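One way to accept all of these forms is to normalize the input into a list of integer IDs up front and remember whether a scalar was passed. This is a minimal sketch: `normalize_attr_ids` and the `ATTR_NAMES` table are hypothetical stand-ins, and the IDs in the table are made up (spaCy's real attribute IDs differ).

```python
import numbers

# Hypothetical name -> ID table; the numbers are placeholders, not spaCy's IDs.
ATTR_NAMES = {"ORTH": 1, "LEMMA": 2, "POS": 3}


def normalize_attr_ids(attr_ids):
    """Return (list_of_int_ids, was_scalar) for an int, a str, or a sequence."""

    def to_id(attr):
        if isinstance(attr, str):
            # Case-insensitive lookup, so "LEMMA" and "lemma" both work.
            try:
                return ATTR_NAMES[attr.upper()]
            except KeyError:
                raise ValueError(f"Unknown attribute name: {attr!r}") from None
        if isinstance(attr, numbers.Integral):
            return int(attr)
        raise TypeError(f"Attributes must be ints or strings, got {type(attr).__name__}")

    # A single attribute is treated as a one-element list: LEMMA -> [LEMMA].
    if isinstance(attr_ids, (numbers.Integral, str)):
        return [to_id(attr_ids)], True
    return [to_id(a) for a in attr_ids], False
```

Keeping the `was_scalar` flag lets the caller decide the output shape later without re-inspecting the original argument.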

If you want text.to_array(spacy.attrs.LEMMA) to work, I would return a 1D array rather than a 2D one, for consistency with numpy semantics.
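The numpy behaviour being referred to is that indexing with a scalar drops an axis, while indexing with a one-element list keeps it. By analogy, `to_array(LEMMA)` would give shape `(N,)` and `to_array([LEMMA])` would give `(N, 1)`. This uses plain numpy only:

```python
import numpy as np

table = np.arange(6).reshape(3, 2)  # 3 rows ("tokens"), 2 columns ("attributes")

# Indexing with a scalar drops the column axis...
col = table[:, 0]
print(col.shape)   # (3,)

# ...while indexing with a one-element list keeps it.
cols = table[:, [0]]
print(cols.shape)  # (3, 1)
```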

Does this describe the behaviour you expect? I added the option of an out=None array for further numpy consistency.

def to_array(self, attr_ids, out=None):
    """Export given token attributes to a numpy `ndarray`.

    If `attr_ids` is a sequence of M attributes, the output array will
    be of shape `(N, M)`, where N is the length of the `Doc`
    (in tokens). If `attr_ids` is a single attribute, the output shape will
    be (N,). You can specify attributes by integer ID (e.g. spacy.attrs.LEMMA)
    or string name (e.g. 'LEMMA' or 'lemma').

    By default, a new numpy array of dtype uint64 is created for the output.
    You can instead pass in an array using the `out` keyword argument.
    """

Sounds good

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
