text.to_array(spacy.attrs.LEMMA)
raises
ValueError: Buffer has wrong number of dimensions (expected 1, got 0)
when it should either work or give a clearer error: to_array currently expects a list of attributes, not a single attribute.
Agreed, thanks.
I think we should make this work, i.e. LEMMA should be handled as [LEMMA]. We should handle "LEMMA" and "lemma" too.
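A minimal sketch of the normalisation step this would need, assuming a name-to-ID lookup table. The `ATTR_IDS` table, its values and the `normalize_attr_ids` helper are hypothetical and only for illustration; they are not spaCy's actual IDs or API.

```python
# Hypothetical name -> ID table; the numbers are made up, not spaCy's real IDs.
ATTR_IDS = {"ORTH": 65, "LEMMA": 73, "POS": 74}


def normalize_attr_ids(attr_ids):
    """Return (ids, is_single) for an int, a string name, or a sequence of either."""

    def to_id(attr):
        # Accept 'LEMMA' and 'lemma' by upper-casing before the lookup.
        if isinstance(attr, str):
            try:
                return ATTR_IDS[attr.upper()]
            except KeyError:
                raise ValueError(f"Unknown attribute name: {attr!r}") from None
        return int(attr)

    # A bare int or string is treated as a single attribute, i.e. [attr].
    if isinstance(attr_ids, (str, int)):
        return [to_id(attr_ids)], True
    return [to_id(attr) for attr in attr_ids], False
```

The `is_single` flag is kept so the caller can still decide what shape to return for a single attribute.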
If you want text.to_array(spacy.attrs.LEMMA) to work, I would return a 1d array rather than 2d for consistency with numpy semantics.
Does this describe the behaviour you expect? I added the option of an out=None array for further numpy consistency.
def to_array(self, attr_ids, out=None):
    """Export given token attributes to a numpy `ndarray`.

    If `attr_ids` is a sequence of M attributes, the output array will
    be of shape `(N, M)`, where N is the length of the `Doc`
    (in tokens). If `attr_ids` is a single attribute, the output shape will
    be `(N,)`. You can specify attributes by integer ID (e.g. spacy.attrs.LEMMA)
    or string name (e.g. 'LEMMA' or 'lemma').

    By default, a new numpy array of dtype uint64 is created for the output.
    You can instead pass in an array using the `out` keyword argument.
    """
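To make the shape rule concrete, here is a rough standalone sketch using plain numpy. It is an assumption about how the proposed behaviour could look, not the actual spaCy implementation: `to_array_sketch` is a hypothetical function, tokens are modelled as dicts mapping integer attribute IDs to integer values, and string names are left out for brevity.

```python
import numpy as np


def to_array_sketch(tokens, attr_ids, out=None):
    """Hypothetical stand-in for the proposed to_array; not spaCy code.

    tokens: list of N dicts mapping attribute ID (int) -> value (int).
    attr_ids: a single int, or a sequence of M ints.
    """
    single = isinstance(attr_ids, int)
    ids = [attr_ids] if single else list(attr_ids)
    # Single attribute -> (N,), sequence of M attributes -> (N, M).
    shape = (len(tokens),) if single else (len(tokens), len(ids))
    if out is None:
        out = np.zeros(shape, dtype=np.uint64)
    elif out.shape != shape:
        raise ValueError(f"out has shape {out.shape}, expected {shape}")
    # Add a trailing axis in the 1d case so one loop fills both layouts.
    view = out[:, np.newaxis] if single else out
    for i, token in enumerate(tokens):
        for j, attr_id in enumerate(ids):
            view[i, j] = token[attr_id]
    return out


tokens = [{1: 10, 2: 20}, {1: 11, 2: 21}, {1: 12, 2: 22}]
one_attr = to_array_sketch(tokens, 2)        # 1d result
two_attrs = to_array_sketch(tokens, [1, 2])  # 2d result
```

Passing `[2]` rather than `2` would still give a 2d array of shape `(N, 1)`, which matches how numpy distinguishes a scalar index from a one-element list.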
Sounds good