As discussed in issue #5303, currently it is not possible to create arrays of object dtype containing equal-length sequences, since the sequence is automatically read in as array elements. There is a suggestion to only do this for lists, but this would be a major backwards compatibility break and would require a long deprecation period.
Another approach would be to have a function explicitly for creating arrays with an object dtype. Perhaps this could be called "objectarray". The default for this function would be to take in a sequence, and consider each element of the sequence as an element in a 1D object array.
The function, however, could have an optional "ndim" or "depth" argument specifying how many levels of the sequence should be considered part of the array. This would default to 0 (only the outermost level is considered), and the function would raise an exception if the dimensions at the requested depth don't match.
Note that this approach is not mutually exclusive with the alternative, but has the advantage that it wouldn't break backwards-compatibility.
So for example:
>>> arr = objectarray([((1, 2, 3), (4, 5, 6)), ((7, 8, 9), (10, 11, 12))])
>>> arr
array([((1, 2, 3), (4, 5, 6)), ((7, 8, 9), (10, 11, 12))], dtype=object)
>>> arr.shape
(2,)
>>> arr = objectarray([((1, 2, 3), (4, 5, 6)), ((7, 8, 9), (10, 11, 12))], depth=1)
>>> arr
array([[(1, 2, 3), (4, 5, 6)],
       [(7, 8, 9), (10, 11, 12)]], dtype=object)
>>> arr.shape
(2, 2)
>>> arr = objectarray([((1, 2, 3), (4, 5, 6)), ((7, 8, 9), (10, 11, 12))], depth=2)
>>> arr
array([[[1, 2, 3],
        [4, 5, 6]],

       [[7, 8, 9],
        [10, 11, 12]]], dtype=object)
>>> arr.shape
(2, 2, 3)
I think the easiest way to get equal sized lists into an object array is in two steps:
>>> a = np.empty((2,), dtype=object)
>>> a[:] = [[1, 2, 3], [4, 5, 6]]
>>> b = np.empty((2, 3), dtype=object)
>>> b[:] = [[1, 2, 3], [4, 5, 6]]
Presumably an implementation of objectarray would work like this.
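A minimal sketch of such an `objectarray` along those lines (the function name and the `depth` keyword are hypothetical, taken from the proposal above; this is one possible implementation, not anything NumPy provides):

```python
import numpy as np

def objectarray(seq, depth=0):
    """Build an object array from ``seq``, treating only the outermost
    ``depth`` + 1 nesting levels as array dimensions; everything below
    that is stored as an opaque Python object."""
    # Infer the shape from the first depth + 1 nesting levels.
    shape = []
    probe = seq
    for _ in range(depth + 1):
        shape.append(len(probe))
        probe = probe[0]
    arr = np.empty(tuple(shape), dtype=object)
    # Assign cell by cell; scalar item assignment into an object array
    # never triggers NumPy's automatic sequence-to-array conversion.
    # Ragged input at the requested depth surfaces as an IndexError.
    for idx in np.ndindex(arr.shape):
        elem = seq
        for i in idx:
            elem = elem[i]
        arr[idx] = elem
    return arr
```

With the input from the examples above, `depth=0` yields shape `(2,)`, `depth=1` yields `(2, 2)`, and `depth=2` yields `(2, 2, 3)`.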
Yes, that is currently the best way, but it is needlessly verbose. Hence this idea.
I would hope that an implementation of this idea would simply be able to bypass the automatic conversion used in the array function and substitute its own before calling the ndarray constructor.
Hope I'm not missing anything, but it seems to me that an ndmax argument would not only solve the problem reported ("_create arrays of object dtype containing equal-length sequences_"), but would also bring performance gains in those cases where, e.g., the last object in the input is not a list (or is a list with a different length). Also see this question.
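To illustrate the late-discovery point (a small demonstration, not from the thread): whether the result is 1-D or 2-D can hinge on the very last element, so NumPy must inspect the entire input before deciding, which a depth cap like the proposed ndmax could avoid.

```python
import numpy as np

# With equal-length rows, NumPy converts all the way down:
equal = np.array([[1, 2], [3, 4]], dtype=object)
print(equal.shape)   # (2, 2)

# Make only the last row ragged and the result is suddenly 1-D --
# something NumPy can only discover after walking every element:
ragged = np.array([[1, 2], [3, 4, 5]], dtype=object)
print(ragged.shape)  # (2,)
```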
Any progress on, or plans to implement, ndmax? What I'm doing right now:
np.array([*data, None])[:-1]
# This would look a lot cleaner:
np.array(data, ndmax=1)