Xarray: Sum based on start_index and end_index array

Created on 6 Apr 2020 · 5 Comments · Source: pydata/xarray

I have three arrays:

  1. a: input array
  2. sindex: the array containing the start index for summation
  3. eindex: the array containing the end index for summation

MCVE Code Sample

import numpy as np
import xarray as xr
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12]])

# input array
a = xr.DataArray(data, dims=['x', 'y'])
# start_index array
sindex = xr.DataArray(np.array([0, 0, 1, 1]), dims=['x'])
# end_index array
eindex = xr.DataArray(np.array([0, 1, 2, 2]), dims=['x'])
# empty array for saving summation
sum_a = xr.DataArray(np.empty((a.shape[0], 1)), dims=['x', 'y'])

for x in a.x:
    # sum values from sindex to eindex at row x
    sum_a[x] = a[x, sindex[x].values:eindex[x].values+1].sum()

print(sum_a)

Expected Output

<xarray.DataArray (x: 4, y: 1)>
array([[ 1.],
       [ 9.],
       [17.],
       [23.]])
Dimensions without coordinates: x, y

Problem Description

Is it necessary to use xr.apply_ufunc, or is there another good method?
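For reference, here is a minimal sketch of what an xr.apply_ufunc version of the loop might look like. The helper name sum_inclusive is hypothetical, and vectorize=True is used only for convenience: it loops in Python through np.vectorize, so this is not expected to be faster than the explicit loop.

```python
import numpy as np
import xarray as xr

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12]])
a = xr.DataArray(data, dims=["x", "y"])
sindex = xr.DataArray(np.array([0, 0, 1, 1]), dims=["x"])
eindex = xr.DataArray(np.array([0, 1, 2, 2]), dims=["x"])


def sum_inclusive(row, start, stop):
    """Sum row[start..stop], both ends included (hypothetical helper)."""
    return row[start:stop + 1].sum()


# "y" is the core dimension consumed by the helper; the two index arrays
# have no core dimension. vectorize=True applies the helper once per "x".
sum_ufunc = xr.apply_ufunc(
    sum_inclusive,
    a, sindex, eindex,
    input_core_dims=[["y"], [], []],
    vectorize=True,
)
print(sum_ufunc.values)
```

The result has only the x dimension, since the core dimension y is consumed by the helper.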

usage question

All 5 comments

Solution (Boolean)

# stack indexes
index_list = np.column_stack((sindex, eindex))

# all false array
boolean_array = np.zeros(a.shape, dtype=bool)

# iterate and assign true
for row in range(len(index_list)):
    boolean_array[row, np.arange(index_list[row][0], index_list[row][1]+1)] = True

sum_a = a.where(boolean_array).sum(dim='y')
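The row-by-row loop that fills the boolean array could probably be replaced by a single NumPy broadcast comparison. The sketch below assumes plain NumPy arrays; the names s, e and cols are illustrative and do not come from the thread.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12]])
s = np.array([0, 0, 1, 1])  # start index per row
e = np.array([0, 1, 2, 2])  # end index per row (inclusive)

# Column positions, shape (3,), compared against per-row bounds, shape (4, 1).
cols = np.arange(data.shape[1])
boolean_array = (cols >= s[:, None]) & (cols <= e[:, None])  # shape (4, 3)

# Zero out the cells outside each row's range, then sum along the columns.
row_sums = np.where(boolean_array, data, 0).sum(axis=1)
print(row_sums)
```

Filling with 0 instead of NaN keeps the integer dtype of the input in this NumPy-only version.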

DataArray.where is usually the solution for this kind of problem. I added keepdims=True to keep the y dimension after the sum.

yindex = a.y.copy(data=np.arange(a.sizes["y"]))  # generate DataArray of indexes
a.where((yindex >= sindex) & (yindex <= eindex)).sum("y", keepdims=True)

Please close if this answers your question.
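To see why the where approach works, it may help to look at the mask itself and check the result against the original loop. This is a small sketch on the data from the question; yindex is built directly with np.arange here, which is an assumption made for brevity rather than the exact call used above.

```python
import numpy as np
import xarray as xr

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12]])
a = xr.DataArray(data, dims=["x", "y"])
sindex = xr.DataArray(np.array([0, 0, 1, 1]), dims=["x"])
eindex = xr.DataArray(np.array([0, 1, 2, 2]), dims=["x"])

# Positions along "y".
yindex = xr.DataArray(np.arange(a.sizes["y"]), dims=["y"])

# A "y"-only array compared with "x"-only arrays broadcasts by dimension
# name, producing one boolean per (x, y) cell.
mask = ((yindex >= sindex) & (yindex <= eindex)).transpose("x", "y")
print(mask.values)

# Cells outside the range become NaN, which sum() skips by default.
masked_sum = a.where(mask).sum("y")

# Reference: the explicit loop from the question, in plain NumPy.
looped = [int(data[i, s:e + 1].sum())
          for i, (s, e) in enumerate(zip(sindex.values, eindex.values))]
assert masked_sum.values.tolist() == looped
```

Because where fills the unselected cells with NaN, the summed result is floating point even though the input is integer.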

@dcherian Excellent solution!

If we extend this to a 3D array and sum along the z axis, it seems that this method isn't suitable:

import xarray as xr
import numpy as np

x = 2
y = 2
z = 3
data = np.arange(x*y*z).reshape(z, y, x)

# input array
a = xr.DataArray(data, dims=['z', 'y', 'x'])
# start_index array
sindex = xr.DataArray(np.full_like(a[0, ...], 0), dims=['y', 'x'])
# end_index array
eindex = xr.DataArray(np.full_like(a[0, ...], 1), dims=['y', 'x'])

Why not?

@dcherian Sorry for the misunderstanding. I tried again with the 3D array and it works well ;)

import xarray as xr
import numpy as np

# sizes chosen so that the dims (z, y, x) match the printed shape (3, 2, 4)
x = 4
y = 2
z = 3
data = np.arange(x*y*z).reshape(z, y, x)

# input array
a = xr.DataArray(data, dims=['z', 'y', 'x'])
# start_index array
sindex = xr.DataArray(np.full_like(a[0, ...], 0), dims=['y', 'x'])
# end_index array
eindex = xr.DataArray(np.full_like(a[0, ...], 1), dims=['y', 'x'])

zindex = a.z.copy(data=np.arange(a.sizes["z"]))

sub_z = (zindex >= sindex) & (zindex <= eindex)
sum_a = a.where(sub_z).sum('z', keepdims=True)

print(a)
print(sum_a)

Output

<xarray.DataArray (z: 3, y: 2, x: 4)>
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23]]])
Dimensions without coordinates: z, y, x

<xarray.DataArray (z: 1, y: 2, x: 4)>
array([[[ 8., 10., 12., 14.],
        [16., 18., 20., 22.]]])
Dimensions without coordinates: z, y, x
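The 3D example above uses the same start and end index everywhere, so it does not exercise per-cell ranges. The sketch below uses made-up index values that differ for each (y, x) cell and compares the where result with a plain nested loop; the index values are illustrative only.

```python
import numpy as np
import xarray as xr

data = np.arange(24).reshape(3, 2, 4)
a = xr.DataArray(data, dims=["z", "y", "x"])

# Illustrative per-cell start and (inclusive) end positions along "z".
sindex = xr.DataArray(np.array([[0, 1, 2, 0],
                                [1, 0, 0, 2]]), dims=["y", "x"])
eindex = xr.DataArray(np.array([[0, 2, 2, 1],
                                [1, 2, 0, 2]]), dims=["y", "x"])

zindex = xr.DataArray(np.arange(a.sizes["z"]), dims=["z"])

# Broadcasts to (z, y, x): True where z lies inside the cell's range.
sub_z = (zindex >= sindex) & (zindex <= eindex)
sum_a = a.where(sub_z).sum("z")

# Reference: slice each (y, x) column along z explicitly.
expected = np.empty((2, 4))
for j in range(2):
    for i in range(4):
        lo = sindex.values[j, i]
        hi = eindex.values[j, i]
        expected[j, i] = data[lo:hi + 1, j, i].sum()

assert np.array_equal(sum_a.values, expected)
print(sum_a.values)
```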