Xarray: Creation of an empty DataArray

Created on 10 Nov 2014 · 11 Comments · Source: pydata/xarray

I'd like to create an empty DataArray, i.e., one with only NA values. The docstring of DataArray says that data=None is allowed, if a _dataset_ argument is provided. However, the docstring doesn't say anything about a _dataset_ argument.

  1. I think there's a bug in the docstring
  2. I'd like to pass _data=None_ and get a DataArray with the coords/dims set up properly (as defined by the coords and dims kwargs), but with a values array of NA.
enhancement

All 11 comments

This is definitely a case where the documentation has gotten out of sync with the implementation (I used to abuse a dataset argument in DataArray.__init__ for a fastpath constructor, but now I have another method for that). The data argument is not really optional right now (unless you want a scalar DataArray containing the value None).

Your proposed functionality does sound useful. Do you want to take a crack at the implementation? The init logic for DataArray is pretty self contained, and it's all at the top of xray/core/dataarray.py. It will require a small amount of reorganization because _infer_coords_and_dims currently assumes it already knows the shape of the data.
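Until something like this lands in the constructor, the shape can be inferred by hand from the coordinates. The following is only a sketch of that idea: `nan_dataarray` is a hypothetical helper name, not part of the library, and it assumes `coords` is a dict of 1-D coordinates keyed by dimension name.

```python
import numpy as np
import xarray as xr


def nan_dataarray(coords, dims):
    """Hypothetical helper: build a NaN-filled DataArray from coords and dims."""
    # Infer the shape from the length of each coordinate, in dims order.
    shape = tuple(len(coords[dim]) for dim in dims)
    data = np.full(shape, np.nan)
    return xr.DataArray(data, coords=coords, dims=dims)


arr = nan_dataarray({"x": [0, 1, 2], "y": [10.0, 20.0]}, dims=("x", "y"))
print(arr.shape)
```

The NaN fill forces a float dtype; an integer array would need a different sentinel value.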

Fixed the doc string for now, but this would still be a nice feature to add at some point.

Given the dask integration, being able to initialize DataArrays that are chunked would be very helpful. I want to map from an old x-y-z grid to a new one, and theoretically it could be too memory intensive to keep the new grid in memory, so it would be nice to initialize an empty one and then fill it.

Dask Array recently added support for modifying arrays in place but generally this isn't recommended -- you want to create new arrays, e.g., with dask.array.atop or map_blocks.
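As a rough illustration of that advice, one way to get a chunked "empty" array is to describe it lazily and derive a new array from it rather than writing into it. This is a sketch with made-up sizes and dimension names; `fill_block` is a placeholder standing in for whatever per-block computation is actually needed.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Lazily describe a NaN-filled target grid; no chunk is allocated yet.
target = da.full((4, 100, 100), np.nan, chunks=(1, 100, 100))


def fill_block(block):
    # Placeholder computation: return a new block of the same shape.
    return np.zeros_like(block) + 1.0


# map_blocks builds a new lazy array instead of modifying `target` in place.
filled = target.map_blocks(fill_block, dtype=target.dtype)

# Wrap the lazy result in a DataArray; the data stays chunked.
arr = xr.DataArray(filled, dims=("z", "y", "x"))

# Only the chunk that is requested gets computed.
first_level = arr.isel(z=0).values
```

With one chunk per vertical level, only a single level has to be in memory at a time when the result is computed or written out piecewise.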

OK, great! I figured it out. Something like the below works; @rabernat had pointed to a similar solution, but I didn't quite understand what dask.array.map_blocks was doing.

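import numpy as np
import scipy.interpolate
import xmitgcm
import xarray as xr
data = xmitgcm.open_mdsdataset(dirname='./',prefix=['T'],iters=12600,read_grid=True,geometry='cartesian',endian='<',
                               chunks={'Z':1,'time':1})

def interpolateAtDepth(T,x0,y0,x,y):
    # Interpolate one (time, Z) block of T from the old grid (x0, y0) to the new grid (x, y).
    if np.shape(T)[-1]>1:
        fit=scipy.interpolate.RectBivariateSpline(x0,y0,T[0,0,:,:].T)
        # fit(x,y) has shape (len(x), len(y)); transpose it and restore the two
        # leading axes so the block matches the declared (1,1,ny,nx) chunks.
        xout=fit(x,y).T[np.newaxis,np.newaxis,:,:]
    else:
        # Fallback for blocks with a single point along the last axis.
        xout=np.ones((1,1,1,1))
    return xout

# x, y, nx, ny are determined elsewhere, but set the new grid...
tm = data['T'].data.map_blocks(interpolateAtDepth,data['XC'].values,data['YC'].values,x,y,chunks=(1,1,ny,nx))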

In an effort to reduce the issue backlog, I'll close this, but please reopen if you disagree

This seems too fundamental a feature to close unresolved. I am sure others will encounter the same need and will create duplicate issues.

Hi, I am also looking for a way to create an "empty" DataArray (filled with some default value, say, 0 or NaN) whose size is determined automatically by its coordinates (which are passed to the constructor as a dict). Has there been any progress since the original post by andreas-h?

This hasn't been implemented yet (but would still be welcome!)

It might also make sense then to implement all numpy-like constructors for DataArray, plus the empty(), which is typically faster for larger arrays:

  • .full() (kind of what's suggested here)
  • .ones()
  • .zeros()
  • .empty()

This should be trivial to implement.
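For comparison, xarray does ship template-based versions of some of these: `xr.zeros_like`, `xr.ones_like` and `xr.full_like`. They copy dims and coords from an existing object, so they do not cover the "from coords only" case discussed here. A minimal sketch, with a small made-up template array:

```python
import numpy as np
import xarray as xr

# A small template whose dims and coords the new arrays will share.
template = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    coords={"x": [0, 1], "y": [10, 20, 30]},
    dims=("x", "y"),
)

zeros = xr.zeros_like(template)           # same dims/coords, all 0.0
ones = xr.ones_like(template)             # same dims/coords, all 1.0
missing = xr.full_like(template, np.nan)  # same dims/coords, all NaN
```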

Should have been closed by #3159
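For readers landing here later, the usage below is a sketch that assumes an xarray release containing that change, where a scalar passed as data is broadcast to the shape implied by coords and dims. The coordinate names and values are made up for illustration.

```python
import numpy as np
import xarray as xr

coords = {"x": [0, 1, 2], "y": [10.0, 20.0]}

# A scalar data value is expanded to the shape implied by coords and dims.
zeros = xr.DataArray(0.0, coords=coords, dims=("x", "y"))
missing = xr.DataArray(np.nan, coords=coords, dims=("x", "y"))

print(zeros.shape, missing.shape)
```

On older releases that predate this behaviour, building the array explicitly with np.full and passing it as data should give the same result.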
