This would be a nice feature to have for some use cases, e.g., for writing simulation time-steps:
https://stackoverflow.com/questions/46951981/create-and-write-xarray-dataarray-to-netcdf-in-chunks
It should be relatively straightforward to add, too, building on the existing support for writing files with unlimited dimensions. The user-facing API would probably be a new keyword argument to to_netcdf(), e.g., extend='time' to indicate the dimension to extend.
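For illustration, a rough sketch of what that could look like from the user side; the extend keyword is hypothetical (it does not exist in xarray today), and the file and variable names are made up:

import numpy as np
import xarray as xr

ds0 = xr.Dataset({'t2m': ('time', np.zeros(3))}, coords={'time': [0, 1, 2]})
ds1 = xr.Dataset({'t2m': ('time', np.ones(2))}, coords={'time': [3, 4]})

# Write the initial file, marking 'time' as unlimited so it can grow later.
ds0.to_netcdf('simulation.nc', mode='w', unlimited_dims=['time'])

# Hypothetical keyword: extend the existing 'time' dimension in place
# instead of overwriting the file.
ds1.to_netcdf('simulation.nc', extend='time')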
Any updates on this?
None that I'm aware of. I think this issue is still in the "help wanted" stage.
I would love to have this capability. As @shoyer mentioned, being able to add time steps of any sort to existing netCDF files would be really beneficial. The only real alternative is to save a netCDF file for each additional time step (sketched below), even if there are tons of time steps and each file is only a couple hundred KB (which is my situation with NASA data).
I'll look into it if I get some time...
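For context, a minimal sketch of that one-file-per-time-step workaround, recombined afterwards with open_mfdataset (file names, variable names, and sizes are just illustrative; open_mfdataset requires dask):

import glob
import numpy as np
import xarray as xr

# Workaround today: write one small netCDF file per time step...
for step in range(10):
    ds_step = xr.Dataset(
        {'t2m': (('time', 'x'), np.random.rand(1, 100))},
        coords={'time': [step], 'x': np.arange(100)},
    )
    ds_step.to_netcdf(f'step_{step:04d}.nc')

# ...and recombine them along 'time' later.
combined = xr.open_mfdataset(
    sorted(glob.glob('step_*.nc')), combine='nested', concat_dim='time'
)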
This would be extremely helpful for our modelling of time varying renewable energy.
I think I got a basic prototype working.
That said, I think a real challenge lies in supporting the numerous backends and lazy arrays.
For example, I was only able to add data in a somewhat peculiar fashion using the netCDF4 library, which may trigger expensive computations multiple times.
Is this a use case that we must optimize for now?
Here is a small prototype; maybe it can help kick-start the development.
import netCDF4
import xarray as xr


def _expand_variable(nc_variable, data, expanding_dim, nc_shape, added_size):
    # For time deltas, we must ensure that we use the same encoding as
    # what was previously stored. We likely need to do this as well for
    # variables that had custom encodings.
    if hasattr(nc_variable, 'calendar'):
        data.encoding = {
            'units': nc_variable.units,
            'calendar': nc_variable.calendar,
        }
    data_encoded = xr.conventions.encode_cf_variable(data)

    # Build a slice that targets only the newly added region along the
    # expanding dimension, leaving all other dimensions untouched.
    left_slices = data.dims.index(expanding_dim)
    right_slices = data.ndim - left_slices - 1
    nc_slice = (
        (slice(None),) * left_slices
        + (slice(nc_shape, nc_shape + added_size),)
        + (slice(None),) * right_slices
    )
    nc_variable[nc_slice] = data_encoded.data


def append_to_netcdf(filename, ds_to_append, unlimited_dims):
    if isinstance(unlimited_dims, str):
        unlimited_dims = [unlimited_dims]

    if len(unlimited_dims) != 1:
        # TODO: change this so it can support multiple expanding dims
        raise ValueError(
            "We only support one unlimited dim for now, "
            f"got {len(unlimited_dims)}.")

    unlimited_dims = list(set(unlimited_dims))
    expanding_dim = unlimited_dims[0]

    with netCDF4.Dataset(filename, mode='a') as nc:
        nc_coord = nc[expanding_dim]
        nc_shape = len(nc_coord)

        added_size = len(ds_to_append[expanding_dim])
        variables, attrs = xr.conventions.encode_dataset_coordinates(ds_to_append)

        for name, data in variables.items():
            if expanding_dim not in data.dims:
                # Nothing to do; the data is assumed to be identical.
                continue
            nc_variable = nc[name]
            _expand_variable(nc_variable, data, expanding_dim, nc_shape, added_size)
from xarray.tests.test_dataset import create_append_test_data
from xarray.testing import assert_equal
ds, ds_to_append, ds_with_new_var = create_append_test_data()
filename = 'test_dataset.nc'
ds.to_netcdf(filename, mode='w', unlimited_dims=['time'])
append_to_netcdf('test_dataset.nc', ds_to_append, unlimited_dims='time')
loaded = xr.load_dataset('test_dataset.nc')
assert_equal(xr.concat([ds, ds_to_append], dim="time"), loaded)
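To tie this back to the original simulation use case, here is a hedged sketch of driving the append_to_netcdf prototype from a time-stepping loop; the make_step helper, the variable names, and the file name are made up for illustration:

import numpy as np
import xarray as xr

def make_step(step):
    # Stand-in for one simulation time step: a one-element slab along 'time'.
    return xr.Dataset(
        {'field': (('time', 'x'), np.random.rand(1, 100))},
        coords={'time': [step], 'x': np.arange(100)},
    )

filename = 'simulation.nc'

# Write the first step normally, marking 'time' as unlimited.
make_step(0).to_netcdf(filename, mode='w', unlimited_dims=['time'])

# Append the remaining steps in place instead of writing one file per step.
for step in range(1, 100):
    append_to_netcdf(filename, make_step(step), unlimited_dims='time')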
Hi - I consider this extremely useful!
Is your prototype already part of some library (or should we expect it in xr)?
Many thanks for the code.
It isn't really part of any library, and I don't have plans to turn it into a public one. I think the discussion is really about the xarray API and which functions to implement first.
Then somebody can take the code and integrate it into the agreed-upon API.