This would be a nice feature to have for some use cases, e.g., for writing simulation time-steps:
https://stackoverflow.com/questions/46951981/create-and-write-xarray-dataarray-to-netcdf-in-chunks
It should be relatively straightforward to add, too, building on the existing support for writing files with unlimited dimensions. The user-facing API would probably be a new keyword argument to to_netcdf(), e.g., extend='time' to indicate the dimension to extend.
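For illustration, a rough sketch of what that could look like from the user side; the extend keyword is hypothetical (it does not exist in xarray today), and the file and variable names are made up:

import numpy as np
import xarray as xr

ds0 = xr.Dataset({'t2m': ('time', np.zeros(3))}, coords={'time': [0, 1, 2]})
ds1 = xr.Dataset({'t2m': ('time', np.ones(2))}, coords={'time': [3, 4]})

# Write the initial file, marking 'time' as unlimited so it can grow later.
ds0.to_netcdf('simulation.nc', mode='w', unlimited_dims=['time'])

# Hypothetical keyword: extend the existing 'time' dimension in place
# instead of overwriting the file.
ds1.to_netcdf('simulation.nc', extend='time')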
Any updates on this?
None that I'm aware of. I think this issue is still in the "help wanted" stage.
I would love to have this capability. As @shoyer mentioned, being able to add time steps of any sort to existing netCDF files would be really beneficial. The only real alternative is to save a netCDF file for each additional time step (sketched below), even if there are tons of time steps and each file is only a couple hundred KB (which is my situation with NASA data).
I'll look into it if I get some time...
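For context, a minimal sketch of that one-file-per-time-step workaround, recombined afterwards with open_mfdataset (file names, variable names, and sizes are just illustrative; open_mfdataset requires dask):

import glob
import numpy as np
import xarray as xr

# Workaround today: write one small netCDF file per time step...
for step in range(10):
    ds_step = xr.Dataset(
        {'t2m': (('time', 'x'), np.random.rand(1, 100))},
        coords={'time': [step], 'x': np.arange(100)},
    )
    ds_step.to_netcdf(f'step_{step:04d}.nc')

# ...and recombine them along 'time' later.
combined = xr.open_mfdataset(
    sorted(glob.glob('step_*.nc')), combine='nested', concat_dim='time'
)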
This would be extremely helpful for our modelling of time varying renewable energy.
I think I got a basic prototype working.
That said, I think a real challenge lies in supporting the numerous backends and lazy arrays.
For example, I was only able to add data in a somewhat peculiar fashion using the netCDF4 library, which may trigger expensive computations multiple times.
Is this a use case that we must optimize for now?
Here is a small prototype; maybe it can help kick-start the development.
import netCDF4
import xarray as xr


def _expand_variable(nc_variable, data, expanding_dim, nc_shape, added_size):
    # For time deltas, we must ensure that we use the same encoding as
    # what was previously stored. We likely need to do this as well for
    # variables that had custom encodings.
    if hasattr(nc_variable, 'calendar'):
        data.encoding = {
            'units': nc_variable.units,
            'calendar': nc_variable.calendar,
        }
    data_encoded = xr.conventions.encode_cf_variable(data)

    # Build a slice that targets only the newly added region along the
    # expanding dimension, leaving all other dimensions untouched.
    left_slices = data.dims.index(expanding_dim)
    right_slices = data.ndim - left_slices - 1
    nc_slice = (
        (slice(None),) * left_slices
        + (slice(nc_shape, nc_shape + added_size),)
        + (slice(None),) * right_slices
    )
    nc_variable[nc_slice] = data_encoded.data


def append_to_netcdf(filename, ds_to_append, unlimited_dims):
    if isinstance(unlimited_dims, str):
        unlimited_dims = [unlimited_dims]

    if len(unlimited_dims) != 1:
        # TODO: change this so it can support multiple expanding dims
        raise ValueError(
            "We only support one unlimited dim for now, "
            f"got {len(unlimited_dims)}.")

    unlimited_dims = list(set(unlimited_dims))
    expanding_dim = unlimited_dims[0]

    with netCDF4.Dataset(filename, mode='a') as nc:
        nc_coord = nc[expanding_dim]
        nc_shape = len(nc_coord)

        added_size = len(ds_to_append[expanding_dim])
        variables, attrs = xr.conventions.encode_dataset_coordinates(ds_to_append)

        for name, data in variables.items():
            if expanding_dim not in data.dims:
                # Nothing to do; the data is assumed to be identical.
                continue
            nc_variable = nc[name]
            _expand_variable(nc_variable, data, expanding_dim, nc_shape, added_size)
from xarray.tests.test_dataset import create_append_test_data
from xarray.testing import assert_equal
ds, ds_to_append, ds_with_new_var = create_append_test_data()
filename = 'test_dataset.nc'
ds.to_netcdf(filename, mode='w', unlimited_dims=['time'])
append_to_netcdf('test_dataset.nc', ds_to_append, unlimited_dims='time')
loaded = xr.load_dataset('test_dataset.nc')
assert_equal(xr.concat([ds, ds_to_append], dim="time"), loaded)
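To tie this back to the original simulation use case, here is a hedged sketch of driving the append_to_netcdf prototype from a time-stepping loop; the make_step helper, the variable names, and the file name are made up for illustration:

import numpy as np
import xarray as xr

def make_step(step):
    # Stand-in for one simulation time step: a one-element slab along 'time'.
    return xr.Dataset(
        {'field': (('time', 'x'), np.random.rand(1, 100))},
        coords={'time': [step], 'x': np.arange(100)},
    )

filename = 'simulation.nc'

# Write the first step normally, marking 'time' as unlimited.
make_step(0).to_netcdf(filename, mode='w', unlimited_dims=['time'])

# Append the remaining steps in place instead of writing one file per step.
for step in range(1, 100):
    append_to_netcdf(filename, make_step(step), unlimited_dims='time')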
Hi - I consider this extremely useful!
Is your prototype already part of some library (or should we expect it in xr)?
Many thanks for the code.
It isn't really part of any library, and I don't have plans to turn it into a public one. I think the discussion is really about the xarray API and which functions to implement first.
Then somebody can take the code and integrate it into the agreed-upon API.