I'm trying to use xarray as the underlying container for some data processing tasks. Part of the pipeline converts data from formats that are not standard or easily readable (e.g. ROS messages) to standard formats, e.g. netCDF(4). The data I tend to work on is structured time series data, which maps pretty well to structured numpy arrays using dtype manipulations. xarray lightly wraps numpy and provides netCDF as a backend. However, the xarray implementation doesn't really expose this capability, which netCDF supports as 'compound data types', and in fact it fails when you try to write such a DataArray/Dataset to file (at _nc4_values_and_dtype).
So the question is, is this a reasonable feature/expectation from xarray (and thus you're receptive to contributions), or is this outside the goal/purpose (I should roll my own/use pandas/etc)?
It is a little challenging to make structured arrays work with all of xarray's computational tools. For example, we don't have a good way to handle missing values.
Also, in my experience, non-structured arrays are nicer to work with in most cases, and a tool like xarray makes it pretty easy to unpack non-structured arrays into multiple arrays in a Dataset, possibly with different dimensions.
That said, we've added some workarounds in the past to ensure that structured arrays work in xarray, and I would be happy to accept contributions to write them to netCDF files. I'm sure there are others who would also find this useful.
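As a rough illustration of the unpacking approach mentioned above, here is a minimal sketch that splits a 1-D structured numpy array into one Dataset variable per field. The helper name unpack_structured, the field names, and the sample values are all made up for the example; it is one possible way to do this, not something xarray provides.

```python
import numpy as np
import xarray as xr

# Hypothetical structured time series: a timestamp plus two measured fields.
records = np.array(
    [(0.0, 1.5, 10), (0.1, 1.7, 11), (0.2, 1.6, 12)],
    dtype=[("time", "f8"), ("voltage", "f4"), ("count", "i4")],
)


def unpack_structured(arr, dim, coord=None):
    """Split a 1-D structured array into one Dataset variable per field.

    `coord` optionally names the field to use as the coordinate of `dim`.
    """
    data_vars = {
        name: (dim, np.ascontiguousarray(arr[name]))
        for name in arr.dtype.names
        if name != coord
    }
    coords = {dim: np.ascontiguousarray(arr[coord])} if coord is not None else {}
    return xr.Dataset(data_vars, coords=coords)


ds = unpack_structured(records, dim="time", coord="time")
```

Each variable in ds then has a plain (non-structured) dtype, so it should be writable with the usual to_netcdf call, at the cost of losing the compound-type layout on disk.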
I'd also like to see better support for compound types, writing them for starters. I'll collect some information here:
In the code @tfurf linked to (_nc4_values_and_dtype), an elif needs to be added to catch structured dtypes. I think they have kind == 'V'.
dtype.isbuiltin can be used to detect whether we are indeed dealing with a structured type. Namely dtype.isbuiltin must be 0.
The structured type must first be added to the netCDF4.Dataset using its method createCompoundType. This must be done recursively, with the deepest levels first.
The netCDF variable is created in prepare_variable, which calls _nc4_values_and_dtype. There, via self.ds we also have access to the netCDF4 Dataset to be used for the creation of the compound type as mentioned above. However, is self.ds really the Dataset, or some netCDF4.Group? In any case _nc4_values_and_dtype and its use in prepare_variable need to be refactored, because we need access to the underlying netCDF4 Dataset.
Is there anything I've missed? Can someone shed light on whether self.ds in prepare_variable can be assumed to be the underlying netCDF4 Dataset?
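To make the "deepest levels first" step concrete, here is a minimal numpy-only sketch that detects structured dtypes and walks nested ones so that inner types come out before the types that contain them. The function names, the name-joining scheme, and the example dtypes are assumptions made for illustration; nothing here is existing xarray code. It tests dtype.names rather than kind == 'V', since plain void dtypes also have kind 'V'.

```python
import numpy as np


def is_structured(dtype):
    """True for dtypes with named fields (kind 'V' alone also matches plain void)."""
    return np.dtype(dtype).names is not None


def compound_types_deepest_first(dtype, name="root"):
    """Yield (name, dtype) for each structured dtype nested in `dtype`, inner ones first."""
    dtype = np.dtype(dtype)
    if not is_structured(dtype):
        return
    for field_name in dtype.names:
        field_dtype = dtype.fields[field_name][0]
        # For sub-array fields, .base is the element dtype; otherwise it is the dtype itself.
        yield from compound_types_deepest_first(
            field_dtype.base, f"{name}_{field_name}"
        )
    yield name, dtype


# Hypothetical nested record: a pose holding a timestamp and a 2-D position.
point_t = np.dtype([("x", "f8"), ("y", "f8")])
pose_t = np.dtype([("stamp", "f8"), ("position", point_t)])

order = [type_name for type_name, _ in compound_types_deepest_first(pose_t, "pose")]
```

With netCDF4, each yielded pair could then be passed to createCompoundType(datatype, datatype_name) in that order. As far as I know netCDF4.Group subclasses netCDF4.Dataset, so the method should be available whichever of the two self.ds turns out to be, but that is worth verifying against the backend code.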
I just got bit with this as well. I was basically using tuples of indices as coordinates in order to implement a multidimensional sparse array.
My workaround is to use a plain dimension index_dim to index the points in the N-dimensional space that I actually populate, and to have several coordinates (say X,Y) that all have index_dim as their only dimension. It's easy enough to see what the coordinates are once you select a value along index_dim, but I have to go outside xarray to locate a populated point based on its X,Y-coordinates, because I can't slice along those arrays as (A) they aren't aliased to a dimension and (B) they have non-unique values.
I've come up with an ugly method for selecting by tuples of X,Y-coordinates:
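pairs = list(zip(x_wanted, y_wanted))
# Map each (x, y) pair to its integer position along index_dim.
pair2index = {(dataset.x[i].item(), dataset.y[i].item()): i
              for i in range(dataset.sizes["index_dim"])}
try:
    found_indices = [pair2index[p] for p in pairs]
except KeyError as err:
    # err.args[0] is the (x, y) pair that was not in the lookup table.
    print("Coordinate {} not found in dataset.".format(err.args[0]))
    raise
found = dataset.isel(index_dim=found_indices)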
This is an ancient issue, but still - wondering if anyone here managed to hack together some workarounds?