Xarray: Add writing complex data to docs

Created on 9 Sep 2019  ·  8 Comments  ·  Source: pydata/xarray

Is there a recommended way to save complex data? I found some options on Stack Overflow, but they don't seem satisfactory.

The main point of writing self-describing binary data is that people can simply read it without worrying about how to interpret it. Thus, the only viable option for me would be using engine='h5netcdf'.

On the other hand, if something like adding an axis were done internally by xarray, that would also be OK, since everyone could read my data using the library.

documentation

All 8 comments

My 2 cents:

For a proper resolution, I'd rather have the topic discussed with the NetCDF spec maintainers, so that NetCDF can be expanded to support the same structures as HDF5. Once the format is standard, it would then be a trivial PR to h5netcdf to suppress the warning. We've already gone through the exact same process for compression algorithms other than gzip. Adding the functionality to the NetCDF C library _and the Python wrapper_ would be a completely different order of magnitude of work.

Another good alternative is to use h5netcdf forcing the malformation through, and just call the file .h5 instead of .nc 😉

If you really need to interact with (non-Python) people that are stuck on the NetCDF C library, and who for reasons I can't imagine can't switch to the HDF5 C library, I think writing two bespoke pre/postprocess functions in your code to add a dimension is the best approach.
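The pre/postprocess approach mentioned above could look roughly like the following. This is only a sketch: the helper names split_complex and join_complex and the dimension name "complex_part" are made up for illustration and are not part of xarray.

```python
import numpy as np
import xarray as xr

# Hypothetical name for the extra dimension; not an xarray or CF convention.
PART_DIM = "complex_part"


def split_complex(da: xr.DataArray) -> xr.DataArray:
    """Stack the real and imaginary parts along a new trailing dimension."""
    parts = xr.concat([da.real, da.imag], dim=PART_DIM)
    parts = parts.assign_coords({PART_DIM: ["real", "imag"]})
    # concat puts the new dimension first; move it to the end.
    return parts.transpose(..., PART_DIM)


def join_complex(da: xr.DataArray) -> xr.DataArray:
    """Inverse of split_complex: rebuild the complex-valued array."""
    real = da.sel({PART_DIM: "real"}, drop=True)
    imag = da.sel({PART_DIM: "imag"}, drop=True)
    # Note: arithmetic drops attrs by default, so copy them over if needed.
    return real + 1j * imag


original = xr.DataArray(
    np.array([[1 + 2j, 3 - 4j], [0.5j, -1.0]]), dims=("x", "y"), name="z"
)
as_real = split_complex(original)   # real-valued, safe for any NetCDF engine
restored = join_complex(as_real)    # complex-valued again
```

The real-valued array returned by split_complex can be written with any engine; readers who do not use these helpers still see two clearly labelled slices along the extra dimension.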

I agree that including it in NetCDF is the 'most sane' approach. I don't really know how much work it is, expanding the standard.

To be honest, I don't really care about NetCDF; for me xarray is just an incredibly good way to make code more stable and readable (though it still has several usability issues). In my community everyone uses HDF5 anyway, so dropping compatibility is no big issue. I just want a way to persist data as it is and conveniently load it for plotting and post-processing.

I would still encourage you to push support for saving complex data. In most fields people use complex data, and it is hard to convince them that they benefit from this great library if saving simple data takes complicated keyword arguments and annoys them with warnings, compared to a simple np.savez on regular ndarrays.

I think the answer here is to use h5netcdf until a proper hdf5 backend is created.

It would be nice to add this to the documentation and mention h5netcdf more generally under https://xarray.pydata.org/en/stable/io.html . @DerWeh Are you up for sending in a PR?

It might make sense to implement engine="hdf5" as an alias for engine="h5netcdf" with invalid_netcdf=True. It would certainly be a more ergonomic API.

I am in the exact same situation. @DerWeh with the current master you can do

da.to_netcdf("complex.nc", engine="h5netcdf", invalid_netcdf=True)

which works for me until there is engine="hdf5" or maybe a method da.to_hdf()?
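For reference, a full round trip with that call might look like the sketch below. It assumes an xarray version that already accepts invalid_netcdf=True and that the h5netcdf package is installed; the variable name "amplitude" and the .h5 file name are arbitrary choices for illustration.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Small complex-valued example array; the name is arbitrary.
da = xr.DataArray(np.array([1 + 2j, 3 - 4j]), dims="x", name="amplitude")

with tempfile.TemporaryDirectory() as tmp:
    # The file is HDF5 but not valid NetCDF, hence the .h5 extension.
    path = os.path.join(tmp, "complex.h5")
    da.to_netcdf(path, engine="h5netcdf", invalid_netcdf=True)
    # load_dataarray reads everything into memory and closes the file.
    restored = xr.load_dataarray(path, engine="h5netcdf")
```

The resulting file is meant for HDF5-aware readers; tools built on the NetCDF C library may not be able to open it.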

I opened an issue to discuss this in the CF convention issue tracker -- let's see what they think: https://github.com/cf-convention/cf-conventions/issues/204

Sorry for the slow response, I have little time at the moment. The option invalid_netcdf=True is not yet in the latest release, is it? I get a TypeError.
I would have to use a manually installed version of xarray to use it, right?

Yes, this will be in the next release. (Which will hopefully be very soon!)
