use_cftime was recently added as an option to decode_cf and open_dataset to give users more control over how times are decoded (#2759). It would be good if it were also available for open_zarr. This is perhaps less important, because open_zarr only works for single data stores, so there is no risk of decoding times to different types (as there was for open_mfdataset, #1263); however, it would still be nice to be able to silence the serialization warnings that result from decoding times to cftime objects in some instances, e.g. #2754.
Hey, can I work on this issue? I'm still new to this so a little heads up would be very nice!
Absolutely - I think this should be as simple as adding a use_cftime argument to open_zarr, threading its value through as an argument to the call to conventions.decode_cf here:
https://github.com/pydata/xarray/blob/7b76f163394a35c9cd8013e835e9d0b2050fd9a6/xarray/backends/zarr.py#L542-L545
and adding a test to make sure it works.
For more general tips on contributing to xarray, see the contributing guide here.
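To make the suggestion above concrete, here is a minimal sketch of the "thread the keyword through" pattern, plus the kind of test that would check it. It does not use xarray itself: `decode_cf_stub` and `open_zarr_sketch` are hypothetical stand-ins for conventions.decode_cf and open_zarr, and the real signatures contain many more arguments. The only point is that the new use_cftime argument is accepted by the outer function and forwarded unchanged.

```python
from typing import Any, Dict, Optional


def decode_cf_stub(raw: Dict[str, Any], decode_times: bool = True,
                   use_cftime: Optional[bool] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for conventions.decode_cf.

    It performs no decoding; it only records the options it received so
    that a test can inspect them.
    """
    decoded = dict(raw)
    decoded["decode_options"] = {
        "decode_times": decode_times,
        "use_cftime": use_cftime,
    }
    return decoded


def open_zarr_sketch(store: Dict[str, Any], decode_times: bool = True,
                     use_cftime: Optional[bool] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for open_zarr with the new keyword added."""
    # The change the issue asks for: accept use_cftime and pass it on.
    return decode_cf_stub(store, decode_times=decode_times,
                          use_cftime=use_cftime)


def test_use_cftime_is_forwarded() -> None:
    # None (the default), True, and False should all arrive unchanged.
    for value in (None, True, False):
        result = open_zarr_sketch({"time": [0, 1, 2]}, use_cftime=value)
        assert result["decode_options"]["use_cftime"] is value
```

A real test in xarray would presumably write a dataset with a time coordinate to a zarr store, reopen it with use_cftime=True, and check the decoded time type, but the forwarding logic is the same.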
I couldn't figure out how to implement the tests.
@Geektrovert and @spencerkclark - any progress on this? It would be great to have it working in time for the CMIP6 hackathon in order to suppress all of the warning messages when opening zarr stores. For example, a zarr store using cftime.DatetimeGregorian (e.g., cftime.DatetimeGregorian(2349, 12, 16, 12, 0, 0, 0, 4, 350)) will always generate the warning:
/usr/local/python/anaconda3/envs/pangeoJun2019/lib/python3.6/site-packages/xarray/coding/times.py:465: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
/usr/local/python/anaconda3/envs/pangeoJun2019/lib/python3.6/site-packages/xarray/coding/times.py:465: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
/usr/local/python/anaconda3/envs/pangeoJun2019/lib/python3.6/site-packages/numpy/core/numeric.py:538: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
return array(a, dtype, copy=False, order=order)
For the CMIP6 collection, I always use use_cftime=True in open_mfdataset, but when I then make the zarr store and read it back in, it generates these warnings, which are always a distraction for anyone not used to dealing with this data.
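Until open_zarr accepts use_cftime, one possible stopgap is to filter the warning by its message with Python's standard warnings module. The sketch below is an assumption-laden illustration: `open_store_stub` is a hypothetical stand-in that emits a warning with the same text as above, so the example runs without xarray or a zarr store. In real use, the call inside the `with` block would be the open_zarr call.

```python
import warnings


def open_store_stub() -> str:
    """Hypothetical stand-in for opening a zarr store.

    It emits a warning with the same message text as the one in the log
    above, then returns a placeholder value.
    """
    warnings.warn(
        "Unable to decode time axis into full numpy.datetime64 objects, "
        "continuing using cftime.datetime objects instead, "
        "reason: dates out of range",
        RuntimeWarning,
    )
    return "dataset"


# record=True collects any warnings that get through, so we can check
# that the filter worked; the previous filters are restored on exit.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # `message` is a regular expression matched against the start of
    # the warning text.
    warnings.filterwarnings("ignore", message="Unable to decode time axis")
    ds = open_store_stub()

# No warning was recorded, because the ignore filter matched it.
assert caught == []
```

Filtering by message rather than by category keeps unrelated warnings visible, at the cost of breaking silently if the wording of the message ever changes.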
@naomi-henderson, I was working on this. But lately, I have been quite busy so I haven't implemented the tests yet. You can fix it if you wish to.
Adding tests to #3229 should be pretty straightforward. Great project for someone looking to learn a bit more about xarray internals.
@spencerkclark @dcherian for transparency I unpinned this to make room for the broken docs -> github link. Feel free to unpin the build warnings if this is still top-of-mind