Sorry if this is a basic question. As far as I can tell, if mask_and_scale=True, Xarray unpacks stored values from files using the formula:
x_unpacked = x_stored*scale_factor+add_offset
But some datasets (for example, MODIS data files) use a different formula:
x_unpacked = (x_stored-add_offset)*scale_factor
Is there a convenient way to force the alternative formula when using xr.open_dataset? Right now I am setting mask_and_scale=False and then manually applying the scaling and offsetting to each relevant data array. Is there a more elegant solution?
@nbCloud91 What do you mean by manually doing the scaling?
It seems that for your case it would be enough to fix the add_offset and let xarray decode the dataset after that.
What I do now is xr.open_dataset(filename, mask_and_scale=False). That gives me all the variables as some kind of int8 or int16, and I then apply the second formula above to each data array, since the add_offset attributes differ between variables. I did not understand what you meant by "fix the add_offset". You don't mean editing all the data files, do you?
@nbCloud91 I would do the same as you and open the dataset with mask_and_scale=False. But then I would replace each variable's add_offset attribute with -add_offset*scale_factor and call xr.decode_cf(ds).
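To see why that substitution works, expand the second formula: (x_stored-add_offset)*scale_factor = x_stored*scale_factor + (-add_offset*scale_factor), which has the same shape as the first formula with a new offset. Here is a quick numeric check in plain Python. The stored values, scale factor and offset are made up for illustration and do not come from any real MODIS file, and the function names are hypothetical.

```python
# Made-up example values, chosen only to illustrate the algebra.
stored = [0, 100, 2500, 32767]
scale_factor = 0.02
add_offset = 50.0  # offset as written in the file (second convention)


def unpack_alternative(x, scale, offset):
    """Second formula from the question: (x_stored - add_offset) * scale_factor."""
    return (x - offset) * scale


def unpack_default(x, scale, offset):
    """First formula from the question: x_stored * scale_factor + add_offset."""
    return x * scale + offset


# Rewritten offset that lets the first formula reproduce the second one.
fixed_offset = -add_offset * scale_factor

alternative_values = [unpack_alternative(x, scale_factor, add_offset) for x in stored]
default_values = [unpack_default(x, scale_factor, fixed_offset) for x in stored]

# The two lists agree up to floating-point rounding.
max_difference = max(abs(a - b) for a, b in zip(alternative_values, default_values))
print(max_difference)
```

Only the offset needs to change, because scale_factor plays the same role in both formulas.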
Ah, thank you very much @kmuehlbauer. I do realise this is not the proper forum for asking these questions and that Stack Overflow is suggested, but there seems to be very little activity over there on xarray questions. Closing this issue.
@nbCloud91 Can you point me to the relevant SO-question? And did my sparse answer help you in any way?
Yes, your sparse answer did help. But to apply the fix to the add_offset of every DataArray within the Dataset, I have to loop through them. So this is what I do to 'open' a file:
import xarray as xr

# Open without decoding so the raw integers and original attributes are kept.
ds = xr.open_dataset('XYZ.hdf.hdfeos', mask_and_scale=False, engine='pynio')
for name in ds.data_vars:
    attrs = ds[name].attrs
    if 'add_offset' in attrs:
        # Convert the (x - offset) * scale convention to x * scale + offset.
        attrs['add_offset'] = -attrs['add_offset'] * attrs['scale_factor']
ds = xr.decode_cf(ds)
@nbCloud91 Nice! That's what I had in mind. The loop should not be much of an issue. Thanks for coming back, and thanks for the code snippet. I hope those who come here find it useful.
Edited the snippet (correction): changed xr.decode_cf(ds) to ds=xr.decode_cf(ds), since decode_cf returns a new dataset rather than decoding in place.
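For anyone who wants to try the approach without a MODIS file at hand, the loop from the snippet can be wrapped in a small helper and run on a synthetic in-memory dataset. This is only a sketch: the helper name decode_alternative_packing, the variable name "lst" and all the numbers are invented for illustration. It also assumes that a variable without a scale_factor attribute should be treated as having a scale of 1.

```python
import numpy as np
import xarray as xr


def decode_alternative_packing(ds):
    """Rewrite add_offset for (x - offset) * scale packing, then decode.

    Hypothetical helper. Note that it modifies the attributes of `ds` in place.
    """
    for name in ds.data_vars:
        attrs = ds[name].attrs
        if "add_offset" in attrs:
            # Assumption: a missing scale_factor means a scale of 1.
            scale = attrs.get("scale_factor", 1.0)
            attrs["add_offset"] = -attrs["add_offset"] * scale
    return xr.decode_cf(ds)


# Synthetic stand-in for a file opened with mask_and_scale=False.
raw = xr.Dataset(
    {
        "lst": (
            "x",
            np.array([100, 200, 300], dtype="int16"),
            {"scale_factor": 0.5, "add_offset": 20.0},
        )
    }
)

decoded = decode_alternative_packing(raw)

# Reference values computed directly with the second formula.
expected = (np.array([100, 200, 300]) - 20.0) * 0.5
print(decoded["lst"].values, expected)
```

With a real file, the raw dataset would instead come from xr.open_dataset(..., mask_and_scale=False), as in the snippet above.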