Hello and first of all, thank you for xarray.
With a colleague, we created pyomeca which is a Python library specialized in biomechanical data processing. Most of the data is multidimensional and so we have reimplemented by hand some functionality to access labels, etc. Except Xarray does it much better.
I am rewriting the entire library using xarray. motion reimplements the pyomeca functions with an accessor for xarray.
I have some questions about the architecture you recommend for a xarray-based library.
In pyomeca, I have three classes (RotoTrans, Markers3d and Analogs3d) with each having very specific functions.
I have two possible solutions to implement them with xarray: (1) write an accessor for each class or (2) inherit from DataArray, e.g:
import xarray
Rototrans class (xarray.DataArray):
pass
According to your documentation, the accesor is the recommended method. However, for most functions, I need a three-dimensional table with specific dimensions ([axis, channels, time_frame]) and the accessor does not allow to perform these checks when creating the DataArray. At the same time, the second solution forces me to inherit a basic function that allows me to access the functions common to all classes
import xarray
class BaseArray(xarray.DataArray):
def fct():
pass
class Rototrans(Base):
pass
Do you have any design recommendations?
I would not advise subclassing. In most cases it should work but it's thoroughly untested and I would be unsurprised if you were to find a use case that accidentally reverts your subclass to the base class.
I don't think I fully understand your problem with accessors. Is it because you would need a hook that is triggered by DataArray. __init__? Why can't you initialise whatever you need to upon first access?
Give a careful read to #3268 if statefulness (beyond what can be stored in xarray variables) is important for you.
Is it because you would need a hook that is triggered by DataArray. __init__
Yes, I need to test some properties when creating the DataArray such as shape and dimensions properties.
Why can't you initialise whatever you need to upon first access?
How do you do that?
What about simply wrapping xarray objects internally in your classes, along with utilities for converting to/from xarray objects? When users want to do domain specific stuff they can use your objects, and when they want to do something generic with xarray they can convert by calling a method like to_xarray.
+1 to shoyer - encapsulation is by far the easiest approach.
Why can't you initialise whatever you need to upon first access?
How do you do that?
@xarray.register_dataset_accessor('foo')
class FooAccessor:
def __init__(self, xarray_obj):
self.obj = xarray_obj
# <insert whatever health checks on xarray_obj>
self.x = something(xarray_obj)
@property
def bar(self):
# snip
def baz(self):
# snip
The accessor __init__ method will be invoked by xarray the first time you invoke array.foo.
Note, however, that it is not recommended to put expensive calculations in it, because the object will possibly be destroyed and reinitialised every time the array is transformed - or in other words, anything that is not a read-only method/property or a cell update.
I emphasized possibly because some transforms may not destroy and recreate your accessor instance, thus potentially causing it to be in a state that is incoherent with the attached xarray object. Every time you invoke a method, you should verify that whatever assumptions you relied upon to generate the state are still valid.
Ok thanks for the recommandations.
I'm thinking of using this kind of class to handle object creation:
import xarray as xr
import motion
class Analogs:
def __new__(cls, ...):
array = xr.DataArray(...)
cls.verify(array)
return array
@staticmethod
def verify(array):
print("verification...")
@classmethod
def from_csv(cls, filename:str, ...):
array = xr.DataArray(filename, ...)
cls.verify(array)
return array
Let's say I define another Markers class for another type of data.
Is it possible that my accessor exposes different functions according to class e, g:
@xr.register_dataarray_accessor("foo")
class DataArrayAccessor(object):
def __init__(self, xarray_obj: xr.DataArray):
self._obj = xarray_obj
def to_csv(self):
"""
This function is exposed to all objects
but behave differently if it is a Markers or an Analogs
"""
def marker_only_function(self):
"""
This function is only exposed to
array created with the Marker class
"""
def analog_only_function(self):
"""
This function is only exposed to
array created with the Analog class
"""
My ultimate goal is to have this kind of API:
analogs = (
Analogs.from_csv("filename.csv")
.foo.center() # foo accessor generic method
.foo.rectify() # foo accessor specific method for Analogs object
.mean() # xarray method
)
Most helpful comment
What about simply wrapping xarray objects internally in your classes, along with utilities for converting to/from xarray objects? When users want to do domain specific stuff they can use your objects, and when they want to do something generic with xarray they can convert by calling a method like
to_xarray.