I always use Pandas to handle my neuroscience data (multi-dimensional). It is annoying to stack and unstack all the time, and I heard that Xarray is designed for multi-dimensional data.
In neuroscience research, we usually have multiple participants, and we test each of them a different number of times, which means the data may look like this:

- participant A:
  - 2×5×100 matrix
- participant B:
  - 2×5×101 matrix

(100 and 101 are the numbers of testing times.)
But a Dataset doesn't seem to support holding a 2×5×100 DataArray and a 2×5×101 DataArray together. Is there any way to handle this kind of data in Xarray?
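A minimal sketch reproducing the problem, assuming hypothetical arrays `da_A` and `da_B` that use the same dimension name (`time`, also hypothetical) for the testing times and carry no coordinates. Because xarray has no way to tell how the 100 and 101 positions correspond, it refuses to put them in one Dataset:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-ins for the two participants' data: both share
# dims ("x", "y"), but the last axis (testing times) differs in length.
da_A = xr.DataArray(np.zeros((2, 5, 100)), dims=("x", "y", "time"))
da_B = xr.DataArray(np.zeros((2, 5, 101)), dims=("x", "y", "time"))

try:
    # Same dimension name, different sizes, and no coordinates to align on.
    xr.Dataset({"A": da_A, "B": da_B})
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__)  # ValueError
```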
This ultimately depends on how the last dimensions of A and B are related (or rather, how you want to model that relationship). If they are not related at all, simply use different dimension names:
```python
In [1]: import numpy as np
   ...: import xarray as xr

In [2]: da1 = xr.DataArray(np.empty(shape=(2, 5, 100)), dims=("x", "y", "z1"))
   ...: da2 = xr.DataArray(np.empty(shape=(2, 5, 101)), dims=("x", "y", "z2"))
   ...: ds = xr.Dataset({"a": da1, "b": da2})
   ...: ds
Out[2]:
<xarray.Dataset>
Dimensions:  (x: 2, y: 5, z1: 100, z2: 101)
Dimensions without coordinates: x, y, z1, z2
Data variables:
    a        (x, y, z1) float64 6.901e-310 6.901e-310 4.67e-310 ... 0.0 0.0 0.0
    b        (x, y, z2) float64 6.901e-310 6.901e-310 4.67e-310 ... 0.0 0.0 0.0
```
If they are related, use the same dimension name and assign coordinates to it:
```python
In [3]: da1 = xr.DataArray(
   ...:     np.empty(shape=(2, 5, 100)),
   ...:     dims=("x", "y", "z"),
   ...:     coords={"z": np.arange(100)},
   ...: )
   ...: da2 = xr.DataArray(
   ...:     np.empty(shape=(2, 5, 101)),
   ...:     dims=("x", "y", "z"),
   ...:     coords={"z": np.arange(101)},
   ...: )
   ...: ds = xr.Dataset({"a": da1, "b": da2})
   ...: ds
Out[3]:
<xarray.Dataset>
Dimensions:  (x: 2, y: 5, z: 101)
Coordinates:
  * z        (z) int64 0 1 2 3 4 5 6 7 8 9 10 ... 91 92 93 94 95 96 97 98 99 100
Dimensions without coordinates: x, y
Data variables:
    a        (x, y, z) float64 6.901e-310 6.901e-310 ... 6.917e-323 nan
    b        (x, y, z) float64 6.901e-310 6.901e-310 ... 6.901e-310 -6.35e+53
```
In this case, `a` does not have the label `z=100`, so that position is treated as missing and filled with `NaN` (you should be familiar with the concept of missing values since you know pandas).
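The same idea as a self-contained sketch, assuming a hypothetical dimension name `trial` in place of `z` and filling the arrays with ones instead of uninitialized `np.empty` memory. It uses `xr.merge` with `join="outer"` written out explicitly rather than relying on the `Dataset` constructor's default alignment, so the NaN padding does not hinge on defaults that may differ between xarray versions:

```python
import numpy as np
import xarray as xr

# Hypothetical data: "a" has 100 trials, "b" has 101. Sharing the "trial"
# dimension name and giving it integer coordinates tells xarray that
# trial 0 of "a" corresponds to trial 0 of "b", and so on.
a = xr.DataArray(
    np.ones((2, 5, 100)),
    dims=("x", "y", "trial"),
    coords={"trial": np.arange(100)},
    name="a",
)
b = xr.DataArray(
    np.ones((2, 5, 101)),
    dims=("x", "y", "trial"),
    coords={"trial": np.arange(101)},
    name="b",
)

# Outer join: keep the union of trial labels and fill the gaps with NaN.
ds = xr.merge([a, b], compat="no_conflicts", join="outer")

print(ds.sizes["trial"])                            # 101
print(bool(ds["a"].sel(trial=100).isnull().all()))  # True
print(int(ds["a"].count()))                         # 1000 non-missing values

# Dropping the padded label recovers the original 100 trials of "a".
a_only = ds["a"].dropna("trial")
print(a_only.sizes["trial"])                        # 100
```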
I haven't tried it, but I understand your problem.
What if you create a Dataset from each DataArray, e.g. `ds1 = da1.to_dataset(name='participant_A')` and `ds2 = da2.to_dataset(name='participant_B')`, and then merge them?
`xr.merge([ds1, ds2], compat='no_conflicts')`
http://xarray.pydata.org/en/stable/combining.html
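A runnable version of the `to_dataset` + `merge` suggestion, sketched under the assumption that the testing-time dimension (hypothetically named `trial` here) carries integer coordinates; without coordinates the two sizes conflict exactly as in the question. `join="outer"` is passed explicitly so the shorter variable is padded with NaN:

```python
import numpy as np
import xarray as xr

# Hypothetical per-participant DataArrays with labelled "trial" coordinates.
da1 = xr.DataArray(np.zeros((2, 5, 100)), dims=("x", "y", "trial"),
                   coords={"trial": np.arange(100)})
da2 = xr.DataArray(np.zeros((2, 5, 101)), dims=("x", "y", "trial"),
                   coords={"trial": np.arange(101)})

# Turn each DataArray into a single-variable Dataset...
ds1 = da1.to_dataset(name="participant_A")
ds2 = da2.to_dataset(name="participant_B")

# ...and merge them; the outer join keeps all 101 trial labels and
# fills participant_A with NaN at trial=100.
merged = xr.merge([ds1, ds2], compat="no_conflicts", join="outer")

print(sorted(merged.data_vars))  # ['participant_A', 'participant_B']
print(merged.sizes["trial"])     # 101
```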
In the other case, you could pad with NaN values so that both arrays have the same dimensions.
But I have never tried it. I found another solution for my data, but that was my own workaround.
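One way to do the NaN padding mentioned above is to stack the participants along a new dimension with `xr.concat`, letting an outer join on the (hypothetical) `trial` coordinate fill the missing testing times with NaN. The `participant` dimension and the labels `"A"`/`"B"` are illustrative assumptions:

```python
import numpy as np
import xarray as xr

# Hypothetical data: participant A tested 100 times, participant B 101 times.
da_A = xr.DataArray(np.ones((2, 5, 100)), dims=("x", "y", "trial"),
                    coords={"trial": np.arange(100)})
da_B = xr.DataArray(np.ones((2, 5, 101)), dims=("x", "y", "trial"),
                    coords={"trial": np.arange(101)})

# Stack both participants along a new leading "participant" dimension.
# join="outer" aligns on the union of trials and pads A with NaN at trial=100.
data = xr.concat([da_A, da_B], dim="participant", join="outer")
data = data.assign_coords(participant=["A", "B"])

print(data.dims)   # ('participant', 'x', 'y', 'trial')
print(data.shape)  # (2, 2, 5, 101)
```

NaN-aware reductions such as `mean` skip the padded entries by default for floating-point data, so `data.mean("participant")` at `trial=100` only reflects participant B.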
On Sun, Apr 19, 2020, 20:57, (Ray) Jinbiao Yang notifications@github.com
wrote:
View it on GitHub: https://github.com/pydata/xarray/issues/3984
@keewis your answer (and a clarification that we can't do real "ragged" arrays) would make a useful cookbook or StackOverflow answer, since I suspect a lot of people have this question.
Both of your methods worked! Thank you!