Pandas: ENH: Add `into` argument for `to_xarray` method?

Created on 13 Jul 2020 · 8 Comments · Source: pandas-dev/pandas

Is your feature request related to a problem?

I would like to be able to specify which object type is used when creating xarray objects.

Describe the solution you'd like

Right now, the only option is xr.Dataset, but I tend to prefer xr.DataArray objects.

API breaking implications

It shouldn't affect the API in any complicated ways.

Describe alternatives you've considered

Create an xr.Dataset and then convert it to an xr.DataArray. However, this extra step could be avoided with an `into` argument similar to the one the to_dict method accepts.
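A minimal sketch of that two-step workaround, assuming a small made-up all-float DataFrame in place of the pastebin data. `Dataset.to_array()` stacks the data variables along a new leading dimension (named "variable" by default); newer xarray releases expose the same operation as `to_dataarray()`, so the sketch uses whichever is available.

```python
import pandas as pd
import xarray as xr

# Made-up homogeneous (all-float) frame standing in for the real data
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]},
    index=pd.Index(["x", "y", "z"], name="sample"),
)

ds = df.to_xarray()  # Dataset: one data variable per column, along dim "sample"

# Prefer the newer method name when it exists, else fall back to to_array
to_da = getattr(ds, "to_dataarray", None) or ds.to_array
da = to_da(dim="variable")  # dims: ("variable", "sample")

# Put rows first so the layout mirrors the original DataFrame
da = da.transpose("sample", "variable")
print(da.dims, da.shape)
```

The final transpose matters only if you want the array oriented like the DataFrame; the stacking step itself always puts the new "variable" dimension first.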

Additional context

import pandas as pd
import xarray as xr
df = pd.read_csv("https://pastebin.com/raw/dR59vTD4", sep="\t", index_col=0)
da = df.to_xarray(into=xr.DataArray)  # proposed usage; to_xarray() has no `into` argument today
Enhancement

All 8 comments

This would only work for homogeneous DataFrames, correct?

cc @shoyer and @max-sixty if either of you have thoughts.

By homogeneous, do you mean all the same dtype? If so, I'm not sure, but in the data I've been working with lately, all the columns are floats (NaN included).

Yeah, homogeneous means all columns have the same dtype. A DataArray can only hold a single dtype, so if you have a mix of dtypes, storing the frame in a DataArray wouldn't be an option (short of casting to a common dtype, like object).
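To illustrate the "casting to a common dtype" point, here is a small sketch with made-up frames: passing a mixed-dtype DataFrame to xr.DataArray goes through the frame's single-array values, so int and float columns are upcast to float64, and adding a string column leaves object as the only common dtype.

```python
import pandas as pd
import xarray as xr

# Mixed int64/float64 columns: one array must upcast everything to float64
num = pd.DataFrame({"n": [1, 2], "x": [0.5, 1.5]})
print(num.dtypes.nunique())      # number of distinct column dtypes
print(xr.DataArray(num).dtype)

# A string column leaves object as the only dtype that fits every column
mixed = num.assign(label=["a", "b"])
print(xr.DataArray(mixed).dtype)
```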

What about a check like this?

if into is xr.DataArray:
    # A DataArray holds a single dtype, so require homogeneous columns
    assert df.dtypes.nunique() == 1
    return xr.DataArray(df, ...)
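Fleshed out as a standalone function, the proposal might look like the sketch below. `to_xarray_into` is a hypothetical name used only for illustration; pandas' DataFrame.to_xarray() takes no `into` argument. The sketch raises an exception instead of using assert, since assert statements are skipped when Python runs with -O.

```python
import pandas as pd
import xarray as xr


def to_xarray_into(df: pd.DataFrame, into=xr.Dataset):
    """Hypothetical helper sketching the proposed `into` argument.

    Not a pandas API: DataFrame.to_xarray() always returns a Dataset.
    """
    if into is xr.Dataset:
        return df.to_xarray()
    if into is xr.DataArray:
        # A DataArray holds one dtype, so refuse mixed-dtype frames
        # instead of silently casting to a common dtype such as object.
        if df.dtypes.nunique() != 1:
            raise TypeError("DataArray conversion requires a single column dtype")
        return xr.DataArray(df)
    raise ValueError(f"unsupported target: {into!r}")


frame = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
da = to_xarray_into(frame, into=xr.DataArray)
print(type(da).__name__, da.shape)
```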

You can always use xr.DataArray(df) or df.pipe(xr.DataArray) (if you want method chaining) for this sort of conversion. I don't see a strong need to make to_xarray() more complicated to support this.
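A short sketch of both suggested conversions, using a small made-up frame. Naming the index and columns is optional; when they are named, xarray uses those names as the dimension names (unnamed axes fall back to defaults such as dim_0).

```python
import pandas as pd
import xarray as xr

df = pd.DataFrame(
    {"f1": [0.1, 0.2, 0.3], "f2": [1.1, 1.2, 1.3]},
    index=pd.Index(["s1", "s2", "s3"], name="sample"),
)
df.columns.name = "feature"

# Direct construction: the index becomes the first dim, the columns the second
da = xr.DataArray(df)

# The same conversion at the end of a method chain
da_chained = df.dropna().pipe(xr.DataArray)

print(da.dims)
```

Because df.pipe(f) simply calls f(df), both forms build the same DataArray; pipe only helps when the conversion is the last step of a longer chain.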

Thanks Stephan, I think that aligns with my thoughts.

Thanks for the proposal @jolespin, but we'll pass.

No worries, thanks for considering it. xr.DataArray(df) should do the trick.

Are xr.Datasets typically preferred over xr.DataArray objects even for homogeneous data?

Whether to prefer xr.Dataset or xr.DataArray really depends on your use case and what the data represents. But xr.Dataset is a better fit for a pd.DataFrame organized according to the principle of "tidy data", which is why it's the default for to_xarray().
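As a sketch of why a Dataset suits a tidy frame (the data below is made up): each column becomes its own data variable along the index dimension and keeps its own dtype, so mixed-type columns need no common-dtype cast.

```python
import pandas as pd

# A tidy frame: one row per observation, columns of different types
df = pd.DataFrame(
    {"temp": [21.5, 22.1, 19.8], "count": [3, 5, 2], "site": ["a", "b", "c"]},
    index=pd.Index([10, 20, 30], name="time"),
)

ds = df.to_xarray()  # each column becomes its own data variable along "time"

for name, var in ds.data_vars.items():
    print(name, var.dims, var.dtype)
```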

