Pandas: ENH: Add `into` argument for `to_xarray` method?

Created on 13 Jul 2020 · 8 Comments · Source: pandas-dev/pandas

Is your feature request related to a problem?

I would like to be able to specify the object used for creating xarray objects.

Describe the solution you'd like

Right now, the only option is xr.Dataset but I tend to prefer xr.DataArray objects.

API breaking implications

It shouldn't affect the API in any complicated ways.

Describe alternatives you've considered

Create an xr.Dataset and then convert it to an xr.DataArray. However, this extra step could be avoided with an `into` argument similar to the one on the to_dict method.
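
The two-step workaround described above might look like the following minimal sketch. The small frame and the dimension name "feature" are made up for illustration; Dataset.to_array is an existing xarray method that stacks the data variables along a new leading dimension.

```python
import pandas as pd
import xarray as xr

# Hypothetical homogeneous frame standing in for the real data.
df = pd.DataFrame(
    {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]},
    index=pd.Index(["s1", "s2"], name="sample"),
)

ds = df.to_xarray()              # Dataset with one data variable per column
da = ds.to_array(dim="feature")  # stack the variables into a single DataArray
```

Note that the stacked dimension comes first, so the result is laid out as (feature, sample) rather than (sample, feature).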

Additional context

import pandas as pd
import xarray as xr
df = pd.read_csv("https://pastebin.com/raw/dR59vTD4", sep="\t", index_col=0)
da = df.to_xarray(into=xr.DataArray)

Enhancement

jolespin

All 8 comments

This would only work for homogenous dataframes, correct?

cc @shoyer and @max-sixty if either of you have thoughts.

TomAugspurger on 13 Jul 2020

By homogenous, do you mean all the same dtype? If so, I am not sure but for the data I've been working with lately they are all floats (nan too).

jolespin on 13 Jul 2020

Yeah, homogenous means all columns have the same dtype. A DataArray can only hold a single dtype, so if you have a mix of datatypes storing it in a DataArray wouldn't be an option (short of casting to a common dtype, like object).

TomAugspurger on 13 Jul 2020
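
A rough way to see the homogeneity point is to compare the column dtypes directly. This is only a sketch: both frames and the helper name is_homogeneous are invented for illustration and are not part of pandas or xarray.

```python
import pandas as pd

# Hypothetical example frames; NaN is still a float, so the first stays homogeneous.
homogeneous = pd.DataFrame({"a": [1.0, float("nan")], "b": [3.0, 4.0]})
mixed = pd.DataFrame({"a": [1.0, 2.0], "label": ["x", "y"]})


def is_homogeneous(df: pd.DataFrame) -> bool:
    """Return True when every column shares one dtype (hypothetical helper)."""
    return df.dtypes.nunique() == 1


print(is_homogeneous(homogeneous))
print(is_homogeneous(mixed))

# A mixed frame can only become a single array by casting to a common dtype.
print(mixed.to_numpy().dtype)
```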

What about a check like this?

if into == xr.DataArray:
    assert df.dtypes.nunique() == 1
    return xr.DataArray(df, ...

jolespin on 13 Jul 2020

You can always use xr.DataArray(df) or df.pipe(xr.DataArray) (if you want method chaining) for this sort of conversion. I don't see a strong need to make to_xarray() more complicated to support this.

shoyer on 14 Jul 2020
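
For reference, the two spellings suggested above can be sketched as follows. The frame and the axis names "sample" and "feature" are assumptions made for the example; naming the index and columns is what gives the resulting dimensions readable names.

```python
import pandas as pd
import xarray as xr

# Hypothetical homogeneous frame with named axes.
df = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    index=pd.Index(["s1", "s2"], name="sample"),
    columns=pd.Index(["a", "b", "c"], name="feature"),
)

da = xr.DataArray(df)             # direct construction from the DataFrame
da_piped = df.pipe(xr.DataArray)  # same call, written for method chaining
```

Both calls go through the same constructor, so the choice between them is purely stylistic.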

👍1

Thanks Stephan, I think that aligns with my thoughts.

Thanks for the proposal @jolespin, but we'll pass.

TomAugspurger on 14 Jul 2020

👍1

No worries, thanks for considering it. xr.DataArray(df) should do the trick.

Are xr.Datasets typically preferred over xr.DataArray objects even for homogenous data?

jolespin on 14 Jul 2020

Whether to prefer xr.Dataset or xr.DataArray really depends on your
use-case and what the data represents. But xr.Dataset is a better fit for
pd.DataFrame organized according to the principle of "tidy data", so that's
why it's the default for to_xarray().
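
As a rough illustration of the tidy-data case, consider a frame where each column is a different measured variable. The column names and values below are invented for the sketch.

```python
import pandas as pd

# Hypothetical tidy frame: one row per (time, site) observation,
# one column per variable, each with its own dtype.
tidy = pd.DataFrame(
    {
        "time": [0, 0, 1, 1],
        "site": ["A", "B", "A", "B"],
        "temp": [20.5, 18.0, 21.0, 18.5],
        "raining": [False, True, False, False],
    }
).set_index(["time", "site"])

# Index levels become dimensions; columns become separate data variables.
ds = tidy.to_xarray()
print(ds)
```

Because every data variable in a Dataset carries its own dtype, the float and boolean columns can sit side by side, which a single DataArray could only do after casting to a common dtype.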
