ds1 = df.set_index(['lat','lon']).stack()
ds1.index.names = ['lat', 'lon', 'time']
ds1 = ds1.sort_index()
ds1.columns = ['T']
xr.Dataset(ds1)
I tried to transform a dataset with 2D latitude and longitude into an xarray dataset, but I failed because a RAM error occurred during the process.
I also tried to set lat and lon as coordinates directly, but that makes plotting and the subsequent geographic manipulation complex. The dataset covers a non-rectangular area, so lat and lon cannot be replaced by the corner values.
In short, I hope this data can be transformed into xarray and resampled onto a traditional rectangular grid, which is much easier to work with.
Any code and suggestions are sincerely welcome.
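(For context, one generic way to get from a scattered point cloud to a rectangular grid is to snap each point to the nearest cell of a regular grid and average duplicates. This is only a sketch: the column names `lat`, `lon`, `T`, the sample values, and the 0.01-degree target resolution are all illustrative assumptions, not the actual data.)

```python
import numpy as np
import pandas as pd

# Hypothetical frame of scattered points: one row per (lat, lon) sample.
df = pd.DataFrame({
    "lat": [37.501, 37.503, 37.512, 37.511],
    "lon": [96.461, 96.462, 96.471, 96.472],
    "T":   [1.0, 3.0, 5.0, 7.0],
})

res = 0.01  # assumed target grid resolution in degrees

# Snap each point to the nearest regular-grid cell and average any
# points that land in the same cell; this turns the non-rectangular
# point cloud into a rectangular grid with NaN in the empty cells.
df["lat_bin"] = (df["lat"] / res).round() * res
df["lon_bin"] = (df["lon"] / res).round() * res
grid = (
    df.groupby(["lat_bin", "lon_bin"])["T"]
    .mean()
    .unstack("lon_bin")  # rows = lat, columns = lon
)
print(grid.shape)  # (2, 2)
```

The resulting rectangular DataFrame can then be wrapped in an `xr.DataArray` with `lat_bin`/`lon_bin` as 1D coordinates.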
Please could you fill out the issue template, including a reproducible example? A CSV could be OK if you include the reproduction steps.
Thank you, updated.
Thanks, that helps. First of all (unless I did something wrong with the read_csv call), there's an Unnamed: 0 column that has to be removed.
Other than that, your data seems to be quite sparse, so it's an ideal fit for sparse:
In [38]: %%time
...: df = pd.read_csv("/tmp/data.csv")
...: a = df.drop("Unnamed: 0", axis=1).set_index(["lat", "lon"])
...: a = a.stack()
...: a.index.names = ["lat", "lon", "time"]
...: a = a.sort_index()
...: a.name = "T"
...: xr.DataArray.from_series(a, sparse=True)
...:
...:
CPU times: user 606 ms, sys: 63.9 ms, total: 670 ms
Wall time: 670 ms
Out[38]:
<xarray.DataArray 'T' (lat: 16100, lon: 29959, time: 31)>
<COO: shape=(16100, 29959, 31), dtype=float64, nnz=1003191, fill_value=nan>
Coordinates:
* lat (lat) float64 37.5 37.5 37.5 37.5 37.5 ... 43.1 43.1 43.1 43.1 43.1
* lon (lon) float64 96.46 96.46 96.46 96.47 ... 102.6 102.6 102.6 102.6
* time (time) object '2011-01-01 00:00:00' ... '2011-01-31 00:00:00'
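(For readers without the original CSV, the same pipeline can be reproduced on a tiny synthetic frame. The column names and values below are made up; with small dense data the `sparse=True` flag can simply be dropped, everything else is identical.)

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny synthetic stand-in for the CSV: one row per (lat, lon) point,
# one column per day.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "lat": [37.5, 37.5, 38.0],
    "lon": [96.5, 97.0, 96.5],
    "2011-01-01": rng.normal(size=3),
    "2011-01-02": rng.normal(size=3),
})

# Same steps as above: move the day columns into a third index level,
# name the levels, and convert the resulting Series to a DataArray.
a = df.set_index(["lat", "lon"]).stack()
a.index.names = ["lat", "lon", "time"]
a = a.sort_index()
a.name = "T"
da = xr.DataArray.from_series(a)

print(da.dims, da.shape)  # ('lat', 'lon', 'time') (2, 2, 2)
```

The (38.0, 97.0) cell was never observed, so it comes out as NaN across all times; on the real data that missing fraction is what makes the sparse backend pay off.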
Thanks for your code! I noticed that only two decimal places are kept for lat and lon. Does that mean a resampling happened? The data is a grid with 0.005-degree resolution; can I keep that resolution in the results?
That's only the shortened repr; the values are not modified:
In [5]: da.lat
Out[5]:
<xarray.DataArray 'lat' (lat: 16100)>
array([37.49944, 37.5004 , 37.50135, ..., 43.1014 , 43.10143, 43.10144])
Coordinates:
* lat (lat) float64 37.5 37.5 37.5 37.5 37.5 ... 43.1 43.1 43.1 43.1 43.1
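(The display-only rounding can be demonstrated with plain NumPy; the values below are made up to match the repr above. Printing with a lower precision changes nothing about the stored float64 values.)

```python
import numpy as np

lat = np.array([37.49944, 37.50040, 43.10144])

# Lowering the print precision only affects the display,
# e.g. roughly [37.4994 37.5004 43.1014]:
with np.printoptions(precision=4):
    print(lat)

# The underlying float64 values are untouched by how they are printed:
print(lat[0])  # 37.49944
```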
Thanks for the help! I found that sparse grids are not easy to plot, so I changed my code to something like the Colab code, which is similar to the 'rasm' example in xarray. Maybe you could show how to create example datasets like these (beyond the toy weather data) in the tutorial; that would be helpful.