@Cadair and I are from the solar and astrophysics communities, respectively (particularly SunPy and Astropy). In our fields, we have a concept of something called "World Coordinate Systems" (WCS) which basically are arbitrary mappings from pixel coordinates (which is often but not necessarily the same as the index) to physical coordinates. (For more on this and associated Python/Astropy APIs, see this document). If we are reading correctly, this concept maps roughly onto the xarray concept of "Non-dimension coordinates".
However, a critical difference is this: WCS are usually expressed as arbitrary complex mathematical functions, rather than coordinate arrays, as it is crucial to some of the science cases to carry sub-pixel or other coordinate-related metadata around along with the WCS.
So our question is: is it in-scope for xarray non-dimensional coordinates to be extended to be functional instead of having to be arrays? I.e., have the coordinate arrays be generated on-the-fly from some function instead of being realized as arrays at creation-time. We have thought about several ways this might be specified and are willing to do some trial implementations, but are first asking here if it is likely to be
Thanks!
Thanks for reaching out Erik! We’d love to find a way to better support Astro data in xarray. Before digging deeper, I just want to ask a clarification question. When you say “arbitrary complex mathematical functions”: what are the arguments / inputs to these functions?
Presumably they have to be evaluated at some point, ie for plotting. Can you describe what happens when the time comes to turn the arbitrary functions to actual numbers?
Your answer will help us respond more accurately to your question.
On Dec 13, 2019, at 3:51 PM, Erik Tollerud notifications@github.com wrote:

@Cadair and I are from the solar and astrophysics communities, respectively (particularly SunPy and Astropy). In our fields, we have a concept of something called "World Coordinate Systems" (WCS) which basically are arbitrary mappings from pixel coordinates (which is often but not necessarily the same as the index) to physical coordinates. (For more on this and associated Python/Astropy APIs, see this document). If we are reading correctly, this concept maps roughly onto the xarray concept of "Non-dimension coordinates".However, a critical difference is this: WCS are usually expressed as arbitrary complex mathematical functions, rather than coordinate arrays, as it is crucial to some of the science cases to carry sub-pixel or other coordinate-related metadata around along with the WCS.
So our question is: is it in-scope for xarray non-dimensional coordinates to be extended to be functional instead of having to be arrays? I.e., have the coordinate arrays be generated on-the-fly from some function instead of being realized as arrays at creation-time. We have thought about several ways this might be specified and are willing to do some trial implementations, but are first asking here if it is likely to be
Easy
Hard
Impossible
PR will immediately be rejected on philosophical grounds, regardless?
Thanks!—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
It would also be good to hear about "sub-pixel metadata" → this seems to be the main reason why you want to carry around the analytic rather than the discrete evaluated form (which we basically support through dask). Is that right or am I missing something?
Thanks for the quick responses @rabernat and @dcherian!
When you say “arbitrary complex mathematical functions”: what are the arguments / inputs to these functions?
The short answer is: pixels. Which in astro is sometimes the same as "array index", but usually off-by-0.5 (i.e., the center of the pixel), and occasionally off-by-1 (for files generated by e.g. FORTRAN or other 1-based indexing schemes...).
The longer answer is: it depends on which direction you mean. Most WCS' are by design invertible, so you can go world-to-pixel or pixel-to-world. In the former case that means there's a bunch of possibly inputs - examples include wavelength (for a spectrum), energy (for a photon-counting experiment), or altitude and azimuth (celestial coordinates that are similar to lat/lon but on the celestial sphere instead of a reference geoid).
Additionally, the WCSs have parameters, which are usually not thought of as "inputs" to the WCS, but rather internal state or metadata (usually they are stored in file headers and interpreted by software).
If you want to see some concrete examples, you might check out Astropy's docs for our WCS interface, or the design document for the most recent iteration of that interface. But those might be impenetrable for a non-astronomer, so I'm open to more follow-ups!
It would also be good to hear about "sub-pixel metadata"
Sorry I wasn't clear there - I meant there are cases where the sub-pixel structure of the coordinates matter - e.g. telescopes often have pretty intense optical distortion, and we sometimes interpolate in such a way that the sub-pixel information in the WCS is critical to getting the interpolation right. And by "metadata" I meant both abstract parameters that encode that transformation (polynominal coefficients, information about which sky-projection is used, etc), and more physical metadata like "the temperature of the observatory, which slightly changes the shape of the distortion".
which we basically support through dask
Oh, tell me more about that - maybe this is the key for our needs?
I should have said "discrete lazily evaluated form (which we support through dask)". I think we already have what you want in principle (caveats at the end).
Here's an example:
import dask
import numpy as np
import xarray as xr
xr.set_options(display_style="html")
def arbitrary_function(dataset):
return dataset["a"] * dataset["wavelength"] * dataset.attrs["wcs_param"]
ds = xr.Dataset()
# construct a dask array.
# In practice this could represent an on-disk dataset,
# with data reads only occurring when necessary
ds["a"] = xr.DataArray(dask.array.ones((10,)), dims=["wavelength"], coords={"wavelength": np.arange(10)})
# some coordinate system parameter
ds.attrs["wcs_param"] = 1.0
# complicated pixel to world function
# no compute happens since we are working with dask arrays
# so this is quite cheap.
ds.coords["azimuth"] = arbitrary_function(ds)
ds

So you can carry around your coordinate system parameters in the .attrs dictionary and the non-dimensional coordinate azimuth is only evaluated when needed e.g. when plotting
# Both 'a' and 'azimuth' are computed now, since actual values are required to plot
ds.a.plot(x="azimuth")

In practice, there are a few limitations. @djhoese and @snowman2 may have useful perspective here.
Additional info:
PS: If it helps, I'd be happy to chat over skype for a half hour getting you oriented with how we do things.
For reference, here is how CRS information is handled in rioxarray: CRS management docs.
For reference, here is how CRS information is handled in rioxarray: CRS management docs.
Nice! I didn't know you had documented this.
Sorry this is going to get long. I'd like to describe the CRS stuff we deal with and the lessons learned and the decisions I've been fighting with in the geoxarray project (https://github.com/geoxarray/geoxarray). I'm looking at this from a meteorological satellite point of view. @snowman2 please correct me if I'm wrong about anything.
pyproj encapsulates our version of these "complex mathematical functions", although ours seem much simpler. The CRS object can hold things like the model of the Earth to use and other parameters defining the coordinate system like the reference longitude/latitude..attrs. The problem with using .attrs for this is most operations on the DataArray object will make this information disappear (ex. adding two DataArrays)..coords). I figured this would be good because then if you tried to combine two DataArrays on different CRSes, xarray would fail. Turns out xarray will just ignore the difference and drop the crs coordinate so that was no longer my "perfect" option. Additionally, to access the crs object would have to be accessed by doing my_data_arr.coords['crs'].item() because xarray stores the object as a scalar array.As for your use case(s), I'm wondering if an xarray accessor could work around some of the current limitations you're seeing. They could basically be set up like @dcherian described, but "arbitrary_function" could be accessed through x, y, z = my_data_arr.astro.world_coordinates(subpixels=4) or something. You could do my_data_arr.astro.wcs_parameters to get a dictionary of common WCS parameters stored in .attrs. The point being that the accessor could simplify the interface to doing these calculations and accessing these parameters (stored in .coords and .attrs) and maybe make changes to xarray core unnecessary.
I'm wondering if an xarray accessor could work around some of the current limitations you're seeing.
I think this sounds like a good idea.