Maybe we should instead define interfaces to existing packages that take care of reading datasets into common Python data structures. For example, intake is particularly well suited to reading a very diverse set of data in different formats. Another interesting project, focused on satellite datasets, is Open Data Cube.
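To make the idea a bit more concrete, here is a minimal sketch of what such an interface layer could look like. Everything in it is hypothetical (`READERS`, `register_reader`, `load_dataset` are names made up for illustration), and the standard-library `csv` and `json` modules only stand in for the "existing packages"; a real version would presumably wrap something like intake and return richer data structures.

```python
import csv
import io
import json
from typing import Callable, Dict, List, TextIO

# Hypothetical registry mapping a format name to a reader function.
# Each reader delegates to an existing package and returns the same
# common structure: a list of dict records.
Reader = Callable[[TextIO], List[dict]]
READERS: Dict[str, Reader] = {}


def register_reader(fmt: str) -> Callable[[Reader], Reader]:
    """Register a reader function for the given format name."""
    def decorator(func: Reader) -> Reader:
        READERS[fmt] = func
        return func
    return decorator


@register_reader("csv")
def read_csv(stream: TextIO) -> List[dict]:
    # Delegate parsing to the csv module; values stay as strings.
    return list(csv.DictReader(stream))


@register_reader("json")
def read_json(stream: TextIO) -> List[dict]:
    # Delegate parsing to the json module; expects a list of objects.
    return json.load(stream)


def load_dataset(stream: TextIO, fmt: str) -> List[dict]:
    """Read a dataset in its original format, without reformatting it on disk."""
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"no reader registered for format {fmt!r}") from None
    return reader(stream)


# The same (made-up) observations stored in two different formats.
csv_text = "station,tas\nA,281.5\nB,279.0\n"
json_text = '[{"station": "A", "tas": "281.5"}, {"station": "B", "tas": "279.0"}]'

csv_records = load_dataset(io.StringIO(csv_text), "csv")
json_records = load_dataset(io.StringIO(json_text), "json")
```

The calling code only ever sees `load_dataset`, so supporting a new format would mean registering one more adapter rather than reformatting the data beforehand.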
@hb326 thoughts? Comments?
@mattiarighi just answered this question for me offline. His answer is:
because we want to have a pool of observational data
That does not really answer the question, because you can also have a pool of observational data without reformatting it.
I think the real answer is probably more of a perceived run-time advantage: reformatting takes some time, so if you have to do it every time you run a recipe, it could potentially be slower.
I'll take a step back and point to a few things:
Speaking of intake, today's tech talk by DKRZ on this subject might be of interest:
https://www.dkrz.de/up/de-news-and-events/de-tech-talks/de-dkrz-tech-talk-intake-taking-the-pain-out-of-data-access
It should be available on their YouTube channel soon.