Kedro: [KED-926] Anaconda Intake Integration

Created on 20 Jun 2019 · 8 Comments · Source: quantumblacklabs/kedro

The intake project, led by Anaconda, provides rich functionality for data catalogs. Consider using that instead of a homebrew approach.
https://www.anaconda.com/intake-taking-the-pain-out-of-data-access/

Feature Request Sprint Activity Discussion Opportunity Roadmap good first issue

jamesmyatt

👍5

Most helpful comment

I have only just now become aware of this issue. Please let me know what you need from Intake to ease its adoption, if you still think it's a good idea. Note that it's probably easy to use your existing prescriptions, but create an Intake Catalog from them. As discussed in the linked issue above, the most immediate advantage might be hooking into fsspec for loading from various storage backends (not that you necessarily need Intake to do this).

(Also, Intake does write, but only in one specific data format for each "container", e.g., parquet for dataframe-like datasets https://intake.readthedocs.io/en/latest/persisting.html#export )

(EDIT: I am the maintainer of Intake, in case that wasn't obvious :) )

martindurant on 28 Feb 2020

🎉2

All 8 comments

Hi @jamesmyatt! Thank you so much for submitting this feature request. We've been checking out Intake since we released. Have you had any experience using it? What makes it great to use?

yetudada on 25 Jun 2019

I haven't used it in anger, but I've been following it loosely since Anaconda announced it.

It looks like it has a significant overlap with your catalog and it makes sense to avoid re-inventing the wheel. It also has easy integration with other frameworks like Dask.

jamesmyatt on 25 Jun 2019

🎉1

Hi @jamesmyatt, thanks for your suggestion. intake indeed looks very promising and we have it on our radar!

The biggest difference is that intake is for reading data only - the data catalog allows specifying both read & write datasets.

It should be fairly easy to create an IntakeDataSet; I believe integrating the two here might be the best approach.
A more involved contribution might be populating a kedro.io.DataCatalog from an intake catalog.

We would love contributions in this space if that is of interest to you! Please let us know if you plan on working on something so that we avoid duplication of work :)

Thank you again and welcome to our community!

tsanikgr on 25 Jun 2019

👍2
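
The two ideas in the comment above (an IntakeDataSet wrapper, and populating a catalog from an intake catalog) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class body, `ReadOnlyDataSetError`, `datasets_from_intake` and `FakeSource` are hypothetical names, and the class only mirrors the `_load` / `_save` / `_describe` hooks of Kedro's dataset interface rather than inheriting from it, so the example runs without Kedro or Intake installed. It assumes only that a source exposes `read()`, and that a catalog yields entry names when iterated and returns a source via `catalog[name]`.

```python
from typing import Any, Dict


class ReadOnlyDataSetError(Exception):
    """Hypothetical error raised when saving through a read-only dataset."""


class IntakeDataSet:
    """Hypothetical read-only dataset wrapping an Intake-style source.

    Mirrors the _load/_save/_describe hooks of a Kedro dataset, but does
    not subclass anything from Kedro so the sketch stays self-contained.
    """

    def __init__(self, source: Any) -> None:
        # Assumption: `source` exposes a .read() method returning the data.
        self._source = source

    def _load(self) -> Any:
        return self._source.read()

    def _save(self, data: Any) -> None:
        # This sketch treats the source as read-only.
        raise ReadOnlyDataSetError("IntakeDataSet does not support saving")

    def _describe(self) -> Dict[str, Any]:
        return {"source": type(self._source).__name__}

    # Public entry points that delegate to the hooks above.
    def load(self) -> Any:
        return self._load()

    def save(self, data: Any) -> None:
        self._save(data)


def datasets_from_intake(catalog: Any) -> Dict[str, IntakeDataSet]:
    """Wrap every entry of a catalog-like mapping in an IntakeDataSet.

    Assumption: iterating `catalog` yields entry names and `catalog[name]`
    returns an object with .read().
    """
    return {name: IntakeDataSet(catalog[name]) for name in catalog}


class FakeSource:
    """Stand-in for a real Intake source, used only for this demo."""

    def __init__(self, rows: list) -> None:
        self._rows = rows

    def read(self) -> list:
        return list(self._rows)


# A plain dict plays the role of an intake catalog here.
fake_catalog = {"cars": FakeSource([{"make": "volvo", "mpg": 30}])}
datasets = datasets_from_intake(fake_catalog)
cars = datasets["cars"].load()
```

In a real integration the class would presumably subclass Kedro's abstract dataset type, and the resulting dictionary could be handed to a kedro.io.DataCatalog; neither step is shown because the exact signatures depend on the Kedro version.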

I've updated the title with our internal ticket number to keep track of this more easily. :)

lorenabalan on 6 Aug 2019

re: your point about using fsspec, that's exactly what we did in our latest release (without using Intake) and it's awesome, thanks for your work on it! 🎉

ghost on 28 Feb 2020
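
For readers unfamiliar with fsspec, the storage-backend point discussed above can be sketched roughly as follows. This is not Kedro's actual implementation; it only shows that `fsspec.open` picks a backend from the URL scheme, so the same loading function works for local paths, in-memory files, or remote stores. The helper `load_rows` and the `memory://kedro-demo/cars.csv` URL are made up for the example, and the in-memory filesystem is used so nothing touches disk or the network.

```python
import csv

import fsspec


def load_rows(url: str) -> list:
    """Read a CSV file from any fsspec-supported location into a list of dicts."""
    # fsspec chooses the filesystem implementation from the URL scheme.
    with fsspec.open(url, mode="r") as f:
        return list(csv.DictReader(f))


# Write a small file to fsspec's in-memory filesystem, then read it back.
url = "memory://kedro-demo/cars.csv"
with fsspec.open(url, mode="w") as f:
    f.write("make,mpg\nvolvo,30\n")

rows = load_rows(url)
```

Swapping the URL for a local path or a remote URL would, in principle, leave `load_rows` unchanged, provided the matching backend package is installed.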

Glad to hear it!

martindurant on 28 Feb 2020

@martindurant Glad to see you in this thread. We've been thinking internally about how best to integrate with intake, but since we've been focusing mainly on other things recently, we haven't made much progress on ideas. We're really open to ideas on how we can leverage intake beyond fsspec, which we found very useful indeed. Great work!