Add datatable to Kedro's extras datasets
datatable is derived from R's data.table. As of 20/09/2020 (version 0.11.0), datatable is fully supported on Windows, Linux and macOS and can be installed through pip. Offering datatable in Kedro's extras datasets could bring many advantages when building pipelines:
@lucasjamar Thanks for the suggestion. I've logged the request internally; however, no timeline or priority can be provided at this time. If this is important to you and you have an implementation in mind, please feel free to go ahead and contribute this feature to Kedro 🙂
I'm interested in trying to implement it.
@DmitriiDeriabinQB Thanks, and sorry for not getting back earlier. I have already developed versions of the CSV dataset and the Excel dataset that I use as extras datasets in my current Kedro project. I haven't had more time to spend on contributing them because of some issues. @mlisovyi, would you be interested in my current code for these two datasets, or would you like to start from scratch? This zip is the folder that I currently place in src//extras/dataset. The code is 95% derived from the pandas equivalents. The current implementation of CSVdataset works fine for me, but I still haven't used the Exceldataset.
datatable.zip
Hi @lucasjamar. Thanks for the code! What challenges did you encounter? Is there anything that would prevent you from pushing it "as is"?
Hi @mlisovyi,
CSVdataset appears to work fine, and I think it could go straight to testing (let me know if you spot any issues). I just haven't been able to set up the Kedro environment on my Windows machine. However, I haven't been able to make Exceldataset work (for either xlsx or xls). I think it's an issue with the engine, but I haven't found any Excel engine options in fread:

https://datatable.readthedocs.io/en/latest/api/dt/fread.html
https://github.com/h2oai/datatable/blob/140a3abdae94e77badf608936b7bdbd5b091d6e5/src/core/read/py_fread.cc#L239-L286
@lucasjamar Thanks for the feedback. Yes, I had the same problem. In the end, I decided to narrow it down to CSV only. What do you think?
In the meantime, I've been wondering whether, in general, one wants the actual reader to use datatable. If the purpose is to benefit from faster reading, then it's unavoidable. If the purpose is to make use of the R-user-friendly API for tables, then it may make more sense to have wrappers around the pandas dataset classes that convert between pd.DataFrame and datatable.Frame (if this can be done efficiently) after reading and before writing.
@mlisovyi Yep, I think it's best to stick to just CSV at first. In my opinion, the main advantage of datatable is that it is really fast at reading CSVs, and it has an append parameter for writing. This is quite practical for data collection and concatenation tasks. Coming from R, I don't believe that someone familiar with data.table would find the pandas syntax any harder than the datatable syntax. Plus, datatable is still in beta. In my opinion, it lacks a few key features to be a true equivalent to pandas; most notably, it does not have a date-time type yet.
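For the CSV-only scope, a minimal sketch of such a dataset could look like the following. It is not the code from datatable.zip, and the class name `DatatableCSVDataSet` is hypothetical. It mirrors the `_load`/`_save`/`_describe` methods that Kedro datasets implement instead of subclassing Kedro's base class (named `AbstractDataSet` or `AbstractDataset` depending on the Kedro version). Reading uses `datatable.fread`, and writing uses `Frame.to_csv`, whose `append` option is passed through `save_args`.

```python
import os
from copy import deepcopy
from typing import Any, Dict, Optional


class DatatableCSVDataSet:
    """Hypothetical CSV dataset backed by datatable (sketch, not Kedro's API).

    In a real project this would subclass Kedro's abstract dataset class.
    """

    DEFAULT_LOAD_ARGS: Dict[str, Any] = {}
    DEFAULT_SAVE_ARGS: Dict[str, Any] = {"append": False}

    def __init__(
        self,
        filepath: str,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        self._filepath = filepath
        # User-supplied args override the defaults, as in Kedro's pandas datasets.
        self._load_args = {**deepcopy(self.DEFAULT_LOAD_ARGS), **(load_args or {})}
        self._save_args = {**deepcopy(self.DEFAULT_SAVE_ARGS), **(save_args or {})}

    def _load(self) -> Any:
        import datatable as dt  # lazy import so the class is usable without datatable
        # fread is datatable's fast multi-threaded CSV reader.
        return dt.fread(self._filepath, **self._load_args)

    def _save(self, data: Any) -> None:
        parent = os.path.dirname(self._filepath)
        if parent:
            os.makedirs(parent, exist_ok=True)
        # With append=True, to_csv adds rows to an existing file instead of overwriting it.
        data.to_csv(self._filepath, **self._save_args)

    def _exists(self) -> bool:
        return os.path.isfile(self._filepath)

    def _describe(self) -> Dict[str, Any]:
        return dict(
            filepath=self._filepath,
            load_args=self._load_args,
            save_args=self._save_args,
        )

    # Public helpers so the sketch can be used without Kedro's base class.
    def load(self) -> Any:
        return self._load()

    def save(self, data: Any) -> None:
        self._save(data)
```

For the data-collection use case, something like `DatatableCSVDataSet("data/01_raw/collected.csv", save_args={"append": True})` would accumulate rows across saves. According to datatable's documentation, `to_csv` then writes the header only when the file does not exist yet; this is worth double-checking against the installed version.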