The docs regarding the Spaceflights tutorial are incomplete, which makes it harder to successfully finish it.
This has been partially discussed in https://github.com/quantumblacklabs/kedro-examples/issues/58 (including the issue reproducibility). Therefore, I will discuss it in a complementary way.
There is apparently an ongoing internal issue about improving the organization and sync between the repos kedro-examples, kedro-training, and kedro-starter-spaceflights (https://github.com/quantumblacklabs/kedro-training/pull/1).
As I understand it, the full Spaceflights repo is moving from kedro-examples/kedro-tutorial to kedro-training/kedro/exercises/spaceflight.
I'm not sure about what is being tracked internally, so I will list what I've found related to Spaceflights' requirements.txt:
- Spaceflights needs kedro[pandas.CSVDataSet,pandas.ExcelDataSet] in its requirements. Still, this is not specified in the docs tutorial. As a result, pandas and xlrd are missing when trying to load the datasets in Set up the data.
  - kedro-examples/kedro-tutorial is also missing this requirement in its src/requirements.txt, which led to https://github.com/quantumblacklabs/kedro-examples/issues/58.
  - kedro-training and kedro-starter-spaceflights already have this requirement in their src/requirements.txt.

1. kedro-examples/kedro-tutorial's requirements.txt should be updated to contain kedro[pandas.CSVDataSet,pandas.ExcelDataSet].
2. The docs should state that kedro[pandas.CSVDataSet,pandas.ExcelDataSet] is required. Set up the spaceflights project#Install project dependencies is probably the right place.
3. Create a new project#kedro install.
4. The docs point to kedro-examples as the full source to the spaceflights project. I'm not sure, but I guess that this will/should eventually be changed to kedro-training/kedro-exercises/spaceflight.

I could work on 1., 2., and 3. if it makes sense (note that they are in 3 different repos).
PS: Sorry for the cross-repos references everywhere. I considered that here was the best place to report it.
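For anyone hitting the same error, a minimal sketch of how to confirm which of the two packages named above (pandas and xlrd) are absent from the active environment. The helper name `missing_modules` is hypothetical and not part of Kedro; it only uses the Python standard library.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pandas and xlrd are the packages reported missing when the
    # kedro[pandas.CSVDataSet,pandas.ExcelDataSet] extras are not installed.
    missing = missing_modules(["pandas", "xlrd"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("pandas and xlrd are importable")
```

If either name is listed, adding kedro[pandas.CSVDataSet,pandas.ExcelDataSet] to the project requirements, as proposed above, should pull it in.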
Thank you for reporting it! We will address your feedback in the docs and spaceflight example code.
Thanks for addressing it, @921kiyo.
If there is still time, I'd like to add three more notes regarding the docs and kedro-training:
- Doc's Data science pipeline#Update dependencies adds the scikit-learn dependency to src/requirements.txt and then runs kedro install. Shouldn't it be src/requirements.in + kedro build-deps && kedro install instead?
- kedro-training also uses scikit-learn and specifies it in the full project's requirements.txt, but no instruction states that sklearn should be installed when following the kedro-training docs.
- The docs reference kedro_tutorial.io.xls_local.ExcelLocalDataSet, which appears to be undefined (well, it is actually defined if you follow kedro-training):

shuttles:
  type: kedro_tutorial.io.xls_local.ExcelLocalDataSet
  filepath: data/01_raw/shuttles.xlsx
  layer: raw
Hey there @falcaopetri.
Thanks for reporting this. It is not too late at all!
I will add your additional comments to our ticket. In the meantime, if you'd like to fix the issues, feel free to make a PR and work on it 😄 We truly appreciate it!
Chat soon.
From @falcaopetri's comment:
Doc's Data science pipeline#Update dependencies adds the scikit-learn dependency to src/requirements.txt and then runs kedro install. Shouldn't it be src/requirements.in + kedro build-deps && kedro install instead?
I have the same question. Adding it to src/requirements.in makes more sense to me too.
@guludo Apologies, just noticed this has been merged into develop, which is not visible on ReadTheDocs. Indeed it should be in src/requirements.in, the next version will have the correct docs.
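As a rough illustration of the workflow confirmed above, here is a small sketch that records a dependency in src/requirements.in without duplicating it. The helper `add_requirement` is hypothetical and not part of Kedro; the follow-up commands are the ones named earlier in this thread.

```python
from pathlib import Path


def add_requirement(requirements_in, requirement):
    """Append `requirement` to the file unless an identical line already exists.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(requirements_in)
    lines = path.read_text().splitlines() if path.exists() else []
    if requirement in (line.strip() for line in lines):
        return False
    lines.append(requirement)
    path.write_text("\n".join(lines) + "\n")
    return True


if __name__ == "__main__":
    # Assumes the script is run from the project root, where src/ exists.
    if add_requirement("src/requirements.in", "scikit-learn"):
        print("Added scikit-learn; now run: kedro build-deps && kedro install")
    else:
        print("scikit-learn is already listed in src/requirements.in")
```

After the file is updated, kedro build-deps && kedro install would be run from the project root, as suggested in the comments above.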
Closing this as resolved through linked PRs/issues, as well as https://github.com/quantumblacklabs/kedro/commit/589d6a7a329f453ac91662814ec044dc41d1063c and https://github.com/quantumblacklabs/kedro/commit/0fd6b623bdaa2f69623ad4fdacb022a5862fb590 . Please feel free to open a new issue if there are other observations!