This line in library code makes assumptions about the user code (i.e. that the directory src exists). This used to be in kedro_cli.py where all "frameworky" decisions are made, but now it is hidden (and undocumented), and the user is no longer able to change it.
I'm working on a project where src does not exist, so I would like to modify the Python path myself (or at least control what gets added).
I understand that this is probably there to make things work both in Jupyter/IPython and when running CLI commands; however, assumptions about user code should not be in library code.
Am I correct that you have the .kedro.yml file outside of src, or that you don't want load_context to modify the Python path? It is true that load_context is based on the assumption that the .kedro.yml file is under project_path/src/ (we used to rely on the location of kedro_cli.py before introducing .kedro.yml).
If you do not want load_context to modify the Python path but still want to instantiate KedroContext, you could directly instantiate the ProjectContext class.
You are technically correct; however, that line is only used to update the Python path, and the user can set the Python path themselves and put the source wherever they want.
Indeed - any particular reason we do not want to surface this in kedro_cli.py? It's kind of hidden in load_context (and there is no mention of modifying the path in the docstring)
I am currently working around this as you suggested, but it is a bit ugly. In my case, I call sys.path.insert(0, path) in kedro_cli.py, but because load_context is called afterwards, my entry gets pushed to second place in the list and the wrong object is imported:
project/
|-- src/
|   `-- file.py   (defines MyObject)
`-- file.py       (defines MyObject)
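The ordering problem comes from how sys.path.insert(0, ...) stacks entries: whatever is inserted at index 0 last wins the import lookup. A minimal illustration, assuming load_context prepends src as described above, and using a plain list and the hypothetical paths /project and /project/src in place of the real sys.path:

```python
def prepend(search_path, entry):
    """Insert entry at the front of search_path, as sys.path.insert(0, entry) would."""
    search_path.insert(0, entry)
    return search_path


# Hypothetical paths standing in for the user's workaround and load_context's own insert.
search_path = ["/usr/lib/python3/site-packages"]
prepend(search_path, "/project")      # the workaround in kedro_cli.py runs first
prepend(search_path, "/project/src")  # load_context runs afterwards and prepends src

# The first matching entry wins at import time, so `import file` would now
# resolve to /project/src/file.py rather than /project/file.py.
print(search_path[:2])  # prints ['/project/src', '/project']
```

Because the last prepend wins, any workaround has to run after load_context (or remove the src entry) to take precedence.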
@tsanikgr Thanks for raising this! We're going to put it on our next sprint to discuss this and get back to you with a proposed action.
For what it's worth, our project template does not include a src directory.
More (very valid) frustration from users about unintended side effects: https://github.com/quantumblacklabs/kedro/issues/161.
however assumptions about user code should not be in library code
It is true that there should be no assumptions about the user code in the library, but load_context is framework code, not library code. The library classes and code are meant to be as flexible as possible, so people can use them the way they see fit. The framework code, on the other hand, is meant to impose structure (that's what frameworks are for) and make the behaviour consistent between different entrypoints.
We always need to make assumptions when we want to give the user a pre-built structure to work with, and perhaps provide some level of extensibility and configuration. At the moment load_context makes all the assumptions (not just the src one, but also the .kedro.yml one), while extensibility is provided by extending the KedroContext class in your project.
We're planning to redesign the framework part of Kedro soon, since we have identified a few ways to improve it, but this won't happen until the first quarter of 2020 at the earliest. Meanwhile, if people need to create their own framework on top of Kedro's library code, the easiest way is to write their own context-creating function and ditch the one we have provided.
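As a rough illustration of writing your own context-creating function, the sketch below never touches sys.path unless the caller explicitly lists directories, and leaves the choice of context class to the caller. The function create_context, its parameters and the DummyContext class are hypothetical names for this example; they are not part of Kedro's API.

```python
import sys
from pathlib import Path
from typing import Any, Callable, Iterable, Optional


def create_context(
    project_path: Path,
    context_class: Callable[..., Any],
    extra_paths: Optional[Iterable[str]] = None,
    **kwargs: Any,
) -> Any:
    """Instantiate a context object without any implicit assumption about `src`.

    Only directories listed in `extra_paths` (relative to `project_path`) are
    prepended to sys.path, and only if they are not already present.
    """
    project_path = Path(project_path).expanduser().resolve()
    for rel in extra_paths or ():
        entry = str(project_path / rel)
        if entry not in sys.path:
            sys.path.insert(0, entry)
    return context_class(project_path=project_path, **kwargs)


class DummyContext:
    """Hypothetical stand-in for a project-specific context class."""

    def __init__(self, project_path: Path, env: Optional[str] = None):
        self.project_path = project_path
        self.env = env


before = list(sys.path)
ctx = create_context(Path("."), DummyContext, env="local")  # no extra_paths given
```

Making the path changes an explicit argument keeps the "frameworky" decision visible at the call site, which is the control asked for earlier in this thread.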
whatever ends up being polluted becomes framework code and hence the pollution is justified
This argument is a very unfair and incorrect interpretation of my comment. Framework code, by definition, is the code that creates the structure of your application (hence the word framework). There is no intermingling of library code with framework code; no library components depend on the existence of a framework. You can find more information here about the current architecture of Kedro and how its components interact with each other.
People can still just pip install kedro and decide not to use the template and the CLI; I fail to see how the existence of KedroContext prevents that. We have also provided the framework code on user demand: most of our users don't want to decide how to structure their applications and are mostly busy with the business logic of their pipelines.
For more advanced users there's always the option of dropping the framework and creating their own structure (or framework), be it through custom project templates, custom context-like structures or whatever they see fit.
Redesigning the framework part of Kedro is needed because our users have identified many additional use cases where the framework could simplify their work. However, we are still exploring what that restructuring should look like in order to be more extensible and to fit the most common use cases without any unnecessary configuration. We will provide more details once we have the issue well documented.
First, thank you for your work on Kedro!
We (with @bengpotter) have run into the issue described here when trying to use Kedro on an existing codebase that did not follow the src convention but instead used the package name as the source directory. We are appending to sys.path as a workaround so we can still use the provided context code, but we were wondering whether you have any update on the upcoming changes to the framework?
@philippegr Thank you for raising this! As @idanov mentioned above, the architecture change to the framework code is planned for the first quarter of this year, mainly to improve the usability of KedroContext (and load_context, etc.). Hopefully we will be able to release it in one of the next major releases in the near future :)
@philippegr My team's projects currently all run from a modified form of the kedro new template and do not use src or touch sys.path. We have a custom object that stores all of our Kedro objects in one place, along with run and several other methods (having everything wrapped in one object is quite handy).
In essence, this is how we load the context without touching the path:
import os
from pathlib import Path

from kedro.context import load_context

project_path = Path(__file__).parents[1].resolve()  # needs to be the directory with your .kedro.yml file and conf/
_working_dir = os.getcwd()  # save the current working directory before kedro changes it
try:
    context = load_context(project_path)
finally:
    os.chdir(_working_dir)  # move back to the original working directory, even if loading fails
@921kiyo I'm excited to see the upcoming architecture changes this quarter! Thanks for all of the hard work.
We have made the source directory configurable with the source_dir key in .kedro.yml in https://github.com/quantumblacklabs/kedro/commit/8de924819ae7e622a5a73b00f742e77efb936438, so you can customise it like
source_dir: .
or
source_dir: src/nested/
if src doesn't suit your use case.
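To make the effect of source_dir concrete, here is a minimal sketch of how such a value could be resolved against the project root. It assumes the path is relative to the directory containing .kedro.yml and falls back to src when the key is absent; the helper resolve_source_dir, the plain-dict config and the /tmp/my_project root are hypothetical, so check the linked commit for Kedro's actual behaviour.

```python
from pathlib import Path
from typing import Mapping


def resolve_source_dir(project_path: Path, kedro_yml: Mapping[str, str]) -> Path:
    """Return the absolute source directory for a project.

    Assumes `source_dir` is relative to the project root (where .kedro.yml lives)
    and defaults to "src" when the key is missing.
    """
    source_dir = kedro_yml.get("source_dir", "src")
    return (Path(project_path) / source_dir).resolve()


root = Path("/tmp/my_project")  # hypothetical project root
default_dir = resolve_source_dir(root, {})                          # default layout
flat_dir = resolve_source_dir(root, {"source_dir": "."})            # sources at the root
nested_dir = resolve_source_dir(root, {"source_dir": "src/nested/"})
```

Resolving to an absolute path up front means the same entry would be added to sys.path no matter which directory the CLI or a notebook is started from.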
It will be released in the next release, so I'm closing this issue, but do let us know if you have any questions/comments. We are still continuing on the framework redesign, so more refactoring is coming soon :)
Thank you very much. Looking forward to this next release!