Kedro: Supporting conda install in kedro install command

Created on 17 Jun 2019 · 14 Comments · Source: quantumblacklabs/kedro

Description

Frequently in data science projects we need to use conda install to handle packages that require configuration / building steps after installation.

Context

I am currently doing a data science project in Kedro which uses deep learning. In this situation using conda install is important as it allows me to handle CUDA/cuDNN dependencies within the ecosystem.

This enhancement fits naturally as most Kedro installations use conda to create virtual environments anyway.

Possible Implementation

If someone creates a file called requirements_conda.txt in the src folder the following code should work to install via pip and conda.

@cli.command()
def install():
    """Install project dependencies from requirements.txt."""
    python_call("pip", ["install", "-U", "-r", "src/requirements.txt"])
    os.system('conda install --file src/requirements_conda.txt --yes')

I'd be happy to contribute the code myself, but I was wondering if this is something that you all envisioned supporting?

Labels: Feature Request, Help Wanted, good first issue

All 14 comments

Hi @evanmiller29, thank you for submitting this PR! We think this is a great idea. @LorenaBalanQB will be able to guide you through this PR, and we are excited to have you work on it.

Great. Happy to help out. Just a word of warning though: it's my first contribution to an open source project, so it'll be a bit noob. @LorenaBalanQB would it be preferable to have requirements_conda as a txt file or a yml file (more consistent with conda install)?

Hey @evanmiller29! That's exciting! We're honoured to be the first project on your journey. :)
To answer your question, whatever feels more natural for conda users - in this case yml file seems to be the standard, so I'd be inclined to go for that option.

@LorenaBalanQB cheers! Yeah, it's gonna be an adventure. I'm trying to run the make commands (make lint, make pytest) but I can't get them working on a Windows system. Are there any FAQs I could read to get this working?

make is a specialty of Unix-like systems, which are much friendlier for development. If there's no easy way for you to switch from Windows to Linux or Mac, I reckon this would be a useful thread, specifically this answer.

Thanks for your help @LorenaBalanQB. I actually went down the docker route so anybody can run the kedro tests on their home machine regardless of their OS. Here's the repo if you're interested: https://github.com/evanmiller29/kedro_testing_docker

Maybe we can follow the convention of it being an environment.yml. I do think it should check that conda is installed and maybe that such a file exists. Maybe for now it should be kedro install --conda to make things a bit more explicit.

@cli.command()
@click.option("--conda", is_flag=True)
def install(conda):
    """Install project dependencies from requirements.txt."""
    # Only touch conda when the flag is passed explicitly.
    if conda:
        python_call("conda", ["install", "-y", "--file", "src/environment.yml"])

    python_call("pip", ["install", "-U", "-r", "src/requirements.txt"])
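The two checks suggested above (conda is installed, and the environment file exists) could be sketched roughly as follows. This is only an illustration, not Kedro code: the function name conda_install_precheck and the injectable which parameter are assumptions made here so the check can be exercised without conda on the machine.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional


def conda_install_precheck(
    env_file: Path,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems that would stop a conda-based install.

    Hypothetical helper: an empty list means both checks passed.
    """
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if which("conda") is None:
        problems.append("conda executable not found on PATH")
    # The environment file has to exist before conda can read it.
    if not env_file.is_file():
        problems.append(f"{env_file} does not exist")
    return problems
```

A caller could print each problem and exit early, for example before the `if conda:` branch runs.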

@marcusinthesky - yep that sounds good. I agree that environment.yml would better suit the usual conda use case.

Just to clarify what I need to go through:

  • Add the option --conda and document in the files suggested by @LorenaBalanQB
  • Not having environment.yml will be one of the main errors caused by using this so I should try and see if it exists prior to run. I feel like this is a good point to build some tests (one with environment.yml, one without etc). Do you guys agree?
  • Another area in which there might be errors could be people using the same formatting between requirements.txt (pip) and requirements.yml (conda) installs. A link / small section would illustrate this pretty well. Would this be in/out of scope for the documentation?

> Not having environment.yml will be one of the main errors caused by using this so I should try and see if it exists prior to run. I feel like this is a good point to build some tests (one with environment.yml, one without etc). Do you guys agree?

That's a good point, might be worth having some e2e tests to capture this. A good place to start would be install.feature in the features directory.
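Alongside the e2e feature file, the "one with environment.yml, one without" cases could also be covered by small unit tests. The sketch below is an assumption-laden illustration: find_environment_file is a hypothetical stand-in for whatever check the install command ends up using, and the src/environment.yml location follows the convention discussed in this thread.

```python
import tempfile
import unittest
from pathlib import Path
from typing import Optional


def find_environment_file(project_root: Path) -> Optional[Path]:
    """Hypothetical helper: return src/environment.yml if present, else None."""
    candidate = project_root / "src" / "environment.yml"
    return candidate if candidate.is_file() else None


class EnvironmentFileTests(unittest.TestCase):
    def test_found_when_present(self):
        with tempfile.TemporaryDirectory() as tmp:
            src = Path(tmp) / "src"
            src.mkdir()
            (src / "environment.yml").write_text("name: demo\n")
            self.assertEqual(
                find_environment_file(Path(tmp)), src / "environment.yml"
            )

    def test_none_when_missing(self):
        with tempfile.TemporaryDirectory() as tmp:
            # No src/environment.yml was created, so nothing should be found.
            self.assertIsNone(find_environment_file(Path(tmp)))
```

These can be run with python -m unittest against the file that contains them.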

> Another area in which there might be errors could be people using the same formatting between requirements.txt (pip) and requirements.yml (conda) installs. A link / small section would illustrate this pretty well. Would this be in/out of scope for the documentation?

You could have some commented out code inside the template yml file, which can serve as an example.

Hi,
Not sure this issue has been resolved. I have seen the change that has been made. But I still get the following error when I run kedro install with an environment.yml file:

[Screenshot of the error, 2020-04-22 at 16:40:54; image not included]

And this is in alignment with https://github.com/conda/conda/issues/6827#issuecomment-365614464

I believe that instead of running conda install, it needs to run conda env create --file envname.yml.

I have the same issue as @Sadatay7

I can work on this. It may be better to use conda env update --file src/environment.yml --prune instead because this command can create a new environment and update an existing environment.
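A minimal sketch of how that command might be assembled and run from Python, assuming plain subprocess rather than Kedro's own helpers. The function names here are made up for illustration; only the conda env update, --file and --prune pieces come from the comment above.

```python
import subprocess
from typing import List


def conda_env_update_command(
    env_file: str = "src/environment.yml", prune: bool = True
) -> List[str]:
    """Build the argument list for `conda env update` (hypothetical helper)."""
    command = ["conda", "env", "update", "--file", env_file]
    if prune:
        # --prune removes installed packages that the file no longer lists.
        command.append("--prune")
    return command


def update_conda_env(env_file: str = "src/environment.yml") -> None:
    """Run the update; requires conda to be available on PATH."""
    # check=True raises CalledProcessError if conda exits with a non-zero status.
    subprocess.run(conda_env_update_command(env_file), check=True)
```

Keeping the command construction separate from the subprocess call makes the flags easy to test without invoking conda.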

Following up on this issue (I am aware that the changes are already incorporated in master):
I used the kedro install version that uses conda env update --file src/environment.yml --prune
After this step, I got the message about activating the newly created environment. After this, however, kedro goes through the process of further processing the dependencies mentioned in requirements.txt.
Is this the intended behaviour?
As far as I understand those packages will anyway not be installed in the env created through the environment.yml file.
For my purposes, I already included all the packages from requirements.txt in environment.yml beforehand.
Deleting or renaming the requirements.txt file only leads to kedro complaining that the file can't be found.

I am just wondering whether it would be better for kedro install to stop after the new env is created from environment.yml and not process requirements.txt as well, and perhaps also to activate the newly created environment in the same step.

I am happy to make the relevant changes and raise a PR.
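One possible reading of this proposal, sketched under assumptions: treat requirements.txt as optional, so that removing it skips the pip step instead of raising an error. The function plan_install_commands is hypothetical and only plans the commands; it does not reflect how kedro install is actually implemented, and it does not attempt to activate the environment.

```python
from pathlib import Path
from typing import List


def plan_install_commands(src_dir: Path, use_conda: bool) -> List[List[str]]:
    """Return the commands an install would run, in order (hypothetical)."""
    commands = []
    env_file = src_dir / "environment.yml"
    requirements = src_dir / "requirements.txt"
    if use_conda and env_file.is_file():
        commands.append(
            ["conda", "env", "update", "--file", str(env_file), "--prune"]
        )
    # Skip the pip step quietly when requirements.txt is absent,
    # rather than failing because the file cannot be found.
    if requirements.is_file():
        commands.append(["pip", "install", "-U", "-r", str(requirements)])
    return commands
```

With this shape, a project that lists everything in environment.yml can simply omit requirements.txt.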

In my experience the most reliable means by which to export and update Anaconda dependencies from file is with conda list --explicit --export > src/environment.yml. This creates a file like this:

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://repo.anaconda.com/pkgs/main/linux-64/_libgcc_mutex-0.1-main.conda
https://repo.anaconda.com/pkgs/main/linux-64/blas-1.0-mkl.conda
https://repo.anaconda.com/pkgs/main/linux-64/ca-certificates-2020.7.22-0.conda

This is not the most human-readable set of dependencies but makes for an easy and reliable install using conda install -y --file=src/environment.yml.
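Since the thread now involves two file formats (a YAML environment file and an explicit spec list like the one above), an installer could pick the conda subcommand based on the file's contents. The following is a rough sketch with hypothetical function names; it assumes that any file without the @EXPLICIT marker is a YAML environment file, which may not hold for every project.

```python
from pathlib import Path
from typing import List


def is_explicit_spec(text: str) -> bool:
    """True if the text contains the @EXPLICIT marker on a line of its own."""
    return any(line.strip() == "@EXPLICIT" for line in text.splitlines())


def conda_command_for(spec_file: Path) -> List[str]:
    """Choose a conda command for the given file (hypothetical helper)."""
    if is_explicit_spec(spec_file.read_text()):
        # Explicit spec lists are installed with `conda install --file`.
        return ["conda", "install", "-y", "--file", str(spec_file)]
    # Assumption: everything else is a YAML environment file.
    return ["conda", "env", "update", "--file", str(spec_file), "--prune"]
```

This avoids passing a YAML environment file to conda install, which is the situation that produced the error reported earlier in the thread.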
