Kubeflow: [jupyter] packages in python need to be reinstalled

Created on 19 Aug 2019 · 3 Comments · Source: kubeflow/kubeflow

I have a question regarding the usage of jupyter in Kubeflow on Kubernetes.

When I run "pip install sklearn" in the terminal in Jupyter Notebook, the package is installed successfully. But after I restart Kubeflow, the installed sklearn package is gone: it does not seem to be stored permanently, and I have to install it again. Does anyone know how to solve this problem, please?

In addition, where are the Jupyter notebook data, Katib data and pipeline data stored, respectively?

Cheers.

Most helpful comment

When you restart Kubeflow, your notebook server is restarted too. As the notebook server works today, restarting it also resets its state (installed libraries, changes to the file system, etc.). I think the workaround is:

  1. Use a Workspace Volume when you create the notebook server.
  2. Install with pip install --user. With --user, pip installs the library into your home directory, which is persisted in the Workspace Volume.

This should help the state persist when the notebook server is restarted.
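
To check that the two steps above line up, a small sketch like the following can be run from a notebook cell or the Jupyter terminal. It prints the directory that pip install --user targets (via the standard library's site.getusersitepackages()) and tests whether that directory lies under the Workspace Volume mount. The mount path /home/jovyan and the helper name is_under are assumptions for illustration; substitute the mount point of your own notebook server.

```python
import site
from pathlib import PurePosixPath


def is_under(path, root):
    """Return True if `path` is `root` itself or lies somewhere beneath it."""
    path, root = PurePosixPath(path), PurePosixPath(root)
    return path == root or root in path.parents


# Assumed mount point of the Workspace Volume (hypothetical; check your
# notebook server's configuration for the real value).
WORKSPACE_MOUNT = "/home/jovyan"

if __name__ == "__main__":
    # Directory that `pip install --user` writes packages into.
    user_site = site.getusersitepackages()
    print("pip install --user target:", user_site)
    print("inside workspace volume:", is_under(user_site, WORKSPACE_MOUNT))
```

If the second line prints False, packages installed with --user would land outside the persisted volume and would likely be lost on restart.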

All 3 comments

Hi @AnnieWei58 - I suspect the packages are unavailable after a restart because they are installed into the container environment, which doesn't persist between restarts.

One way to address this is to build a custom container image to launch from the Notebook interface. Although not totally comprehensive, the instructions here should get you started: https://www.kubeflow.org/docs/notebooks/custom-notebook/

Typically, we create a custom image that includes most of the common tools our teams use. From there, they can further customise their individual environments if required. Often they support this individual customisation by configuring their conda/pip environments to install to their user home directories, which you can configure to be backed by persistent storage. When backed by persistent storage, these installs persist between restarts (as long as you re-attach the correct volume).
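
As one possible way to point pip at a persistent directory, the sketch below builds a pip install --user command with the PYTHONUSERBASE environment variable set, which Python's site module uses as the base directory for user installs. The function name user_install_command and the path /home/jovyan/.local are assumptions for illustration, not something the thread prescribes. The command is only printed here, not executed.

```python
import os
import sys


def user_install_command(package, user_base):
    """Build (but do not run) a pip command that installs into `user_base`."""
    env = dict(os.environ)
    # PYTHONUSERBASE sets the base directory for user-level installs.
    env["PYTHONUSERBASE"] = user_base
    # Use the current interpreter's pip so the package matches this Python.
    cmd = [sys.executable, "-m", "pip", "install", "--user", package]
    return cmd, env


if __name__ == "__main__":
    # scikit-learn is the PyPI name of the package imported as `sklearn`.
    # The path is a hypothetical directory on the persistent volume.
    cmd, env = user_install_command("scikit-learn", "/home/jovyan/.local")
    print("PYTHONUSERBASE=" + env["PYTHONUSERBASE"], " ".join(cmd))
    # To actually install: subprocess.run(cmd, env=env, check=True)
```

Note that later Python sessions need the same PYTHONUSERBASE value in their environment, otherwise the interpreter will not look in that directory for the installed packages.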
