I have a question regarding the usage of Jupyter in Kubeflow on Kubernetes.
When I run "pip install sklearn" in a terminal in Jupyter Notebook, the package installs successfully. But when I restart Kubeflow, sklearn is gone. It does not seem to be stored permanently, and I have to reinstall it. Does anyone know how to solve this problem, please?
In addition, where are the Jupyter notebook data, Katib data and pipeline data stored, respectively?
Cheers.
Hi @AnnieWei58 - I suspect the packages are not available after a restart because they were installed into the container's environment, which doesn't persist between restarts.
One way to address this is to build a custom container image and launch it from the Notebook interface. The instructions here are not totally comprehensive, but they should get you started: https://www.kubeflow.org/docs/notebooks/custom-notebook/
Typically, we create a custom image including most of the common tools our teams utilise, from there, they can further customise their individual environments if required. Often they will support this individual customisation by modifying their conda/pip environments to install to their user home directories, which you can configure to be backed by persistent storage. When backed by persistent storage, they will persist between restarts (as long as you re-attach the correct volume).
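To check whether per-user installs will actually land on persistent storage, you can compare pip's user site-packages directory with the directory where the volume is mounted. The sketch below assumes that mount point is $HOME (in many Kubeflow notebook images the home directory is /home/jovyan, but check your own image); the helper names `is_under` and `user_installs_on_volume` are hypothetical, while `site.getusersitepackages()` is the standard-library call that reports where `pip install --user` puts packages.

```python
import os
import site
from pathlib import Path
from typing import Optional


def is_under(path: str, root: str) -> bool:
    """Return True if `path` is `root` itself or lies somewhere inside it."""
    p = Path(path).expanduser().resolve()
    r = Path(root).expanduser().resolve()
    return p == r or r in p.parents


def user_installs_on_volume(volume_root: Optional[str] = None) -> bool:
    """Check whether pip's per-user (--user) site-packages directory sits under
    `volume_root`, which defaults to $HOME.
    Assumption: $HOME is where the persistent volume is mounted."""
    root = volume_root or os.path.expanduser("~")
    return is_under(site.getusersitepackages(), root)


if __name__ == "__main__":
    print("user site-packages:", site.getusersitepackages())
    print("on the persistent volume?", user_installs_on_volume())
```

If this prints False (for example because PYTHONUSERBASE points somewhere else), packages installed with --user would still be lost on restart.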
When you restart Kubeflow, it also restarts your notebook server. The way the notebook server currently works, restarting it also resets its state (installed libraries, changes to the file system, etc.). I think the workaround is:
1. Attach a Workspace Volume when you create the notebook server.
2. Install packages with "pip install --user". With --user, pip installs the library into your home directory, which is persisted in the Workspace Volume.
This should preserve that state when the notebook server restarts.
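If you prefer to install from a notebook cell rather than the terminal, a minimal sketch of the same --user install is below. Running pip as `sys.executable -m pip` targets the Python interpreter the kernel is actually using. The helper names are hypothetical. Note that the PyPI distribution is named "scikit-learn"; "sklearn" is only its import name.

```python
import subprocess
import sys
from typing import List


def pip_user_install_cmd(packages: List[str]) -> List[str]:
    """Build a `pip install --user` command for the interpreter running this
    kernel, so packages go into the user site-packages under $HOME."""
    if not packages:
        raise ValueError("no packages given")
    # `sys.executable -m pip` uses the kernel's own Python, not whichever
    # `pip` happens to be first on PATH.
    return [sys.executable, "-m", "pip", "install", "--user", *packages]


def pip_user_install(packages: List[str]) -> None:
    """Run the install; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_user_install_cmd(packages))


# Usage in a notebook cell (the pod needs network access to PyPI):
# pip_user_install(["scikit-learn"])
```

If the user site-packages directory did not exist when the kernel started, Python will not have added it to sys.path, so you may need to restart the kernel once before the new package can be imported.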