/kind bug
What steps did you take and what happened:
Full Kubeflow deployment (kfctl v1.1.0; see Environment below), then installed the MPI operator with:
wget https://github.com/kubernetes-sigs/kustomize/releases/download/kustomize/v3.8.7/kustomize_v3.8.7_linux_amd64.tar.gz
tar xzf ./kustomize_v*_linux_amd64.tar.gz
mv kustomize "${KUSTOMIZE}"
mkdir -p "${KUBEFLOW_MPI_DIR}"
cd "${KUBEFLOW_MPI_DIR}"
git clone "${KUBEFLOW_MPI_MANIFESTS_REPO}"
cd manifests/mpi-job/mpi-operator
"${KUSTOMIZE}" build base | kubectl apply -f -
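The steps above depend on three variables whose values are not given in the report. A minimal bash sketch, assuming the same variable names, that fails early if any of them is unset and then waits for the operator to roll out. `check_vars` and `install_mpi_operator` are hypothetical helper names, and the `mpi-operator` Deployment in the `kubeflow` namespace is inferred from the pod name in the listing, not confirmed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the reporter's install steps. The variable names
# come from the report; their values are NOT given there and must be set by you.

# Print each named variable that is unset or empty; return 1 if any is missing.
check_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Body is a subshell so `set -eo pipefail` and `cd` do not leak into the caller.
install_mpi_operator() (
  set -eo pipefail
  check_vars KUSTOMIZE KUBEFLOW_MPI_DIR KUBEFLOW_MPI_MANIFESTS_REPO
  wget https://github.com/kubernetes-sigs/kustomize/releases/download/kustomize/v3.8.7/kustomize_v3.8.7_linux_amd64.tar.gz
  tar xzf ./kustomize_v*_linux_amd64.tar.gz
  mv kustomize "${KUSTOMIZE}"
  mkdir -p "${KUBEFLOW_MPI_DIR}"
  cd "${KUBEFLOW_MPI_DIR}"
  git clone "${KUBEFLOW_MPI_MANIFESTS_REPO}"
  cd manifests/mpi-job/mpi-operator
  "${KUSTOMIZE}" build base | kubectl apply -f -
  # Assumption: Deployment "mpi-operator" in namespace "kubeflow", inferred
  # from the pod name mpi-operator-5559945c44-77mr6; adjust if yours differs.
  kubectl -n kubeflow rollout status deployment/mpi-operator --timeout=180s
)
```

Usage: export the three variables, then run `install_mpi_operator`. With `pipefail` set, a failing `kustomize build` aborts the run instead of being masked by `kubectl apply`.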
This results in:
W1127 15:51:45.096400 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
runtime: mlock of signal stack failed: 12
runtime: increase the mlock limit (ulimit -l) or
runtime: update your kernel to 5.3.15+, 5.4.2+, or 5.5+
fatal error: mlock failed
runtime stack:
runtime.throw(0x12223db, 0xc)
/usr/local/go/src/runtime/panic.go:1112 +0x72
runtime.mlockGsignal(0xc0003b4000)
/usr/local/go/src/runtime/os_linux_x86.go:72 +0x107
runtime.mpreinit(0xc000680000)
[... omitted]
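The runtime message names two remedies: raising the locked-memory limit (`ulimit -l`) or moving to a kernel at 5.3.15+, 5.4.2+, or 5.5+. A bash sketch that checks a node against both hints; the function names are hypothetical. It compares version strings only, so it cannot see fixes a distribution has backported into a lower-numbered kernel, and the `ulimit -l` of an interactive shell is not necessarily the limit a container process inherits.

```shell
#!/usr/bin/env bash
# Sketch: compare a kernel release string with the versions listed in the Go
# runtime message (5.3.15+, 5.4.2+, 5.5+) and show the current memlock limit.
# This is a plain version comparison; distribution backports are invisible to it.

# Return 0 if the release (e.g. "5.4.0-54-generic") is at or above one of the
# listed versions, 1 otherwise (including for kernels older than 5.3).
kernel_meets_go_hint() {
  local major minor patch rest
  IFS='.-' read -r major minor patch rest <<< "$1"
  # Keep only leading digits; missing parts count as 0.
  major=${major%%[!0-9]*}; minor=${minor%%[!0-9]*}; patch=${patch%%[!0-9]*}
  major=${major:-0}; minor=${minor:-0}; patch=${patch:-0}
  if (( major > 5 )); then return 0; fi
  if (( major < 5 )); then return 1; fi
  if (( minor >= 5 )); then return 0; fi
  if (( minor == 4 && patch >= 2 )); then return 0; fi
  if (( minor == 3 && patch >= 15 )); then return 0; fi
  return 1
}

report_mlock_env() {
  local release
  release=$(uname -r)
  if kernel_meets_go_hint "${release}"; then
    echo "kernel ${release}: at or above a version listed by the Go runtime"
  else
    echo "kernel ${release}: below the versions listed by the Go runtime"
  fi
  # Soft limit on locked memory for this shell, in KiB (or "unlimited").
  echo "ulimit -l: $(ulimit -l)"
}

report_mlock_env
```

Run it on each node that schedules the crashing pods; a low `ulimit -l` together with a "below" result points at the cause the runtime message describes.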
What did you expect to happen:
Kubeflow runs successfully.
Anything else you would like to add:
Pods:
$ kubectl -n kubeflow get pods
NAME                                                       READY   STATUS             RESTARTS   AGE
admission-webhook-bootstrap-stateful-set-0                 1/1     Running            14         104m
admission-webhook-deployment-5fcc8b58dd-g7x2s              1/1     Running            0          4m26s
application-controller-stateful-set-0                      1/1     Running            9          107m
argo-ui-684bcb587f-lhw6w                                   1/1     Running            9          104m
centraldashboard-7f4c448d-wff6r                            1/1     Running            9          104m
jupyter-web-app-deployment-cdc856d5-99x5n                  1/1     Running            9          104m
katib-controller-75c8d47f8c-zhvzv                          1/1     Running            10         104m
katib-db-manager-6c88c68d79-nd2zf                          1/1     Running            19         104m
katib-mysql-858f68f588-2f49z                               1/1     Running            9          104m
katib-ui-68f59498d4-9749z                                  1/1     Running            9          104m
kfserving-controller-manager-0                             2/2     Running            18         104m
metacontroller-0                                           1/1     Running            10         104m
metadata-db-57dbbcd9c9-58fvf                               1/1     Running            8          104m
metadata-envoy-deployment-776686f9cd-rrfqb                 1/1     Running            9          104m
metadata-grpc-deployment-7db798d964-b8d5j                  1/1     Running            23         104m
minio-648f66c8f-4pzmz                                      1/1     Running            9          104m
ml-pipeline-5695d79dc5-qwfdz                               1/1     Running            10         104m
ml-pipeline-persistenceagent-59965f7db7-jgprm              1/1     Running            13         104m
ml-pipeline-scheduledworkflow-5476d5cc5-wtlmr              1/1     Running            9          104m
ml-pipeline-ui-76df7bb8f6-x9ljs                            1/1     Running            9          104m
ml-pipeline-viewer-controller-deployment-978d7b46c-btz9n   0/1     CrashLoopBackOff   19         73m
ml-pipeline-visualizationserver-7bb994d87b-v8tgt           1/1     Running            9          104m
mpi-operator-5559945c44-77mr6                              1/1     Running            12         95m
mysql-8465c44858-7cx8t                                     1/1     Running            8          104m
notebook-controller-deployment-d56997676-b725v             1/1     Running            9          104m
profiles-deployment-5865c8d5ff-vfwss                       2/2     Running            20         104m
pytorch-operator-b79799447-4fhkc                           1/1     Running            13         104m
seldon-controller-manager-5fc5dfc86c-n5p4d                 1/1     Running            11         104m
spark-operatorsparkoperator-67c6bc65fb-hd52d               1/1     Running            9          104m
spartakus-volunteer-6ddc7b6676-zx96m                       1/1     Running            9          104m
tf-job-operator-5c97f4bf7-zxt6d                            1/1     Running            13         104m
workflow-controller-5c7cc7976d-8djmf                       1/1     Running            9          104m
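In this listing only ml-pipeline-viewer-controller is in CrashLoopBackOff, while most other pods have restarted between 8 and 23 times. A minimal bash sketch for pulling the crash output: it filters the `kubectl get pods` table for pods that are not fully ready or not Running, then prints the logs of each one's previous container with `kubectl logs --previous`. The helper names are hypothetical, and parsing the human-readable table is only suitable for a quick look (`-o json` would be sturdier).

```shell
#!/usr/bin/env bash
# Sketch: read `kubectl get pods` table output on stdin and print the names of
# pods whose READY count is incomplete or whose STATUS is not "Running".
not_ready_pods() {
  # Skip the header row; column 2 is "ready/total", column 3 is STATUS.
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") print $1 }'
}

# For each such pod, show the tail of the previous (crashed) containers' logs.
show_previous_logs() {
  local ns=$1 pod
  kubectl -n "${ns}" get pods | not_ready_pods | while read -r pod; do
    echo "=== ${pod}"
    kubectl -n "${ns}" logs "${pod}" --previous --all-containers --tail=50 < /dev/null
  done
}
```

Usage: `show_previous_logs kubeflow`.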
Environment:
kfctl version: kfctl v1.1.0-0-g9a3621e
Kubernetes platform: vanilla
Kubernetes version (kubectl version): 1.18.9
OS (/etc/os-release): Ubuntu 20.04, 5.4.20
@twittidai: The solution from https://github.com/NVIDIA/deepops/issues/771 also solved my problem.