Kubeflow: runtime: mlock of signal stack failed: 12 in ml-pipeline-viewer-controller-deployment

Created on 27 Nov 2020 · 1 Comment · Source: kubeflow/kubeflow

/kind bug

What steps did you take and what happened:
Full Kubeflow deployment (versions are listed under Environment below), followed by:

wget https://github.com/kubernetes-sigs/kustomize/releases/download/kustomize/v3.8.7/kustomize_v3.8.7_linux_amd64.tar.gz
tar xzf ./kustomize_v*_linux_amd64.tar.gz
mv kustomize ${KUSTOMIZE}

mkdir -p ${KUBEFLOW_MPI_DIR}
cd ${KUBEFLOW_MPI_DIR}
git clone ${KUBEFLOW_MPI_MANIFESTS_REPO}
cd manifests/mpi-job/mpi-operator
${KUSTOMIZE} build base | kubectl apply -f -

results in

W1127 15:51:45.096400       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
runtime: mlock of signal stack failed: 12
runtime: increase the mlock limit (ulimit -l) or
runtime: update your kernel to 5.3.15+, 5.4.2+, or 5.5+
fatal error: mlock failed

runtime stack:
runtime.throw(0x12223db, 0xc)
        /usr/local/go/src/runtime/panic.go:1112 +0x72
runtime.mlockGsignal(0xc0003b4000)
        /usr/local/go/src/runtime/os_linux_x86.go:72 +0x107
runtime.mpreinit(0xc000680000)
[... omitted]
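
The runtime message above names two things to check: the mlock limit (`ulimit -l`) and the kernel version. The following is a minimal sketch for inspecting both on a node. The function name `kernel_meets_threshold` is hypothetical, and it encodes only the three thresholds quoted in the message (5.3.15+, 5.4.2+, 5.5+); it does not claim to reproduce the Go runtime's own check.

```shell
#!/bin/sh
# Sketch only: compares a kernel release string against the thresholds
# quoted in the runtime message (5.3.15+, 5.4.2+, or 5.5+).
# kernel_meets_threshold is a hypothetical helper, not part of any tool.
kernel_meets_threshold() {
  release=${1%%-*}            # drop any suffix after the first "-"
  old_ifs=$IFS
  IFS=.
  set -- $release             # split "MAJOR.MINOR.PATCH" on dots
  IFS=$old_ifs
  major=${1:-0}; minor=${2:-0}; patch=${3:-0}
  minor=${minor%%[!0-9]*}     # keep only the leading digits
  patch=${patch%%[!0-9]*}
  minor=${minor:-0}; patch=${patch:-0}

  [ "$major" -gt 5 ] && return 0
  [ "$major" -lt 5 ] && return 1
  [ "$minor" -ge 5 ] && return 0
  [ "$minor" -eq 4 ] && [ "$patch" -ge 2 ] && return 0
  [ "$minor" -eq 3 ] && [ "$patch" -ge 15 ] && return 0
  return 1
}

echo "memlock limit (ulimit -l): $(ulimit -l)"
echo "kernel release (uname -r): $(uname -r)"
if kernel_meets_threshold "$(uname -r)"; then
  echo "kernel release meets one of the quoted thresholds"
else
  echo "kernel release is below the quoted thresholds"
fi
```

Run it on the node that hosts the crashing pod. Note that the limit printed here is that of the current shell, which may differ from the limit applied inside the container.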

What did you expect to happen:
Kubeflow runs successfully.

Anything else you would like to add:
Pods:

$ kubectl -n kubeflow get pods
NAME                                                       READY   STATUS             RESTARTS   AGE
admission-webhook-bootstrap-stateful-set-0                 1/1     Running            14         104m
admission-webhook-deployment-5fcc8b58dd-g7x2s              1/1     Running            0          4m26s
application-controller-stateful-set-0                      1/1     Running            9          107m
argo-ui-684bcb587f-lhw6w                                   1/1     Running            9          104m
centraldashboard-7f4c448d-wff6r                            1/1     Running            9          104m
jupyter-web-app-deployment-cdc856d5-99x5n                  1/1     Running            9          104m
katib-controller-75c8d47f8c-zhvzv                          1/1     Running            10         104m
katib-db-manager-6c88c68d79-nd2zf                          1/1     Running            19         104m
katib-mysql-858f68f588-2f49z                               1/1     Running            9          104m
katib-ui-68f59498d4-9749z                                  1/1     Running            9          104m
kfserving-controller-manager-0                             2/2     Running            18         104m
metacontroller-0                                           1/1     Running            10         104m
metadata-db-57dbbcd9c9-58fvf                               1/1     Running            8          104m
metadata-envoy-deployment-776686f9cd-rrfqb                 1/1     Running            9          104m
metadata-grpc-deployment-7db798d964-b8d5j                  1/1     Running            23         104m
minio-648f66c8f-4pzmz                                      1/1     Running            9          104m
ml-pipeline-5695d79dc5-qwfdz                               1/1     Running            10         104m
ml-pipeline-persistenceagent-59965f7db7-jgprm              1/1     Running            13         104m
ml-pipeline-scheduledworkflow-5476d5cc5-wtlmr              1/1     Running            9          104m
ml-pipeline-ui-76df7bb8f6-x9ljs                            1/1     Running            9          104m
ml-pipeline-viewer-controller-deployment-978d7b46c-btz9n   0/1     CrashLoopBackOff   19         73m
ml-pipeline-visualizationserver-7bb994d87b-v8tgt           1/1     Running            9          104m
mpi-operator-5559945c44-77mr6                              1/1     Running            12         95m
mysql-8465c44858-7cx8t                                     1/1     Running            8          104m
notebook-controller-deployment-d56997676-b725v             1/1     Running            9          104m
profiles-deployment-5865c8d5ff-vfwss                       2/2     Running            20         104m
pytorch-operator-b79799447-4fhkc                           1/1     Running            13         104m
seldon-controller-manager-5fc5dfc86c-n5p4d                 1/1     Running            11         104m
spark-operatorsparkoperator-67c6bc65fb-hd52d               1/1     Running            9          104m
spartakus-volunteer-6ddc7b6676-zx96m                       1/1     Running            9          104m
tf-job-operator-5c97f4bf7-zxt6d                            1/1     Running            13         104m
workflow-controller-5c7cc7976d-8djmf                       1/1     Running            9          104m
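
In a listing this long, the failing pod is easy to miss. As a small sketch, the filter below prints only the rows whose STATUS column is not `Running`. The function name `not_running` is hypothetical, and the filter assumes the default column layout shown above (NAME, READY, STATUS, RESTARTS, AGE).

```shell
#!/bin/sh
# Sketch only: not_running is a hypothetical helper that reads
# `kubectl get pods` output on stdin, skips the header row, and prints
# name, status and restart count for every pod not in the Running state.
not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1, $3, $4 }'
}

# Usage (assumes kubectl is configured for the cluster):
#   kubectl -n kubeflow get pods | not_running
```

Applied to the listing above, this would leave only the `ml-pipeline-viewer-controller-deployment` pod in `CrashLoopBackOff`.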

Environment:

  • Kubeflow version: v1beta1
  • kfctl version: (use kfctl version): kfctl v1.1.0-0-g9a3621e
  • Kubernetes platform: (e.g. minikube) vanilla
  • Kubernetes version: (use kubectl version): 1.18.9
  • OS (e.g. from /etc/os-release): Ubuntu 20.04, 5.4.20

Most helpful comment

@twittidai
The solution in https://github.com/NVIDIA/deepops/issues/771 also solved my problem.
