I'm deploying a Grafana chart, but when it starts I get an error saying the database is locked.
I have a StorageClass with NFS (nfs-client provisioner) and many pods running on it (Gitlab, Influx, Jenkins).
helm install --name grafana --set server.serviceType=NodePort stable/grafana
When I create the chart without Persistent Storage, it works.
Here are some logs:
[root@kube-master-1 influx-grafana]# kubectl describe storageclass default
Name: default
IsDefaultClass: Yes
Annotations: storageclass.kubernetes.io/is-default-class=true
Provisioner: fuseim.pri/ifs
Parameters: <none>
Events: <none>
[root@kube-master-1 influx-grafana]#
[root@kube-master-1 influx-grafana]# kubectl describe pv default-grafana-grafana-pvc-ed1094b0-3752-11e7-97a1-fa163e5e86fb
Name: default-grafana-grafana-pvc-ed1094b0-3752-11e7-97a1-fa163e5e86fb
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by=fuseim.pri/ifs
StorageClass: default
Status: Bound
Claim: default/grafana-grafana
Reclaim Policy: Delete
Access Modes: RWO
Capacity: 1Gi
Message:
Source:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 10.178.11.188
Path: /mnt/nfs/nfs/kubernetes/default-grafana-grafana-pvc-ed1094b0-3752-11e7-97a1-fa163e5e86fb
ReadOnly: false
Events: <none>
[root@kube-master-1 influx-grafana]#
[root@kube-master-1 influx-grafana]# kubectl describe pvc
concourse-work-dir-concourse-worker-0 gitlab-gitlab-ce-etc gitlab-redis jenkins-claim
gitlab-gitlab-ce-data gitlab-postgresql grafana-grafana
[root@kube-master-1 influx-grafana]# kubectl describe pvc grafana-grafana
Name: grafana-grafana
Namespace: default
StorageClass: default
Status: Bound
Volume: default-grafana-grafana-pvc-ed1094b0-3752-11e7-97a1-fa163e5e86fb
Labels: app=grafana-grafana
chart=grafana-0.3.5
component=grafana
heritage=Tiller
release=grafana
Annotations: control-plane.alpha.kubernetes.io/leader={"holderIdentity":"d1ee140c-35c4-11e7-975e-bef9eb92e4d5","leaseDurationSeconds":15,"acquireTime":"2017-05-12T20:38:19Z","renewTime":"2017-05-12T20:38:21Z","lea...
pv.kubernetes.io/bind-completed=yes
pv.kubernetes.io/bound-by-controller=yes
volume.alpha.kubernetes.io/storage-class=default
volume.beta.kubernetes.io/storage-provisioner=fuseim.pri/ifs
Capacity: 1Gi
Access Modes: RWO
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1m 1m 3 persistentvolume-controller Normal ProvisioningIgnoreAlpha both "volume.alpha.kubernetes.io/storage-class" annotation and storageClassName are present, using storageClassName
1m 1m 3 persistentvolume-controller Normal ExternalProvisioning cannot find provisioner "fuseim.pri/ifs", expecting that a volume for the claim is provisioned either manually or via external software
1m 1m 1 fuseim.pri/ifs nfs-client-provisioner-2294243218-8bg2d d1ee140c-35c4-11e7-975e-bef9eb92e4d5 Normal Provisioning External provisioner is provisioning volume for claim "default/grafana-grafana"
1m 1m 1 fuseim.pri/ifs nfs-client-provisioner-2294243218-8bg2d d1ee140c-35c4-11e7-975e-bef9eb92e4d5 Normal ProvisioningSucceeded Successfully provisioned volume default-grafana-grafana-pvc-ed1094b0-3752-11e7-97a1-fa163e5e86fb
Pod log messages:
Pod: grafana-grafana-1742814040-znz1d
t=2017-05-12T20:42:29+0000 lvl=info msg="Starting Grafana" logger=main version=4.3.0-beta1 commit=3a89272 compiled=2017-05-12T09:45:26+0000
t=2017-05-12T20:42:29+0000 lvl=info msg="Config loaded from" logger=settings file=/usr/share/grafana/conf/defaults.ini
t=2017-05-12T20:42:29+0000 lvl=info msg="Config loaded from" logger=settings file=/etc/grafana/grafana.ini
t=2017-05-12T20:42:29+0000 lvl=info msg="Config overriden from command line" logger=settings arg="default.paths.data=/var/lib/grafana"
t=2017-05-12T20:42:29+0000 lvl=info msg="Config overriden from command line" logger=settings arg="default.paths.logs=/var/log/grafana"
t=2017-05-12T20:42:29+0000 lvl=info msg="Config overriden from command line" logger=settings arg="default.paths.plugins=/var/lib/grafana/plugins"
t=2017-05-12T20:42:29+0000 lvl=info msg="Config overriden from command line" logger=settings arg="default.log.mode=console"
t=2017-05-12T20:42:29+0000 lvl=info msg="Config overriden from Environment variable" logger=settings var="GF_SECURITY_ADMIN_USER=admin"
t=2017-05-12T20:42:29+0000 lvl=info msg="Config overriden from Environment variable" logger=settings var="GF_SECURITY_ADMIN_PASSWORD=*********"
t=2017-05-12T20:42:29+0000 lvl=info msg="Path Home" logger=settings path=/usr/share/grafana
t=2017-05-12T20:42:29+0000 lvl=info msg="Path Data" logger=settings path=/var/lib/grafana/data
t=2017-05-12T20:42:29+0000 lvl=info msg="Path Logs" logger=settings path=/var/log/grafana
t=2017-05-12T20:42:29+0000 lvl=info msg="Path Plugins" logger=settings path=/var/lib/grafana/plugins
t=2017-05-12T20:42:29+0000 lvl=info msg="Initializing DB" logger=sqlstore dbtype=sqlite3
t=2017-05-12T20:42:29+0000 lvl=info msg="Starting DB migration" logger=migrator
t=2017-05-12T20:42:34+0000 lvl=eror msg="Fail to initialize orm engine" logger=sqlstore error="Sqlstore::Migration failed err: database is locked\n"
[root@kube-master-1 influx-grafana]#
+1
Have you confirmed write access to the mounted PVs from within the container?
Yes. I was able to kubectl exec into the container while it was starting and echo into a file in the same directory where the error was shown. The file was created correctly.
[root@kube-master-1 ~]# kubectl exec -it grafana-test-grafana-700401875-mssh7 /bin/bash
root@grafana-test-grafana-700401875-mssh7:/# echo "TEST" > /var/lib/grafana/data/TESTFILE
[root@kube-master-1 default-grafana-test-grafana-pvc-cb09eca0-3fe0-11e7-83e9-fa163e5e86fb]# ls -tlr
total 4
-rw-r--r--+ 1 104 107 0 May 23 2017 grafana.db
-rw-rw-rw-+ 1 root 107 5 May 23 2017 TESTFILE
Any progress on this?
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale
I'm not able to retest this right now but will be in a couple of weeks when my Kubernetes lab will be up.
/remove-lifecycle rotten
@carlosedp:
Not sure if this is related or not to the issue you found, but I had the following error when trying to deploy the latest Grafana chart changes (0.8.3) using a persistent volume. The job that configured the persistent volume was throwing this error when server.persistentVolume.enabled: true:
curl: (6) Could not resolve host: grafana-monitoring-grafana
I solved this by making the change in this commit to stable/grafana/templates/job.yaml, which basically is a change from using
...{{ template "grafana.fullname" . }}...
to
...{{ template "grafana.server.fullname" . }}...
Having the same issue trying to persist grafana data to Azure file as a PV mount
t=2018-04-10T02:01:31+0000 lvl=eror msg="Fail to initialize orm engine" logger=sqlstore error="Sqlstore::Migration failed err: database is locked\n"
Mounting to standard grafana storage folder:
volumeMounts:
  - name: grafana-persistent-storage
    mountPath: /var/lib/grafana
The PVC mount works if I mount to another folder like /data. I suspect it's a permission issue; see below that a volume mounted from the PV comes in as root-only.
Example if I mount to /data
root@grafana-692657060-00vwv:/# ls -l /data
total 0
-rwxrwxrwx 1 root root 0 Apr 10 00:47 grafana.db
Might try something like this init container hack, as Grafana runs as an unprivileged user called "grafana" and I suspect the volume is coming in with root-only access:
initContainers:
  # ref: https://github.com/kubernetes/kubernetes/issues/2630#issuecomment-375504696
  - name: volume-mount-hack
    image: alpine:3.7
    command:
      - sh
      - -c
      - 'chmod -R a+rwx /var/lib/grafana'
    volumeMounts:
      - name: grafana-persistent-storage
        mountPath: /var/lib/grafana
    securityContext:
      runAsUser: 0
The above seems messy, but will post a workaround if I find one.
Happy with any suggestions too.
@marcel-dempers did that workaround work for you? I also faced this when trying to use Azure file PV.
@htuomola Unfortunately not. I have a suspicion it's because of Azure Files' storage type, which Grafana's data store does not like. I have no proof of this, but I had a similar issue when trying to persist PostgreSQL data to a volume in Docker (I was using Docker for Windows at that time). Kind of like storing Linux stuff on an NTFS volume, there were issues writing to disk.
As a workaround for this whole issue I decided on a basic tier Azure PostgreSQL PaaS offering so I run that outside of Kubernetes. Doing it this way, my Grafana pods become stateless and the data is persisted off cluster.
Might be an option for you to get unblocked
@marcel-dempers alright, thanks for the quick reply. Might be that grafana doesn't work with the CIFS mounts what azure files are, AFAIK. I already a bit earlier found that Prometheus doesn't either. I'm using regular disks instead now.
@htuomola yeah, my guess is that it's something along those lines. A bit off topic: just remember that if you are using an Azure Managed Disk, it can only be mounted to one node at a time. Not sure what will happen if the pod moves to another node (node failure) or starts somewhere else (Grafana crashes). The PV might not mount on the new node since it can only mount to one, or it might take time for the unmount/remount to happen. I might be wrong, but it might be worth testing out a few failure scenarios to see how stable it is :) Here is a doc (check out the note about the limitation): https://docs.microsoft.com/en-us/azure/aks/azure-disks-dynamic-pv
Maybe related: I just tried to install it on my cluster (local cluster, using rook/ceph for the persistent volumes), and I'm getting the following error:
t=2018-04-20T19:22:27+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index star.user_id_dashboard_id"
t=2018-04-20T19:22:29+0000 lvl=info msg="Executing migration" logger=migrator id="create org table v1"
t=2018-04-20T19:22:30+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_org_name - v1"
t=2018-04-20T19:22:31+0000 lvl=info msg="Executing migration" logger=migrator id="create org_user table v1"
t=2018-04-20T19:22:32+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_org_user_org_id - v1"
t=2018-04-20T19:22:33+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_org_user_org_id_user_id - v1"
t=2018-04-20T19:22:35+0000 lvl=info msg="Executing migration" logger=migrator id="copy data account to org"
t=2018-04-20T19:22:35+0000 lvl=info msg="Skipping migration condition not fulfilled" logger=migrator id="copy data account to org"
t=2018-04-20T19:22:35+0000 lvl=info msg="Executing migration" logger=migrator id="copy data account_user to org_user"
t=2018-04-20T19:22:35+0000 lvl=info msg="Skipping migration condition not fulfilled" logger=migrator id="copy data account_user to org_user"
t=2018-04-20T19:22:35+0000 lvl=info msg="Executing migration" logger=migrator id="Drop old table account"
t=2018-04-20T19:22:37+0000 lvl=info msg="Shutdown started" logger=server code=0 reason="system signal: terminated"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0xef1f56]
goroutine 21 [running]:
github.com/grafana/grafana/pkg/api.(*HttpServer).Shutdown(0x0, 0x1f87fe0, 0xc420245c00, 0xc420354400, 0x4)
/go/src/github.com/grafana/grafana/pkg/api/http_server.go:97 +0x26
main.(*GrafanaServerImpl).Shutdown(0xc420245c80, 0x0, 0xc4203c0520, 0x19)
/go/src/github.com/grafana/grafana/pkg/cmd/grafana-server/server.go:137 +0x19f
main.listenToSystemSignals(0xc420245c80, 0xc420128660)
/go/src/github.com/grafana/grafana/pkg/cmd/grafana-server/main.go:114 +0x374
created by main.main
/go/src/github.com/grafana/grafana/pkg/cmd/grafana-server/main.go:85 +0x2a3
I used
helm install --name grafana-stable stable/grafana --set persistence.enabled=true,persistence.size=10Gi,persistence.accessModes=[ReadWriteOnce]
to install. The volume was created, and looking inside it I can see files were written, so it seems Grafana is able to write something to the volume before it fails.
I'm having the same issue when using grafana with glusterfs:
t=2018-05-22T08:12:41+0000 lvl=info msg="Executing migration" logger=migrator id="drop login_attempt_tmp_qwerty"
t=2018-05-22T08:12:42+0000 lvl=info msg="Executing migration" logger=migrator id="create user auth table"
t=2018-05-22T08:12:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_user_auth_auth_module_auth_id - v1"
t=2018-05-22T08:12:43+0000 lvl=info msg="Executing migration" logger=migrator id="alter user_auth.auth_id to length 190"
panic: runtime error: invalid memory address or nil pointer dereference
t=2018-05-22T08:12:44+0000 lvl=info msg="Shutdown started" logger=server code=0 reason="system signal: terminated"
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0xe590f6]
goroutine 11 [running]:
github.com/grafana/grafana/pkg/api.(*HTTPServer).Shutdown(0x0, 0x15183c0, 0xc420208c00, 0xc4207d9b00, 0x4)
/go/src/github.com/grafana/grafana/pkg/api/http_server.go:100 +0x26
main.(*GrafanaServerImpl).Shutdown(0xc420208d00, 0x0, 0xc420777c40, 0x19)
/go/src/github.com/grafana/grafana/pkg/cmd/grafana-server/server.go:137 +0x196
main.listenToSystemSignals(0xc420208d00, 0xc42003e240)
/go/src/github.com/grafana/grafana/pkg/cmd/grafana-server/main.go:113 +0x346
created by main.main
/go/src/github.com/grafana/grafana/pkg/cmd/grafana-server/main.go:84 +0x28b
+1.
mkdir: cannot create directory '/var/lib/grafana/plugins': Permission denied
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
I have also had this issue, starting with a recent upgrade to the latest grafana helm chart, when using persistent volumes on AWS.
Recent docker images changed the UID/GID for the grafana user. At a minimum, the PersistentVolume needs to be chown'd:
chown -R 472:472 /var/lib/grafana /usr/share/grafana
From the docker page:
I got it to work with
--set spec.containers.0.securityContext.fsGroup=417
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
Something has to be changed in the grafana helm chart; I can't deploy this on a 1.10.3 kops-deployed cluster as we keep hitting:
lastState:
  terminated:
    containerID: docker://18054f56d35b9733a0e604c4b1d8718b909a37feed4eb38a487d643f6bb991a0
    exitCode: 128
    finishedAt: 2018-08-30T13:03:34Z
    message: 'error setting label on mount source ''/var/lib/kubelet/pods/f770ffdd-ac53-11e8-b02d-06ad5b90fde4/volume-subpaths/config/grafana/2'':
      read-only file system'
    reason: ContainerCannotRun
    startedAt: 2018-08-30T13:03:34Z
Hi, this still seems to be an issue, with the latest chart. The grafana container is causing the pod to go into CrashLoopBackoff, with this message:
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
mkdir: cannot create directory '/var/lib/grafana/plugins': Permission denied
The quick fix is running grafana as root. From the helm chart:
securityContext:
  runAsUser: 0
  fsGroup: 0
This obviously doesn't seem ideal. Is there a best practice solution to this issue?
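A less drastic option, sketched here under the assumption that the chart passes `securityContext` through to the pod spec unchanged: keep Grafana unprivileged but set `fsGroup` to the grafana GID (472 in recent images), so the kubelet applies that group to the volume on mount instead of the whole pod running as root.

```yaml
securityContext:
  runAsUser: 472   # grafana UID in recent images
  fsGroup: 472     # kubelet applies this group to supported volume types on mount
```

Note that `fsGroup` only works for volume types the kubelet can chown (e.g. block-backed disks); NFS and CIFS mounts generally ignore it, which is why the mount-option and init-container workarounds elsewhere in this thread exist.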
Changing the GF_PATHS_DATA PV to 777 mode can solve it.
If the hostPath is /grafana, running chmod -R 777 /grafana before helm install works.
For those of you using Kubernetes, the best solution I've come up with is mounting the volume in another pod, running chmod 777 /data/grafana and chown www-data: /data/grafana, and then mounting it back on the grafana pod.
We ran into the same issue as well, using AWS EFS (via https://github.com/previousnext/k8s-aws-efs)
kubectl -n grafana logs deployment.apps/grafana
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
mkdir: cannot create directory '/var/lib/grafana/plugins': Permission denied
Based on info in this thread, I managed to 'fix' it by deploying a pod and changing the mountpoint's user and group to 472.
# debug-efs-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
  labels:
    app: debug-app
spec:
  containers:
    - name: debug-efs
      image: sickp/alpine-sshd
      imagePullPolicy: Always
      volumeMounts:
        - name: debug-efs
          mountPath: "/mnt"
  volumes:
    - name: debug-efs
      persistentVolumeClaim:
        claimName: grafana
kubectl -n grafana apply -f debug-efs-pod.yaml
kubectl -n grafana exec -it debug-pod ash
/ # cd /mnt
/mnt # ls -al
total 8
drwxr-xr-x 2 root root 6144 Nov 19 15:24 .
drwxr-xr-x 1 root root 4096 Nov 21 07:17 ..
/mnt # chown 472:472 /mnt
/mnt # ls -al
total 8
drwxr-xr-x 2 472 472 6144 Nov 19 15:24 .
drwxr-xr-x 1 root root 4096 Nov 21 07:17 ..
/mnt # exit
As a real fix, providing a smooth Helm experience, would any of the following additions to the Grafana Helm chart be considered a solution?
initContainers yaml fragment via values.

I also had the same problem, using NFS storage.
https://github.com/grafana/grafana/issues/14584#event-2035224133
After installing from source, I tested each parameter and environment variable and found that only provisioning.datasources could not use NFS storage. The error was: "can't read datasource provisioning files from directory logger=provisioning.datasources"
I have the same problem using NFS storage. Any updates?
For those running k8s on azure with azure files, I finally got it working using: (Source)
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0472
  - gid=0472
  - mfsymlinks
  - nobrl
  - cache=none
parameters:
  skuName: Standard_LRS
combined with the following script:
chown -R 472:472 /opt/volume/monitoring/grafana
For those running k8s on azure with azure files, I finally got it working using: (Source)
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0472
  - gid=0472
  - mfsymlinks
  - nobrl
  - cache=none
parameters:
  skuName: Standard_LRS
This worked for me :) thx for sharing.
Would be nice if this finally gets fixed. I still cannot get it to run with an external NFS volume on Kubernetes (on-prem); chown -R 472:472 does not help.
Does not look like it's fixed.
Still having issues on upgrading.
This issue is being automatically closed due to inactivity.
I was having this problem with Azure-File storage.
For anyone still having this problem: this is due to how SQLite's file locking works.
Just set the nobrl mount option on your storage class (Azure-File) like so:
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=1000
  - gid=1000
  - nobrl
Problem solved.
References:
https://github.com/kubernetes/kubernetes/issues/61767
https://docs.microsoft.com/bs-latn-ba/azure/aks/azure-files-volume#mount-options
https://github.com/docker/for-win/issues/11
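For context: "database is locked" is SQLite's generic error whenever it cannot acquire its file lock, and byte-range locks (which `nobrl` disables over CIFS) are exactly what it relies on. A minimal local sketch of the same error, using two writers instead of a broken network lock, just to show where the message comes from:

```python
import os
import sqlite3
import tempfile

# Two connections to one SQLite file; timeout=0 means fail
# immediately instead of waiting for the lock to clear.
path = os.path.join(tempfile.mkdtemp(), "grafana-sim.db")
a = sqlite3.connect(path, timeout=0, isolation_level=None)
b = sqlite3.connect(path, timeout=0, isolation_level=None)

a.execute("CREATE TABLE dashboard (id INTEGER)")
a.execute("BEGIN IMMEDIATE")            # take the write lock, like a migration would
a.execute("INSERT INTO dashboard VALUES (1)")

try:
    b.execute("BEGIN IMMEDIATE")        # second writer can't acquire the lock
except sqlite3.OperationalError as e:
    print(e)                            # -> database is locked
finally:
    a.execute("COMMIT")
```

On a filesystem where the lock primitives misbehave (CIFS without `nobrl`, some NFS setups), even a single Grafana process can hit this during its startup migration.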
For those who are still facing this issue: if you are using NFS, add mount options to the volume or the nfs-client provisioner:
mountOptions:
  - vers=3
  - sec=sys
  - proto=tcp
  - actimeo=30
source: https://github.com/andyzhangx/demo/blob/master/pv/pv-nfs-mountoptions.yaml
For those running k8s on azure with azure files, I finally got it working using: (Source)
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0472
  - gid=0472
  - mfsymlinks
  - nobrl
  - cache=none
parameters:
  skuName: Standard_LRS
Thanks.. this solved my issue too in Azure
I had this problem with Grafana 7.3.7 on AWS using EFS. I worked around it by adding the following to my Kubernetes deployment configuration:
initContainers:
  - name: fix-permissions
    image: busybox
    command: ["sh", "-c", "chown -R 472:472 /var/lib/grafana"]
    securityContext:
      runAsUser: 0
      runAsNonRoot: false
    volumeMounts:
      - name: grafana-efs-volume
        mountPath: /var/lib/grafana/