Describe the bug
After looking at the arm64 support PR, I deployed Longhorn on my 8-node Raspberry Pi 4 K8s cluster. I was able to find all the needed Docker images, built by ivang, on Docker Hub. I used the official Longhorn Helm chart and changed the images and their respective tags in the values.yaml file before deploying. I also made sure to run the environment check script, install open-iscsi on all nodes, and have the iscsid daemon running.
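The iSCSI prerequisite mentioned above can be spot-checked on each node with something like the following. This is a minimal sketch, assuming a systemd-based distribution; `check_iscsi` is a hypothetical helper name and is not part of Longhorn's environment check script.

```shell
# Sketch: report whether the open-iscsi prerequisites look ready on this node.
# Assumes systemd; check_iscsi is a hypothetical helper, not a Longhorn tool.
check_iscsi() {
  # iscsiadm is the client tool shipped with open-iscsi
  if ! command -v iscsiadm >/dev/null 2>&1; then
    echo "iscsiadm not found: install open-iscsi"
    return 1
  fi
  # "systemctl is-active" prints "active" when the unit is running
  if [ "$(systemctl is-active iscsid 2>/dev/null)" != "active" ]; then
    echo "iscsid is not active"
    return 1
  fi
  echo "open-iscsi looks ready"
}
```

Running `check_iscsi` on every worker node before deploying helps rule out a missing daemon as the cause of attach failures.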
From the Longhorn UI, I can successfully create volumes and attach them to any worker node. I can also create PVCs using the provided storage class.
However, I keep running into an issue where the created volume cannot bind to any pod. For example, when I ran the sample nginx deployment provided in the documentation, the pod reported the following errors:
running "VolumeBinding" filter plugin for pod "volume-test": pod has unbound immediate PersistentVolumeClaims
AttachVolume.Attach failed for volume "pvc-7cc9759f-474a-4fe8-8eb2-ee9ad5757d6c" : attachdetachment timeout for volume pvc-7cc9759f-474a-4fe8-8eb2-ee9ad5757d6c
Unable to attach or mount volumes: unmounted volumes=[volv], unattached volumes=[default-token-64qvs volv]: timed out waiting for the condition
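Errors like these are usually easier to narrow down with the pod events, the VolumeAttachment objects, and the CSI attacher logs side by side. The sketch below gathers them. It assumes the pod runs in the `test` namespace (the namespace of the PVC in this report) and that the attacher sidecar runs as a deployment named `csi-attacher` in `longhorn-system`; that name is an assumption, as is the helper name `diagnose_attach`.

```shell
# Sketch: collect evidence for a stuck CSI attach.
# Assumptions: pod namespace "test", pod name "volume-test", and an attacher
# deployment called "csi-attacher" in the longhorn-system namespace.
diagnose_attach() {
  ns="${1:-test}"
  pod="${2:-volume-test}"
  # Pod events show the scheduler and kubelet view of the failure
  kubectl -n "$ns" describe pod "$pod"
  # VolumeAttachment objects are cluster-scoped; ATTACHED should become true
  kubectl get volumeattachment
  # Pods in CrashLoopBackOff here often point at a wrong-architecture image
  kubectl -n longhorn-system get pods -o wide
  # Last lines of the attacher sidecar logs
  kubectl -n longhorn-system logs deployment/csi-attacher --all-containers --tail=50
}
```

Call it as `diagnose_attach` for the defaults, or `diagnose_attach my-namespace my-pod` for another workload.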
For reference, this is the created PVC:
Name:          longhorn-volv-pvc
Namespace:     test
StorageClass:  usb
Status:        Bound
Volume:        pvc-7cc9759f-474a-4fe8-8eb2-ee9ad5757d6c
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      2Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    volume-test
Events:
  Type    Reason                 Age                From                                                                                      Message
  ----    ------                 ----               ----                                                                                      -------
  Normal  Provisioning           19m                driver.longhorn.io_csi-provisioner-67f6b9d8f-gnwht_06721115-849b-44d4-8272-c8e22ffbb189  External provisioner is provisioning volume for claim "test/longhorn-volv-pvc"
  Normal  ExternalProvisioning   19m (x3 over 19m)  persistentvolume-controller                                                               waiting for a volume to be created, either by external provisioner "driver.longhorn.io" or manually created by system administrator
  Normal  ProvisioningSucceeded  19m                driver.longhorn.io_csi-provisioner-67f6b9d8f-gnwht_06721115-849b-44d4-8272-c8e22ffbb189  Successfully provisioned volume pvc-7cc9759f-474a-4fe8-8eb2-ee9ad5757d6c
This is the PV:
Name:            pvc-7cc9759f-474a-4fe8-8eb2-ee9ad5757d6c
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: driver.longhorn.io
Finalizers:      [kubernetes.io/pv-protection external-attacher/driver-longhorn-io]
StorageClass:    usb
Status:          Bound
Claim:           test/longhorn-volv-pvc
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        2Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            driver.longhorn.io
    FSType:            ext4
    VolumeHandle:      pvc-7cc9759f-474a-4fe8-8eb2-ee9ad5757d6c
    ReadOnly:          false
    VolumeAttributes:      diskSelector=usb
                           fromBackup=
                           numberOfReplicas=2
                           staleReplicaTimeout=30
                           storage.kubernetes.io/csiProvisionerIdentity=1597799569063-8081-driver.longhorn.io
Events:                <none>
From the Longhorn UI, I can see that the PV is healthy and is attached to the correct worker node.
I also tried scaling the longhorn-driver-deployer down to 0, deleting the deployments, and then scaling it back up. This made the pod's 'has unbound immediate PersistentVolumeClaims' error go away for the first PVC, but the error returned when I tested creating another PVC.
Expected behavior
The volume should attach to the pod.
Environment:
@ivang Is there anything specific we need to do with the CSI driver for ARM64?
Yes, the original CSI drivers are not yet available for ARM64. Thus, I had to build them myself (see this comment), push them to my docker hub, and modify these lines of the deployment scripts to make use of them. I didn't want to clutter the PR with my local changes to the deployment scripts but I can include them if necessary.
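To see which images the Longhorn and CSI pods are actually running, and to confirm the node architecture they must match, a sketch along these lines may help. Both function names are hypothetical helpers; the `kubectl` flags used are standard.

```shell
# Sketch only; show_node_arch and list_longhorn_images are hypothetical helpers.

# Print each node's CPU architecture (arm64 is expected on a 64-bit Pi 4 OS).
show_node_arch() {
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture'
}

# Print every pod in longhorn-system together with the images its containers use.
list_longhorn_images() {
  kubectl -n longhorn-system get pods \
    -o 'custom-columns=POD:.metadata.name,IMAGES:.spec.containers[*].image'
}
```

Any CSI sidecar still pointing at an image without an arm64 build would be a likely candidate for replacement.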
PS. It seems that support for ARM64 in CSI is on its way.
Update:
I was able to get it to work! After looking at @ivang's comment, I found CSI Docker images that did the trick! I was able to deploy the sample nginx deployment, and I just finished deploying cluster monitoring, which uses persistence for Grafana and Prometheus. Works great so far!
For anyone that comes across this, here is what I did to get Longhorn up and running:
- git clone https://github.com/longhorn/longhorn.git
- cd longhorn/chart/
- nano values.yaml
Replace the yaml file content with the following (or manually change the docker images and their tags):
# Default values for longhorn.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
image:
  longhorn:
    engine: ivanangelov/longhorn-engine
    engineTag: b0a22eb
    manager: ivanangelov/longhorn-manager
    managerTag: 92a5f6a5_arm64
    ui: ivanangelov/longhorn-ui
    uiTag: 1607c9e
    instanceManager: ivanangelov/longhorn-instance-manager
    instanceManagerTag: v1_20200711
  pullPolicy: IfNotPresent
service:
  ui:
    type: ClusterIP
    nodePort: null
  manager:
    type: ClusterIP
    nodePort: ""
persistence:
  defaultClass: true
  # The default replica count is 3
  defaultClassReplicaCount: 2
csi:
  attacherImage: raspbernetes/csi-external-attacher
  attacherImageTag: latest
  provisionerImage: raspbernetes/csi-external-provisioner
  provisionerImageTag: latest
  nodeDriverRegistrarImage: raspbernetes/csi-node-driver-registrar
  nodeDriverRegistrarImageTag: latest
  resizerImage: raspbernetes/csi-external-resizer
  resizerImageTag: latest
  kubeletRootDir: ~
  attacherReplicaCount: ~
  provisionerReplicaCount: ~
  resizerReplicaCount: ~
defaultSettings:
  backupTarget: ~
  backupTargetCredentialSecret: ~
  createDefaultDiskLabeledNodes: ~
  defaultDataPath: ~
  replicaSoftAntiAffinity: ~
  storageOverProvisioningPercentage: ~
  storageMinimalAvailablePercentage: ~
  upgradeChecker: ~
  defaultReplicaCount: ~
  guaranteedEngineCPU: ~
  defaultLonghornStaticStorageClass: ~
  backupstorePollInterval: ~
  taintToleration: ~
  priorityClass: ~
  registrySecret: ~
  autoSalvage: ~
  disableSchedulingOnCordonedNode: ~
  replicaZoneSoftAntiAffinity: ~
  volumeAttachmentRecoveryPolicy: ~
  mkfsExt4Parameters: ~
privateRegistry:
  registryUrl: ~
  registryUser: ~
  registryPasswd: ~
resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi
  #
ingress:
  ## Set to true to enable ingress record generation
  enabled: false
  host: xip.io
  ## Set this to true in order to enable TLS on the ingress record
  ## A side effect of this will be that the backend service will be connected at port 443
  tls: false
  ## If TLS is set to true, you must declare what secret will store the key/certificate for TLS
  tlsSecret: longhorn.local-tls
  ## Ingress annotations done as key:value pairs
  ## If you're using kube-lego, you will want to add:
  ## kubernetes.io/tls-acme: true
  ##
  ## For a full list of possible ingress annotations, please see
  ## ref: https://github.com/kubernetes/ingress-nginx/blob/master/docs/annotations.md
  ##
  ## If tls is set to true, annotation ingress.kubernetes.io/secure-backends: "true" will automatically be set
  annotations:
  #  kubernetes.io/ingress.class: nginx
  #  kubernetes.io/tls-acme: true
  secrets:
  ## If you're providing your own certificates, please use this to add the certificates as secrets
  ## key and certificate should start with -----BEGIN CERTIFICATE----- or
  ## -----BEGIN RSA PRIVATE KEY-----
  ##
  ## name should line up with a tlsSecret set further up
  ## If you're using kube-lego, this is unneeded, as it will create the secret for you if it is not set
  ##
  ## It is also possible to create and manage the certificates outside of this helm chart
  ## Please see README.md for more information
  # - name: longhorn.local-tls
  #   key:
  #   certificate:
# Configure a pod security policy in the Longhorn namespace to allow privileged pods
enablePSP: true
Then install the chart:
kubectl create namespace longhorn-system
helm install longhorn . --namespace longhorn-system
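As an alternative to editing values.yaml by hand, the same image overrides could probably be passed on the command line. The sketch below wraps that in a hypothetical function, `install_longhorn_arm64`, and uses Helm's `--set-string` flag so that tags are always treated as strings. It assumes it is run from the longhorn/chart/ directory and that the namespace already exists; the image names and tags are the ones listed in the values file above.

```shell
# Sketch: install the chart with arm64 image overrides instead of editing values.yaml.
# install_longhorn_arm64 is a hypothetical wrapper; run it from longhorn/chart/.
install_longhorn_arm64() {
  helm install longhorn . --namespace longhorn-system \
    --set-string image.longhorn.engine=ivanangelov/longhorn-engine \
    --set-string image.longhorn.engineTag=b0a22eb \
    --set-string image.longhorn.manager=ivanangelov/longhorn-manager \
    --set-string image.longhorn.managerTag=92a5f6a5_arm64 \
    --set-string image.longhorn.ui=ivanangelov/longhorn-ui \
    --set-string image.longhorn.uiTag=1607c9e \
    --set-string image.longhorn.instanceManager=ivanangelov/longhorn-instance-manager \
    --set-string image.longhorn.instanceManagerTag=v1_20200711 \
    --set-string csi.attacherImage=raspbernetes/csi-external-attacher \
    --set-string csi.attacherImageTag=latest \
    --set-string csi.provisionerImage=raspbernetes/csi-external-provisioner \
    --set-string csi.provisionerImageTag=latest \
    --set-string csi.nodeDriverRegistrarImage=raspbernetes/csi-node-driver-registrar \
    --set-string csi.nodeDriverRegistrarImageTag=latest \
    --set-string csi.resizerImage=raspbernetes/csi-external-resizer \
    --set-string csi.resizerImageTag=latest \
    --set persistence.defaultClassReplicaCount=2
}
```

Keeping the overrides on the command line leaves the chart's values.yaml untouched, which makes later `git pull` updates of the repository simpler.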
To test if it works, create a test-longhorn.yaml file with the following content:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-volv-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
  namespace: default
spec:
  restartPolicy: Always
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
          - ls
          - /data/lost+found
      initialDelaySeconds: 5
      periodSeconds: 5
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-volv-pvc
If everything goes according to plan, this should spin up an nginx pod with a volume created and provisioned by Longhorn!
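Applying the manifest and checking the result could look roughly like this. It is a sketch: `verify_volume_test` is a hypothetical helper, the 300-second timeout is an arbitrary choice, and `-n default` is passed so that the PVC (which has no namespace in the manifest) lands in the same namespace as the pod.

```shell
# Sketch: apply test-longhorn.yaml and confirm the volume is mounted.
# verify_volume_test is a hypothetical helper; the timeout value is arbitrary.
verify_volume_test() {
  # Create the PVC and the pod in the default namespace
  kubectl apply -n default -f test-longhorn.yaml &&
  # Block until the pod reports Ready, or fail after the timeout
  kubectl -n default wait --for=condition=Ready pod/volume-test --timeout=300s &&
  # The claim should show STATUS Bound
  kubectl -n default get pvc longhorn-volv-pvc &&
  # A freshly formatted ext4 volume normally contains lost+found
  kubectl -n default exec volume-test -- ls /data
}
```

If the `wait` step times out, `kubectl -n default describe pod volume-test` shows the attach or mount events that explain why.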
Thank you @ivang and @yasker for your help!