BUG REPORT
Environment:
Minikube version: v0.30.0
- OS: Fedora 29
- VM Driver: virtualbox, kvm2
- ISO version: v0.30.0
Others:
- kubernetes version: tested on v1.10.0, v1.13.0
- tested with coredns and kube-dns minikube addons
What happened:
NFS volume fails to mount due to a DNS error (Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known). This problem does not occur when deployed on GKE.
What you expected to happen:
NFS volume is mounted without an error.
How to reproduce it (as minimally and precisely as possible):
- Start nfs-server:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nfs-server
spec:
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
      - name: nfs-server
        image: gcr.io/google_containers/volume-nfs:0.8
        ports:
        - name: nfs
          containerPort: 2049
        - name: mountd
          containerPort: 20048
        - name: rpcbind
          containerPort: 111
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /exports
          name: exports
      volumes:
      - name: exports
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: nfs-server
spec:
  ports:
  - name: nfs
    port: 2049
  - name: mountd
    port: 20048
  - name: rpcbind
    port: 111
  selector:
    role: nfs-server
- Start a service consuming the nfs volume (e.g. busybox):

apiVersion: v1
kind: ReplicationController
metadata:
  name: nfs-busybox
spec:
  replicas: 1
  selector:
    name: nfs-busybox
  template:
    metadata:
      labels:
        name: nfs-busybox
    spec:
      containers:
      - image: busybox
        command:
        - sh
        - -c
        - 'while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done'
        imagePullPolicy: IfNotPresent
        name: busybox
        volumeMounts:
        - name: nfs
          mountPath: "/mnt"
      volumes:
      - name: nfs
        nfs:
          server: nfs-server.default.svc.cluster.local
          path: "/"
Output of minikube logs (if applicable):
In kubectl describe pod nfs-busybox-... this error appears:
Warning FailedMount 4m kubelet, minikube MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/ab2e9ad4-f88b-11e8-8a56-4004c9e1505b/volumes/kubernetes.io~nfs/nfs --scope -- mount -t nfs nfs-server.default.svc.cluster.local:/ /var/lib/kubelet/pods/ab2e9ad4-f88b-11e8-8a56-4004c9e1505b/volumes/kubernetes.io~nfs/nfs
Output: Running scope as unit: run-r23cae2998bf349df8046ac3c61bfe4e9.scope
mount.nfs: Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known
This indicates a problem with DNS resolution of nfs-server.default.svc.cluster.local.
Note: The NFS volume is mounted successfully when the server is specified by its ClusterIP instead of by domain name.
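For clarity, the working variant differs only in the nfs.server field of the volume, for example (10.105.22.251 is the Service ClusterIP shown in the nslookup output below):

      volumes:
      - name: nfs
        nfs:
          server: 10.105.22.251   # Service ClusterIP instead of the DNS name
          path: "/"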
Anything else we need to know:
The same problem was already reported for a previous version in #2218, but that issue was closed due to the author's inactivity and no one seems to have really looked into it. There is a workaround, but it has to be applied every time a minikube VM is created.
When running kubectl exec -ti nfs-busybox-... -- nslookup nfs-server.default.svc.cluster.local:
Server: 10.96.0.10
Address: 10.96.0.10:53
Name: nfs-server.default.svc.cluster.local
Address: 10.105.22.251
*** Can't find nfs-server.default.svc.cluster.local: No answer
Strangely, the service ClusterIP is present in the output (when using kube-dns, the service ClusterIP part is missing completely).
Have you seen https://github.com/kubernetes/minikube/issues/2218#issuecomment-436821733 ?
@tamalsaha Yes, I have seen it, but only a workaround has been posted there, not an actual fix.
We have the same issue:
The error message from the pod is:
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/7940ceed-ffad-11e8-890b-005056010f5a/volumes/kubernetes.io~nfs/pv-nfs-10gi --scope -- mount -t nfs ext-nfs-svc.default.svc.cluster.local:/data/nfs/test /var/lib/kubelet/pods/7940ceed-ffad-11e8-890b-005056010f5a/volumes/kubernetes.io~nfs/pv-nfs-10gi
Output: Running scope as unit: run-r3a24d6989c5d4e0c99d4b0eb5429a210.scope
mount.nfs: Failed to resolve server ext-nfs-svc.default.svc.cluster.local: Name or service not known
Even though resolving works as expected:
kubectl exec -it busybox -- nslookup ext-nfs-svc.default.svc.cluster.local
The answer is:
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: ext-nfs-svc.default.svc.cluster.local
Address 1: 10.96.152.237 ext-nfs-svc.default.svc.cluster.local
Using the IP for the NFS connection works, as described above.
I suspect this is because the NFS mount on the host system doesn't currently resolve against 10.96.0.10 within the guest VM; only pods do, for what appear to be obsolete historical reasons. I could be completely wrong though.
I guess you are right. Defining the IP for ext-nfs-svc.default.svc.cluster.local in the hosts file of the cluster workers does solve the problem. Somehow the NFS mounting does not use the cluster-internal DNS resolution and also does not really use the external IP defined in the service. I'm not sure whether this is the expected behaviour, but to me it does not make much sense.
👀
Well, I'm running into the same issue on EKS as well. When the NFS server IP is specified directly, it just works. Is this a known issue on EKS as well, or should I move to EFS on AWS? :(
Apologies, I'm not a Minikube user but this is the most apt issue I've found for the problems that I'm having.
I'm experiencing these exact problems:
- Mounting the NFS volume by its service name (nfs-server.default.svc.cluster.local) doesn't work during the ContainerCreating phase.
- nslookup inside the pod resolves the domain just fine.
Based on my googling efforts so far, this seems to be a Kubernetes issue where the NFS mount is set up before the container can reach coredns. Perhaps an initialization-order problem?
The problem is that the components responsible for the NFS storage backend do not use the cluster-internal DNS; they try to resolve the NFS server with the DNS configuration of the worker node itself. One way to make this work is to add a hosts-file entry on the worker nodes mapping nfs-server.default.svc.cluster.local to the NFS server's IP address. But that is just a quick-and-dirty workaround.
It is odd that this component is not able to use the cluster-internal DNS resolution. That would make much more sense and be more intuitive to use.
I'm also having this issue on EKS.
I don't think it's an issue related to any specific kubernetes cloud solution, but a general one.
From what I can tell, the only solution to this would be to give the k8s nodes access to k8s's coredns, which is responsible for resolving these names. However, in my experience most k8s nodes use their own DNS, independent of k8s.
@ikkerens I'm pretty sure that would work. Having an Ingress for the kube-dns service which is reachable only from the k8s nodes themselves could achieve this. But as you said, one would have to change the DNS settings on the nodes.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
I have the same issue on AWS with an NFS server backed by an EBS disk.
Using the IP address works just fine; the NFS server name cannot be resolved.
I'm running into the same issue. I can get it to work fine in GKE, but it won't work locally.
Same issue on Azure AKS too.
@fhaifler - With these configurations there is no data being shared between the pods. That is, anything inside the '/' is not visible inside the '/mnt' folder.
Any idea why?
Also, I'm not able to mount '/nfs-data-example-folder' into the '/mnt' folder. It throws a permission error.
Any idea why?
@ramkrishnan8994 I am not sure I understand the question. Have you managed to make it work even with the domain name for the NFS server (nfs-server.default.svc.cluster.local)? It is still not working for me, even with an updated minikube.
That is, anything inside the '/' is not visible inside the '/mnt' folder.
I am not sure what you mean. / corresponds to the root directory exported by the NFS server, i.e. the /exports directory inside the nfs-server pod. The same content should be visible inside nfs-busybox under the /mnt directory.
Also, I'm not able to mount '/nfs-data-example-folder' into the '/mnt' folder. It throws a permission error.
I don't know what /nfs-data-example-folder should be. Can you elaborate please?
This would likely be addressed by resolving #2162 (help wanted)
I ran into the same issue with Azure AKS but not with Google GKE. How come Google has a fix and other cloud providers don't?
This is a known issue in Kubernetes:
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#known-issues
Kubernetes installs do not configure the nodes’ resolv.conf files to use the cluster DNS by default, because that process is inherently distribution-specific. This should probably be implemented eventually.
seen in https://github.com/kubernetes/minikube/issues/2162#issuecomment-533696513
Write /etc/hosts on all nodes (independent of distribution), or configure the nodes to use the cluster DNS:
- Manually write the name of the service in /etc/hosts on all nodes.
- Run a DaemonSet with an init container doing the update and rancher/pause as the app container. The init container gets a list of services to handle, looks up the IP address of each service, and writes name and IP to /to_edit/hosts (which is mounted from the node's /etc/hosts). On changes, restart the DaemonSet manually (a sketch of this variant follows the list).
- Write a controller which listens to all services (or only specially labelled services) and writes /etc/hosts on each host. See the links in https://github.com/kubernetes/kubernetes/issues/64623#issuecomment-609875003
- Update resolv.conf manually on each node. Depending on the distribution (whether it uses systemd, ...), the exact steps may differ. Find the nameserver in /etc/resolv.conf of any pod.
- Run a DaemonSet with an init container doing the update and rancher/pause as the app container. The init container updates /to_edit/resolv.conf, which is mounted from the host. No restart required.
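A minimal sketch of the hosts-file DaemonSet variant described above, assuming the node's /etc/hosts is mounted into the init container at /to_edit/hosts. The name/IP pair is taken from the nslookup output earlier in this issue; a real init container would look its list of services up via cluster DNS instead of hardcoding them:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-hosts-updater
spec:
  selector:
    matchLabels:
      app: node-hosts-updater
  template:
    metadata:
      labels:
        app: node-hosts-updater
    spec:
      initContainers:
      - name: write-hosts
        image: busybox
        command:
        - sh
        - -c
        - |
          # Append the Service name to the node's hosts file if it is not there yet.
          # 10.105.22.251 / nfs-server... is the pair from the nslookup output above;
          # a dynamic lookup would replace this hardcoded line.
          grep -q nfs-server.default.svc.cluster.local /to_edit/hosts || echo "10.105.22.251 nfs-server.default.svc.cluster.local" >> /to_edit/hosts
        volumeMounts:
        - name: hosts
          mountPath: /to_edit/hosts
      containers:
      - name: pause
        image: rancher/pause:3.1   # keeps the DaemonSet pod running after the init container exits
      volumes:
      - name: hosts
        hostPath:
          path: /etc/hosts
          type: File

As noted in the list above, when a Service's ClusterIP changes, the DaemonSet has to be restarted manually so the init container runs again.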
For anyone else running into this in general (not only with minikube), I've made a small image + DaemonSet that basically does the latter option mentioned above (a DaemonSet updating the host's /etc/systemd/resolved.conf).
It should work in most scenarios where the cloud provider isn't doing something too funky with their DNS config: https://github.com/Tristan971/kube-enable-coredns-on-node
(It is a bit dirty/ad hoc in its current state, but it could be made to support more host setups.)
I was able to solve this problem by creating a service with a static ClusterIP and then mounting by the IP instead of the service name. No DNS required. This is working nicely on Azure; I haven't tried it elsewhere.
In my case, I'm using an HDFS NFS Gateway and chose 10.0.200.2 as the ClusterIP:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hdfs
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: Service
metadata:
  name: hdfs-nfs
  labels:
    component: hdfs-nn
spec:
  type: ClusterIP
  clusterIP: 10.0.200.2
  ports:
  - name: portmapper
    port: 111
    protocol: TCP
  - name: nfs
    port: 2049
    protocol: TCP
  - name: mountd
    port: 4242
    protocol: TCP
  selector:
    component: hdfs-nn
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hdfs
spec:
  storageClassName: hdfs
  capacity:
    storage: 3000Gi
  accessModes:
  - ReadWriteMany
  mountOptions:
  - vers=3
  - proto=tcp
  - nolock
  - noacl
  - sync
  nfs:
    server: 10.0.200.2
    path: "/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hdfs
spec:
  storageClassName: hdfs
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 3000Gi
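For completeness, a workload then consumes the claim by name. A hypothetical minimal pod (the pod and container names here are illustrative, not from the setup above) might look like this:

apiVersion: v1
kind: Pod
metadata:
  name: hdfs-consumer
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "ls /data && tail -f /dev/null"]   # list the mounted share, then keep running
    volumeMounts:
    - name: hdfs
      mountPath: /data
  volumes:
  - name: hdfs
    persistentVolumeClaim:
      claimName: hdfs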
Would mounting it inside the container be an option? I.e., the traditional way of installing an NFS client in the container and using the mount command, instead of letting Kubernetes mount it?
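That would sidestep the node-DNS problem, because the mount command then runs inside the pod, where cluster DNS is available, but it needs elevated privileges and the mount is no longer managed by Kubernetes. A minimal sketch, assuming an image where an NFS client can be installed and that running privileged is acceptable (both assumptions, not something confirmed in this thread):

apiVersion: v1
kind: Pod
metadata:
  name: nfs-client-inside
spec:
  containers:
  - name: client
    image: alpine:3.18            # assumption: any image where nfs-utils can be installed
    securityContext:
      privileged: true            # mounting inside a container needs elevated privileges
    command:
    - sh
    - -c
    - |
      # Install an NFS client, then mount by Service name; the name resolves via
      # cluster DNS because the mount runs inside the pod, not on the node.
      apk add --no-cache nfs-utils &&
      mkdir -p /mnt/nfs &&
      mount -t nfs -o nolock nfs-server.default.svc.cluster.local:/ /mnt/nfs &&
      tail -f /dev/null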