BUG REPORT
Environment:
Minikube version: v0.30.0
- OS: Fedora 29
- VM Driver: virtualbox, kvm2
- ISO version: v0.30.0
Others:
- kubernetes version: tested on v1.10.0, v1.13.0
- tested with coredns and kube-dns minikube addons
What happened:
NFS volume fails to mount due to a DNS error (Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known). This problem does not occur when deployed on GKE.
What you expected to happen:
NFS volume is mounted without an error.
How to reproduce it (as minimally and precisely as possible):
- Start nfs-server:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nfs-server
spec:
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
      - name: nfs-server
        image: gcr.io/google_containers/volume-nfs:0.8
        ports:
        - name: nfs
          containerPort: 2049
        - name: mountd
          containerPort: 20048
        - name: rpcbind
          containerPort: 111
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /exports
          name: exports
      volumes:
      - name: exports
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: nfs-server
spec:
  ports:
  - name: nfs
    port: 2049
  - name: mountd
    port: 20048
  - name: rpcbind
    port: 111
  selector:
    role: nfs-server
- Start a service consuming the nfs volume (e.g. busybox):

apiVersion: v1
kind: ReplicationController
metadata:
  name: nfs-busybox
spec:
  replicas: 1
  selector:
    name: nfs-busybox
  template:
    metadata:
      labels:
        name: nfs-busybox
    spec:
      containers:
      - image: busybox
        command:
        - sh
        - -c
        - 'while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done'
        imagePullPolicy: IfNotPresent
        name: busybox
        volumeMounts:
        - name: nfs
          mountPath: "/mnt"
      volumes:
      - name: nfs
        nfs:
          server: nfs-server.default.svc.cluster.local
          path: "/"
Output of minikube logs (if applicable):
In kubectl describe pod nfs-busybox-... this error appears:
Warning FailedMount 4m kubelet, minikube MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/ab2e9ad4-f88b-11e8-8a56-4004c9e1505b/volumes/kubernetes.io~nfs/nfs --scope -- mount -t nfs nfs-server.default.svc.cluster.local:/ /var/lib/kubelet/pods/ab2e9ad4-f88b-11e8-8a56-4004c9e1505b/volumes/kubernetes.io~nfs/nfs
Output: Running scope as unit: run-r23cae2998bf349df8046ac3c61bfe4e9.scope
mount.nfs: Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known
This indicates a problem with DNS resolution of nfs-server.default.svc.cluster.local.
Note: The NFS volume is mounted successfully when the server is specified by its ClusterIP instead of by domain name.
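For clarity, the working variant differs only in the nfs.server field of the volume, for example (10.105.22.251 is the Service ClusterIP shown in the nslookup output below):

      volumes:
      - name: nfs
        nfs:
          server: 10.105.22.251   # Service ClusterIP instead of the DNS name
          path: "/"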
Anything else we need to know:
The same problem was already reported for a previous version in #2218, but that issue was closed due to the author's inactivity and no one seems to have really looked into it. There is a workaround, but it has to be applied every time a minikube VM is created.
When running kubectl exec -ti nfs-busybox-... -- nslookup nfs-server.default.svc.cluster.local:
Server: 10.96.0.10
Address: 10.96.0.10:53
Name: nfs-server.default.svc.cluster.local
Address: 10.105.22.251
*** Can't find nfs-server.default.svc.cluster.local: No answer
Strangely, the service ClusterIP is present in the output (when using kube-dns, the service ClusterIP part is missing completely).
Have you seen https://github.com/kubernetes/minikube/issues/2218#issuecomment-436821733 ?
@tamalsaha Yes, I have seen it, but only a workaround has been posted there, not an actual fix.
We have the same issue:
The error message from the pod is:
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/7940ceed-ffad-11e8-890b-005056010f5a/volumes/kubernetes.io~nfs/pv-nfs-10gi --scope -- mount -t nfs ext-nfs-svc.default.svc.cluster.local:/data/nfs/test /var/lib/kubelet/pods/7940ceed-ffad-11e8-890b-005056010f5a/volumes/kubernetes.io~nfs/pv-nfs-10gi
Output: Running scope as unit: run-r3a24d6989c5d4e0c99d4b0eb5429a210.scope
mount.nfs: Failed to resolve server ext-nfs-svc.default.svc.cluster.local: Name or service not known
Even though resolving works as expected:
kubectl exec -it busybox -- nslookup ext-nfs-svc.default.svc.cluster.local
The answer is:
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: ext-nfs-svc.default.svc.cluster.local
Address 1: 10.96.152.237 ext-nfs-svc.default.svc.cluster.local
Using the IP for the NFS connection works, as described above.
I suspect this is because the NFS mount on the host system doesn't currently resolve against 10.96.0.10 within the guest VM; only pods do, for what appear to be obsolete historical reasons. I could be completely wrong though.
I guess you are right. Defining the IP for ext-nfs-svc.default.svc.cluster.local in the hosts file of the cluster workers does solve the problem. Somehow the NFS mounting does not use the cluster-internal DNS resolution and also does not really use the external IP defined in the service. I'm not sure whether this is the expected behaviour, but to me it does not make much sense.
👀
Well, I'm running into the same issue on EKS as well. When the NFS server IP is specified directly, it just works. Is this a known issue on EKS as well, or should I move to EFS on AWS? :(
Apologies, I'm not a Minikube user but this is the most apt issue I've found for the problems that I'm having.
I'm experiencing these exact problems:
- Mounting the NFS volume by its service name (nfs-server.default.svc.cluster.local) doesn't work during the ContainerCreating phase.
- nslookup inside the pod resolves the domain just fine.
Based on my googling efforts so far, this seems to be a Kubernetes issue where the NFS mount is set up before the container can reach coredns. Perhaps an initialization-order problem?
The problem is that the components responsible for the NFS storage backend do not use the cluster-internal DNS; they try to resolve the NFS server with the DNS configuration of the worker node itself. One way to make this work is to add a hosts-file entry on the worker nodes mapping nfs-server.default.svc.cluster.local to the NFS server's IP address. But that is just a quick-and-dirty workaround.
It is odd that this component is not able to use the cluster-internal DNS resolution. That would make much more sense and be more intuitive to use.
I'm also having this issue on EKS.
I don't think it's an issue related to any specific kubernetes cloud solution, but a general one.
From what I can tell, the only solution to this would be to give the k8s nodes access to k8s's coredns, which is responsible for resolving these names. However, in my experience most k8s nodes use their own DNS, independent of k8s.
@ikkerens I'm pretty sure that would work. Having an Ingress for the kube-dns service which is reachable only from the k8s nodes themselves could achieve this. But as you said, one would have to change the DNS settings on the nodes.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
I have the same issue on AWS with an NFS server backed by an EBS disk.
Using the IP address works just fine; the NFS server name cannot be resolved.
I'm running into the same issue. I can get it to work fine in GKE, but it won't work locally.
Same issue on Azure AKS too.
@fhaifler - With these configurations there is no data being shared between the pods. That is, anything inside the '/' is not visible inside the '/mnt' folder.
Any idea why?
Also, I'm not able to mount '/nfs-data-example-folder' into the '/mnt' folder. It throws a permission error.
Any idea why?
@ramkrishnan8994 I am not sure I understand the question. Have you managed to make it work even with the domain name for the NFS server (nfs-server.default.svc.cluster.local)? It is still not working for me, even with an updated minikube.
That is, anything inside the '/' is not visible inside the '/mnt' folder.
I am not sure what you mean. / corresponds to the root directory exported by the NFS server, i.e. the /exports directory inside the nfs-server pod. The same content should be visible inside nfs-busybox under the /mnt directory.
Also, I'm not able to mount '/nfs-data-example-folder' into the '/mnt' folder. It throws a permission error.
I don't know what /nfs-data-example-folder should be. Can you elaborate please?
This would likely be addressed by resolving #2162 (help wanted)
I ran into the same issue with Azure AKS but not with Google GKE. How come Google has a fix and other cloud providers don't?
This is a known issue in Kubernetes:
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#known-issues
Kubernetes installs do not configure the nodes’ resolv.conf files to use the cluster DNS by default, because that process is inherently distribution-specific. This should probably be implemented eventually.
seen in https://github.com/kubernetes/minikube/issues/2162#issuecomment-533696513
Write /etc/hosts on all nodes (independent of distribution), or configure the nodes to use the cluster DNS:
- Manually write the name of the service in /etc/hosts on all nodes.
- Run a DaemonSet with an init container doing the update and rancher/pause as the app container. The init container gets a list of services to handle, looks up the IP address of each service, and writes name and IP to /to_edit/hosts (which is mounted from the node's /etc/hosts). On changes, restart the DaemonSet manually (a sketch of this variant follows the list).
- Write a controller which listens to all services (or only specially labelled services) and writes /etc/hosts on each host. See the links in https://github.com/kubernetes/kubernetes/issues/64623#issuecomment-609875003
- Update resolv.conf manually on each node. Depending on the distribution (whether it uses systemd, ...), the exact steps may differ. Find the nameserver in /etc/resolv.conf of any pod.
- Run a DaemonSet with an init container doing the update and rancher/pause as the app container. The init container updates /to_edit/resolv.conf, which is mounted from the host. No restart required.
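A minimal sketch of the hosts-file DaemonSet variant described above, assuming the node's /etc/hosts is mounted into the init container at /to_edit/hosts. The name/IP pair is taken from the nslookup output earlier in this issue; a real init container would look its list of services up via cluster DNS instead of hardcoding them:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-hosts-updater
spec:
  selector:
    matchLabels:
      app: node-hosts-updater
  template:
    metadata:
      labels:
        app: node-hosts-updater
    spec:
      initContainers:
      - name: write-hosts
        image: busybox
        command:
        - sh
        - -c
        - |
          # Append the Service name to the node's hosts file if it is not there yet.
          # 10.105.22.251 / nfs-server... is the pair from the nslookup output above;
          # a dynamic lookup would replace this hardcoded line.
          grep -q nfs-server.default.svc.cluster.local /to_edit/hosts || echo "10.105.22.251 nfs-server.default.svc.cluster.local" >> /to_edit/hosts
        volumeMounts:
        - name: hosts
          mountPath: /to_edit/hosts
      containers:
      - name: pause
        image: rancher/pause:3.1   # keeps the DaemonSet pod running after the init container exits
      volumes:
      - name: hosts
        hostPath:
          path: /etc/hosts
          type: File

As noted in the list above, when a Service's ClusterIP changes, the DaemonSet has to be restarted manually so the init container runs again.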
For anyone else running into this in general (not only with minikube), I've made a small image + DaemonSet that basically does the latter option mentioned above (a DaemonSet updating the host's /etc/systemd/resolved.conf).
It should work in most scenarios where the cloud provider isn't doing something too funky with their DNS config: https://github.com/Tristan971/kube-enable-coredns-on-node
(It is a bit dirty/ad hoc in its current state, but it could be made to support more host setups.)
I was able to solve this problem by creating a service with a static ClusterIP and then mounting by the IP instead of the service name. No DNS required. This is working nicely on Azure; I haven't tried it elsewhere.
In my case, I'm using an HDFS NFS Gateway and chose 10.0.200.2 as the ClusterIP:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hdfs
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: Service
metadata:
  name: hdfs-nfs
  labels:
    component: hdfs-nn
spec:
  type: ClusterIP
  clusterIP: 10.0.200.2
  ports:
  - name: portmapper
    port: 111
    protocol: TCP
  - name: nfs
    port: 2049
    protocol: TCP
  - name: mountd
    port: 4242
    protocol: TCP
  selector:
    component: hdfs-nn
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hdfs
spec:
  storageClassName: hdfs
  capacity:
    storage: 3000Gi
  accessModes:
  - ReadWriteMany
  mountOptions:
  - vers=3
  - proto=tcp
  - nolock
  - noacl
  - sync
  nfs:
    server: 10.0.200.2
    path: "/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hdfs
spec:
  storageClassName: hdfs
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 3000Gi
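For completeness, a workload then consumes the claim by name. A hypothetical minimal pod (the pod and container names here are illustrative, not from the setup above) might look like this:

apiVersion: v1
kind: Pod
metadata:
  name: hdfs-consumer
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "ls /data && tail -f /dev/null"]   # list the mounted share, then keep running
    volumeMounts:
    - name: hdfs
      mountPath: /data
  volumes:
  - name: hdfs
    persistentVolumeClaim:
      claimName: hdfs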
Would mounting it inside the container be an option? I.e., the traditional way of installing an NFS client in the container and using the mount command, instead of letting Kubernetes mount it?
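That would sidestep the node-DNS problem, because the mount command then runs inside the pod, where cluster DNS is available, but it needs elevated privileges and the mount is no longer managed by Kubernetes. A minimal sketch, assuming an image where an NFS client can be installed and that running privileged is acceptable (both assumptions, not something confirmed in this thread):

apiVersion: v1
kind: Pod
metadata:
  name: nfs-client-inside
spec:
  containers:
  - name: client
    image: alpine:3.18            # assumption: any image where nfs-utils can be installed
    securityContext:
      privileged: true            # mounting inside a container needs elevated privileges
    command:
    - sh
    - -c
    - |
      # Install an NFS client, then mount by Service name; the name resolves via
      # cluster DNS because the mount runs inside the pod, not on the node.
      apk add --no-cache nfs-utils &&
      mkdir -p /mnt/nfs &&
      mount -t nfs -o nolock nfs-server.default.svc.cluster.local:/ /mnt/nfs &&
      tail -f /dev/null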