Describe the bug
The container restarts frequently when the API server is not responding, even though it should not have to.
Version of Helm and Kubernetes:
helm 2.14.1
kubernetes 1.12.7-1
Which chart:
latest stable/nfs-server-provisioner
What happened:
I have a cluster with latency and occasional failures of the API server. I have submitted a bug report to the cloud provider that manages the control plane.
My issue with nfs-server-provisioner is that during these API server failures I see logs like the following and the container restarts:
I0619 07:05:11.965197 1 leaderelection.go:231] failed to renew lease nfs/cluster.local-nfs-data-nfs-server-provisioner: failed to tryAcquireOrRenew context deadline exceeded
F0619 07:05:11.966172 1 controller.go:646] leaderelection lost
I have lots of problems with the PVCs based on this provisioner: binding timeouts, lost file writes, etc.
I am not certain, but I think they are related to these frequent restarts of the NFS server.
What you expected to happen:
The NFS server should tolerate API server failures.
How to reproduce it (as minimally and precisely as possible):
I do not manage the control plane of my cluster, but I suppose that manually stopping the API server should reproduce the issue.
Anything else we need to know:
I understand restarting the container in this case is probably the expected behavior, but I think it is wrong.
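For context on why the pod restarts: the "failed to renew lease" / "leaderelection lost" lines above are the standard leader election pattern (as in client-go's leaderelection package), where a missed lease renewal triggers the OnStoppedLeading callback and the process exits fatally. The Go sketch below illustrates that pattern with assumed lock names, namespace and timings; it is not the provisioner's actual wiring. Making the election more tolerant of API server latency mostly means raising LeaseDuration and RenewDeadline, at the cost of slower failover to another replica.

// Minimal sketch of the client-go leader election pattern (illustrative only;
// the nfs-provisioner's actual setup and lock type differ).
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Lease-based lock; "nfs-provisioner" and "nfs" are placeholder names.
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "nfs-provisioner", Namespace: "nfs"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: os.Getenv("POD_NAME")},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock: lock,
		// Longer durations make the election more tolerant of a slow API
		// server, at the cost of slower failover to another replica.
		LeaseDuration: 60 * time.Second,
		RenewDeadline: 45 * time.Second,
		RetryPeriod:   10 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Run the provisioning loop while holding the lease.
			},
			// When renewal fails (e.g. "failed to tryAcquireOrRenew context
			// deadline exceeded"), this callback fires and the usual pattern
			// is a fatal exit, which shows up as a container restart.
			OnStoppedLeading: func() {
				klog.Fatalf("leaderelection lost")
			},
		},
	})
}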
@albanm I'm seeing a very similar issue - which cloud provider are you seeing the latency with?
It is OVH. Their managed Kubernetes solution is quite recent.
I have published a temporary fork of nfs-provisioner on Docker Hub that deactivates the leader election system in the meantime. If you wish, you can test it with these options for the Helm chart:
image:
  repository: koumoul/nfs-provisioner
  tag: v1.0.0
Hey, I have seen the issue described above in multiple clusters, most recently on IBM and AWS. In both cases the pod restarts, and these are the last logs you can read:
I0810 18:47:10.541004 1 leaderelection.go:231] failed to renew lease default/worldsibu.com-nfs: failed to tryAcquireOrRenew context deadline exceeded
F0810 18:47:10.541039 1 controller.go:646] leaderelection lost
@albanm do you happen to have the changes performed on a git repo?
The image is here: https://hub.docker.com/r/koumoul/nfs-provisioner/tags
The fork is here: https://github.com/koumoul-dev/external-storage
There are 5 commits by me, but 4 of them are only about the build. The fix commit is this one: https://github.com/koumoul-dev/external-storage/commit/aa1869b605c6944f271df351e908d4142deac0d0
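For anyone wondering what "deactivating the leader election system" amounts to, here is a rough, hypothetical Go sketch (the flag and function names are made up, not taken from the commit above): when election is disabled, the provisioning loop runs unconditionally, so a slow API server can still delay provisioning but can no longer kill the container through the "leaderelection lost" path.

// Hypothetical sketch of what "deactivating leader election" amounts to; all
// flag and function names here are illustrative, not the real diff.
package main

import (
	"context"
	"flag"
	"time"
)

// provisionLoop stands in for the nfs-provisioner's controller run loop.
func provisionLoop(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-time.After(10 * time.Second):
			// watch PVCs and provision/delete volumes here
		}
	}
}

func main() {
	leaderElect := flag.Bool("leader-elect", true, "enable leader election (assumed flag)")
	flag.Parse()

	ctx := context.Background()
	if *leaderElect {
		// With election enabled, provisionLoop is wrapped in something like
		// leaderelection.RunOrDie (see the earlier sketch), so losing the
		// lease during API server trouble ends the whole process.
		runWithLeaderElection(ctx, provisionLoop)
		return
	}
	// With election disabled, the loop runs unconditionally: a slow API server
	// can still delay provisioning, but can no longer restart the container.
	provisionLoop(ctx)
}

// runWithLeaderElection is only a stub so this sketch compiles on its own.
func runWithLeaderElection(ctx context.Context, run func(context.Context)) {
	run(ctx)
}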
Hey @albanm, thank you very much for the links. I'll test them on my clusters.
Hi @albanm, have you updated your fork with the new nfs-provisioner code and k8s compatibility?
No, I didn't. I suppose I should merge upstream once in a while. I haven't encountered any problems yet.
Hi @albanm, I just had this issue. I guess the problem still persists in the latest version?
Honestly, I didn't check. I still use my fork without asking questions, as it works well for me.
@albanm The official image was updated 4 months ago, while yours was updated a year ago, if I am not mistaken. Are there any other differences besides the leader election change? Thanks
No, the only meaningful commit of my fork is this one: https://github.com/koumoul-dev/external-storage/commit/aa1869b605c6944f271df351e908d4142deac0d0