There seems to be a recurring bad practice among the charts in this repository: using a Deployment to manage pods that mount Persistent Volume Claims, rather than the more appropriate StatefulSet.
To demonstrate just how pervasive the problem is, one can compare the list of charts using a StatefulSet versus those using a Deployment.
The list of stateful charts using a StatefulSet:
$ git grep -li 'kind: *StatefulSet' |
awk -F '/' '{print $1}'
cockroachdb
concourse
consul
ipfs
memcached
minio
mongodb-replicaset
rethinkdb
versus the stateful charts using a Deployment:
$ git grep -l -i 'kind: *Deployment' |
xargs grep -i PersistentVolumeClaim |
awk -F '/' '{print $1}' |
sort -u
artifactory
chronograf
dokuwiki
drupal
factorio
ghost
gitlab-ce
gitlab-ee
grafana
influxdb
jasperreports
jenkins
joomla
kapacitor
magento
mariadb
mediawiki
minecraft
minio
mongodb
moodle
mysql
odoo
opencart
openvpn
orangehrm
osclass
owncloud
percona
phabricator
phpbb
postgresql
prestashop
prometheus
rabbitmq
redis
redmine
rocketchat
sentry
testlink
traefik
wordpress
Hopefully I'm not completely missing something here -- please let me know if I overlooked a good reason why these charts are using Deployments instead of StatefulSets.
Assuming that I'm not completely off in the weeds, there are a few clear asks here:
@apeschel Thanks for the issue. I totally agree with you; I have been thinking about this recently as well. As part of Kubernetes 1.8 and 1.9, sig-apps is expecting more feedback from the community with regards to StatefulSet, and migrating stateful applications from Deployments to StatefulSets is one of the best ways to start getting feedback from users.
We have already started reasoning with _(new)_ chart contributors about their choice of Deployments over StatefulSets for stateful applications.
I think _(apart from adding it to the best practices)_ we should start by migrating well-known DBs and K/V stores from Deployments to StatefulSets:
mysql
minio
mariadb
mongodb
postgresql
cc: @kow3ns
StatefulSets are not applicable in all cases. For example, some pods need to share a PVC, whereas StatefulSets are designed so that each pod is backed by its own storage. In that case a Deployment is more appropriate.
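For illustration, here is a minimal sketch of that shared-PVC case: several Deployment replicas mounting one ReadWriteMany claim, which a StatefulSet's volumeClaimTemplates cannot express because each pod would get its own claim. All names, sizes and the image are made up.

# Hypothetical example: one RWX claim shared by every replica of a Deployment.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes: ["ReadWriteMany"]   # all pods mount the same volume
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: shared-data   # every replica uses this same PVC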
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
This is most definitely still an issue.
That's a huge issue with RWO (e.g. block storage) PVCs like Longhorn. You cannot upgrade the chart, because the new pod cannot mount the storage still in use by the old pod.
Even with NFS & co. it is very dangerous: imagine the database pod needs to be upgraded, and a new pod starts up accessing the same storage and files as the already running old pod.
@jfelten
StatefulSets are not applicable in all cases. For example, some pods need to share a PVC, whereas StatefulSets are designed so that each pod is backed by its own storage. In that case a Deployment is more appropriate.
StatefulSets allow you to use volumeClaimTemplates, but you can also declare volumes as you do within Deployments, and volumeMounts for a container in the pod. At least I've done so, though I did not use a volumeClaimTemplates field at the same time.
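A minimal sketch of what that looks like, with all names made up: a StatefulSet that mounts an existing PVC through a plain volume, exactly as a Deployment would, with no volumeClaimTemplates at all.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app
spec:
  serviceName: app
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: busybox
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: existing-claim   # declared elsewhere, just like in a Deployment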
Another advantage of StatefulSet is that you can helm delete --purge RELEASE-NAME and re-create it with the same name, and it'll keep & reuse the data. There is a much lower risk of deleting data.
There are indeed still cases where a single volume is used by multiple Pods. That is the more advanced scenario, as most volumes support only RWO and those that don't are slow(er). Such a chart may still use a StatefulSet but switch to a RWM PVC when more than one replica is requested (or via a value).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
Duplicate of / related to https://github.com/helm/charts/issues/809
As an afterthought, I think switching to StatefulSet for DBs like postgres that don't natively scale is good for one thing and one thing only: volumeClaimTemplates, and the ability to delete a Release then reinstall it (without changing values to use a custom PVC) while still keeping the PVC.
This would be a very helpful feature for my use cases (a lot of test Releases that are automatically created as needed then deleted)
@desaintmartin ah, that is less troublesome with StatefulSet?! Nice!
I'm currently doing something quite troublesome whenever that needs to be done: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/CHANGELOG.md
Actually, with Deployments, you need to declare the PVC (AFAIK). So when you delete a release but set the PVC to stay (using annotations), and then reinstall the chart, Helm will complain that the PVC already exists. So you need to change values.yaml (when possible) to manually point at the existing PVC rather than automatically creating it.
With StatefulSet, it's automated.
Oh, so the PVC created from the StatefulSet's template isn't managed by Helm, and will remain.
While a PVC created for a Deployment by Helm is managed by Helm, and will be deleted. The underlying PV can be retained, though, if the storageClass used has a reclaimPolicy of Retain rather than Delete. But the PV cannot be reused by a new PVC with a new uid until it has been made available again, and that won't happen unless:
# makes a `Released` PV `Available` again
kubectl patch pv $pv --patch '{"spec":{"claimRef":{"uid":null}}}'
So the StatefulSet binds to the same PV again by requesting the same PVC, but if the PVC is deleted, one has to do extra work no matter what. A new PVC, whether created by the StatefulSet or by Helm, will get a new uid either way, I figure.
To summarize, the benefit you see, @desaintmartin, is that StatefulSets' PVCs are not managed by Helm and will be reused by StatefulSets coming and going. This differs from a Deployment plus a PVC managed by Helm, which comes and goes: the PV is bound to a specific PVC with a certain uid, and recreating that PVC will force you to make the PV Available again manually, if it was set to Retain at all; if not, it has simply been deleted.
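For reference, whether the PV survives at all is decided by the reclaim policy, usually inherited from the StorageClass. A minimal sketch, assuming the in-tree AWS EBS provisioner, which you would replace with your own:

# Hypothetical StorageClass whose PVs are kept after their PVC is deleted.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-ssd                  # made-up name
provisioner: kubernetes.io/aws-ebs    # replace with your cluster's provisioner
reclaimPolicy: Retain                 # the default is Delete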
I was just bitten badly by this chart not following that pattern. I did a helm delete and a helm install but I lost all of my dashboards because the PVC vanished. The rest of my services that used persistence restarted as intended because they were statefulsets.
https://github.com/helm/charts/pull/8004 proposes to switch to StatefulSet. It might take some time to get this done.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
I added #8924 for OpenVPN
I have opened an issue (not a PR ;) to discuss this only in the context of minio: #9620
@maver1ck Why is the StatefulSet the better idea in your opinion?
Using the PV annotation prevents you from reinstalling the chart after deletion without manually modifying values.yaml to use the existing PVC.
Using a StatefulSet actually allows you to be stateless (how ironic) from that point of view.
That was the question I wanted to ask:
how would this behave when you're reinstalling a chart with such an annotation?
With a StatefulSet there is no problem and the existing PVCs are reused.
How does this annotation work?
Basically, when you delete a Release, the objects carrying this annotation will stay and fall outside of Helm's control.
This is nice with PVCs, as they will be preserved after deletion of a Release.
This is not nice when you try to reinstall the same Chart with the same Release name: it will fail, complaining that the PVC already exists. If you are lucky, the chart you are using provides a parameter to use an existing PVC instead of creating a new one. But that requires manual intervention (i.e. no longer automation, unless someone writes a script that detects the existence of the PVC and changes values.yaml accordingly).
Using a StatefulSet allows you to delegate this management to Kubernetes itself.
I've made the change for a few Charts on my production environments; this is really helpful as it allows deleting/reinstalling the Release at will (test environments, for example, automatically deployed through CI and deleted at night, or as a "last resort measure" for a Helm bug, to clean any state).
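For readers who land here later: the annotation discussed above is, as far as I know, Helm's resource-policy annotation. A minimal sketch of a chart's PVC template carrying it (the helper name is made up):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{ template "mychart.fullname" . }}   # made-up helper name
  annotations:
    "helm.sh/resource-policy": keep   # the claim survives helm delete, but leaves Helm's control
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 8Gi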
Thanks for explaining this.
I'm withdrawing my vote for the annotation.
@mattfarina @scottrigby has this been discussed for official review guidelines?
@dnauck Are you using the RollingUpdate deployment strategy? I suspect that is what creates a "standoff" between the starting pod and the old pod refusing to terminate, both fighting for the same PVC. Using .spec.strategy.type==Recreate would terminate the old pod first, thus preventing the standoff.
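A minimal sketch of that setting, with made-up names and image, so the old pod releases its RWO PVC before the replacement starts:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: db
spec:
  replicas: 1
  strategy:
    type: Recreate               # the default RollingUpdate starts the new pod first
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:11     # illustrative image
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: db-data   # made-up claim name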
@desaintmartin the PVC deletion didn't bite me, but persistentVolumeReclaimPolicy: Delete bit me hard :(
For future reference, the stable postgresql chart is currently using StatefulSets. I believe this was introduced in PR #8004.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
/remove-lifecycle stale
This issue is being automatically closed due to inactivity.
This issue and its comments show that it's not clear to everyone when workloads that require state should use a Deployment with a PersistentVolumeClaim versus a StatefulSet. Rather than assume the former should always be converted to the latter, let's use this issue to agree on the rules of thumb, and to update our documentation to reflect that.
Currently the REVIEW_GUIDELINES Kubernetes native Workloads section is the closest we have to documentation (added in #3334 for #3011), but right now I think it's a little misleading in that it seems to say _any_ workload requiring state should use a StatefulSet. There are cases where that's not a good idea. Let's work on this.
@scottrigby you say:
There are cases where that's not a good idea
Can you expand on that? Are you for example making a distinction between transient state (caches for example) and persistent state (let's say minio or postgresql), or is it about something else?
Coming in late for this discussion with an interesting question...
What happens when you are using StatefulSets without a dynamic PV provisioning solution?
I'll admit manually creating each PV to match a specific PVC is awful, but it needs to be done anyway in this case. PVCs are created slowly as each pod in the StatefulSet becomes ready.
Would it be possible to prepare the chart template to automatically assign a PV volume name to the PVC spec?
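One possible approach, sketched with made-up names and a hostPath volume: volumeName itself cannot vary per replica, but the claim template can carry a label selector (and an empty storageClassName to switch off dynamic provisioning) so each generated PVC binds to one of the hand-made PVs.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-myapp-0
  labels:
    app: myapp                        # label the claim template selects on
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/disks/myapp-0
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myapp
spec:
  serviceName: myapp
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: busybox
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ""          # disable dynamic provisioning
        selector:
          matchLabels:
            app: myapp                # bind only to the pre-created, labelled PVs
        resources:
          requests:
            storage: 10Gi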
Another advantage of StatefulSet is that you can helm delete --purge RELEASE-NAME and re-create it with the same name, and it'll keep & reuse the data. There is a much lower risk of deleting data.
One person's feature is another person's bug :)
I come here wondering why my postgres deployments contain old data even though I purged the previous deployment.
I have a chart that uses postgres as a subchart. After reading all this I still don't get how to configure it in such a way that the data does get purged. I'm not even sure that it can be done at all. Can it @wernight @desaintmartin ?
Unfortunately, right now, it cannot, as it has not been created by Helm. See https://github.com/helm/helm/issues/5156
This is why:
I've actually seen the case where a new Jenkins master pod is unable to start because the other is holding onto its PersistentVolumeClaim. Fundamentally, the Jenkins master is a stateful application, and needs to be handled as such.
@dylanpiergies I am adding the same for Sonarqube, which exhibits the same behavior as the Jenkins master. The PVC required by the service is being held by the existing pod, and updates fail. See the logs below:
Warning FailedAttachVolume 42m attachdetach-controller Multi-Attach error for volume "pvc-02341115-174c-xxxx-xxxxxxx" Volume is already used by pod(s) sonarqube-sonarqube-xxxxxx-xxxxx
Warning FailedMount 90s (x18 over 40m) kubelet, aks-basepool-XXXXX Unable to mount volumes for pod "sonarqube-sonarqube-xxxxx-xxxxx_xxxxx(cd802a4d-1c02-11ea-847b-xxxxxxx)": timeout expired waiting for volumes to attach or mount for pod "xxxx-pods"/"sonarqube-sonarqube-xxxxxxxxx". list of unmounted volumes=[sonarqube]. list of unattached volumes=[config install-plugins copy-plugins sonarqube tmp-dir default-token-ztvcd]