Is this a request for help?: No
Is this a BUG REPORT or FEATURE REQUEST? (choose one): FEATURE REQUEST
Version of Helm and Kubernetes:
Kubernetes v1.9.2
Helm v2.8.2
Which chart:
stable/percona-xtradb-cluster
What happened:
Install the chart, using persistent storage.
Write some data to the database.
Delete the StatefulSet.
Recreate the StatefulSet.
The first pod enters a CrashLoopBackOff due to the safe-to-bootstrap protection.
Manually edit the grastate.dat file within the volume mounted by the first node, and set safe_to_bootstrap to 1.
After that, the StatefulSet comes up properly.
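For context, grastate.dat lives at /var/lib/mysql/grastate.dat on the data volume and looks roughly like this (the version, uuid and seqno values here are only placeholders):

# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0

The manual workaround above is changing that last line to safe_to_bootstrap: 1 on the pxc-0 volume.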
What you expected to happen:
Given that Kubernetes StatefulSets provide guarantees around the ordering of pod deletion, I would expect that on a re-create the StatefulSet should be able to start properly and bootstrap from the first pod.
How to reproduce it (as minimally and precisely as possible):
See above.
Anything else we need to know:
cc @stephenlawrence
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
This issue is being automatically closed due to inactivity.
Hi @skriss, I'm facing the exact same issue:
percona-cluster-pxc-0   2/3   CrashLoopBackOff
I didn't get your solution to this. Where does the grastate.dat file exist, and how can I edit it?
I'm using this Job to edit grastate.dat. The PXC cluster should be stopped first. Use the proper claim name inside.
apiVersion: batch/v1
kind: Job
metadata:
  name: safe-to-bootstrap
spec:
  template:
    spec:
      volumes:
        - name: mysql-data
          persistentVolumeClaim:
            claimName: mysql-data-pxc-0   # adjust to the PVC name of your release's first pod
      containers:
        - name: safe-to-bootstrap
          image: busybox
          imagePullPolicy: IfNotPresent
          # Flip safe_to_bootstrap to 1 in the Galera state file on the pxc-0 volume
          command:
            - sed
            - -i
            - "s|safe_to_bootstrap.*:.*|safe_to_bootstrap:1|1"
            - /var/lib/mysql/grastate.dat
          volumeMounts:
            - mountPath: /var/lib/mysql
              name: mysql-data
      restartPolicy: OnFailure
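Rough sequence to run it, assuming the StatefulSet is named percona-pxc (adjust to whatever your release created) and the manifest above is saved as safe-to-bootstrap.yaml:

kubectl scale statefulset percona-pxc --replicas=0    # stop the PXC cluster first
kubectl apply -f safe-to-bootstrap.yaml
kubectl get pods -l job-name=safe-to-bootstrap        # wait until it shows Completed
kubectl delete job safe-to-bootstrap
kubectl scale statefulset percona-pxc --replicas=3    # pod 0 bootstraps again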
I think this is still an issue. What if the first node's grastate.dat contains seqno: -1 and is therefore not the best node to bootstrap from, because it doesn't hold the latest version of the data? What if it is the third node in the StatefulSet that has the latest data? Maybe you could bootstrap the cluster from the third node if you change the pod management policy to Parallel. I don't think the issue is solved just by running the Job above, because there is a risk of data loss.
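If you want to check which node actually has the newest data before forcing a bootstrap, one option is a read-only variant of the Job above that just prints grastate.dat for a given claim (only a sketch; the Job name here is made up, and you would run it once per node's claim and compare the seqno values):

apiVersion: batch/v1
kind: Job
metadata:
  name: show-grastate-pxc-2
spec:
  template:
    spec:
      volumes:
        - name: mysql-data
          persistentVolumeClaim:
            claimName: mysql-data-pxc-2   # repeat for each node's claim
      containers:
        - name: show-grastate
          image: busybox
          imagePullPolicy: IfNotPresent
          # Print the Galera state file; read it afterwards with: kubectl logs job/show-grastate-pxc-2
          command:
            - cat
            - /var/lib/mysql/grastate.dat
          volumeMounts:
            - mountPath: /var/lib/mysql
              name: mysql-data
      restartPolicy: OnFailure

Whichever volume reports the highest seqno (an actual number, not -1) is the one where safe_to_bootstrap should be set to 1; blindly forcing it on pxc-0 does risk losing the most recent writes.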
I got the same problem. It can't pass our failover test.