Is this a request for help?:
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug
Version of Helm and Kubernetes:
Kubernetes 1.12.4
Which chart:
https://github.com/helm/charts/tree/master/stable/elastic-stack
What happened:
fantastic-spaniel-logstash-0 0/1 CrashLoopBackOff 6 8m5s
The Logstash StatefulSet never becomes ready.
kubectl describe pod shows the liveness and readiness probes failing:
Normal Scheduled 9m12s default-scheduler Successfully assigned default/fantastic-spaniel-logstash-0 to aks-agentpool-34733733-3
Normal Pulled 7m22s (x3 over 9m2s) kubelet, aks-agentpool-34733733-3 Container image "docker.elastic.co/logstash/logstash-oss:6.6.0" already present on machine
Normal Created 7m22s (x3 over 9m2s) kubelet, aks-agentpool-34733733-3 Created container
Normal Started 7m21s (x3 over 9m1s) kubelet, aks-agentpool-34733733-3 Started container
Warning Unhealthy 6m55s (x7 over 8m35s) kubelet, aks-agentpool-34733733-3 Readiness probe failed: Get http://10.240.0.15:9600/: dial tcp 10.240.0.15:9600: connect: connection refused
Warning Unhealthy 6m52s (x7 over 8m32s) kubelet, aks-agentpool-34733733-3 Liveness probe failed: Get http://10.240.0.15:9600/: dial tcp 10.240.0.15:9600: connect: connection refused
Normal Killing 4m2s (x6 over 8m12s) kubelet, aks-agentpool-34733733-3 Killing container with id docker://logstash:Container failed liveness probe.. Container will be killed and recreated.
kubectl logs shows:
june@Azure:~$ kubectl logs fantastic-spaniel-logstash-0
2019/03/07 06:27:45 Setting 'path.config' from environment.
2019/03/07 06:27:45 Setting 'queue.max_bytes' from environment.
2019/03/07 06:27:45 Setting 'queue.drain' from environment.
2019/03/07 06:27:45 Setting 'http.port' from environment.
2019/03/07 06:27:45 Setting 'http.host' from environment.
2019/03/07 06:27:45 Setting 'path.data' from environment.
2019/03/07 06:27:45 Setting 'queue.checkpoint.writes' from environment.
2019/03/07 06:27:45 Setting 'queue.type' from environment.
2019/03/07 06:27:45 Setting 'config.reload.automatic' from environment.
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
I have tried killing the pod multiple times, but it never succeeds.
What you expected to happen:
Logstash should start and become ready; otherwise the Helm chart is not working.
How to reproduce it (as minimally and precisely as possible):
Deploy a Kubernetes 1.12.4 cluster and run helm install stable/elastic-stack.
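A minimal reproduction sketch with Helm 2 (the release name "elk" is an arbitrary example; the pod name follows the <release>-logstash-0 pattern):

helm install --name elk stable/elastic-stack
kubectl get pods -w                    # the Logstash pod ends up in CrashLoopBackOff
kubectl describe pod elk-logstash-0    # shows the failing liveness/readiness probes
kubectl logs elk-logstash-0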
Anything else we need to know:
Any comments will be highly appreciated
Same here!
Same here with stable/logstash, forcing the image to docker.elastic.co/logstash/logstash-oss:6.6.0.
Was working fine last month.
On the working stable/logstash:
chart: logstash-1.5.0
image: docker.elastic.co/logstash/logstash-oss:6.6.0
- env:
  - name: HTTP_HOST
    value: 0.0.0.0
  - name: HTTP_PORT
    value: "9600"
  livenessProbe:
    failureThreshold: 3
    httpGet:
      path: /
      port: monitor
      scheme: HTTP
    initialDelaySeconds: 20
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
  resources: {}
curl http://localhost:9600
{"host":"logstash-0.logstash.logging.svc.cluster.local","version":"6.6.0","http_address":"0.0.0.0:9600","id":"00390d51-8040-404d-9f7e-676a8b7b224b","name":"logstash-0.logstash.logging.svc.cluster.local","build_date":"2019-01-24T12:13:56+00:00","build_sha":"e4390be7e4d511af9d48bc503c9dcc15b03d3bce","build_snapshot":false}
On the non-working stable/logstash:
chart: logstash-1.5.2
image: docker.elastic.co/logstash/logstash-oss:6.6.0
- env:
  - name: HTTP_HOST
    value: 0.0.0.0
  - name: HTTP_PORT
    value: "9600"
  livenessProbe:
    failureThreshold: 3
    httpGet:
      path: /
      port: monitor
      scheme: HTTP
    initialDelaySeconds: 20
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
curl http://localhost:9600
curl: (7) Failed connect to localhost:9600; Connection refused
I tried several things. After removing the probes, I noticed a queue-creation problem. It was caused by Azure not fully deleting the Azure disk (or file) behind the Kubernetes PVC, so the new PVC ended up using the same Azure disk, and Logstash had a problem reusing the existing queue.
I deleted the disks/files and retried. It was still crash-looping, so I removed the probes again, and the logs showed that Logstash was ready after about 1 minute and 30 seconds. So the configured 20-second delay is not enough in my case.
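For anyone hitting the same stale-queue problem, a rough cleanup sketch; the PVC name below is hypothetical, so check kubectl get pvc for the real one (it typically follows the <volume>-<pod> pattern):

kubectl get pvc -n logging
kubectl delete pvc data-logstash-0 -n logging   # hypothetical name, use the one listed above
# then confirm the backing Azure disk is really gone before reinstalling,
# so the new PVC does not bind to the old queue data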
Try setting initialDelaySeconds: 120, or remove the probes and watch the logs (a values override sketch follows the log excerpt below).
$ kubectl logs logstash-0 -n logging
2019/03/13 14:18:33 Setting 'queue.max_bytes' from environment.
...
[2019-03-13T14:20:00,295][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
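For reference, a hedged way to apply the larger delay through chart values instead of editing the StatefulSet by hand, assuming the stable/logstash chart exposes the probe settings in its values.yaml (check the chart to confirm the exact keys):

helm upgrade <release> stable/logstash \
  --set livenessProbe.initialDelaySeconds=120 \
  --set readinessProbe.initialDelaySeconds=120

When deploying through stable/elastic-stack, the same keys would be nested under the logstash subchart, e.g. --set logstash.livenessProbe.initialDelaySeconds=120, again assuming the subchart passes them through.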
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity.
Hello, I tried the solution and it worked well (removing the liveness probe).
But I don't understand why port 9600 doesn't work anymore. I did the deployment one month ago and it was working.
Does anyone have an explanation for why port 9600 is not working anymore?
Hi,
I was able to avoid the probe failures by setting initialDelaySeconds to 60.
readinessProbe:
  httpGet:
    path: /
    port: monitor
  initialDelaySeconds: 60
  # periodSeconds: 30
  timeoutSeconds: 30
  # failureThreshold: 6
  # successThreshold: 1
livenessProbe:
  httpGet:
    path: /
    port: monitor
  initialDelaySeconds: 60
  # periodSeconds: 30
  timeoutSeconds: 30
  # failureThreshold: 6
  # successThreshold: 1
Proof it started:
[2019-09-28T21:08:27,234][INFO ][org.logstash.beats.Server] Starting server on port: 5044
[2019-09-28T21:08:28,119][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}