Is this a request for help?:
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug
Version of Helm and Kubernetes:
Kubernetes 1.12.4
Which chart:
https://github.com/helm/charts/tree/master/stable/elastic-stack
What happened:
fantastic-spaniel-logstash-0 0/1 CrashLoopBackOff 6 8m5s
The Logstash StatefulSet never becomes ready.
kubectl describe pod shows the liveness and readiness probes failing:
Normal Scheduled 9m12s default-scheduler Successfully assigned default/fantastic-spaniel-logstash-0 to aks-agentpool-34733733-3
Normal Pulled 7m22s (x3 over 9m2s) kubelet, aks-agentpool-34733733-3 Container image "docker.elastic.co/logstash/logstash-oss:6.6.0" already present on machine
Normal Created 7m22s (x3 over 9m2s) kubelet, aks-agentpool-34733733-3 Created container
Normal Started 7m21s (x3 over 9m1s) kubelet, aks-agentpool-34733733-3 Started container
Warning Unhealthy 6m55s (x7 over 8m35s) kubelet, aks-agentpool-34733733-3 Readiness probe failed: Get http://10.240.0.15:9600/: dial tcp 10.240.0.15:9600: connect: connection refused
Warning Unhealthy 6m52s (x7 over 8m32s) kubelet, aks-agentpool-34733733-3 Liveness probe failed: Get http://10.240.0.15:9600/: dial tcp 10.240.0.15:9600: connect: connection refused
Normal Killing 4m2s (x6 over 8m12s) kubelet, aks-agentpool-34733733-3 Killing container with id docker://logstash:Container failed liveness probe.. Container will be killed and recreated.
kubectl logs shows:
june@Azure:~$ kubectl logs fantastic-spaniel-logstash-0
2019/03/07 06:27:45 Setting 'path.config' from environment.
2019/03/07 06:27:45 Setting 'queue.max_bytes' from environment.
2019/03/07 06:27:45 Setting 'queue.drain' from environment.
2019/03/07 06:27:45 Setting 'http.port' from environment.
2019/03/07 06:27:45 Setting 'http.host' from environment.
2019/03/07 06:27:45 Setting 'path.data' from environment.
2019/03/07 06:27:45 Setting 'queue.checkpoint.writes' from environment.
2019/03/07 06:27:45 Setting 'queue.type' from environment.
2019/03/07 06:27:45 Setting 'config.reload.automatic' from environment.
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
I have tried killing the pod multiple times, but it never succeeds.
What you expected to happen:
Logstash should start and become ready; otherwise the Helm chart is not working.
How to reproduce it (as minimally and precisely as possible):
Deploy a Kubernetes 1.12.4 cluster and run helm install stable/elastic-stack.
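A minimal reproduction sketch with Helm 2 (the release name "elk" is an arbitrary example; the pod name follows the <release>-logstash-0 pattern):

helm install --name elk stable/elastic-stack
kubectl get pods -w                    # the Logstash pod ends up in CrashLoopBackOff
kubectl describe pod elk-logstash-0    # shows the failing liveness/readiness probes
kubectl logs elk-logstash-0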
Anything else we need to know:
Any comments will be highly appreciated
Same here!
Same here with stable/logstash, forcing the image to docker.elastic.co/logstash/logstash-oss:6.6.0.
Was working fine last month.
On the working stable/logstash:
chart: logstash-1.5.0
image: docker.elastic.co/logstash/logstash-oss:6.6.0
- env:
  - name: HTTP_HOST
    value: 0.0.0.0
  - name: HTTP_PORT
    value: "9600"
  livenessProbe:
    failureThreshold: 3
    httpGet:
      path: /
      port: monitor
      scheme: HTTP
    initialDelaySeconds: 20
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
  resources: {}
curl http://localhost:9600
{"host":"logstash-0.logstash.logging.svc.cluster.local","version":"6.6.0","http_address":"0.0.0.0:9600","id":"00390d51-8040-404d-9f7e-676a8b7b224b","name":"logstash-0.logstash.logging.svc.cluster.local","build_date":"2019-01-24T12:13:56+00:00","build_sha":"e4390be7e4d511af9d48bc503c9dcc15b03d3bce","build_snapshot":false}
On the non-working stable/logstash:
chart: logstash-1.5.2
image: docker.elastic.co/logstash/logstash-oss:6.6.0
- env:
  - name: HTTP_HOST
    value: 0.0.0.0
  - name: HTTP_PORT
    value: "9600"
  livenessProbe:
    failureThreshold: 3
    httpGet:
      path: /
      port: monitor
      scheme: HTTP
    initialDelaySeconds: 20
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
curl http://localhost:9600
curl: (7) Failed connect to localhost:9600; Connection refused
I tried several things. After removing the probes, I noticed a queue-creation problem. It was caused by Azure not fully deleting the Azure disk (or file) behind the Kubernetes PVC, so the new PVC ended up using the same Azure disk, and Logstash had a problem reusing the existing queue.
I deleted the disks/files and retried. It was still crash-looping, so I removed the probes again, and the logs showed that Logstash was ready after about 1 minute and 30 seconds. So the configured 20-second delay is not enough in my case.
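For anyone hitting the same stale-queue problem, a rough cleanup sketch; the PVC name below is hypothetical, so check kubectl get pvc for the real one (it typically follows the <volume>-<pod> pattern):

kubectl get pvc -n logging
kubectl delete pvc data-logstash-0 -n logging   # hypothetical name, use the one listed above
# then confirm the backing Azure disk is really gone before reinstalling,
# so the new PVC does not bind to the old queue data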
Try setting initialDelaySeconds: 120, or remove the probes and watch the logs (a values override sketch follows the log excerpt below).
$ kubectl logs logstash-0 -n logging
2019/03/13 14:18:33 Setting 'queue.max_bytes' from environment.
...
[2019-03-13T14:20:00,295][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
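For reference, a hedged way to apply the larger delay through chart values instead of editing the StatefulSet by hand, assuming the stable/logstash chart exposes the probe settings in its values.yaml (check the chart to confirm the exact keys):

helm upgrade <release> stable/logstash \
  --set livenessProbe.initialDelaySeconds=120 \
  --set readinessProbe.initialDelaySeconds=120

When deploying through stable/elastic-stack, the same keys would be nested under the logstash subchart, e.g. --set logstash.livenessProbe.initialDelaySeconds=120, again assuming the subchart passes them through.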
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity.
Hello, I tried the solution and it worked well (removing the liveness probe).
But I don't understand why port 9600 doesn't work anymore. I did the deployment one month ago and it was working.
Does anyone have an explanation for why port 9600 is not working anymore?
Hi,
I was able to avoid the probe failures by setting initialDelaySeconds to 60.
readinessProbe:
  httpGet:
    path: /
    port: monitor
  initialDelaySeconds: 60
  # periodSeconds: 30
  timeoutSeconds: 30
  # failureThreshold: 6
  # successThreshold: 1
livenessProbe:
  httpGet:
    path: /
    port: monitor
  initialDelaySeconds: 60
  # periodSeconds: 30
  timeoutSeconds: 30
  # failureThreshold: 6
  # successThreshold: 1
Proof it started:
[2019-09-28T21:08:27,234][INFO ][org.logstash.beats.Server] Starting server on port: 5044
[2019-09-28T21:08:28,119][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}