Pulsar: on k8s/EKS, some proxies fail to get broker info from ZooKeeper

Created on 5 Jan 2020 · 7 Comments · Source: apache/pulsar

[Screenshot]

1 - Seeing many of these errors during cluster initialization.
2 - This is the response to, e.g., bin/pulsar-admin topics list cb/test, which returns HTTP 403 Forbidden on the console.

This happens on only 1 or 2 of the 4 proxies in the cluster. The healthy proxies return a correct response.

Labels: component/k8s, triage/week-2, type/bug

All 7 comments

The same issue happens for me with a 2.4.2 cluster. The proxies don't work until they are manually restarted.

Thanks for reporting this; we will look into it. @Lanayx @youurayy, which YAML file do you use to start your K8S deployment?

I'm using the Helm template, so the proxy's YAML is here.

@wolfstudy @zymap This looks like a startup-ordering issue. Would you please help check it?

I have tried the following workaround in the Helm template file pulsar-manager-deployment.yaml, added after the wait-zookeeper-ready init container, and it does seem to work:

        # This init container will wait for brokers to be ready before
        # deploying the proxies
      - name: wait-broker-ready
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        command: ["bash", "-c"]
        args:
          - >-
            for i in {0..{{ .Values.broker.replicaCount }}}; do
              if [[ `nslookup {{ template "pulsar.fullname" . }}-{{ .Values.broker.component }} | grep Name | wc -l` -ge {{ .Values.broker.replicaCount }} ]]; then
                break
              fi
              sleep 30;
            done;

Does anyone have suggestions for how this can be improved?

The approach is great. A small improvement would be to change -ge {{ .Values.broker.replicaCount }} to -ge 1, so the wait ends as soon as at least one broker address resolves.
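To make the threshold tweak concrete, the readiness check can be pulled out of the loop into a small bash function that reads nslookup output on stdin, which also makes it testable without a cluster. This is a sketch, not part of the Helm chart: dns_names_at_least is a hypothetical name, the threshold of 1 follows the suggestion above, and it assumes BIND-style nslookup output with one "Name:" line per resolved address (the same assumption the original grep Name | wc -l makes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the nslookup output on stdin contains at
# least $1 answer lines starting with "Name:". Assumes BIND-style nslookup
# output (one "Name:" line per address); other implementations may differ.
dns_names_at_least() {
  local min="$1"
  local count
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the
  # exit status from aborting callers that run under "set -e".
  count=$(grep -c '^Name:' || true)
  [ "$count" -ge "$min" ]
}
```

Because it is an ordinary pipeline stage, it drops into the existing loop as: if nslookup <broker-service> | dns_names_at_least 1; then break; fi, where <broker-service> stands for the rendered {{ template "pulsar.fullname" . }}-{{ .Values.broker.component }} name.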

In addition to the broker-waiting script above, I'm currently using this to wait for ZooKeeper:

{{- define "pulsar.waitZookeeperReady" -}}
set -o pipefail; CMD="bin/pulsar zookeeper-shell -server {{ template "pulsar.fullname" . }}-{{ .Values.zookeeper.component }} ls /admin/clusters"; until [ $( $CMD 2>&1 | tail -n 1 | grep -c {{ template "pulsar.fullname" . }} ) -eq 1 ]; do echo "waiting"; sleep 3; done;
{{- end }}

and calling it from bash in an init container:

      - name: wait-zookeeper-ready
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        command: ["bash", "-c"]
        args:
          - >-
            {{ template "pulsar.waitZookeeperReady" . }}
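The until loop in pulsar.waitZookeeperReady has no upper bound, so if ZooKeeper never becomes ready the init container waits forever. A minimal bash sketch of a bounded alternative, assuming you would rather have the init container fail once a time budget is spent (so the pod is restarted and the failure is visible); wait_until and its BUDGET/INTERVAL arguments are hypothetical, not part of the Pulsar chart:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "$@" every INTERVAL seconds until it succeeds,
# giving up with exit status 1 once BUDGET seconds have elapsed.
# Usage: wait_until BUDGET INTERVAL command [args...]
wait_until() {
  local budget="$1" interval="$2"
  shift 2
  local start=$SECONDS   # bash's built-in seconds-since-shell-start counter
  until "$@"; do
    if (( SECONDS - start >= budget )); then
      echo "gave up after ${budget}s: $*" >&2
      return 1
    fi
    echo "waiting"
    sleep "$interval"
  done
}
```

In the init container the probe would be the existing check wrapped in bash -c, for example wait_until 300 3 bash -c 'bin/pulsar zookeeper-shell -server <zookeeper-service> ls /admin/clusters 2>&1 | tail -n 1 | grep -q <cluster-name>', where the angle-bracket names are placeholders for the rendered template values and 300 seconds is an arbitrary budget.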

The fact is, it's still very brittle, and this should really be done directly in Java using the ZooKeeper Java client library.

P.S. In my experience, the nslookup <service> method above sometimes gets completely stuck.
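For that hang, one option is to cap each lookup with GNU coreutils timeout, which kills the command and exits with status 124 when the limit is reached, so a stuck nslookup becomes an ordinary failed attempt. A minimal sketch, assuming the image ships GNU coreutils; probe_with_timeout is a hypothetical name and the limit is up to you:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "$@" but kill it after LIMIT seconds, so a hung
# lookup is reported as a failure instead of blocking the retry loop.
# GNU timeout returns 124 when it had to kill the command.
probe_with_timeout() {
  local limit="$1"
  shift
  local rc=0
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "probe timed out after ${limit}s: $*" >&2
  fi
  return "$rc"
}
```

Note that timeout runs an external program, not a shell function or pipeline, so a call looks like probe_with_timeout 10 nslookup <broker-service>, and a pipeline has to be wrapped in bash -c '...'.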
