
1 - seeing a lot of these errors during cluster init
2 - this is a response to e.g. bin/pulsar-admin topics list cb/test, which on the console returns HTTP 403 Forbidden
This only happens on 1 or 2 out of 4 total Proxies in the cluster. The healthy Proxies return a correct response.
Same issue happens for me with 2.4.2 cluster. Proxies don't work until manual restart.
Thanks for the report. We will look into it. @Lanayx @youurayy Which YAML file do you use to start your K8s cluster?
I'm using the helm template, so proxy's yaml is here
@wolfstudy @zymap This looks like a start-ordering issue. Would you please help check this?
I tried the following workaround in the Helm template file pulsar-manager-deployment.yaml, placed after wait-zookeeper-ready, and it does seem to work:
# This init container will wait for brokers to be ready before
# deploying the proxies
- name: wait-broker-ready
  image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
  imagePullPolicy: {{ .Values.image.pullPolicy }}
  command: ["bash", "-c"]
  args:
    - >-
      for i in {0..{{ .Values.broker.replicaCount }}}; do
        if [[ `nslookup {{ template "pulsar.fullname" . }}-{{ .Values.broker.component }} | grep Name | wc -l` -ge {{ .Values.broker.replicaCount }} ]]; then
          break
        fi
        sleep 30;
      done;
Does anyone have suggestions on how this can be improved?
The approach is great. A small improvement you can make is to change -ge {{ .Values.broker.replicaCount }} to -ge 1.
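For illustration, here is a minimal bash sketch of the readiness check with the suggested `-ge 1` threshold, pulled out into functions so it can be tried without a cluster. The function names `count_resolved` and `brokers_ready` are hypothetical and not part of the helm chart; the check reads nslookup-style output from stdin and counts lines containing `Name`, as the init container above does.

```shell
#!/usr/bin/env bash

# Count lines containing "Name" in nslookup-style output read from stdin.
# grep -c exits non-zero when nothing matches, so the exit code is ignored.
count_resolved() {
  grep -c 'Name' || true
}

# Succeed when at least MIN names were resolved (default 1, per the suggestion).
brokers_ready() {
  local min="${1:-1}"
  local n
  n="$(count_resolved)"
  [[ "$n" -ge "$min" ]]
}
```

In the init container this could be used as `nslookup <broker-service> | brokers_ready 1`, where `<broker-service>` stands for the rendered broker service name.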
In addition to the broker-waiting script above, I'm currently using this to wait for ZooKeeper:
{{- define "pulsar.waitZookeeperReady" -}}
set -o pipefail; CMD="bin/pulsar zookeeper-shell -server {{ template "pulsar.fullname" . }}-{{ .Values.zookeeper.component }} ls /admin/clusters"; until [ $( $CMD 2>&1 | tail -n 1 | grep -c {{ template "pulsar.fullname" . }} ) -eq 1 ]; do echo "waiting"; sleep 3; done;
{{- end }}
and using it with bash:
- name: wait-zookeeper-ready
  image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
  imagePullPolicy: {{ .Values.image.pullPolicy }}
  command: ["bash", "-c"]
  args:
    - >-
      {{ template "pulsar.waitZookeeperReady" . }}
The fact is that it's still very brittle and should really be done straight from Java, using the Java ZooKeeper client library.
P.S. My experience with the nslookup <service> method above is that it sometimes gets completely stuck.
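One possible way to guard against a lookup that hangs is to bound each attempt with the coreutils `timeout` command and cap the number of retries. The sketch below is only an illustration: the helper name `wait_for` is hypothetical, and it assumes `timeout` is available in the container image. Because `timeout` runs an external program, the command passed in must be an executable (such as `nslookup`), not a shell function.

```shell
#!/usr/bin/env bash

# Retry a command until it succeeds, bounding every attempt with `timeout`
# so that a single stuck lookup cannot block the init container forever.
# Usage: wait_for <max_attempts> <per_attempt_timeout_s> <sleep_s> <command...>
wait_for() {
  local attempts="$1" per_try="$2" pause="$3"
  shift 3
  local i
  for ((i = 1; i <= attempts; i++)); do
    # timeout kills the command and returns non-zero if it runs too long.
    if timeout "$per_try" "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed, retrying" >&2
    sleep "$pause"
  done
  return 1
}
```

For example, `wait_for 20 5 3 nslookup <service>` would try up to 20 times, allow each lookup 5 seconds, and pause 3 seconds between attempts.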