Pulsar: Bookies failing to start with Helm chart

Created on 11 Jun 2020 · 5 Comments · Source: apache/pulsar

Describe the bug
A cluster cannot be upgraded to the 2.6.0 candidate image using the project Helm chart. After the upgrade, the bookies do not start because the startup parameters are invalid for the init container. This is the error message seen in the pulsar-bookkeeper-verify-clusterid container:

JMX enabled by default
Error: Could not find or load main class "

To Reproduce
Steps to reproduce the behavior:

Deploy a minimal setup with the project Helm chart, using this override file (pulsar/overrides.yaml):

affinity:
  anti_affinity: false
components:
  functions: false
  proxy: false
  toolset: false
  pulsar_manager: false
namespace: default
monitoring:
  prometheus: false
  grafana: false
  node_exporter: false
  alert_manager: false
bookkeeper:
  replicaCount: 2
broker:
  replicaCount: 2

And this command:

helm install pulsar -f pulsar/overrides.yaml --set namespace=default ./pulsar/

This will bring up a cluster running 2.5.0.

Then try to upgrade just the bookies to a 2.6.0 candidate image by adding this to the override file:

images:
  bookie:
    repository: kafkaesqueio/pulsar-all-v2.6.0-candidate-1
    tag: latest
    pullPolicy: IfNotPresent

And running this command:

helm upgrade pulsar -f pulsar/overrides.yaml --set namespace=default ./pulsar/

The bookies will not start because the init container never completes successfully.

Expected behavior
The upgrade should succeed.

Additional context
This is caused by https://github.com/apache/pulsar/pull/6579. With this change, the environment variable values are no longer automatically applied to bkenv.sh when calling apply-config-from-env.py. However, it looks like the bookkeeper shell whatisinstanceid command depends on these variables being set.

This pattern of calling the bookkeeper shell in the init container is used throughout the Helm chart, so this will actually break most of the components of the chart. I just noticed it first on the bookies.
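
The report does not say where the lone double quote in the error message comes from. One plausible mechanism, offered here only as an assumption, is that a JVM options variable reaches the launcher with literal quote characters inside its value. When such a value is expanded unquoted, the shell splits it into words and the first word is a bare ", which java would then take as the main class name. The sketch below reproduces only the word splitting; DEMO_MEM and first_word are hypothetical names, not part of Pulsar or the chart.

```shell
#!/bin/sh
# Hypothetical value (assumption): the quote characters are part of the value
# itself, as they could be if a configmap entry was written with escaped quotes.
DEMO_MEM='" -Xms128m -Xmx256m "'

# first_word prints the first word the shell produces when the value is
# expanded without surrounding quotes, the way JVM options are commonly
# placed on a java command line.
first_word() {
  # $1 is intentionally left unquoted so that word splitting happens.
  set -- $1
  printf '%s\n' "$1"
}

# Prints a single double-quote character.
first_word "$DEMO_MEM"
```

If this is what happens in the init container, the options that follow the stray quote are never seen as JVM options, which would match the "Could not find or load main class" message.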

type/bug

All 5 comments

"With this change, the environment variable values are no longer automatically applied to bkenv.sh when calling apply-config-from-env.py."

apply-config-from-env.py was taking environment variables and applying them to bkenv.sh. Since these values are already environment variables, they are picked up when the script sources bkenv.sh. We should just let the bash scripts read the system environment instead of using our scripts to apply those changes.

If there is a problem with the helm chart, we should update the helm chart.
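
The approach described in this comment, letting the env file pick up values that are already in the environment, can be sketched as follows. This is a minimal illustration and not the actual contents of bkenv.sh: the file is a temporary stand-in, and DEMO_MEM and effective_mem are hypothetical names.

```shell
#!/bin/sh
# Temporary stand-in for an env file such as bkenv.sh (assumption: the real
# file may be structured differently).
envfile=$(mktemp)
cat > "$envfile" <<'EOF'
# Keep the value from the environment if one is set, else use a default.
DEMO_MEM=${DEMO_MEM:-"-Xms1g -Xmx1g"}
EOF

# effective_mem sources the env file in a subshell and prints the result,
# so each call starts from the caller's current environment.
effective_mem() {
  ( . "$envfile"; printf '%s\n' "$DEMO_MEM" )
}

# Without the variable in the environment, the default applies.
unset DEMO_MEM
default_value=$(effective_mem)

# With the variable exported, the environment value wins.
DEMO_MEM="-Xms128m -Xmx256m"
export DEMO_MEM
override_value=$(effective_mem)

printf 'default:  %s\n' "$default_value"
printf 'override: %s\n' "$override_value"
rm -f "$envfile"
```

With this pattern no separate step has to rewrite the env file, but whatever is in the environment is used verbatim, so the chart has to supply values in the form the launcher expects.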

Sure, the Helm chart can be updated.

Just got this issue with clean 2.6.0 installation, but only in pulsar-recovery pod (inside init container)

2020-06-21T21:57:17.304727748Z [conf/bookkeeper.conf] Applying config httpServerEnabled = true
2020-06-21T21:57:17.30478306Z [conf/bookkeeper.conf] Applying config httpServerPort = 8000
2020-06-21T21:57:17.304789783Z [conf/bookkeeper.conf] Applying config statsProviderClass = org.apache.bookkeeper.stats.prometheus.PrometheusMetricsProvider
2020-06-21T21:57:17.30479629Z [conf/bookkeeper.conf] Applying config useHostNameAsBookieID = true
2020-06-21T21:57:17.304801054Z [conf/bookkeeper.conf] Applying config zkLedgersRootPath = /ledgers
2020-06-21T21:57:17.304805717Z [conf/bookkeeper.conf] Applying config zkServers = pu-pulsar-zookeeper:2181
2020-06-21T21:57:17.454698773Z JMX enabled by default
2020-06-21T21:57:19.346682488Z Error: Could not find or load main class "
2020-06-21T21:57:22.383129167Z JMX enabled by default

@sijie @cdbartholomew It seems a similar issue was raised earlier in https://github.com/apache/pulsar/issues/6355 and was apparently fixed by changing PULSAR_MEM to BOOKIE_MEM in the bookie and auto-recovery configmaps. But in the current master, bookkeeper is again using PULSAR_MEM. Could this be the reason we are seeing this issue again?

@Lanayx @rvashishth: Please use the latest master of https://github.com/apache/pulsar-helm-chart which includes the change of apache/pulsar-helm-chart#26. We will release a helm chart and publish it to https://pulsar.apache.org/charts.
