Is this a request for help?: Yes
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
Version of Helm and Kubernetes:
Kubernetes (GKE):
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-19T00:05:56Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.6-gke.2", GitCommit:"384b4eaa132ca9a295fcb3e5dfc74062b257e7df", GitTreeState:"clean", BuildDate:"2018-08-15T00:10:14Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
Helm:
Client: &version.Version{SemVer:"v2.10.0", GitCommit:"9ad53aac42165a5fadc6c87be0dea6b115f93090", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.10.0", GitCommit:"9ad53aac42165a5fadc6c87be0dea6b115f93090", GitTreeState:"clean"}
Which chart:
stable/mongodb-replicaset
What happened:
Summary: the 1st pod starts correctly; the 2nd pod is stuck at "Init:2/3".
Details:
I want to set up a MongoDB replica set with 3 replicas. I want to use authentication (secrets providing usernames/passwords already exist) and TLS with X.509 for internal authentication.
Here are the contents of my values.yaml file used for helm install:
replicas: 3
port: 27018
replicaSetName: rs0
podDisruptionBudget: {}
  # maxUnavailable: 1
  # minAvailable: 2
auth:
  enabled: true
  adminUser: admin
  adminPassword: pass1234
  metricsUser: metrics
  metricsPassword: pass1234
  key: abcdefghijklmnopqrstuvwxyz1234567890
  #existingKeySecret:
  #existingAdminSecret:
  #exisitingMetricsSecret:
# Specs for the Docker image for the init container that establishes the replica set
installImage:
  repository: k8s.gcr.io/mongodb-install
  tag: 0.6
  pullPolicy: IfNotPresent
# Specs for the MongoDB image
image:
  repository: mongo
  #tag: 3.6
  tag: latest
  pullPolicy: IfNotPresent
# Additional environment variables to be set in the container
extraVars: {}
# - name: TCMALLOC_AGGRESSIVE_DECOMMIT
#   value: "true"
# Prometheus Metrics Exporter
metrics:
  enabled: false
  image:
    repository: ssalaues/mongodb-exporter
    tag: 0.6.1
    pullPolicy: IfNotPresent
  port: 9216
  path: "/metrics"
  socketTimeout: 3s
  syncTimeout: 1m
  prometheusServiceDiscovery: true
  resources: {}
# Annotations to be added to MongoDB pods
podAnnotations: {}
securityContext:
  runAsUser: 999
  fsGroup: 999
  runAsNonRoot: true
resources:
  limits:
    # cpu: 100m
    memory: 512Mi
  requests:
    # cpu: 100m
    memory: 256Mi
## Node selector
## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
nodeSelector: {}
affinity: {}
tolerations: []
extraLabels: {}
persistentVolume:
  enabled: true
  ## mongodb-replicaset data Persistent Volume Storage Class
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ## set, choosing the default provisioner. (gp2 on AWS, standard on
  ## GKE, AWS & OpenStack)
  ##
  storageClass: "standard"
  accessModes:
    - ReadWriteOnce
  size: 10Gi
  annotations: {}
# Annotations to be added to the service
serviceAnnotations: {}
tls:
  # Enable or disable MongoDB TLS support
  enabled: true
  # Please generate your own TLS CA by generating it via:
  # $ openssl genrsa -out ca.key 2048
  # $ openssl req -x509 -new -nodes -key ca.key -days 10000 -out ca.crt -subj "/CN=mydomain.com"
  # After that you can base64 encode it and paste it here:
  # $ cat ca.key | base64 -w0
  cacert: base64 encoded ca certificate goes here
  cakey: base64 encoded ca key goes here
# Entries for the MongoDB config file
configmap:
  storage:
    dbPath: /data/db
  net:
    port: 27018
    ssl:
      mode: requireSSL
      CAFile: /data/configdb/tls.crt
      PEMKeyFile: /work-dir/mongo.pem
  replication:
    replSetName: rs0
  security:
    authorization: enabled
    clusterAuthMode: x509
    keyFile: /data/configdb/key.txt
# Readiness probe
readinessProbe:
  initialDelaySeconds: 5
  timeoutSeconds: 1
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
# Liveness probe
livenessProbe:
  initialDelaySeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
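For reference, the base64 values for tls.cacert / tls.cakey were produced exactly as the inline comments in the tls: section describe (the CN here is just the example from those comments):

```shell
# Generate a CA as per the comments in the tls: section above:
openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -days 10000 -out ca.crt -subj "/CN=mydomain.com"
# Base64-encode for values.yaml (-w0 is GNU coreutils; it disables line wrapping):
base64 -w0 ca.crt   # value for tls.cacert
base64 -w0 ca.key   # value for tls.cakey
```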
I've installed the chart using:
helm install --name mongo-test -f values.yaml stable/mongodb-replicaset
The helm install completes without any problems; there are no error messages during install:
NAME: mongo-test
LAST DEPLOYED: Wed Aug 29 16:40:43 2018
NAMESPACE: b2c
STATUS: DEPLOYED
RESOURCES:
==> v1/Secret
NAME TYPE DATA AGE
mongo-test-mongodb-replicaset-admin Opaque 2 0s
mongo-test-mongodb-replicaset-ca kubernetes.io/tls 2 0s
mongo-test-mongodb-replicaset-keyfile Opaque 1 0s
==> v1/ConfigMap
NAME DATA AGE
mongo-test-mongodb-replicaset-init 1 0s
mongo-test-mongodb-replicaset-mongodb 1 0s
mongo-test-mongodb-replicaset-tests 1 0s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
mongo-test-mongodb-replicaset ClusterIP None <none> 27018/TCP 0s
==> v1beta2/StatefulSet
NAME DESIRED CURRENT AGE
mongo-test-mongodb-replicaset 3 1 0s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
mongo-test-mongodb-replicaset-0 0/1 Pending 0 0s
NOTES:
1. After the statefulset is created completely, one can check which instance is primary by running:
$ for ((i = 0; i < 3; ++i)); do kubectl exec --namespace b2c mongo-test-mongodb-replicaset-$i -- sh -c 'mongo --eval="printjson(rs.isMaster())"'; done
2. One can insert a key into the primary instance of the mongodb replica set by running the following:
MASTER_POD_NAME must be replaced with the name of the master found from the previous step.
$ kubectl exec --namespace b2c MASTER_POD_NAME -- mongo --eval="printjson(db.test.insert({key1: 'value1'}))"
3. One can fetch the keys stored in the primary or any of the slave nodes in the following manner.
POD_NAME must be replaced by the name of the pod being queried.
$ kubectl exec --namespace b2c POD_NAME -- mongo --eval="rs.slaveOk(); db.test.find().forEach(printjson)"
The 1st pod then starts correctly and without any problems.
The 2nd pod is stuck at "Init:2/3":
NAME READY STATUS RESTARTS AGE
po/mongo-test-mongodb-replicaset-0 1/1 Running 0 5m
po/mongo-test-mongodb-replicaset-1 0/1 Init:2/3 0 5m
When I connect to the bootstrap init container of mongo-test-mongodb-replicaset-1 (-c bootstrap), I find the following inside /work-dir/log.txt:
mongodb@mongo-test-mongodb-replicaset-1:/work-dir$ more log.txt
[2018-08-29T14:41:51,684293796+00:00] [on-start.sh] Bootstrapping MongoDB replica set member: mongo-test-mongodb-replicaset-1
[2018-08-29T14:41:51,687394595+00:00] [on-start.sh] Reading standard input...
[2018-08-29T14:41:51,688594499+00:00] [on-start.sh] Generating certificate
[2018-08-29T14:41:51,951181683+00:00] [on-start.sh] Peers: mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local
[2018-08-29T14:41:51,952080311+00:00] [on-start.sh] Starting a MongoDB instance...
[2018-08-29T14:41:51,953075555+00:00] [on-start.sh] Waiting for MongoDB to be ready...
2018-08-29T14:41:52.020+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2018-08-29T14:41:52.045+0000 I CONTROL [initandlisten] MongoDB starting : pid=30 port=27017 dbpath=/data/db 64-bit host=mongo-test-mongodb-replicaset-1
2018-08-29T14:41:52.045+0000 I CONTROL [initandlisten] db version v4.0.1
2018-08-29T14:41:52.045+0000 I CONTROL [initandlisten] git version: 54f1582fc6eb01de4d4c42f26fc133e623f065fb
2018-08-29T14:41:52.045+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
2018-08-29T14:41:52.045+0000 I CONTROL [initandlisten] allocator: tcmalloc
2018-08-29T14:41:52.045+0000 I CONTROL [initandlisten] modules: none
2018-08-29T14:41:52.045+0000 I CONTROL [initandlisten] build environment:
2018-08-29T14:41:52.045+0000 I CONTROL [initandlisten] distmod: ubuntu1604
2018-08-29T14:41:52.045+0000 I CONTROL [initandlisten] distarch: x86_64
2018-08-29T14:41:52.045+0000 I CONTROL [initandlisten] target_arch: x86_64
2018-08-29T14:41:52.045+0000 I CONTROL [initandlisten] options: { config: "/data/configdb/mongod.conf", net: { bindIp: "0.0.0.0", port: 27017, ssl: { CAFile: "/data/configdb/tls.crt", PEMKeyFile: "/work-dir/mongo.pem", mode: "requireSSL" } }, replication: { replSet: "rs0" }, security: { authorization: "enabled", clusterAuthMode: "x509", keyFile: "/data/configdb/key.txt" }, storage: { dbPath: "/data/db" } }
2018-08-29T14:41:52.047+0000 I STORAGE [initandlisten]
2018-08-29T14:41:52.047+0000 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-08-29T14:41:52.047+0000 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-08-29T14:41:52.048+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=1337M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
[2018-08-29T14:41:52,083645436+00:00] [on-start.sh] Retrying...
2018-08-29T14:41:52.789+0000 I STORAGE [initandlisten] WiredTiger message [1535553712:789699][30:0x7fd33c091a00], txn-recover: Set global recovery timestamp: 0
2018-08-29T14:41:52.800+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(0, 0)
2018-08-29T14:41:52.819+0000 I STORAGE [initandlisten] createCollection: local.startup_log with generated UUID: 2a15f25a-5f7b-47d3-b1a3-2338677428d0
2018-08-29T14:41:52.832+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/data/db/diagnostic.data'
2018-08-29T14:41:52.833+0000 I STORAGE [initandlisten] createCollection: local.replset.oplogTruncateAfterPoint with generated UUID: ae87a21f-d2dc-4474-b4de-d70d95b7a2a8
2018-08-29T14:41:52.847+0000 I STORAGE [initandlisten] createCollection: local.replset.minvalid with generated UUID: ffaf5c08-356c-4ed0-b4e0-1b3d9cdeea92
2018-08-29T14:41:52.866+0000 I REPL [initandlisten] Did not find local voted for document at startup.
2018-08-29T14:41:52.866+0000 I REPL [initandlisten] Did not find local Rollback ID document at startup. Creating one.
2018-08-29T14:41:52.866+0000 I STORAGE [initandlisten] createCollection: local.system.rollback.id with generated UUID: 6e3e4fc7-b821-4df6-9c32-7db2af4a3bc4
2018-08-29T14:41:52.880+0000 I REPL [initandlisten] Initialized the rollback ID to 1
2018-08-29T14:41:52.880+0000 I REPL [initandlisten] Did not find local replica set configuration document at startup; NoMatchingDocument: Did not find replica set configuration document in local.system.replset
2018-08-29T14:41:52.881+0000 I CONTROL [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Replication has not yet been configured
2018-08-29T14:41:52.881+0000 I CONTROL [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: Replication has not yet been configured
2018-08-29T14:41:52.881+0000 I NETWORK [initandlisten] waiting for connections on port 27017 ssl
2018-08-29T14:41:54.148+0000 I NETWORK [listener] connection accepted from 127.0.0.1:50022 #1 (1 connection now open)
2018-08-29T14:41:54.154+0000 I ACCESS [conn1] note: no users configured in admin.system.users, allowing localhost access
2018-08-29T14:41:54.154+0000 I NETWORK [conn1] received client metadata from 127.0.0.1:50022 conn1: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:54.157+0000 I ACCESS [conn1] Unauthorized: not authorized on admin to execute command { endSessions: [ { id: UUID("95760764-878a-4dd0-8ce6-470182c48a3a") } ], $db: "admin" }
2018-08-29T14:41:54.158+0000 I NETWORK [conn1] end connection 127.0.0.1:50022 (0 connections now open)
[2018-08-29T14:41:54,162283712+00:00] [on-start.sh] Initialized.
[2018-08-29T14:41:54,267062979+00:00] [on-start.sh] Found master: mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local
[2018-08-29T14:41:54,268004950+00:00] [on-start.sh] Adding myself (mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local) to replica set...
2018-08-29T14:41:54.368+0000 I NETWORK [listener] connection accepted from 10.43.67.35:50550 #2 (1 connection now open)
2018-08-29T14:41:54.371+0000 I ACCESS [conn2] authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=mongo-test-mongodb-replicaset-0", $db: "$external" }
2018-08-29T14:41:54.371+0000 I ACCESS [conn2] Failed to authenticate CN=mongo-test-mongodb-replicaset-0@$external from client 10.43.67.35:50550 with mechanism MONGODB-X509: UserNotFound: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
2018-08-29T14:41:54.372+0000 I NETWORK [conn2] end connection 10.43.67.35:50550 (0 connections now open)
2018-08-29T14:41:54.375+0000 I NETWORK [listener] connection accepted from 10.43.67.35:50552 #3 (1 connection now open)
2018-08-29T14:41:54.378+0000 I NETWORK [conn3] received client metadata from 10.43.67.35:50552 conn3: { driver: { name: "NetworkInterfaceTL", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:54.379+0000 I ACCESS [conn3] authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=mongo-test-mongodb-replicaset-0", $db: "$external" }
2018-08-29T14:41:54.379+0000 I ACCESS [conn3] Failed to authenticate CN=mongo-test-mongodb-replicaset-0@$external from client 10.43.67.35:50552 with mechanism MONGODB-X509: UserNotFound: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
[2018-08-29T14:41:57,388632734+00:00] [on-start.sh] Waiting for replica to reach SECONDARY state...
2018-08-29T14:41:57.441+0000 I NETWORK [listener] connection accepted from 127.0.0.1:50032 #4 (2 connections now open)
2018-08-29T14:41:57.446+0000 I NETWORK [conn4] received client metadata from 127.0.0.1:50032 conn4: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:57.448+0000 I ACCESS [conn4] Supported SASL mechanisms requested for unknown user 'admin@admin'
2018-08-29T14:41:57.448+0000 I ACCESS [conn4] SASL SCRAM-SHA-1 authentication failed for admin on admin from client 127.0.0.1:50032 ; UserNotFound: Could not find user admin@admin
2018-08-29T14:41:57.450+0000 I ACCESS [conn4] Unauthorized: not authorized on admin to execute command { endSessions: [ { id: UUID("38098891-2b34-46eb-aef2-53a69416671f") } ], $db: "admin" }
2018-08-29T14:41:57.451+0000 I NETWORK [conn4] end connection 127.0.0.1:50032 (1 connection now open)
2018-08-29T14:41:58.504+0000 I NETWORK [listener] connection accepted from 127.0.0.1:50034 #5 (2 connections now open)
2018-08-29T14:41:58.508+0000 I NETWORK [conn5] received client metadata from 127.0.0.1:50034 conn5: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:58.509+0000 I ACCESS [conn5] Supported SASL mechanisms requested for unknown user 'admin@admin'
2018-08-29T14:41:58.510+0000 I ACCESS [conn5] SASL SCRAM-SHA-1 authentication failed for admin on admin from client 127.0.0.1:50034 ; UserNotFound: Could not find user admin@admin
2018-08-29T14:41:58.511+0000 I ACCESS [conn5] Unauthorized: not authorized on admin to execute command { endSessions: [ { id: UUID("e1089881-4c47-450c-9721-6b291c6f0e50") } ], $db: "admin" }
2018-08-29T14:41:58.512+0000 I NETWORK [conn5] end connection 127.0.0.1:50034 (1 connection now open)
2018-08-29T14:41:59.574+0000 I NETWORK [listener] connection accepted from 127.0.0.1:50036 #6 (2 connections now open)
2018-08-29T14:41:59.582+0000 I NETWORK [conn6] received client metadata from 127.0.0.1:50036 conn6: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
Meanwhile the other, successfully started, pod mongo-test-mongodb-replicaset-0 has these entries in its log file:
2018-08-29T14:41:22.973+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2018-08-29T14:41:22.980+0000 I CONTROL [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=mongo-test-mongodb-replicaset-0
2018-08-29T14:41:22.981+0000 I CONTROL [initandlisten] db version v4.0.1
2018-08-29T14:41:22.981+0000 I CONTROL [initandlisten] git version: 54f1582fc6eb01de4d4c42f26fc133e623f065fb
2018-08-29T14:41:22.981+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
2018-08-29T14:41:22.981+0000 I CONTROL [initandlisten] allocator: tcmalloc
2018-08-29T14:41:22.981+0000 I CONTROL [initandlisten] modules: none
2018-08-29T14:41:22.981+0000 I CONTROL [initandlisten] build environment:
2018-08-29T14:41:22.981+0000 I CONTROL [initandlisten] distmod: ubuntu1604
2018-08-29T14:41:22.981+0000 I CONTROL [initandlisten] distarch: x86_64
2018-08-29T14:41:22.981+0000 I CONTROL [initandlisten] target_arch: x86_64
2018-08-29T14:41:22.981+0000 I CONTROL [initandlisten] options: { config: "/data/configdb/mongod.conf", net: { bindIp: "0.0.0.0", port: 27017, ssl: { CAFile: "/data/configdb/tls.crt", PEMKeyFile: "/work-dir/mongo.pem", mode: "requireSSL" } }, replication: { replSet: "rs0" }, security: { authorization: "enabled", clusterAuthMode: "x509", keyFile: "/data/configdb/key.txt" }, storage: { dbPath: "/data/db" } }
2018-08-29T14:41:22.982+0000 I STORAGE [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-08-29T14:41:22.982+0000 I STORAGE [initandlisten]
2018-08-29T14:41:22.982+0000 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-08-29T14:41:22.982+0000 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-08-29T14:41:22.982+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=1337M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2018-08-29T14:41:23.949+0000 I STORAGE [initandlisten] WiredTiger message [1535553683:949243][1:0x7fee8d75da00], txn-recover: Main recovery loop: starting at 1/49280
2018-08-29T14:41:23.949+0000 I STORAGE [initandlisten] WiredTiger message [1535553683:949915][1:0x7fee8d75da00], txn-recover: Recovering log 1 through 2
2018-08-29T14:41:24.043+0000 I STORAGE [initandlisten] WiredTiger message [1535553684:43680][1:0x7fee8d75da00], txn-recover: Recovering log 2 through 2
2018-08-29T14:41:24.119+0000 I STORAGE [initandlisten] WiredTiger message [1535553684:119960][1:0x7fee8d75da00], txn-recover: Set global recovery timestamp: 5b86b08600000004
2018-08-29T14:41:24.153+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1535553670, 4)
2018-08-29T14:41:24.153+0000 I STORAGE [initandlisten] Triggering the first stable checkpoint. Initial Data: Timestamp(1535553670, 4) PrevStable: Timestamp(0, 0) CurrStable: Timestamp(1535553670, 4)
2018-08-29T14:41:24.165+0000 I STORAGE [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2018-08-29T14:41:24.166+0000 I STORAGE [initandlisten] The size storer reports that the oplog contains 10 records totaling to 2325 bytes
2018-08-29T14:41:24.166+0000 I STORAGE [initandlisten] Scanning the oplog to determine where to place markers for truncation
2018-08-29T14:41:24.209+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/data/db/diagnostic.data'
2018-08-29T14:41:24.211+0000 I REPL [initandlisten] Rollback ID is 1
2018-08-29T14:41:24.211+0000 I REPL [initandlisten] Recovering from stable timestamp: Timestamp(1535553670, 4) (top of oplog: { ts: Timestamp(1535553670, 4), t: 1 }, appliedThrough: { ts: Timestamp(0, 0), t: -1 }, TruncateAfter: Timestamp(0, 0))
2018-08-29T14:41:24.211+0000 I REPL [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1535553670, 4)
2018-08-29T14:41:24.212+0000 I REPL [initandlisten] No oplog entries to apply for recovery. Start point is at the top of the oplog.
2018-08-29T14:41:24.212+0000 I REPL [replexec-0] New replica set config in use: { _id: "rs0", version: 1, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 0, host: "mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5b86b0825d695f321fd3c7e6') } }
2018-08-29T14:41:24.213+0000 I REPL [replexec-0] This node is mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 in the config
2018-08-29T14:41:24.213+0000 I REPL [replexec-0] transition to STARTUP2 from STARTUP
2018-08-29T14:41:24.213+0000 I NETWORK [initandlisten] waiting for connections on port 27017 ssl
2018-08-29T14:41:24.213+0000 I REPL [replexec-0] Starting replication storage threads
2018-08-29T14:41:24.214+0000 I REPL [replexec-0] transition to RECOVERING from STARTUP2
2018-08-29T14:41:24.214+0000 I REPL [replexec-0] Starting replication fetcher thread
2018-08-29T14:41:24.214+0000 I REPL [replexec-0] Starting replication applier thread
2018-08-29T14:41:24.215+0000 I REPL [replexec-0] Starting replication reporter thread
2018-08-29T14:41:24.214+0000 I NETWORK [LogicalSessionCacheRefresh] Starting new replica set monitor for rs0/mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017
2018-08-29T14:41:24.215+0000 I NETWORK [listener] connection accepted from 10.43.67.35:37744 #2 (1 connection now open)
2018-08-29T14:41:24.216+0000 I REPL [rsSync-0] Starting oplog application
2018-08-29T14:41:24.216+0000 I REPL [rsSync-0] transition to SECONDARY from RECOVERING
2018-08-29T14:41:24.216+0000 I REPL [rsSync-0] conducting a dry run election to see if we could be elected. current term: 1
2018-08-29T14:41:24.216+0000 I REPL [replexec-0] dry election run succeeded, running for election in term 2
2018-08-29T14:41:24.218+0000 I REPL [replexec-0] election succeeded, assuming primary role in term 2
2018-08-29T14:41:24.219+0000 I REPL [replexec-0] transition to PRIMARY from SECONDARY
2018-08-29T14:41:24.219+0000 I REPL [replexec-0] Entering primary catch-up mode.
2018-08-29T14:41:24.219+0000 I REPL [replexec-0] Exited primary catch-up mode.
2018-08-29T14:41:24.219+0000 I REPL [replexec-0] Stopping replication producer
2018-08-29T14:41:24.222+0000 I NETWORK [conn2] received client metadata from 10.43.67.35:37744 conn2: { driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:24.222+0000 I NETWORK [LogicalSessionCacheRefresh] Successfully connected to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 (1 connections now open to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 with a 5 second timeout)
2018-08-29T14:41:24.222+0000 W NETWORK [LogicalSessionCacheRefresh] Unable to reach primary for set rs0
2018-08-29T14:41:24.723+0000 W NETWORK [LogicalSessionCacheReap] Unable to reach primary for set rs0
2018-08-29T14:41:25.224+0000 W NETWORK [LogicalSessionCacheReap] Unable to reach primary for set rs0
2018-08-29T14:41:25.724+0000 W NETWORK [LogicalSessionCacheReap] Unable to reach primary for set rs0
2018-08-29T14:41:26.217+0000 I REPL [rsSync-0] transition to primary complete; database writes are now permitted
2018-08-29T14:41:26.225+0000 I NETWORK [listener] connection accepted from 10.43.67.35:37748 #4 (2 connections now open)
2018-08-29T14:41:26.226+0000 I NETWORK [listener] connection accepted from 10.43.67.35:37750 #6 (3 connections now open)
2018-08-29T14:41:26.233+0000 I NETWORK [conn6] received client metadata from 10.43.67.35:37750 conn6: { driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:26.233+0000 I NETWORK [LogicalSessionCacheRefresh] Successfully connected to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 (1 connections now open to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 with a 0 second timeout)
2018-08-29T14:41:26.233+0000 I NETWORK [conn4] received client metadata from 10.43.67.35:37748 conn4: { driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:26.234+0000 I NETWORK [LogicalSessionCacheReap] Successfully connected to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 (2 connections now open to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 with a 0 second timeout)
2018-08-29T14:41:26.234+0000 I ACCESS [conn6] authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=mongo-test-mongodb-replicaset-0", $db: "$external" }
2018-08-29T14:41:26.235+0000 I ACCESS [conn6] Failed to authenticate CN=mongo-test-mongodb-replicaset-0@$external from client 10.43.67.35:37750 with mechanism MONGODB-X509: UserNotFound: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
2018-08-29T14:41:26.235+0000 I NETWORK [LogicalSessionCacheRefresh] scoped connection to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 not being returned to the pool
2018-08-29T14:41:26.235+0000 I CONTROL [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
2018-08-29T14:41:26.235+0000 I ACCESS [conn4] authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=mongo-test-mongodb-replicaset-0", $db: "$external" }
2018-08-29T14:41:26.236+0000 I ACCESS [conn4] Failed to authenticate CN=mongo-test-mongodb-replicaset-0@$external from client 10.43.67.35:37748 with mechanism MONGODB-X509: UserNotFound: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
2018-08-29T14:41:26.236+0000 I NETWORK [LogicalSessionCacheReap] scoped connection to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 not being returned to the pool
2018-08-29T14:41:26.236+0000 I CONTROL [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
2018-08-29T14:41:26.236+0000 I NETWORK [conn6] end connection 10.43.67.35:37750 (2 connections now open)
2018-08-29T14:41:26.236+0000 I NETWORK [conn4] end connection 10.43.67.35:37748 (1 connection now open)
2018-08-29T14:41:30.295+0000 I NETWORK [listener] connection accepted from 127.0.0.1:52056 #7 (2 connections now open)
2018-08-29T14:41:30.300+0000 I NETWORK [conn7] received client metadata from 127.0.0.1:52056 conn7: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:30.304+0000 I NETWORK [conn7] end connection 127.0.0.1:52056 (1 connection now open)
2018-08-29T14:41:40.300+0000 I NETWORK [listener] connection accepted from 127.0.0.1:52072 #8 (2 connections now open)
2018-08-29T14:41:40.304+0000 I NETWORK [conn8] received client metadata from 127.0.0.1:52072 conn8: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:40.308+0000 I NETWORK [conn8] end connection 127.0.0.1:52072 (1 connection now open)
2018-08-29T14:41:50.293+0000 I NETWORK [listener] connection accepted from 127.0.0.1:52090 #9 (2 connections now open)
2018-08-29T14:41:50.297+0000 I NETWORK [conn9] received client metadata from 127.0.0.1:52090 conn9: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:50.301+0000 I NETWORK [conn9] end connection 127.0.0.1:52090 (1 connection now open)
2018-08-29T14:41:54.219+0000 I NETWORK [listener] connection accepted from 10.43.66.21:49738 #10 (2 connections now open)
2018-08-29T14:41:54.224+0000 I NETWORK [conn10] received client metadata from 10.43.66.21:49738 conn10: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:54.256+0000 I ACCESS [conn10] Successfully authenticated as principal admin on admin
2018-08-29T14:41:54.261+0000 I NETWORK [conn10] end connection 10.43.66.21:49738 (1 connection now open)
2018-08-29T14:41:54.325+0000 I NETWORK [listener] connection accepted from 10.43.66.21:49740 #11 (2 connections now open)
2018-08-29T14:41:54.330+0000 I NETWORK [conn11] received client metadata from 10.43.66.21:49740 conn11: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:54.362+0000 I ACCESS [conn11] Successfully authenticated as principal admin on admin
2018-08-29T14:41:54.366+0000 I REPL [conn11] replSetReconfig admin command received from client; new config: { _id: "rs0", version: 2, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 0, host: "mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1.0, host: "mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local" } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5b86b0825d695f321fd3c7e6') } }
2018-08-29T14:41:54.372+0000 I NETWORK [conn11] can't authenticate to mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 as internal user, error: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
2018-08-29T14:41:54.372+0000 I REPL [conn11] replSetReconfig config object with 2 members parses ok
2018-08-29T14:41:54.372+0000 I ASIO [Replication] Connecting to mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017
2018-08-29T14:41:54.380+0000 W REPL [replexec-1] Got error (Unauthorized: command replSetHeartbeat requires authentication) response on heartbeat request to mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017; { ok: 1.0, hbmsg: "" }
2018-08-29T14:41:54.380+0000 E REPL [conn11] replSetReconfig failed; NodeNotFound: Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017; the following nodes did not respond affirmatively: mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 failed with command replSetHeartbeat requires authentication
2018-08-29T14:41:54.383+0000 I NETWORK [conn11] end connection 10.43.66.21:49740 (1 connection now open)
2018-08-29T14:41:58.387+0000 I NETWORK [listener] connection accepted from 127.0.0.1:52106 #14 (2 connections now open)
2018-08-29T14:41:58.391+0000 I NETWORK [conn14] received client metadata from 127.0.0.1:52106 conn14: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:58.395+0000 I NETWORK [conn14] end connection 127.0.0.1:52106 (1 connection now open)
2018-08-29T14:42:00.407+0000 I NETWORK [listener] connection accepted from 127.0.0.1:52118 #15 (2 connections now open)
2018-08-29T14:42:00.416+0000 I NETWORK [conn15] received client metadata from 127.0.0.1:52118 conn15: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
From what I understand, everything goes well until the second mongo instance tries to join the replica set: when the primary tries to connect to the new cluster member, the connection fails, and therefore the whole setup process is stuck.
What you expected to happen:
The setup process should finish and successfully set up a replica set consisting of 3 (in this case) replicas.
How to reproduce it (as minimally and precisely as possible):
Take my values.yaml and just run the helm install command (adapt names if needed).
Anything else we need to know:
I've also tried with Mongo 3.6 - same outcome.
It seems not many people are facing this issue... I've also posted this problem on Stack Overflow, with zero response there as well.
I think this is probably a duplicate of #6522
I've also found #6522, but I think it's a different issue. In #6522 they can't get any mongodb pod running at all.
My problem is that the second (and third) mongodb pods are not able to join the successfully created replicaset because something is wrong with the user that is used to establish the connection.
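In case it helps anyone triage the same failure: the exact user mongod is looking up in $external can be pulled straight out of the "Could not find user" line in the logs above. A quick throwaway sketch (the regex and script are mine, not part of the chart):

```python
import re

# Log line copied from the primary's output posted above.
log = ("2018-08-29T14:41:54.372+0000 I NETWORK [conn11] can't authenticate to "
       "mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 "
       "as internal user, error: Could not find user "
       "CN=mongo-test-mongodb-replicaset-0@$external")

# mongod derives this name from the subject DN of the member certificate
# that the connecting node presents.
match = re.search(r"Could not find user (\S+)@\$external", log)
print(match.group(1))  # CN=mongo-test-mongodb-replicaset-0
```

Whatever prints here is the DN mongod built from the peer's certificate subject, which is why the contents of that subject matter so much for member authentication.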
I think it's an issue; I have the same problem.
I'm also running into the same issue and I'm spending the day trying to debug it. Any luck or advice since posting this?
I was able to get this working by commenting out clusterAuthMode: x509:
configmap:
  storage:
    dbPath: /data/db
  net:
    port: 27018
    ssl:
      mode: requireSSL
      CAFile: /data/configdb/tls.crt
      PEMKeyFile: /work-dir/mongo.pem
  replication:
    replSetName: rs0
  security:
    authorization: enabled
    # clusterAuthMode: x509
    keyFile: /data/configdb/key.txt
Opened up PR #7772, which is a simple fix to take care of this. We had this issue as well and it was super annoying. Would appreciate hearing from others whether this fix solves the issue for them too.
Looks like @rmuehlbauer had the exact same issue that we did (you can find the "Quorum check failed" in the posted logs).
I can confirm - commenting out "clusterAuthMode: x509" prevents running into this problem, but of course it also disables x509 member authentication...
But maybe I have some kind of good news for you:
I think I've found the root cause for this issue.
During the init/bootstrap process x509 certificates are generated - those certificates are also used for member authentication in the replicaset.
The generated certificates are built with openssl, and the subject parameter looks like -subj "/CN=$my_hostname" (you can find this in "on-start.sh", line 90). The mongo documentation says the following about certificates used for member authentication:
"...Certificate Requirements: The Distinguished Name (DN), found in the member certificate's subject, must specify a non-empty value for at least one of the following attributes: Organization (O), the Organizational Unit (OU), or the Domain Component (DC)."
(taken from: https://docs.mongodb.com/manual/tutorial/configure-x509-member-authentication/)
But wait - we only put a "CN=..." into our subject line!
So I changed the subject parameter in "on-start.sh" (line 90 again) to something like -subj "/CN=$my_hostname/OU=it/O=your company", which makes the complete openssl command look like:
openssl req -new -key mongo.key -out mongo.csr -subj "/CN=$my_hostname/OU=it/O=your company" -config openssl.cnf
and fired up helm again (this time, of course, using the locally patched chart for installation and not stable/mongodb-replicaset).
This time the chart worked as intended: the replicaset was correctly bootstrapped, and in the end (in my case) there were 3 replicas up and running, using x509 for member and client authentication.
Maybe you guys could try and see if this hack also works in your case?
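If you want to sanity-check the subject change without spinning up a cluster, here is a standalone sketch of it. Note the differences from the chart's actual command: I generate a fresh key inline and drop the chart's -config openssl.cnf, and "example-host"/OU/O are placeholder values.

```shell
# Generate a key + CSR with the patched subject, roughly the way the fixed
# on-start.sh would (standalone: fresh key, no -config openssl.cnf).
openssl req -new -newkey rsa:2048 -nodes -keyout mongo.key -out mongo.csr \
  -subj "/CN=example-host/OU=it/O=your company" 2>/dev/null

# Print the DN back; it should now carry OU and O, satisfying MongoDB's
# member-certificate requirement (at least one of O, OU, DC non-empty).
openssl req -in mongo.csr -noout -subject
```

With the original CN-only subject, the second command would show a DN with none of O/OU/DC, which is exactly what mongod rejects for member authentication.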
Well done, @rmuehlbauer - that makes sense and resolved the issue for me as well. Cheers! 🔥
I'm glad that this works also for you @TR-jboykin!
Next step would most likely be to neatly integrate the fix in all the scripts and config files of the helm chart.
I will try to have a look at this, maybe next week - if someone else wants to volunteer, just go ahead :-)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
Hi, I am having this same issue, but I'm not using x509 auth mode; it is user/password mode.
Using one shard and three replicas on Kubernetes, it ran without any issue for a long time, then suddenly the container crashed and the pod needed to restart. After restarting, the pod has been stuck at the bootstrapping container forever (for two days now).
The primary has around 380 GB of data, and I see the bootstrapping container syncing: the volume grows up to around 350 GB or more, and then it starts again from 0 GB.
MongoDB version 3.6
2020-07-09T10:10:08.360+0000 W NETWORK [replSetDistLockPinger] Unable to reach primary for set config0
2020-07-09T10:10:08.360+0000 I NETWORK [replSetDistLockPinger] Cannot reach any nodes for set config0. Please check network connectivity and the status of the set. This has happened for 491 checks in a row.
2020-07-09T10:10:08.360+0000 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: ShutdownInProgress: Server is shutting down
2020-07-09T10:10:38.186+0000 W NETWORK [ReplicaSetMonitor-TaskExecutor-0] Unable to reach primary for set config0
2020-07-09T10:10:38.213+0000 W NETWORK [shard registry reload] Unable to reach primary for set config0
2020-07-09T10:10:38.213+0000 I SHARDING [shard registry reload] Periodic reload of shard registry failed :: caused by :: ShutdownInProgress: could not get updated shard list from config server due to Server is shutting down; will retry after 30s
2020-07-09T10:10:38.236+0000 W NETWORK [ReplicaSetMonitor-TaskExecutor-0] Unable to reach primary for set shard0
2020-07-09T10:10:38.361+0000 W NETWORK [replSetDistLockPinger] Unable to reach primary for set config0
2020-07-09T10:10:38.361+0000 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: ShutdownInProgress: Server is shutting down
2020-07-09T10:10:58.052+0000 I - [repl writer worker 6] xxxxxx.yyyy_5a0f4c64-dddf-4d59-94a7-0a7c93ab5bec collection clone progress: 780449/913793 85% (documents copied)