Followup to #2797
Cluster setup fails when no request to / is made before the finish_cluster action.
Assuming that 192.168.56.101 is the setup node:
#curl http://192.168.56.101:5984/
curl http://192.168.56.102:5984/
curl http://user:pass@192.168.56.101:5984/_cluster_setup
curl --request POST \
--url http://user:pass@192.168.56.101:5984/_cluster_setup \
--header 'content-type: application/json' \
--data '{
"action": "enable_cluster",
"bind_address": "0.0.0.0",
"username": "user",
"password": "pass",
"port": 5984,
"node_count": 2,
"remote_node": "192.168.56.102",
"remote_current_user": "user",
"remote_current_password": "pass"
}'
curl --request POST \
--url http://user:pass@192.168.56.101:5984/_cluster_setup \
--header 'content-type: application/json' \
--data '{
"action": "add_node",
"host": "192.168.56.102",
"port": 5984,
"username": "user",
"password": "pass",
"singlenode": false
}'
curl --request POST \
--url http://user:pass@192.168.56.101:5984/_cluster_setup \
--header 'content-type: application/json' \
--data '{ "action": "finish_cluster" }'
curl http://user:pass@192.168.56.101:5984/_cluster_setup
{"couchdb":"Welcome","version":"3.0.0-ebdfbba","git_sha":"ebdfbba","uuid":"18b03233b7265d32443a8d576041a981","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}
{"state":"cluster_enabled"}
{"ok":true}
{"ok":true}
{"error":"setup_error","reason":"Cluster setup unable to sync admin passwords"}
{"state":"cluster_enabled"}
Cluster setup should succeed without visiting /.
Docker containers built with couchdb-docker/dev, running on two VirtualBox VMs.
Here are the corresponding logs of the setup node:
Without GET /
Lines of note:
[error] 2020-05-01T09:04:07.246153Z [email protected] <0.714.0> 3e471a2ab6 setup sync_admin results [{badrpc,{'EXIT',{badarg,[{config,set,5,[{file,"src/config.erl"},{line,186}]},{rpc,'-handle_call_call/6-fun-0-',5,[{file,"rpc.erl"},{line,197}]}]}}}] errors []
[notice] 2020-05-01T09:04:07.255094Z [email protected] <0.714.0> 3e471a2ab6 192.168.56.101:5984 192.168.56.101 user POST /_cluster_setup 500 ok 43
What happens if you add sleep 3 instead of the curl to /?
sleeping 3s
{"couchdb":"Welcome","version":"3.0.0-ebdfbba","git_sha":"ebdfbba","uuid":"8b6a1310bd6d5751724e6abaffd2da3a","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}
{"state":"cluster_enabled"}
{"ok":true}
{"ok":true}
sleeping 3s
{"error":"setup_error","reason":"Cluster setup unable to sync admin passwords"}
{"state":"cluster_enabled"}
Still the same error:
[error] 2020-05-01T23:54:31.282087Z [email protected] <0.543.0> 671c155668 setup sync_admin results [{badrpc,{'EXIT',{badarg,[{config,set,5,[{file,"src/config.erl"},{line,186}]},{rpc,'-handle_call_call/6-fun-0-',5,[{file,"rpc.erl"},{line,197}]}]}}}] errors []
[notice] 2020-05-01T23:54:31.292029Z [email protected] <0.543.0> 671c155668 192.168.56.101:5984 192.168.56.101 user POST /_cluster_setup 500 ok 26
Hey there, I stumbled upon the same issue using CouchDB 3.1.0 and I think #2797 should not have been closed with #2798.
The log message uuid set to ... caught my eye when using the workaround via GET /:
couchdb1_1 | [notice] 2020-07-24T20:46:41.473471Z [email protected] <0.88.0> -------- config: [couchdb] uuid set to 84a3191a25a2955130956d131232f98b for reason nil
couchdb1_1 | [notice] 2020-07-24T20:46:41.481822Z [email protected] <0.1868.0> 2bc6d8040a localhost:15984 172.19.0.1 admin GET / 200 ok 20
When manually setting a uuid on the coordinating node everything worked fine (no GET / or sleep 3 necessary).
So, an unset or empty uuid on the coordinating node seems to make the finish_cluster action fail. A fresh node even throws an error when trying to get its uuid:
curl --user admin:password 'http://localhost:5984/_node/_local/_config/couchdb/uuid'
{"error":"not_found","reason":"unknown_config_value"}
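A minimal sketch of the manual-uuid workaround described above, assuming the uuid is set through the node-local config API (the comment does not say how it was set). The helper names `new_uuid`, `uuid_is_set`, and `set_uuid` are hypothetical, and the default host and credentials are simply the ones from the curl call above.

```shell
#!/bin/sh
# Assumed defaults, taken from the curl call above; adjust to your setup.
COUCH_URL="${COUCH_URL:-http://localhost:5984}"
COUCH_AUTH="${COUCH_AUTH:-admin:password}"

# Hypothetical helper: print 32 lowercase hex characters, the same shape
# as the uuid in the "uuid set to ..." log line.
new_uuid() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Hypothetical helper: succeed unless the config response passed as $1
# is an error object such as {"error":"not_found",...}.
uuid_is_set() {
  case "$1" in
    *'"error"'*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical helper: write [couchdb] uuid on the coordinating node.
# The config API expects the value as a JSON string, hence the quotes.
set_uuid() {
  curl --silent --user "$COUCH_AUTH" \
    --request PUT "$COUCH_URL/_node/_local/_config/couchdb/uuid" \
    --data "\"$1\""
}

# Usage, run on the coordinating node before the finish_cluster request:
#   current="$(curl --silent --user "$COUCH_AUTH" \
#     "$COUCH_URL/_node/_local/_config/couchdb/uuid")"
#   uuid_is_set "$current" || set_uuid "$(new_uuid)"
```

Only the check and the uuid generation run locally; whether setting the value this way is sufficient on a given version is an assumption that should be verified against a fresh node.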
I still have exactly the same problem with CouchDB 3.1.1.
Thanks @gitsnaf
3.1.1 does not include the fix. It seems that this change has not been backported to 3.x yet.