Is this a question, feature request, or bug report?
BUG REPORT
Please supply the header (i.e. the first few lines) of your most recent
log file for each node in your cluster. On most unix-based systems
running with defaults, this boils down to the output of
grep -F '[config]' cockroach-data/logs/cockroach.log
When log files are not available, supply the output of cockroach version
and all flags/environment variables passed to cockroach start instead.
# grep -F '[config]' cockroach-data/logs/cockroach.log
grep: cockroach-data/logs/cockroach.log: No such file or directory
# cockroach version
Build Tag: v2.0.0
Build Time: 2018/04/03 20:56:09
Distribution: CCL
Platform: linux amd64 (x86_64-unknown-linux-gnu)
Go Version: go1.10
C Compiler: gcc 6.3.0
Build SHA-1: a6b498b7aff14234bcde23107b9e7fa14e6a34a8
Build Type: release
# uname -a
Linux cockroach-us-central1-f 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux
# cockroach init --insecure
E180413 17:59:19.184767 1 cli/error.go:109 unable to connect or connection lost.
Please check the address and credentials such as certificates (if attempting to
communicate with a secure cluster).
initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
Error: unable to connect or connection lost.
Please check the address and credentials such as certificates (if attempting to
communicate with a secure cluster).
initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
Failed running "init"
What I did: tried to initialize a cluster.
What I expected: a normal startup, as I have experienced on many other systems.
What I saw instead: rpc errors with no useful error code and a stack trace that only points back to an error-handling function.
I get the same error when running with certs dir or host (even localhost):
# cockroach init --certs-dir=/etc/cockroachdb/certs --host=cockroach-us-central1-f.us-central1-f.c.**********.internal
E180413 17:59:02.041910 1 cli/error.go:109 unable to connect or connection lost.
Please check the address and credentials such as certificates (if attempting to
communicate with a secure cluster).
initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
Error: unable to connect or connection lost.
Please check the address and credentials such as certificates (if attempting to
communicate with a secure cluster).
initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
Failed running "init"
# cockroach init --host localhost --insecure
E180413 18:02:35.749030 1 cli/error.go:109 unable to connect or connection lost.
Please check the address and credentials such as certificates (if attempting to
communicate with a secure cluster).
initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
Error: unable to connect or connection lost.
Please check the address and credentials such as certificates (if attempting to
communicate with a secure cluster).
initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
Failed running "init"
FYI, here's how I installed cockroachdb:
#!/usr/bin/env bash
set -euxo pipefail
# Install CockroachDB
wget -qO- https://binaries.cockroachdb.com/cockroach-v2.0.0.linux-amd64.tgz | tar xvz
cp -i cockroach-v2.0.0.linux-amd64/cockroach /usr/local/bin
# Create SSL cert storage dirs
CERT_DIR="/etc/cockroachdb/certs"
CERT_PRIVATE_DIR="/etc/cockroachdb/private"
ROACH_BIN="/usr/local/bin/cockroach"
mkdir -p "${CERT_DIR}"
mkdir -p "${CERT_PRIVATE_DIR}"
# Per-Process Limit Modification
echo "session required pam_limits.so" >> /etc/pam.d/common-session
echo "session required pam_limits.so" >> /etc/pam.d/common-session-noninteractive
echo "* soft nofile 35000" >> /etc/security/limits.conf
echo "* hard nofile 35000" >> /etc/security/limits.conf
# systemd
cat >/etc/systemd/system/cockroachdb.service << EOL
[Unit]
Description=Cockroach DB
[Install]
WantedBy=multi-user.target
[Service]
Environment=COCKROACH_HOST=\$(hostname)
Environment=COCKROACH_CERTS_DIR=${CERT_DIR}
ExecStart=${ROACH_BIN} start --store=/var/data/cockroachdb/ --port=26257 --http-port=76224 --logtostderr=ERROR
ExecStop=${ROACH_BIN} quit
SyslogIdentifier=cockroachdb
Restart=always
LimitNOFILE=35000
EOL
glibc and libtinfo appear installed:
# apt-get install libtinfo-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
libtinfo-dev is already the newest version (6.0+20161126-1+deb9u2).
libtinfo-dev set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
# apt-get install libtinfo5
Reading package lists... Done
Building dependency tree
Reading state information... Done
libtinfo5 is already the newest version (6.0+20161126-1+deb9u2).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
# apt-get install libc6
Reading package lists... Done
Building dependency tree
Reading state information... Done
libc6 is already the newest version (2.24-11+deb9u3).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
# apt-get install libc6-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
libc6-dev is already the newest version (2.24-11+deb9u3).
libc6-dev set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Just built from source and getting this error as well.
Also getting same error on Debian Jessie:
# cockroach init --certs-dir=/etc/cockroachdb/certs --host=cockroach-us-east1-b.us-east1-b.c.**********.internal
E180413 18:50:23.764308 1 cli/error.go:109 unable to connect or connection lost.
Please check the address and credentials such as certificates (if attempting to
communicate with a secure cluster).
initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
Error: unable to connect or connection lost.
Please check the address and credentials such as certificates (if attempting to
communicate with a secure cluster).
initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
Failed running "init"
# uname -a
Linux cockroach-us-east1-b 3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1 (2018-01-08) x86_64 GNU/Linux
This ended up being me taking the directions too literally. I was under the impression that init would also start the server. It does not. You need to start it first.
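For anyone who lands here with the same symptom, the ordering that resolves it can be sketched roughly as follows. This assumes a single insecure node on localhost with the default port 26257; `start_then_init` is a hypothetical helper name, not part of the cockroach CLI, and the flags are the v2.0 ones as I understand them.

```shell
#!/usr/bin/env bash
# Sketch only: start the server first, then initialize the cluster.
# Assumes an insecure single-node setup; start_then_init is a hypothetical helper.
start_then_init() {
  local host="${1:-localhost}"

  # 1. Start the node in the background. When --join is given, the node
  #    waits to be initialized instead of bootstrapping a cluster itself
  #    (my understanding of the v2.0 behavior).
  cockroach start --insecure --host="$host" --join="$host:26257" --background || return 1

  # 2. Only now is there a listening server for `init` to connect to.
  cockroach init --insecure --host="$host"
}
```

Calling `start_then_init localhost` runs the two commands in that order and stops if the start step fails. For a secure cluster, `--insecure` would be swapped for `--certs-dir=...` on both commands.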
The error messages are still unhelpful, though.
Reopening this to track the need to improve the error messages. We should surface the underlying failure ("connection refused" in this case; other possibilities include cert errors, timeouts, etc.) instead of gRPC's unhelpful "all SubConns are in TransientFailure".
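Until the CLI reports the underlying cause itself, a plain TCP probe can tell some of these cases apart from the outside. The following is a rough sketch, not how cockroach does it: `dial_diagnosis` is a hypothetical helper, it relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command, and it cannot detect certificate problems because it never performs a TLS handshake.

```shell
#!/usr/bin/env bash
# Sketch only: classify why a TCP dial to a node fails.
# dial_diagnosis is a hypothetical helper; the default port is 26257.
dial_diagnosis() {
  local host="$1" port="${2:-26257}"

  # Try to open a TCP connection, giving up after 3 seconds.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null

  case $? in
    0)   echo "reachable" ;;
    124) echo "timed out" ;;   # exit status used by timeout(1) when the limit is hit
    *)   echo "connection refused or host unresolvable" ;;
  esac
}
```

In the situation from this issue (no server running yet), `dial_diagnosis localhost 26257` would be expected to print the "connection refused" line, which points directly at the missing `cockroach start`.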
cockroach init without a server now fails with:
Error: cannot dial server.
Is the server running?
If the server is running, check --host client-side and --advertise server-side.
Thanks!