This works on one machine I have, but not on another, where it fails about 9 times out of 10.
When I run `sudo oc cluster up --metrics`, the preflight checks fail.
When I look at the openshift-infra project, I only see one pod, named "metrics-deployer-pod", and it is in an error state. The logs for this pod are below.
```
oc version
oc v1.4.0-alpha.1+3712aec-204-dirty
kubernetes v1.4.0+776c994
features: Basic-Auth
```
Start the cluster with metrics:

```
sudo /home/my-openshift/oc cluster up --metrics
```

Add the cluster-admin role to the admin user so I can look at the openshift-infra project in the console:

```
sudo /home/my-openshift/oc login -u system:admin
sudo /home/my-openshift/oc adm policy add-cluster-role-to-user cluster-admin admin
```

Go to the browser, log into the console as the admin user, go to the openshift-infra project, and see there is only one pod, named "metrics-deployer-pod", and it is red. Look at the logs for the preflight check errors (see below).
```
++ parse_bool false CONTINUE_ON_ERROR
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
validate_master_accessible:
unable to access master url https://kubernetes.default.svc:443
See the error from 'curl https://kubernetes.default.svc:443' below for details:
curl: (28) timed out before SSL handshake
Deployment has been aborted prior to starting, as these failures often indicate fatal problems.
Please evaluate any error messages above and determine how they can be addressed.
To ignore this validation failure and continue, specify IGNORE_PREFLIGHT=true.
PREFLIGHT CHECK FAILED
```
Expected: the preflight checks pass and all the openshift-infra pods are created.
This works on my other machine (a faster laptop). I suspect the preflight check needs a longer timeout or some retries, especially when running on a slower machine.
Note that my firewalld has been turned off, so that isn't in the way.
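If the preflight check did grow a retry loop, it could look something like this minimal sketch. The `retry` helper and the 30-second connect budget are my assumptions for illustration, not what the deployer actually does:

```shell
# Hypothetical retry wrapper: run a command up to N times before giving up.
retry() {
  local attempts=$1; shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0   # success on any attempt ends the loop
    sleep 1            # brief pause before the next attempt
  done
  return 1             # all attempts failed
}

# Usage sketch: probe the master URL up to 3 times with a 30s connect budget
# (uncomment inside a cluster; outside one it will simply fail after 3 tries):
#   retry 3 curl -sk --connect-timeout 30 https://kubernetes.default.svc:443
```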
I'm facing a similar issue, but in my case the error is slightly different. See my error output:
```
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for registry.access.redhat.com/openshift3/metrics-deployer:v3.4.0.39-2 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ...
WARNING: Binding DNS on port 8053 instead of 53, which may not be resolvable from all clients.
-- Checking type of volume mount ...
Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
Using public hostname IP 172.17.42.1 as the host IP
Using 172.17.42.1 as the server IP
-- Starting OpenShift container ...
Creating initial OpenShift configuration
FAIL
Error: could not create OpenShift configuration
Caused By:
Error: Docker run error rc=1
Details:
Image: registry.access.redhat.com/openshift3/metrics-deployer:v3.4.0.39-2
Entrypoint: []
Command: [start --images=registry.access.redhat.com/openshift3/metrics-deployer-${component}:v3.4.0.39-2 --volume-dir=/var/lib/origin/openshift.local.volumes --dns=0.0.0.0:8053 --write-config=/var/lib/origin/openshift.local.config --master=172.17.42.1 --public-master=https://172.17.42.1:8443 --hostname=172.17.42.1]
Error Output:
++ parse_bool false CONTINUE_ON_ERROR
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
+ continue_on_error=false
+ '[' false == false ']'
+ set -eu
+ deployer_mode=deploy
+ image_prefix=openshift/origin-
+ image_version=latest
+ master_url=https://kubernetes.default.svc:8443
+ [[ 3 == \/ ]]
++ parse_bool false REDEPLOY
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
+ redeploy=false
+ '[' false == true ']'
+ mode=deploy
+ '[' deploy = redeploy ']'
++ parse_bool false IGNORE_PREFLIGHT
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
+ ignore_preflight=false
+ cassandra_nodes=1
++ parse_bool true USE_PERSISTENT_STORAGE
++ local v=true
++ '[' true '!=' true -a true '!=' false ']'
++ echo true
+ use_persistent_storage=true
++ parse_bool false DYNAMICALLY_PROVISION_STORAGE
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
+ dynamically_provision_storage=false
+ cassandra_pv_size=10Gi
+ metric_duration=7
+ user_write_access=false
+ heapster_node_id=nodename
+ metric_resolution=15s
+ project=openshift-infra
+ master_ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
+ token_file=/var/run/secrets/kubernetes.io/serviceaccount/token
+ dir=/etc/deploy/_output
+ secret_dir=/secret
+ rm -rf /etc/deploy/_output
rm: cannot remove '/etc/deploy/_output': Permission denied
+ :
+ mkdir -p /secret
mkdir: cannot create directory '/secret': Permission denied
+ :
+ hawkular_metrics_hostname=hawkular-metrics.example.com
+ hawkular_metrics_alias=hawkular-metrics
+ hawkular_cassandra_alias=hawkular-cassandra
++ date +%s
+ openshift admin ca create-signer-cert --key=/etc/deploy/_output/ca.key --cert=/etc/deploy/_output/ca.crt --serial=/etc/deploy/_output/ca.serial.txt --name=metrics-signer@1486056191
error: open /etc/deploy/_output/ca.crt: permission denied
```
@rafaeltuelho what's your platform? If Linux, you'll need sudo to run cluster up.
@csrwng, I'm running on a Fedora 25 box.
My user is in the docker group, so I can run docker commands without sudo. I also tested with sudo, but with no success.
See my full oc cluster up command:

```
sudo /home/rsoares/bin/oc cluster up \
 --public-hostname 172.17.42.1 \
 --routing-suffix apps.172.17.42.1 \
 --host-data-dir /home/rsoares/.oc/profiles/demo-full/data \
 --host-config-dir /home/rsoares/.oc/profiles/demo-full/config \
 --use-existing-config \
 -e TZ=BRT \
 --logging=true \
 --metrics=true \
 --routing-suffix=172.17.42.1.xip.io \
 --version=v3.4.0.39-2 \
 --image=registry.access.redhat.com/openshift3/ose \
 --image=registry.access.redhat.com/openshift3/logging-deployment \
 --image=registry.access.redhat.com/openshift3/metrics-deployer
```
If I do not specify the image prefix and specific version (in the above command), it starts the cluster, but the logging and metrics pods refer to wrong image names that do not exist on Red Hat's registry:

```
Back-off pulling image "registry.access.redhat.com/openshift3/ose-logging-deployment:v3.4.0.39"
Back-off pulling image "registry.access.redhat.com/openshift3/ose-metrics-deployer:v3.4.0.39"
```
@rafaeltuelho so if you don't use --metrics=true and --logging=true ... it starts up ok?
yep!
@rafaeltuelho so the problem with the image names is known (https://bugzilla.redhat.com/show_bug.cgi?id=1416240) and we're working on it.
As a workaround, you could pull the images yourself using the current name and retag them locally with the name that cluster up expects:
```
docker pull registry.access.redhat.com/openshift3/logging-deployment:v3.4.0.39
docker tag registry.access.redhat.com/openshift3/logging-deployment:v3.4.0.39 registry.access.redhat.com/openshift3/ose-logging-deployment:v3.4.0.39
```
And do not include the additional --image arguments on your start command.
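The same retagging would presumably be needed for the metrics image, since the registry publishes `metrics-deployer` while cluster up looks for `ose-metrics-deployer`. A hypothetical helper (not part of oc or the thread) that derives the expected name by prefixing the repository basename with `ose-`:

```shell
# Hypothetical helper: retag an image from its published name to the
# "ose-"-prefixed name that cluster up looks for.
retag_for_cluster_up() {
  local src=$1
  local repo=${src%:*}   # registry/namespace/name (everything before the tag)
  local tag=${src##*:}   # version tag
  local dst="${repo%/*}/ose-${repo##*/}:${tag}"
  docker pull "$src"
  docker tag "$src" "$dst"
  echo "$dst"
}

# e.g.
#   retag_for_cluster_up registry.access.redhat.com/openshift3/metrics-deployer:v3.4.0.39
```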
/cc @stevekuznetsov
@csrwng, thanks for your input.
I tried your workaround, but for some reason it is still trying to pull that image, even with the "correct" tags:

```
docker tag registry.access.redhat.com/openshift3/logging-deployer:v3.4.0.39-2 registry.access.redhat.com/openshift3/ose-logging-deployment:v3.4.0.39-2
docker images | grep ose-logging
registry.access.redhat.com/openshift3/ose-logging-deployment v3.4.0.39-2 910537ac6658 2 weeks ago 763 MB
registry.access.redhat.com/openshift3/ose-logging-deployment v3.4.0.39 d7efd0e669f7 6 months ago 750.9 MB
```

On the metrics side, the error is different. It starts the pods, but fails with this error:

```
...
[Storing /etc/deploy/_output/hawkular-metrics.truststore]
Adding password for user hawkular
Generating the JGroups Keystore
Creating the Hawkular Metrics Secrets configuration json file
Creating the Hawkular Metrics Certificate Secrets configuration json file
Creating the Hawkular Metrics User Account Secrets
Creating the Cassandra Secrets configuration file
Creating the Cassandra Certificate Secrets configuration json file
Creating Hawkular Metrics & Cassandra Secrets
secret "hawkular-metrics-secrets" created
secret "hawkular-metrics-certificate" created
secret "hawkular-metrics-account" created
secret "hawkular-cassandra-secrets" created
secret "hawkular-cassandra-certificate" created
Creating Hawkular Metrics & Cassandra Templates
template "hawkular-metrics" created
template "hawkular-cassandra-services" created
template "hawkular-cassandra-node-pv" created
template "hawkular-cassandra-node-dynamic-pv" created
template "hawkular-cassandra-node-emptydir" created
template "hawkular-support" created
Deploying Hawkular Metrics & Cassandra Components
scripts/hawkular.sh: line 200: STARTUP_TIMEOUT: unbound variable
error: no objects passed to create
```
hmm, so for the logging one, I guess it's a matter of changing the pod spec's PullPolicy to "Never". On the metrics side, it looks like it needs an additional environment variable ("STARTUP_TIMEOUT"). I will take a closer look at this as soon as I get a chance.
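For context, the `STARTUP_TIMEOUT: unbound variable` failure is standard bash behavior: the deployer scripts run under `set -eu` (visible in the trace earlier in this thread), and under `set -u` expanding a variable that was never set aborts the script. A minimal illustration, where the 500 default is purely illustrative and not the script's actual value:

```shell
set -u

# Expanding an unset variable under `set -u` aborts the script:
#   echo "$STARTUP_TIMEOUT"    # -> "STARTUP_TIMEOUT: unbound variable"

# A ${var:-default} expansion avoids the abort by supplying a fallback:
echo "${STARTUP_TIMEOUT:-500}"
```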
Rafael, download the latest OCP client, as this was fixed in later tags (v3.4.0.40).
Tested with oc v3.4.1.5 and it's still not working :-\
The issue is that it references a wrong image name: it refers to `ose-metrics-deployer:<tag>`, but the name present on the Red Hat registry is `metrics-deployer:<tag>`.
The image naming issue is tracked in the BZ linked above. I'm not entirely sure when it will be resolved, but we should be able to push out correctly named images soon.
This should be fixed, please reopen if still an issue.