This works on one machine I have, but not on another, where it fails about 9 times out of 10.
When I run `sudo oc cluster up --metrics`, the preflight checks fail.
When I look at the openshift-infra project, I only see one pod, named "metrics-deployer-pod", and it is in an error state. The logs for this pod are below.
```
oc version
oc v1.4.0-alpha.1+3712aec-204-dirty
kubernetes v1.4.0+776c994
features: Basic-Auth
```
Start the cluster with metrics:

```
sudo /home/my-openshift/oc cluster up --metrics
```

Add the cluster-admin role to the admin user so I can look at the openshift-infra project in the console:

```
sudo /home/my-openshift/oc login -u system:admin
sudo /home/my-openshift/oc adm policy add-cluster-role-to-user cluster-admin admin
```

Go to the browser, log into the console as the admin user, go to the openshift-infra project, and see there is only one pod, named "metrics-deployer-pod", and it is red. Look at the logs for the preflight check errors (see below).
```
++ parse_bool false CONTINUE_ON_ERROR
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
validate_master_accessible:
unable to access master url https://kubernetes.default.svc:443
See the error from 'curl https://kubernetes.default.svc:443' below for details:
curl: (28) timed out before SSL handshake
Deployment has been aborted prior to starting, as these failures often indicate fatal problems.
Please evaluate any error messages above and determine how they can be addressed.
To ignore this validation failure and continue, specify IGNORE_PREFLIGHT=true.
PREFLIGHT CHECK FAILED
```
Expected: the preflight checks pass and all the openshift-infra pods are created.
This works on my other machine (a faster laptop). I suspect the preflight check needs a longer timeout or some retries, especially when running on a slower machine.
Note that my firewalld has been turned off, so that isn't in the way.
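If the preflight check did grow a retry loop, it could look something like this minimal sketch. The `retry` helper and the 30-second connect budget are my assumptions for illustration, not what the deployer actually does:

```shell
# Hypothetical retry wrapper: run a command up to N times before giving up.
retry() {
  local attempts=$1; shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0   # success on any attempt ends the loop
    sleep 1            # brief pause before the next attempt
  done
  return 1             # all attempts failed
}

# Usage sketch: probe the master URL up to 3 times with a 30s connect budget
# (uncomment inside a cluster; outside one it will simply fail after 3 tries):
#   retry 3 curl -sk --connect-timeout 30 https://kubernetes.default.svc:443
```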
I'm facing a similar issue, but in my case the error is slightly different. See my error output:
```
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for registry.access.redhat.com/openshift3/metrics-deployer:v3.4.0.39-2 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ...
WARNING: Binding DNS on port 8053 instead of 53, which may not be resolvable from all clients.
-- Checking type of volume mount ...
Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
Using public hostname IP 172.17.42.1 as the host IP
Using 172.17.42.1 as the server IP
-- Starting OpenShift container ...
Creating initial OpenShift configuration
FAIL
Error: could not create OpenShift configuration
Caused By:
Error: Docker run error rc=1
Details:
Image: registry.access.redhat.com/openshift3/metrics-deployer:v3.4.0.39-2
Entrypoint: []
Command: [start --images=registry.access.redhat.com/openshift3/metrics-deployer-${component}:v3.4.0.39-2 --volume-dir=/var/lib/origin/openshift.local.volumes --dns=0.0.0.0:8053 --write-config=/var/lib/origin/openshift.local.config --master=172.17.42.1 --public-master=https://172.17.42.1:8443 --hostname=172.17.42.1]
Error Output:
++ parse_bool false CONTINUE_ON_ERROR
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
+ continue_on_error=false
+ '[' false == false ']'
+ set -eu
+ deployer_mode=deploy
+ image_prefix=openshift/origin-
+ image_version=latest
+ master_url=https://kubernetes.default.svc:8443
+ [[ 3 == \/ ]]
++ parse_bool false REDEPLOY
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
+ redeploy=false
+ '[' false == true ']'
+ mode=deploy
+ '[' deploy = redeploy ']'
++ parse_bool false IGNORE_PREFLIGHT
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
+ ignore_preflight=false
+ cassandra_nodes=1
++ parse_bool true USE_PERSISTENT_STORAGE
++ local v=true
++ '[' true '!=' true -a true '!=' false ']'
++ echo true
+ use_persistent_storage=true
++ parse_bool false DYNAMICALLY_PROVISION_STORAGE
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
+ dynamically_provision_storage=false
+ cassandra_pv_size=10Gi
+ metric_duration=7
+ user_write_access=false
+ heapster_node_id=nodename
+ metric_resolution=15s
+ project=openshift-infra
+ master_ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
+ token_file=/var/run/secrets/kubernetes.io/serviceaccount/token
+ dir=/etc/deploy/_output
+ secret_dir=/secret
+ rm -rf /etc/deploy/_output
rm: cannot remove '/etc/deploy/_output': Permission denied
+ :
+ mkdir -p /secret
mkdir: cannot create directory '/secret': Permission denied
+ :
+ hawkular_metrics_hostname=hawkular-metrics.example.com
+ hawkular_metrics_alias=hawkular-metrics
+ hawkular_cassandra_alias=hawkular-cassandra
++ date +%s
+ openshift admin ca create-signer-cert --key=/etc/deploy/_output/ca.key --cert=/etc/deploy/_output/ca.crt --serial=/etc/deploy/_output/ca.serial.txt --name=metrics-signer@1486056191
error: open /etc/deploy/_output/ca.crt: permission denied
```
@rafaeltuelho what's your platform? If Linux, you'll need sudo to run cluster up.
@csrwng, I'm running on a Fedora 25 box.
My user is in the docker group, so I can run docker commands without sudo. I also tested with sudo, but with no success.
See my full oc cluster up command:

```
sudo /home/rsoares/bin/oc cluster up \
 --public-hostname 172.17.42.1 \
 --routing-suffix apps.172.17.42.1 \
 --host-data-dir /home/rsoares/.oc/profiles/demo-full/data \
 --host-config-dir /home/rsoares/.oc/profiles/demo-full/config \
 --use-existing-config \
 -e TZ=BRT \
 --logging=true \
 --metrics=true \
 --routing-suffix=172.17.42.1.xip.io \
 --version=v3.4.0.39-2 \
 --image=registry.access.redhat.com/openshift3/ose \
 --image=registry.access.redhat.com/openshift3/logging-deployment \
 --image=registry.access.redhat.com/openshift3/metrics-deployer
```
If I do not specify the image prefix and specific version (in the above command), it starts the cluster, but the logging and metrics pods refer to wrong image names that do not exist on Red Hat's registry:

```
Back-off pulling image "registry.access.redhat.com/openshift3/ose-logging-deployment:v3.4.0.39"
Back-off pulling image "registry.access.redhat.com/openshift3/ose-metrics-deployer:v3.4.0.39"
```
@rafaeltuelho so if you don't use --metrics=true and --logging=true ... it starts up ok?
yep!
@rafaeltuelho so the problem with the image names is known (https://bugzilla.redhat.com/show_bug.cgi?id=1416240) and we're working on it.
As a workaround, you could pull the images yourself using the current name and retag them locally with the name that cluster up expects:
```
docker pull registry.access.redhat.com/openshift3/logging-deployment:v3.4.0.39
docker tag registry.access.redhat.com/openshift3/logging-deployment:v3.4.0.39 registry.access.redhat.com/openshift3/ose-logging-deployment:v3.4.0.39
```
And do not include the additional --image arguments on your start command.
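The same retagging would presumably be needed for the metrics image, since the registry publishes `metrics-deployer` while cluster up looks for `ose-metrics-deployer`. A hypothetical helper (not part of oc or the thread) that derives the expected name by prefixing the repository basename with `ose-`:

```shell
# Hypothetical helper: retag an image from its published name to the
# "ose-"-prefixed name that cluster up looks for.
retag_for_cluster_up() {
  local src=$1
  local repo=${src%:*}   # registry/namespace/name (everything before the tag)
  local tag=${src##*:}   # version tag
  local dst="${repo%/*}/ose-${repo##*/}:${tag}"
  docker pull "$src"
  docker tag "$src" "$dst"
  echo "$dst"
}

# e.g.
#   retag_for_cluster_up registry.access.redhat.com/openshift3/metrics-deployer:v3.4.0.39
```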
/cc @stevekuznetsov
@csrwng, thanks for your input.
I tried your workaround, but for some reason it is still trying to pull that image, even with the "correct" tags:

```
docker tag registry.access.redhat.com/openshift3/logging-deployer:v3.4.0.39-2 registry.access.redhat.com/openshift3/ose-logging-deployment:v3.4.0.39-2
docker images | grep ose-logging
registry.access.redhat.com/openshift3/ose-logging-deployment v3.4.0.39-2 910537ac6658 2 weeks ago 763 MB
registry.access.redhat.com/openshift3/ose-logging-deployment v3.4.0.39 d7efd0e669f7 6 months ago 750.9 MB
```

On the metrics side, the error is different. It starts the pods, but fails with this error:

```
...
[Storing /etc/deploy/_output/hawkular-metrics.truststore]
Adding password for user hawkular
Generating the JGroups Keystore
Creating the Hawkular Metrics Secrets configuration json file
Creating the Hawkular Metrics Certificate Secrets configuration json file
Creating the Hawkular Metrics User Account Secrets
Creating the Cassandra Secrets configuration file
Creating the Cassandra Certificate Secrets configuration json file
Creating Hawkular Metrics & Cassandra Secrets
secret "hawkular-metrics-secrets" created
secret "hawkular-metrics-certificate" created
secret "hawkular-metrics-account" created
secret "hawkular-cassandra-secrets" created
secret "hawkular-cassandra-certificate" created
Creating Hawkular Metrics & Cassandra Templates
template "hawkular-metrics" created
template "hawkular-cassandra-services" created
template "hawkular-cassandra-node-pv" created
template "hawkular-cassandra-node-dynamic-pv" created
template "hawkular-cassandra-node-emptydir" created
template "hawkular-support" created
Deploying Hawkular Metrics & Cassandra Components
scripts/hawkular.sh: line 200: STARTUP_TIMEOUT: unbound variable
error: no objects passed to create
```
hmm, so for the logging one, I guess it's a matter of changing the pod spec's PullPolicy to "Never". On the metrics side, it looks like it needs an additional environment variable ("STARTUP_TIMEOUT"). I will take a closer look at this as soon as I get a chance.
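For context, the `STARTUP_TIMEOUT: unbound variable` failure is standard bash behavior: the deployer scripts run under `set -eu` (visible in the trace earlier in this thread), and under `set -u` expanding a variable that was never set aborts the script. A minimal illustration, where the 500 default is purely illustrative and not the script's actual value:

```shell
set -u

# Expanding an unset variable under `set -u` aborts the script:
#   echo "$STARTUP_TIMEOUT"    # -> "STARTUP_TIMEOUT: unbound variable"

# A ${var:-default} expansion avoids the abort by supplying a fallback:
echo "${STARTUP_TIMEOUT:-500}"
```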
Rafael, download the latest OCP client, as this was fixed in later tags (v3.4.0.40).
Tested with oc v3.4.1.5 and it's still not working :-\
The issue is that it references a wrong image name: it refers to `ose-metrics-deployer:<tag>`, but the name present on the Red Hat registry is `metrics-deployer:<tag>`.
The image naming issue is tracked in the BZ linked above. I'm not entirely sure when it will be resolved, but we should be able to push out correctly named images soon.
This should be fixed, please reopen if still an issue.