Openshift-ansible: Metrics installation using playbook does not end up with a working installation (3.7)

Created on 10 Apr 2018  ·  23 Comments  ·  Source: openshift/openshift-ansible

Description

On a fresh install of a 3.7 cluster, the metrics playbook completes successfully.

But when I check the openshift-infra project, the hawkular pods do not start.

Version

Please put the following version information in the code block
indicated below.

  • Your ansible version per ansible --version
ansible 2.4.2.0
  config file = /home/ansibleuser/openshift-ansible/ansible.cfg
  configured module search path = [u'/home/ansibleuser/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Aug  4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]

If you're operating from a git clone:

  • The output of git describe
    openshift-ansible-3.7.42-1-34-g2474d22

Steps To Reproduce

  1. Launch the 3.7 playbook:
    ansible-playbook -i ./hosts/cluster-installation playbooks/byo/openshift-cluster/openshift-metrics.yml

Expected Results

Cluster up and running and metrics configured

Observed Results

The hawkular pods show these logs:

2018-04-10 16:14:58,460 INFO  [sun.misc.Version] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-04-10 16:14:58,460 INFO  [sun.misc.Version] (metricsservice-lifecycle-thread) Trying again in 10000 ms

I guess something went wrong at the cassandra level, but I'm not familiar with that database.
It seems to me that some initialisation process did not run and the cassandra pod was started without some minimal configuration.
I suppose I could create the keyspace manually, but I don't want to mess with things I don't quite understand.
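To confirm which keyspace hawkular is waiting for, the retry messages can be filtered out of the pod log. The following is a minimal sketch that assumes the message format shown in the log lines above; `missing_keyspaces` is a hypothetical helper name, not part of any OpenShift tooling.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct keyspace names that hawkular
# reports as missing. Reads log text on stdin. The pattern is based only
# on the message format quoted in this report.
missing_keyspaces() {
    sed -n 's/.*Version check failed: Keyspace \([A-Za-z0-9_]*\) does not exist.*/\1/p' | sort -u
}

# Example using the two log lines from this report.
missing_keyspaces <<'EOF'
2018-04-10 16:14:58,460 INFO  [sun.misc.Version] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-04-10 16:14:58,460 INFO  [sun.misc.Version] (metricsservice-lifecycle-thread) Trying again in 10000 ms
EOF
```

On a live cluster the same function could be fed from `oc logs` for the hawkular-metrics pod, with the pod name filled in for your environment.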

Here are the cassandra pod boot logs:

https://gist.github.com/ahmadou/9b35d5e534d451555e0a11ac2cd93ce0

Additional Information

Provide any additional information which may help us diagnose the
issue.
CentOS Linux release 7.4.1708

My config file

#Configuration globale cluster
[OSEv3:children]
masters
etcd
nodes
glusterfs
glusterfs_registry

#VARIABLES GLOBALES CLUSTER
[OSEv3:vars]
#etcd
openshift_use_etcd_system_container=True

#ansible
ansible_ssh_user=ansibleuser
ansible_become=true
ansible_service_broker_image_prefix=openshift/
ansible_service_broker_registry_url="registry.access.redhat.com"

#checks disk
openshift_check_min_host_disk_gb=13
#firewall
os_firewall_use_firewalld=True

#deployment configuration
openshift_deployment_type=origin
#openshift_version=3.9.0
#openshift_pkg_version=3.7.1
#containerized=true

#configuration glusterfs
openshift_storage_glusterfs_namespace=glusterfs
openshift_storage_glusterfs_name=storage

#configuration registry interne
openshift_hosted_registry_storage_kind=glusterfs
openshift_registry_selector="region=infranodes"
openshift_hosted_registry_replicas=3
openshift_hosted_registry_storage_volume_size=190Gi

#configuration routers
openshift_router_selector="region=routingnodes"

#configuration noeuds standard
osm_default_node_selector="region=standardnodes"

#configuration points d'acces master et api
openshift_master_cluster_hostname=master-lb.mycompany.internal
openshift_master_cluster_public_hostname=console.mycompany.com
openshift_master_default_subdomain=mycompany.com
openshift_master_api_port=8443
openshift_master_console_port=8443
openshift_master_session_name=ssn
openshift_public_ip="xx.xx.xx.xx"

#configuration du certificats des routeurs
openshift_hosted_router_certificate={"certfile": "/home/ansibleuser/openshift-ansible/customCertificates/STAR_mycompany.crt", "keyfile": "/home/ansibleuser/openshift-ansible/customCertificates/mycompany.key", "cafile": "/home/ansibleuser/openshift-ansible/customCertificates/COMODORSADomainValidationSecureServerCA.crt"}

#configuration du ldap
openshift_master_identity_providers=[{'name': 'picv4_ldap', 'challenge': 'true', 'login': 'true', 'kind': 'LDAPPasswordIdentityProvider', 'attributes': {'id': ['dn'], 'email': ['mail'], 'name': ['cn'], 'preferredUsername': ['uid']}, 'bindDN': 'uid=ldapbind,cn=users,cn=accounts,dc=ggd,dc=mycompany', 'bindPassword': 'tetetetetetge', 'ca': '', 'insecure': 'true', 'url': 'ldap://ldap.picv4.mycompany:389/cn=users,cn=accounts,dc=picv4,dc=mycompany?uid'}]

#configuration de la politique d'audit
openshift_master_audit_config={"enabled": true, "auditFilePath": "/var/log/openpaas-oscp-audit/openpaas-oscp-audit.log", "maximumFileRetentionDays": 14, "maximumFileSizeMegabytes": 500, "maximumRetainedFiles": 5}

#configuration logs cluster
openshift_logging_install_logging="true"
openshift_logging_es_pvc_dynamic="true"
openshift_logging_es_pvc_size="100G"
openshift_logging_curator_default_days="2"
openshift_logging_curator_run_hour="24"
openshift_master_logging_public_url="https://logs.mycompany.com"

openshift_logging_es_nodeselector="region=infranodes"
openshift_logging_kibana_ops_nodeselector="region=infranodes"
openshift_logging_curator_ops_nodeselector="region=infranodes"

#configuration metrics

openshift_metrics_master_url="https://master.xxxxxx:8443"
openshift_metrics_install_metrics="true"
openshift_metrics_cassandra_storage_type="dynamic"
openshift_metrics_duration=7
openshift_metrics_cassandra_pvc_size="20G"
openshift_metrics_cassandra_replicas=1
openshift_metrics_cassandra_limits_memory="2Gi"
openshift_metrics_cassandra_limits_cpu="2000m"
openshift_metrics_cassandra_nodeselector="{'region':'infra'}"
openshift_master_metrics_public_url="metrics.xxxxxx.com"
openshift_metrics_hawkular_hostname="hawkular.xxxxx.com"
openshift_metrics_hawkular_nodeselector="{'region':'infra'}"
openshift_metrics_cassandra_replicas=1
openshift_metrics_heapster_limits_cpu="2000m"
openshift_metrics_heapster_nodeselector="{'region':'infra'}"
openshift_metrics_hawkular_ca="/home/ansibleuser/openshift-ansible/customCertificates/xxxx.crt"
openshift_metrics_hawkular_cert="/home/ansibleuser/openshift-ansible/customCertificates/xxx-xxxx.crt"
openshift_metrics_hawkular_key="/home/ansibleuser/openshift-ansible/customCertificates/xxX.key"

#NOEUDS GLUSTER FS 
[glusterfs]
storage01.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.31
storage02.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.32
storage03.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.33
storage04.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.34
#config glusterfs
[glusterfs:vars]
openshift_storage_glusterfs_nodeselector="glusterfs=standardstorage"
openshift_storage_glusterfs_wipe="true"

#NOEUDS GLUSTER FS DEDIES  AU REGISTRY INTERNE
[glusterfs_registry]
storage-registry01.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.41
storage-registry02.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.42
storage-registry03.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.43

#NOEUDS DU CLUSTER

#Groupe des VMS Master
[masters]
master0[1:2].mycompany.internal

#noeuds etcd
[etcd]
etcd01.mycompany.internal
etcd02.mycompany.internal
etcd03.mycompany.internal

# Noeuds Openshift
[nodes]

#Infra Nodes
infranode0[1:2].mycompany.internal openshift_node_labels="{'region' : 'infranodes'}" openshift_schedulable=true

#Pic nodes
picnode0[1:2].mycompany.internal openshift_node_labels="{'region' : 'picnodes'}" openshift_schedulable=true

#Compilation nodes
compilnode0[1:2].mycompany.internal openshift_node_labels="{'region' : 'compilnodes'}" openshift_schedulable=true

#routing nodes
routeur0[1:2].mycompany.internal openshift_node_labels="{'region' : 'routingnodes'}"

#standard nodes
node0[1:2].mycompany.internal openshift_node_labels="{'region' : 'standardnodes'}" openshift_schedulable=true

#masters
master0[1:2].mycompany.internal openshift_node_labels="{'region' : 'masters'}" openshift_schedulable=true

#glusterfs nodes
storage0[1:4].mycompany.internal openshift_node_labels="{'region' : 'standardstorage'}"

#glusterfs registry nodes
storage-registry0[1:3].mycompany.internal openshift_node_labels="{'region' : 'registrystorage'}"

#variables specifiques noeuds openshift
[nodes:vars]
openshift_docker_options=--log-driver json-file --log-opt max-size=1M --log-opt max-file=3 --selinux-enabled
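The inventory above assigns openshift_metrics_cassandra_replicas twice. That is harmless here because both values are 1, but repeated keys are easy to miss in a long hosts file. A rough sketch for spotting them follows; `duplicate_inventory_keys` is a hypothetical helper, and it only looks at plain `key=value` lines, so the same key set in two different `[group:vars]` sections would also be reported.

```shell
#!/bin/sh
# Hypothetical helper: print variable names assigned more than once in an
# INI-style inventory. Comment lines and host lines are ignored.
duplicate_inventory_keys() {
    grep -v '^[[:space:]]*#' "$1" \
        | grep '^[A-Za-z_][A-Za-z0-9_]*=' \
        | cut -d= -f1 \
        | sort \
        | uniq -d
}

# Example using an excerpt of the metrics section from this report.
sample=$(mktemp)
cat > "$sample" <<'EOF'
openshift_metrics_duration=7
openshift_metrics_cassandra_pvc_size="20G"
openshift_metrics_cassandra_replicas=1
openshift_metrics_hawkular_hostname="hawkular.xxxxx.com"
openshift_metrics_cassandra_replicas=1
EOF
duplicate_inventory_keys "$sample"
rm -f "$sample"
```

For the excerpt above this prints openshift_metrics_cassandra_replicas.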


Most helpful comment

I've seen this issue a couple of times. It is usually triggered when the cassandra pod doesn't have a persistent volume, so the initial setup is lost when the pod is restarted. To fix it, re-run the schema job that comes with the installation; it creates the necessary data structures inside cassandra. Use the following commands:

Export the current job YAML

# oc project openshift-infra
# oc get --export job hawkular-metrics-schema -o yaml > job.yaml

Delete the old job

# oc delete job hawkular-metrics-schema

Scale down hawkular metrics

# oc scale rc hawkular-metrics --replicas=0

Create a new job instance

# oc create -f job.yaml
# oc get job

After the job succeeds, scale hawkular metrics back up

# oc scale rc hawkular-metrics --replicas=1

This should fix the issue with the missing hawkular_metrics keyspace.
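The steps above can be collected into one script. This is only a sketch under a few assumptions: the `run` wrapper, the `DRY_RUN` variable and the `recreate_schema_job` function are names introduced here for illustration, and the script prints the commands by default instead of executing them so that it can be reviewed first. It deliberately stops before the final scale-up, because the job has to finish successfully before hawkular is started again.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them against the cluster.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Re-create the hawkular-metrics-schema job from an exported copy.
recreate_schema_job() {
    job_file="${1:-job.yaml}"

    run oc project openshift-infra

    # The export needs a redirect, so it is handled outside the wrapper.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ oc get --export job hawkular-metrics-schema -o yaml > $job_file"
    else
        oc get --export job hawkular-metrics-schema -o yaml > "$job_file"
    fi

    run oc delete job hawkular-metrics-schema
    run oc scale rc hawkular-metrics --replicas=0
    run oc create -f "$job_file"
    run oc get job

    # Left as a manual step: check that the job has succeeded first.
    echo "# once the job has succeeded: oc scale rc hawkular-metrics --replicas=1"
}

recreate_schema_job
```

Writing the export and the create step against the same `job_file` variable avoids any mismatch between the file that is written and the file that is loaded.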

All 23 comments

Having same issue.

2018-04-10 22:55:07,646 INFO  [sun.misc.Version] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-04-10 22:55:07,646 INFO  [sun.misc.Version] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-04-10 22:55:17,648 INFO [sun.misc.Version] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist

No persistent storage is configured at this time.
I see the following entry in the cassandra log:

INFO [SharedPool-Worker-12] 2018-04-10 23:45:02,263 MigrationManager.java:309 - Create new Keyspace: KeyspaceMetadata{name=openshift_metrics, params=KeyspaceParams{durable_writes=true, replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=1}}, tables=[], views=[], functions=[], types=[]}

Should the keyspace be named hawkular_metrics instead of openshift_metrics?

EDITED:
Control node A:
ansible 2.4.2.0 installed.
Playbook ran successfully.
Metrics pod complains "Keyspace hawkular_metrics does not exist".

Control node B:
ansible 2.4.1.0 installed.
Playbook ran successfully.
Metrics up and running without issue.

Same issue...

@ewolinetz PTAL

@jsanda have you seen this before?

Same problem here.
From what I could see in the log, cassandra is creating the openshift_metrics keyspace while hawkular is looking for the hawkular_metrics keyspace.
Does anybody know how to change this config?

Same issue. I have been fighting with this for the last 2 days. Please let us know if any changes were made; it would be a great help.
Version 3.7.

2018-04-14 03:53:52,615 INFO [sun.misc.Version] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-04-14 03:53:52,615 INFO [sun.misc.Version] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-04-14 03:54:02,619 INFO [sun.misc.Version] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-04-14 03:54:02,619 INFO [sun.misc.Version] (metricsservice-lifecycle-thread) Trying again in 10000 ms

I just got it working by changing the hawkular-metrics docker image version from docker.io/openshift/origin-metrics-hawkular-metrics:latest to docker.io/openshift/origin-metrics-hawkular-metrics:v3.7.1. I saw that the latest image was pushed 17 days ago. I tried this specific version and everything works fine, so I suspect something is wrong with the new image. Please try it; I hope it works for you all too.

Edited: hawkular-metrics yaml
Changed: docker.io/openshift/origin-metrics-hawkular-metrics:latest to docker.io/openshift/origin-metrics-hawkular-metrics:v3.7.1

You can handle this more cleanly by adding the following line to your hosts file:
openshift_metrics_image_version=v3.7.1
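Since the fix in this case came down to a floating `latest` tag versus a pinned one, a quick check of an image reference can be useful before and after changing the inventory. This is a small sketch; `uses_floating_tag` is a hypothetical helper and the two image names are the ones quoted in the comments above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the image reference has no tag
# or uses "latest", fail (exit 1) if it is pinned to some other tag.
uses_floating_tag() {
    image="$1"
    # Drop the registry/namespace part so a registry port such as
    # host:5000 is not mistaken for a tag.
    name="${image##*/}"
    case "$name" in
        *:latest) return 0 ;;
        *:*)      return 1 ;;
        *)        return 0 ;;   # no tag given, so "latest" is implied
    esac
}

for image in \
    docker.io/openshift/origin-metrics-hawkular-metrics:latest \
    docker.io/openshift/origin-metrics-hawkular-metrics:v3.7.1
do
    if uses_floating_tag "$image"; then
        echo "floating: $image"
    else
        echo "pinned:   $image"
    fi
done
```

The image string currently in use can be read from the hawkular-metrics replication controller definition and passed to the function.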

@adawolfs Thanks. With OpenShift Origin 3.7 you must set openshift_metrics_image_version=v3.7.1.

Also, don't forget to set a readiness probe delay, because initializing the database takes some time.

openshift 3.10

metrics v3.10.0-rc.0

oc delete pod hawkular-cassandra-1-6cdlf  hawkular-metrics-ck8gg heapster-x8724

After deleting the metrics-related pods, the automatically recreated pods fail with the errors below.

I suspect that part of the initialization data is missing.

2018-09-03 09:51:23,409 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-09-03 09:51:23,410 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-09-03 09:51:33,534 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-09-03 09:51:33,535 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-09-03 09:51:43,550 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-09-03 09:51:43,551 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-09-03 09:51:53,854 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-09-03 09:51:53,854 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-09-03 09:52:03,920 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-09-03 09:52:03,920 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms

I'm hitting the same problem. I have the same versions as ss75710541:

Openshift 3.10
metrics v3.10.0-rc.0

Anyone found a resolution yet?

I have just installed a multi master cluster 3.11 and have the same issues. The images with tag v3.11.0 are just an hour old.
Would really appreciate some hints on how to fix this.

I am seeing the same here with OpenShift 3.10, using the v3.11 images for hawkular and friends due to the lack of official 3.10 images.

I'm seeing the same issue with the latest 3.11 images as well.

@openshift-bot can you assign ?

same here

is there any feedback about the solution?

I've seen this issue a couple of times. It is usually triggered when the cassandra pod doesn't have a persistent volume, so the initial setup is lost when the pod is restarted. To fix it, re-run the schema job that comes with the installation; it creates the necessary data structures inside cassandra. Use the following commands:

Export the current job YAML

# oc project openshift-infra
# oc get --export job hawkular-metrics-schema -o yaml > job.yaml

Delete the old job

# oc delete job hawkular-metrics-schema

Scale down hawkular metrics

# oc scale rc hawkular-metrics --replicas=0

Create a new job instance

# oc create -f job.yaml
# oc get job

After the job succeeds, scale hawkular metrics back up

# oc scale rc hawkular-metrics --replicas=1

This should fix the issue with the missing hawkular_metrics keyspace.

Thanks for your response, I'll give it a try.


Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.

