Openshift-ansible: 3.9 Service Catalog health check failing

Created on 21 Apr 2018 · 10 Comments · Source: openshift/openshift-ansible

With a fresh v3.9 install I see my playbook fails like so:

fatal: [m01.example.com]: FAILED! => {"attempts": 4, "changed": false, "connection": "close", "content": "[+]ping ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-service-catalog-apiserver-informers ok\n[-]etcd failed: reason withheld\nhealthz check failed\n", "content_length": "180", "content_type": "text/plain; charset=utf-8", "date": "Fri, 20 Apr 2018 21:50:25 GMT", "msg": "Status code was 500 and not [200]: HTTP Error 500: Internal Server Error", "redirected": false, "status": 500, "url": "https://apiserver.kube-service-catalog.svc/healthz", "x_content_type_options": "nosniff"}
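For readers skimming the error above, the `content` field is the raw `/healthz` body, where each `[+]` line is a passing check and each `[-]` line is a failing one. A minimal sketch for pulling out the failing checks follows; the function name `parse_healthz` is hypothetical and is not part of openshift-ansible.

```python
def parse_healthz(content: str) -> dict:
    """Split a /healthz response body into passing and failing checks."""
    passed, failed = [], {}
    for line in content.splitlines():
        if line.startswith("[+]"):
            # "[+]ping ok" -> "ping"
            passed.append(line[3:].rsplit(" ", 1)[0])
        elif line.startswith("[-]"):
            # "[-]etcd failed: reason withheld" -> name "etcd", reason "reason withheld"
            name, _, reason = line[3:].partition(" failed: ")
            failed[name] = reason
        # Any other line (e.g. the "healthz check failed" summary) is ignored.
    return {"passed": passed, "failed": failed}


# The "content" value from the failure above.
content = (
    "[+]ping ok\n"
    "[+]poststarthook/generic-apiserver-start-informers ok\n"
    "[+]poststarthook/start-service-catalog-apiserver-informers ok\n"
    "[-]etcd failed: reason withheld\n"
    "healthz check failed\n"
)
result = parse_healthz(content)
print(result["failed"])
```

In this case the only failing check is `etcd`, which suggests the Service Catalog API server could not reach its etcd backend.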

lifecycle/rotten

ewolinetz

👍1

All 10 comments

I am getting this issue as well. Here is what I found:

The template used to create the DaemonSets seems to use the wrong value for etcd_servers in my case. It looks like it uses the masters[0] host, but my etcd is not co-located with the masters.
I have been unable to override the etcd_servers variable in the inventory file.
This issue did not occur when our etcd servers were co-located on the masters, because masters[0] was then a valid etcd URL.
When I manually change the incorrect value in the DaemonSet to a valid etcd URL, the health check passes.

EDIT: Here is the line where I believe the error comes from: https://github.com/openshift/openshift-ansible/blob/2fc279f23c7d0a7375b6cfd5083d1f810a87e003/roles/openshift_service_catalog/tasks/install.yml#L121
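To illustrate the difference described in this comment, here is a rough sketch of the two ways the etcd URL list could be derived. It is not the playbook's actual code: the function `etcd_servers`, the host `etcd01.example.com`/`etcd02.example.com` names, the `https` scheme, and the use of the default etcd client port 2379 are all assumptions made for the example.

```python
ETCD_CLIENT_PORT = 2379  # default etcd client port; assumed, not stated in the thread


def etcd_servers(etcd_hosts, masters, port=ETCD_CLIENT_PORT):
    """Build a comma-separated etcd URL list.

    Uses the dedicated etcd hosts when any are defined and only falls
    back to the first master when etcd is co-located (no separate hosts).
    """
    hosts = list(etcd_hosts) or list(masters[:1])
    return ",".join(f"https://{host}:{port}" for host in hosts)


# Hypothetical inventory with etcd on its own hosts.
masters = ["m01.example.com"]
etcd_hosts = ["etcd01.example.com", "etcd02.example.com"]

# Behaviour described in the comment: always point at masters[0].
reported = f"https://{masters[0]}:{ETCD_CLIENT_PORT}"

# Expected behaviour: point at the actual etcd hosts.
expected = etcd_servers(etcd_hosts, masters)

print("reported:", reported)
print("expected:", expected)
```

With co-located etcd (an empty `etcd_hosts` list in this sketch) both approaches give the same URL, which would match the observation that the problem only appears when etcd runs on separate hosts.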

JayKayy on 10 May 2018

@JayKayy, @ewolinetz - do you recall if this issue was seen in Origin or OCP or what specific version you were working with at the time? Or have you seen it since? We had several fixes in this area, the latest was fixed in openshift-ansible-3.9.21-1 which was tagged on April 13.

https://github.com/openshift/openshift-ansible/pull/7915 - Add oo_etcd_to_config to service_catalog
https://github.com/openshift/openshift-ansible/pull/7568 - Remove etcd_hosts and etcd_urls from openshift_facts

@michaelgugino can you review? Does this look like the issues found prior to #7887 (delivered to 3.9 as #7915)?

jboyd01 on 12 Jun 2018

@jboyd01 I had this issue on OCP v3.9.14. Unfortunately, I have not performed another install to verify a fix. I will update when I do, although none are currently planned in the coming weeks.

JayKayy on 12 Jun 2018

@JayKayy that's a great answer, thanks very much.

jboyd01 on 12 Jun 2018

@jboyd01 I can't remember at this time, unfortunately. I did find that I was having some DNS-related issues in my local environment, so I'm not sure whether the failure was due to that or something else.

I will try to do another fresh 3.9 install and report back.

ewolinetz on 12 Jun 2018

👍1

This was definitely an issue previously and should be fixed in the latest versions.

michaelgugino on 12 Jun 2018

👍2

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot on 25 May 2020

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot on 24 Jun 2020

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-bot on 24 Jul 2020

@openshift-bot: Closing this issue.
