With a fresh v3.9 install, my playbook fails like so:
fatal: [m01.example.com]: FAILED! => {"attempts": 4, "changed": false, "connection": "close", "content": "[+]ping ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-service-catalog-apiserver-informers ok\n[-]etcd failed: reason withheld\nhealthz check failed\n", "content_length": "180", "content_type": "text/plain; charset=utf-8", "date": "Fri, 20 Apr 2018 21:50:25 GMT", "msg": "Status code was 500 and not [200]: HTTP Error 500: Internal Server Error", "redirected": false, "status": 500, "url": "https://apiserver.kube-service-catalog.svc/healthz", "x_content_type_options": "nosniff"}
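The `content` field of that failure is the apiserver's per-check healthz report: `[+]` marks a passing check and `[-]` a failing one. Here only the `etcd` check fails. The following minimal sketch pulls the failing check names out of such a body. The helper names `parse_healthz` and `failing_checks` are hypothetical, and the line format is inferred from the output above rather than from a spec:

```python
import re

# Assumed line format, taken from the output above:
# "[+]name ok" or "[-]name failed: <reason>"
_CHECK_LINE = re.compile(r"^\[([+-])\](\S+) (.*)$")


def parse_healthz(body: str) -> dict:
    """Map each check name to (passed, detail); summary lines are ignored."""
    results = {}
    for line in body.splitlines():
        m = _CHECK_LINE.match(line)
        if m:
            sign, name, detail = m.groups()
            results[name] = (sign == "+", detail)
    return results


def failing_checks(body: str) -> list:
    """Names of the checks marked with [-]."""
    return [name for name, (ok, _) in parse_healthz(body).items() if not ok]


# Body copied from the "content" field of the failed task above.
content = (
    "[+]ping ok\n"
    "[+]poststarthook/generic-apiserver-start-informers ok\n"
    "[+]poststarthook/start-service-catalog-apiserver-informers ok\n"
    "[-]etcd failed: reason withheld\n"
    "healthz check failed\n"
)
print(failing_checks(content))
```

Run against the body above, this prints `['etcd']`. That points at the catalog apiserver's etcd connection rather than at the apiserver process itself.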
I am getting this issue as well. I found the following information:
etcd_servers: it looks like it's using the masters[0] host, but my etcd isn't co-located. etcd_servers variable in the inventory file. When I manually change the incorrect value used in the DaemonSet to a valid etcd URL, the health check passes.
EDIT: here is the line where I believe the error is coming in: https://github.com/openshift/openshift-ansible/blob/2fc279f23c7d0a7375b6cfd5083d1f810a87e003/roles/openshift_service_catalog/tasks/install.yml#L121
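To illustrate the difference described above, here is a rough sketch contrasting an etcd URL taken from the first master with one built from the hosts in the `etcd` inventory group. It assumes the inventory is represented as a plain dict of group name to host list, and it assumes the standard etcd client port 2379. Both functions are hypothetical illustrations. Neither is the actual openshift-ansible code, and neither is the `oo_etcd_to_config` filter added in #7915.

```python
ETCD_CLIENT_PORT = 2379  # standard etcd client port (assumed here)


def etcd_url_from_first_master(inventory: dict, port: int = ETCD_CLIENT_PORT) -> str:
    """Hypothetical: mimics the reported behavior of always using masters[0]."""
    # Wrong whenever etcd runs on hosts separate from the masters.
    return f"https://{inventory['masters'][0]}:{port}"


def etcd_urls_from_etcd_group(inventory: dict, port: int = ETCD_CLIENT_PORT) -> str:
    """Hypothetical: build a comma-separated URL list from the etcd group."""
    hosts = inventory.get("etcd", [])
    if not hosts:
        raise ValueError("inventory has no hosts in the 'etcd' group")
    return ",".join(f"https://{host}:{port}" for host in hosts)


# Example inventory with etcd on dedicated hosts (hostnames are made up).
inventory = {
    "masters": ["m01.example.com"],
    "etcd": ["e01.example.com", "e02.example.com"],
}
print(etcd_url_from_first_master(inventory))
print(etcd_urls_from_etcd_group(inventory))
```

With a non-co-located etcd, the first function points at `m01.example.com`, where no etcd is listening. The second lists the actual etcd members.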
@JayKayy, @ewolinetz - do you recall if this issue was seen in Origin or OCP or what specific version you were working with at the time? Or have you seen it since? We had several fixes in this area, the latest was fixed in openshift-ansible-3.9.21-1 which was tagged on April 13.
https://github.com/openshift/openshift-ansible/pull/7915 - Add oo_etcd_to_config to service_catalog
https://github.com/openshift/openshift-ansible/pull/7568 - Remove etcd_hosts and etcd_urls from openshift_facts
@michaelgugino can you review? Does this look like the issues found prior to #7887 (delivered to 3.9 as #7915)?
@jboyd01 I had this issue on OCP v3.9.14. Unfortunately, I have not performed another install to verify a fix. I will update when I do another, although none are currently planned for the coming weeks.
@JayKayy that's a great answer, thanks very much.
@jboyd01 I can't remember at this time, unfortunately. I did find that I was having some DNS related issues in my local environment, so I'm not sure if they were due to that or something else.
I will try to do another fresh 3.9 install and report back.
This was definitely an issue previously and should be fixed in the latest versions.
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.