What did you do?
I am trying to use an Ansible k8s_exec module, which runs the equivalent of kubectl exec on a Pod from Ansible, through the Python Kubernetes library. This allows me to write a task like:
- name: Test a simple command.
  k8s_exec:
    namespace: '{{ meta.namespace }}'
    pod: '{{ tower_pod_name }}'
    command: date
Instead of installing kubectl on my operator image (added COPY --from=lachlanevenson/k8s-kubectl:v1.16.2 /usr/local/bin/kubectl /usr/local/bin/kubectl to my build/Dockerfile) and writing a task like:
- name: Test kubectl exec.
  command: >
    kubectl exec -n {{ meta.namespace }} {{ tower_pod_name }} date
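For context, the k8s_exec task above roughly corresponds to the following call in the Python Kubernetes client, which is the code path visible in the traceback (stream, then connect_get_namespaced_pod_exec, then websocket_call). This is a minimal sketch, not the module's actual source: the helper names build_exec_kwargs and exec_in_pod are hypothetical, and it assumes the kubernetes package is installed and a kubeconfig is available.

```python
import shlex


def build_exec_kwargs(command):
    """Build keyword arguments for a non-interactive pod exec call.

    Hypothetical helper for illustration; not part of any library.
    """
    return {
        "command": shlex.split(command),  # "date" becomes ["date"]
        "stdout": True,
        "stderr": True,
        "stdin": False,
        "tty": False,
    }


def exec_in_pod(namespace, pod, command):
    """Run a command in a pod over the API server's exec websocket."""
    # Imported lazily so build_exec_kwargs works without the package installed.
    from kubernetes import client, config
    from kubernetes.stream import stream

    config.load_kube_config()  # assumption: a kubeconfig is available
    api = client.CoreV1Api()
    # stream() replaces the normal HTTP request with a websocket call; the
    # "Handshake status 200 OK" error in this issue is raised on that path.
    return stream(
        api.connect_get_namespaced_pod_exec,
        pod,
        namespace,
        **build_exec_kwargs(command),
    )
```

Calling exec_in_pod("default", "my-pod", "date") against a reachable cluster would be the direct equivalent of the task; the namespace and pod name here are placeholders.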
What did you expect to see?
When I run the same task as above on my system Ansible against a Kubernetes cluster, or even inside of the operator Pod's ansible container using ansible-playbook to run it, it executes successfully and registers the result of the command that was executed.
What did you see instead? Under which circumstances?
When it is run via the operator/ansible-runner using the operator's proxy, it results in the following error:
kubernetes.client.rest.ApiException: (0)
Reason: Handshake status 200 OK
It should be getting a 101 response from the Kubernetes API websocket.
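The wording of the error suggests that the websocket client rejected the handshake because the response status was not 101 Switching Protocols. The sketch below imitates that kind of check using only the standard library; check_handshake is a hypothetical helper, not the client's real code.

```python
def check_handshake(status_line):
    """Validate the status line of a websocket handshake response.

    Hypothetical helper: mimics the check a websocket client performs
    before it starts exchanging frames.
    """
    # "HTTP/1.1 101 Switching Protocols" splits into version, code, reason.
    _version, code, *reason = status_line.strip().split(" ", 2)
    status = int(code)
    if status != 101:
        # Any other status means the connection was not upgraded.
        raise ConnectionError("Handshake status %d %s" % (status, " ".join(reason)))
    return status
```

A plain 200 OK reply, which is what a proxy that treats the request as ordinary HTTP might return, fails this check even though the status itself signals success.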
Full error message from the failed task:
TASK [tower : Test a simple command.] ******************************************
task path: /opt/ansible/roles/tower/tasks/main.yml:38
line 136, in <module>\n File \"/tmp/ansible_k8s_exec_payload_x8PPAd/__main__.py\", line 123, in main\n File \"/usr/lib/python2.7/site-packages/kubernetes/stream/stream.py\", line 32, in stream\n return func(*args, **kwargs)\n File \"/usr/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py\", line 835, in connect_get_namespaced_pod_exec\n (data) = self.connect_get_namespaced_pod_exec_with_http_info(name, namespace, **kwargs)\n File \"/usr/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py\", line 935, in connect_get_namespaced_pod_exec_with_http_info\n collection_formats=collection_formats)\n File \"/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py\", line 321, in call_api\n _return_http_data_only, collection_formats, _preload_content, _request_timeout)\n File \"/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py\", line 155, in __call_api\n _request_timeout=_request_timeout)\n File \"/usr/lib/python2.7/site-packages/kubernetes/stream/stream.py\", line 27, in _intercept_request_call\n return ws_client.websocket_call(config, *args, **kwargs)\n File \"/usr/lib/python2.7/site-packages/kubernetes/stream/ws_client.py\", line 255, in websocket_call\n raise ApiException(status=0, reason=str(e))\nkubernetes.client.rest.ApiException: (0)\nReason: Handshake status 200 OK\n\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
Environment
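operator-sdk version: v0.11.0
go version: N/A
Kubernetes version information: 1.16.2
Kubernetes cluster kind: Molecule
Are you writing your operator in ansible, helm, or go? ansible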
Possible Solution
N/A
Additional context
Relates to: https://github.com/geerlingguy/tower-operator/issues/5
Note that kubectl exec works fine inside the operator, even if I set the KUBECONFIG environment variable to the same value the Ansible tasks use:
- name: Test kubectl exec.
  command: >
    kubectl exec -n {{ meta.namespace }} {{ tower_pod_name }} date
  environment:
    KUBECONFIG: '{{ lookup("env", "KUBECONFIG") }}'
Hi @geerlingguy,
Before we analyze this any further, could you please check it with the latest version of the SDK? That is, could you upgrade your project to use SDK 0.12? Alternatively, let us know if you are able to reproduce this scenario using the Memcached sample.
Also, I noticed this path in the traceback:
File \"/usr/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py\
Note that the Python version was upgraded to 3. Could you please make sure that your project was upgraded properly and that you are using Python 3 in the environment where it runs?
@camilamacedo86 - Thanks for the suggestion! I'll definitely upgrade and test things (see linked issue above)—I hope to get to this soon.
@camilamacedo86 - I just reproduced the same error on v0.12.0, as well as the current latest version, v0.14.0. Steps to reproduce (requires Molecule, Ansible, and Minikube installed locally):
$ git clone https://github.com/geerlingguy/tower-operator.git
$ cd tower-operator
$ git checkout k8s_exec
$ minikube start --memory 6g --cpus 4
$ molecule test -s test-minikube
# while that's running, when you get to reconciliation, in another terminal, run:
$ kubectl logs -f -l name=tower-operator -c ansible
The operator playbook runs but keeps failing at the k8s_exec task with a message that ends like:
...
  File \"/usr/local/lib/python3.6/site-packages/kubernetes/stream/stream.py\", line 27, in _intercept_request_call
    return ws_client.websocket_call(config, *args, **kwargs)
  File \"/usr/local/lib/python3.6/site-packages/kubernetes/stream/ws_client.py\", line 255, in websocket_call
    raise ApiException(status=0, reason=str(e))
kubernetes.client.rest.ApiException: (0)
Reason: Handshake status 200 OK
I was speaking with @fabianvf on Slack, and he mentioned that the likely problem is that the Ansible Operator HTTP proxy, which is injected between the Kubernetes API and the operator itself, is not handling websocket requests correctly. That would explain the error with the 200 OK handshake: the connection should be upgraded and the response streamed back to Python, but it is not.
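To illustrate what handling websocket requests involves on the proxy side: a proxy generally has to recognize an upgrade request from its headers and tunnel the connection, rather than treat it as an ordinary request and buffer the response. The sketch below shows only the detection step, in Python for consistency with the client code (the actual proxy is written in Go). The names is_websocket_upgrade and choose_handling are hypothetical.

```python
def is_websocket_upgrade(headers):
    """Return True if the request headers ask to upgrade to a websocket.

    Hypothetical helper; header names and values are matched
    case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    # "Connection" may carry several comma-separated tokens.
    connection_tokens = [
        token.strip().lower()
        for token in lowered.get("connection", "").split(",")
    ]
    return (
        "upgrade" in connection_tokens
        and lowered.get("upgrade", "").lower() == "websocket"
    )


def choose_handling(headers):
    """Decide how a proxy should treat a request (illustrative only)."""
    # Upgrade requests need the raw connection passed through in both
    # directions; everything else can be proxied as plain HTTP.
    return "tunnel" if is_websocket_upgrade(headers) else "plain-http"
```

A proxy that skips this step and answers an upgrade request as plain HTTP would produce the kind of non-101 handshake response seen in this issue.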
I found this issue upstream in the client-go/rest package: https://github.com/kubernetes/client-go/issues/45. It seems that the issue went stale and was automatically closed.
There's a 2017 blog post linked there with a workaround, "Writing a Custom Kubectl Exec Command", and there's an HTTPWrappersForConfig function that is "exposed to allow more clients that need HTTP-like behavior but then must hijack the underlying connection (like WebSocket or HTTP2 clients)."
It would be nice if we could make proxy.go (https://github.com/operator-framework/operator-sdk/blob/master/pkg/ansible/proxy/proxy.go) work with websockets, and also add a test case that uses exec on a pod in https://github.com/operator-framework/operator-sdk/blob/master/pkg/ansible/proxy/proxy_test.go