What did you do?
I created a hello-world style ansible operator, which does nothing but failing the play with ansible's fail task.
I deployed the Operator, CRD and Custom Resource.
What did you expect to see?
I expect to see the operator firing consecutive reconcile loops with exponential back-off.
What did you see instead? Under which circumstances?
The operator failed once and did no further reconciles.
Environment
v0.13.0
Uses docker image from v0.13.0 operator SDK, did not test on the host
Client Version: version.Info{Major:"", Minor:"", GitVersion:"v0.0.0-master+70132b0f13", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"", BuildDate:"1970-01-01T00:00:00Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
Minikube
Ansible
Additional context
This was also discussed in #2198
cc @fabianvf @camilamacedo86
here is the repository containing the PoC i used: https://github.com/glothriel/testoperator
I did:
kubectl apply -f deploy/crds/glothriel.com_testcrs_crd.yaml
kubectl apply -f deploy/
...waited till operator is running...
kubectl apply -f deploy/crds/glothriel.com_v1alpha1_testcr_cr.yaml
kubectl logs testoperator-79cb78ddfd-jpkc6 ansible -f
Hi @glothriel,
If you check here in the sample you will see that has few commands and checks that you can do to troubleshooting the operator. Could you please provide the following info?
kubectl get all -n <namesoace> kubectl logs deployment.apps/<projectname>-operator -c operator -n <namespace> and kubectl logsdeployment.apps/<projectname>-operator -c ansible -n <namespace>)PS.: You can check all steps to getting started with Ansible operators and do a small test here as check an example of a make target that will apply all files required for the ansible sample here. I will give a look in your POC as well as soon as possible.
Just to register here important info provided by fabian regards this scenario:

Following the logs from the POC provided by the user:
ANSIBLE
$ kubectl logs deployment.apps/testoperator -c ansible -n memcached
Setting up watches. Beware: since -r was given, this may take a while!
Watches established.
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/memcached/example-testcr/artifacts/6129484611666145821//stdout
ansible-playbook 2.9.2
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
PLAYBOOK: 26ce01ae023445d0939150b4e5a2c8c7 *************************************
1 plays in /tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/memcached/example-testcr/project/26ce01ae023445d0939150b4e5a2c8c7
PLAY [localhost] ***************************************************************
TASK [Gathering Facts] *********************************************************
ok: [localhost]
META: ran handlers
TASK [testcr : fail] ***********************************************************
task path: /opt/ansible/roles/testcr/tasks/main.yml:4
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Failing the task!"}
PLAY RECAP *********************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
OPERATOR
$ kubectl logs deployment.apps/testoperator -c operator -n memcached
{"level":"info","ts":1578511996.250356,"logger":"cmd","msg":"Go Version: go1.13.5"}
{"level":"info","ts":1578511996.2504823,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1578511996.2505329,"logger":"cmd","msg":"Version of operator-sdk: v0.13.0"}
{"level":"info","ts":1578511996.2506163,"logger":"cmd","msg":"Watching namespace.","Namespace":"memcached"}
{"level":"info","ts":1578511996.6046686,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1578511996.6052597,"logger":"watches","msg":"Environment variable not set; using default value","envVar":"WORKER_TESTCR_GLOTHRIEL_COM","default":1}
{"level":"info","ts":1578511996.6054509,"logger":"watches","msg":"Environment variable not set; using default value","envVar":"ANSIBLE_VERBOSITY_TESTCR_GLOTHRIEL_COM","default":2}
{"level":"info","ts":1578511996.6057112,"logger":"ansible-controller","msg":"Watching resource","Options.Group":"glothriel.com","Options.Version":"v1alpha1","Options.Kind":"TestCR"}
{"level":"info","ts":1578511996.6059678,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1578511996.980931,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1578511996.985406,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1578511997.7137773,"logger":"metrics","msg":"Metrics Service object created","Service.Name":"testoperator-metrics","Service.Namespace":"memcached"}
{"level":"info","ts":1578511997.7169285,"logger":"proxy","msg":"Starting to serve","Address":"127.0.0.1:8888"}
{"level":"info","ts":1578511997.7170482,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1578511997.7202744,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"testcr-controller","source":"kind source: glothriel.com/v1alpha1, Kind=TestCR"}
{"level":"info","ts":1578511997.72134,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"testcr-controller"}
{"level":"info","ts":1578511997.827706,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"testcr-controller","worker count":1}
{"level":"info","ts":1578512000.9034858,"logger":"logging_event_handler","msg":"[playbook task]","name":"example-testcr","namespace":"memcached","gvk":"glothriel.com/v1alpha1, Kind=TestCR","event_type":"playbook_on_task_start","job":"6129484611666145821","EventData.Name":"Gathering Facts"}
{"level":"info","ts":1578512003.2753904,"logger":"logging_event_handler","msg":"[playbook task]","name":"example-testcr","namespace":"memcached","gvk":"glothriel.com/v1alpha1, Kind=TestCR","event_type":"playbook_on_task_start","job":"6129484611666145821","EventData.Name":"testcr : fail"}
{"level":"error","ts":1578512003.2929928,"logger":"logging_event_handler","msg":"","name":"example-testcr","namespace":"memcached","gvk":"glothriel.com/v1alpha1, Kind=TestCR","event_type":"runner_on_failed","job":"6129484611666145821","EventData.Task":"fail","EventData.TaskArgs":"","EventData.FailedTaskPath":"/opt/ansible/roles/testcr/tasks/main.yml:4","error":"[playbook task failed]","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/events.loggingEventHandler.Handle\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/events/log_events.go:84"}
{"level":"error","ts":1578512003.4869735,"logger":"runner","msg":"ansible-playbook 2.9.2\r\n config file = /etc/ansible/ansible.cfg\r\n configured module search path = ['/usr/share/ansible/openshift']\r\n ansible python module location = /usr/local/lib/python3.6/site-packages/ansible\r\n executable location = /usr/local/bin/ansible-playbook\r\n python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]\r\nUsing /etc/ansible/ansible.cfg as config file\r\n\r\nPLAYBOOK: 26ce01ae023445d0939150b4e5a2c8c7 *************************************\n1 plays in /tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/memcached/example-testcr/project/26ce01ae023445d0939150b4e5a2c8c7\n\r\nPLAY [localhost] ***************************************************************\n\r\nTASK [Gathering Facts] *********************************************************\nok: [localhost]\nMETA: ran handlers\n\r\nTASK [testcr : fail] ***********************************************************\r\ntask path: /opt/ansible/roles/testcr/tasks/main.yml:4\nfatal: [localhost]: FAILED! => {\"changed\": false, \"msg\": \"Failing the task!\"}\n\r\nPLAY RECAP *********************************************************************\r\nlocalhost : ok=1 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0 \r\n\n","job":"6129484611666145821","name":"example-testcr","namespace":"memcached","error":"exit status 2","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/runner.(*runner).Run.func1\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/runner/runner.go:217"}
See that the imp is in : (roles/testcr/tasks/main.yml)
---
# tasks file for testcr
- fail:
msg: Failing the task!
See: ,"event_type":"runner_on_failed","job":"6129484611666145821","EventData.Task":"fail","
@fabianvf I also understand that is a bug. Am I right? Since the EventData is returning fail it should not it retrigger the reconcile?
@camilamacedo86 yeah that's a bug, it should definitely requeue
This is also a convenient feature during development :), would it be possible to make this behavior optional?
Then I think we cannot do both ( fix and rfe ) in the same PR. Could you please create a new issue with your request for we are able to discuss this matter there?
Thanks @camilamacedo86, great work. Is this fix already released in some version? If not, when will it be?
Most helpful comment
@camilamacedo86 yeah that's a bug, it should definitely requeue