how to implement a specific feature
What did you do?
When deleting a Custom Resource (CR) object, the relevant playbook is executed (as per my finalizer dict config in watches.yml). However, the CR is deleted from k8s no matter what happens in the playbook, i.e. even if it fails. This causes an inconsistent state in which the CR is no longer present in k8s, but the underlying/related resources which live outside of the cluster are not properly cleaned up. I was hoping that if the playbook fails, the finalizer would not be removed, which would keep the resource object in k8s. If that were the case, I could use k8s_status from the playbook to flag to the end user that the deletion and related cleanup actions didn't succeed; the user would then be able to clean things up manually, edit the resource and remove the finalizer for the delete process to complete. Could you please advise how to achieve this?
BTW> I'm using "manageStatus: False"
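A minimal sketch of the kind of setup described here, assuming a finalizer entry in the watches file. The group, kind, finalizer name and playbook paths are hypothetical placeholders, not taken from the actual project:

```yaml
# watches.yaml (sketch; group, kind, names and paths are hypothetical)
- version: v1alpha1
  group: example.com
  kind: MyResource
  playbook: /opt/ansible/my-playbook.yml
  # Status conditions are set from the playbook instead of by the operator.
  manageStatus: False
  finalizer:
    # The operator adds this finalizer to the CR and runs the playbook
    # below once the CR is marked for deletion.
    name: finalizer.myresource.example.com
    playbook: /opt/ansible/my-delete-playbook.yml
```

The expectation in this question is that the finalizer is removed only after the delete playbook completes successfully.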
What did you expect to see?
CR not deleted from k8s in case finalizer linked playbook fails
What did you see instead? Under which circumstances?
CR is deleted from k8s no matter that finalizer linked playbook failed to complete all its actions
Additional context
Testing this further, it looks like the behavior may depend on the type of problem during the Ansible playbook run. The behavior I described above occurred when there was an Ansible playbook syntax error; however, when there's no such error but just a task failure, it works as expected.
So you have established that when Ansible runs successfully but with an error (a task failure), it behaves as you expect. The fact that you had a syntax error is interesting. While a syntax error is the most likely culprit, I doubt it is the only way the run could fail.
There are two scenarios where ansible-runner (ansible-playbook under the covers) fails that should be considered:
What concerns me is that it appears that the run is being treated as successful when the actual command fails.
The only question that remains for me is what should the expected behavior be when the ansible command fails? My guess is that the ansible controller _should_:
Get notified of the error by the runner (i.e. the runner needs to be updated to bubble up the failure)
Put something on the status of the resource because it failed
Get requeued: return reconcileResult, err
In general one should thoroughly test the playbook so that fatal playbook errors don't happen. On the other hand, you never know: there could always be that one corner case, such as something in the environment changing or something you didn't think about, which would cause Ansible to error out. It would be a lot safer if the framework could catch any type of error and flag it to the user through the resource status, without silently deleting the resource.
Hi @tomsucho,
Could you please advise how to achieve this?
BTW> I'm using "manageStatus: False"
You are using manageStatus: false, am I right? If so, it means that the operator will ignore any event errors encountered. See:

Ref: https://github.com/operator-framework/operator-sdk/blob/master/doc/ansible/user-guide.md
However, if you face the same without this configuration, could you please provide a simple POC, like the Memcached example, so that we can see and reproduce your scenario? Or just the steps with code snippets of the setup that you are doing. Then let us know what you are facing and what you would like to see instead.
@camilamacedo86 I think @djzager summed it up nicely. It looks like we may need some extra handling of errors (other than just logging) when the ansible-runner process exits with a non-zero exit status.
/cc @fabianvf
Hi @joelanford and @djzager,
In the first comment on this issue, the author confirms that they are using "manageStatus: False". That will make the operator ignore any events coming from tasks that fail, and it will finalize as he describes. See:
The only question that remains for me is what should the expected behavior be when the ansible command fails? My guess is that the ansible controller should:
Get notified of the error by the runner (ie. runner needs to be updated to bubble up the failure)
Put something on the status of the resource because it failed
Get requeued return reconcileResult, err
Then, if the ansible command fails, I understand that it means that no event.Status was returned. Am I right? So it follows that it will be marked with "Failed to get ansible-runner stdout" and the reconcile will be retriggered.
So, in case I misunderstood something here, could you please provide the steps to reproduce the scenario where "manageStatus: False" is NOT used and the playbook fails without retriggering the reconcile?
@camilamacedo86 I do have "manageStatus: False". However, from what I can see in my testing, when something goes wrong and a task fails, the resource is not deleted and the reconciliation loop keeps going. I think this principle also applies when creating resources, no matter the manageStatus setting: basically, if something fails, the reconciliation keeps retrying (even if there is no reconcilePeriod set explicitly). This is great to have, as I can then simply catch the failure in my playbook and use the k8s_status module to update the resource Status Conditions accordingly (since they are not auto-managed), to hint to the user that the deletion didn't work. However, when I introduced an indentation issue that wasn't flagged by my editor, it resulted in an Ansible ERROR like the following:
`task path: /opt/ansible/my-playbook.yml:6
ERROR! 'include_tasks' is not a valid attribute for a Block
The error appears to be in '/opt/ansible/roles/my-role/tasks/main.yml': line 13, column 7, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something here
^ here
PLAY RECAP ***********************
localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0`
then the resource was deleted from k8s even though my resource removal playbook didn't even start, as per the above output.
I wanted to flag this because, while this type of error, which I caused myself, can easily be caught during the dev cycle, there may be other error situations out there that only happen at runtime. Not catching these errors and silently deleting resources without really executing the playbook will then lead to inconsistencies. I suspect the same situation would occur when creating resources if the same type of Ansible ERROR happened. I think it would be good to cover this even with manageStatus: false.
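The "catch the failure and update the Status Conditions" approach mentioned above could look roughly like the sketch below. It assumes the operator passes a `meta` variable with the CR name and namespace and that the `k8s_status` module accepts a `conditions` list; the API group, kind, condition type and task file name are hypothetical:

```yaml
# roles/my-role/tasks/delete.yml (sketch; file, kind and condition names are hypothetical)
- name: Clean up external resources and report failures on the CR
  block:
    - name: Remove resources that live outside the cluster
      include_tasks: cleanup-external.yml  # hypothetical task file
  rescue:
    - name: Record the failed cleanup in the CR status
      k8s_status:
        api_version: example.com/v1alpha1
        kind: MyResource
        name: "{{ meta.name }}"            # assumes the operator-provided meta variable
        namespace: "{{ meta.namespace }}"
        conditions:
          - type: CleanupFailed
            status: "True"
            reason: ExternalCleanupFailed
            message: "Cleanup failed; clean up manually, then remove the finalizer."

    - name: Fail the run so that the finalizer is not removed
      fail:
        msg: "External cleanup failed"
```

Note that `rescue` only handles task failures at runtime. A load-time error such as the one shown above stops Ansible before any task runs, so this pattern alone would not cover it.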
Hi @tomsucho,
Thanks a lot for the info. I will try to create a POC so that we can reproduce the scenario and check it as well.
c/c @joelanford @djzager @fabianvf
Hi @tomsucho,
I did a POC with a syntax error in the Ansible task, and I could check that:
"Failed to get ansible-runner stdout" is hit, as described in https://github.com/operator-framework/operator-sdk/issues/2546#issuecomment-585638370, with manageStatus: False in this scenario. You can see all the details below.
So, could you please post the content of your build/Dockerfile here?
PS: I am asking because in version 0.14.0 of the SDK we shipped a fix related to the reconcile not being retriggered in failure scenarios. See the CHANGELOG. You described that you are using operator-sdk version 0.15.1. However, was the project scaffolded with this version and/or upgraded to use the 0.15.1 ansible image?
Following is my POC.
Task file with a syntax error (roles/testcr/tasks/main.yml):
---
# tasks file for testcr
- msg: invalid sintax
  include_tasks: include_tasks:
Watches file (watches.yaml):
- version: v1alpha1
  group: glothriel.com
  kind: TestCR
  role: testcr
Then, if the ansible command fails I understand that it means that no event.Status was returned. Am I right? So, see follows that it will be marked with Failed to get ansible-runner stdout and retrigger the reconcile.
Following the logs (kubectl logs deployment.apps/testoperator -c operator -n default)
$ kubectl logs deployment.apps/testoperator -c operator -n default
{"level":"info","ts":1581591309.5070565,"logger":"cmd","msg":"Go Version: go1.13.3"}
{"level":"info","ts":1581591309.5071228,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1581591309.5071518,"logger":"cmd","msg":"Version of operator-sdk: v0.15.0+git"}
{"level":"info","ts":1581591309.507189,"logger":"cmd","msg":"Watching namespace.","Namespace":"default"}
{"level":"info","ts":1581591309.828779,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1581591309.8295484,"logger":"watches","msg":"Environment variable not set; using default value","envVar":"WORKER_TESTCR_GLOTHRIEL_COM","default":1}
{"level":"info","ts":1581591309.8299675,"logger":"watches","msg":"Environment variable not set; using default value","envVar":"ANSIBLE_VERBOSITY_TESTCR_GLOTHRIEL_COM","default":2}
{"level":"info","ts":1581591309.8303409,"logger":"ansible-controller","msg":"Watching resource","Options.Group":"glothriel.com","Options.Version":"v1alpha1","Options.Kind":"TestCR"}
{"level":"info","ts":1581591309.8307326,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1581591310.1490538,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1581591310.1549094,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1581591310.7842658,"logger":"metrics","msg":"Metrics Service object created","Service.Name":"testoperator-metrics","Service.Namespace":"default"}
{"level":"info","ts":1581591311.0901892,"logger":"cmd","msg":"Could not create ServiceMonitor object","Namespace":"default","error":"no ServiceMonitor registered with the API"}
{"level":"info","ts":1581591311.0908258,"logger":"cmd","msg":"Install prometheus-operator in your cluster to create ServiceMonitor objects","Namespace":"default","error":"no ServiceMonitor registered with the API"}
{"level":"info","ts":1581591311.0921226,"logger":"proxy","msg":"Starting to serve","Address":"127.0.0.1:8888"}
{"level":"info","ts":1581591311.092865,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"testcr-controller","source":"kind source: glothriel.com/v1alpha1, Kind=TestCR"}
{"level":"info","ts":1581591311.09327,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"testcr-controller"}
{"level":"info","ts":1581591311.094092,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1581591311.1940362,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"testcr-controller","worker count":1}
{"level":"error","ts":1581591313.8124533,"logger":"runner","msg":"ansible-playbook 2.9.4\r\n config file = /etc/ansible/ansible.cfg\r\n configured module search path = ['/usr/share/ansible/openshift']\r\n ansible python module location = /usr/local/lib/python3.6/site-packages/ansible\r\n executable location = /usr/local/bin/ansible-playbook\r\n python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]\r\nUsing /etc/ansible/ansible.cfg as config file\r\nERROR! Syntax Error while loading YAML.\r\n mapping values are not allowed here\r\n\r\nThe error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 5, column 18, but may\r\nbe elsewhere in the file depending on the exact syntax problem.\r\n\r\nThe offending line appears to be:\r\n\r\n- msg: invalid sintax\r\n include_tasks: include_tasks:\r\n ^ here\r\n","job":"6129484611666145821","name":"example-testcr","namespace":"default","error":"exit status 4","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/runner.(*runner).Run.func1\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/runner/runner.go:223"}
{"level":"error","ts":1581591313.8128512,"logger":"reconciler","msg":"ansible-playbook 2.9.4\r\n config file = /etc/ansible/ansible.cfg\r\n configured module search path = ['/usr/share/ansible/openshift']\r\n ansible python module location = /usr/local/lib/python3.6/site-packages/ansible\r\n executable location = /usr/local/bin/ansible-playbook\r\n python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]\r\nUsing /etc/ansible/ansible.cfg as config file\r\nERROR! Syntax Error while loading YAML.\r\n mapping values are not allowed here\r\n\r\nThe error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 5, column 18, but may\r\nbe elsewhere in the file depending on the exact syntax problem.\r\n\r\nThe offending line appears to be:\r\n\r\n- msg: invalid sintax\r\n include_tasks: include_tasks:\r\n ^ here\r\n","job":"6129484611666145821","name":"example-testcr","namespace":"default","error":"did not receive playbook_on_stats event","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/controller.(*AnsibleOperatorReconciler).Reconcile\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/controller/reconcile.go:197\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/[email 
protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88"}
{"level":"error","ts":1581591313.8130338,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"testcr-controller","request":"default/example-testcr","error":"did not receive playbook_on_stats event","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88"}
{"level":"error","ts":1581591317.3889155,"logger":"runner","msg":"ansible-playbook 2.9.4\r\n config file = /etc/ansible/ansible.cfg\r\n configured module search path = ['/usr/share/ansible/openshift']\r\n ansible python module location = /usr/local/lib/python3.6/site-packages/ansible\r\n executable location = /usr/local/bin/ansible-playbook\r\n python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]\r\nUsing /etc/ansible/ansible.cfg as config file\r\nERROR! Syntax Error while loading YAML.\r\n mapping values are not allowed here\r\n\r\nThe error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 5, column 18, but may\r\nbe elsewhere in the file depending on the exact syntax problem.\r\n\r\nThe offending line appears to be:\r\n\r\n- msg: invalid sintax\r\n include_tasks: include_tasks:\r\n ^ here\r\n","job":"4037200794235010051","name":"example-testcr","namespace":"default","error":"exit status 4","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/runner.(*runner).Run.func1\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/runner/runner.go:223"}
{"level":"error","ts":1581591317.3891814,"logger":"reconciler","msg":"ansible-playbook 2.9.4\r\n config file = /etc/ansible/ansible.cfg\r\n configured module search path = ['/usr/share/ansible/openshift']\r\n ansible python module location = /usr/local/lib/python3.6/site-packages/ansible\r\n executable location = /usr/local/bin/ansible-playbook\r\n python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]\r\nUsing /etc/ansible/ansible.cfg as config file\r\nERROR! Syntax Error while loading YAML.\r\n mapping values are not allowed here\r\n\r\nThe error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 5, column 18, but may\r\nbe elsewhere in the file depending on the exact syntax problem.\r\n\r\nThe offending line appears to be:\r\n\r\n- msg: invalid sintax\r\n include_tasks: include_tasks:\r\n ^ here\r\n","job":"4037200794235010051","name":"example-testcr","namespace":"default","error":"did not receive playbook_on_stats event","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/controller.(*AnsibleOperatorReconciler).Reconcile\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/controller/reconcile.go:197\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/[email 
protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88"}
{"level":"error","ts":1581591317.3893642,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"testcr-controller","request":"default/example-testcr","error":"did not receive playbook_on_stats event","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88"}
Note that the logs contain:
ERROR! Syntax Error while loading YAML
And
"error":"did not receive playbook_on_stats event"
You can also see that the reconcile keeps being retriggered indefinitely by checking the Ansible logs.
( kubectl logs deployment.apps/testoperator -c ansible -n default )
The offending line appears to be:
- msg: invalid sintax
include_tasks: include_tasks:
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/3510942875414458836//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! Syntax Error while loading YAML.
mapping values are not allowed here
The error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 5, column 18, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- msg: invalid sintax
include_tasks: include_tasks:
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/2933568871211445515//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! Syntax Error while loading YAML.
mapping values are not allowed here
The error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 5, column 18, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- msg: invalid sintax
include_tasks: include_tasks:
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/4324745483838182873//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! Syntax Error while loading YAML.
mapping values are not allowed here
The error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 5, column 18, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- msg: invalid sintax
include_tasks: include_tasks:
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/2610529275472644968//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! Syntax Error while loading YAML.
mapping values are not allowed here
The error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 5, column 18, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- msg: invalid sintax
include_tasks: include_tasks:
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/2703387474910584091//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! Syntax Error while loading YAML.
mapping values are not allowed here
The error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 5, column 18, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- msg: invalid sintax
include_tasks: include_tasks:
^ here
ManageStatus=False as well (watches.yaml):
---
- version: v1alpha1
  group: glothriel.com
  kind: TestCR
  role: testcr
  manageStatus: False
c/c @joelanford @djzager
@camilamacedo86 what did the last Ansible line in PLAY RECAP look like in your case? Was it the same as mine?
PLAY RECAP *********************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
As you can see, it looks like everything went fine, though of course it did not. When I introduced a syntax error in other places it worked as well, as if it were treated just as a task failure rather than a general Ansible error. To be on the same page, maybe you can try a block like I did:
- name: Do something
  block:
  include_tasks: other-tasks.yml
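For comparison, a form of this snippet that Ansible should accept lists the tasks under `block:`, with `include_tasks` as a task inside that list. This is only a sketch; the inner task name is made up, and other-tasks.yml is the file name from the snippet above:

```yaml
- name: Do something
  block:
    # Tasks belong in a list under block:, not as a key next to it.
    - name: Include the other tasks
      include_tasks: other-tasks.yml
```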
Regarding:
You described that you are using the operator-sdk version 0.15.1. However, was the project scaffolded with this version and/or upgrade to use the 0.15.1 ansible image?
The project was scaffolded with v0.13.0, I think, and then I just bumped the image tag to 0.15.1 on the operator. I guess that was not the right thing to do? I was going to ask about that in another question, but forgot in the end, being occupied with testing other stuff. How would I upgrade an existing project then?
Hi @tomsucho,
I ran the test with your snippet and it behaved the same as my POC, which means that the reconcile was retriggered. This all leads me to believe that the root cause here is the issue solved in 0.14.0 and the fact that you did not upgrade your project.

So, could you please add here the content of your build/Dockerfile?
Following the logs with your snippet code
{"level":"info","ts":1581601032.9016247,"logger":"cmd","msg":"Install prometheus-operator in your cluster to create ServiceMonitor objects","Namespace":"default","error":"no ServiceMonitor registered with the API"}
{"level":"info","ts":1581601032.903348,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"testcr-controller","source":"kind source: glothriel.com/v1alpha1, Kind=TestCR"}
{"level":"info","ts":1581601032.9043787,"logger":"proxy","msg":"Starting to serve","Address":"127.0.0.1:8888"}
{"level":"info","ts":1581601032.9045417,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1581601033.0064297,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"testcr-controller"}
{"level":"info","ts":1581601033.0066123,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"testcr-controller","worker count":1}
{"level":"error","ts":1581601035.620032,"logger":"runner","msg":"ansible-playbook 2.9.4\r\n config file = /etc/ansible/ansible.cfg\r\n configured module search path = ['/usr/share/ansible/openshift']\r\n ansible python module location = /usr/local/lib/python3.6/site-packages/ansible\r\n executable location = /usr/local/bin/ansible-playbook\r\n python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]\r\nUsing /etc/ansible/ansible.cfg as config file\r\nERROR! 'include_tasks' is not a valid attribute for a Block\r\n\r\nThe error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 4, column 3, but may\r\nbe elsewhere in the file depending on the exact syntax problem.\r\n\r\nThe offending line appears to be:\r\n\r\n\r\n- name: Do something\r\n ^ here\r\n","job":"6129484611666145821","name":"example-testcr","namespace":"default","error":"exit status 4","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/runner.(*runner).Run.func1\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/runner/runner.go:223"}
{"level":"error","ts":1581601035.6203415,"logger":"reconciler","msg":"ansible-playbook 2.9.4\r\n config file = /etc/ansible/ansible.cfg\r\n configured module search path = ['/usr/share/ansible/openshift']\r\n ansible python module location = /usr/local/lib/python3.6/site-packages/ansible\r\n executable location = /usr/local/bin/ansible-playbook\r\n python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]\r\nUsing /etc/ansible/ansible.cfg as config file\r\nERROR! 'include_tasks' is not a valid attribute for a Block\r\n\r\nThe error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 4, column 3, but may\r\nbe elsewhere in the file depending on the exact syntax problem.\r\n\r\nThe offending line appears to be:\r\n\r\n\r\n- name: Do something\r\n ^ here\r\n","job":"6129484611666145821","name":"example-testcr","namespace":"default","error":"did not receive playbook_on_stats event","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/controller.(*AnsibleOperatorReconciler).Reconcile\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/controller/reconcile.go:197\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/[email 
protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88"}
{"level":"error","ts":1581601035.620502,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"testcr-controller","request":"default/example-testcr","error":"did not receive playbook_on_stats event","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88"}
{"level":"error","ts":1581601039.4677804,"logger":"runner","msg":"ansible-playbook 2.9.4\r\n config file = /etc/ansible/ansible.cfg\r\n configured module search path = ['/usr/share/ansible/openshift']\r\n ansible python module location = /usr/local/lib/python3.6/site-packages/ansible\r\n executable location = /usr/local/bin/ansible-playbook\r\n python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]\r\nUsing /etc/ansible/ansible.cfg as config file\r\nERROR! 'include_tasks' is not a valid attribute for a Block\r\n\r\nThe error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 4, column 3, but may\r\nbe elsewhere in the file depending on the exact syntax problem.\r\n\r\nThe offending line appears to be:\r\n\r\n\r\n- name: Do something\r\n ^ here\r\n","job":"4037200794235010051","name":"example-testcr","namespace":"default","error":"exit status 4","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/runner.(*runner).Run.func1\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/runner/runner.go:223"}
{"level":"error","ts":1581601039.4680433,"logger":"reconciler","msg":"ansible-playbook 2.9.4\r\n config file = /etc/ansible/ansible.cfg\r\n configured module search path = ['/usr/share/ansible/openshift']\r\n ansible python module location = /usr/local/lib/python3.6/site-packages/ansible\r\n executable location = /usr/local/bin/ansible-playbook\r\n python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]\r\nUsing /etc/ansible/ansible.cfg as config file\r\nERROR! 'include_tasks' is not a valid attribute for a Block\r\n\r\nThe error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 4, column 3, but may\r\nbe elsewhere in the file depending on the exact syntax problem.\r\n\r\nThe offending line appears to be:\r\n\r\n\r\n- name: Do something\r\n ^ here\r\n","job":"4037200794235010051","name":"example-testcr","namespace":"default","error":"did not receive playbook_on_stats event","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/controller.(*AnsibleOperatorReconciler).Reconcile\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/controller/reconcile.go:197\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachiner
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/6334824724549167320//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'include_tasks' is not a valid attribute for a Block
The error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/605394647632969758//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'include_tasks' is not a valid attribute for a Block
The error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/1443635317331776148//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'include_tasks' is not a valid attribute for a Block
The error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/894385949183117216//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'include_tasks' is not a valid attribute for a Block
The error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/2775422040480279449//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'include_tasks' is not a valid attribute for a Block
The error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/4751997750760398084//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'include_tasks' is not a valid attribute for a Block
The error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/7504504064263669287//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'include_tasks' is not a valid attribute for a Block
The error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/1976235410884491574//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'include_tasks' is not a valid attribute for a Block
The error appears to be in '/opt/ansible/roles/testcr/tasks/main.yml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/3510942875414458836//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'include_tasks' is not a valid attribute for a Block
a) You will always need to update build/Dockerfile to use the image of the new release.
b) You may also need, or want, to update the scaffolded files, since they can change between releases. Note that these changes are sometimes breaking.
IMPORTANT: The steps for the 0.14 and 0.15 releases are still being added to the migration guide. In the meantime, you can check the open PRs for them. See: https://github.com/operator-framework/operator-sdk/pull/2516 and https://github.com/operator-framework/operator-sdk/pull/2519.
IMHO: The easiest and most accurate path is to scaffold a new Ansible project and then compare the files to see what changed and pick up the new changes.
Hi @tomsucho,
Please let us know whether you could upgrade your project to 0.15.1 and confirm that this behaviour no longer occurs.
Well, like I wrote, I did just update Dockerfile to use latest release image i.e. v0.15.1
$ cat build/Dockerfile
FROM quay.io/operator-framework/ansible-operator:v0.15.1
USER root
COPY watches.yaml ${HOME}/watches.yaml
COPY *-playbook.yml ${HOME}/
COPY roles/ ${HOME}/roles/
I also examined the file structure, and it looks like all that gets scaffolded is the watches file plus some other files for deployment, build and Ansible roles, none of which should have any impact on how the operator behaves at runtime with regard to the issue I observed. So I don't think the difference in behavior comes from the fact that I initially created the project with v0.13.0.
Also, scaffolding it once again would require redoing a lot of manual changes I had to make, like renaming playbooks, roles, etc. This kind of upgrade is sort of a no-go for me. I thought the only thing needed would be to bump the version of the ansible-operator image, where all the code changes and bug fixes really go, and that scaffolding would only bring file structure/naming changes or deployment files, which I could always apply manually if required. I did upgrade my SDK to v0.15.1 and ran a test project creation, and honestly I can't see any new file that could change runtime behavior; again, it's just a bunch of folders and YAML files:
15:58 $ operator-sdk new test --type ansible --kind Test --api-version test.io/v1alpha1
INFO[0000] Creating new Ansible operator 'test'.
INFO[0000] Created deploy/service_account.yaml
INFO[0000] Created deploy/role.yaml
INFO[0000] Created deploy/role_binding.yaml
INFO[0000] Created deploy/crds/test.io_v1alpha1_test_cr.yaml
INFO[0000] Created build/Dockerfile
INFO[0000] Created roles/test/README.md
INFO[0000] Created roles/test/meta/main.yml
INFO[0000] Created roles/test/files/.placeholder
INFO[0000] Created roles/test/templates/.placeholder
INFO[0000] Created roles/test/vars/main.yml
INFO[0000] Created molecule/test-local/playbook.yml
INFO[0000] Created roles/test/defaults/main.yml
INFO[0000] Created roles/test/tasks/main.yml
INFO[0000] Created molecule/default/molecule.yml
INFO[0000] Created build/test-framework/Dockerfile
INFO[0000] Created molecule/test-cluster/molecule.yml
INFO[0000] Created molecule/default/prepare.yml
INFO[0000] Created molecule/default/playbook.yml
INFO[0000] Created build/test-framework/ansible-test.sh
INFO[0000] Created molecule/default/asserts.yml
INFO[0000] Created molecule/test-cluster/playbook.yml
INFO[0000] Created roles/test/handlers/main.yml
INFO[0000] Created watches.yaml
INFO[0000] Created deploy/operator.yaml
INFO[0000] Created .travis.yml
INFO[0000] Created molecule/test-local/molecule.yml
INFO[0000] Created molecule/test-local/prepare.yml
INFO[0000] Generated CustomResourceDefinition manifests.
INFO[0000] Project creation complete.
@camilamacedo86 so maybe the issue then only happens when deleting the resource? I can't see finalizers in your watches files? The issue I originally logged here was about the delete case, not create. That's the only difference I can see, the other one is that I'm also using "reconcilePeriod: 5m" but that shouldn't be related I guess...
Hi @tomsucho,
Let's go step by step. To summarize what we could confirm:
Since your build/Dockerfile uses quay.io/operator-framework/ansible-operator:v0.15.1, we can confirm that you are using 0.15.1.
Then, I also created a POC to verify the behaviour with the finalizer. All worked as expected. However, in my tests, I could also work out and reproduce your scenario.
In your case, the task with the syntax issue is the first task, not the one called by the finalizer. So, the operator does not see any blocker/error that prevents removing the CR, and what it shows is the expected behaviour so far. The details follow.
POC with finalizer
1) Create a role that works (IMPORTANT: make sure the role runs successfully and has no Ansible errors or syntax errors)
---
# tasks file for testcr
- name: task A
  debug:
    msg: "Watch task!"
2) Create a playbook (playbook.yaml) that has a syntax error
---
- name: Do something
  block:
    include_tasks: other-tasks.yml
3) Create a watch that calls the playbook with the syntax error in the finalizer.
- version: v1alpha1
  group: glothriel.com
  kind: TestCR
  role: testcr
  finalizer:
    name: finalizer.glothriel.com
    playbook: playbook.yaml
4) Add the playbook line in the Dockerfile
COPY playbook.yaml ${HOME}/playbook.yaml
Result: The CR is not deleted!
$ kubectl logs deployment.apps/testoperator -c ansible -n default
Setting up watches. Beware: since -r was given, this may take a while!
Watches established.
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/6129484611666145821//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
PLAYBOOK: f1e97fe0cf2f4dea8b8ea2e577c3cbec *************************************
1 plays in /tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/project/f1e97fe0cf2f4dea8b8ea2e577c3cbec
PLAY [localhost] ***************************************************************
TASK [Gathering Facts] *********************************************************
ok: [localhost]
META: ran handlers
TASK [testcr : task A] *********************************************************
task path: /opt/ansible/roles/testcr/tasks/main.yml:4
ok: [localhost] => {
"msg": "Watch task!"
}
META: ran handlers
META: ran handlers
PLAY RECAP *********************************************************************
localhost : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
camilamacedo@Camilas-MacBook-Pro ~/go/src/testoperator (master) $ kubectl delete -f deploy/crds/glothriel.com_v1alpha1_testcr_cr.yaml
testcr.glothriel.com "example-testcr" deleted
Note that the playbook with the syntax error is called many times.
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/6334824724549167320//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'block' is not a valid attribute for a Play
The error appears to be in '/opt/ansible/playbook.yaml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/605394647632969758//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'block' is not a valid attribute for a Play
The error appears to be in '/opt/ansible/playbook.yaml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/1443635317331776148//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'block' is not a valid attribute for a Play
The error appears to be in '/opt/ansible/playbook.yaml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/894385949183117216//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'block' is not a valid attribute for a Play
The error appears to be in '/opt/ansible/playbook.yaml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/2775422040480279449//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'block' is not a valid attribute for a Play
The error appears to be in '/opt/ansible/playbook.yaml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/4751997750760398084//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'block' is not a valid attribute for a Play
The error appears to be in '/opt/ansible/playbook.yaml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/7504504064263669287//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'block' is not a valid attribute for a Play
The error appears to be in '/opt/ansible/playbook.yaml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
/tmp/ansible-operator/runner/glothriel.com/v1alpha1/TestCR/default/example-testcr/artifacts/1976235410884491574//stdout
ansible-playbook 2.9.4
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
ERROR! 'block' is not a valid attribute for a Play
The error appears to be in '/opt/ansible/playbook.yaml': line 4, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Do something
^ here
Also, I found a point that, from my point of view, we need to address: I expected to see the error in the CR status, and I do not. I will raise an issue so that we can check it. However, it has no relation at all to the question/issue raised here. See: https://github.com/operator-framework/operator-sdk/issues/2565
Does that make sense? Could you please let me know if we can close this one as sorted out?
The only other thing I can think of that differs between my setup and yours is that I call a role from the main playbook, and that role has a block which fails with an error. I can't think of why that would make any difference, though. Also, you did not show the PLAY RECAP line from Ansible. If it has failed=1, then it will work fine, as it does for me too. The only problem is when Ansible reports this:
PLAY RECAP *********************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
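To compare runs like the ones quoted in this thread, it can help to read the PLAY RECAP counters programmatically. The following is a minimal Python sketch, assuming the recap format shown above; the helper names (`parse_recap`, `recap_from_stdout`, `reported_failure`) are made up for illustration and are not part of ansible-runner or the operator. Note that a run which aborts on a syntax error prints no recap at all, which is a different situation from a recap with failed=0.

```python
import re

def parse_recap(line):
    """Split one PLAY RECAP host line into (host, counters)."""
    host, _, rest = line.partition(":")
    counters = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", rest)}
    return host.strip(), counters

def recap_from_stdout(stdout):
    """Return {host: counters} for the recap in stdout, or None if there is no recap."""
    lines = stdout.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("PLAY RECAP"):
            host_lines = [l for l in lines[index + 1:] if ":" in l and "=" in l]
            return dict(parse_recap(l) for l in host_lines)
    return None  # e.g. the run aborted before any play finished

def reported_failure(counters):
    """True when the recap itself reports failed or unreachable hosts."""
    return counters.get("failed", 0) > 0 or counters.get("unreachable", 0) > 0

clean_run = (
    "PLAY RECAP ****\n"
    "localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0\n"
)
syntax_error_run = "ERROR! 'block' is not a valid attribute for a Play\n"

print(recap_from_stdout(clean_run))
print(recap_from_stdout(syntax_error_run))
```

A caller would treat a missing recap (`None`) as its own case rather than as a successful run.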
Anyways, I don't want to bother you forever with this case, it might be something weird I implemented and so it will be hard for you to reproduce. If you say code review looks correct and proper handling is there for errors, we should be fine. If there's a problem it will surface again sooner or later, and maybe then it will be easier to debug (hopefully). So feel free to close this. Thanks for your help!
Hi @tomsucho,
Sorry if the above information was not clear enough. Note that I could work out your scenario and reproduce it, and I also verified the behaviour of the finalizer and watches on 0.15.1 when the Ansible task has a syntax error, as described in the comments above.
Note that the Ansible task which has the syntax error is NOT the task called by the finalizer.
The syntax error in your scenario came from the Ansible task which is triggered when you create the CR.
K8s finalizers are actions (a playbook/role) that must be performed before the CR can be deleted; only when this playbook/role finishes successfully is the CR deleted.
Here is a detailed explanation of your scenario.
Note that you have a watch such as:
---
- version: v1alpha1
  group: example.com
  kind: TestCR
  role: testcr  # First task that will be called; it has the syntax error
  finalizer:
    name: example.com
    playbook: playbook.yaml  # Task that is called before the CR is removed; it finishes successfully
When you apply the CR: the task with the syntax error is called. So, if you check the logs, you will see that the reconcile is re-triggered forever, since this error is never fixed.
When you delete the CR: the task that does NOT have a syntax error is called. Since it completes successfully, nothing blocks the deletion of the CR, so it is deleted by the K8s API, which is the expected behaviour.
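The apply/delete distinction described above can be sketched as a small selection function. This is only a simplified Python model of the behaviour explained in this comment, not the operator's actual code; `select_target` is a hypothetical name, and the fallback to the main role/playbook when the finalizer names neither is an assumption of this sketch.

```python
def select_target(watch, being_deleted):
    """Return (kind, name) of the role or playbook that would run for this reconcile."""
    finalizer = watch.get("finalizer")
    if being_deleted and finalizer:
        # On delete, the finalizer's own playbook or role takes over.
        for key in ("playbook", "role"):
            if key in finalizer:
                return key, finalizer[key]
    # On create/update (or, by assumption, when the finalizer names nothing),
    # the watch's main playbook or role runs.
    for key in ("playbook", "role"):
        if key in watch:
            return key, watch[key]
    raise ValueError("watch entry names neither a role nor a playbook")

# The watch entry from the scenario above, written as a plain dict.
watch = {
    "version": "v1alpha1",
    "group": "example.com",
    "kind": "TestCR",
    "role": "testcr",
    "finalizer": {"name": "example.com", "playbook": "playbook.yaml"},
}

print(select_target(watch, being_deleted=False))
print(select_target(watch, being_deleted=True))
```

With this watch, a syntax error in the `testcr` role only affects the create/update path, while deletion depends solely on `playbook.yaml`.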
With that, I am closing this one as sorted out. However, please feel free to let us know if you think there is any reason to keep it open.
I'm confused, so were you able to reproduce the issue or not?
With regards to:
When you apply the CR: The task with syntax error is called. So, if you check the logs you will see that it will re-trigger the reconcile forever since this error will not be fixed.
When you delete the CR: The task that has NOT a syntax error will be called. Since it will be accomplished with success it is NOT a blocker to delete the CR and then, it is deleted by the K8S API which is the expected behaviour.
so you should have tested the other way around - as per the ticket title and my explanations, the issue was when deleting the CR, so the playbook that is called due to finalizer should have a syntax error. At least that was the case I opened this ticket for. I only suspected that when creating there might be same issue, but I never really tested if that's the case. From what you verified it looks like there's no issue in case syntax error during creation of the CR.
However could you please confirm you were able to reproduce the issue when doing CR delete and having syntax error in the finalizer linked playbook/role?
Hi @tomsucho,
I'm confused, so were you able to reproduce the issue or not?
See all the steps and logs in the comment https://github.com/operator-framework/operator-sdk/issues/2546#issuecomment-585858355, which shows that the CR will NOT be deleted if the Ansible playbook/role called by the finalizer has a syntax error.
finalizer:
  name: example.com
  playbook: playbook.yaml  # Task that is called before the CR is removed; it finishes successfully
Then, note that all the logs and the description you provided match exactly the scenario described in https://github.com/operator-framework/operator-sdk/issues/2546#issuecomment-586321185, which is NOT an error but the expected behaviour; I reproduced it and saw exactly the same logs.
so you should have tested the other way around - as per the ticket title and my explanations, the issue was when deleting the CR, so the playbook that is called due to finalizer should have a syntax error. At least that was the case I opened this ticket for. I only suspected that when creating there might be same issue, but I never really tested if that's the case. From what you verified it looks like there's no issue in case syntax error during creation of the CR.
That is up to your business rules. However, the important thing to keep in mind is that the purpose of the finalizer is to perform operations before the CR is deleted. So, the CR will always be deleted when these actions complete successfully. I mean, this has no relation to the result of the playbook/role which is called when the CR is created.
Please let me know whether the information provided meets your expectations and whether we can close this one.
Reopening this as I'm still very confused about how this should work. I have now got into a situation where the CR is not deleted even though my finalizer-related playbook finished like this:
PLAY RECAP *********************************************************************
localhost : ok=18 changed=2 unreachable=0 failed=0 skipped=6 rescued=1 ignored=0
Could you please describe what controls when the CR will be deleted from k8s with regard to the playbook execution results? Up until this point I thought that any Ansible playbook failure (or syntax error) should prevent CR deletion and that the retry loop should kick in. Upon a successful playbook run it should be removed automatically (note I use manageStatus: false).
Hi @tomsucho,
Can you please raise a new issue with your new scenario and describe the steps to reproduce it, so that we can help you better? If possible, could you please let us know what change we should make to the Memcached sample to see the same behaviour? Also, could you please provide your watch implementation so that we have a better understanding?
I think it could happen because of one block task being "rescued"?
Now when I reworked it to not use "rescue" at all it was deleted. Last line of playbook log:
```
PLAY RECAP ***********************
localhost : ok=18 changed=2 unreachable=0 failed=0 skipped=7 rescued=0 ignored=0
```
Hi @tomsucho,
If the playbook and/or role called in:
finalizer:
  name: example.com
  playbook: playbook.yaml  # Task that is called before the CR is removed; it needs to finish successfully to allow the CR to be deleted
finishes successfully, then the CR will be deleted; otherwise it will NOT.
So, if you are facing a scenario that you think does not follow this behaviour, and/or you need help to understand it, could you please raise a new issue with detailed information so that we can check your scenario, reproduce it, and then properly help you?
Also, in this new issue, could you please let me know whether the task called in the finalizer does what is described in the Ansible docs?
PS: Please ping me in the new issue and I will help you with it.
Most helpful comment
So you have established that when the Ansible runs successfully but with an error, it behaves as you expect. But the fact that you had a syntax error is interesting. While a syntax error is the most likely culprit, I doubt it is the only way that it could fail.
There are two scenarios where ansible-runner (ansible-playbook under the covers) fails that should be considered:
What concerns me is that it appears that the run is being treated as successful when the actual command fails.
The only question that remains for me is what should the expected behavior be when the ansible command fails? My guess is that the ansible controller _should_:
return reconcileResult, err
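The guess above, that a failed ansible command should surface as a reconcile error, can be modelled roughly as follows. This is a Python illustration under stated assumptions, not the operator's Go code: `RunOutcome`, its fields, and both functions are hypothetical names, and only the two error strings ("did not receive playbook_on_stats event" and "exit status 4") are taken from the logs in this thread.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RunOutcome:
    exit_code: int        # exit status of the ansible command
    stats_received: bool  # whether a playbook_on_stats event arrived
    failed_tasks: int     # failures reported by the run itself

def reconcile_error(outcome: RunOutcome) -> Optional[str]:
    """Return an error message if the run must not count as successful, else None."""
    if not outcome.stats_received:
        # The command died before producing stats, e.g. on a syntax error.
        return "did not receive playbook_on_stats event"
    if outcome.exit_code != 0:
        return f"exit status {outcome.exit_code}"
    if outcome.failed_tasks > 0:
        return f"{outcome.failed_tasks} task(s) failed"
    return None

def should_remove_finalizer(outcome: RunOutcome) -> bool:
    """Only a run without any error lets the finalizer be removed."""
    return reconcile_error(outcome) is None

syntax_error = RunOutcome(exit_code=4, stats_received=False, failed_tasks=0)
clean_run = RunOutcome(exit_code=0, stats_received=True, failed_tasks=0)

print(reconcile_error(syntax_error), should_remove_finalizer(syntax_error))
print(reconcile_error(clean_run), should_remove_finalizer(clean_run))
```

The design point is that a missing stats event and a non-zero exit status are checked before the task counters, so a run that never started cannot be mistaken for a run with zero failures.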