Are you asking about community best practices, how to implement a specific feature, or about general context and help around the operator-sdk?
Help around the operator-sdk
What did you do?
operator-sdk new visitors-frontend-operator --type=ansible --api-version=example.com/v1 --kind=VisitorsApp
operator-sdk run --local
What did you expect to see?
The resources defined by my CR to be created when I deploy the CR file, as happens when running the operator locally.
What did you see instead? Under which circumstances?
When I deploy the CR, nothing happens. These are the (verbose) logs of the ansible container of the operator pod.
Setting up watches. Beware: since -r was given, this may take a while!
Watches established.
/tmp/ansible-operator/runner/example.com/v1/VisitorsApp/myprj/visitorsapp/artifacts/605394647632969758//stdout
ansible-playbook 2.9.5
config file = /etc/ansible/ansible.cfg
configured module search path = ['/usr/share/ansible/openshift']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
Using /etc/ansible/ansible.cfg as config file
setting up inventory plugins
host_list declined parsing /tmp/ansible-operator/runner/example.com/v1/VisitorsApp/myprj/visitorsapp/inventory/hosts as it did not pass its verify_file() method
script declined parsing /tmp/ansible-operator/runner/example.com/v1/VisitorsApp/myprj/visitorsapp/inventory/hosts as it did not pass its verify_file() method
auto declined parsing /tmp/ansible-operator/runner/example.com/v1/VisitorsApp/myprj/visitorsapp/inventory/hosts as it did not pass its verify_file() method
Set default localhost to localhost
Parsed /tmp/ansible-operator/runner/example.com/v1/VisitorsApp/myprj/visitorsapp/inventory/hosts inventory source with ini plugin
Loading callback plugin awx_display of type stdout, v2.0 from /usr/local/lib/python3.6/site-packages/ansible_runner/callbacks/awx_display.py
PLAYBOOK: c8d4c1032cd84d38b06aef83dbe58ddd *************************************
Positional arguments: /tmp/ansible-operator/runner/example.com/v1/VisitorsApp/myprj/visitorsapp/project/c8d4c1032cd84d38b06aef83dbe58ddd
verbosity: 4
connection: smart
timeout: 10
become_method: sudo
tags: ('all',)
inventory: ('/tmp/ansible-operator/runner/example.com/v1/VisitorsApp/myprj/visitorsapp/inventory',)
extra_vars: ('@/tmp/ansible-operator/runner/example.com/v1/VisitorsApp/myprj/visitorsapp/env/extravars',)
forks: 5
PLAYBOOK: c8d4c1032cd84d38b06aef83dbe58ddd *************************************
Positional arguments: /tmp/ansible-operator/runner/example.com/v1/VisitorsApp/myprj/visitorsapp/project/c8d4c1032cd84d38b06aef83dbe58ddd
verbosity: 4
connection: smart
timeout: 10
become_method: sudo
tags: ('all',)
inventory: ('/tmp/ansible-operator/runner/example.com/v1/VisitorsApp/myprj/visitorsapp/inventory',)
extra_vars: ('@/tmp/ansible-operator/runner/example.com/v1/VisitorsApp/myprj/visitorsapp/env/extravars',)
forks: 5
1 plays in /tmp/ansible-operator/runner/example.com/v1/VisitorsApp/myprj/visitorsapp/project/c8d4c1032cd84d38b06aef83dbe58ddd
[WARNING]: Found variable using reserved name: name
PLAY [localhost] ***************************************************************
TASK [Gathering Facts] *********************************************************
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: 1000540000
<localhost> EXEC /bin/sh -c 'echo ~1000540000 && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /.ansible/tmp/ansible-tmp-1584107823.0948596-231277174841977 `" && echo ansible-tmp-1584107823.0948596-231277174841977="` echo /.ansible/tmp/ansible-tmp-1584107823.0948596-231277174841977 `" ) && sleep 0'
fatal: [localhost]: UNREACHABLE! => {
"changed": false,
"msg": "Authentication or permission failure. In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \"/tmp\". Failed command was: ( umask 77 && mkdir -p \"` echo /.ansible/tmp/ansible-tmp-1584107823.0948596-231277174841977 `\" && echo ansible-tmp-1584107823.0948596-231277174841977=\"` echo /.ansible/tmp/ansible-tmp-1584107823.0948596-231277174841977 `\" ), exited with result 1, stderr output: mkdir: cannot create directory ‘/.ansible’: Permission denied\n",
"unreachable": true
}
I can confirm, by opening a remote shell inside the container, that the current user (that is not root) cannot create the /.ansible directory.
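The check described above can be repeated from a remote shell in the container with a few lines of shell. This is only a minimal sketch: the helper name `can_create_dir` is hypothetical, and it assumes nothing beyond `id`, `mkdir` and `rmdir` being available in the image.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current user can create a
# directory directly under the given parent. The probe directory is
# removed again, so nothing is left behind.
can_create_dir() {
    parent="$1"
    probe="${parent%/}/.probe-$$"
    if mkdir "$probe" 2>/dev/null; then
        rmdir "$probe"
        echo "writable"
    else
        echo "not-writable"
    fi
}

# Show who the container is really running as and where HOME points.
echo "uid:  $(id -u)"
echo "HOME: ${HOME:-<unset>}"

# Ansible wants to create .ansible under HOME; compare that with /tmp.
echo "HOME is $(can_create_dir "${HOME:-/}")"
echo "/tmp is $(can_create_dir /tmp)"
```

In the situation reported here, one would expect the uid to be the arbitrary one from the logs, HOME to be `/`, and only `/tmp` to be reported as writable.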
watches.yaml file
---
- version: v1
group: example.com
kind: VisitorsApp
role: /opt/ansible/roles/visitorsapp
build/Dockerfile
FROM quay.io/operator-framework/ansible-operator:v0.15.2
COPY watches.yaml ${HOME}/watches.yaml
COPY roles/ ${HOME}/roles/
Environment
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.2", GitCommit:"94e669a", GitTreeState:"clean", BuildDate:"2020-02-03T23:11:39Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
crc version:
crc version: 1.7.0+fa7e558
OpenShift version: 4.3.1 (embedded in binary)
oc version:
Client Version: v4.4.0
Server Version: 4.3.1
Kubernetes Version: v1.16.2
Additional context
Linux Fedora 31 amd64
Hi @daquinoaldo,
Did you build the project with operator-sdk build <image:tag>? If not, can you try it and see whether the error changes? If the result changes, please let us know, and add the logs here if you run into a different issue.
Hi @camilamacedo86,
thank you for responding so quickly.
I built the image with operator-sdk build visitors-frontend-operator. Is the tag important?
Also, docker is installed via dnf, latest version of docker-ce.
I also have a visitors-backend-operator, implemented in Go, that is built and deployed in the same way, and it works.
Hi @daquinoaldo,
Thank you for letting us know. I am unable to reproduce it with the memcached ansible sample and minikube. Could you share the repo link of your project so that we can try to run it?
@camilamacedo86
Sure, I've uploaded the code to my repository visitors-splitted.
You can find the working Go implementation in visitors-backend-operator and the Ansible implementation that caused me the problem in visitors-frontend-operator.
The namespace on OpenShift is myprj.
All the commands I used to run and deploy it are in the README.
I'm hitting this same issue on an unrelated in-development operator. I've just generated it and begun development using operator-sdk v0.16.0. I'm also using crc 1.7.0.
I'm really not familiar with the insides of operator-sdk or the ansible-operator image, but wanted to share what I see.
It looks like something is expecting $HOME to be /opt/ansible, but (and you can see this in the original logs, too) this is getting run as uid 1000540000, which has a $HOME of '/'; not as ansible-operator (uid 1001), which has a $HOME of '/opt/ansible'. The /opt/ansible/.ansible directory gets chmod'd intentionally during the image build:
FROM quay.io/operator-framework/ansible-operator:v0.16.0
COPY requirements.yml ${HOME}/requirements.yml
RUN ansible-galaxy collection install -r ${HOME}/requirements.yml \
&& chmod -R ug+rwx ${HOME}/.ansible
...
Not sure if this helps or is simply a long +1, but thank you for looking at this!
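To make the reasoning above concrete: Ansible's default temp directory is `~/.ansible/tmp`, so the path it tries to create follows from whatever home directory the running uid resolves to. The sketch below only illustrates that derivation; `default_ansible_tmp` is a hypothetical helper, not Ansible's own code.

```shell
#!/bin/sh
# Hypothetical helper: given a home directory, print the path that the
# default "~/.ansible/tmp" setting expands to.
default_ansible_tmp() {
    home="${1%/}"            # drop a trailing slash so "/" does not yield "//"
    echo "${home}/.ansible/tmp"
}

# ansible-operator user (uid 1001), HOME=/opt/ansible:
default_ansible_tmp /opt/ansible    # prints /opt/ansible/.ansible/tmp

# Arbitrary OpenShift uid (e.g. 1000540000), HOME=/:
default_ansible_tmp /               # prints /.ansible/tmp
```

The second path matches the `mkdir` that fails in the original logs, while the first is the directory the image build makes group-writable.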
For whatever it is worth, and to temporarily unblock anyone else that runs across this issue while a proper fix is unavailable, I am able to work around the above reported error by simply adding "ln -s ${HOME}/.ansible /.ansible" to the end of the RUN line in my build/Dockerfile:
RUN ansible-galaxy collection install -r ${HOME}/requirements.yml \
&& chmod -R ug+rwx ${HOME}/.ansible \
&& ln -s ${HOME}/.ansible /.ansible
For what it's worth I saw this on a 4.3.1 cluster I was diagnosing for someone, but not on a 4.3.8 cluster I tried to reproduce it on. I also didn't see it on two other testing clusters running 4.3.8. I don't know why cluster version would come into play, but thought I would mention it.
Confirmed that this behavior disappears on crc 1.8.0 (OpenShift 4.3.8).
I wonder if this could be related to https://github.com/cri-o/cri-o/issues/3186
/cc @fabianvf
I am facing exactly the same issue at the same step. I will probably look for an OpenShift 4.3.8 cluster. @jmontleon, can you tell me where a 4.3.8 cluster is available?
/remove-lifecycle stale
I landed here while debugging a similar problem when experimenting with the new release of operator-sdk, 1.0.0.
The only way I could find to get past the problem was to do the following:
Create ansible.cfg, based on the version already in the container, but add local_tmp and remote_tmp declarations:
[defaults]
roles_path = /opt/ansible/roles
library = /usr/share/ansible/openshift
local_tmp = /tmp/.ansible
remote_tmp = /tmp/.ansible
Update Dockerfile to put this new config file in the image:
RUN ansible-galaxy collection install -r ${HOME}/requirements.yml \
&& chmod -R ug+rwx ${HOME}/.ansible
# Workaround for https://github.com/operator-framework/operator-sdk/issues/2648
# Override default tmp directory by setting local_tmp & remote_tmp = /tmp/.ansible
# instead of /.ansible/tmp, which does not exist
COPY ansible.cfg /etc/ansible/ansible.cfg
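To check that the file copied into the image really carries the two overrides, one could read them back from inside the container. The following is a rough sketch, not part of the workaround itself: `cfg_get` is a hypothetical helper that only understands simple `key = value` lines in the `[defaults]` section.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from the [defaults] section
# of an INI-style ansible.cfg. Prints nothing if the key is absent.
cfg_get() {
    file="$1"
    key="$2"
    awk -F '=' -v key="$key" '
        # Track whether the current section is [defaults].
        /^\[/ { in_defaults = ($0 == "[defaults]"); next }
        in_defaults {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)       # trim the key
            if (k == key) {
                v = $0
                sub(/^[^=]*=/, "", v)            # keep everything after the first "="
                gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim the value
                print v
            }
        }
    ' "$file"
}
```

Inside the container, `cfg_get /etc/ansible/ansible.cfg local_tmp` and `cfg_get /etc/ansible/ansible.cfg remote_tmp` should both print `/tmp/.ansible` if the COPY took effect. `ansible-config dump --only-changed` is the more thorough check where it is available.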
I tried to do the same fix from this comment, but docker build fails when trying to create /.ansible symlink.
I hit a similar problem in my testing inside a GitHub Actions CI environment and found the issue was the permissions on the files that were copied into the image during the image build. I had to change ownership on the files before running the build to work around this issue.
@durera
It doesn't work for me.
I created an Ansible conf file and I set the tmp dir.
[defaults]
local_tmp = /tmp/.ansible
remote_tmp = /tmp/.ansible
host_key_checking = false
But it seems that on OpenShift the random user assigned to the container still ends up pointing to /.ansible, and at some point Ansible tries to create a cp folder there:
TASK [Register date from master server] ****************************************
fatal: [ocp4exp-master-0.ocp4exp.testing.local]: FAILED! => {"msg": "Unable to create local directories(/.ansible/cp): [Errno 13] Permission denied: b'/.ansible/cp'"}
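The `cp` directory in this error is the SSH connection plugin's control path directory, which defaults to `~/.ansible/cp` and is not affected by local_tmp or remote_tmp. One thing that might be worth trying, untested in this thread, is to point all three locations at a writable directory through environment variables before Ansible runs. The sketch assumes `/tmp` is writable by the container's uid; `work_dir` is just an illustrative name.

```shell
#!/bin/sh
# Assumption: /tmp is writable by the arbitrary uid the container runs as.
# work_dir is a hypothetical name used only in this sketch.
work_dir="/tmp/.ansible"
mkdir -p "$work_dir/tmp" "$work_dir/cp"

# Temp directories (environment equivalents of local_tmp / remote_tmp).
export ANSIBLE_LOCAL_TEMP="$work_dir/tmp"
export ANSIBLE_REMOTE_TEMP="$work_dir/tmp"

# SSH ControlPath sockets, which otherwise go to ~/.ansible/cp.
export ANSIBLE_SSH_CONTROL_PATH_DIR="$work_dir/cp"
```

The same control path setting can also be written in ansible.cfg as `control_path_dir` under `[ssh_connection]`. Note that remote_tmp is resolved on the managed host, so for a play against a remote master a path that exists there is needed.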
/remove-lifecycle rotten
/priority important-soon
This bug is related to cri-o, and has been fixed in a more recent version that is used in recent versions of OCP.