AWX: Upgrade from 9.0.1 to 9.1.0 breaks the system

Created on 17 Dec 2019 · 14 Comments · Source: ansible/awx

ISSUE TYPE
  • Bug Report
SUMMARY

Upgrading from 9.0.1 to 9.1.0 fails partway through and leaves the system broken.

ENVIRONMENT
  • AWX version: 9.0.1 to 9.1.0
  • AWX install method: k8s
STEPS TO REPRODUCE

Upgrade from 9.0.1 to 9.1.0 using the ansible-playbook installer.
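For reference, the installer invocation looks roughly like this (run from the installer directory of the AWX release checkout; the exact paths are an assumption and may differ per setup):

$ cd awx/installer          # path assumed; adjust to your checkout
$ ansible-playbook -i inventory install.yml

The upgrade then fails with the following traceback: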

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UndefinedColumn: column main_projectupdate.job_tags does not exist
LINE 1: ...e"."project_id", "main_projectupdate"."job_type", "main_proj...
                                                             ^
HINT:  Perhaps you meant to reference the column "main_projectupdate.job_type".


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 86, in perform_work
    result = self.run_callable(body)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 62, in run_callable
    return _call(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/tasks.py", line 19, in run_task_manager
    TaskManager().schedule()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 643, in schedule
    self._schedule()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 605, in _schedule
    all_sorted_tasks = self.get_tasks()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 69, in get_tasks
    project_updates = [p for p in ProjectUpdate.objects.filter(status__in=status_list, job_type='check').prefetch_related('instance_group')]
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 274, in __iter__
    self._fetch_all()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 1242, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/polymorphic/query.py", line 56, in _polymorphic_iterator
    o = next(base_iter)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 55, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 1100, in execute_sql
    cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: column main_projectupdate.job_tags does not exist
LINE 1: ...e"."project_id", "main_projectupdate"."job_type", "main_proj...
EXPECTED RESULTS

Upgrade to finish successfully and without an issue.

ACTUAL RESULTS

The system broke; the task manager repeatedly fails with the traceback above.

ADDITIONAL INFORMATION
Labels: api, high, bug

All 14 comments

Can you confirm that you followed the steps here: https://github.com/ansible/awx/blob/devel/INSTALL.md#upgrading-from-previous-versions

I am unable to reproduce.

In the installer logs, under Migrate database, you should see something like:

Operations to perform:
  Apply all migrations: auth, conf, contenttypes, main, oauth2_provider, sessions, sites, social_django, sso, taggit
Running migrations:
  Applying main.0099_v361_license_cleanup... OK
  Applying main.0100_v370_projectupdate_job_tags... OK

Yes I did.

I found a way to fix it, though. After spinning the whole cluster up, I ran:

$ docker exec -it <task> bash
$ awx-manage migrate

After this everything was good.

I don't deploy on OpenShift but use self-hosted K8s, and I noticed that the awx database is owned by the postgres user rather than by awx. All the tables are owned by awx, but the database itself is not.
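If you want to check or correct the ownership, it can be done from the postgres pod (a sketch; the pod name awx-postgres-0 and superuser access are assumptions that depend on your deployment):

$ # Show the awx database and its owner
$ kubectl -n awx exec -it awx-postgres-0 -- psql -U postgres -c "\l awx"
$ # Transfer ownership of the database itself to the awx role
$ kubectl -n awx exec -it awx-postgres-0 -- psql -U postgres -c "ALTER DATABASE awx OWNER TO awx;"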

A new deployment also failed: the Get Kubernetes API version task calls the version endpoint and gets back:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}

I temporarily modified the playbook to use the snippet below instead of the original:

- name: Get Kubernetes API version
  command: |
    {{ kubectl_or_oc }} version -o json
  register: kube_version

- name: Extract server version from command output
  set_fact:
    kube_api_version: "{{ (kube_version.stdout | from_json).serverVersion.gitVersion[1:] }}"

After that it just hangs on TASK [kubernetes : Migrate database], and I cannot continue

I don't have this problem when running the installation for 9.0.1

I have exactly the same problem performing the upgrade. It just hangs on:
TASK [kubernetes : Migrate database]

Running on OpenShift and I'm having the same problem too.
The playbook hangs on Migrate database.

I've tried to perform the migration manually through the management pod but it just hangs.
Any solution / hotfix for this problem?

Running on OpenShift as well and having the same problem. Same issue as above; the playbook stops on Migrate database.

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 86, in perform_work
    result = self.run_callable(body)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 62, in run_callable
    return _call(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/tasks.py", line 19, in run_task_manager
    TaskManager().schedule()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 643, in schedule
    self._schedule()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 605, in _schedule
    all_sorted_tasks = self.get_tasks()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 69, in get_tasks
    project_updates = [p for p in ProjectUpdate.objects.filter(status__in=status_list, job_type='check').prefetch_related('instance_group')]
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 274, in __iter__
    self._fetch_all()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 1242, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/polymorphic/query.py", line 56, in _polymorphic_iterator
    o = next(base_iter)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 55, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 1100, in execute_sql
    cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: column main_projectupdate.job_tags does not exist
LINE 1: ...e"."project_id", "main_projectupdate"."job_type", "main_proj...
                                                             ^
HINT:  Perhaps you meant to reference the column "main_projectupdate.job_type".

I ran into this problem on vanilla k8s as well, and I suspect the changes from https://github.com/ansible/awx/pull/5239 introduced this issue.
Manually running kubectl -n awx exec -it ansible-tower-management -- bash -c "awx-manage migrate -v 3" produces this output and then hangs:

Operations to perform:
  Apply all migrations: auth, conf, contenttypes, main, oauth2_provider, sessions, sites, social_django, sso, taggit
~~snip~~
Running pre-migrate handlers for application main
2019-12-26 21:26:36,531 DEBUG    awx.main.dispatch publish awx.main.tasks.set_migration_flag(c689cb1b-c7da-46cb-bdc0-f6c90c627ae0, queue=tower_broadcast_all)

IIUC, it's trying to send out a message to rabbitmq before the migration starts (as per here and here), but the pod is configured to try to connect to localhost, which doesn't play nice on kubernetes since the pods have separate network namespaces.

There are two workarounds that I tested:

  • Once awx-0 is running (even if it's not fully functional), you can run the same command through the awx-web container, which does have access to rabbit on localhost:
    kubectl -n awx exec -it awx-0 -c awx-web -- bash -c "awx-manage migrate --noinput". Then you can comment out the Migrate database task in installer/roles/kubernetes/tasks/main.yml to make the playbook functional.
  • Edit the aforementioned credentials.py.j2 to change the rabbit hostname from localhost to rabbitmq.{{ kubernetes_namespace }}.svc, so that k8s's DNS can magically route it to the right IP, and re-run the installation playbook.

The latter method is probably the cleanest and should work long-term in most setups; if people are comfortable with it, I can submit a PR.
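Before re-running the installer with the hostname change, it's worth confirming that the service name actually resolves in-cluster (a sanity-check sketch; the awx namespace and rabbitmq service name are assumptions that should match your deployment):

$ # Throwaway busybox pod just for an in-cluster DNS lookup
$ kubectl -n awx run dns-test --rm -it --image=busybox --restart=Never -- \
    nslookup rabbitmq.awx.svc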

I was able to successfully upgrade from 9.0.1 to 9.1.0 using the second method. Thank you for posting the workaround.

@ilijamt @smuth4 @wbieniek (and others):

We think a recent change in AWX caused this issue. We're about to roll back the change here: https://github.com/ansible/awx/pull/5579

Any of you interested in giving this a try?

Testing this w/ downstream Tower OpenShift upgrades -- will update on progress soon.

This is now working downstream w/ OpenShift upgrades, which were experiencing the same issue as the AWX upgrade. Going to close, but any comments from @ilijamt @smuth4 or @wbieniek are welcome for AWX upgrades to devel.

@ryanpetrello @kdelee I too just ran into this issue upgrading 9.0.1 to 9.1.0 on k8s. Regarding the fix you mention relating to #5579, will it be in a forthcoming AWX release?

Thanks!

9.1.1 will be released sometime within the next day or so.

awx: 13.0.0
docker: 18.06.03-ce
On a first install, I got this error:
psycopg2.errors.UndefinedTable: relation "main_instance" does not exist
Doing the following resolved it:

$ docker exec -it awx_task bash
$ awx-manage migrate