Awx: No instance found with the current cluster host id on AWS ECS

Created on 22 May 2018 · 12 Comments · Source: ansible/awx

ISSUE TYPE

Bug Report

COMPONENT NAME

awx-task container

SUMMARY

I installed AWX on AWS using the ECS service. It runs fine: I can see the web console and log in to it. However, the awx-task container logs the following Celery-related error. Is this expected on a first install, or am I doing something wrong? I have a single container instance running four containers: awx-task, awx-web, rabbitmq and memcached. Postgres is set up in RDS, the managed database service provided by AWS.

Please note that all the container images are about three weeks old, downloaded from Docker Hub, and I have not modified them.

2018-05-22 09:49:01,317 INFO spawned: 'celery' with pid 28543
2018-05-22 09:49:02,318 INFO success: celery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-22 09:49:02,741 INFO awx.main.tasks Syncing Schedules
2018-05-22 09:49:02,824 DEBUG awx.main.tasks Registering celery routes for celery@task
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/bin/celery", line 11, in <module>
    sys.exit(main())
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/__main__.py", line 30, in main
    main()
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/celery.py", line 81, in main
    cmd.execute_from_commandline(argv)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/celery.py", line 793, in execute_from_commandline
    super(CeleryCommand, self).execute_from_commandline(argv)))
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/base.py", line 311, in execute_from_commandline
    return self.handle_argv(self.prog_name, argv[1:])
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/celery.py", line 785, in handle_argv
    return self.execute(command, argv)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/celery.py", line 717, in execute
    ).run_from_argv(self.prog_name, argv[1:], command=argv[0])
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/worker.py", line 179, in run_from_argv
    return self(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/base.py", line 274, in __call__
    ret = self.run(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/worker.py", line 212, in run
    state_db=self.node_format(state_db, hostname), **kwargs
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/worker/__init__.py", line 96, in __init__
    self.on_before_init(**kwargs)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/apps/worker.py", line 120, in on_before_init
    conf=self.app.conf, options=kwargs,
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/utils/dispatch/signal.py", line 166, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/usr/lib/python2.7/site-packages/awx/main/tasks.py", line 245, in handle_update_celery_routes
    (changed, instance) = Instance.objects.get_or_register()
  File "/usr/lib/python2.7/site-packages/awx/main/managers.py", line 106, in get_or_register
    return (False, self.me())
  File "/usr/lib/python2.7/site-packages/awx/main/managers.py", line 88, in me
    raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id
2018-05-22 09:49:02,923 INFO exited: celery (exit status 1; not expected)
2018-05-22 09:49:03,926 INFO spawned: 'celery' with pid 28552
2018-05-22 09:49:04,926 INFO success: celery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

ENVIRONMENT

AWX install method: amazon ecs using docker

bug

Source

shivam99aa

All 12 comments

In my awx-task container I checked /etc/tower/settings.py, and it sets CLUSTER_HOST_ID = "awx". Can someone please explain how an instance is mapped to this host id, or how I should set that up?

Do I need to set the hostname of my container instance to awx?
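
To illustrate the mapping being asked about, here is a minimal, self-contained sketch of the kind of lookup the traceback points at (the `me` frame in managers.py). It is not the AWX source: the `Instance` dataclass, the plain list standing in for the database table, and the example hostname are all hypothetical, and it assumes that each instance is registered under its container hostname and then matched against CLUSTER_HOST_ID.

```python
from dataclasses import dataclass
from typing import List

CLUSTER_HOST_ID = "awx"  # value reported from /etc/tower/settings.py


@dataclass
class Instance:
    """Hypothetical stand-in for a registered AWX instance row."""
    hostname: str


def me(registered: List[Instance], cluster_host_id: str = CLUSTER_HOST_ID) -> Instance:
    """Return the registered instance whose hostname equals the cluster host id."""
    for instance in registered:
        if instance.hostname == cluster_host_id:
            return instance
    # Same message as the error in the logs above.
    raise RuntimeError("No instance found with the current cluster host id")


# Assumption: the container registered itself under a generated hostname,
# so nothing matches "awx" and the lookup fails.
registered_instances = [Instance(hostname="ip-10-0-0-12")]  # hypothetical hostname
try:
    me(registered_instances)
except RuntimeError as exc:
    print(exc)

# Once an instance is registered as "awx", the same lookup succeeds.
registered_instances.append(Instance(hostname="awx"))
print(me(registered_instances).hostname)
```

Under that assumption, the error means that no registered hostname equals the configured host id, which is why the container's hostname matters.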

shivam99aa on 22 May 2018

austinkurtz on 22 May 2018

It looks like you are not using our install playbook to set things up? If you're setting this up as a new deployment type then this isn't the appropriate place to raise an issue as this is for bugs directly related to things we support.

If you are setting this up as a new deployment type then you should follow along with what the setup playbook (in installer/) does and how it configures the various services.

I saw you asked the question in IRC, which is a good place to talk about things, but you left before anyone was able to answer. I would strongly recommend looking at what the installer does when configuring the system.

matburt on 23 May 2018

No, matburt, I am not building the images from scratch, since they are already available on Docker Hub. I am running them on Docker, but through ECS instead of docker-compose.

Anyhow, thanks for pointing me in the right direction. I will check the playbooks in installer/ and try to replicate what they do.

shivam99aa on 23 May 2018

For anyone who faces the same issue in the future: I solved this problem by changing the hostname of the awx-task container to awx in the task definition.
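
As a rough sketch of what that change could look like, the snippet below builds a fragment of an ECS task definition with the container-level `hostname` key set to awx and prints it as JSON. Only the keys relevant to the fix are shown; the container name and the image reference are assumptions for illustration, not values taken from this thread, and a real task definition needs more fields.

```python
import json

# Hypothetical fragment of an ECS task definition. The "image" value is a
# placeholder; use whatever image reference your deployment already has.
awx_task_container = {
    "name": "awx-task",
    "image": "ansible/awx_task:latest",  # placeholder, not from the thread
    "hostname": "awx",  # must match CLUSTER_HOST_ID in settings.py
}

task_definition_fragment = {"containerDefinitions": [awx_task_container]}

# Print the fragment so it can be compared with an existing task definition.
print(json.dumps(task_definition_fragment, indent=2))
```

It is worth checking the ECS documentation for restrictions on `hostname`, since its availability depends on the network mode the task uses.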

shivam99aa on 23 May 2018

👍4

I'm facing the same issue, but I use k8s, and the error message appeared after an upgrade (not sure if that's related). I've checked my settings.py for CLUSTER_HOST_ID, and the value is awx.

Can you provide more details, @shivam99aa?

jmnguye on 24 Sep 2018

@shivam99aa thanks friend. Worked for me too!

aka-cafu on 21 Mar 2019

@matburt Can you reopen this? 6.0.0 is unusable due to this issue, regardless of installation method. Besides, the official installer does nothing at all related to the cluster ID; that value is hard-coded into settings.py and appears to be unrelated to. This is the first thing you see after applying migrations:

Creating instance group tower
(changed: True)
2019-07-17 10:33:06,536 INFO RPC interface 'supervisor' initialized
2019-07-17 10:33:06,536 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2019-07-17 10:33:06,536 INFO supervisord started with pid 174
2019-07-17 10:33:07,540 INFO spawned: 'awx-config-watcher' with pid 177
2019-07-17 10:33:07,542 INFO spawned: 'channels-worker' with pid 178
2019-07-17 10:33:07,544 INFO spawned: 'callback-receiver' with pid 179
2019-07-17 10:33:07,546 INFO spawned: 'dispatcher' with pid 180
READY
2019-07-17 10:33:08,649 INFO success: awx-config-watcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-07-17 10:33:08,649 INFO success: channels-worker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-07-17 10:33:08,649 INFO success: callback-receiver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-07-17 10:33:08,649 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-07-17 10:33:11,100 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:193
2019-07-17 10:33:11,100 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:193
2019-07-17 10:33:11,108 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:194
2019-07-17 10:33:11,108 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:194
2019-07-17 10:33:11,118 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:195
2019-07-17 10:33:11,118 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:195
2019-07-17 10:33:11,129 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:196
2019-07-17 10:33:11,129 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:196
Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 11, in <module>
    load_entry_point('awx==6.0.0.0', 'console_scripts', 'awx-manage')()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/__init__.py", line 140, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/__init__.py", line 364, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/__init__.py", line 356, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/base.py", line 283, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/base.py", line 330, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 123, in handle
    reaper.reap()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/reaper.py", line 36, in reap
    me = instance or Instance.objects.me()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/managers.py", line 116, in me
    raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id

megakoresh on 17 Jul 2019

This is still an ongoing issue, even in 7.0.0. @shivam99aa, where did you change the hostname? Can you please provide details for those of us still hitting this issue?

A clean install of 7.0.0 with external postgres throws this over and over in the awx_task docker logs:

Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 11, in <module>
    load_entry_point('awx==7.0.0.0', 'console_scripts', 'awx-manage')()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/__init__.py", line 142, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 123, in handle
    reaper.reap()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/reaper.py", line 36, in reap
    me = instance or Instance.objects.me()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/managers.py", line 116, in me
    raise RuntimeError("No instance found with the current cluster host id")

Nascentes on 9 Oct 2019

I "change" the container hostname in the docker-compose file; see the example below:
https://raw.githubusercontent.com/aka-cafu/ecs/master/awx/docker-compose-ecs.yml

aka-cafu on 9 Oct 2019

This is likely a duplicate of https://github.com/ansible/awx/issues/4294

This should be resolved in the next major release of AWX, but in the meantime, you can try out this patch that landed in devel: https://github.com/ansible/awx/pull/4268

ryanpetrello on 9 Oct 2019

@aka-cafu - thank you. That did not fix it (the hostname already defaulted to 'awx'), but I appreciate the clarification.

@ryanpetrello - Thank you! The patch you pointed me to for reaper.py resolved the issue! I was baffled, as we really only have one instance. No k8s or anything. A fresh pull of 7.0.0, with the only modifications being the external postgres vars in the inventory. Anyway, we are up and running now. Thank you, again.

Nascentes on 10 Oct 2019

👍1
