Awx: No instance found with the current cluster host id on AWS ECS

Created on 22 May 2018 · 12 Comments · Source: ansible/awx

ISSUE TYPE
  • Bug Report
COMPONENT NAME
  • awx-task container
SUMMARY

I have installed AWX on AWS using the ECS service. It is running fine: I can see the web console and log in to it. However, the container running awx-task logs the following Celery-related error. Is this expected on a first install, or am I doing something wrong? I have a single container instance running four containers: awx-task, awx-web, rabbitmq, and memcached. Postgres is set up in RDS, the AWS database service.

Please note that all the containers in use were downloaded from Docker Hub three weeks ago, and I have not modified them.

2018-05-22 09:49:01,317 INFO spawned: 'celery' with pid 28543
2018-05-22 09:49:02,318 INFO success: celery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-22 09:49:02,741 INFO awx.main.tasks Syncing Schedules
2018-05-22 09:49:02,824 DEBUG awx.main.tasks Registering celery routes for celery@task
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/bin/celery", line 11, in <module>
sys.exit(main())
File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/__main__.py", line 30, in main
main()
File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/celery.py", line 81, in main
cmd.execute_from_commandline(argv)
File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/celery.py", line 793, in execute_from_commandline
super(CeleryCommand, self).execute_from_commandline(argv)))
File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/base.py", line 311, in execute_from_commandline
return self.handle_argv(self.prog_name, argv[1:])
File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/celery.py", line 785, in handle_argv
return self.execute(command, argv)
File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/celery.py", line 717, in execute
).run_from_argv(self.prog_name, argv[1:], command=argv[0])
File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/worker.py", line 179, in run_from_argv
return self(args, options)
File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/base.py", line 274, in __call__
ret = self.run(*args, **kwargs)
File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/bin/worker.py", line 212, in run
state_db=self.node_format(state_db, hostname), **kwargs
File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/worker/__init__.py", line 96, in __init__
self.on_before_init(**kwargs)
File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/apps/worker.py", line 120, in on_before_init
conf=self.app.conf, options=kwargs,
File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/utils/dispatch/signal.py", line 166, in send
response = receiver(signal=self, sender=sender, **named)
File "/usr/lib/python2.7/site-packages/awx/main/tasks.py", line 245, in handle_update_celery_routes
(changed, instance) = Instance.objects.get_or_register()
File "/usr/lib/python2.7/site-packages/awx/main/managers.py", line 106, in get_or_register
return (False, self.me())
File "/usr/lib/python2.7/site-packages/awx/main/managers.py", line 88, in me
raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id
2018-05-22 09:49:02,923 INFO exited: celery (exit status 1; not expected)
2018-05-22 09:49:03,926 INFO spawned: 'celery' with pid 28552
2018-05-22 09:49:04,926 INFO success: celery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

ENVIRONMENT
  • AWX install method: Amazon ECS using Docker
bug

All 12 comments

In my awx-task container I checked the /etc/tower/settings.py file, and it contains CLUSTER_HOST_ID = "awx". Can someone please explain how an instance is mapped to this host ID, or how I should set that up?

Do I need to set the hostname of my container instance to awx?
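
The traceback above ends in `Instance.objects.me()`, which raises when no registered instance matches `CLUSTER_HOST_ID`. A minimal sketch of that kind of lookup, assuming an instance is registered under the container's hostname and matched by exact name. The `find_me` helper, the `registered` dict, and the `ip-10-0-0-12` hostname are hypothetical stand-ins for illustration, not AWX's actual code:

```python
# Simplified model of a lookup by cluster host id (hypothetical, not AWX source).
CLUSTER_HOST_ID = "awx"  # value reported in /etc/tower/settings.py


def find_me(registered, cluster_host_id=CLUSTER_HOST_ID):
    """Return the registered instance whose hostname equals cluster_host_id."""
    try:
        return registered[cluster_host_id]
    except KeyError:
        # Same message as the error reported in this issue.
        raise RuntimeError("No instance found with the current cluster host id")


# An instance registered under a different (made-up) hostname is not found ...
registered = {"ip-10-0-0-12": {"hostname": "ip-10-0-0-12"}}
try:
    find_me(registered)
    lookup_failed = False
except RuntimeError:
    lookup_failed = True

# ... while one registered as "awx" is.
registered["awx"] = {"hostname": "awx"}
me = find_me(registered)
```

Under that assumption, the error would mean the name the instance was registered with and the configured host ID disagree, so making them agree is the thing to check first.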

+1

It looks like you are not using our install playbook to set things up? If you're setting this up as a new deployment type then this isn't the appropriate place to raise an issue as this is for bugs directly related to things we support.

If you are setting this up as a new deployment type then you should follow along with what the setup playbook (in installer/) does and how it configures the various services.

I saw you asked the question in IRC, which is a good place to talk about things, but you left before anyone was able to answer. I would strongly recommend you look at what the installer does when configuring the system.

No, matburt, I am not building images from scratch, since they are already available on Docker Hub. I am setting this up on Docker, but using ECS instead of docker-compose.

Anyhow, thanks for pointing me in the right direction. I will check the playbooks in installer/ and try to replicate what they do.

For anyone who might face the same issue in the future: I was able to solve this problem by changing the hostname of the awx-task container to awx in the task definition.
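
As a rough illustration of that fix, the sketch below builds a fragment of an ECS task definition with the container hostname set to awx. Only the `hostname` value comes from this thread. The image name and the rest of the structure are assumptions, and a real task definition needs many more fields:

```python
import json

# Hypothetical fragment of an ECS task definition; only "hostname" reflects the fix above.
task_container = {
    "name": "awx-task",
    "image": "ansible/awx_task",  # assumed image name; use the one you deployed
    "hostname": "awx",            # should match CLUSTER_HOST_ID in settings.py
}

task_definition_fragment = {"containerDefinitions": [task_container]}
rendered = json.dumps(task_definition_fragment, indent=2)
print(rendered)
```

Note that ECS does not accept the `hostname` container parameter in every network mode, so check the task's network mode if the setting is rejected.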

I'm facing the same issue, but I use k8s, and the error message appears after an upgrade; I'm not sure whether that's related. I've checked my settings.py for CLUSTER_HOST_ID, and the value is awx.

Can you provide more details, @shivam99aa?

@shivam99aa thanks friend. Worked for me too!

@matburt Can you reopen this? 6.0.0 is unusable due to this issue, regardless of installation method. Besides, the official installer does nothing at all related to the cluster ID; that value is hard-coded into settings.py and appears to be unrelated. This is the first thing you see after applying migrations:

Creating instance group tower
(changed: True)
2019-07-17 10:33:06,536 INFO RPC interface 'supervisor' initialized
2019-07-17 10:33:06,536 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2019-07-17 10:33:06,536 INFO supervisord started with pid 174
2019-07-17 10:33:07,540 INFO spawned: 'awx-config-watcher' with pid 177
2019-07-17 10:33:07,542 INFO spawned: 'channels-worker' with pid 178
2019-07-17 10:33:07,544 INFO spawned: 'callback-receiver' with pid 179
2019-07-17 10:33:07,546 INFO spawned: 'dispatcher' with pid 180
READY
2019-07-17 10:33:08,649 INFO success: awx-config-watcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-07-17 10:33:08,649 INFO success: channels-worker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-07-17 10:33:08,649 INFO success: callback-receiver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-07-17 10:33:08,649 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-07-17 10:33:11,100 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:193
2019-07-17 10:33:11,100 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:193
2019-07-17 10:33:11,108 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:194
2019-07-17 10:33:11,108 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:194
2019-07-17 10:33:11,118 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:195
2019-07-17 10:33:11,118 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:195
2019-07-17 10:33:11,129 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:196
2019-07-17 10:33:11,129 WARNING  awx.main.commands.run_callback_receiver scaling up worker pid:196
Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 11, in <module>
    load_entry_point('awx==6.0.0.0', 'console_scripts', 'awx-manage')()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/__init__.py", line 140, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/__init__.py", line 364, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/__init__.py", line 356, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/base.py", line 283, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/base.py", line 330, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 123, in handle
    reaper.reap()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/reaper.py", line 36, in reap
    me = instance or Instance.objects.me()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/managers.py", line 116, in me
    raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id

This is still an ongoing issue, even in 7.0.0. @shivam99aa where did you change the hostname? Can you please provide details to those of us still hitting this issue?

A clean install of 7.0.0 with external Postgres throws this over and over in the awx_task Docker logs:

Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 11, in <module>
    load_entry_point('awx==7.0.0.0', 'console_scripts', 'awx-manage')()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/__init__.py", line 142, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 123, in handle
    reaper.reap()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/reaper.py", line 36, in reap
    me = instance or Instance.objects.me()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/managers.py", line 116, in me
    raise RuntimeError("No instance found with the current cluster host id")

I "change" the container hostname in the docker-compose file; see the example below:
https://raw.githubusercontent.com/aka-cafu/ecs/master/awx/docker-compose-ecs.yml

This is likely a duplicate of https://github.com/ansible/awx/issues/4294

This should be resolved in the next major release of AWX, but in the meantime, you can try out this patch that landed in devel: https://github.com/ansible/awx/pull/4268

@aka-cafu - thank you. That did not fix it (as it was already defaulted to 'awx'), but I appreciate the clarification.

@ryanpetrello - Thank you! The patch you pointed me to for reaper.py resolved the issue! I was baffled, as we really only have one instance. No k8s or anything. It was a fresh pull of 7.0.0, and the only modifications were the external Postgres vars in the inventory. Anyway, we are up and running now. Thank you, again.
