AWX: Splunk logging is not working with the default installation

Created on 2 Oct 2017 · 34 comments · Source: ansible/awx

ISSUE TYPE

  • Bug Report
COMPONENT NAME

  • API
  • UI
  • Installer
SUMMARY

After setting up Splunk logging in the AWX UI, I am not able to see any logs in the Splunk interface.

docker logs -f awx_web

2017-10-02 08:48:39,088 WARNING  awx.api.generics status 500 received by user admin attempting to access /api/v2/settings/logging/test/ from 10.35.100.2
[pid: 443|app: 0|req: 391/8226] 10.35.100.2 () {52 vars in 2293 bytes} [Mon Oct  2 08:48:39 2017] POST /api/v2/settings/logging/test/ => generated 26 bytes in 95 msecs (HTTP/1.1 500) 7 headers in 216 bytes (1 switches on core 0)
10.35.100.2 - - [02/Oct/2017:08:48:39 +0000] "POST /api/v2/settings/logging/test/ HTTP/1.1" 500 37 "https://tower.app.foo.net/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" 
ENVIRONMENT
STEPS TO REPRODUCE

Below are my settings:

[screenshot: logging settings, 2017-10-02]

EXPECTED RESULTS

To see some output in the Splunk search:

index=awx

ACTUAL RESULTS

No results

ADDITIONAL INFORMATION

A token was created for submitting logs. The installer inventory is as below:

localhost ansible_connection=local ansible_python_interpreter="/usr/bin/env python"
[all:vars]
use_container_for_build=true
awx_official=false
postgres_data_dir=/mnt/pgdocker
host_port=80
pg_username=awx
pg_password=xxx!
pg_database=awx
pg_port=5432
awx_secret_key=xxx

All 34 comments

A couple of things that might be helpful to try here:

  • I have only ever used HTTP, not TCP, for Splunk, so I would suggest trying that
  • Is there anything notable to log? It might be that the logs are silent. To make sure some logs are emitted, I suggest editing the name of an arbitrary resource, like a project, to trigger the activity stream logger (see the sketch after this list)
  • The automatic reload is deployment-specific, and I doubt that it will fully work with the open source installer. After putting in these settings, I suggest restarting the system to make sure they have taken effect
  • After applying the settings and modifying a resource, the system logs might have information relevant to what went wrong in your particular situation
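
For the second bullet, here is a minimal sketch of renaming a project through the API to force an activity_stream entry (host, credentials, and project ID are placeholders, not values from this issue):

# Rename an existing project so the activity_stream logger has something to emit.
# AWX_URL, AUTH, and PROJECT_ID are placeholders -- adjust for your install.
import requests

AWX_URL = "https://awx.example.com"
AUTH = ("admin", "password")      # basic auth; a token header works as well
PROJECT_ID = 1                    # any project you are allowed to edit

resp = requests.patch(
    f"{AWX_URL}/api/v2/projects/{PROJECT_ID}/",
    json={"name": "Demo Project (renamed for logging test)"},
    auth=AUTH,
    verify=False,                 # only if the install uses a self-signed cert
)
resp.raise_for_status()
print(resp.status_code, resp.json().get("name"))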

poke, can you provide any more information here?

There is no HTTP, only HTTPS, TCP and UDP available for Splunk. Tried HTTPS and it didn't work either. Never tried restarting the Docker image.

Tried restarting the image as well. Still getting errors when I click the test.

What kind of information can I share?

The test endpoint is supposed to return a fairly verbose message about what went wrong with the request (the request the server makes to your logging aggregator). If you haven't seen this, it could be that the UI just isn't passing it through. If you could either use the API browser to submit the data for the test, or test in the UI and gather the (redacted, of course) request and response data, that should tell us either what configuration is causing trouble or what message the server is responding with.
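
If the UI is swallowing the message, a quick sketch of calling the test endpoint directly so the full error body is visible (host and credentials are placeholders):

# POST to the logging test endpoint and print the raw response body,
# which should describe what went wrong when AWX contacted the aggregator.
import requests

AWX_URL = "https://awx.example.com"
AUTH = ("admin", "password")

resp = requests.post(
    f"{AWX_URL}/api/v2/settings/logging/test/",
    auth=AUTH,
    verify=False,                 # self-signed certs only
)
print(resp.status_code)
print(resp.text)                  # on a 500, this is the verbose error to share (redacted)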

Independently check collector

For Splunk, we always like to start with independent verification of the event receiver, from their docs:

http://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Data/UsetheHTTPEventCollector

curl -k https://hec.example.com:8088/services/collector/event -H "Authorization: Splunk B5A79AAD-D822-46CC-80D1-819F80D7BFB0" -d '{"event": "hello world"}'
{"text": "Success", "code": 0}

If you have replicated this, great, that supports the expectation that AWX should be sending the message successfully with those exact values.

Configuration references

Additionally, if you have a look at the README.md in this folder:

https://github.com/ansible/awx/tree/devel/tools/docker-compose

You can see request data that is verified to work and completely reproducible with the container-based logstash we have in a docker-compose target.

Splunk is only slightly different. The way they handle the port is also different: we should be replacing the port value in the URL with the port you give, but it's easier not to muck with that, and it will just use the URL as-is.

{
    "LOG_AGGREGATOR_HOST": "https://hec.example.com:8088/services/collector/event",
    "LOG_AGGREGATOR_PORT": null,
    "LOG_AGGREGATOR_TYPE": "splunk",
    "LOG_AGGREGATOR_USERNAME": "",
    "LOG_AGGREGATOR_PASSWORD": "B5A79AAD-D822-46CC-80D1-819F80D7BFB0",
    "LOG_AGGREGATOR_LOGGERS": [
        "awx",
        "activity_stream",
        "job_events",
        "system_tracking"
    ],
    "LOG_AGGREGATOR_INDIVIDUAL_FACTS": false,
    "LOG_AGGREGATOR_ENABLED": true,
    "LOG_AGGREGATOR_LEVEL": "INFO"
}

(URL and token are from the Splunk example, not meant to be used)

In addition to this, you might want to try toggling the protocol and verify-certificate parameters. But once you confirm the endpoint is accepting messages with your token, and then see the test response, you should know what type of error the server is responding with.
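
If you would rather skip the UI entirely, a sketch of saving the configuration above through the settings API (same placeholder host, credentials, and token, assuming the logging settings live at /api/v2/settings/logging/):

# Apply the Splunk logging settings shown above via the API instead of the UI.
import requests

AWX_URL = "https://awx.example.com"
AUTH = ("admin", "password")

payload = {
    "LOG_AGGREGATOR_HOST": "https://hec.example.com:8088/services/collector/event",
    "LOG_AGGREGATOR_PORT": None,
    "LOG_AGGREGATOR_TYPE": "splunk",
    "LOG_AGGREGATOR_PASSWORD": "B5A79AAD-D822-46CC-80D1-819F80D7BFB0",
    "LOG_AGGREGATOR_LOGGERS": ["awx", "activity_stream", "job_events", "system_tracking"],
    "LOG_AGGREGATOR_ENABLED": True,
    "LOG_AGGREGATOR_LEVEL": "INFO",
}

resp = requests.patch(f"{AWX_URL}/api/v2/settings/logging/", json=payload, auth=AUTH, verify=False)
resp.raise_for_status()
print(resp.json()["LOG_AGGREGATOR_HOST"])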

# curl -k http://10.35.75.26:8088/services/collector/event -H "Authorization: Splunk 759D1202-BD2A-4F09-B74A-0328B32D66FE" -d '{"event": "hello world"}'
{"text":"Success","code":0}

I found the problem:

Changing the logging aggregator host to http://10.35.75.26:8088/services/collector/event worked. Before, I was just writing the IP address. The test is successful now. But I am still not able to see the template run outputs in Splunk, though I see the test output.

I am choosing the job template log level as Normal and the log aggregator level as INFO. Tried different options as well. None of them helped me see the output logs in Splunk.

Could it be because my project's ansible.cfg has the lines below?

[defaults]
host_key_checking = False
stdout_callback = json

No, I don't think so. When AWX runs a job, it will set environment variables that should override those anyway.

Given that you have the logging test working, then I would investigate the logs during the container startup (to see if something obviously failed applying the logging configuration), and the logs during an action that is expected to be emitting a log. If both of these are silent, while you have the config that should work saved to the database, then that makes this tricky, and there could be something specific to this particular install method that I haven't thought of.

I updated the git repo and reinstalled again. Without changing the connection method from HTTPS, just adding the URL address, I can now see AWX activity logs.
[screenshot: AWX activity logs, 2017-10-17]

But I am still not able to see the job template logs. Is this normal? Should I use some callback to send job logs to Splunk?

Yes, the configuration you posted was obviously enabling the other loggers in addition to job_events. Those are emitted task by task as you run a job and contain task-related data. Actually, if that's the configuration you used, the DEBUG messages from celery beat shouldn't be showing up either. Something is likely wrong with the container setup combined with the logging configuration. I don't have any good ideas right now, because both beat (which you are getting messages from) and the callback receiver run under supervisor in the same container.

We may be setting up something incorrectly for base logging here? https://github.com/ansible/awx/blob/devel/installer/image_build/files/settings.py

^ I think that LOGGING['loggers']['awx']['handlers'] = ['console'] should lead to a conflict. For the awx logger, we leave in a NullHandler specifically for the purpose of swapping out for the external logger, depending on what the DB settings are on startup. Since the awx logger only has the console handler there, _something_ probably won't work. That should cause the beat messages and other task system logs to be dropped, but shouldn't affect the specifically enumerated analytics loggers, like job_events. That's the opposite of what was observed.
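
To make the handler-swap idea concrete, here is a standalone sketch with plain Python logging (an illustration of the pattern, not AWX's actual code):

# Placeholder-handler pattern: the 'awx' logger ships with a NullHandler and the
# external handler is attached at startup based on saved settings. If the static
# config pins handlers to ['console'] instead, the dynamic setup has to append
# rather than replace, which is where behavior can diverge between processes.
import logging
import logging.handlers

log = logging.getLogger("awx")
log.setLevel(logging.INFO)
log.addHandler(logging.NullHandler())        # placeholder: drops everything

def enable_external_logging(host, port):
    """Attach a network handler alongside (or instead of) the placeholder."""
    log.addHandler(logging.handlers.SocketHandler(host, port))

log.info("dropped")                           # only the NullHandler is attached
enable_external_logging("logstash.example.com", 5000)
log.info("now shipped to the external handler")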

I am also experiencing this with Logstash; there are no logs transferred. Only test messages get sent correctly.
@matburt I also suspect some initial logging settings, because the AWX container logged to stdout and not to the files shown in the Tower documentation.

Best regards
Flo

So my settings are the following, on a stock container setup straight from Docker Hub:

{
    "LOG_AGGREGATOR_HOST": "172.17.0.1",
    "LOG_AGGREGATOR_PORT": 5000,
    "LOG_AGGREGATOR_TYPE": "logstash",
    "LOG_AGGREGATOR_USERNAME": "",
    "LOG_AGGREGATOR_PASSWORD": "",
    "LOG_AGGREGATOR_LOGGERS": [
        "awx",
        "activity_stream",
        "job_events",
        "system_tracking"
    ],
    "LOG_AGGREGATOR_INDIVIDUAL_FACTS": true,
    "LOG_AGGREGATOR_ENABLED": true,
    "LOG_AGGREGATOR_TOWER_UUID": "",
    "LOG_AGGREGATOR_PROTOCOL": "tcp",
    "LOG_AGGREGATOR_TCP_TIMEOUT": 5,
    "LOG_AGGREGATOR_VERIFY_CERT": true,
    "LOG_AGGREGATOR_LEVEL": "INFO"
}

And I can confirm the logs only show up via the Docker logs, e.g. docker logs -f awx_task.
Furthermore, when using the docker-latest of RHEL, all logs get smashed together in /var/log/messages, so I need external logging :).
AFAIK the external logging system seems not to be functioning at all.
@AlanCoding do you have a recommendation on how to debug this issue properly?

I already tried to modify /etc/tower/settings.py in both containers, but after restarting them I could not observe any change in the logging behaviour.

The thing that I would be hoping for when debugging this is for you to hit this log message:

https://github.com/ansible/awx/blob/0579db1162146d65a8cc3aed19da76e81b149ae6/awx/main/utils/handlers.py#L87

In the event of connection errors, it will log (just once) to the local log files the traceback related to the connection error. In the absence of seeing that, there's little that I can speculate about. The only avenue is to start replicating from the ground-up with the open-source installer, tracing the code that applies the logger according to the database settings, which is pretty non-trivial.
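
One low-effort check is to compare what the running process thinks the settings are against what you saved. A sketch, assuming the containers ship awx-manage (run it via docker exec -it awx_task awx-manage shell):

# Inside the Django shell of the task (or web) container.
# These names resolve through AWX's settings registry, so they should reflect
# what is stored in the database, not just the static settings.py.
from django.conf import settings

print(settings.LOG_AGGREGATOR_ENABLED)
print(settings.LOG_AGGREGATOR_HOST, settings.LOG_AGGREGATOR_PORT)
print(settings.LOG_AGGREGATOR_LOGGERS)
print(settings.LOG_AGGREGATOR_LEVEL)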

@AlanCoding I think the logger works fine. I didn't see that exception on container creation nor later while using AWX. In my opinion, the code for external logging works because

https://github.com/ansible/awx/blob/0579db1162146d65a8cc3aed19da76e81b149ae6/awx/main/utils/handlers.py#L208

gives a valid entry in Logstash. So I suspect the configuration of the complete log setup is messed up, not the actual logging mechanism.

So how about these two files?

https://github.com/ansible/awx/blob/bfea00f6dc6af0fb01057ce38e9d0337e6c589aa/awx/settings/development.py#L22-L28

and

https://github.com/ansible/awx/blob/3118092e51ff037bbccd48b22cf8dabfda53516b/awx/settings/production.py#L53-L59

Do the containers accidentally use the development.py settings instead of the production.py values?

I think the local settings file is awx/installer/image_build/files/settings.py, put there by the Dockerfile. In the open source installer, I think it should still use production settings, so if I'm correct, the final settings are a combination of defaults.py, production.py, and settings.py. The final settings.py gets the final say in the logging settings. These will interact with the dynamic logger setup here (using the values from the database, which you save in the UI / API browser):

https://github.com/ansible/awx/blob/8666512d994bf9d81061d8df4955604cb66db57e/awx/main/utils/handlers.py#L341

But nothing is jumping out at me right now. Even though the handler is set to console, it should append the new handler for the external logger to that logger. This might happen inconsistently between the celery and web server containers, I really don't know.
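
To see whether that append actually happened in a given process, a sketch from the same awx-manage shell (the analytics logger names below are my guess at how the enumerated loggers map; adjust if they differ):

# List the handlers attached to the relevant loggers in this process.
# If the external aggregator handler was appended, it should show up here
# next to the console/null handlers.
import logging

for name in ("awx", "awx.analytics.activity_stream", "awx.analytics.job_events"):
    print(name, "->", [type(h).__name__ for h in logging.getLogger(name).handlers])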

I think I am experiencing the same/similar issue. I can see test log messages in Splunk and also messages that look like they are coming from the main awx service_logs but I don't think I'm seeing anything from activity_stream, job_events or system_tracking.

Not sure if it's relevant but when I click "REVERT" on the "LOGGERS SENDING DATA TO LOG AGGREGATOR FORM" field it removes all the spaces. Then if I click save it just hangs with the "Working" cog spinning.

I am running version AWX 1.0.1.153 and Ansible 2.4.1.0

Any ideas? Thanks

^ I could not replicate the UI thing on that specific "revert" button, but clicking around other places I got it to hang.

I saw this in the JS logs:

angular.js:14794 TypeError: input.replace is not a function
    at Object.listToArray (configurationUtils.service.js:31)
    at configuration.controller.js:414
    at arrayEach (index.js:1361)
    at Function.<anonymous> (index.js:3405)
    at getFormPayload (configuration.controller.js:390)
    at exports.default.formSave (configuration.controller.js:442)
    at fn (eval at compile (angular.js:15645), <anonymous>:4:164)
    at callback (angular.js:27451)
    at Scope.$eval (angular.js:18521)
    at Scope.$apply (angular.js:18620)

This should be a new UI issue. Are these logs the same as what you can see?

Sorry, I'm fairly new to AWX. Where would I find those logs?

I have also noticed that I am getting debug messages through to Splunk even though AWX is only configured for INFO messages.

I had changed the level to DEBUG as part of my testing but I have changed it back through the GUI.

{
@timestamp: 2017-12-12T19:22:21.687Z
cluster_host_id: awx
host: awx
level: DEBUG
logger_name: awx.main.tasks
message: Cluster node heartbeat task.
type: splunk
}

The JavaScript logs like the one I pasted can be recorded in Chrome by right clicking -> inspect -> Console

Here's what I see...

Failed to load resource: the server responded with a status of 400 (BAD REQUEST)
vendor.4c1b9c38e5c8ab72ad49.js:436 TypeError: input.replace is not a function
    at Object.listToArray (app.b6a5e3c7df10a96d5ccf.js:403)
    at app.b6a5e3c7df10a96d5ccf.js:403
    at arrayEach (app.b6a5e3c7df10a96d5ccf.js:29)
    at Function.<anonymous> (app.b6a5e3c7df10a96d5ccf.js:29)
    at getFormPayload (app.b6a5e3c7df10a96d5ccf.js:403)
    at exports.default.formSave (app.b6a5e3c7df10a96d5ccf.js:403)
    at fn (eval at compile (vendor.4c1b9c38e5c8ab72ad49.js:436), <anonymous>:4:164)
    at callback (vendor.4c1b9c38e5c8ab72ad49.js:436)
    at Scope.$eval (vendor.4c1b9c38e5c8ab72ad49.js:436)
    at Scope.$apply (vendor.4c1b9c38e5c8ab72ad49.js:436)
(anonymous) @ vendor.4c1b9c38e5c8ab72ad49.js:436

Mine is working with luck.

Does anyone have any further insight into what the issue might be? Is there an underlying config file that I can check to see what's happening? Thanks.

Hi everyone,

Not sure if it's relevant but when I click "REVERT" on the "LOGGERS SENDING DATA TO LOG AGGREGATOR FORM" field it removes all the spaces. Then if I click save it just hangs with the "Working" cog spinning.

I also experience this. AWX 1.0.1.240

[screenshot]

When clicking revert:

[screenshot]

When clicking revert it requests:
https://awx.xxxx.com/api/v2/settings/all/
and in the response there are the aggregators
"LOG_AGGREGATOR_LOGGERS":["awx","activity_stream","job_events","system_tracking"]

When saving, there is this JS error, and the page hangs with a loading wheel.

vendor.de7dd9b7b75a5ad9b8e5.js:436 TypeError: input.replace is not a function
    at Object.listToArray (app.26612aa730614b7b6215.js:403)
    at app.26612aa730614b7b6215.js:403
    at arrayEach (app.26612aa730614b7b6215.js:29)
    at Function.<anonymous> (app.26612aa730614b7b6215.js:29)
    at getFormPayload (app.26612aa730614b7b6215.js:403)
    at exports.default.formSave (app.26612aa730614b7b6215.js:403)
    at fn (eval at compile (vendor.de7dd9b7b75a5ad9b8e5.js:436), <anonymous>:4:164)
    at callback (vendor.de7dd9b7b75a5ad9b8e5.js:436)
    at Scope.$eval (vendor.de7dd9b7b75a5ad9b8e5.js:436)
    at Scope.$apply (vendor.de7dd9b7b75a5ad9b8e5.js:436)
(anonymous) @ vendor.de7dd9b7b75a5ad9b8e5.js:436
(anonymous) @ vendor.de7dd9b7b75a5ad9b8e5.js:436
$apply @ vendor.de7dd9b7b75a5ad9b8e5.js:436
(anonymous) @ vendor.de7dd9b7b75a5ad9b8e5.js:436
dispatch @ vendor.de7dd9b7b75a5ad9b8e5.js:29
elemData.handle @ vendor.de7dd9b7b75a5ad9b8e5.js:29

But I don't suspect this to be the actual logging problem where the external logging mechanism is not functional. The streams are correctly set, as you can see from https://awx.xxxx.com/api/v2/settings/all/. This issue is just a UI problem. Maybe it parses the JSON response wrong and inserts it without spaces.
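
As a workaround for the broken form, the loggers list can be read back and re-saved through the API directly. A minimal sketch with placeholder host and credentials:

# Read and re-save LOG_AGGREGATOR_LOGGERS without going through the UI form.
import requests

AWX_URL = "https://awx.example.com"
AUTH = ("admin", "password")
URL = f"{AWX_URL}/api/v2/settings/logging/"

current = requests.get(URL, auth=AUTH, verify=False).json()
print(current["LOG_AGGREGATOR_LOGGERS"])

resp = requests.patch(
    URL,
    json={"LOG_AGGREGATOR_LOGGERS": ["awx", "activity_stream", "job_events", "system_tracking"]},
    auth=AUTH,
    verify=False,
)
resp.raise_for_status()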


For a production environment, a functional logging mechanism is an important feature: with a central automation system, one has to comply with company regulations that require actions to be traceable to each administrator/person. So it is important for us to get this working, and I am willing to invest some time in this issue.

I think I am experiencing the same/similar issue. I can see test log messages in Splunk and also messages that look like they are coming from the main awx service_logs but I don't think I'm seeing anything from activity_stream, job_events or system_tracking.

What you described here is also the case with an ELK stack as the external logger, and that's the real point of this issue. As far as external logging goes, it seems not to function regardless of the external logging software. So @AlanCoding, how could we advance this issue? We need some information on how logging works (or should work) in the Docker environment. What can we do to debug this, as it seems to be easy to reproduce?

Best regards from Germany
Flo

Quite frankly, my external logger now seems to work. I did an update to 1.0.2; the external logger was still not working and only printing my test calls. Then I flipped the External Logger switch:
[screenshot: External Logger switch]
ON->OFF->ON

Strange 😃. @oguzy & @mattelphick, can you confirm this for Splunk?
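
In case it helps others reproduce the off/on toggle without the UI, a sketch of flipping the same switch via the settings API (placeholder host and credentials):

# Toggle LOG_AGGREGATOR_ENABLED off and back on, mirroring the UI switch flip.
import requests

AWX_URL = "https://awx.example.com"
AUTH = ("admin", "password")
URL = f"{AWX_URL}/api/v2/settings/logging/"

for enabled in (False, True):
    r = requests.patch(URL, json={"LOG_AGGREGATOR_ENABLED": enabled}, auth=AUTH, verify=False)
    r.raise_for_status()
    print("LOG_AGGREGATOR_ENABLED ->", r.json()["LOG_AGGREGATOR_ENABLED"])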

I am still on 1.0.1.153. I toggled the External Logger Switch and clicked save. However I don't think I am seeing the logs for "job_events" still.

That said, having never seen what successful logging looks like, maybe I am interpreting the output incorrectly.

I do see a message that looks like this...

[screenshot: Splunk 7.0.0 search result]

...but I am not sure which logger generates this message.

What I am hoping to see from the "job_events" logger is more detail like this...

[screenshot: AWX job 623 output - confluence_netbox_device_types]

Am I misinterpreting things? What sort of detail should I see from the "job_events" logger?

Thanks

Can anyone help with the question above? Thanks

We have upgraded to version 1.0.2.303. However I am still not clear on whether the loggers are sending data externally.

This is the list of logger_names that I see. How do the logger_names in the log output relate to the four loggers in the configuration section (awx, activity_stream, job_events, system_tracking)?

[screenshot: logger_name values seen in Splunk]

Hi @mattelphick,

I can look it up at the company on Monday, but as far as I remember I have similar logs in ELK. Mainly it was important for us to log changes by users on templates, projects, etc., which get correctly written to the external logger, as well as what the specific user changed. I don't know if the actual output of a playbook gets logged. You can search in your logs for "User"; there should be log entries for each changed object, what was changed, and by whom. I can only confirm that I get those logs for sure. But as far as I can see, the logging itself seems to work. You get more than just test messages in Splunk.

I also don't know how the log channels in the settings relate to the log tags, nor does the Ansible Tower documentation give much more information about these, other than:

https://docs.ansible.com/ansible-tower/latest/html/administration/logging.html#set-up-logging-with-tower

Loggers to Send Data to the Log Aggregator Form: All four types of data are pre-populated by default. Click the tooltip help icon next to the field for additional information on each data type. Delete the data types you do not want.

Best Regards
Flo

No comment in 3 months, closing. Please provide more detail if you have it.
