NGINX Ingress controller version:
0.9.0-beta.15
Kubernetes version (use kubectl version):
v1.7.4
Environment:
uname -a: Linux x86_64

What happened:
I want to index logs in Elasticsearch using fluentd. I have installed the daemonset using the CoreOS example:
https://coreos.com/tectonic/docs/latest/admin/logging.html
I then want to customise the logs for the nginx controller, breaking them up into appropriate fields.
I saw that fluentd has an nginx parser. Following the example at https://coreos.com/tectonic/docs/latest/admin/logging-customization.html, I added the following into extra.conf:
extra.conf: |
  # Example filter that adds an extra field "cluster_name" to all log
  # messages:
  # <filter **>
  #   @type record_transformer
  #   <record>
  #     cluster_name "your_cluster_name"
  #   </record>
  # </filter>
  <filter kube.ingress-nginx.nginx-ingress-controller>
    @type parser
    # Fluentd provides a few built-in formats for popular and common formats such as "apache" and "json".
    format nginx
    key_name log
    # Retain the original "log" field after parsing out the data.
    reserve_data true
    # The access logs and error logs are interleaved with each other and have
    # different formats, so ignore parse errors, as they're expected.
    suppress_parse_error_log true
  </filter>
What you expected to happen:
I was expecting to see the newly created fields in the Kibana documents for nginx logs.
How to reproduce it (as minimally and precisely as possible):
Create cluster with nginx as ingress and add logging (based on guide above)
Anything else we need to know:
Anyone have sample fluentd conf for nginx ingress controller?
Thanks,
Shane.
Or instead of nginx parser, do I need to define a custom format for specific log format located at https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/log-format.md?
@shavo007 You should parse your own log format, as the NGINX Ingress Controller log contains additional fields, such as the upstream used and the namespace.
Take a look here
Yep, I saw that @rikatz.
With the default nginx parser, it does not parse correctly (example below):
http://fluentular.herokuapp.com/parse?regexp=%5E%28%3F%3Cremote%3E%5B%5E+%5D%29+%28%3F%3Chost%3E%5B%5E+%5D%29+%28%3F%3Cuser%3E%5B%5E+%5D%29+%5C%5B%28%3F%3Ctime%3E%5B%5E%5C%5D%5D%29%5C%5D+%22%28%3F%3Cmethod%3E%5CS%2B%29%28%3F%3A+%2B%28%3F%3Cpath%3E%5B%5E%5C%22%5D%29+%2B%5CS%29%3F%22+%28%3F%3Ccode%3E%5B%5E+%5D%29+%28%3F%3Csize%3E%5B%5E+%5D%29%28%3F%3A+%22%28%3F%3Creferer%3E%5B%5E%5C%22%5D%29%22+%22%28%3F%3Cagent%3E%5B%5E%5C%22%5D%29%22%29%3F%24&input=192.168.196.97+-+%5B192.168.196.97%5D+-+redflex+%5B10%2FNov%2F2017%3A00%3A12%3A42+%2B0000%5D+%22GET+%2Fapi%2Fv1%2Flogin%2Fstatus+HTTP%2F1.1%22+200+92+%22https%3A%2F%2Fdashboard.rts.onl%2F%22+%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+Win64%3B+x64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F61.0.3163.100+Safari%2F537.36%22+550+0.002+%5Bkube-system-kubernetes-dashboard-80%5D+100.112.15.1%3A9090+92+0.002+200&time_format=time_format+%25d%2F%25b%2F%25Y%3A%25H%3A%25M%3A%25S+%25z
just wondering had anyone looked into custom regex parser for ingress?
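The mismatch is easy to reproduce offline. A minimal sketch in Python (named groups written as `(?P<...>` instead of Ruby's `(?<...>`; the pattern is an approximation of fluentd's built-in nginx format, not the exact shipped regex):

```python
import re

# Approximation of fluentd's built-in "nginx" access-log pattern,
# converted to Python's (?P<...>) named-group syntax.
NGINX_FORMAT = re.compile(
    r'^(?P<remote>[^ ]*) (?P<host>[^ ]*) (?P<user>[^ ]*) '
    r'\[(?P<time>[^\]]*)\] "(?P<method>\S+)(?: +(?P<path>[^\"]*) +\S*)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^\"]*)" "(?P<agent>[^\"]*)")?$'
)

# A standard nginx access-log line parses fine:
standard = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
            '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
            '"http://example.com/" "Mozilla/4.08"')
print(NGINX_FORMAT.match(standard) is not None)  # True

# A line as emitted by the ingress controller: note the extra "[host]"
# field after the remote address and the trailing upstream fields.
controller = ('192.168.196.97 - [192.168.196.97] - redflex '
              '[10/Nov/2017:00:12:42 +0000] "GET /api/v1/login/status HTTP/1.1" '
              '200 92 "https://dashboard.rts.onl/" "Mozilla/5.0" '
              '550 0.002 [kube-system-kubernetes-dashboard-80] '
              '100.112.15.1:9090 92 0.002 200')
print(NGINX_FORMAT.match(controller))  # None: the stock format cannot parse it
```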
all good. think this will work:
http://fluentular.herokuapp.com/parse?regexp=%28%3F%3Cremote_addr%3E%5B%5E+%5D%29+-+%5C%5B%28%3F%3Cproxy_protocol_addr%3E%5B%5E+%5D%29%5C%5D+-+%28%3F%3Cremote_user%3E%5B%5E+%5D%29+%5C%5B%28%3F%3Ctime%3E%5B%5E%5C%5D%5D%29%5C%5D+%22%28%3F%3Cmethod%3E%5CS%2B%29%28%3F%3A+%2B%28%3F%3Crequest%3E%5B%5E%5C%22%5D%29+%2B%5CS%29%3F%22+%28%3F%3Ccode%3E%5B%5E+%5D%29+%28%3F%3Csize%3E%5B%5E+%5D%29+%22%28%3F%3Creferer%3E%5B%5E%5C%22%5D%29%22+%22%28%3F%3Cagent%3E%5B%5E%5C%22%5D%29%22+%28%3F%3Crequest_length%3E%5B%5E+%5D%29+%28%3F%3Crequest_time%3E%5B%5E+%5D%29+%5C%5B%28%3F%3Cproxy_upstream_name%3E%5B%5E+%5D%29%5C%5D+%28%3F%3Cupstream_addr%3E%5B%5E+%5D%29+%28%3F%3Cupstream_response_length%3E%5B%5E+%5D%29+%28%3F%3Cupstream_response_time%3E%5B%5E+%5D%29+%28%3F%3Cupstream_status%3E%5B%5E+%5D*%29&input=192.168.196.96+-+%5B192.168.196.97%5D+-+redflex+%5B10%2FNov%2F2017%3A00%3A12%3A42+%2B0000%5D+%22GET+%2Fapi%2Fv1%2Flogin%2Fstatus+HTTP%2F1.1%22+200+92+%22https%3A%2F%2Fdashboard.rts.onl%2F%22+%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+Win64%3B+x64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F61.0.3163.100+Safari%2F537.36%22+550+0.002+%5Bkube-system-kubernetes-dashboard-80%5D+100.112.15.1%3A9090+92+0.002+200&time_format=%25d%2F%25b%2F%25Y%3A%25H%3A%25M%3A%25S+%25z
Nginx now also supports exporting logs in JSON format: https://nginx.org/en/docs/http/ngx_http_log_module.html (also possible in earlier versions, but now with proper JSON escaping).
Note that the escaping must be json.
You can specify your own log format in JSON, and it can be inserted directly into Elasticsearch; no parsing is needed.
It would be nice to have this in the docs.
@sander-su please check https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/configmap.md#log-format-upstream
@aledbf Oh nice, missed that.
It would be good to add that when using JSON escaping, log-format-upstream has to be set to a JSON format.
It would work otherwise, but would not always parse correctly.
@sander-su sorry about that. The configuration exists but no docs (I just fixed that) https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/configmap.md#log-format-escape-json
Closing.
This can be done configuring the configmap with:
log-format-escape-json: "true"
log-format-upstream: '{"proxy_protocol_addr": "$proxy_protocol_addr"}'
(just an example, you can customize the fields you need)
Awesome. Still not sure why the regex does not work, but I'll test this out in the office next week.
@aledbf OK, so I enabled JSON in the configmap, and I see the associated config inside the pod:
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
  labels:
    app: ingress-nginx
data:
  # use-proxy-protocol: "true"
  proxy-body-size: 900m # https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/annotations.md#custom-max-body-size
  log-format-escape-json: "true"

log_format upstreaminfo escape=json '$the_real_ip - [$the_real_ip] - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status';

stream {
    log_format log_stream [$time_local] $protocol $status $bytes_sent $bytes_received $session_time;

    access_log /var/log/nginx/access.log log_stream;
    error_log /var/log/nginx/error.log;

    # TCP services
    # UDP services
}
I do not see in elasticsearch these new fields.
Sample document in elasticsearch:
{
  "_index": "logstash-2017.11.12",
  "_type": "fluentd",
  "_id": "AV-yneYbbcth13J2GDgC",
  "_score": 1,
  "_source": {
    "log": "192.168.196.97 - [192.168.196.97] - xxxx [12/Nov/2017:23:43:30 +0000] \"POST /api/sockjs/487/xqvjzxcu/xhr_send?17e9205e77f1a944348ef9114fdd303c&t=1510530210799 HTTP/1.1\" 204 0 \"https://dashboard.xx.onl/\" \"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36\" 703 0.003 [kube-system-kubernetes-dashboard-80] 100.112.15.1:9090 0 0.003 204\n",
    "stream": "stdout",
    "docker": {
      "container_id": "c4a9b5b9bec7c949e5c8c188f115de4964ddc45d2b63ea0b588b72c984d2ea1c"
    },
    "kubernetes": {
      "container_name": "nginx-ingress-controller",
      "namespace_name": "ingress-nginx",
      "pod_name": "nginx-ingress-controller-1701832439-g0g3h",
      "pod_id": "bf6d20b4-c5f5-11e7-84f6-0283e4115176",
      "labels": {
        "app": "ingress-nginx",
        "pod-template-hash": "1701832439"
      },
      "host": "ip-xxx.ap-southeast-2.compute.internal",
      "master_url": "https://100.64.0.1:443/api"
    },
    "cluster_name": "atlas.xx.xx",
    "@timestamp": "2017-11-12T23:43:30+00:00",
    "tag": "kube.ingress-nginx.nginx-ingress-controller"
  },
  "fields": {
    "@timestamp": [
      1510530210000
    ]
  }
}
What am I missing?
@shavo007 you need to change the log format to json. Please check my example
All good. For anyone else, here is the config that matches the log format. Thanks for your help @aledbf and @sander-su.
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
  labels:
    app: ingress-nginx
data:
  # use-proxy-protocol: "true"
  proxy-body-size: 900m # https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/annotations.md#custom-max-body-size
  log-format-escape-json: "true"
  log-format-upstream: '{"proxy_protocol_addr": "$proxy_protocol_addr", "remote_addr": "$remote_addr", "proxy_add_x_forwarded_for": "$proxy_add_x_forwarded_for",
    "remote_user": "$remote_user", "time_local": "$time_local", "request": "$request", "status": "$status", "body_bytes_sent": "$body_bytes_sent",
    "http_referer": "$http_referer", "http_user_agent": "$http_user_agent", "request_length": "$request_length", "request_time": "$request_time",
    "proxy_upstream_name": "$proxy_upstream_name", "upstream_addr": "$upstream_addr", "upstream_response_length": "$upstream_response_length",
    "upstream_response_time": "$upstream_response_time", "upstream_status": "$upstream_status"}'
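To sanity-check what this produces, here is a quick Python sketch: string.Template's $var substitution stands in for nginx's variable expansion, the sample values are made up, and the format is trimmed to a few fields for brevity.

```python
import json
from string import Template

# The log-format-upstream value above, trimmed to a few fields.
log_format = Template('{"remote_addr": "$remote_addr", "request": "$request", '
                      '"status": "$status", "request_time": "$request_time"}')

# Made-up values, as nginx would substitute them at request time.
line = log_format.substitute(
    remote_addr="192.168.196.97",
    request="GET /api/v1/login/status HTTP/1.1",
    status="200",
    request_time="0.003",
)

doc = json.loads(line)  # parses cleanly: no fluentd regex needed
print(doc["request"])   # GET /api/v1/login/status HTTP/1.1
```

Values containing quotes or backslashes are exactly why log-format-escape-json is required; nginx's escape=json performs the same escaping that json.dumps would.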
To parse existing logs, I've used this regex:
^(?<remote>[^ ]*) - \[(?<host>[^\]]*)\] - (?<user>[^ ]+) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*) +\S*)?" (?<status>\d+) (?<bytes_sent>\d+) "(?<referrer>[^ ]*)" "(?<user_agent>[^\"]*)" (?<request_length>\d+) (?<request_time>[\d.]+) \[(?<upstream>[^\]]*)\] (?<upstream_addr>[^ ]*) (?<upstream_response_length>\d+) (?<upstream_response_time>[\d.]+) (?<upstream_status>\d+) (?<request_id>[^ ]*)
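Converted to Python's named-group syntax, that regex can be checked against a sample line. The values here are made up, and a trailing request id is appended, since the pattern expects one at the end:

```python
import re

# The regex above, with Ruby's (?<...> groups rewritten as Python's (?P<...>.
PATTERN = re.compile(
    r'^(?P<remote>[^ ]*) - \[(?P<host>[^\]]*)\] - (?P<user>[^ ]+) '
    r'\[(?P<time>[^\]]*)\] "(?P<method>\S+)(?: +(?P<path>[^\"]*) +\S*)?" '
    r'(?P<status>\d+) (?P<bytes_sent>\d+) "(?P<referrer>[^ ]*)" '
    r'"(?P<user_agent>[^\"]*)" (?P<request_length>\d+) '
    r'(?P<request_time>[\d.]+) \[(?P<upstream>[^\]]*)\] '
    r'(?P<upstream_addr>[^ ]*) (?P<upstream_response_length>\d+) '
    r'(?P<upstream_response_time>[\d.]+) (?P<upstream_status>\d+) '
    r'(?P<request_id>[^ ]*)'
)

line = ('192.168.196.97 - [192.168.196.97] - user '
        '[12/Nov/2017:23:43:30 +0000] "GET /api/v1/login/status HTTP/1.1" '
        '200 92 "https://dashboard.example.com/" "Mozilla/5.0" 550 0.002 '
        '[kube-system-kubernetes-dashboard-80] 100.112.15.1:9090 '
        '92 0.002 200 abc123')

m = PATTERN.match(line)
print(m.group('path'))      # /api/v1/login/status
print(m.group('upstream'))  # kube-system-kubernetes-dashboard-80
```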
If you, like me, arrived here via Google, and shavo007's fix didn't quite work (your log entries show up in Elasticsearch as one giant JSON blob in a field called log), it's because of this issue: https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/issues/145
The fix is to add the following to your fluentd config to force fluentd to parse any log entries as json (stolen from here: https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/fluentd-elasticsearch/fluentd-es-configmap.yaml#L161-L177):
# Fixes json "log" fields in Elasticsearch
<filter kubernetes.**>
  @id filter_parser
  @type parser
  key_name log
  reserve_data true
  remove_key_name_field true
  <parse>
    @type multi_format
    <pattern>
      format json
    </pattern>
    <pattern>
      format none
    </pattern>
  </parse>
</filter>
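The multi_format fallback amounts to "try JSON first, otherwise pass the raw line through". A rough Python sketch of the same logic (the message key for the none pattern is fluentd's default):

```python
import json

def parse_log_field(log):
    """Mirror the filter above: attempt JSON first (format json),
    fall back to keeping the raw line (format none)."""
    try:
        return json.loads(log)    # <pattern> format json
    except json.JSONDecodeError:
        return {"message": log}   # <pattern> format none

print(parse_log_field('{"status": "200"}'))  # {'status': '200'}
print(parse_log_field('plain access log'))   # {'message': 'plain access log'}
```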
@shavo007 @aledbf Thanks for the help. The configuration for log-format-upstream as mentioned in your comments is working. The logs and the fields are now visible in Kibana.
The issue I am facing is that all the fields are getting ingested as strings. This is a problem, since I would have hoped for fields like status or request_time to be of type integer or float.
Is there any configuration that can be used to provide type information?
I had a similar issue and had to update some fields for the parser - I guess the default nginx log format was updated in the meantime.
@prashantkalkar I also added the type definitions you were asking about earlier.
Thanks to everyone, the info in this issue helped me a lot.
In return, here is my config (maybe it's helpful for someone):
<filter kubernetes.var.log.containers.nginx-ingress-controller**.log>
  @type parser
  key_name log
  format /^(?<remote>[^ ]*) (?<host>[^\]]*) (?<user>[^ ]+) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*) +\S*)?" (?<status>\d+) (?<bytes_sent>\d+) "(?<referrer>[^ ]*)" "(?<user_agent>[^\"]*)" (?<request_length>\d+) (?<request_time>[\d.]+) \[(?<proxy_upstream_name>[^\]]*)\] \[(?<proxy_alternative_upstream_name>[^ ]*)\] (?<upstream_addr>[^ ]*) (?<upstream_response_length>\d+) (?<upstream_response_time>[\d.]+) (?<upstream_status>\d+) (?<request_id>[^ ]*)/
  time_format %d/%b/%Y:%H:%M:%S %z
  types status:integer,bytes_sent:integer,request_length:integer,request_time:float,upstream_response_length:integer,upstream_response_time:float,upstream_status:integer
  reserve_data yes
  inject_key_prefix nginx.
</filter>
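The types option is essentially a per-field cast applied after the regex match. A rough Python equivalent, with the field names taken from the regex above:

```python
# Rough equivalent of the fluentd `types` option: cast the listed
# fields, leave everything else as a string.
CASTS = {
    "status": int,
    "bytes_sent": int,
    "request_length": int,
    "request_time": float,
    "upstream_response_length": int,
    "upstream_response_time": float,
    "upstream_status": int,
}

def cast_fields(record):
    return {k: CASTS[k](v) if k in CASTS else v for k, v in record.items()}

print(cast_fields({"status": "200", "request_time": "0.003", "path": "/x"}))
# {'status': 200, 'request_time': 0.003, 'path': '/x'}
```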
Recommend using the nginx ingress controller log-format to define a JSON log, and using the json parser.