The Envoy access_log documentation has an example with REQ(X-ENVOY-ORIGINAL-PATH?:PATH) for logging the request's path, but both the :PATH and X-ENVOY-ORIGINAL-PATH header contain the query string.
Query strings can sometimes contain sensitive details like access tokens - it would be great to be able to log the path without the query string, but I don't see a way to do this in Envoy currently. Opening this to either see if I'm missing something, or track adding this (small, I think) feature.
Currently, we have %REQ(MAIN?ALT_IF_MAIN_DOES_NOT_EXISTS):MAX_LENGTH
Do you have a suggestion on how we express that (remove qs)? Probably it is good to have a "special operator" for PATH e.g. %REQ(REMOVE_QUERY_STRINGS(MAIN, *)?REMOVE_QUERY_STRINGS(ALT_IF_DOES_NOT_EXISTS, token&id&key)). But not sure. WDYT?
something like REMOVE_QUERY_STRING($x) sounds reasonable - it would be nice to be able to say %REMOVE_QUERY_STRING(%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%)% or %REMOVE_QUERY_STRING(:REFERER)% (can logging operators like this currently be nested?)
I think my preference would be for a blunt tool - removing the entire query string. Denylists of fields to remove from the query string are very brittle - its too easy for clients (either internal clients developed by other teams or external users) to purposefully or accidentally change field names and break denylist-based redaction.
@bobby-stripe thanks!
I think my preference would be for a blunt tool - removing the entire query string
I agree. That makes sense.
@zuercher do you have advice here?
The current code won't automatically handle nesting afaik, though I imagine you could make that work. Would it be easier to just have a %REQ_NO_QUERY(:path)? The "remove query params" operator is coupled with headers to begin with, so I don't imagine there are other relevant nesting combinations?
Relevant formatting code: https://github.com/envoyproxy/envoy/blob/master/source/common/access_log/access_log_formatter.cc#L226
Referring to the following section of the RFC for URIs:
https://tools.ietf.org/html/rfc3986#section-3
The following are two example URIs and their component parts:
foo://example.com:8042/over/there?name=ferret#nose
\_/ \______________/\_________/ \_________/ \__/
| | | | |
scheme authority path query fragment
| _____________________|__
/ \ / \
urn:example:animal:ferret:nose
Shouldn't the path not include the query string in the first place?
I suppose since the current way to log this information is by the macro %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%, indicating that it's a header set by envoy (?), then it should be okay to make additional macros which make up for it, such as:
%PATH% the path component of the URI
%QUERY% the query component of the URI
Are there concerns over creating more macros like this?
I don't want to break anyone's logs, but I also wish to log the path and query separately.
Thoughts?
@cetanu I feel your argument is valid. Let see what @snowp thinks about it.
@cetanu the :path here is referring to HTTP's :path pseudo-header, described in HTTP/2 spec RFC 7540:
The ":path" pseudo-header field includes the path and query parts
of the target URI (the "path-absolute" production and optionally a
'?' character followed by the "query" production (see Sections 3.3
and 3.4 of [RFC3986]). A request in asterisk form includes the
value '*' for the ":path" pseudo-header field.
@cetanu how about referer? And also as per @lizan above the :path contains query string.
I made an attempt with REQ_WITHOUT_QUERY in #7847, it seems straightforward from the implementation perspective. As always feel free to comment about it.
And I also feel like probably REQ(X?Y):Z:POST_PROCESSING somewhat future proof. WDYT?
A lot of requests don't have a referer in my experience, plus it can be somewhat falsified. The REQ_WITHOUT_QUERY is probably good enough though. If it fits with the existing convention for logging macros, then why not. I'm not an authority on this :P
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.
Any Plan as when this will be available i.e. access log Path without Query Param?
We just got hit with this during our last upgrade of istio. Where the latest log format had great information about the path but was leaking information in our logs that we didn't want (I know GET shouldn't send any sensitive information, but 3rd party SDK are not always clean). Would have been perfect to be able to keep the path without the query string.
Can we please reopen this :)
@zuercher do you think we can add another one to the rules in substitution formatter for this?
We'd like to have this feature too. I'd be glad to work on this if nobody is working on it, or to collaborate in any way needed to make it go faster.
Relevant: https://github.com/envoyproxy/envoy/pull/12991#issuecomment-692364642 cc. @zuercher @rgs1
Most helpful comment
Referring to the following section of the RFC for URIs:
https://tools.ietf.org/html/rfc3986#section-3
Shouldn't the path not include the query string in the first place?
I suppose since the current way to log this information is by the macro
%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%, indicating that it's a header set by envoy (?), then it should be okay to make additional macros which make up for it, such as:%PATH%the path component of the URI%QUERY%the query component of the URIAre there concerns over creating more macros like this?
I don't want to break anyone's logs, but I also wish to log the path and query separately.
Thoughts?