Hi there,
I'm having an issue where a field in my extra context is filtered out because its value contains the string 'password'.
The structure can be summarized as:
"extra": {
"field": "something_containing_the_string_password"
}
The value of the field is filtered out.
Is there a reason for this?
Cheers,
Math
Our data filtering is covered in the docs:
Hi,
We're having the issue as well.
The code here: https://github.com/getsentry/sentry/blob/master/src/sentry/utils/data_scrubber.py#L104
means that if our (standard) Django session cookie, named "s", is added to the "sensitive fields", Sentry will filter out almost all of the data.
What's a good reason not to simply filter on dictionary key names (matching exactly)?
The reason we do contains is this: "my_secret_password"
This is something we'll need to expand support on in the future, but it might be things like allowing you to explicitly whitelist an exact match.
I see, but I'm still pretty sure this is an anti-pattern, isn't it? We want to hide data based on the keys we define software-side for our systems, not based on the assumption that some value might contain the string "password".
It may hide valuable information (what if my server resides on passwords.mycompany.com?)
Also, one may say that hiding a secret as secure as "my_magic_password" is, ahem... :)
I really think we should limit Sentry to filtering only on exact or fuzzy key matches for the sensitive fields, possibly with regexes for the values (as you already do for card information, even if I'd say that should simply be protected by field name instead of recursively matched in every field).
I still don't see a very good reason to do it the way it is now :)
We are aggressive here because we absolutely do not want to leak sensitive data. It's important to understand that Sentry works cross-platform on just about every device you can imagine. In many cases data is not presented as key/value pairs. For example, "foo=bar&password=baz" could be the value of a field, and we'd absolutely not want to capture that in case password is sensitive here. If your situation is safe, you can disable the data filtering.
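To make the query-string example concrete, here is a minimal sketch of a contains-style scrubber. It is not Sentry's actual data_scrubber code: the names `scrub`, `SENSITIVE_TERMS`, and the `"[Filtered]"` mask are assumptions made for illustration only.

```python
# Hypothetical sketch of "contains" scrubbing; not Sentry's implementation.
FILTER_MASK = "[Filtered]"                 # assumed placeholder text
SENSITIVE_TERMS = ("password", "secret")   # assumed term list


def contains_sensitive(text, terms=SENSITIVE_TERMS):
    """Return True if any sensitive term appears anywhere in the text."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def scrub(data, terms=SENSITIVE_TERMS):
    """Recursively mask entries whose key OR string value contains a term."""
    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            if contains_sensitive(str(key), terms):
                result[key] = FILTER_MASK
            else:
                result[key] = scrub(value, terms)
        return result
    if isinstance(data, list):
        return [scrub(item, terms) for item in data]
    if isinstance(data, str) and contains_sensitive(data, terms):
        return FILTER_MASK
    return data
```

With this sketch, a value such as "foo=bar&password=baz" is masked even though its key looks harmless, which is the case described above. The same check also masks a value like "passwords.mycompany.com", which is the false positive raised earlier in the thread.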
OK, that's a better use case. I'll see if I can come up with something to enable finer control over what is filtered or not in a subsequent PR or discussion.
We do need filtering :) Just not that aggressive.
This also might be something where we can just make an additional setting that is for excluding by exact match. Definitely open to improving this as it's obviously problematic, but we don't want to change the defaults to be less restrictive.
Glad to see that I'm not the only one running into this issue.
After being told to RTFM, I've patched my own install to suit my needs.
I understand Sentry's approach, but it would be nice to be able to make it fit our needs without having to patch it.
Would be nice to have a way to whitelist keys for example.
Would having a "whitelist these key names" option be sufficient? Do we need CONTAINS on the key names?
Hi,
I personally feel that's not the option I'd choose. Whitelisting every possible field name for every form and HTTP request across our ~100 different services seems tedious :)
@dcramer: if you wish, I think your proposed approach of choosing either fuzzy or exact match for the fields is a better one; but my personal favourite would be adding another whitelist.
So there would be the "contains" search on keys & values for all the sensitive words, including the default ones, AND an "exact match" blacklist based on key names (which may include default ones or not)
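A rough sketch of what I mean, assuming two separate lists. The function name `scrub_event` and its parameters are made up for illustration and do not correspond to an existing Sentry API.

```python
# Hypothetical sketch: "contains" terms plus an exact-match key list.
MASK = "[Filtered]"  # assumed placeholder text


def scrub_event(data, contains_terms, exact_keys):
    """Mask keys/values containing a term, and keys matching exactly."""
    terms = [term.lower() for term in contains_terms]
    exact = set(exact_keys)

    def hit(text):
        lowered = text.lower()
        return any(term in lowered for term in terms)

    def walk(node):
        if isinstance(node, dict):
            return {
                key: MASK if (key in exact or hit(str(key))) else walk(value)
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [walk(item) for item in node]
        if isinstance(node, str) and hit(node):
            return MASK
        return node

    return walk(data)
```

With the session cookie "s" in `exact_keys`, only that cookie is masked. Putting "s" in `contains_terms` instead masks every key or value that has the letter "s" in it, which is the problem described earlier.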
I'd gladly code that if need be.
My main concern about having multiple styles of matching is that the user experience suffers.
There are already three options:
Now we'd need two fields for additional filters, one which does CONTAINS and one which does EXACT. Now what if we do a whitelist? We'd need two more fields for the same thing.
I'm greatly opposed to simply having an option to change all matching, as that's another poor user experience.
This might be something where we need to do something equivalent to rules and let you create a set of rules. "scrub data when key CONTAINS x".
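As a very rough sketch of what a rule set could look like (purely hypothetical: the `Rule` class, its fields, and `apply_rules` are assumptions for discussion, not an existing or planned API):

```python
# Hypothetical rule-based scrubbing sketch; names are illustrative only.
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    target: str  # "key" or "value"
    op: str      # "contains" or "exact"
    term: str

    def matches(self, key, value):
        subject = key if self.target == "key" else value
        if not isinstance(subject, str):
            return False  # only string keys/values are compared
        subject = subject.lower()
        term = self.term.lower()
        if self.op == "contains":
            return term in subject
        return subject == term


def apply_rules(data, rules, mask="[Filtered]"):
    """Mask dict entries matched by any rule; recurse into the rest."""
    if isinstance(data, dict):
        out = {}
        for key, value in data.items():
            if any(rule.matches(key, value) for rule in rules):
                out[key] = mask
            else:
                out[key] = apply_rules(value, rules, mask)
        return out
    if isinstance(data, list):
        # Bare list items have no key, so this sketch only recurses into them.
        return [apply_rules(item, rules, mask) for item in data]
    return data
```

A rule such as `Rule("key", "contains", "password")` expresses "scrub data when key CONTAINS password", and `Rule("key", "exact", "s")` covers the exact-match case without adding a separate settings field for each matching style.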
@dcramer When you say 'Scrubbing' removes filtered data, what does that exactly mean?
Thanks for your kind answer!
Scrubbing happens before we store any data on disk.
@mattrobenolt Can you point me to the module or file which handles data scrubbing in Sentry 9 and above?
@ravi-ojha - very likely to be this: https://github.com/getsentry/sentry/blob/1ff4b3f02068031e08c3d70ead193011d74571e1/src/sentry/web/api.py#L306-L347
@jan-auer can you confirm?
Yes, and this means that data scrubbing now lives in libsemaphore: