MODSEC-194: It would be useful to have a filter that converts all homoglyphs to their ASCII (or Latin?) equivalents.
This would be useful to stop SQL smuggling.
Original reporter: marcstern
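For readers less familiar with the idea, here is a minimal Python sketch of what such a filter might do. It is not ModSecurity code: the function name `fold_homoglyphs` and the small `HOMOGLYPHS` table are hypothetical, and the table is a hand-picked illustration rather than a vetted mapping. It relies on Unicode NFKD normalization to fold compatibility forms (such as fullwidth letters) and accented letters, then applies the table for look-alikes that normalization leaves untouched.

```python
import unicodedata

# Hypothetical, hand-picked table of look-alikes that NFKD does not fold.
# Illustrative only; a real mapping would need careful review.
HOMOGLYPHS = {
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}


def fold_homoglyphs(text: str) -> str:
    """Fold compatibility forms, accents and listed look-alikes to ASCII."""
    # NFKD splits accented letters into base + combining mark and maps
    # compatibility forms (e.g. fullwidth letters) to their plain forms.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks left over from the decomposition.
    stripped = (ch for ch in decomposed if not unicodedata.combining(ch))
    # Apply the look-alike table; unknown characters pass through unchanged.
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in stripped)


if __name__ == "__main__":
    print(fold_homoglyphs("\uff33\uff25\uff2c\uff25\uff23\uff34"))
    print(fold_homoglyphs("1\u02bc OR \u02bc1\u02bc=\u02bc1"))
```

Whether a given look-alike should be folded at all depends on what the back end does with it, so the table above should be read as a placeholder for that decision.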
rbarnett: Agreed. Two comments -
1) We are looking into implementing something similar to Snort's unicode.map file for conversions
http://cvs.snort.org/viewcvs.cgi/checkout/snort/etc/unicode.map?rev=HEAD&content-type=text/plain
2) In the meantime, the latest CRS v2.1.1 has the BETA advanced_filter_converter.lua script, which normalizes many of the same issues. This file is the Lua port of the PHPIDS Converter.PHP logic, which combats many of these evasion attempts. The Lua script is used by the newly named modsecurity_crs_41_advanced_filters.conf file -
http://mod-security.svn.sourceforge.net/viewvc/mod-security/crs/trunk/experimental_rules/modsecurity_crs_41_advanced_filters.conf
marcstern: Also, extended characters like %u2329 should be supported. Currently, the lowest byte is zeroed, which prevents these characters from being parsed.
Should I open a new bug?
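To make the %u2329 case concrete, here is a rough Python sketch of decoding `%uHHHH` escapes while keeping the full 16-bit value and then looking it up in a mapping table. This is not how t:urlDecodeUni is implemented. The function name `decode_percent_u` and the two-entry `BEST_FIT` table are hypothetical, and the mapping of U+2329/U+232A to angle brackets is an assumed example, not taken from unicode.mapping.

```python
import re

# Hypothetical best-fit table keyed by code point (illustrative entries only).
BEST_FIT = {
    0x2329: "<",  # LEFT-POINTING ANGLE BRACKET
    0x232A: ">",  # RIGHT-POINTING ANGLE BRACKET
}

# Matches %u followed by exactly four hex digits.
_PERCENT_U = re.compile(r"%u([0-9a-fA-F]{4})")


def decode_percent_u(value: str) -> str:
    """Decode %uHHHH escapes, applying the best-fit table where it has an entry."""

    def replace(match: "re.Match[str]") -> str:
        # Keep all 16 bits of the code point instead of truncating it.
        code_point = int(match.group(1), 16)
        # Mapped characters become their ASCII stand-in; others decode as-is.
        return BEST_FIT.get(code_point, chr(code_point))

    return _PERCENT_U.sub(replace, value)


if __name__ == "__main__":
    print(decode_percent_u("%u2329script%u232A"))
```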
rbarnett: We might be able to extend t:urlDecodeUni to better handle this issue. For example, we could do different Unicode mappings using the data found here -
http://www.lookout.net/2010/12/20/list-of-characters-for-testing-unicode-transformations-and-best-fit-mapping-to-dangerous-ascii/
http://www.lookout.net/wp-content/uploads/2010/12/uni2asc.csv
http://www.lookout.net/wp-content/uploads/2010/12/bestfit.csv
@zimmerle, why was this abandoned? It'd be cool to do homoglyph detection. Perhaps we can do this in a CRS rule. @dune73, thoughts?
Sure, I think it would be great to do this, but it sounds very tricky. It's certainly more flexible if done within a rule, but maybe it is too expensive and should be covered by ModSec itself.
Also, I lack know-how about much of this encoding and homoglyph stuff. So a couple of attack payload examples would help me, and probably some others, to look at this from a practical viewpoint.
hmm yeah these are some good points... the transformation system as it exists is kinda not great is it... just not sure of other options. likewise good points need to be made about updating the unicode mapping file, i'm gonna link this issue in an open CRS bug we have on that matter.
Maybe the update to the unicode.map could be eased with something like CLDR transforms like Cyrillic->Latin
The fact that SecUnicodeMapFile is a global setting is a limitation indeed, but I think something like this can work for some scenarios:
<Location "/mysite/english/home/">
SecUnicodeMapFile unicode.mapping 1215
</Location>
<Location "/mysite/russian/home/">
SecUnicodeMapFile unicode.mapping 20127
</Location>
I think the point is not to convert automatically (that's what I merely did) but to know
I think I can help here.
There are several pre-requisites & limitations.
Pre-requisites:
SecUnicodeMapFile {...}/unicode.mapping 20127
Limitations:
We have an extended version (more or less exhaustive) that I generated automatically and updated manually.
This file is not public yet because I consider it potentially not 100% correct and I don't want to distribute this information that we use in highly sensitive environments to attackers.
It needs to be reviewed by several people but, most of all, the mapping principle should be validated: which characters should be mapped? For accented letters, it's obvious, but what about Greek characters, for instance? Should they be mapped to a letter? What about the characters 02C5 (MODIFIER LETTER DOWN ARROWHEAD) & 02C7 (CARON)? Should they be mapped to a V?
In order to answer that, I think we need an exhaustive list of the back-end systems (app servers and DB) that perform this kind of mapping and to adapt the list consistently.
Potentially, we need to create several entries, one for each back-end.
If we can construct complete requirements, I'll complete it and share it with everybody.
t:utf8toUnicode,t:urlDecodeUni,t:htmlEntityDecode,t:utf8toUnicode
because a Unicode character could be encoded as an HTML entity, as well as the reverse. This is to be validated (our parsing may be paranoid).
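The layering concern can be illustrated with a small Python sketch. It uses the standard library's `unquote` and `html.unescape` as rough stand-ins for URL decoding and HTML entity decoding. They are not ModSecurity's transformations and do not handle `%uHHHH`. Note that the sketch swaps in a different technique from the fixed chain above: it repeats both steps until the value stops changing, so that either nesting order is unwrapped. The name `decode_layers` and the `max_rounds` limit are hypothetical.

```python
import html
from urllib.parse import unquote


def decode_layers(value: str, max_rounds: int = 3) -> str:
    """URL-decode then entity-decode, repeating until the value is stable."""
    for _ in range(max_rounds):
        # One round: percent-decoding first, then HTML entity decoding.
        decoded = html.unescape(unquote(value))
        if decoded == value:
            break  # Nothing left to unwrap.
        value = decoded
    return value


if __name__ == "__main__":
    # An HTML entity hidden inside percent-encoding.
    print(decode_layers("%26%2339%3B"))
    # The reverse: a percent-escape whose '%' is written as an entity.
    print(decode_layers("&#37;27"))
```

A single pass in one fixed order would leave the second payload as `%27`, which is the kind of case the repeated transformation in the chain is meant to catch.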