Hosts: Since blacklisting hosts would be simpler with wildcards like "click." or "click.*", is there DNS software that can block hosts based on such wildcards instead of specifying whole domains every time?

Created on 5 Mar 2018 · 12 comments · Source: StevenBlack/hosts

The Windows/Unix hosts files don't allow wildcards or regexps. I additionally want to block all domains starting with "click.", but I don't have a clue how to do that yet.

question

All 12 comments

Hello! Thank you for opening your first issue in this repo. It's people like you who make these host files better!

Hi @dakd2,

Short answer: no, not for hosts files.

That said, some routing/router solutions support wildcards.

DualServer supports wildcards. Privoxy is not DNS software, but you can apply a URL blacklist to it, and it supports both wildcards and RegEx. Privoxy can also rewrite HTML in real time from whatever to whatever using RegEx, and it comes preloaded with a nice set of dynamic filters for ads, trackers, etc. Both DualServer and Privoxy can be shared by all network devices, which then use the same filters through just the standard DNS and proxy configurations, respectively, with no additional client-side software needed.

@dakd2 take a look at Pi-hole

I was wondering whether it would be possible to extract regular expressions or string patterns from the Steven Black hosts and then use them in dnsmasq or BIND for blacklisting/filtering.

Yes and no.

Why no: Hosts files only work on individual domains, and every contribution made to this project is based on that premise. For example, if I am blocking "hackerx.webhost.com" and I write a script to wildcard all second-level domains, I will end up blocking all subdomains of "webhost.com," even if the "hackerx" subdomain is the only bad egg in the whole domain. There is no way to know to what extent a domain is bad based solely on its fully qualified domain name. Companies like Google and Facebook, which offer services ranging from e-mail, instant messaging, ads, analytics, app engines, etc., and which have varied levels of blocked domains, would get shut down completely and break in far too many ways for a scripted RegEx conversion to be practical.
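To make the over-blocking concern concrete, here is a minimal Python sketch. The helper names `naive_wildcard` and `covered` and the host "innocent.webhost.com" are made up for illustration, and the "last two labels" rule is a deliberate simplification that ignores public suffixes such as co.uk.

```python
def naive_wildcard(fqdn):
    """Reduce an FQDN to its last two labels (hypothetical, naive rule)."""
    labels = fqdn.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def covered(hostname, wildcard_base):
    """True if hostname is wildcard_base itself or any subdomain of it."""
    hostname = hostname.lower().rstrip(".")
    return hostname == wildcard_base or hostname.endswith("." + wildcard_base)


# Only "hackerx" is known to be bad, but the wildcard is built on the parent.
base = naive_wildcard("hackerx.webhost.com")

# The intended target is blocked...
target_blocked = covered("hackerx.webhost.com", base)
# ...and so is every unrelated customer of the same web host.
bystander_blocked = covered("innocent.webhost.com", base)
```

Nothing in the name alone tells the script that "hackerx" was the only bad subdomain, which is exactly the information a plain hosts entry does not carry.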

That being said, in the data/stevenblack/hosts section you will find the following:

# Spam domains
# If your software is able, the below domains and all subdomains should be blocked
127.0.0.1 angiemktg.com
#*.angiemktg.com
127.0.0.1 weconfirmyou.com
#*.weconfirmyou.com

I started this section specifically to be compatible with RegEx/wildcards. Documenting domains this way would tell you to what extent an FQDN is bad, but it would be up to the contributors to provide that documentation. There could theoretically be an entire language hidden in comments to detail how an FQDN should be wildcarded. Here is another example:

127.0.0.1 angiemktg.com #.*[.]angiemktg[.]com
127.0.0.1 weconfirmyou.com #.*[.]weconfirmyou[.]com

If contributors can agree on a standard way to document RegEx, a script can then be made to easily extract only the RegEx and remove the countless duplicates and overlaps there are bound to be.
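As a rough idea of what such an extraction script could look like, here is a minimal Python sketch. It assumes the trailing-comment convention from the second example above (a RegEx after "#" on the same line as the entry); that convention is only a proposal in this thread, and `extract_patterns` and the sample text are hypothetical.

```python
import re

# Sample input using the proposed "<ip> <host> #<regex>" convention.
HOSTS_TEXT = """\
# Spam domains
127.0.0.1 angiemktg.com #.*[.]angiemktg[.]com
127.0.0.1 weconfirmyou.com #.*[.]weconfirmyou[.]com
127.0.0.1 www.angiemktg.com #.*[.]angiemktg[.]com
127.0.0.1 plainhost.example
"""


def extract_patterns(hosts_text):
    """Return the unique RegEx annotations in first-seen order."""
    seen = set()
    patterns = []
    for line in hosts_text.splitlines():
        line = line.strip()
        # Skip blank lines and lines that are comments only.
        if not line or line.startswith("#"):
            continue
        _entry, sep, annotation = line.partition("#")
        annotation = annotation.strip()
        if not sep or not annotation:
            continue  # entry without an annotation
        try:
            re.compile(annotation)
        except re.error:
            continue  # annotation is not a valid RegEx, ignore it
        if annotation not in seen:
            seen.add(annotation)
            patterns.append(annotation)
    return patterns


patterns = extract_patterns(HOSTS_TEXT)
```

One caveat with this sketch: an ordinary trailing comment that happens to compile as a RegEx would be picked up too, so a real convention would probably need an explicit marker to tell annotations apart from free-form comments.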

Why yes: If you are talking about strictly condensing the hosts to RegEx and NOT wildcarding, this would be a process similar to how ZIP compression works, but it wouldn't offer any functionality beyond what is already provided. ZIP compression finds duplicate patterns and replaces them with much smaller symbols that represent longer repetitious strings of data, and it keeps a symbol table/dictionary relating symbols to data so the result can be decompressed later. Condensing the hosts list to RegEx would work similarly: find repeated patterns and extend the RegEx patterns each time so that they include the newly discovered data, like refactoring an equation. But this would only condense the hosts lists into a smaller list of RegEx patterns and would not block anything that isn't already blocked, so writing a script to do this seems a bit unjustified at the moment, since the current lists are already easily manageable by current software/hardware.
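For illustration only, here is a minimal Python sketch of that kind of condensing. It uses one simple, assumed strategy (merge hosts that differ only in their first label into a single alternation); the function name `condense` and the example.com hosts are made up, and a real script could factor the names much more aggressively.

```python
import re
from collections import defaultdict


def condense(hosts):
    """Merge hosts that differ only in their first label into one pattern."""
    groups = defaultdict(set)
    for host in hosts:
        first, _, rest = host.lower().partition(".")
        groups[rest].add(first)

    patterns = []
    for rest in sorted(groups):
        labels = sorted(groups[rest])
        if len(labels) == 1:
            head = re.escape(labels[0])
        else:
            head = "(?:" + "|".join(re.escape(label) for label in labels) + ")"
        tail = re.escape("." + rest) if rest else ""
        patterns.append(head + tail)
    return patterns


hosts = [
    "ads.example.com",
    "track.example.com",
    "click.example.com",
    "ads.example.net",
]
patterns = condense(hosts)


def matches_any(hostname):
    """True if hostname is matched in full by one of the condensed patterns."""
    return any(re.fullmatch(p, hostname) for p in patterns)
```

The condensed list is shorter than the input, but it matches exactly the same hosts and nothing more, which is the point made above: it compresses the list without blocking anything new.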

@dakd2, I mentioned Privoxy above, you may be interested in their RegEx patterns. Their official project is a SourceForge CVS, but there are plenty of mirrors/forks on GitHub you can check out:
https://github.com/kebugcheckex/privoxy/blob/master/default.filter

They dynamically block ads, analytics, etc., similar to our project, but work solely on HTML RegEx. HTML RegEx is obviously different from FQDN RegEx, but many of the patterns target FQDNs, as well, and you can extract and edit the patterns you are interested in. It's definitely something to play with at least when dealing with RegEx blocking.

Something I discovered is that uBlock Origin will automatically block any subdomain of a blocked host. So a list that has 0.0.0.0 mixpanel.com would also block api.mixpanel.com, or any of its other subdomains. It's definitely not as convenient as a DNS-level or network-level block, but it works for browsers and doesn't require a special format.
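The matching behaviour described here can be sketched in a few lines of Python. This is only an illustration of "a listed host also covers its subdomains", not how uBlock Origin is actually implemented, and `is_blocked` is a hypothetical helper.

```python
def is_blocked(hostname, blocked):
    """True if hostname or any of its parent domains is in the blocked set."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check the full name first, then each shorter suffix made of whole labels.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocked:
            return True
    return False


blocked = {"mixpanel.com"}
```

Because the comparison works on whole labels, "api.mixpanel.com" is covered while a different domain that merely ends in the same letters, such as "notmixpanel.com", is not.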

@dakd2

Check: https://github.com/cbuijs/instigator

It's a DNS server I wrote in Python, built around filtering. Regex filtering on both requests and replies comes as standard.

@ScriptTiger, at least Windows redirects only the specific hostname listed, but not its subdomains; however, dnsmasq (the tool that underlies Pi-Hole) does block listed domains and all subdomains.
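If you want to try the dnsmasq behaviour with an existing list, a small converter is enough. The Python sketch below turns hosts entries into dnsmasq `address=/domain/ip` lines, which dnsmasq applies to the domain and everything under it. The function name `hosts_to_dnsmasq`, the sample text, and the choice of 0.0.0.0 as the sink address are assumptions for the example.

```python
HOSTS_TEXT = """\
127.0.0.1 localhost
# Spam domains
127.0.0.1 angiemktg.com
#*.angiemktg.com
127.0.0.1 weconfirmyou.com  # trailing comment
127.0.0.1 angiemktg.com
"""


def hosts_to_dnsmasq(hosts_text, sink="0.0.0.0"):
    """Convert hosts entries into dnsmasq address= lines, without duplicates."""
    seen = set()
    lines = []
    for raw in hosts_text.splitlines():
        # Drop comments, then surrounding whitespace.
        entry = raw.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        # fields[0] is the IP address, the rest are host names.
        for name in fields[1:]:
            name = name.lower().rstrip(".")
            # Skip single-label names such as "localhost".
            if "." not in name or name in seen:
                continue
            seen.add(name)
            lines.append(f"address=/{name}/{sink}")
    return lines


dnsmasq_lines = hosts_to_dnsmasq(HOSTS_TEXT)
```

Keep the earlier caveat in mind: every converted entry now covers all of its subdomains too, which is fine for a domain that is bad as a whole but broader than what the original hosts line blocked.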

Folks, I'm going to close this because it's out of scope.

There is actually a well-suited solution for wildcard blocking based on regex, but you need to run it on Linux. The best option here is dnsdist from https://github.com/powerdns/pdns

It supports both RE2 rules and regex rules and is very fast and flexible. You can combine it with the recursor or Unbound to speed up your queries, enhance your privacy, etc.
