Hosts: i'd like to publish this data in RPZ format

Created on 6 Dec 2017 · 51 Comments · Source: StevenBlack/hosts

hello. i am paul vixie, and along with vernon schryver i co-created the DNS firewall system known as RPZ. see https://dnsrpz.info/ for more information about that, but briefly, it's a way to publish DNS policy information in a way that many different RDNS servers can subscribe in real time. the original implementation was in BIND9 but it is now supported in PowerDNS as well.

i realize that the "hosts" corpus doesn't change all that often, and that the real time update capability of DNS RPZ will probably not come into play very often here. however, there is a dearth of freely available DNS RPZ content -- most publishers are commercial. with this community's permission, i would set up a cron job to "git pull" from this repo, convert the results into DNS RPZ format, and if there's any change from the prior content, run an "rndc reload" to cause the new content to be absorbed and pushed out.

i'm doing this now with the suspect-networks.io feed, which it now occurs to me, you may want to incorporate into your corpus here. in any case i won't take this action without consensus, and steven black told me by e-mail that opening an issue here is the way to measure such consensus.

one final detail -- RPZ is free, unencumbered by patent, and i am not proposing to charge money for the DNS RPZ subscription version of this corpus. i will also give full source-credit whenever i discuss it. i don't ask for money or glory, only the opportunity to extend the reach of your work and the utility of DNS RPZ.

other information can be found here: https://en.wikipedia.org/wiki/Paul_Vixie

Most helpful comment

Fine, fine. Since you all are ganging up on me about this, DONE! They are LF now. I let git take care of my line endings for me.

All 51 comments

Hello! Thank you for opening your first issue in this repo. It's people like you who make these host files better!

As long as all the licensing terms of use are observed, I don't have any problem with it. I'm sure other members may want to put in a bit more due diligence before commenting either way, but I'm fine with anything that makes the Internet a safer place for more people.

It has my thumbs up. Especially if Paul is part of it ;).

Hi Paul @vixie, is this something this repo might do natively? Is this the best spec to study?

We also have Issue #47 and #139, which call for listing hosts in IPv6 format as well, which is among our next steps, I think. How do you deal with this?

If you'd like to use the hosts files here and crank them into another format, I have no objection to that, bearing in mind that comments from origin sources should be preserved insofar as the licensing requires it.

i don't think the repo can do it natively, since it involves speaking the DNS protocols, specifically NOTIFY, IXFR, and AXFR. yes, that version of the spec is fine. we handle ipv6 addresses just fine. the way i'd handle it is by parsing your "hosts" files and generating qname and nsdname triggers, having nxdomain actions. i can preserve comments as non-operative TXT records in the rpz results, but there is no annotation in text of an actual rpz "hit", so those TXT records would only show up during diagnostic work.
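For readers who want to see the parsing step concretely, here is a minimal sketch of turning hosts-file lines into QNAME triggers with the NXDOMAIN action (`CNAME .`). The function name and the skip list are illustrative assumptions, not part of this repo or of any RPZ tooling, and NSDNAME triggers and TXT comment records are left out.

```python
def hosts_to_rpz(hosts_text, skip=("localhost", "localhost.localdomain",
                                   "broadcasthost", "local", "0.0.0.0")):
    """Turn hosts-file lines into RPZ QNAME triggers with the NXDOMAIN action.

    Hypothetical helper: the name and the skip list are assumptions.
    """
    records = []
    seen = set()
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        # Only sinkhole entries are treated as block rules.
        if len(fields) < 2 or fields[0] not in ("0.0.0.0", "127.0.0.1"):
            continue
        for name in fields[1:]:
            name = name.lower().rstrip(".")
            if name in skip or name in seen:
                continue
            seen.add(name)
            # Owner names are relative to the RPZ zone origin; "CNAME ." means NXDOMAIN.
            records.append(f"{name} CNAME .")
    return records
```

Each returned string is one zone-file line; a zone header (SOA and NS) still has to be added before a name server can load the result.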

Well, to reference https://github.com/StevenBlack/hosts/issues/139 again, Steven's latest comments about starting a server instance and API for hosts file distributions may come into play again here if he wants to go the extra distance to host a minimal DNS server to distribute it that way, as well. If it's all in the same environment using shared resources and the DNS is configured to only distribute the policies and not act as a fully functional DNS, I don't think it's that much of a leap at all.

The counter to the above, of course, would be I'm sure Paul would rather prefer something he could have full control over. Seeing as how the specs are only a few months old, I think he is looking to get a working proof of concept off the ground to solidify his model in general before having a bunch of other people jumping in. Of course I can also see how he might prefer to delegate out some of the work, as well, depending on what his personal schedule, etc., is like.

Either way I think this is up to Paul and his needs. And then up to Steven, as well, if Paul decides to go the route of partnering up to some degree. In any case, I think either decision remains beneficial to the consumers at large.

@vixie understood. Go for it!

@ScriptTiger i have no need of control, but i note that i care more about this than most others, so am willing to do the work. also the specs are five years old but the rfc draft is new. RPZ has been out in the world since 2011 or so. what's missing is solid well-curated free content for it. that's where "hosts" comes in.

@StevenBlack ok will do.

I think in your personal position there may be some political things to consider, as well, on behalf of your project. Personally taking on this project would put your project at odds with Google on the analytics and ads pieces, as well as all the other companies who provide such legitimate services that this particular community simply does not endorse. If Steven were to perform the service and you to simply point people in that direction as an available content host, but not officially endorse the content and present it more as an option or example, it might give you a bit more plausible deniability. I am not sure if this really applies to your particular case as I am not so intimately involved with your project, but it may be something to consider depending on the level of visibility you wish your project to have.

Also, just to clarify some terminology. I have been doing a bit of reading on this, how does the content here fit in with the "reputation data" model? We do indeed have many third parties contribute to this list, but we don't score domains or anything like that, we simply put them on the list or don't. Is this a different model from the earlier specs involving reputation data? Or have they both always coexisted as just different features of the same thing to rely on reputation data and/or supplied domain lists? After some reading I was kind of left with the impression the reputation data model was rather core to the project, transferring over from your earlier work with the Mail Abuse Prevention System.

Hey @ScriptTiger please go easy on visitors. Paul @vixie, who needs no introduction, simply came here for permission and clearance, and it's all-good.

Lol, I totally understand. I say these things out of personal interest, so I do hope it's not misunderstood.

@ScriptTiger RPZ is not limited to reputation, it even has some non-security purposes such as configuration management (I use it for localhost handling, for example.) There is no scoring.

@StevenBlack Firstly, I want to thank you and the others who have contributed, and keep contributing, to maintaining the lists. I've personally deployed numerous Pi-holes for family and friends, and without this work the options would have been non-existent.

Re-opening this issue since I don't think the conversion of the lists to DNS RPZ format happened. With the community's permission, would it be alright if I volunteer to do it (giving full credit to this repo/source)? @vixie

Hi Swapneel @swapneelp! This strikes me as a very good idea, and you have my blessing and support to take this as far as you'd like to take it.

Over in Issue #739, Add support for Little Snitch Rule Group Subscriptions format we are considering offering domains in the Little Snitch format, too. I think this product would live alongside each hosts file variant, in the same folders.

So you can carry this forward as its own separate project/repo, or integrate RPZ into the products of this repo, I'm happy either way.

@StevenBlack Thank you for the quick response and support.

My suggestion is to create a separate repo to begin with and get it up and running. The scripts & documentation can then be included in this repo. I will then offer a DNS RPZ feed via my recursive DNS server, from which other users running DNS RPZ can do a DNS zone transfer (AXFR/IXFR). This way, users purely interested in using the feed will not have to do the conversion from the current format to the DNS RPZ format. They will merely have to point their recursive DNS server to my master server. The more people volunteer & offer zone transfers, the better it is for the community.

For people who would not want to do a zone transfer, I will also include the scripts and document the steps for doing the conversion. @vixie

@swapneelp, I understand what DNS RPZ is, but I'm not too keen on the "DNS RPZ format." I know after your last post you're planning on doing a lot of work in relation to this, but can you just quickly post a quick sample of what the format is, like a sample of the file contents, etc.?

@ScriptTiger Sure. DNS RPZ uses zone files. Below is a sample output from abuse.ch. (The site appears to be facing some issues right now, so I've shared the archived link of the page from the Wayback Machine along with the original source.)

Wayback machine - urlhaus.abuse.ch DNS RPZ page

Original Source - urlhaus.abuse.ch DNS RPZ feed

Ah, okay. I'm familiar with regular BIND zone files, but when searching about how to format DNS RPZ I kept getting the bracketed and blocked formats, but I think those must have been for the RPZ policies. So just to clarify, the format that you're looking to convert to is as shown below?

some.domain.com CNAME . ; Comment comment comment
another.domain.com CNAME . ; Comment comment comment

@ScriptTiger Yes.

The sample link I had shared as well as the code output in your last comment results in the NXDOMAIN RPZ action. Just like any firewall, DNS RPZ is basically made up of TRIGGERS & ACTIONS. If anyone wants to learn more about the various TRIGGERS & ACTIONS, please see draft-vixie-dnsop-dns-rpz-00

For the latter part of your comment, that's not the only value. As mentioned earlier in this comment, DNS RPZ supports various TRIGGERS. For example, suppose I want to serve NXDOMAIN for the StevenBlack/hosts names to the clients 192.168.2.1 and 192.168.2.3 (refer to the "Client IP Address" trigger) but exempt 192.168.2.2 from that policy action (refer to the "PASSTHRU" action). Can we do this kind of policy (TRIGGERS & ACTIONS) natively without DNS RPZ? No.

The distribution of the zone data can happen using dns zone transfers(AXFR/IXFR) and is not a feature of the DNS RPZ.
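As a hedged illustration of the client-IP exemption mentioned above: to my understanding of the RPZ draft, a "Client IP Address" trigger is written as an owner name made of the prefix length, then the address octets in reverse order, then `rpz-client-ip`. The helper below is a hypothetical sketch that handles IPv4 only; please check the encoding against the draft before relying on it.

```python
import ipaddress


def client_ip_owner(cidr):
    """Build an rpz-client-ip owner name for an IPv4 address or network.

    Hypothetical helper; the encoding (prefix length, reversed octets,
    then "rpz-client-ip") is my reading of the RPZ draft.
    """
    net = ipaddress.ip_network(cidr, strict=True)
    if net.version != 4:
        raise ValueError("this sketch only handles IPv4")
    octets = str(net.network_address).split(".")
    return ".".join([str(net.prefixlen)] + octets[::-1]) + ".rpz-client-ip"


# Exempt one client from the policy with the PASSTHRU action.
record = f"{client_ip_owner('192.168.2.2')} CNAME rpz-passthru."
```

Here `record` would be a single zone-file line exempting 192.168.2.2; how it interacts with the QNAME rules depends on the trigger precedence rules in the draft.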

see also https://dnsrpz.info/ which i linked in the head of this thread. and note that the draft is up to -04 now; -00 was just the initial rough cut. see https://tools.ietf.org/html/draft-vixie-dns-rpz-04 for newest.

I completely understand the triggers and actions and the value of RPZ, don't get me wrong at all there, it's definitely awesome. I'm just thinking about who else can use the same format. As long as the RPZ-specific options aren't used, such as rpz-passthru., rpz-drop., and rpz-tcp-only., and the file headers are removed, it seems to me there are other DNS and firewall applications, for instance, that could use the same format, as well.

For example, the following may be what most are familiar with when it comes to a normal BIND CNAME record declaration.

example.com. IN CNAME host1.example.com.

However, the IN class is actually defaulted to and can be omitted entirely to make it look like the same format as we want for DNS RPZ.

example.com. CNAME host1.example.com.

And there are other applications, as I said, which use the same format, as well. So I'm not at all dissing RPZ, I am just further clarifying the format for those that may want to borrow your product for other uses and make the most out of it.

Besides the header bits (i.e. $TTL 30, @ SOA rpz.urlhaus.abuse.ch. hostmaster.urlhaus.abuse.ch. 1908211936 300 1800 604800 30, NS localhost.), do these RPZ files look about right? Would they function for the purposes of DNS RPZ?

https://scripttiger.github.io/alts/
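For anyone who does want the header bits, a small sketch of wrapping a list of records in a loadable zone might look like this. The timer values are copied from the sample header quoted above; the function name, the `hostmaster.example.invalid.` mailbox, and the time-based serial are assumptions for illustration only.

```python
import time


def rpz_zone(records, ns="localhost.", hostmaster="hostmaster.example.invalid.",
             serial=None, ttl=30):
    """Prepend $TTL, SOA and NS lines to RPZ records (hypothetical helper)."""
    if serial is None:
        serial = int(time.time())  # assumption: any increasing 32-bit value will do
    header = [
        f"$TTL {ttl}",
        # mname rname serial refresh retry expire minimum
        f"@ SOA {ns} {hostmaster} {serial} 300 1800 604800 {ttl}",
        f"  NS {ns}",  # leading whitespace: same owner as the previous line
    ]
    return "\n".join(header + list(records)) + "\n"
```

The record names carry no trailing dot, so the name server treats them as relative to whatever zone name the file is loaded under.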

@ScriptTiger yes, it works
(or at least bind doesn't complain & the 2 names I tried were blocked :)

It would be nice if you could summarize things tho - eg: one line instead of 1384:

$ grep '\.2o7\.net' db.stevenblack-rpz | wc -l
1384

could be one line

*.2o7.net  CNAME  .

@ScriptTiger I've looked at #530 but .. what specifically?

I'm already using privoxy & have a script that converts the blocklist to a privoxy action file and comments out redundant lines - eg:

.privacyassistant.net
# blocked by prev: www.privacyassistant.net
# blocked by prev: ext.privacyassistant.net
# blocked by prev: www.ext.privacyassistant.net

If you're talking about

If contributors can agree on a standard way to document RegEx

I like your idea of
#*.angiemktg.com
for saying all of angiemktg.com is to be blocked

noting, DNS RPZ is meant to be the standard format of DNS policy filtering. i'd hate to hear that we'd failed. *.foo works, but foo.* does not. that is, wildcards work in RPZ the way they work in DNS, not the way they work in the shell. we did not support regex because we have multimillion-rule RPZ's and we have to be able to search them in constant time. regex and any other string search is quadratic.

also noting, the file is a zone, and zones can be transferred, and zone transfers can be triggered by a NOTIFY (reducing update latency), secured with TSIG (ensuring authenticity and possibly offering access control as well), and incrementally transferred using IXFR, meaning only the deltas have to be transmitted. we can update a thousand subscribers of a multi-million rule RPZ every second -- and we do. i realize that rsync and other tools could do this for files. but with DNS RPZ it's all done by DNS.

what i do with policy rulesets that are not in RPZ format is fetch them in a cron job, diff each against prior, and if there is a difference, i copy it into place and tell BIND9 to reload that zone. BIND9 has an option called ixfr-from-differences whereby it loads the new zone, computes the deltas from the prior zone, and then offers that delta as IXFR to each subscriber, who is told with NOTIFY, and who may know a TSIG preshared key. the only latency is in the cron job frequency; all else is real time.

see also this work:

https://www.researchgate.net/publication/228895300_Modern_DNS_as_a_coherent_dynamic_universal_database

a full text of this article is online at:

http://family.redbarn.org/~vixie/dnsind.pdf
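The cron-driven part of the workflow described above (fetch, compare against the prior copy, install and reload only on change) can be sketched as follows. This is an illustration under assumptions: `install_if_changed` is a hypothetical helper, and the reload step is passed in as a callable so the sketch does not hard-code any particular server command.

```python
import hashlib
from pathlib import Path


def install_if_changed(new_text, zone_path, reload_zone):
    """Install new zone text and call reload_zone() only if the content differs.

    Returns True when the file was replaced, False when nothing changed.
    """
    zone_path = Path(zone_path)
    new_bytes = new_text.encode("utf-8")
    if zone_path.exists():
        old_digest = hashlib.sha256(zone_path.read_bytes()).hexdigest()
        if old_digest == hashlib.sha256(new_bytes).hexdigest():
            return False  # identical content: leave the server alone
    # Write to a temporary file first, then rename over the old zone file.
    tmp = zone_path.parent / (zone_path.name + ".tmp")
    tmp.write_bytes(new_bytes)
    tmp.replace(zone_path)
    reload_zone()
    return True
```

With BIND9, `reload_zone` could be something like `lambda: subprocess.run(["rndc", "reload", "rpz.example"], check=True)`, where `rpz.example` stands in for the real zone name. If the zone text embeds a serial that changes on every run, compare only the record lines, otherwise every run looks like a change.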

@ler762, check the RRPZ files:

https://scripttiger.github.io/alts/

On Saturday, 24 August 2019 06:35:44 UTC ScriptTiger wrote:

@ler762, check the RRPZ files:

https://scripttiger.github.io/alts/

ty! RRPZ looks like it won't be nec'y and may overblock (there could be
subdomains of the shared domain which are not malicious).

how frequently are these files updated? i don't want to fetch more often than
that.

--
Paul

RRPZ looks like it won't be nec'y and may overblock

We disagree. As far as I'm concerned, there's no such thing as overblocking things like 2o7.net
but if somebody wants to allow certain names they can do

appleglobal.112.2o7.net CNAME rpz-passthru.  ; unbreak apple.com

@ler762, check the RRPZ files:

sweet! Again, bind doesn't complain but I get weird results - like appleglobal.112.2o7.net isn't blocked! I was thinking it was me not understanding/misconfiguring something, but I'm not so sure now.. It'd be nice if others would test

Starting out with

$ grep '\.2o7\.net' db.stevenblack-rrpz
*.2o7.net CNAME .
and.co.uk.102.122.2o7.net CNAME .
;0.0.0.0 appleglobal.112.2o7.net #breaks apple.com
;0.0.0.0 applestoreus.112.2o7.net #breaks apple.com
ehg-moma.hitbox.com.112.2o7.net CNAME .
expedia.ca.112.2o7.net CNAME .
infomart.ca.112.2o7.net CNAME .
infospace.com.112.2o7.net CNAME .
metro.co.uk.102.122.2o7.net CNAME .
overstock.com.112.2o7.net CNAME .
sa.aol.com.122.2o7.net CNAME .
; 0.0.0.0 appleglobal.112.2o7.net #[affects Apple site]
flyingmag.com.122.2o7.net CNAME .
homepjlconline.com.112.2o7.net CNAME .
nysun.com.112.2o7.net CNAME .
popsci.com.122.2o7.net CNAME .
; 0.0.0.0 survey.112.2o7.net #[affect Sprint.com]
bcbsks.com.102.112.2o7.net CNAME .
www.ikea.122.2o7.net CNAME .

appleglobal.112.2o7.net isn't blocked

If I comment out all the extra 2o7.net names

$ grep '2o7\.net' db.stevenblack-rrpz
; 2o7.net -- server side tracking
2o7.net CNAME .
*.2o7.net CNAME .
; and.co.uk.102.122.2o7.net CNAME .
; appleglobal.112.2o7.net       CNAME .
;0.0.0.0 appleglobal.112.2o7.net #breaks apple.com
;0.0.0.0 applestoreus.112.2o7.net #breaks apple.com
; ehg-moma.hitbox.com.112.2o7.net CNAME .
; expedia.ca.112.2o7.net CNAME .
; infomart.ca.112.2o7.net CNAME .
; infospace.com.112.2o7.net CNAME .
; metro.co.uk.102.122.2o7.net CNAME .
; overstock.com.112.2o7.net CNAME .
; sa.aol.com.122.2o7.net CNAME .
; 0.0.0.0 appleglobal.112.2o7.net #[affects Apple site]
; flyingmag.com.122.2o7.net CNAME .
; homepjlconline.com.112.2o7.net CNAME .
; nysun.com.112.2o7.net CNAME .
; popsci.com.122.2o7.net CNAME .
; 0.0.0.0 survey.112.2o7.net #[affect Sprint.com]
; --WTF-- bcbsks.com.102.112.2o7.net CNAME .
; www.ikea.122.2o7.net CNAME .

appleglobal.112.2o7.net. is blocked

Uncomment just the one line

bcbsks.com.102.112.2o7.net CNAME .

and looking up appleglobal.112.2o7.net. works :(

If it makes a difference, I'm testing with bind
Version: 9.11.5-P4-5~bpo9+1-Debian

ty! RRPZ looks like it won't be nec'y and may overblock (there could be
subdomains of the shared domain which are not malicious).

As referenced in https://github.com/StevenBlack/hosts/issues/530, technically it would be impossible to wild card with 100% accuracy because the needed information just is not present in the hosts file format. However, I will reference the RRPZ description from the download page before continuing:

RRPZ
Similar to the RPZ format except that multiple child domains of a single blocked parent domain are reduced to a single wild card child domain of the shared common domain. Please note if multiple child domains of a common parent domain are present but the parent domain itself is not, those entries will not be reduced as the parent itself is not blocked and may have other child domains that shouldn't be blocked, as well.

That description should directly address your concerns. Basically in order to generate a list with wild cards certain assumptions are made, such as if a common shared parent domain itself is blocked and has child domains that are blocked as well, then the entire domain and all child domains should be blocked. This is an "if and only if" scenario, so it's a bit of a compromise or hybrid. If there are no child domains of any given domain blocked, then child domains for that given domain will not be blocked. And, as stated in the description, if multiple domains share a common parent domain, all such child domains of said parent domain will only be blocked if and only if that parent domain is explicitly blocked, as well.

However, I am, of course, all ears for how this algorithm might be improved if there are any suggestions to that effect.
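One way to read the RRPZ description above is sketched below. It is not the actual generation script, only an interpretation: a subdomain is folded into a wildcard only when one of its ancestors is itself on the list, and siblings whose parent is not listed are kept as they are. The function name is hypothetical.

```python
def reduce_to_wildcards(blocked):
    """Fold listed subdomains into '*.parent' when the parent is also listed."""
    blocked = {d.lower().rstrip(".") for d in blocked}
    wildcard_parents = set()
    kept = set()
    for name in blocked:
        labels = name.split(".")
        # Proper ancestors, longest first: drop one leading label at a time.
        ancestors = [".".join(labels[i:]) for i in range(1, len(labels))]
        listed = [a for a in ancestors if a in blocked]
        if listed:
            wildcard_parents.add(listed[-1])  # topmost listed ancestor absorbs it
        else:
            kept.add(name)
    records = []
    for name in sorted(kept):
        records.append(f"{name} CNAME .")
        if name in wildcard_parents:
            records.append(f"*.{name} CNAME .")
    return records
```

For example, given 2o7.net plus several of its subdomains, this yields `2o7.net CNAME .` and `*.2o7.net CNAME .`, while foo.example.com and bar.example.com stay as two separate rules because example.com itself is not listed.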

how frequently are these files updated? i don't want to fetch more often than
that.

I regenerate all of the data for my entire website shortly after every new release from @StevenBlack. Obviously a generation script is used, but the script itself is triggered manually by myself and overseen for quality control purposes. Feel free to check out the website's repo here:

https://github.com/ScriptTiger/scripttiger.github.io

And the website's commit log:

https://github.com/ScriptTiger/scripttiger.github.io/commits/master

You can also find a similar chronicling of events posted as they happen from my Twitter:

https://twitter.com/scripttiger

sweet! Again, bind doesn't complain but I get weird results - like appleglobal.112.2o7.net isn't blocked! I was thinking it was me not understanding/misconfiguring something, but I'm not so sure now.. It'd be nice if others would test

@vixie could probably answer that much better than I. Whether it's a legitimate bug or if there's some context we are not taking into account, I can't think of a better person to consult with than the imagineer himself.

@ScriptTiger what is a child domain? I think you want to use, subdomain. That's the universally used nomenclature.

I'll make a separate post for this as it's just me thinking and not an actual solution or expert opinion of any kind.

The example @ler762 uses is appleglobal.112.2o7.net becomes unblocked once bcbsks.com.102.112.2o7.net is blocked, despite having *.2o7.net in place. This would lead me to believe that all parent domains of bcbsks.com.102.112.2o7.net are graced with an override to ignore the *.2o7.net rule.

ty! RRPZ looks like it won't be nec'y and may overblock (there could be
subdomains of the shared domain which are not malicious).

This leads me to believe that, again, this is part of that compromise to ensure domains are not over-blocked, and perhaps evidence to that effect, as implemented and carried out within RPZ itself. Explicitly blocked domains will, of course, remain blocked. However, explicitly blocked child, grandchild, etc., domains of an explicitly blocked parent domain with a wild card get an override when their child, grandchild, etc., domains are present in any further explicitly blocked domains.

For example, after *.2o7.net is implemented, let's assume all child, grandchild, etc., domains become successfully blocked. However, once bcbsks.com.102.112.2o7.net is also explicitly blocked, com.102.112.2o7.net, 102.112.2o7.net, and 112.2o7.net are overridden with this new rule and ignore *.2o7.net.

This might actually be seen as an improvement upon my originally intended compromise. Consider the following comments:

;0.0.0.0 appleglobal.112.2o7.net #breaks apple.com
;0.0.0.0 applestoreus.112.2o7.net #breaks apple.com

Because of the way RPZ works, these domains are actually not blocked because of bcbsks.com.102.112.2o7.net providing an override for the 112.2o7.net parent domain, saving apple.com from being broken. All you would have to do is uncomment those lines, as stated, to explicitly block them if you don't care that apple.com gets broken.

Thoughts?

@ScriptTiger what is a child domain? I think you want to use, subdomain. That's the universally used nomenclature.

@StevenBlack, my apologies!!! I would have corrected that in my last comment had I read yours first, but apparently we were both literally typing our comments at the same time and you beat me to it and now my wording looks silly and like I am specifically trying to be annoying in the face of your previous comment, which I am not. Further references will be made to subdomains and not child domains, noted!

ScriptTiger wrote on 2019-08-24 17:09:

ty! RRPZ looks like it won't be nec'y and may overblock (there could be
subdomains of the shared domain which are not malicious).

As referenced in #530 https://github.com/StevenBlack/hosts/issues/530,
technically it would be impossible to wild card with 100% accuracy ...

RRPZ
Similar to the RPZ format except that multiple child domains of a
single blocked parent domain are reduced to a single wild card child
domain of the shared common domain. Please note if multiple child
domains of a common parent domain are present but the parent domain
itself is not, those entries will not be reduced as the parent
itself is not blocked and may have other child domains that
shouldn't be blocked, as well.

That description should directly address your concerns. ...

However, I am, of course, all ears for how this algorithm might be
improved if there are any suggestions to that effect.

when i generate an rpz i usually put in two rules for any known-bad domain:

example.com CNAME .
*.example.com CNAME .

this follows from my belief that the identifier was granted based on
public trust, which has not been met in this case, and it ought to be
revoked, but there's no mechanism for that, so we're doing this.

so if i knew of three bad names which shared a common parent, such as:

foo.example.com
bar.example.com
baz.example.com

i would generate the following RPZ content:

foo.example.com CNAME .
*.foo.example.com CNAME .
bar.example.com CNAME .
*.bar.example.com CNAME .
baz.example.com CNAME .
*.baz.example.com CNAME .

in no case would i connote irresponsibility to the parent, since it
might be an effective TLD and i would never want to block, e.g., *.com
or *.co.uk or *.co.nz or similar.

how frequently are these files updated? i don't want to fetch more
often than
that.

...

You can also find a similar chronicling of events posted as they happen
from my Twitter:

https://twitter.com/scripttiger

ty!

sweet! Again, bind doesn't complain but I get weird results - like
appleglobal.112.2o7.net isn't blocked! I was thinking it was me not
understanding/misconfiguring something, but I'm not so sure now..
It'd be nice if others would test

@vixie https://github.com/vixie could probably answer that much better
than I. Whether it's a legitimate bug or if there's some context we are
not taking into account, I can't think of a better person to consult
with than the imagineer himself.

in the instructions for setting up an RPZ, i advise adding a sentinel so
that you can test to see whether it's detected. if not, the problem is
in your RDNS config, not in your zone generation.

--
P Vixie

when i generate an rpz i usually put in two rules for any known-bad domain:

example.com CNAME .
*.example.com CNAME .

this follows from my belief that the identifier was granted based on
public trust, which has not been met in this case, and it ought to be
revoked, but there's no mechanism for that, so we're doing this.

so if i knew of three bad names which shared a common parent, such as:

foo.example.com
bar.example.com
baz.example.com

i would generate the following RPZ content:

foo.example.com CNAME .
*.foo.example.com CNAME .
bar.example.com CNAME .
*.bar.example.com CNAME .
baz.example.com CNAME .
*.baz.example.com CNAME .

in no case would i connote irresponsibility to the parent, since it
might be an effective TLD and i would never want to block, e.g., *.com
or *.co.uk or *.co.nz or similar.

That's essentially the way the RRPZ file is generated, except that all explicitly blocked domains do NOT automatically merit a wild card, only those with explicitly blocked subdomains are reduced to a wild card. To be honest, I could generate the RRPZ file dramatically faster if I just automatically did a wild card for every blocked domain instead of checking if there are actually any explicitly blocked subdomains.

@ler762, you started this, what do you think? Should I just wild card all domains automatically and prune child domains that are already covered by a wild card? Like I said, I am loving that because it's faster for me to generate because there's less checks. But the question is does that satisfy the needs of all RPZ users? Many domains may not even have any subdomains, making wild cards useless but preventative nonetheless. Would these automatic wild cards just be excessive and blocking too much?

@vixie, please check the VRPZ files:

https://scripttiger.github.io/alts/

If we can all agree on what the best practice should be, I'll remove any competing formats.

Just to fully exhaust all possible versions of this, what about just the standard RPZ format with no subdomain pruning and just automatically adding a second wild card subdomain for every domain listed? Obviously this would increase file size, but I know from experience BIND can handle much larger files than that. Would this also still be usable? In the end it would have a lot of redundant entries, to be sure, but would it still function for everyone? Right now the subdomain pruning itself is rather resource and time intensive due to the substring evaluations conducted on every entry when you add in that I am generating 16 different versions of every available format, along with all the other data I generate not related to @StevenBlack. Processes that just do in-line text manipulations versus actual evaluations are obviously much faster, so if it functions and is as secure as possible, I would much rather go with that, even if it is not pretty and has a lot of redundancies.

@vixie, @ler762, @swapneelp, please check the BRPZ files:

https://scripttiger.github.io/alts/

Keeping in mind that security and functionality are my top priorities, not reducing redundancies only for the sake of reducing file sizes, do these files work for everyone? If so, I will remove the other RPZ versions and this will become the new "RPZ" format available with every release.

Being honest, we all know BIND can handle some massive zone file sizes and these are tiny compared to if you've ever worked with BIND in an enterprise environment. I have an open issue of my own related to this, https://github.com/ScriptTiger/Unified-Hosts-AutoUpdate/issues/25, and, as I said, I yield only to concerns of security or functionality, not file sizes.

I look forward to hearing everyone's thoughts on these topics!

I'd like it deduped but otherwise yes this is what i do and what i hope you do.

VRPZ it is! @ler762 won't have any complaints since it's blocking more. There will only be one RPZ link in the next update and the rest will be removed, so please update your links accordingly if you had anything configured to pull the discontinued demo formats.

Since this thread is out of scope anyway and related to DNS, I thought I'd also throw some questions I have out there to make sure a couple of my other formats are optimized. Please excuse my ignorance in advance, I am actually a Windows user with experience in large enterprises using Windows products that are bound by support contracts and whatnot and I don't get to play with these other applications too often. That being said, I mostly just blindly generate data based on the specs people give me, but I would like to make sure a couple of things are actually correct.

Right now my dnsmasq format uses the following format:

address=/zarget.com/0.0.0.0

Is this the best practice? I have seen some use the following:

address=/.zarget.com/0.0.0.0

Is the above simply to block all subdomains? Should both be used?

Also, my Unbound format currently uses the following:

local-zone: "zarget.com" redirect
local-data: "zarget.com A 0.0.0.0"

Would this block zarget.com and all subdomains automatically with just these entries or is more needed?

Right now RPZ is the first format I have implemented pruning on since the idea of wild cards has popped up several times now and certainly does seem more secure, albeit will assuredly have its own mess of complications and broken websites because of it (i.e. the apple.com example given above). But nonetheless I would like to go ahead with implementing the pruning off of redundant subdomains from all formats used by software that can wild card.

Just finalized everything with the new RPZ format and reworked the de-duping/pruning so you won't have issues with subdomains overriding anything, but apple.com will be broken...I'm assuming no real loss there.

@ler762 mentioned Privoxy, so I threw in a new format for that, as well.

@StevenBlack mentioned Little Snitch earlier, and I actually did visit the links he provided, but maybe somebody could just post a quick sample here just to be sure.

And still looking for feedback for my previous inquiries on dnsmasq and Unbound, if anyone could just throw out their comments on those real quick, that would be much appreciated.

I forgot to link it in the previous comment, but just in case, those formats can once again be found here:

https://scripttiger.github.io/alts/

@ScriptTiger issue comments you create are editable...

wrt the apple.com problem, this question was posed on the dns-firewalls mailing list, with the following answer:

DNS is a tree. The presence of bcbsks.com.102.112.2o7.net also causes com.102.112.2o7.net, 102.112.2o7.net and 112.2o7.net to exist, holding no data (these are called empty non-terminals). Their presence prevents expansion of the wildcard. If you want to block those names too, you will have to do so explicitly.

Kind regards,

Peter van Dijk
PowerDNS.COM BV - https://www.powerdns.com/

the implication for RPZ generators was then explained down-thread:

Actually, to be complete:

bcbsks.com.102.112.2o7.net CNAME rpz-passthru.
*.com.102.112.2o7.net CNAME .
com.102.112.2o7.net CNAME .
*.102.112.2o7.net CNAME .
102.112.2o7.net CNAME .
*.112.2o7.net CNAME .
112.2o7.net CNAME .
*.2o7.net CNAME .
2o7.net CNAME .

Yes, it's a pain, but that is how wildcards work. Someone should write a tool to manage them. :)

--
Bob Harold

this seems legit to me.
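
For anyone who wants to script the list quoted above, here is a minimal Python sketch of such a tool. It assumes one blocked zone and one name under it that should pass through; the function name `rpz_records` and its arguments are made up for illustration and are not part of any existing RPZ tooling.

```python
def rpz_records(blocked_zone, passthru_name):
    """Return RPZ lines that let passthru_name resolve while blocking the
    rest of blocked_zone, including every empty non-terminal in between."""
    zone_labels = blocked_zone.lower().rstrip(".").split(".")
    name_labels = passthru_name.lower().rstrip(".").split(".")

    # The passthru name must sit strictly below the blocked zone.
    if (len(name_labels) <= len(zone_labels)
            or name_labels[-len(zone_labels):] != zone_labels):
        raise ValueError("passthru_name is not below blocked_zone")

    records = [".".join(name_labels) + " CNAME rpz-passthru."]

    # Walk from the parent of the passthru name up to the blocked zone.
    # Each ancestor needs its own wildcard and its own exact-match entry,
    # because the ancestor exists as an empty non-terminal and would
    # otherwise stop expansion of the wildcard above it.
    for i in range(1, len(name_labels) - len(zone_labels) + 1):
        ancestor = ".".join(name_labels[i:])
        records.append("*." + ancestor + " CNAME .")
        records.append(ancestor + " CNAME .")
    return records


if __name__ == "__main__":
    for line in rpz_records("2o7.net", "bcbsks.com.102.112.2o7.net"):
        print(line)
```

Run with the names from the mailing-list example, this prints the same nine records in the same order. Handling several passthru names under one zone would need de-duplication of the shared ancestors, which this sketch does not attempt.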

The new RPZ list won't have any subdomains and I didn't make any compromises with it. @ler762 originally had an issue because subdomains of 2o7.net were being explicitly blocked, but I have corrected this in the latest version so absolutely no subdomains of blocked domains appear and no domain appears more than once on the list.

However, we encounter problems such as the following:

;0.0.0.0 appleglobal.112.2o7.net #breaks apple.com
;0.0.0.0 applestoreus.112.2o7.net #breaks apple.com

Because *.2o7.net blocks all subdomains and apple.com calls for the above subdomains explicitly, apple.com no longer works. It's not so much a problem with RPZ or my lists as a problem with apple.com requiring people to link to untrusted domains.

@ScriptTiger Thanks for putting in the efforts/work. 👍

A suggestion w.r.t. the file blacklist.txt: the file contains ^M (the carriage-return character). It's no big deal. At my end, a simple dos2unix does the job.

I have configured an RPZ zone in BIND with blacklist.txt. Will update if I face any problems.

I just cURLed the list myself and it came out LF and no conversion was needed since cURL does that automatically, or at least the implementation of it that I am using does. What are you using to fetch your lists?

Right now all of the text files on the website are normalized to CRLF to keep a consistent standard. When I first started, most of the users were on Windows, since they were the ones having the most trouble, and I was focused on improving accessibility for a wider range of folks.

I could easily normalize the RPZ files to LF if it's really an issue; I know Linux apps throw a fuss about that sometimes. For the most part, Windows apps can read either LF or CRLF just fine. The problem is visual: in some legacy text editors, LF line endings don't show up and everything gets squished together on one line, which has been an ongoing topic from Windows users both in this repository and in mine.

Like I said, it would be easy enough for me to normalize just those files, but breaking the standard could get confusing down the road in many respects. I use Windows myself due to occupational requirements, I generate the data on a Windows system using Windows scripts, and I would honestly just like to stay true to my original intent there to minimize confusion.

Evening @ScriptTiger, if you would like to provide the RPZ zones in *.txt format, you have to ensure their lines end in LF (\n), as a lot of DNS servers on both Linux and Windows will fail to load the files due to the mix of Windows-only (CRLF) and *nix-standard (LF) line endings.

So I suggest you run your list through either tr or dos2unix
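
If neither tool is at hand, the same conversion is a few lines of Python. This is only a sketch: the helper names are made up for illustration, and blacklist.txt is simply the file name mentioned earlier in the thread.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CRLF pair with a bare LF; other bytes are untouched."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at path in place with LF line endings."""
    target = Path(path)
    # Work on raw bytes so Python's newline translation cannot interfere.
    target.write_bytes(crlf_to_lf(target.read_bytes()))


if __name__ == "__main__":
    convert_file("blacklist.txt")
```

Working on bytes rather than text avoids having to guess the file's encoding and keeps a lone CR or LF exactly as it was.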

Fine, fine. Since you all are ganging up on me about this, DONE! They are LF now. I let git take care of my line endings for me.

Just to finalize this thread, I have released a hosts file to RPZ converter for anyone running Windows that wants to convert their hosts files locally.

https://github.com/ScriptTiger/Hosts-Conversions

If you're wondering why it seems so slow, it's due to the fact that every subdomain must be broken into substrings for each higher domain and checked on an exact match basis in order to accurately identify it as a subdomain and remove it. Partial matching could speed things up considerably, but then it becomes less accurate when you consider many phishing domains stack domain names, such as XXX.com.YYY.com.
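
To illustrate the exact-match idea described above, here is a rough Python sketch. It is not the batch script from the repository, only the same technique under simple assumptions: the input is a plain list of domain names, and the function name `prune_subdomains` is hypothetical.

```python
def prune_subdomains(domains):
    """Drop every name that has a proper parent domain in the same list."""
    # Normalize and de-duplicate; a set gives constant-time exact lookups.
    cleaned = (d.strip().lower().rstrip(".") for d in domains)
    unique = {d for d in cleaned if d}

    kept = []
    for name in sorted(unique):
        labels = name.split(".")
        # Build each higher domain by dropping leading labels one at a time,
        # e.g. a.b.c -> b.c -> c, and test each for an exact match.
        parents = (".".join(labels[i:]) for i in range(1, len(labels)))
        if not any(parent in unique for parent in parents):
            kept.append(name)
    return kept


if __name__ == "__main__":
    sample = [
        "2o7.net",
        "appleglobal.112.2o7.net",
        "yyy.com",
        "xxx.com.yyy.com",
        "notyyy.com",
    ]
    for name in prune_subdomains(sample):
        print(name)
```

With this sample, appleglobal.112.2o7.net and xxx.com.yyy.com are pruned because a parent is listed, while notyyy.com is kept even though it ends in the string "yyy.com". That is the case where partial matching would go wrong. Note that this sketch also checks the top-level domain, so listing a bare TLD would prune everything under it.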

I have, however, made as many performance improvements as possible, such as skipping matching altogether for second-level domains and automatically including them. This may become a problem if someone wants to block an entire top-level domain, but I have never seen anyone go that far before. And you can read the internal script remarks for more specifics if you're interested.

I would imagine if someone wanted to port this to a more efficient text-processing language, such as Perl, performance could be considerably improved again. However, I try to stick to the native Windows tools to lower the technical barriers that many people experience, so for now mine is just written as a standard batch file.
