Pi-hole: whitelisted default sites confusion

Created on 16 Feb 2018 · 38 Comments · Source: pi-hole/pi-hole

How familiar are you with the source code relevant to this issue?:

{1}


Expected behaviour:

{The whitelist listing should be more granular: are entire domains being blocked? Subdomains? More details on the default whitelisted sites would help (I see the link is already present).}

Actual behaviour:

{Searching for a site that I want to confirm is blacklisted is confusing, because it shows up in block lists that are also whitelisted by default.}

I'm using the most up-to-date version of Pi-hole. I saw that 6 domains were added to the whitelist on my recent update. However, I can't tell which domains should be there by default.

The list in the whitelist section is a bit vague. For example, it lists s3.amazonaws.com ... does this mean anything coming from an Amazon S3 bucket will be permitted?

I am attempting to see if localytics is a site that will be blocked, but when I query the site I see this:

pihole -q -all localytics.com
Match found in list.0.raw.githubusercontent.com.domains:
analytics.localytics.com
Match found in list.4.s3.amazonaws.com.domains:
localytics.com
Match found in list.6.hosts-file.net.domains:
analytics.localytics.com
localytics.com

All three of these lists are also on the default whitelist. So, is this site blocked, or isn't it?

Whitelisted domains:
hosts-file.net
mirror1.malwaredomains.com
githubusercontent.com
s3.amazonaws.com
sysctl.org
zeustracker.abuse.ch

Discussion Feature Request

All 38 comments

We automatically whitelist the domains that the blocklists use as a way to make sure that the lists are accessible even if a user accidentally blacklists the source domains. So the 6 domains that you see added are the sources for the default blocklists we use.

The whitelist wins out if a domain is on both the whitelist and a blocklist. I agree that the information is not very clear and that we don't do a good job of explaining the reason behind the madness. If you could, can you give us an idea of what the output should look like to make it more user friendly?

And we probably should just whitelist the exact domains used for blocklist pulling instead of the more general top domains.
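The precedence rule described above (a whitelist entry beats a blocklist entry) can be sketched in a few lines. This is only an illustration under stated assumptions, not Pi-hole's actual code: the function name `is_blocked` is hypothetical, and matching is assumed to be on exact domain names.

```python
def is_blocked(domain, blocklist, whitelist):
    """Return True if the domain should be blocked.

    Hypothetical helper: assumes exact-name matching, and that a
    whitelist entry always wins over a blocklist entry.
    """
    name = domain.lower().rstrip(".")
    if name in whitelist:
        return False  # whitelist takes precedence
    return name in blocklist


blocklist = {"localytics.com", "analytics.localytics.com", "s3.amazonaws.com"}
whitelist = {"s3.amazonaws.com"}

print(is_blocked("localytics.com", blocklist, whitelist))    # on a blocklist only
print(is_blocked("s3.amazonaws.com", blocklist, whitelist))  # on both lists
```

Under these assumptions, whitelisting s3.amazonaws.com has no effect on localytics.com, which is on the blocklist but not the whitelist.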

Right, well, if I were to test this, my first poke would be to host ads on an Amazon S3 bucket: something within the s3.amazonaws.com subdomain, because it looks like that entire subdomain is whitelisted.

As you said, it at least appears that you are whitelisting entire TLDs, or at least entire subdomains of them.

It would be better if you built the whitelists directly into the application so that you didn't have to whitelist them. Or at least whitelist just the specific URLs of the lists you are pulling from and lock them (padlock icon). Clicking the padlock would tell you:

1) that they are defaults
2) why they are there
3) that you can remove them, with a warning

Maybe instead of a padlock, make the default whitelisted domains a different color that stands out.

Right now I can't tell if calls out to localytics are blocked. When I query the domain it appears in lists, but which list is it in? Is that the block list or the whitelist? Querying a URL should tell you that.

I completely agree. I can move this to a feature request on https://discourse.pi-hole.net so we can track it and vote on it, or if you'd like, you can open an FR post there yourself.

Also, when I query that URL, as shown earlier, it's within three of the default whitelisted domains. I'm assuming I can't blacklist it because the whitelist overrides that, or maybe the app prevents you from doing both.

What are my options here?

Remove the block lists hosted on that domain, either from the web interface or from the actual text file, and then run pihole -g. That should remove the domain from the whitelist.

Any idea what directory those lists are pulled down to?

Everything should be in /etc/pihole

So this domain exists in preEventHorizon. Any idea what this is used for?

sudo grep "localytics" *
gravity.list:localip analytics.localytics.com
gravity.list:localip localytics.com
list.0.raw.githubusercontent.com.domains:0.0.0.0 analytics.localytics.com
list.4.s3.amazonaws.com.domains:localytics.com
list.6.hosts-file.net.domains:127.0.0.1 analytics.localytics.com
list.6.hosts-file.net.domains:127.0.0.1 localytics.com
list.preEventHorizon:analytics.localytics.com
list.preEventHorizon:localytics.com
Binary file pihole-FTL.db matches
useruser@raspberrypi:/etc/pihole $

I'm going to jump in here because I think you may be overthinking this.

As Dan mentioned above, there are _some_ domains added to the whitelist by default, but this depends entirely on what source lists you have in /etc/pihole/adlists.list. While we do ship a default list, this is completely bypassable by the end user...

As for your output of pihole -q:

pihole -q -all localytics.com
Match found in list.0.raw.githubusercontent.com.domains:
analytics.localytics.com
Match found in list.4.s3.amazonaws.com.domains:
localytics.com
Match found in list.6.hosts-file.net.domains:
analytics.localytics.com
localytics.com

Nothing in here says that localytics.com is whitelisted; rather, the Match found in... sections point you at the local copy of the blocklist in which the requested domain was found. In this case, s3.amazonaws.com is whitelisted to prevent, for example, the hosts-file.net list from blocking the other list.

For clarity, whitelisting the domain that a source list is served from does not whitelist everything contained on said list :)

preEventHorizon is an amalgamation of all of the individually downloaded lists, de-duplicated, and sorted in preparation for the lists to finally be transformed into gravity.list. gravity.list is the most important list here, as it is the list that dnsmasq is pointed at as a hosts-file. Any domain in this list will be blocked.
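The pipeline described above (source lists, then a combined de-duplicated list, then gravity.list) can be sketched roughly as follows. This is a simplified illustration and not Pi-hole's actual gravity code: the function names and the 192.0.2.10 address are hypothetical placeholders. The sample lines are the ones from the grep output earlier in this thread.

```python
def parse_source_line(line):
    """Extract the domain from one source-list line.

    Handles both hosts-file style ("0.0.0.0 example.com") and bare
    domain lines ("example.com"). Comments and blank lines yield None.
    """
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    return line.split()[-1].lower()


def build_combined(source_lists):
    """Merge all source lists, de-duplicate, and sort (preEventHorizon-like)."""
    domains = set()
    for lines in source_lists:
        for line in lines:
            domain = parse_source_line(line)
            if domain:
                domains.add(domain)
    return sorted(domains)


def to_gravity(domains, pihole_ip):
    """Render hosts-file lines that point each blocked domain at the Pi-hole."""
    return [f"{pihole_ip} {domain}" for domain in domains]


sources = [
    ["0.0.0.0 analytics.localytics.com"],   # list.0 (hosts format)
    ["localytics.com"],                     # list.4 (bare domains)
    ["127.0.0.1 analytics.localytics.com",  # list.6 (hosts format)
     "127.0.0.1 localytics.com"],
]
combined = build_combined(sources)
for entry in to_gravity(combined, "192.0.2.10"):  # placeholder Pi-hole IP
    print(entry)
```

Note that the domain a list is downloaded from never enters this pipeline. Only the lines inside each list do.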

Right now I can't tell if calls out to localytics are blocked.

nslookup localytics.com should return you Pi-hole's IP.
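The same check can be scripted. The sketch below assumes the machine running it uses the Pi-hole as its DNS server and that blocked names resolve to the Pi-hole's own address, as described above. The function name and the example IP are hypothetical, and the resolver is injectable so the logic can be exercised without a network.

```python
import socket


def resolves_to_pihole(domain, pihole_ip, resolver=socket.gethostbyname):
    """Return True if the domain resolves to the given Pi-hole address.

    Uses the system resolver by default, so the result only means
    something when this host's DNS is pointed at the Pi-hole.
    """
    try:
        return resolver(domain) == pihole_ip
    except OSError:  # socket.gaierror (lookup failure) is a subclass
        return False


# Example call (replace the placeholder with your Pi-hole's IP):
# resolves_to_pihole("localytics.com", "192.0.2.10")
```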

It would be better if you built the whitelists directly into the application so that you didn't have to whitelist them

This won't happen, I'm afraid. If we start hard-coding domains to be whitelisted into the application, then there goes a lot of our credibility. What's to say we won't whitelist other domains for payment from ad companies? I mean, that would not happen in any case, but for transparency, we will never whitelist any domain by default.

sudo grep "localytics" *
gravity.list:localip analytics.localytics.com
gravity.list:localip localytics.com
list.0.raw.githubusercontent.com.domains:0.0.0.0 analytics.localytics.com
list.4.s3.amazonaws.com.domains:localytics.com
list.6.hosts-file.net.domains:127.0.0.1 analytics.localytics.com
list.6.hosts-file.net.domains:127.0.0.1 localytics.com
list.preEventHorizon:analytics.localytics.com
list.preEventHorizon:localytics.com
Binary file pihole-FTL.db matches
useruser@raspberrypi:/etc/pihole $

An explanation of those files:

|file|description|
|:---|:------------|
|gravity.list| The master blocklist. Anything contained here is blocked|
|list.0.raw.githubusercontent.com.domains| Local copy of a source list|
|list.4.s3.amazonaws.com.domains| Local copy of a source list|
|list.6.hosts-file.net.domains| Local copy of a source list|
|list.preEventHorizon| ALL local copies of source lists combined and de-duplicated|
|pihole-FTL.db| Essentially the log file, databased so that the website can serve stats|

For clarity, whitelisting the domain that a source list is served from does not whitelist everything contained on said list :)

If you have whitelisted s3.amazonaws.com, then does this whitelist every URL coming from that subdomain? Say, for example, the following:

s3.amazonaws.com/mybucket/evilads.jpg

I understand that you are whitelisting that domain because that is where the list originates from. Why not just whitelist the exact domain instead of the entire subdomain?

why not just whitelist the exact domain instead of the entire subdomain?
We are whitelisting the exact domain :)

##Disconnect.me Tracking
https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt

##Disconnect.me Ads
https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt

The only part of those URLs that a DNS server can feasibly know about is s3.amazonaws.com. It doesn't care whether it's http or https, and it also doesn't care/know about anything after the first /.
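To make the point concrete, here is a small sketch using Python's standard `urllib.parse` to pull out the only piece of a list URL that is ever sent in a DNS query. The helper name `dns_name` is made up for illustration.

```python
from urllib.parse import urlsplit


def dns_name(url):
    """Return the hostname of a URL: the only part a DNS server is asked about."""
    return urlsplit(url).hostname


tracking = "https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt"
ads = "https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt"

print(dns_name(tracking))  # → s3.amazonaws.com
print(dns_name(ads))       # → s3.amazonaws.com
```

The scheme, bucket path, and file name are all discarded, so both lists (and anything else served under that hostname) look identical to a DNS-level blocker.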

If you would rather not have s3.amazonaws.com on your whitelist, you have a couple of choices: re-host those lists on a domain you control (and therefore know that no ads could be served from it) and replace them in your adlists.list file (e.g. https://yourdomain.com/lists.disconnect.me/simple_ad.txt), or download the list, store it on your local file system, and add it to your adlists.list file as file:///path/to/the/list.txt

@PromoFaux dope, yeah you're right. Thanks for pointing that out. That's a huge problem ... presumably something Pi-hole can't address? Has this discussion come up before?

@PromoFaux the s3.amazonaws.com entry was added, I think, by default. I didn't add it manually.

@PromoFaux I know this means more work and load on the pihole folks, but is there anything wrong with these lists living on a domain pihole owns and controls that is only used for the lists?

There may be copyright/licensing issues involved with this

Yeah, it's added because it's hosting a list in your adlists.list. As mentioned above, we whitelist those domains to prevent them from accidentally being blacklisted by another list. But personally I think we should maybe just not whitelist them automatically and remove the "politics" of the situation altogether. Ideally we would ship with zero default lists, leaving all responsibility for blocking to the user, but this does create a barrier to entry for those less experienced.

@PromoFaux are you saying ideally the user should build a blacklist or adlist to block manually and not pull it from an external source (thus making you whitelist that source) ?

No no, there are many lists out there which any user can add to their Pi-hole (example, see @WaLLy3K 's collection of lists https://wally3k.github.io/)

The thing is, the default adlists.list file has always been just a suggestion. Users are free to disable/remove any list they choose. By having some in there by default, those that have never experienced using linux before have something that works out of the box.

Of course, if you want to build a blacklist to block manually, there is nothing stopping you from doing that either!

We're having some discussion internally about whether or not we should in fact whitelist the source lists. As stated above, @DL6ER has even coded up the change already (read: removed the section that does it!)

What I meant by the ideally part was that adlists.list comes blank by default. The barrier to entry is people installing and not understanding why they are showing 0 blocked domains on first install. But then, that is something we can work around with proper documentation

Is there any way you could do a short-lived bypass, rather than just whitelisting the domain? E.g. requests made by the list updater are sent to the upstream DNS (unless explicitly blacklisted by the user?), but all future requests made to those domains are handled normally.

We've actually just gone ahead and removed the code that whitelists the source lists.

Thanks for the update. Can I add that, and please correct me if I'm wrong, I viewed this as a possible means to bypass the ad-blocking feature and not merely a confusion due to rhetoric.

If all of s3 is whitelisted because the originating blacklist lives there - then I could host a site that pulls ads from an s3 bucket and it would pass through because of that whitelist.

I agree with @PromoFaux that not subscribing to any blacklists by default (and also not whitelisting any domains by default) keeps the situation more 'clean' and less political/opinionated. But the issue of new/not-so-savvy users starting with a Pi-hole that does zero blocking doesn't seem too great either.

If the default blacklist subscriptions are eventually removed, perhaps a good option would be to provide a choice as part of the install process. E.g. a multiple-choice screen where the user can choose to start with 'default blacklists' or 'no blacklists'. That way:

  • there is a clear choice
  • starting with a basic set of blacklists is easy for new/beginner users
  • more knowledgeable users can choose to start from scratch

On the 'whitelisting blacklist location domains' issue, could this potentially be an option in the web UI (off by default)?

Appreciate that 'more choices and settings!' is the common go-to solution and easier said than done, but I think it would cover all use cases, while still catering for new users.

@IanOliver It's a WIP, but I started something along those lines a while back: https://github.com/pi-hole/pi-hole/pull/2015

Oh, and as of 3.3.1, we no longer whitelist the adlist sources by default: https://github.com/pi-hole/pi-hole/pull/1973

@PromoFaux That's great to hear - hope you can get it implemented!

@PromoFaux I'll be whitelisting my blacklist sources myself anyway, but didn't you have an issue a while back where certain blacklists were blocking other blacklist sources?

It's always going to be a possibility, but we had a lot of internal conversation about it and you pretty much hit the nail on the head with:

keeps the situation more 'clean' and less political/opinionated.

One possibility is that, when downloading the blacklists' content (the hosts files and so on), Pi-hole ignores its own blacklists and whitelists. Then you would not need to worry about whitelisting common domains like s3, which removes the maintenance burden. The only consideration would be preventing misuse of this as a 'backdoor' around Pi-hole's blocking.

And in regard to default blacklists, one might offer the following options:
Level 1: Malicious domains (such as malware infected, phishing sites)
Level 2: L1 + Tracking and Privacy violating domains (such as Google Analytics)
Level 3: L2 + Ad serving domains

I have moved to Level 1 entirely for the moment, because more and more is getting blocked, which cripples CDNs and websites to such an extent that they no longer work. The whitelist was growing a bit out of control, and the trial-and-error of adding to and removing from the whitelist to get things working was becoming a pain.

IMO, some hosts files are too stringent. I feel that nothing is inherently wrong with ads; there is something wrong with tracking.

Anyways, hope that my contributions with my comment can help. If not, no harm is done either :).

@PromoFaux care to elaborate how this works? Are the block lists obtained outside of the blocking system now?

@cardassian-tailor, nothing is different other than that the host domains of the downloaded block lists are no longer whitelisted. If one list blacklists another, then it will be on the user to whitelist it, or to contact the offending blacklist owner to ask them why.

@jacobsalmela No need, it's already implemented https://github.com/pi-hole/pi-hole/pull/1973
