Hosts: Consider adding "The Quantum Alpha" ad list

Created on 23 Dec 2020 · 11 Comments · Source: StevenBlack/hosts

Consider adding The Quantum Alpha ad list. That list is created using an AI web crawler, and it seems more extensive than any ad list I've seen so far.
The downside, however, is that since this list is created by AI, it could contain some false positives.

discussion ¯\_(ツ)¯

All 11 comments

Hello! Thank you for opening your first issue in this repo. It’s people like you who make these host files better!

I would like to know what 'AI' logic is happening here. It seems to me that this project is another large list-of-lists except without any credit to the original list creators:

https://blocklist-tools.developerdan.com/entries/search?q=block-test.developerdan.com

Without much history yet, it is hard to say for sure which lists are being used. They even have a list that claims to block YouTube ads, which I'm fairly confident isn't possible to do (see #1017).

'EnergizedProtection' is a list-of-lists project, as is the OISD list. I believe 'blackbook' is a legitimate source list as far as hosts lists go, but it does source from various services like URLhaus. Since that domain showed up in the blackbook list first, and then in the other lists the following day, it makes sense that "The Quantum Alpha" list is including one of those other lists as a source. Looking at the diff from blackbook and then the diff from The Quantum Alpha, you can see that the same domains were added (along with others). The one exception is bohler-edelstahl-at[dot]com, which was already in the list (perhaps directly from the URLhaus: Malicious URL blocklist project).

Again, without more history it's hard to say for sure which lists are being used, but it's pretty clear that this list is using other people's work without giving them credit, and violating their licenses. That isn't really a surprise: no one just creates an 800,000 domain list in a matter of weeks. It appears the 'AI' in this case is just a way to take credit for other people's hard work.
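
For anyone wanting to repeat that kind of diff comparison, here is a minimal sketch in Python. It assumes you already have two snapshots of each list as plain collections of domains. The function names and the sample domains are hypothetical and are not taken from any of the tools mentioned in this thread.

```python
def added(old, new):
    """Domains present in the newer snapshot but not in the older one."""
    return set(new) - set(old)


def shared_additions(upstream_old, upstream_new, suspect_old, suspect_new):
    """Return the additions common to both lists, plus the share of the
    upstream additions that also appear in the suspect list's additions."""
    upstream_added = added(upstream_old, upstream_new)
    suspect_added = added(suspect_old, suspect_new)
    shared = upstream_added & suspect_added
    ratio = len(shared) / len(upstream_added) if upstream_added else 0.0
    return shared, ratio


# Placeholder data: two snapshots of an upstream list and of a suspect list.
upstream_day1 = {"a.example.com"}
upstream_day2 = {"a.example.com", "b.example.com", "c.example.com"}
suspect_day2 = {"a.example.com", "z.example.net"}
suspect_day3 = {"a.example.com", "z.example.net", "b.example.com", "c.example.com", "y.example.org"}

shared, ratio = shared_additions(upstream_day1, upstream_day2, suspect_day2, suspect_day3)
print(sorted(shared), ratio)
```

A ratio near 1.0, repeated over several releases, would suggest one list is pulling from the other, but a single overlapping diff is only a hint since both lists could share a third source.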

Oh. This is really sad to hear, considering that this project made it to the top page of HN today.
I would really hate it if this project were using other people's work without crediting them.

Just for fun...

$ ./ghosts -c https://gitlab.com/The_Quantum_Alpha/the-quantum-ad-list/-/raw/master/For%20hosts%20file/The_Quantum_Ad-List.txt
----------------------------------------
Base hosts file summary:
----------------------------------------
Location: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Domains: 58,787
Bytes: 1.8 MB
----------------------------------------
----------------------------------------
Compared hosts file summary:
----------------------------------------
Location: https://gitlab.com/The_Quantum_Alpha/the-quantum-ad-list/-/raw/master/For%20hosts%20file/The_Quantum_Ad-List.txt
Domains: 789,804
Bytes: 23 MB
----------------------------------------
Intersection: 58,751 domains

This is just stupid. We are most definitely not going to add this. Simple reason: there's no way anyone can curate 789,804 domains. We do things differently here. We review every diff from all our sources for each release.

There is no way adding 731k domains to our 58k list makes us a better list.
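
The comparison above boils down to set arithmetic on domain names. Below is a rough Python sketch of that idea. It is not how `ghosts` is implemented, and `parse_hosts` and `compare` are hypothetical helper names. Assuming neither list contains duplicates, the figures quoted above give 789,804 - 58,751 = 731,053 domains that appear only in the suggested list, which is where "731k" comes from.

```python
def parse_hosts(text):
    """Return the set of domain names found in hosts-format text."""
    domains = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        # Typical lines look like "0.0.0.0 example.com"; accept bare domains too.
        # Note: this simple parser also counts entries such as "localhost".
        names = parts[1:] if len(parts) > 1 else parts
        domains.update(name.lower() for name in names)
    return domains


def compare(base_text, other_text):
    """Summarise how two hosts files overlap."""
    base, other = parse_hosts(base_text), parse_hosts(other_text)
    return {
        "base": len(base),
        "other": len(other),
        "intersection": len(base & other),
        "only_in_other": len(other - base),
    }


# Tiny placeholder inputs standing in for the two downloaded files.
base_text = "0.0.0.0 ads.example.com\n0.0.0.0 tracker.example.net\n"
other_text = (
    "0.0.0.0 ads.example.com\n"
    "0.0.0.0 api.example.org  # inline comment\n"
    "0.0.0.0 x.example.stream\n"
)
print(compare(base_text, other_text))
```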

Here's more: a TLD comparison. Our list comes first (truncated at a tally of 50), followed by the suggested list (truncated at a tally of 500).

$ ./ghosts -tld 
----------------------------------------
Base hosts file summary:
----------------------------------------
Location: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Domains: 58,787
Bytes: 1.8 MB
TLD tally:
   com: 30,483
   net: 7,116
   pl: 5,989
   eu: 1,204
   live: 1,068
   info: 843
   jp: 834
   org: 809
   vn: 802
   ru: 734
   xyz: 513
   de: 464
   io: 392
   uk: 358
   nl: 323
   cn: 310
   at: 301
   co: 297
   online: 295
   fr: 256
   site: 256
   biz: 229
   in: 199
   us: 194
   me: 172
   club: 171
   tv: 164
   mobi: 124
   tk: 124
   br: 123
   it: 119
   name: 114
   top: 112
   pro: 112
   cz: 105
   ca: 104
   cc: 98
   space: 84
   pw: 83
   be: 77
   icu: 73
   life: 71
   ro: 68
   ir: 66
   hu: 64
   asia: 62
   es: 58
   su: 56
   website: 54
   link: 53
   fun: 52


$ ./ghosts -tld -m https://gitlab.com/The_Quantum_Alpha/the-quantum-ad-list/-/raw/master/For%20hosts%20file/The_Quantum_Ad-List.txt
----------------------------------------
Base hosts file summary:
----------------------------------------
Location: https://gitlab.com/The_Quantum_Alpha/the-quantum-ad-list/-/raw/master/For%20hosts%20file/The_Quantum_Ad-List.txt
Domains: 789,804
Bytes: 23 MB
TLD tally:
   com: 356,741
   net: 63,535
   org: 37,830
   stream: 27,210
   ru: 19,258
   tk: 14,436
   pl: 13,898
   icu: 13,503
   info: 12,100
   top: 11,720
   br: 8,549
   de: 8,226
   xyz: 7,124
   cn: 7,119
   review: 7,009
   uk: 6,795
   bid: 6,554
   download: 6,144
   win: 5,776
   cc: 5,017
   us: 4,945
   in: 4,590
   club: 4,537
   io: 4,313
   pw: 4,226
   eu: 3,948
   fr: 3,825
   jp: 3,703
   au: 3,609
   nl: 3,598
   it: 3,411
   co: 3,300
   loan: 3,140
   site: 2,989
   biz: 2,930
   vn: 2,661
   gdn: 2,639
   ca: 2,552
   hu: 2,530
   online: 2,512
   live: 2,502
   ml: 2,479
   es: 2,048
   trade: 2,038
   me: 1,990
   pro: 1,981
   cf: 1,860
   ga: 1,848
   ua: 1,808
   ir: 1,774
   website: 1,733
   tv: 1,692
   ro: 1,689
   id: 1,683
   date: 1,679
   za: 1,528
   cz: 1,465
   ltd: 1,454
   cl: 1,397
   ch: 1,383
   at: 1,371
   se: 1,370
   be: 1,338
   pt: 1,304
   gq: 1,223
   space: 1,208
   mx: 1,200
   tech: 1,087
   tr: 1,078
   ar: 1,058
   dk: 1,042
   kr: 1,004
   mn: 912
   su: 753
   science: 750
   ws: 740
   life: 693
   gr: 673
   my: 634
   fun: 581
   xn--p1ai: 565
   sk: 541
   tw: 540
   il: 527
   pk: 524
   mobi: 504

I've always thought we are under-listing the .cn and .ru TLDs, and that's a clear weakness.

But 27k .stream domains, really? More than .cn and .ru combined? How is this legit?

Not buying it.
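
A TLD tally like the ones above can be approximated in a few lines of Python. This is only a sketch: `tld_tally` and its `minimum` parameter are hypothetical stand-ins for the truncation thresholds mentioned earlier, not the actual `ghosts` code. From the numbers shown, .stream has 27,210 entries while .ru and .cn together have 19,258 + 7,119 = 26,377.

```python
from collections import Counter


def tld_tally(domains, minimum=1):
    """Count domains per top-level domain, most common first.

    Entries without a dot (for example "localhost") are skipped, and
    TLDs with fewer than `minimum` entries are dropped from the result.
    """
    counts = Counter(d.rsplit(".", 1)[-1].lower() for d in domains if "." in d)
    return [(tld, n) for tld, n in counts.most_common() if n >= minimum]


# Placeholder domains, not taken from either list.
sample = ["a.example.com", "b.example.com", "c.example.stream", "localhost"]
print(tld_tally(sample))
print(tld_tally(sample, minimum=2))
```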

This is just a stupid list. Take coinbase for example. I'm not a fan of Coinbase, but let's see.

$ cat ~/temp/z.txt | grep coinbase

0.0.0.0 airdrop-coinbase.com
0.0.0.0 api.coinbase.com                <---------
0.0.0.0 api.exchange.coinbase.com
0.0.0.0 api.sandbox.coinbase.com
0.0.0.0 assets.coinbase.com
0.0.0.0 beta.coinbase.com
0.0.0.0 bittip.coinbase.com
0.0.0.0 buy.coinbase.com
0.0.0.0 coinbase-ca.com
0.0.0.0 coinbase-promo.info
0.0.0.0 coinbase-us1.info
0.0.0.0 coinbase.aa-gg.com
0.0.0.0 coinbase.com                          <-------
0.0.0.0 coinbase.com.eslogin.co
0.0.0.0 coinbase.gift
0.0.0.0 coinbaseboggether.tumblr.com
0.0.0.0 coinbasenews.co.uk
0.0.0.0 coinbasepro-giveaway.com
0.0.0.0 coinbasepromo.com
0.0.0.0 coinbasespromo.tumblr.com
0.0.0.0 coinbasewin.com
0.0.0.0 community.coinbase.com
0.0.0.0 custody.coinbase.com
0.0.0.0 developers.coinbase.com
0.0.0.0 docs.exchange.coinbase.com
0.0.0.0 eio-feed.exchange.coinbase.com
0.0.0.0 engineering.coinbase.com
0.0.0.0 ent-api.sandbox.coinbase.com
0.0.0.0 ex-notify.coinbase.com
0.0.0.0 exceptions.coinbase.com
0.0.0.0 exchange.coinbase.com
0.0.0.0 feed.exchange.coinbase.com
0.0.0.0 filetransfer.coinbase.com
0.0.0.0 fix.exchange.coinbase.com
0.0.0.0 icoinbase.com
0.0.0.0 images.coinbase.com
0.0.0.0 login.coinbase.com
0.0.0.0 promo-coinbase.com
0.0.0.0 public.sandbox.exchange.coinbase.com
0.0.0.0 sandbox.coinbase.com
0.0.0.0 sandbox.exchange.coinbase.com
0.0.0.0 staging.community.coinbase.com
0.0.0.0 status.coinbase.com
0.0.0.0 store.coinbase.com
0.0.0.0 support.coinbase.com
0.0.0.0 ws-feed.exchange.coinbase.com
0.0.0.0 ws.coinbase.com
0.0.0.0 ws.sandbox.coinbase.com
0.0.0.0 www.api.sandbox.coinbase.com
0.0.0.0 www.beta.coinbase.com
0.0.0.0 www.bittip.coinbase.com
0.0.0.0 www.blog.coinbase.com
0.0.0.0 www.coinbase-drop.com
0.0.0.0 www.coinbase-gift.com
0.0.0.0 www.coinbase.com             <---------
0.0.0.0 www.community.coinbase.com
0.0.0.0 www.custody.coinbase.com
0.0.0.0 www.engineering.coinbase.com
0.0.0.0 www.ent-api.sandbox.coinbase.com
0.0.0.0 www.ex-notify.coinbase.com
0.0.0.0 www.filetransfer.coinbase.com
0.0.0.0 www.login.coinbase.com
0.0.0.0 www.public.sandbox.exchange.coinbase.com
0.0.0.0 www.sandbox.coinbase.com
0.0.0.0 www.sandbox.exchange.coinbase.com
0.0.0.0 www.store.coinbase.com
0.0.0.0 www.ws.sandbox.coinbase.com
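
The same check can be scripted so that it separates lookalike domains (which may well deserve blocking) from entries that block the real site. The sketch below is in Python, and the helper name `blocks_legit_site` is hypothetical. It treats an entry as a likely false positive when it is the legitimate domain itself or one of its subdomains.

```python
def blocks_legit_site(entries, legit_domain):
    """Return entries that are the legitimate domain or a subdomain of it."""
    legit = legit_domain.lower()
    suffix = "." + legit
    lowered = (entry.lower() for entry in entries)
    return sorted(e for e in lowered if e == legit or e.endswith(suffix))


# A handful of entries copied from the grep output above.
entries = [
    "airdrop-coinbase.com",
    "api.coinbase.com",
    "coinbase.com",
    "coinbase.com.eslogin.co",
    "icoinbase.com",
    "www.coinbase.com",
]
print(blocks_legit_site(entries, "coinbase.com"))
```

Matching on the ".coinbase.com" suffix rather than on the substring "coinbase" is what keeps lookalikes such as airdrop-coinbase.com and coinbase.com.eslogin.co out of the result.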

Fantastic point of reference. Those domains were in a single version of the 'CoinBlockerLists', and in a single version of your unified list. Somehow they got picked up by the poor-quality 'Block List Project: Crypto' list, and now they are also showing up in this Quantum list. I think that makes the 'Block List Project' a highly likely source for this Quantum list, but who knows really.

Source: https://blocklist-tools.developerdan.com/entries/search?q=www.login.coinbase.com

Thanks, Dan @lightswitch05, great sleuthing there.

Closing. Thanks for the suggestion Max @MAX10541 but I think I'll pass.
