uBlock 🚀 - Placeholder issue for discussion of issues in ABP/AdGuard issue tracker -- and possible solutions

Regarding issue https://issues.adblockplus.org/ticket/2278:

@kzar, @ameshkov

Being able to have a token for regex-based filters would definitely help performance. However, trying to programmatically extract a token from a regex-based filter sounds scary to me: there is too much risk of extracting erroneous tokens.

Suggestion: create a new filter option, token=[...], which filter creators can use to assign a predefined token to the filter. The creator of a filter is best placed to figure out whether a token exists, and which token will work for storing the filter internally.

For example, this filter in EasyList:

/\.filenuke\.com/.*[a-zA-Z0-9]{4}/$script

Could simply have been written by the filter creator as:

/\.filenuke\.com/.*[a-zA-Z0-9]{4}/$script,token=filenuke
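For illustration only, here is a minimal sketch of how a parser might pull such an option out of a regex-based filter. The token= option is just the proposal above, and the helper name parseRegexFilter and the shape of its return value are assumptions made for this example, not code from any blocker.

```javascript
// Hypothetical helper: `token=` is only the option proposed in this thread,
// it is not part of any shipped filter syntax.
function parseRegexFilter(line) {
    // The regex body sits between the first '/' and the last '/' that is
    // followed either by '$options' or by the end of the line.
    var m = /^\/(.+)\/(?:\$(.*))?$/.exec(line);
    if (m === null) {
        return null; // not a regex-based filter
    }
    var options = m[2] ? m[2].split(',') : [];
    var token = null;
    var remaining = [];
    options.forEach(function(option) {
        if (option.indexOf('token=') === 0) {
            token = option.slice('token='.length);
        } else {
            remaining.push(option);
        }
    });
    return { regex: new RegExp(m[1]), token: token, options: remaining };
}

var parsed = parseRegexFilter('/\\.filenuke\\.com/.*[a-zA-Z0-9]{4}/$script,token=filenuke');
console.log(parsed.token, parsed.options);
```

A filter without the option simply comes back with token set to null, which is the case where a blocker would fall back to whatever it does today.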

gorhill on 23 Aug 2016

Hey guys! I was thinking about solving this issue a while ago. Even tried to implement a simple token-extracting algorithm. I will post my ideas a bit later though.

Meanwhile, here is a list of known regexp rules:

/^(?![a-z]+\:\/+([^\/\:]+\.(il\|com\|net)\|[\.0-9]+\|([^\/\:\.]+\.)*(spot\.im\|vine\.co\|periscope\.tv\|vid\.me\|mako\.tools\|minidom\.org\|jquerymin\.org\|logidea\.info\|zoomanalytics\.co\|firstimpression\.io))\.?([\/\:]\|$))^[^\/\:\.]+\:\/+[^\/\:\.]/$third-party,domain=mako.co.il	EasyList Hebrew	https://github.com/AdBlockPlusIsrael/EasyListHebrew
/^(?![a-z]+\:\/+([^\/\:\.]+\.)*(google\|icdn\|auto\|sport5\|smartair\|mysupermarket\|blms\|linicom)\.co\.il\.?([\/\:]\|$))^[a-z]+\:\/+[^\/\:]+\.il\.?([\/\:]\|$)/$third-party,domain=mako.co.il	EasyList Hebrew	https://github.com/AdBlockPlusIsrael/EasyListHebrew
/^[a-z]+\:\/+[\.0-9]+([\/\:]\|$)/$image,media,object,script,stylesheet,subdocument,third-party,domain=mako.co.il	EasyList Hebrew	https://github.com/AdBlockPlusIsrael/EasyListHebrew
/^(?![a-z]+\:\/+([^\/\:\.]+\.)*(fbcdn\|cloudfront\|facebook\|akamaihd\|ctedgecdn\|2mdn\|uploaditnow\|edgesuite\|doubleclick\|dmcdn\|slideshare\|advsnx)\.net\.?([\/\:]\|$))^[a-z]+\:\/+[^\/\:]+\.net\.?([\/\:]\|$)/$third-party,domain=mako.co.il	EasyList Hebrew	https://github.com/AdBlockPlusIsrael/EasyListHebrew
/^(?![a-z]+\:\/+([^\/\:\.]+\.)*(google\|facebook\|twitter\|instagram\|youtube\|jquery\|googleapis\|vicomi\|twimg\|cdninstagram\|pinterest\|pinimg\|giphy\|playbuzz\|outbrain\|ytimg\|amazonaws\|cloudflare\|gstatic\|sniperm\|dinovich\|shortaudition\|linkedin\|opinionstage\|vimeo\|vimeocdn\|dailymotion\|flickr\|staticflickr\|tumblr\|soundcloud\|scribd\|syteapi\|addthis\|addthisedge\|reddit\|disqus\|disquscdn\|apester\|qmerce\|taboola\|taboolasyndication\|google-analytics\|googletagservices\|googletagmanager\|googleadservices\|googlesyndication\|h-cdn\|scorecardresearch\|serving-sys\|bootstrapcdn\|tiviclick\|ruchlis\|hotjar\|flx1\|mxpnl\|themarker\|adnxs\|conduit\|fourtips\|makojs)\.com\.?([\/\:]\|$))^[a-z]+\:\/+[^\/\:]+\.com\.?([\/\:]\|$)/$third-party,domain=mako.co.il	EasyList Hebrew	https://github.com/AdBlockPlusIsrael/EasyListHebrew
/quang%20cao/	ABPVN List	http://abpvn.com/
/YanAds/	ABPVN List	http://abpvn.com/
/www/images/	ABPVN List	http://abpvn.com/
/ads-pic/	Adblock-Persian list	http://ideone.com/K452p
/eshop-eca/	Adblock-Persian list	http://ideone.com/K452p
/eshop98/	Adblock-Persian list	http://ideone.com/K452p
/402x192/	Adblock-Persian list	http://ideone.com/K452p
/^http://m\.autohome\.com\.cn\/[a-z0-9]{32}\//$domain=m.autohome.com.cn	ChinaList+EasyList	http://www.adtchrome.com/extension/adt-chinalist-easylist.html
/^http://www\.tt1069\.com\/(?!bbs)/$script,domain=tt1069.com	ChinaList+EasyList	http://www.adtchrome.com/extension/adt-chinalist-easylist.html
/^http://www\.iqiyi\.com\/common\/flashplayer\/[0-9]{8}/[0-9a-z]{32}.swf/$domain=iqiyi.com	ChinaList+EasyList	http://www.adtchrome.com/extension/adt-chinalist-easylist.html
/^http://www\.dnvod\.eu.*?\/[a-z0-9]{9,}\.swf/$domain=dnvod.eu	ChinaList+EasyList	http://www.adtchrome.com/extension/adt-chinalist-easylist.html
/NetInsight/text/$domain=~ads.pandora.tv\|~opt.mgoon.com	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/omniture/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/NetInsight/html/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/cgi-bin/conad.fcgi/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/acecounter/$domain=~acecounter.com	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/adNdsoft/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/wisenut/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/ad-pay/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/wp-content/plugins/google-analyticator/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/realclick/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/max-banner-ads-pro/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/RealMedia/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/bannerManager/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/autoPage/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/overture/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/wiseAd/euckr/inc/$subdocument	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/NetInsight/js/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/scrap_logs/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/banner_event/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/images/adpresso/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/AdBanner/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/cdsbData_gal/bannerFile/$image,domain=mybogo.net\|zipbogo.net	List-KR	https://list-kr.github.io/
/nad/media/	List-KR	https://list-kr.github.io/
/ajrotator/	Filtros Nauscopicos	http://nauscopio.nireblog.com/cat/filtrado
/:\/\/(?!biuropodrozy)(?!liveblog)(?!relacje)(?!opinie)(?!zalacznik)(?!magazyn)(?!newsletter)(?!rodzinnawycieczka)(?!doladowania)(?!fantasyliga)(?!funduszeue)(?!imperiumstylu)(?!kodyrabatowe)(?!ogloszenia)(?!orangekinoletnie)(?!rekrutacja)(?!rycerzeiksiezniczki)(?!speedwaymanager)(?!sportowefakty)(?!sportowybar)(?!talesofmagic)(?!ubezpieczenia)(?!warofdragons)(?!wiadomosci)[a-zA-Z0-9]{10,}\.wp.pl\//	Adblock polskie reguły	http://certyficate.it/polski-filtr-adblock/
/:\/\/(?!biuropodrozy)(?!liveblog)(?!relacje)(?!opinie)(?!zalacznik)(?!magazyn)(?!newsletter)(?!facet)(?!wyleczto)(?!kuchnia)(?!film)(?!moto)(?!gwiazdy)(?!teleshow)(?!finanse)(?!kobieta)(?!dom)(?!pogoda)(?!tech)(?!historia)(?!czat)(?!ksiazki)(?!gryonline)(?!hotele)(?!narty)(?!samoloty)(?!wycieczki)(?!hosting)(?!irlandia)(?!multikurs)(?!casino)(?!foto)(?!tech)(?!www)(?!stg)(?!doladowania)(?!fantasyliga)(?!funduszeue)(?!imperiumstylu)(?!kodyrabatowe)(?!alefolwark)(?!angielski)(?!arenamody)(?!beniamin)(?!bon)(?!bsg)(?!casino)(?!diety)(?!dlaprasy)(?!dlugi)(?!doladowania)(?!dom)(?!dysk)(?!ebiznes)(?!ebooki)(?!empire)(?!fantasyliga)(?!film)(?!fundusze)(?!ogloszenia)(?!orangekinoletnie)(?!rekrutacja)(?!rycerzeiksiezniczki)(?!speedwaymanager)(?!sportowefakty)(?!sportowybar)(?!talesofmagic)(?!ubezpieczenia)(?!warofdragons)(?!wiadomosci)(?!gazetki)(?!gry)(?!horoskop)(?!kalendarz)(?!katalog)(?!khanwars)(?!komiks)(?!konflikty)(?!kontakty)(?!korsarze)(?!kultura)(?!mini)(?!mmho)(?!mobilna)(?!morizon)(?!moto)(?!muzyka)(?!narty)(?!naryby)(?!onas)(?!orangekinoletnie)(?!piraci)(?!poczta)(?!pomoc)(?!praca)(?!profil)(?!programtv)(?!pytamy)(?!rekrutacja)(?!rss)(?!rtvagd)(?!rycerzeiksiezniczki)(?!smeet)(?!speedwaymanager)(?!szkola)(?!szukaj)(?!tech)(?!teleshow)(?!triviador)(?!turystyka)(?!twojeip)(?!ulubiency)(?!warodfragons)(?!wycieczki)(?!zdrowie)(?!zoomumba)(?!topnews)(?!erotyka)(?!dzieci)(?!fitness)(?!gielda)(?!finansomat)(?!biznes)(?!sport)[a-zA-Z0-9]{4,9}\.wp.pl\//	Adblock polskie reguły	http://certyficate.it/polski-filtr-adblock/
/commoncfm/images/microsoftxboxone/$domain=buffed.de\|gamesaktuell.de\|gamezone.de\|pcgames.de\|videogameszone.de	German filter	http://adguard.com/filters.html#german
/[a-z0-9]{32,}/$third-party,domain=picshare.ru	Russian filter	http://adguard.com/filters.html#russian
/[a-zA-Z0-9]{35,}/$script,third-party,domain=bigtorrent.org\|bigtorrents.ru\|cashtube.ru\|cmexota.ru\|dreamprogs.net\|dsvload.net\|ecsebo.ru\|enotbox.com\|faspiic.ru\|imagefile.org\|imgpay.ru\|kordonivkakino.net\|mcdownloads.ru\|mega-pic.org\|odnopolchane.net\|payforpic.ru\|pic4cash.ru\|pic4you.ru\|picclick.ru\|picforall.ru\|pics-money.ru\|pirat-pic.ru\|planeta51.com\|pronpic.org\|prons.org\|q32.ru\|rustorrents.net\|santikov.net\|sharezones.biz\|torrent-pirat.com\|unionpeer.org\|uraltrack.net\|viewy.ru\|xhamster-pic.com	Russian filter	http://adguard.com/filters.html#russian
/http:\/\/rustorka.com\/[a-z]+\.js/$domain=rustorka.com	Russian filter	http://adguard.com/filters.html#russian
/http:\/\/rustorka.com\/[a-z0-9]+\.(jpg\|gif)/$image,domain=rustorka.com	Russian filter	http://adguard.com/filters.html#russian
/[a-zA-Z0-9]{35,}/$domain=anime-free.net\|cyberpirate.me\|imgbum.net\|online-porno-hd.ru\|tecnomectrani.com	Russian filter	http://adguard.com/filters.html#russian
/[a-z0-9]{30,}/$script,third-party,domain=free-torrent.org\|free-torrents.org	Russian filter	http://adguard.com/filters.html#russian
/^http://[a-z0-9_]{15,}\.[a-z0-9-]+\.[a-z]{2,}\/.*[a-zA-Z0-9]{100,}/$object-subrequest,domain=wat.tv	Liste FR	http://adblock-listefr.com/
/^http://[a-z0-9_-]{10,}\.[a-z0-9-]+\.[a-z]{2,}\/.*?\w{30,}/$~xmlhttprequest,domain=gentside.com\|maxisciences.com\|ohmymag.com	Liste FR	http://adblock-listefr.com/
/content/stargate/$domain=hlamer.ru\|kadu.ru\|krasview.ru	RU AdList	https://code.google.com/p/ruadlist/
/output/index/$third-party,script	RU AdList	https://code.google.com/p/ruadlist/
/https?://(?!(mc\.yandex\.ru\|www\.google-analytics\.com)/)/$third-party,script,subdocument,domain=massivmebel.by	RU AdList	https://code.google.com/p/ruadlist/
/^https?://goodgame\.ru/[a-z0-9]+$/$subdocument,domain=goodgame.ru	RU AdList	https://code.google.com/p/ruadlist/
/wp-content/plugins/popup-maker/$domain=info-life.in.ua\|intermarium.com.ua\|paragraf.net.ua\|unn24.com.ua\|varota.com.ua	RU AdList	https://code.google.com/p/ruadlist/
/^https?://(?!static\.)([^.]+\.)+?fastpic\.ru[:/]/$script,domain=fastpic.ru	RU AdList	https://code.google.com/p/ruadlist/
/images/brandings/$image,domain=sc2tv.ru	RU AdList	https://code.google.com/p/ruadlist/
/default/vbanners/$domain=noi.md	RU AdList	https://code.google.com/p/ruadlist/
/branding/$subdocument,domain=fanserials.tv\|kino-filmi.net	RU AdList	https://code.google.com/p/ruadlist/
/serial_adv_files/$image,domain=xn--80aacbuczbw9a6a.xn--p1ai\|куражбамбей.рф	RU AdList	https://code.google.com/p/ruadlist/
/^https?://(?!www\.)([^.]+\.)+?(kordonivkakino\.net\|m(ac-torrent-download\.net\|oviki\.ru))[:/]/$script	RU AdList	https://code.google.com/p/ruadlist/
/popupclick/$popup	RU AdList	https://code.google.com/p/ruadlist/
/http://[a-zA-Z0-9]+\.[a-z]+\/.(?:[!"#$%&'()+,:;<=>?@/\^_`{\|}~-]).*[a-zA-Z0-9]+/$script,third-party,domain=keezmovies.com\|redtube.com\|tube8.com\|tube8.es\|tube8.fr\|www.pornhub.com\|youporn.com	EasyList	https://easylist.github.io/
/\/[0-9].*\-.*\-[a-z0-9]{4}/$script,xmlhttprequest,domain=gaytube.com\|keezmovies.com\|spankwire.com\|tube8.com\|tube8.es\|tube8.fr	EasyList	https://easylist.github.io/
/\.sharesix\.com/.*[a-zA-Z0-9]{4}/$script	EasyList	https://easylist.github.io/
/\.filenuke\.com/.*[a-zA-Z0-9]{4}/$script	EasyList	https://easylist.github.io/
/^http://m\.autohome\.com\.cn\/[a-z0-9]{32}\//$domain=m.autohome.com.cn	EasyList China	http://abpchina.org/forum/
/^http://www\.iqiyi\.com\/common\/flashplayer\/[0-9]{8}/[0-9a-z]{32}.swf/$domain=iqiyi.com	EasyList China	http://abpchina.org/forum/
/^http://www\.dnvod\.eu.*?\/[a-z0-9]{9,}\.swf/$domain=dnvod.eu	EasyList China	http://abpchina.org/forum/
/^http://www\.tt1069\.com\/(?!bbs)/$script,domain=tt1069.com	EasyList China	http://abpchina.org/forum/
/ulightbox/$domain=hdkinomax.com\|tvfru.net	RU AdList: BitBlock	https://code.google.com/p/ruadlist/
/http://cdn[0-9]\.spiegel\.de/images/image-([^-]+)-[^-]+-[^-]+-(?!\1)[^-]+\.jpg/$image,domain=spiegel.de	EasyList Germany	https://easylist.github.io/

ameshkov on 23 Aug 2016

Please note the number of rules which are mistakenly made regexp-type.

ameshkov on 23 Aug 2016

@gorhill I've not been involved in that issue so far, so I've just done a quick bit of reading. I might get some things wrong.

While I agree that grabbing a keyword from the regexp seems scary, I'm not sure how the suggested token option would help. Take your filenuke example, there the automatic keyword would have been "filenuke" anyway.

Now if you think of a more advanced example which matches one of two possible domains, what would you put for the token option? If you chose to use part of one of the domains as a keyword, you'd end up not matching the other domain. Instead you'd have to omit the token option, which would give the same result as the automatic approach. (Since they mention that those kinds of strings should be ignored.)

kzar on 23 Aug 2016

(I wonder if we could copy the content blocking approach of compiling all these regular expressions into a finite state machine? That could be a way to make matching regular expression filters faster without worrying about keywords.)

kzar on 23 Aug 2016

(I wonder if we could copy the content blocking approach of compiling all these regular expressions into a finite state machine? That could be a way to make matching regular expression filters faster without worrying about keywords.)

This would be overkill.

To do it, they restricted regular expression support to a very limited subset.

ameshkov on 23 Aug 2016

Take your filenuke example

Yes, bad example. Here is another one found in EasyList:

/\/[0-9].*\-.*\-[a-z0-9]{4}/$script,xmlhttprequest,domain=gaytube.com|keezmovies.com|spankwire.com|tube8.com|tube8.es|tube8.fr

Not sure if a token was available for this one; whoever created the filter knows. My main point is that a token= option would be an easy, low-tech way to deal with this: it is simple to implement and available immediately, with no need for a regex parser (which would fail anyway with the filter here). If no token is present for an untokenizable filter, we just end up with the current behavior.
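To make the performance argument concrete, here is a rough sketch of a token-keyed store with a catch-all bucket for filters that have no token. The FilterIndex name, its methods, and the way URLs are split into tokens are all assumptions made for this example; real blockers differ in the details.

```javascript
// Hypothetical index, not taken from any blocker's code base.
function FilterIndex() {
    this.buckets = new Map();   // token -> array of RegExp
    this.untokenized = [];      // filters with no usable token
}

FilterIndex.prototype.add = function(regex, token) {
    if (!token) {
        this.untokenized.push(regex);
        return;
    }
    var key = token.toLowerCase();
    if (!this.buckets.has(key)) {
        this.buckets.set(key, []);
    }
    this.buckets.get(key).push(regex);
};

// Returns the filters worth testing against this URL: every untokenized
// filter, plus the buckets whose token appears as a whole URL token.
FilterIndex.prototype.candidates = function(url) {
    var result = this.untokenized.slice();
    var urlTokens = new Set(url.toLowerCase().match(/[a-z0-9]+/g) || []);
    var buckets = this.buckets;
    urlTokens.forEach(function(urlToken) {
        var bucket = buckets.get(urlToken);
        if (bucket !== undefined) {
            result = result.concat(bucket);
        }
    });
    return result;
};

FilterIndex.prototype.matches = function(url) {
    return this.candidates(url).some(function(regex) {
        return regex.test(url);
    });
};

var index = new FilterIndex();
index.add(/\.filenuke\.com\/.*[a-zA-Z0-9]{4}/, 'filenuke');
index.add(/\/[0-9].*\-.*\-[a-z0-9]{4}/, null);
console.log(index.candidates('http://example.com/').length);
```

In this sketch the untokenized filter is evaluated for every request, while the tokenized one is only evaluated when the URL contains its token, which is the cost difference being discussed.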

gorhill on 23 Aug 2016

Let's first think about what issue we are trying to solve.

First of all, domain-restricted filters are not a problem as there is no influence on the overall performance.

I suppose that what we really need is to reduce the negative impact of the mistakes made by filter authors, for instance filters like /ajrotator/ and such. There is no problem extracting a token from a rule like this.

Here is just a dirty example of a token extracting function:

var extractToken = function(ruleText) {

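    // Get the regexp text (the part between the enclosing slashes)
    var match = ruleText.match(/\/(.*)\/(\$.*)?/);
    if (match === null) {
        // Not a regexp rule
        return null;
    }
    var reText = match[1];

    var specialCharacter = "...";

    if (reText.indexOf('(?') >= 0) {
        // Do not mess with complex expressions which use lookahead
        // or any other "(?" group construct
        return null;
    }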

    // (Dirty) prepend specialCharacter for the following replace calls to work properly
    reText = specialCharacter + reText;

    // Strip all types of brackets
    reText = reText.replace(/[^\\]\(.*[^\\]\)/, specialCharacter);
    reText = reText.replace(/[^\\]\[.*[^\\]\]/, specialCharacter);
    reText = reText.replace(/[^\\]\{.*[^\\]\}/, specialCharacter); 

    // Strip some special characters
    reText = reText.replace(/[^\\]\\[a-zA-Z]/, specialCharacter); 

    // Split by special characters
    var parts = reText.split(/[\\^$*+?.()|[\]{}]/);
    var token = "";
    var iParts = parts.length;
    while (iParts--) {
        var part = parts[iParts];
        if (part.length > token.length) {
            token = part;
        }
    }

    return token;
};

I've tried this function with the rules above and here is the result:
https://ameshkov.github.io/web/regex-tokens.html?1

As for the token proposal, here are the downsides I see:

It does not solve the issue with regex filters created by mistake.
Complex rules which cannot be tokenized are rare. There are only 2 such filters in EasyList and both are domain-restricted.
No backward compatibility, filters with unknown options will be ignored by old versions. Also, for instance, getadblock guys aren't invited to our party so it could be a surprise for them.

ameshkov on 23 Aug 2016

getadblock guys aren't invited to our party

They are using ABP's filtering engine since AdBlock v3.0. See https://github.com/kzar/watchadblock/releases/tag/3.0.

gorhill on 23 Aug 2016

The other points still stand though:)

ameshkov on 23 Aug 2016

I wasn't aware of the many erroneous regex filters; it looks like these cases can be easily addressed with some trivial code.
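As a sketch of what such trivial code could look like: detect regex filters whose body contains no regex syntax at all, so they can be handled as a plain substring. The plainTextOf helper and its conservative character whitelist are assumptions for this example; a real implementation would have to decide how to treat characters such as the dot.

```javascript
// Hypothetical helper. Returns the literal text when the regex body is made
// only of characters with no special meaning in a regex, otherwise null.
function plainTextOf(filter) {
    var m = /^\/(.+)\/(?:\$.*)?$/.exec(filter);
    if (m === null) {
        return null; // not a regex-based filter
    }
    // Deliberately conservative: letters, digits, '_', '-', '/' and '%' only.
    // A '.' is rejected because it is a wildcard in a regex.
    return /^[A-Za-z0-9_\-\/%]+$/.test(m[1]) ? m[1] : null;
}

console.log(plainTextOf('/ajrotator/'));
console.log(plainTextOf('/[a-z0-9]{32,}/$third-party,domain=picshare.ru'));
```

With the literal in hand, the filter could be tokenized like any plain filter, or tested with a simple substring search instead of a regex.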

Mainly it was just to throw an idea out there, since these untokenizable filters have always bothered me[1], and I knew there was an issue like this open on the ABP issue tracker. The idea was an easy fix, worthwhile only if actually used by filter list maintainers.

Anyway, I will just use this issue here to throw ideas once in a while which I think might be good for all blockers[2], especially when it comes to make the life of filter list maintainers easier.

[1] I was looking to even skip testing for domain hit -- but this is an implementation-dependent detail I suppose
[2] I understand that when a filter syntax is not supported by ABP, EasyList et al. maintainers won't use it.

gorhill on 24 Aug 2016

[2] I understand that when a filter syntax is not supported by ABP, EasyList et al. maintainers won't use it.

By the way, I'd like to raise a question about the non-standard syntax.

You have recently added a couple of pseudo-classes extending element hiding rules syntax. I am talking about :has(), :xpath(), :matches-css [1] and such.

The idea is really great and we will support some of these extended selectors as well (:has() and :contains() are currently in the beta testing stage, :matches-css() is coming).

However, there is one issue that bothers me. The syntax you use (pseudo-classes syntax) is not backward-compatible and it will break good old stylesheet-based ad blockers like Adguard and ABP.

/* browser will ignore the whole style due to the second selector */
#banner, #banner:has(.test) { display: none; }

I suggest introducing a backward-compatible syntax along with the modern pseudo-classes-based one.

Backward compatible synonym for :has(...) will be [-ext-has="..."]
Backward compatible synonym for :matches-css(...) will be [-ext-matches-css="..."]
Backward compatible synonym for :xpath(...) will be [-ext-xpath="..."]

[1] As I understand, there is a backward compatible :matches-css() option already: https://issues.adblockplus.org/ticket/2390

ameshkov on 24 Aug 2016

You have recently added a couple of pseudo-classes extending element hiding rules syntax. I am talking about :has() ...

FWIW We are working towards adding the :has selectors too https://issues.adblockplus.org/ticket/3143

Anyway, I will just use this issue here to throw ideas once in a while which I think might be good for all blockers[2], especially when it comes to make the life of filter list maintainers easier.

:+1: Please do, I think collaboration benefits us all.

kzar on 24 Aug 2016

@kzar so, what do you think about the backward compatible syntax proposition?

ameshkov on 24 Aug 2016

@kzar regarding Lain's comment:

I think it's worth mentioning that :has() selector must work in combination with -abp-properties. So, filter like site.name##.block:has([-abp-properties="background: yellow"])

Using proposed syntax it could look like this:
##.block[-ext-has="*:matches-css(background: yellow)"]

ameshkov on 24 Aug 2016

@ameshkov Well I think the idea is that when browsers eventually support :has selectors those filters will be again using standard CSS selectors anyway. We only need to implement special logic for those filters in the mean time as a stop-gap. I guess it's true (and unfortunate) that the syntax will break filters for ad blockers which haven't added support for now, but I guess that's not too bad since uBlock, AdGuard and Adblock Plus all plan to support them. (Also because they are only planned to be something used as a last resort.)

As for the general point of using backward compatible syntax like you've suggested, I think it's a good idea. (We already do something like that for CSS property filters using the -abp-properties attribute.)

kzar on 24 Aug 2016

Well I think the idea is that when browsers eventually support :has selectors those filters will be again using standard CSS selectors anyway.

True. However, here is one more argument for that type of syntax. We all support a lot of different browsers (including mobile and such), and using the pseudo-class syntax requires us to ship it simultaneously on all platforms, while a backward-compatible syntax allows us to roll this feature out gradually.

As for the general point of using backward compatible syntax like you've suggested, I think it's a good idea. (We already do something like that for CSS property filters using the -abp-properties attribute.)

Yeah, I know, that's why I was surprised by the implementation proposed in the issue 3143.

ameshkov on 24 Aug 2016

I suggest introducing a backward-compatible syntax along with the modern pseudo-classes-based one.

I will support the backward-compatible syntax where possible, but personally, internally I prefer using the :() syntax. I see these new operators as nodes in a processing graph, and I see being able to combine them easily and freely as a requirement for the future. Example[1]:

div.red:has(div.blue:matches-css(position: fixed;):contains(allo)):contains(publicité)

It does feel to me like a backward-compatible syntax would complicate writing such filters (especially the use of quotes):

div.red[-ext-has="div.blue[-ext-matches-css=\"position: fixed;\"][-ext-contains=\"allo\"]"][-ext-contains="publicité"]
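The quoting burden can be seen in a small sketch that builds the attribute form mechanically. The extAttr helper and its escaping rule (backslash-escaping quotes and backslashes) are assumptions for this example, chosen so the output matches the selector shown above.

```javascript
// Hypothetical helper: wraps an argument in the proposed [-ext-NAME="..."] form.
// The escaping rule is an assumption made for this sketch.
function extAttr(name, argument) {
    var escaped = argument.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
    return '[-ext-' + name + '="' + escaped + '"]';
}

var inner = 'div.blue' +
    extAttr('matches-css', 'position: fixed;') +
    extAttr('contains', 'allo');
var selector = 'div.red' + extAttr('has', inner) + extAttr('contains', 'publicité');
console.log(selector);
```

Each extra level of nesting would add another round of escaping, which is the readability cost being pointed out.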

Aren't you validating element hiding filters at load time (or else using invalid CSS selector would break element hiding) so isn't true that old versions will discard filters with this new syntax? (Element:matches('div:has(span)') would throw).

[1] Ok, the example is contrived, but it's just to illustrate easily combining such filters.

gorhill on 24 Aug 2016

It does feel to me like a backward-compatible syntax would complicate writing such filters (especially the use of quotes):

Yeah, frankly, when I check something, I prefer to use the newer syntax as well.

However, it's not that bad, there's no need to support it inside of a composite filter.

Here, look at this example:

div.red[-ext-has="div.blue:matches-css(position: fixed):contains(allo):contains(publicité)"]

ameshkov on 24 Aug 2016

Aren't you validating element hiding filters at load time (or else using invalid CSS selector would break element hiding) so isn't true that old versions will discard filters with this new syntax? (Element:matches('div:has(span)') would throw).

Nope, in fact it was all of a sudden for us:) Also there's no way we could do it in desktop and mobile versions.

ameshkov on 24 Aug 2016

@gorhill one more thing regarding :matches-css(). I propose using a slightly different syntax for it.

Could you please read this issue description and tell me what you think about it?
https://github.com/AdguardTeam/ExtendedCss/issues/7

ameshkov on 24 Aug 2016

Q: Why additional pseudo-classes for matching before and after

I already support selector:after:style-properties(pattern), I just extract the :after before using the selector at setup time. But I would not mind selector:style-properties-before(pattern) -- it would just make the setup code a bit simpler.

Q: Why pattern-matching?

I agree with (optional) pattern matching. Pattern matching is not something I implemented, but I don't see a problem supporting it. On the implementation side of such a filter, however, I would just want to be sure its semantics do not force a very specific implementation.[1]

I suppose that using this approach we could also cover existing abp-properties rules

Note that ABP's -abp-properties has been implemented with very different semantics in mind than something like :matches-css: it is meant to reverse-lookup CSS rules. Such filters shouldn't be applied directly to a set of nodes for filtering purposes. The purpose of all the filters I have been adding lately is to reduce a set of nodes (starting with one as small as possible), so the suffix part is key: starting with the smallest possible set of nodes is key for performance.

For example, a filter such as wetter.com##[-abp-properties='margin-left: 24px'], given that it has no suffix selector, would have to be tested for all elements on a page, which would just kill performance.

[1] I see using cssText as a potentially high overhead approach, so I went with the dictionary approach, to test only for the enumerated properties. a) I suspect the cssText string is generated on the fly by the browser when "getted"; b) using cssText forces the use of a regex which will apply to a potentially large string.
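A minimal sketch of the dictionary approach described in [1], under the assumption that the filter argument is a plain list of declarations. The names parseDeclarations and matchesCss are made up for this example; the style argument is anything exposing getPropertyValue(), such as the object returned by window.getComputedStyle(element), or by getComputedStyle(element, ':before') for the before variant.

```javascript
// Parses "position: fixed; margin-left: 24px" into
// { 'position': 'fixed', 'margin-left': '24px' }.
function parseDeclarations(text) {
    var wanted = {};
    text.split(';').forEach(function(declaration) {
        var pos = declaration.indexOf(':');
        if (pos === -1) {
            return;
        }
        var name = declaration.slice(0, pos).trim();
        var value = declaration.slice(pos + 1).trim();
        if (name !== '') {
            wanted[name] = value;
        }
    });
    return wanted;
}

// Tests only the enumerated properties; cssText is never built or scanned.
function matchesCss(style, wanted) {
    return Object.keys(wanted).every(function(name) {
        return style.getPropertyValue(name) === wanted[name];
    });
}

// Stand-in for a computed style, so the sketch runs outside a browser.
var fakeStyle = {
    values: { 'position': 'fixed', 'margin-left': '24px' },
    getPropertyValue: function(name) {
        return this.values[name] || '';
    }
};
console.log(matchesCss(fakeStyle, parseDeclarations('position: fixed;')));
```

Exact string comparison is used here for brevity; optional pattern matching would replace the equality test with a per-property pattern test.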

gorhill on 24 Aug 2016

I already support selector:after:style-properties(pattern)

It may look pretty good, but it bothers me that :after can't in fact be part of a valid selector, as a pseudo-element cannot be selected. I suppose it could mislead a filter author.

[1] I see using cssText as a potentially high overhead approach, so I went with the dictionary approach, to test only for the enumerated properties. a) I suspect the cssText string is generated on the fly by the browser when "getted"; b) using cssText forces the use of a regex which will apply to a potentially large string.

Yep, I've run into a number of issues while implementing it. For now I've used a cross-browser function for extracting the cssText string:
https://github.com/AdguardTeam/ExtendedCss/blob/feature/issues/7/lib/style-property-matcher.js#L96

Also I agree with you on the enumerated properties approach. There's no need to build the cssText field, so I will change the current implementation.

For example, a filter such as wetter.com##[-abp-properties='margin-left: 24px'],

Yeah, you're right. Also, now that I know how this type of rule works, I find it a bit misleading. At least I think Lain_13 does not understand how it works.

@kzar what do you think about implementing something more "straightforward"?

ameshkov on 24 Aug 2016

I guess if we use the properties approach and agree on *-before/after postfix, there is no need for me to use another name for that pseudo class. matches-css, matches-css-before and matches-css-after sounds good and describes the filter behaviour very well.

ameshkov on 24 Aug 2016

matches-css, matches-css-before and matches-css-after sounds good and describes the filter behaviour very well.

I agree with this. This new selector, combinable with :has(), is going to make filter list maintainers' lives easier.

gorhill on 24 Aug 2016

I've updated the syntax description:
https://github.com/AdguardTeam/ExtendedCss/issues/7

ameshkov on 24 Aug 2016

Looking into this specific case this morning: https://github.com/uBlockOrigin/uAssets/issues/110.

This would be solvable without exception filters if it was possible to outright remove the targeted nodes from the DOM:

finanzen.net###bodyCenter > div[id]:has(:scope > #Ads_BA_Sky):remove()

The current implicit action taken on targeted nodes is to hide them. However, being able to re-style them has made the job of working against anti-blocker mechanisms much easier (AdGuard supports this).

Additionally, being able to remove nodes from the DOM is something I have found would take care of many other cases as well (I believe AdGuard supports this in some way, not sure). From my point of view, being forced to whitelist network requests from 3rd-party advertisers/trackers is always the worst option, and we should extend the capabilities of cosmetic filtering (element hiding) to avoid such whitelisting.

gorhill on 26 Aug 2016

Oh, you have finally faced these German wunderwaffe-anti-adblock-solutions:)
I was impressed when I saw this particular script for the first time.

Currently the easiest way to circumvent it is to inject a script like this:

Object.defineProperty(window, `UABPtracked`, { get: function() { return true; }, set: function() {} })
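To show the mechanics of that one-liner without a browser, here is the same definition applied to a plain object. The fakeWindow name is a stand-in introduced for this sketch.

```javascript
// `fakeWindow` stands in for the page's `window` so the sketch runs anywhere.
var fakeWindow = {};

Object.defineProperty(fakeWindow, 'UABPtracked', {
    get: function() { return true; },
    set: function() {}
});

// Writes go to the no-op setter and are silently dropped.
fakeWindow.UABPtracked = false;
console.log(fakeWindow.UABPtracked);

// The property is non-configurable by default, so redefining it throws.
var redefineFailed = false;
try {
    Object.defineProperty(fakeWindow, 'UABPtracked', { value: false });
} catch (e) {
    redefineFailed = e instanceof TypeError;
}
console.log(redefineFailed);
```

Whatever the page script later assigns, every read keeps returning true, which is the whole point of the injected definition.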

ameshkov on 26 Aug 2016

Regarding the DOM nodes removal thing, I need some time to think about it.

ameshkov on 26 Aug 2016

Currently the easiest way to circumvent it is to inject a script like this

I didn't realize they were using the uabp thing, I already had a scriptlet to take care of these -- it was not injected on that site.

Though in the long term, scriptlets require more work and maintenance, and I would rather use generic cosmetic filter syntax where possible. In the current case, a node removal would work. It would also work for that case (edit: never mind, would not work for this case). Anyway, something to think about.

gorhill on 26 Aug 2016

In the current case, a node removal would work

However, in this particular case node removal is not the best solution. This anti-adblock script is pretty ugly: it sets up a timer and redraws ads every 5 or so seconds. And with the nodes removed it continues to do something with the DOM.

Talking about anti-adblock scripts, I really do not see a good declarative solution which does not involve scripting.

ameshkov on 29 Aug 2016

Let's start with analysis. Most of the things we discuss are directly caused by the websites trying to circumvent ad blocking.

Basically, there are two approaches:

1. Make the ad layout random, or make it look exactly like the content layout.
2. Detect an ad blocker and show a warning, or even redirect the user to a blocking page.

Point 1 can be solved by the new pseudo-classes (at least for now).
Point 2 can be solved by scripts (like reek's AAK for instance).

Btw, reek is the best anti-adblock scripts expert I know, let's ask his opinion.

ameshkov on 29 Aug 2016

@gorhill @ameshkov We are discussing WebSocket circumvention on the Adblock Plus issue tracker, but unfortunately we've had to make the issue confidential. (Guess why...) Anyway I'd like to copy you both in on the issue, as mapx pointed out it would be good to get your feedback there too.

Are you guys signed up on our issue tracker? If so what are your usernames?

kzar on 15 Sep 2016

@gorhill Also a possibly dumb question, doesn't a Content Security Policy like connect-src http:; frame-src http: also prevent https connections?

kzar on 15 Sep 2016

doesn't a Content Security Policy like connect-src http:; frame-src http: also prevent https connections?

Not according to spec:

The URL matching algorithm now treats insecure schemes and ports as matching their secure variants. That is, the source expression http://example.com:80 will match both http://example.com:80 and https://example.com:443.
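A simplified sketch of the rule in that passage, covering only the http to https and port 80 to 443 upgrades it mentions and nothing else from the CSP matching algorithm. The function names are made up for this example.

```javascript
// Scheme-only sources such as "http:" in "connect-src http:".
function schemeMatches(expressionScheme, urlScheme) {
    return expressionScheme === urlScheme ||
        (expressionScheme === 'http' && urlScheme === 'https');
}

function effectivePort(url) {
    if (url.port !== '') {
        return Number(url.port);
    }
    return url.protocol === 'https:' ? 443 : 80;   // only http(s) is handled here
}

// Host sources such as "http://example.com:80".
function hostSourceMatches(expression, target) {
    var e = new URL(expression);
    var t = new URL(target);
    if (e.hostname !== t.hostname) {
        return false;
    }
    if (!schemeMatches(e.protocol.slice(0, -1), t.protocol.slice(0, -1))) {
        return false;
    }
    var ePort = effectivePort(e);
    var tPort = effectivePort(t);
    return ePort === tPort || (ePort === 80 && tPort === 443);
}

console.log(schemeMatches('http', 'https'));
console.log(hostSourceMatches('http://example.com:80', 'https://example.com:443'));
```

The upgrade only goes one way in this sketch: an https source expression does not match an http URL.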

gorhill on 15 Sep 2016

Guess why...

They will see it anyway:)

Are you guys signed up on our issue tracker? If so what are your usernames?

Just signed up, username is ameshkov

ameshkov on 15 Sep 2016

@gorhill Ahh, makes sense.
@ameshkov Cool, added you to the issue.

kzar on 15 Sep 2016

@gorhill, @ameshkov Heads up, we're going to consider WebSocket requests as the type "websocket" instead of "other" in the future. More details in this blog post: https://adblockplus.org/development-builds/new-filter-type-option-for-websockets

kzar on 21 Sep 2016

@kzar hey Dave, thanks for the heads up.

removed. much confidential, very secret.

ameshkov on 21 Sep 2016

@gorhill @kzar
Btw, have you already seen the bleeding edge technology: loading ads code through RTCPeerConnection?

ameshkov on 21 Sep 2016

have you already seen the bleeding edge technology

Yes, first time I saw it on Merriam-Webster's site.

gorhill on 21 Sep 2016

Any idea besides wrapping RTCPeerConnection?

ameshkov on 21 Sep 2016

So far, no -- aside from giving users the option of disabling WebRTC entirely.

gorhill on 21 Sep 2016

@ameshkov No, I did not realise people already started abusing WebRTC. Man. :-1:

kzar on 21 Sep 2016

Do you guys have a URL for an example of a website using WebRTC for circumvention that I can take a look at?

kzar on 21 Sep 2016

Actually would you mind removing that comment here?

Done;)

So far, no -- aside from giving users the option of disabling WebRTC entirely.

Does it really work in Chrome? I thought it was a bit limited.

ameshkov on 21 Sep 2016

Do you guys have a URL for an example of a website using WebRTC for circumvention that I can take a look at?

Code example:
https://forum.adguard.com/index.php?threads/block-rtcpeerconnection.13808/#post-102128

ameshkov on 21 Sep 2016

I'd rather discuss our WebSocket plans in the issue on our tracker, since it's marked confidential

I understand not discussing ideas of workarounds for our own blocking solutions, but here I don't see the point, the websocket issue came about because it's already used out there.

gorhill on 21 Sep 2016

@ameshkov Thanks!

kzar on 21 Sep 2016

@gorhill There's a new issue I'd like to involve you with but can't unless you have a user on our issue tracker. Mind creating one?

kzar on 19 Dec 2016

Is this about the /g00 thing?

gorhill on 19 Dec 2016

Yup

ameshkov on 19 Dec 2016

Any idea how they detect dev tools?

ameshkov on 19 Dec 2016

Yup

Again, in this case I don't see the point of secrecy: the /g00 stuff is currently being used in production, so it's not like we are trying to prevent a workaround for blockers; they are already being worked around. That HTML/CSS/JS code is all open to scrutiny by anybody, so there is no privileged information to protect really.

Any idea how they detect dev tools?

Yes: https://www.reddit.com/r/firefox/comments/5gtedd/ublock_origin_developer_raymond_hill_on/dav4iiu/

gorhill on 19 Dec 2016

Thumbs up, nice technique!

ameshkov on 19 Dec 2016

Proposed network filter syntax extension: **[...]{...} -- two-asterisk syntax.

The ** sequence tells the filter parser that a regex-valid character class follows -- and optionally a regex-valid quantifier.
The [...] part would be a regex-valid character class specifier.
The {...} is optional. If present, passed as is to the regex constructor, i.e. it would be a regex-valid quantifier. If not present, the "zero or more" * quantifier will be used.

What it solves: better matching accuracy without having to resort to inefficient regex-based filter -- thus no issue with tokenizing the filter. Benefits of both plain filter syntax and regex-based syntax without their liabilities.

Example, an abused exception filter found in EasyList:

@@||nowdownload.*/banner.php?$script,domain=~calcalist.co.il|~gaytube.com|~mako.co.il|~pornhub.com|~redtube.com|~redtube.com.br|~tube8.com|~tube8.es|~tube8.fr|~walla.co.il|~xtube.com|~ynet.co.il|~youjizz.com|~youporn.com|~youporngay.com

With the new syntax, it can't be abused anymore (showing more than one variation to highlight flexibility):

@@||nowdownload.**[a-z]/banner.php?$script
@@||nowdownload.**[a-z]{2}/banner.php?$script
@@||nowdownload.**[a-z]{2,3}/banner.php?$script
@@||nowdownload.**[^/?#]/banner.php?$script

This is a real-world example, and with the proposed syntax there would no longer be a need to maintain an exclusion list of sites where the filter should not apply (pornhub et al. have been abusing it, along with other similar filters).

Efficiency: the more specific a regex, the more efficiently it executes. The * syntax is commonly used and it means "matches anything in any number". Giving filter list maintainers the ability to describe more accurately what is to be expected can lead to more efficient regexes internally.

For example:

/site=*/size=*/viewid=

Let's say that the * were supposed to match some random sequence of digits. Of course whoever created the filter was not going to use a regex for such a filter -- because they are rightfully frowned upon. On the other hand, the matching-anything-in-any-number semantic of the asterisk means that an inefficient regex must be used internally, one that will scan the whole URL to match. With the proposed syntax:

/site=**[\d]/size=**[\d]/viewid=

This is a much more efficient filter, as the regex execution will bail out of matching as soon as no digit is found at the placeholder locations. This gives the filter list maintainers the ability to be more accurate in describing what the filter should match, without resorting to full blown regex-based filters.

A nice-to-have side effect for filter list maintainers: the ability to specify a sequence of characters with no instances of specific characters in it, i.e. the [^...] regex syntax.

A filter parser would just need a special code path for when a filter is found to contain an instance of a double asterisk, to extract and validate the sequence **[...]{...} and transpose it into the equivalent regex sequence, or, if it does not validate, just fall back on the normal single-asterisk semantic for the sequence.

This does not mean all instances of * would need to be replaced, it's merely a new syntax which would become available to filter list maintainers to make their work easier/simpler.
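
To make the special code path more concrete, here is a rough sketch of the transposition, assuming only the pattern part of a filter is passed in. The names patternToRegexSource, DOUBLE_ASTERISK and isValidRegex are made up for this example; anchors (|, ||), the ^ separator and the $options part are out of scope and would be handled elsewhere.

```javascript
// "**" followed by a character class and an optional {n}, {n,} or {n,m} quantifier.
const DOUBLE_ASTERISK = /^\*\*(\[\^?(?:\\.|[^\]\\])+\])(\{\d+(?:,\d*)?\})?/;

// Validate by letting the regex engine itself compile the fragment.
function isValidRegex(source) {
    try { new RegExp(source); return true; } catch (e) { return false; }
}

// Escape a literal filter character so it has no special meaning in a regex.
function escapeLiteral(ch) {
    return ch.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');
}

function patternToRegexSource(pattern) {
    let out = '';
    let lastWasWildcard = false;
    let i = 0;
    while (i < pattern.length) {
        const m = DOUBLE_ASTERISK.exec(pattern.slice(i));
        // No explicit quantifier means "zero or more".
        const fragment = m !== null ? m[1] + (m[2] || '*') : '';
        if (m !== null && isValidRegex(fragment)) {
            out += fragment;
            lastWasWildcard = false;
            i += m[0].length;
        } else if (pattern[i] === '*') {
            // Normal single-asterisk semantic, also used as the fallback.
            if (!lastWasWildcard) { out += '.*'; }
            lastWasWildcard = true;
            i += 1;
        } else {
            out += escapeLiteral(pattern[i]);
            lastWasWildcard = false;
            i += 1;
        }
    }
    return out;
}

console.log(patternToRegexSource('nowdownload.**[a-z]{2,3}/banner.php?'));
// nowdownload\.[a-z]{2,3}\/banner\.php\?
```

With this, /site=**[\d]/size=**[\d]/viewid= becomes a regex which fails as soon as a non-digit shows up at a placeholder, and an invalid sequence such as a**[b simply degrades to a.* followed by a literal [b.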

gorhill on 20 Jan 2017

Sorry for the late reply, I've just got a free minute to think about it in silence:)

ameshkov on 4 Feb 2017

@gorhill you know, it looks very much like your token suggestion.

Comparing with the regular regexp rules:

Filter authors still need to know regexp syntax.
It is easier to extract a token.
It is not as flexible as regexp.

As I see it, the problem is that filter maintainers generally don't care about performance. One more regex rule would do no harm, so they will continue to use them.

Frankly, I suppose we should make engines smarter instead of providing more and more syntax sugar to maintainers.

Examples:

Regexp rules: we can extract tokens automatically. I mean really, it works for all the simple expressions, so why not?
||nowdownload.*/ - we can detect, that nowdownload.* is the domain name, so we can compile a more effective regexp.

Regarding point 1, we did implement it in the latest version. I guess we need some time to see how it goes and whether there is any problem with token extraction.

ameshkov on 4 Feb 2017

We're supporting the $webrtc filter option / request type now as well, here's the blog post and here's the implementation. We also do this by wrapping RTCPeerConnection but our implementations have some differences, most notably that instances without any URLs are not listed. Sorry I meant to post here earlier but forgot!

Edit: Oh, and I also filed Chromium issue 707683 asking for proper support to be added for the blocking of WebRTC connections.

kzar on 14 Apr 2017

@kzar Thanks for the heads up. I do follow the issue tracker of ABP, so I was aware of this. I have been thinking of how I would implement this in uBO, but I will definitely avoid using a wrapper in uBO on my side, at least an unconditional one -- I consider this too risky, and in the event it causes an unforeseen issue, a user would be forced to disable the whole extension itself.

There is such a wrapper for uBO-Extra, and issues have been reported, see https://github.com/gorhill/uBO-Extra/issues -- you may want to use these as test cases. Having to disable a small companion extension is much less bad than having to disable the whole blocker.

Regarding https://bugs.chromium.org/p/chromium/issues/detail?id=707683, another approach is to have a content security policy directive for WebRTC connections. Currently there is nothing to prevent these, a rather big hole in the CSP standard. See https://github.com/w3c/webappsec-csp/issues/92.

gorhill on 17 Apr 2017

On the other hand, there's not much that can be done about WebRTC circumvention. Either wrap it or use the script injection approach, which ABP does not support.

but I will definitely avoid using a wrapper in uBO on my side, at least an unconditional one

Actually, there is a way to override WS/WebRTC "conditionally". We've found a way to execute dynamic script injections before the pages' code. Not a 100% guarantee yet, though, but the first tests show that it works.

Wait a bit, I'll link you an example.

ameshkov on 17 Apr 2017

Here you are:
https://github.com/AdguardTeam/AdguardBrowserExtension/blob/a6a5db617fe410893c88babab2409f28fbaac47c/Extension/lib/webrequest.js#L313

The "fastest" way to perform a dynamic script injection is to use the onCommitted listener. Once it fires, send a message with the scripts directly to the frame and handle it in the content script. It appears that the message is received by the content script before the page scripts are executed.

Problems:

As I've said, there is no guarantee that this behavior will persist in future Chromium versions.
Edge does not support sending messages to a specific frame.
I am not yet sure about FF WebExtensions, we didn't test it yet.

ameshkov on 17 Apr 2017

@ameshkov That is an interesting idea. On my side I was planning to maybe use a cookie to conditionally execute such content script code:

If specific content script code must be executed on a given origin, inject a special cookie in webRequest.onHeadersReceived.
Content script tests for the special cookie, and if it exists, executes whatever code the cookie value tells it to execute (say a bit vector where each bit corresponds to some special block of content script code). Reset the special cookie to remove it from the cookie set.
Test and strip the special cookie from outbound requests in webRequest.onBeforeSendHeaders -- the prototype showed that despite the cookie being cleared in the content script, it was still present in a few of the outbound requests sent shortly afterwards.

Not pretty, and it potentially has its own edge-case issues -- aside from the added overhead.
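
Just to illustrate the bit vector part of the steps above, a rough sketch of the encoding/decoding could look like this. The cookie name, the list of code blocks and all function names are hypothetical, and the webRequest wiring itself is left out.

```javascript
// Hypothetical registry: bit position -> block of content script code.
const BLOCKS = ['wrapWebSocket', 'wrapRTCPeerConnection', 'forceOpenShadowRoots'];
// Hypothetical name for the special cookie.
const COOKIE_NAME = 'ubo_special';

function parseCookies(header) {
    return header.split(';').map(s => s.trim()).filter(s => s !== '');
}

// Background page side: build the cookie to inject for a given origin.
function makeCookie(blockNames) {
    let bits = 0;
    for (const name of blockNames) {
        const i = BLOCKS.indexOf(name);
        if (i !== -1) { bits |= 1 << i; }
    }
    return COOKIE_NAME + '=' + bits.toString(16);
}

// Content script side: which blocks does the cookie ask us to run?
function readBlocks(cookieHeader) {
    const entry = parseCookies(cookieHeader)
        .find(c => c.startsWith(COOKIE_NAME + '='));
    if (entry === undefined) { return []; }
    const bits = parseInt(entry.slice(COOKIE_NAME.length + 1), 16);
    if (Number.isNaN(bits)) { return []; }
    return BLOCKS.filter((_, i) => (bits & (1 << i)) !== 0);
}

// Outbound requests: remove the special cookie from a Cookie header value.
function stripCookie(cookieHeader) {
    return parseCookies(cookieHeader)
        .filter(c => !c.startsWith(COOKIE_NAME + '='))
        .join('; ');
}

console.log(makeCookie(['wrapWebSocket', 'forceOpenShadowRoots'])); // ubo_special=5
console.log(readBlocks('a=1; ubo_special=5; b=2'));
console.log(stripCookie('a=1; ubo_special=5; b=2')); // a=1; b=2
```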

The problem with executeScript and insertCSS is that they are not well defined, and the main chrome API statement...

Unless the doc says otherwise, methods in the chrome.* APIs are asynchronous: they return immediately, without waiting for the operation to finish

... really gets in the way of reliability for the current case.

This suggests that tabs.executeScript (and also tabs.insertCSS) cannot be reliably injected in a tab/frame. The fact that these methods accept a run_at option makes all this the more ambiguous -- what is the point of using document_start if there is no guarantee that the script or CSS will be injected before any CSS or script has been executed on the page?

From the documentation, one can even imagine that a script/CSS injected through these methods could potentially end up being injected in a completely different tab/frame than originally intended (what happens when there is a quick redirect?). I consider this a flaw in the API.

If the documentation could explicitly guarantee that your approach will always work (i.e. the script/CSS is injected in a synchronous manner and onCommitted can be blocking), that would be great -- it actually makes sense given when webNavigation.onCommitted is fired:

Fired when a navigation is committed. The document (and the resources it refers to, such as images and subframes) might still be downloading, but at least part of the document has been received

But then again, the way this is phrased, it's as if onCommitted could be fired after things are farther than just right after the document was created ("might").

gorhill on 17 Apr 2017

@gorhill nice catch with a cookie, I like it! What's good is that this is a true cross-browser solution (taking into account the MS Edge limitations).

The problem with executeScript and insertCSS is that they are not well defined, and the main chrome API statement...

As I understand, tabs.executeScript is out of the question anyway, as it executes a content script, not an in-page script: https://developer.mozilla.org/en-US/Add-ons/WebExtensions/API/tabs/executeScript

edit: ignore it, obviously the content script can inject the in-page wrapper.

If the documentation could explicitly guarantee that your approach will always work (i.e. the script/CSS is injected in a synchronous manner and onCommitted can be blocking)

I don't think it can, we'll have to keep an eye on its behavior, and that's the problem indeed.

@gorhill @kzar
Whichever approach is used, I suppose we need a common rule syntax for disabling these kinds of wrappers.

I suggest introducing this type of rules:
@@*$websocket,domain=example.org -- to disable websocket wrapper
@@*$webrtc,domain=example.org -- to disable RTC wrapper

Thoughts?

ameshkov on 17 Apr 2017

The idea of conditionally running content scripts is certainly interesting, and not something I considered. Copying in @snoack since he might be interested in your ideas there too.

I think in the case of these wrappers however we'll stick with executing them consistently. IMO it's better to "just" get the wrapper right in the first place, and multiple code paths make debugging harder. It's good for you guys I suppose, after the next Adblock Plus release lands we'll find out the hard way if there are any problems, if not you can use the code too :stuck_out_tongue:.

kzar on 20 Apr 2017

Guys, I've just stumbled upon a new circumvention practice.

I guess @kzar is aware of it, not sure about @gorhill:

if (window.document) {
    if (window.adonisContext)
        return window.adonisContext;
    var e, n = document.createElement("iframe");
    return n.src = "https://nop.xpanama.net/if.html?adflag=1&cb=" + i(),
    n.setAttribute("style", "display: none;"),
    document.body.appendChild(n),
    e = n.contentWindow,
    n.contentWindow.stop(),
    window.adonisContext = e,
    e
}
return window

It seems that n.contentWindow.stop() prevents the content script from doing its job. Also, with a real URL in the src they are able to bypass CSP restrictions (http:). I guess the straightforward solution would be to override document.createElement and document.__proto__.createElement.

ameshkov on 12 May 2017

See https://github.com/uBlockOrigin/uAssets/issues/190#issuecomment-300897354.

My understanding: they can call n.contentWindow.stop() because at that point the iframe is about:blank (this is what chrome://webrtc-internals/ shows), which is treated in a special way by the CSP engine (it always inherits the embedding document's CSP). I realized yesterday it was not obvious how to disallow about:blank specifically without disallowing too much. However, in the end, frame-src http://*/* https://*/* worked (because there is no path in about:blank).

gorhill on 12 May 2017

I'm starting to think we need some kind of a $csp modifier able to set a custom content security policy for a URL.

For instance:
||example.org^$csp="frame-src self"

It's crucial, though, that it should only strengthen the existing policy, which is easy to achieve if we add an additional CSP header.

Thoughts?

ameshkov on 12 May 2017

For instance:
||example.org^$csp="frame-src self"

This is pretty much what I implemented on my side (csp=..., not committed yet), except without the quotes (no need). This allows injecting whatever CSP policy we want, while never ever relaxing existing ones (when appended using ,).

gorhill on 12 May 2017

This is pretty much what I implemented on my side (not committed yet), except without the quotes (no need)

We'd like to support it on our side as well, let's have a common syntax for this kind of rules then.

Why no quotes btw? It'll make it much harder to parse a rule with more than one modifier. For instance, subdocument and third-party might be used along with this one (in theory, but anyway).

ameshkov on 12 May 2017

Ok, it seems that I am wrong, CSP instructions cannot contain a comma, so it should be relatively easy to parse something like ||example.org^$csp=policy,subdocument,domain=example.com
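
Something like this naive sketch would do, assuming the options start after the last $ and that neither the policy nor any other option value contains a comma or a $. The function name and the returned shape are just for illustration.

```javascript
// Split a network filter into its pattern and its options.
// Valueless options (e.g. "subdocument") are stored as true.
function parseFilter(line) {
    const pos = line.lastIndexOf('$');
    const pattern = pos === -1 ? line : line.slice(0, pos);
    const options = {};
    if (pos !== -1) {
        for (const opt of line.slice(pos + 1).split(',')) {
            const eq = opt.indexOf('=');
            if (eq === -1) {
                options[opt] = true;
            } else {
                options[opt.slice(0, eq)] = opt.slice(eq + 1);
            }
        }
    }
    return { pattern, options };
}

const parsed = parseFilter("||example.org^$csp=frame-src 'self',subdocument,domain=example.com");
console.log(parsed.pattern);     // ||example.org^
console.log(parsed.options.csp); // frame-src 'self'
```

No quoting is needed because the policy value simply runs up to the next comma.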

Are you planning to support additional modifiers in this type of rules?

ameshkov on 12 May 2017

CSP instructions cannot contain a comma

Yes, that's the reason I chose to leave out quotes. The comma can be used to _combine_ CSP policies, but I'd rather never have them used in filter options -- fits nicely with the existing use of the comma to separate filter options.

The nice thing with comma to separate distinct sets of policies when combined with existing CSP policies is that it won't cause spurious CSP reports, each comma-separated set gets its own report policy.

Currently all the following modifiers are supported when used with csp=: third-party, domain=, important, badfilter.

Additionally, exception filters for csp= can be crafted two ways:

Must be exact csp= match, i.e. @@||example.com/nice$csp=frame-src 'none' will cancel _only_ whatever filter tries to inject _exactly_ a csp=frame-src 'none' filter, but not a csp=frame-src 'self' filter; OR
@@...$csp will cancel all CSP injection for URLs which match the filter.

All this required refactoring on my side, as the semantic for csp= filters is that _all_ matching filters must be found (and furthermore applied according to important and @@), while for normal filters only the first hit is returned.
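
A simplified sketch of that semantic, not the actual code: given every csp= filter matching a URL, compute the list of policies to inject. The filter object shape and the function name are invented for the example, and the way important is handled here (an important filter cannot be cancelled by an exception) is an assumption.

```javascript
// Each matching filter is described as:
//   { csp: string | undefined, exception: boolean, important: boolean }
// An exception with csp === undefined stands for a bare "@@...$csp".
function resolveCsp(matchingFilters) {
    const exceptions = matchingFilters.filter(f => f.exception);
    const cancelAll = exceptions.some(f => f.csp === undefined);
    const cancelled = new Set(exceptions.map(f => f.csp));
    const policies = [];
    for (const f of matchingFilters) {
        if (f.exception) { continue; }
        // Exceptions cancel either everything or exact csp= matches only.
        if (!f.important && (cancelAll || cancelled.has(f.csp))) { continue; }
        if (!policies.includes(f.csp)) { policies.push(f.csp); }
    }
    return policies;
}

console.log(resolveCsp([
    { csp: "frame-src 'none'", exception: false, important: false },
    { csp: "frame-src 'self'", exception: false, important: false },
    { csp: "frame-src 'none'", exception: true, important: false },
]));
// [ "frame-src 'self'" ]
```

Unlike a normal lookup, there is no early exit: every matching filter contributes to the result.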

gorhill on 12 May 2017

The nice thing with comma to separate distinct sets of policies when combined with existing CSP policies is that it won't cause spurious CSP reports, each comma-separated set gets its own report policy.

Isn't it the same in the case of an additional CSP header?

ameshkov on 12 May 2017

@gorhill overall, I love the idea and the features you want it to have.

I've come up with a formal description of the csp modifier:
https://github.com/AdguardTeam/AdguardBrowserExtension/issues/685

Could you please check whether we are on the same page?

ameshkov on 12 May 2017

Isn't it the same in the case of an additional CSP header?

I guess it is the same, however I prefer to append to existing CSP header because of this passage in documentation:

A server MUST NOT send more than one HTTP header field named Content-Security-Policy with a given resource representation.

A server MAY send different Content-Security-Policy header field values with different representations of the same resource or with different resources.

I have a bit of a problem parsing the meaning of this, so I went with what I thought was the safest approach, which is to append and use comma as separator.
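
Roughly, the append approach amounts to the sketch below. It assumes response headers come as an array of { name, value } objects, as in the webRequest API; the function name is made up and this is not the actual implementation.

```javascript
// Append extra policies to the first existing CSP header using a comma,
// or add a new header if the response has none. Existing policies are
// left untouched, so the result can only be stricter, never more relaxed.
function appendCsp(headers, policies) {
    if (policies.length === 0) { return headers; }
    const extra = policies.join(', ');
    let found = false;
    const out = headers.map(h => {
        if (found || h.name.toLowerCase() !== 'content-security-policy') {
            return h;
        }
        found = true;
        return { name: h.name, value: h.value + ', ' + extra };
    });
    if (!found) {
        out.push({ name: 'Content-Security-Policy', value: extra });
    }
    return out;
}

console.log(appendCsp(
    [{ name: 'content-security-policy', value: "default-src 'self'" }],
    ['frame-src http://*/* https://*/*']
)[0].value);
// default-src 'self', frame-src http://*/* https://*/*
```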

gorhill on 12 May 2017

The $csp filter option is an interesting idea, I've opened issue 5241 to start a discussion with Wladimir and Sebastian about adding it to Adblock Plus as well.

kzar on 12 May 2017

@kzar 👍

ameshkov on 12 May 2017

I guess it is the same, however I prefer to append to existing CSP header because of this passage in documentation:

Huh, I wonder what this resource representation thing means. For instance, dropbox.com sends two CSP headers:

                              content-security-policy: script-src 'unsafe-eval' https://www.dropbox.com/static/compiled/js/ https://www.dropbox.com/static/javascript/ https://www.dropbox.com/static/api/ https://cfl.dropboxstatic.com/static/compiled/js/ https://www.dropboxstatic.com/static/compiled/js/ https://cfl.dropboxstatic.com/static/previews/ https://www.dropboxstatic.com/static/previews/ https://cfl.dropboxstatic.com/static/javascript/ https://www.dropboxstatic.com/static/javascript/ https://cfl.dropboxstatic.com/static/api/ https://www.dropboxstatic.com/static/api/ 'unsafe-inline' 'nonce-bKDizX/Kbtm0495WF9jC' ; default-src 'none' ; worker-src https://www.dropbox.com/static/serviceworker/ blob: ; style-src https://* 'unsafe-inline' 'unsafe-eval' ; connect-src https://* ws://127.0.0.1:*/ws ; child-src https://www.dropbox.com/static/serviceworker/ blob: ; form-action 'self' https://dl-web.dropbox.com/ https://photos.dropbox.com/ https://accounts.google.com/ https://api.login.yahoo.com/ https://login.yahoo.com/ ; base-uri 'self' api-stream.dropbox.com showbox-tr.dropbox.com ; img-src https://* data: blob: ; frame-src https://* carousel://* dbapi-6://* dbapi-7://* dbapi-8://* itms-apps://* itms-appss://* ; object-src https://cfl.dropboxstatic.com/static/ https://www.dropboxstatic.com/static/ 'self' https://flash.dropboxstatic.com https://swf.dropboxstatic.com https://dbxlocal.dropboxstatic.com ; media-src https://* blob: ; font-src https://* data:
                              content-security-policy: script-src 'nonce-bKDizX/Kbtm0495WF9jC' 'nonce-wTJ0bGY/hQQlxU0EVOpm' 'unsafe-eval' 'strict-dynamic'

edit: asked a question: https://github.com/w3c/webappsec-csp/issues/215

ameshkov on 12 May 2017

So, after all:

A server must not send multiple headers (yet in the draft "must not" is replaced with "should not")
Nevertheless, client must be ready to receive and apply multiple headers
Dropbox sends multiple headers. I suppose it proves that all the modern browsers are ready to handle this case.

ameshkov on 12 May 2017

@mapx-, @kzar

About https://issues.adblockplus.org/ticket/5291, what I don't see being done in the comment is:

Open the dev tools
Go to _Memory_ tab
Click the trash icon (I do 2-3 times, with a 2-second delay in between).

Yes, I have seen ABP grow to past 340 MB, and even leaving the browser on idle did not cause the memory to be garbage-collected. However, after accomplishing the above steps, there was memory being garbage-collected, and the memory snapped back to expected levels (keeping in mind fragmentation, js engine internals, etc.).

Any report of high memory usage should always be done _after_ the steps above, including the equivalent ones on Firefox.

gorhill on 8 Jun 2017

Thanks that's a good tip, I will give it a try. It's just weird (to me at least) that garbage collection is not happening automatically. We can hardly expect users to trigger it manually :(

kzar on 8 Jun 2017

Why does the chrome store say it is corrupted now ? :(

hemantgoyal on 21 Jun 2017

@hemantgoyal That is a Chrome hash function bug. See https://github.com/gorhill/uBlock/issues/2720 (already fixed with a bit of padding).

bershan2 on 21 Jun 2017

@mapx- re. https://issues.adblockplus.org/ticket/6002, see https://bugzilla.mozilla.org/show_bug.cgi?id=1415194.

gorhill on 8 Nov 2017

thanks:
https://issues.adblockplus.org/ticket/6002#comment:5

mapx- on 8 Nov 2017

Hey guys, I've recently stumbled upon an interesting adblocking circumvention technique used by Yandex.

The thing is that they use shadow DOM to circumvent element hiding rules:
https://uploads.adguard.com/up04_5pkb0_Yandeks.png

The open root is used in the screenshot so we can get inside with a /deep/, and when a closed root is used, /deep/ cannot help us. Anyway, all the shadow piercing selectors are deprecated and will be eventually removed so we have a problem here.

Possible solution (rather ugly though):

Support ::shadow pseudo-selector to pierce inside open roots.
Override attachShadow and force all shadow roots to be open.
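
The second point could be sketched like this, assuming the code gets to run in the page context before any page script does. The function name is made up, and the prototype is passed in as a parameter only so that the sketch can be tried outside of a browser.

```javascript
// Wrap attachShadow on the given prototype so that every shadow root
// ends up being created in "open" mode, whatever the page asked for.
function forceOpenShadowRoots(proto) {
    const original = proto.attachShadow;
    if (typeof original !== 'function') { return; }
    proto.attachShadow = function(init) {
        const forced = Object.assign({}, init, { mode: 'open' });
        return original.call(this, forced);
    };
}

// In a page, this would be applied to Element.prototype.
if (typeof Element !== 'undefined') {
    forceOpenShadowRoots(Element.prototype);
}
```

Like any wrapper, the page can detect it, so this is no more than a stopgap.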

Have you already faced anything like that? Thoughts?

ameshkov on 8 Nov 2017

@ameshkov
I added you here: https://issues.adblockplus.org/ticket/5318
and
https://issues.adblockplus.org/ticket/5302

perhaps it's about the same yandex stuff

mapx- on 8 Nov 2017

@mapx- thank you! Yeah, it's been a while since they began their crusade and both issues are relevant.

What bothers me is that the "closed shadow root" approach seems to be a universal way to avoid element hiding, and even user stylesheets won't help us defeat it once Chrome stops supporting /deep/ and ::shadow.

ameshkov on 8 Nov 2017

The /deep/ issue has been raised before. It's being deprecated as a valid CSS selector component in a CSS rule, but will still be valid as a CSS selector in querySelector[All] call. My understanding.

So currently, not an issue with Firefox I presume (does not support shadow stuff yet). An issue with Chrome, but can be worked around by manually hiding through querySelectorAll.

gorhill on 8 Nov 2017

@gorhill

So currently, not an issue with Firefox I presume (does not support shadow stuff yet). An issue with Chrome, but can be worked around by manually hiding through querySelectorAll.

That's basically what I meant -- support either shadow or /deep/ "polyfills" just like we do with :has.

Good news is that /deep/ is able to pierce inside closed shadow roots, so the second point in my comment is redundant.

ameshkov on 8 Nov 2017

just like we do with :has.

I just realized we can probably already just use :has for filter with /deep/?

example.com##div.container:has(/deep/ .aq)

Would this work now?

gorhill on 9 Nov 2017

I just realized we can probably already just use :has for filter with /deep/?

We don't yet support it (but we definitely will), but this is a partial solution anyway.

For instance, in the Yandex case their shadow contains legit elements as well, so we need something like example.org##div /deep/ span:has(.banner)

ameshkov on 9 Nov 2017

@ameshkov

The "fastest" way to perform a dynamic script injection is to use the onCommitted listener

I'm experimenting with this and this works fine so far on both Chromium and Firefox. I see a 10-20ms gain in how much earlier the scriptlets are injected (using tabs.executeScript), though when I measure with the number of scripts already handled (document.scripts.length), I can't see any gain so far for the little I have tested.

Anyway, I want to ask why did you choose to go through messaging to inject the scriptlets rather than injecting directly using tabs.executeScript?

gorhill on 13 Nov 2017

@gorhill

Anyway, I want to ask why did you choose to go through messaging to inject the scriptlets rather than injecting directly using tabs.executeScript?

As I recall, we compared both and didn't see any serious difference. Actually, in future updates, we'll migrate to tabs.insertCSS in light of the coming user-agent styles priority improvement.

ameshkov on 15 Nov 2017

@ameshkov regarding Shadow DOM, we have user style sheets in Chromium now (works on Canary) along with the cssOrigin option to tabs.insertCSS. I'm also working on allowing extensions to access closed shadow roots. We should be in a good place soon.

mjethani on 3 Feb 2018

Hi @mjethani! I'm actually keeping an eye on all the pull requests you're pushing to Chromium, and you're doing a great job, thank you!

ameshkov on 4 Feb 2018

Hey guys, coming at you with a new modifier idea:
https://github.com/AdguardTeam/AdguardBrowserExtension/issues/961

I suppose it can benefit all the privacy-oriented subscriptions so we're planning to implement it in one of the future updates.

ameshkov on 5 Mar 2018

@ameshkov I don't see that much privacy value in dealing with cookies alone given that data can be stored in other local storage such as localStorage, indexedDB, etc. Blocking 3rd-party cookies in browser settings should be the first step for any privacy-conscious person -- this also takes care of all local storages.

gorhill on 5 Mar 2018

Extending this modifier to handle localStorage sounds useful indeed, and it can be done without changing the modifier syntax.

However, I've never seen indexedDB used for tracking purposes.

ameshkov on 5 Mar 2018
