In real daily use, the number of rules is usually in the thousands. It is time to propose a simpler and more efficient matching method.
In Surge, the solution is RULE-SET, for example RULE-SET,https://url/xxx.list,yourproxy, where xxx.list defines a rule list:
DOMAIN-KEYWORD,amazon,force-remote-dns
DOMAIN-KEYWORD,google,force-remote-dns
DOMAIN-KEYWORD,gmail,force-remote-dns
DOMAIN-KEYWORD,youtube,force-remote-dns
DOMAIN-KEYWORD,facebook,force-remote-dns
It's simple but not flexible. With an embedded language, we can do some optimization and customization. For example (pseudocode written in TypeScript):
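// Assumed minimal shape of a proxy; only the fields used below
interface Proxy {
  name: string
  delay: number
}
interface Context {
  metadata: { host: string }
  proxies: { [key: string]: Proxy }
  domainTree: {
    match: (domain: string) => boolean
  }
}
// Random integer in [s, e)
const random = (s: number, e: number) => s + Math.floor(Math.random() * (e - s))
function selectProxy(ctx: Context): Proxy {
  const { proxies, metadata, domainTree } = ctx
  // Proxies whose name contains 'hk', sorted by ascending delay
  const hk = Object.keys(proxies)
    .filter(key => proxies[key].name.includes('hk'))
    .map(key => proxies[key])
    .sort((a, b) => a.delay - b.delay)
  // Hosts found in the domain list use the fastest proxy
  if (domainTree.match(metadata.host)) {
    return hk[0]
  }
  // Everything else uses a random proxy from the same pool
  return hk[random(0, hk.length)]
}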
domainTree is a simple trie that is currently used for hosts, and Context could provide more utility functions to help users write their own scripts. BTW, the configuration can get rid of the binding between lists and rules.
# define a list
lists:
  - url: https://example.com/list.yaml
    interval: 300
    type: trie-tree
  - file: /opt/list.yaml
    type: trie-tree
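To make the domainTree idea concrete, here is a minimal sketch of a label-based trie. It is not Clash's actual implementation: the class name DomainTrie, the insert method, and the rule that "*" matches exactly one label are all assumptions made for illustration.

```typescript
// Hypothetical sketch: a trie keyed by domain labels, walked from the TLD inward.
class DomainTrie {
  private children = new Map<string, DomainTrie>()
  private terminal = false

  // Insert a pattern such as "example.com" or "*.google.com".
  insert(pattern: string): void {
    let node: DomainTrie = this
    for (const label of pattern.toLowerCase().split('.').reverse()) {
      let next = node.children.get(label)
      if (!next) {
        next = new DomainTrie()
        node.children.set(label, next)
      }
      node = next
    }
    node.terminal = true
  }

  // True if the host matches an inserted pattern.
  match(host: string): boolean {
    return this.walk(host.toLowerCase().split('.').reverse(), 0)
  }

  // Try the exact label first, then fall back to a "*" child (one label only).
  private walk(labels: string[], i: number): boolean {
    const label = labels[i]
    if (label === undefined) return this.terminal
    const exact = this.children.get(label)
    if (exact && exact.walk(labels, i + 1)) return true
    const wild = this.children.get('*')
    return wild !== undefined && wild.walk(labels, i + 1)
  }
}

const trie = new DomainTrie()
trie.insert('example.com')
trie.insert('*.google.com')
```

Lookup cost depends on the number of labels in the host, not on how many patterns were inserted, which is the point of using a trie for lists with thousands of entries.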
I found two embedded languages that can be used in Clash
Is there a typo in the filter?
const hk = Object.keys(proxies)
  .filter(key => proxies[proxy].name.include('hk')) // should be proxies[key]
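What would the lists further down look like? Does each trie tree correspond to one script?
@Fndroid Typo fixed. For now the list is envisioned as a domain list; the details still need to be discussed, I'm just writing a prototype.
Looking forward to the new feature. I'm also hoping for Surge's url-regxp with MITM support; that would be perfect.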
Some questions:
Where to set a proxy for a domain?
The use case is Netflix: not all proxies support Netflix.
The example list contains no proxies, so if list.yaml contains Netflix domains, how are proxies chosen?
How to detect IP-CIDR and other types of MATCH RULE?
Usually we have a google.list that contains Google service domains and IP CIDRs. How do we tell whether an entry is a domain, a domain suffix, or an IP CIDR?
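Doesn't this effectively eliminate the concept of a proxy group, with the rule script deciding which proxy handles requests matched by a list? But a rule script has very limited information to work with (the proxy's name, IP, and so on), and that alone may not be enough to decide whether a proxy should be used.
We could borrow the label and selector concepts from Kubernetes and add labels that describe each node, so the rule script has more information to go on when choosing a node.
For example: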
Proxy:
  - name: "hk01"
    server: 1.1.1.1
    type: ss
    labels:
      - hk
      - netflix
  - name: "us01"
    server: 1.1.1.2
    type: ss
    labels:
      - us
      - youtube
      - netflix
The rule script for YouTube could then be written like this:
const hk = Object.keys(proxies)
  .filter(key => proxies[key].labels.includes('youtube'))
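The label idea can be sketched as a small selector helper. Everything here is hypothetical: the LabeledProxy shape, the selectByLabels name, and the delay values are made up for illustration and are not part of any existing Clash API.

```typescript
// Hypothetical proxy shape: the labels come from the config above,
// delay is an assumed latency measurement in milliseconds.
interface LabeledProxy {
  name: string
  delay: number
  labels: string[]
}

// Return the proxies that carry every requested label, fastest first.
// filter() creates a new array, so the input is not reordered.
function selectByLabels(proxies: LabeledProxy[], required: string[]): LabeledProxy[] {
  return proxies
    .filter(p => required.every(label => p.labels.includes(label)))
    .sort((a, b) => a.delay - b.delay)
}

// Illustrative data only.
const pool: LabeledProxy[] = [
  { name: 'us01', delay: 180, labels: ['us', 'youtube', 'netflix'] },
  { name: 'hk01', delay: 80, labels: ['hk', 'netflix'] },
]

const youtubeCapable = selectByLabels(pool, ['youtube'])
const netflixCapable = selectByLabels(pool, ['netflix'])
```

Requiring every label (rather than any) behaves like a Kubernetes equality selector with several keys; an "any of" variant would swap every for some.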
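As far as Clash's configuration file is concerned, there is no longer any need to "reference" Surge's syntax or stay close to it.
https://github.com/v2ray/domain-list-community
How about this project? It provides only a limited amount of content and leaves everything else to the configuration file.
For rules, this project is enough: it is lightweight, easy to maintain, and lets you create your own lists.
Of course, you could also build your own set; that's no big problem, it only needs converting.
And for users who rely on subscriptions, wouldn't a standard format for node subscriptions be more meaningful?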
Users could even use a specific DNS server to query a domain, for example 223.5.5.5 to query taobao.com.
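I support this idea as an alternative to YAML rules. It allows more complex customization without adding new directives (such as USER-AGENT).
However, rather than inventing or choosing another embedded language, it would be better to build a V8 binding for Go and adopt the mature JavaScript PAC convention (FindProxyForURL). That would be more widely applicable, and many existing high-quality PAC files could be used directly.
@reorx V8 is far too big. I don't need most of what is in V8, so a slimmed-down embedded language is what I want. PAC has too many restrictions and too few built-in functions. Any rule-based PAC you find online runs to tens of thousands of lines, and that is not what I want.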
Rule Set is easy to read and edit for end users, so I think it is better to implement a rule-provider to support both ruleset and rule script, something like:
rule-provider:
  google:
    type: set
    path: google.yaml
    url: https://example.com/google.yaml
    interval: 3600
    use: proxy-group-us
  netflix:
    type: set
    path: netflix.yaml
    url: https://example.com/netflix.yaml
    interval: 3600
    use: proxy-group-media
  custom:
    type: script
    path: custom.script
    url: https://example.com/custom.script
    interval: 3600
    use: proxy-group-other
@ruisiji I have a question about use: what is it for?
@Dreamacro The ruleset does not contain a proxy/proxy-group; use specifies which proxy/proxy-group handles this ruleset.
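The role of use can be sketched as follows. This is only an illustration of the proposal, not an implemented feature: the RuleProviderConfig and LoadedProvider types, the resolveTarget function, the hand-written matchers, and the assumption that interval is in seconds are all hypothetical.

```typescript
// Mirrors the fields of the proposed rule-provider entries.
interface RuleProviderConfig {
  type: 'set' | 'script'
  path: string
  url: string
  interval: number // assumed to be a refresh interval in seconds
  use: string      // proxy or proxy group that handles matches
}

// Hypothetical: a provider paired with a matcher built from its file.
interface LoadedProvider {
  config: RuleProviderConfig
  matches: (host: string) => boolean
}

// Return the `use` target of the first provider that matches, else the fallback.
function resolveTarget(host: string, providers: LoadedProvider[], fallback: string): string {
  for (const provider of providers) {
    if (provider.matches(host)) return provider.config.use
  }
  return fallback
}

// Illustrative provider with a hand-written matcher standing in for google.yaml.
const loaded: LoadedProvider[] = [
  {
    config: {
      type: 'set',
      path: 'google.yaml',
      url: 'https://example.com/google.yaml',
      interval: 3600,
      use: 'proxy-group-us',
    },
    matches: host => host === 'google.com' || host.endsWith('.google.com'),
  },
]

const target = resolveTarget('mail.google.com', loaded, 'DIRECT')
```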
I think this style of config is inflexible. On the one hand, users can't easily switch a rule set's proxy group. On the other hand, separating rules and scripts makes maintenance harder, because users have to jump between files to add, modify, or delete rules.
In my opinion:
Here’s an example of config.yaml:
proxy-provider:
  hk:
    type: http
    path: ./hk.yaml
    url: http://remote.lancelinked.icu/files/hk.yaml
    interval: 3600
    health-check:
      enable: true
      url: http://www.gstatic.com/generate_204
      interval: 300
  us:
    type: file
    path: /home/lance/.clash/provider/us.yaml
    health-check:
      enable: true
      url: http://www.gstatic.com/generate_204
      interval: 300
Proxy Group:
  - name: Proxy
    type: select
    proxies:
      - hk
      - us
rule-provider:
  googleset:
    type: http # or file
    path: ./googleset.yaml
    url: https://example.com/google.yaml
    interval: 3600
Rule:
  - SET-MATCH, googleset, Proxy
Here is an example of rule set file:
rules:
  - DOMAIN-SUFFIX,ampproject.org
  - DOMAIN-SUFFIX,appspot.com
  - DOMAIN-SUFFIX,blogger.com
  - DOMAIN-SUFFIX,getoutline.org
dnsmap:
  - HOST, abc.com, 1.2.3.4
  - DNS-SERVER, def.com, 8.8.8.8 # or system
rewrite:
  - URL-REWRITE, ^http://www\.google\.cn http://www.google.com, header # 302 or REJECT
  - HEADER-REWRITE, ^http://example.com,header-replace=User-Agent, Unknown
script:
  - http-response, ^http://www.example.com/test, script-path=test.js, max-size=16384, debug=true
Maybe check out Dhall.
@aur3l14no It's a configuration language.
Dhall is a programming language, so you can compress the rule set much as you do in your TypeScript example. You are right that it is not an embedded language. It is a standalone configuration language, but it can be exported to YAML. IMO, the implementation effort is similar to using an embedded language.
The main benefits using Dhall are:
The main downside is that Dhall may be hard to write, because it is more of a functional language like Haskell or OCaml, and it has its own restrictions (to achieve safety and totality).
Maybe it's not the right tool for this case after all. I just feel using Dhall would be cool and is worth looking into : P
Yes, it looks cool. But functional languages are hard for most users. In this case, Clash needs a lightweight embedded language.
I think being lightweight is the best feature of Clash.
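Would you consider SQL? Many string-matching and regex-search features are already implemented in SQLite and its extensions, and a Bloom filter could be used for many performance optimizations.
@Dreamacro, is there a roadmap for iterating on rule scripts? Could the first iteration separate proxies from rules, and then look at which language best supports rules? Thanks.
Could there be a fast rule set for matching thousands of domain-suffix entries? This is one of the most common rule patterns.
Put a group of domain-suffix entries into a map, then repeatedly strip the leading label from the hostname until it contains only one dot, checking on each iteration whether the remainder is in the map. This would be faster than iterating over all the rules.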
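The map-based suffix lookup described above can be sketched like this. The function names buildSuffixSet and matchSuffix are made up for illustration, and a Set stands in for the map since only membership matters.

```typescript
// Build a lookup set from suffixes such as "google.com" or ".youtube.com".
// A leading dot is stripped so both spellings hit the same key.
function buildSuffixSet(suffixes: string[]): Set<string> {
  return new Set(suffixes.map(s => s.toLowerCase().replace(/^\./, '')))
}

// Check the host, then keep dropping its leftmost label while the
// remainder still contains at least one dot. A bare TLD is never checked.
function matchSuffix(host: string, suffixes: Set<string>): boolean {
  let candidate = host.toLowerCase()
  while (candidate.includes('.')) {
    if (suffixes.has(candidate)) return true
    candidate = candidate.slice(candidate.indexOf('.') + 1)
  }
  return false
}

const suffixSet = buildSuffixSet(['google.com', '.youtube.com'])
```

The number of set lookups is bounded by the number of labels in the host, so the cost no longer grows with the size of the rule list, unlike a loop over every DOMAIN-SUFFIX rule.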
rule-provider is designed for matching a large number of rules. Matching against 340,000 (34W) rules takes only 7 µs.
@bash99 https://github.com/Dreamacro/clash/blob/master/component/domain-trie/tire.go
I saw that earlier, but Config.Rules is still an array ([]), so is rule matching still a loop that checks the rules one by one?
Is the trie only used for Host/DNS matching?
@Dreamacro Is it possible to move the behavior to the rule-provider file?
domain:
  - '.blogger.com'
  - '*.*.microsoft.com'
  - 'books.itunes.apple.com'
ipcidr:
  - '192.168.1.0/24'
  - '10.0.0.1/32'
It would be great if the rule-provider file could support DOMAIN, DOMAIN-KEYWORD, etc.
@ruisiji
A rule-provider can't have more than one behavior, so putting it into the file doesn't seem to make sense.
DOMAIN is equivalent to 'example.com', and DOMAIN-KEYWORD can't be optimized in a large list.
@Dreamacro Thank you for clarifying this.
Another question: is it possible to add a new rule that matches both a domain and all its child domains (including nested ones)?
I think most users want to proxy a domain and all its child domains at the same time.
@ruisiji The difficulty is not the implementation but the expression.
A domain together with all its child domains can currently be expressed as:
- example.com
- .example.com
This expression is accurate, but it is also inconvenient. Accuracy is important in hosts.
Do you have any good ideas for how to express "match both the domain and all its child domains"?
How about introducing another syntax, a plus sign (+), meaning the domain and its child domains? I don't know whether this is good for the implementation; I'm only looking at it from a convenience perspective.
- +example.com
Although + is a YAML reserved sign, it can be placed at the start of a string. Using @ is an alternative, but that requires quoting the string (personally, I don't like quotes in YAML).
+ as block chomping indicator: https://yaml.org/spec/1.2/spec.html#id2794534
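One way to read the + proposal is as shorthand that expands into the two existing entries. The sketch below assumes exactly that; the expandEntry name is hypothetical, and accepting both +example.com and +.example.com is my own choice rather than something settled in this thread.

```typescript
// Expand "+example.com" (or "+.example.com") into the exact entry and the
// child-domain entry. Entries without a leading "+" are returned unchanged.
function expandEntry(entry: string): string[] {
  if (!entry.startsWith('+')) return [entry]
  const domain = entry.slice(1).replace(/^\./, '')
  return [domain, '.' + domain]
}

// Flatten a whole list, for example the payload of a domain list file.
function expandAll(entries: string[]): string[] {
  const result: string[] = []
  for (const entry of entries) {
    result.push(...expandEntry(entry))
  }
  return result
}

const expanded = expandAll(['+example.com', '.foo.com'])
```

Expanding at load time keeps the matcher itself unchanged, since it only ever sees the two forms that are already supported.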
starlark-go is now used as the interpreter, starting with premium.
@Dreamacro just wanted to clarify, does this mean that premium is using the trie (and I assume Aho-Corasick) for the rules matching especially in the DOMAIN-SUFFIX case?
@mayanez premium is just a superset of Clash. When you see things like *.clash.dev, +.example.com, or .foo.com, it's using the trie tree.
@Dreamacro thank you for the clarification.
However, it's still not clear to me how the function Match in https://github.com/Dreamacro/clash/blob/50d778da3c155af36181b53aa736b21fb3753f24/tunnel/tunnel.go#L319 uses the trie if the rule is of type DomainSuffix. If my understanding is correct, it simply calls this: https://github.com/Dreamacro/clash/blob/50d778da3c155af36181b53aa736b21fb3753f24/rules/domain_suffix.go#L23
Could you point me to how it is used for rules matching?