Rfcs: Enhanced Patterns

Created on 5 Apr 2020 · 11 Comments · Source: rust-lang/rfcs

A std::str::pattern::Pattern can be a string slice such as "hello".
Example:

let s = "hello world";
s.contains("hello"); // ==> true

However, checking for several patterns in a single pass is faster than calling contains several times.

Bad practice:

s.contains('a') || s.contains('b') // about 51 ns, because the string is checked twice

I wrote my own Pattern implementation that takes about 19 ns and checks the string only once. But I think there should be a dedicated pattern syntax for this.
Something like:

s.contains('a' || 'b')
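On stable Rust the proposed check can already be written as a single-pass search, because any closure of type FnMut(char) -> bool implements Pattern. A minimal sketch, not the author's implementation (the helper name contains_a_or_b is hypothetical):

```rust
/// Hypothetical helper: true if `s` contains 'a' or 'b'.
/// The closure is itself a Pattern (any `FnMut(char) -> bool` is one), so
/// `contains` walks the string once and tests both characters at each char.
fn contains_a_or_b(s: &str) -> bool {
    s.contains(|c: char| c == 'a' || c == 'b')
}
```

The closure form also generalizes to any predicate, at the cost of being more verbose than the proposed 'a' || 'b' syntax.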

All 11 comments

I suspect this falls under https://github.com/rust-lang/rust/issues/56345 / RFC https://github.com/rust-lang/rfcs/pull/2500 "Needle API" somehow, though I've never fully understood that RFC.

You don't even need #2500 for this.

s.contains(&['a', 'b'][..])
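A char slice used as a pattern matches any single character in the slice, and since it is an ordinary Pattern it works with every str method that accepts one, not only contains. A small illustration (demo_char_slice_pattern is a hypothetical name):

```rust
/// Hypothetical demo: the same kind of `&[char]` pattern used with
/// `contains`, `find` and `split`.
fn demo_char_slice_pattern() -> (bool, Option<usize>, Vec<&'static str>) {
    let any_of: &[char] = &['a', 'b'];
    // true: "cab" contains at least one of 'a' / 'b'
    let found = "cab".contains(any_of);
    // Some(4): byte index of the first 'o' in "hello world"
    let first = "hello world".find(&['o', 'w'][..]);
    // splits on either ',' or ';' -> ["a", "b", "c"]
    let parts: Vec<&'static str> = "a,b;c".split(&[',', ';'][..]).collect();
    (found, first, parts)
}
```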

@kennytm Yes, that's right, but that's not the only thing.

s.contains(&["abcd", "aaaa"]);

does not work either, because Pattern is implemented for char slices but not for slices of strings. A &str pattern is compared character by character. But what if we analyzed the pattern at compile time and noticed that both strings start with the same character, in this case "a"? Why should we check that character twice? Do you see what I mean?
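The shared-prefix idea can be sketched on stable Rust with a trie built from all needles, so "abcd" and "aaaa" store and compare their common "a" only once per start position. This is a hypothetical, naive illustration (NeedleTrie, from_needles and contains_any are made-up names); it is not Aho-Corasick, which additionally uses failure links so it never restarts the walk.

```rust
use std::collections::HashMap;

/// Hypothetical helper type: a byte-level trie of needles.
#[derive(Default)]
struct NeedleTrie {
    children: HashMap<u8, NeedleTrie>,
    // True if some needle ends at this node.
    is_end: bool,
}

impl NeedleTrie {
    fn from_needles(needles: &[&str]) -> Self {
        let mut root = NeedleTrie::default();
        for needle in needles {
            let mut node = &mut root;
            for &b in needle.as_bytes() {
                node = node.children.entry(b).or_default();
            }
            node.is_end = true;
        }
        root
    }

    /// Returns true if any needle occurs in `haystack`.
    /// Naive: the walk restarts at every byte offset, so the worst case is
    /// O(haystack length * longest needle). Byte matching is safe for UTF-8
    /// because a valid needle cannot start inside a multi-byte character.
    fn contains_any(&self, haystack: &str) -> bool {
        if self.is_end {
            // An empty needle matches every haystack, like str::contains("").
            return true;
        }
        let bytes = haystack.as_bytes();
        for start in 0..bytes.len() {
            let mut node = self;
            for &b in &bytes[start..] {
                match node.children.get(&b) {
                    Some(next) => node = next,
                    None => break, // no needle continues with this byte
                }
                if node.is_end {
                    return true;
                }
            }
        }
        false
    }
}
```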

@deeprobin The standard library is not capable of "analyzing that pattern to see that both strings start with a" at run time, let alone at compile time.

You'd be better off using aho-corasick if you need to search efficiently for "abcd" || "aaaa".

@kennytm Exactly, and that's why I opened this issue: so that this gets implemented at some point.

You'll need to explain:

  1. why we need to essentially move aho-corasick into std to support searching for multiple strings efficiently. Is this feature so essential that crates.io is insufficient and it must be provided by the standard library? (And at that point, why not just move regex into std?)
  2. whether the a || b syntax is needed.

  1. I think whatever can be optimized should be optimized. That means std should at least support simple multi-string patterns like the ones aho-corasick handles. regex supports far more complicated patterns, so I can understand keeping it out of std.

  2. The a || b syntax is of course not strictly necessary, but it would make the code a bit clearer.

s.contains(&['a', 'b'][..])

Should we add that to the standard library documentation, and also mention aho-corasick or regex for users who need more than that?

As long as this is not implemented in std, that would be very helpful.

Should we add that to standard library documentation

All implementors of Pattern are automatically documented.

@shepmaster Yes, but there are no examples there. Also, the descriptions are inconsistent: some end with a period and some don't.
