With addition of custom Index methods and operators. It becomes a breeze to use pg_trgm extension and create an Index for it.
For example:
modelBuilder.Entity<File>()
.HasIndex(f => f.Name)
.ForNpgsqlHasMethod("gin")
.ForNpgsqlHasOperators("gin_trgm_ops");
This extension requires to use % operator or similarity function to run fuzzy search. How can I run this functions with LINQ without sliding to raw SQL?
Support would have to be added to the Npgsql provider, to make it recognize specific CLR methods and translate them into the appropriate SQL function or operator. That is relatively easy to do - you can see the provider's full text support for inspiration; functions would be added as extensions over DbFunctions (like so), and a method translator (like this one would recognize those functions and generate the appropriate SQL.
Are you interested in giving this a try?
@roji . Can you clarify, what changes should be done in Npgsql provider side here https://github.com/npgsql/npgsql ?
None. Shay is talking about the EF Core provider (this repo), not about the driver.
Correct. If I understand correctly, the pg_trgm doesn't add any new data types - if it did that would require support at the Npgsql ADO.NET driver (in https://github.com/npgsql/npgsql). If it's just the translation of functions, that's entirely the business of the EF Core provider, which is this repo.
If I'm wrong about pg_trgm not adding types please let me know.
I guess the functions are pretty straight forward. The only issue I see here is naming the operators; %, <%, %>, <<%, %>>, <->, <<->, <->>, <<<->, <->>> 馃槀
The code looks more or less like this (from https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/issues/889)
Function name | Method name
-------------------------|---------------
show_trgm | Trigrams
similarity | Similarity
word_similarity | WordSimilarity
strict_word_similarity | StrictWordSimilarity
Operator | Operator name | Method name
----------|---------------------------------------------|---------------
% | similarity_op | Similar
<% | word_similarity_op | WordSimilar
%> | word_similarity_commutator_op | WordSimilarInverted
<<% | strict_word_similarity_op | StrictWordSimilar
%>> | strict_word_similarity_commutator_op | StrictWordSimilarInverted
<-> | similarity_dist | SimilarityDistance
<<-> | word_similarity_dist_op | WordSimilarityDistance
<->> | word_similarity_dist_commutator_op | WordSimilarityDistanceInverted
<<<-> | strict_word_similarity_dist_commutator_op | StrictWordSimilarityDistance
<->>> | strict_word_similarity_dist_op | StrictWordSimilarityDistanceInverted
The documentation says that text <<<-> text returns the "distance" between the arguments, that is one minus the strict_word_similarity() value, and that text <->>> text is commutator of the <<<-> operator. So even if <<<-> is commutator (see the function name), we should name its method according to the docs.
As these are extensions methods on EF.Functions, we should give some more thought on their naming. If we have a method simply called Similarity accessible in the main namespace, there would be no indication that it has anything to do with pg_tgrm... This is important especially because AFAIK no type is being introduced which could help with overloading etc.
One option is to prefix all the method names with Tgrm, another is to include the extensions class in a special namespace containing Tgrm...
Maybe we should write a plugin for pg_trgm since it's an extension of PostgreSQL?
@YohDeadfall sure, that makes sense (like PostGIS).
However, until the port to 3.0.0-preview7 is complete and merged, nobody should start looking at this - method translation has changed dramatically.
@YohDeadfall I think you wanted to look at this - if so maybe assign it to yourself. Regardless, moving to backlog because it shouldn't block the 3.0 release.
sry for this question, but when is this feature expected to be release into public?
thx!
when is this feature expected to be release into public?
When someone finds the time to implement it 馃槃 Wanna give it a shot?
when is this feature expected to be release into public?
When someone finds the time to implement it 馃槃 Wanna give it a shot?
ah, i though someone was already working on it
i'm going to look into this, but i think this is way out of my league...
I guess @YohDeadfall is assigned to the issue, dunno if he's started anything...
Already started, but had no time to finish it for 3.0.0.
I made a working plugin for pg_trgm targeting 3.0.0. It follows as much as possible other plugin implementations and follows method names suggested by @YohDeadfall, importing them directly into EF.Functions, without any prefix or specific namespace (for now).
https://github.com/ins0mniaque/Npgsql.EntityFrameworkCore.PostgreSQL.Trgm
It's in need of better documentation and tests; but it seems to work well for me at the moment. Just call UseTrgm() on your NpgsqlDbContextOptionsBuilder to get started.
P.S. I also made Npgsql.EntityFrameworkCore.PostgreSQL.FuzzyStrMatch => .UseFuzzyStrMatch()
and for RUM: Npgsql.EntityFrameworkCore.PostgreSQL.Rum => .UseRum()
Wonderful, @ins0mniaque! In that case could you open a PR to add pg_trgm support? For other two I think we need additional repos under npgsql or leave it under your control since they aren't built in parts of PostgreSQL.
@ins0mniaque Just viewed your implementation. Could you cleanup it a little? Feel free to use https://github.com/YohDeadfall/efcore.pg/commit/07f4815d1a753dca8cc9efe1518f1a4ee9da016e#diff-de57f6dbe0f085cdda84edc7c4b05c48 as the translator implementation.
Gladly. This is indeed a lot cleaner; I originally based my code on NpgsqlFullTextSearchMethodTranslator.cs which is not as clean.
As for fuzzystrmatch, I'm pretty sure it also is built-in into PostgreSQL (since 2001):
https://www.postgresql.org/docs/current/fuzzystrmatch.html
https://github.com/postgres/postgres/blob/master/contrib/fuzzystrmatch/fuzzystrmatch.c
However, as @roji mentioned, the method names for the pg_trgm operators are pretty generic (and the 'difference' fuzzystrmatch method is pretty awful too), I wonder if it should remain in plugin form like NTS or NodaTime; that way people could 'opt-in' those methods by referencing the package.
Most helpful comment
I made a working plugin for pg_trgm targeting 3.0.0. It follows as much as possible other plugin implementations and follows method names suggested by @YohDeadfall, importing them directly into EF.Functions, without any prefix or specific namespace (for now).
https://github.com/ins0mniaque/Npgsql.EntityFrameworkCore.PostgreSQL.Trgm
It's in need of better documentation and tests; but it seems to work well for me at the moment. Just call UseTrgm() on your NpgsqlDbContextOptionsBuilder to get started.
P.S. I also made Npgsql.EntityFrameworkCore.PostgreSQL.FuzzyStrMatch => .UseFuzzyStrMatch()
and for RUM: Npgsql.EntityFrameworkCore.PostgreSQL.Rum => .UseRum()