Core: Revamp search API

Created on 31 Jan 2018 · 15Comments · Source: flarum/core

Plan for new search system:

Search driver (eg. Algolia) is called upon to update its index each time a post/discussion is changed
(Each post is a separate object in the index which contains not only post content but also its discussion's title/tags/other metadata)

q = foo "bar baz" (author:toby OR author:franz) is:following

Parse q into an abstracted query object . eg. https://github.com/pimcore/search-query-parser or https://github.com/netgen/query-translator
Loop through each Term in the query object, allow gambits to match and convert term into a Where object to be processed by search driver
Extensions may also populate the query object to enact permissions, eg. add a Where to restrict results to only tags that the user can see
Search driver builds its vendor-specific query by inspecting query object's Terms/Wheres, returns list of discussion IDs

However, this is complex and will take a while to implement, so it would be wise to keep it slated for 0.2 (as per current roadmap). In the meantime, we are doing #1339

needs-discussion typfeature

Source

tobyzerner

👍2

Most helpful comment

Re-reading this discussion, I feel like we do need to separate filters and full text search into different API endpoints and implementations.

The filter part, like proposed in #2104 , makes a lot of sense to me for retrieving posts with filters, discussions with a particular flag (bookmarked, followed, ...) or users based on filters (enabled, group) for an admin UI. The current approach works well and apart from creating proper extenders and re-usable classes, I don't think we need to change it too much. Changing to query-string based over string-based would be nice.

The full text search really stands on its own. Very few filters matter. You just want relevant results, and fast. Currently there are only two situations where this endpoint would be used: the quick search dropdown that includes discussions and users, and the full discussion search results page. We don't really need to make the search work with anything else by default. Here's there's also potentially the use case of directly hitting a search server like Algolia without going through Flarum backend at all.

My preference/proposal would be as follows:

Rename what we currently call "search" to something like "filters"
Drop the ability to have full text filters
Revamp filters to use key-values in the filters[] query string instead of a string query
We keep the system with the search/searcher and gambit manager, just rename it to something different
Refactor search input and results page to use a new extensible javascript API. That API needs to support calling external non-Flarum API endpoints for search
Implement a driver system for this new full text search system with the ability to select a different one for each model (discussion, user, and other models registered into the global search by extensions)
Create a new gambit system for the full text search
- Gambits could be shown as pills in the search field, and never as "hidden" gambits like we do now
- Drivers could expose which gambits they support. If a gambit is not supported, it's not added to the search query and this will be clearly visible in the UI
Create a default Flarum search driver that uses a server-side database search, probably something like described higher with local indexing with word-split
Provide a way for extensions to add gambit support to that default driver
Technically, the default search driver could well be implemented using the existing API endpoints and re-use some of the logic from the filters, but I think a complete separation would help to keep things organized and allow easier refactoring in the future
Other drivers would expose their own extender to let extensions register gambits on them if they wish so
Search drivers would be entirely responsible for the indexing strategy and access control. I think most forums will be fine indexing just the public data so IMO we shouldn't invest too much effort into that. If we let drivers implement the API call on javascript side and add their own API endpoint on Flarum API (if they need it), they basically have access to everything they need to know who the actor is and perform their own logic. The default driver of Flarum could continue using the ScopeVisibilityTrait if we want to offer full search including in restricted tags

So the API could look something like this:

PHP

class FilteredQuery implements ExtenderInterface {
    public function __construct(string $filteredQueryClass);

    // Equivalent to FlarumSearch::addGambit in the PR
    public function addFilter(string $filterClass);

    // Equivalent to FlarumSearch::addSearchMutator in the PR
    public function addQueryMutator(callable $callback);
}

// This extender's only purpose would be to make the drivers and resources known to the admin panel to create a configuration screen. Everything else will be on the client-side
class Search implements ExtenderInterface {
    // Registers a new driver for full text search
    public function driver(string $driverClass);

    // Registers a new full text search resource that this extension provides (for example, pages or albums)
    public function resource(string $name);
}

// This is the extender's for the base search driver provided by Flarum
class SimpleTextSearch implements ExtenderInterface {
    // Add a gambit for the default search driver. Those could work very similarly to what we have now, but would be separate from the filtered query filters
    public function addGambit(string $gambitClass);

    // Allows altering the SQL query of the default search driver, just like we can with filtered queries
    public function addQueryMutator(callable $callback);
}

// The server-side search driver class would only serve to build the admin configuration panel. Flarum needs to get a name/description, and know which resources are compatible with this driver. The list of resources isn't hard-coded as an extension could use this search driver's own extenders to add functionality
class AbstractSearchDriver {
    abstract function name();

    abstract function compatibleResources(): array;
}

JS:

// The extension would be responsible to use this extender to register one class for each resource they are compatible with
class SearchExtender {
    addDriver(resource: string, class implements AbstractSearchDriver);
}

// The search driver would work similarly to the current SearchSource objects, but it would also be used for paginated results in the discussions search results
class AbstractSearchDriver {
    // This method lets Flarum know which gambits it can offer to this driver
    // Flarum would take the list of suggested gambits for the context and compare it against this before suggesting it
    abstract supportedGambits(resource: string);

    // Performs the search asynchronously. We could wrap the result in another object that gives pagination hints
    // The data returned does not need to come from Flarum's API, but must contain models with some minimal attributes/relationships that we could spec out
    async abstract search(resource: string, query: string, page: number): Model[];
}

// We could add a new method to the global search state that can be extended by extensions to provide gambit suggestions
// This would replace the current GlobalSearchState.prototype.params used by extensions like Tags
app.search.suggestedGambits(): string[];

In the UI, I suggest making it so gambits get offered under the search field when focused.

So for example you click the search bar, and the following appears:
[ :mag: Search Forum ]
Suggested filters: [Tag General] [User Admin]

Clicking a gambit would add it visually to the search, either in text form or ideally in a fashion similar to the tag picker:
[ :mag: (Tag General) Search Forum ]
Suggested filters: [User Admin]

clarkwinkelmann on 4 Nov 2020

👍2

All 15 comments

After continually jumping back and forth between performance issues and little bugs in our search implementation, it has become clear that we need to make progress on our driver-based search infrastructure (#509). This would then allow for swapping out a default, naive implementation (hand-maintained index, similar to FluxBB; MySQL fulltext search; etc.) for more advanced drivers based on e.g. Elasticsearch or Algolia.

I was asked to describe my thoughts on the initial architecture in a new ticket. But we already have this one, so I'll add my thoughts here.

Laravel Scout

Multiple people brought up using Laravel Scout, as it incorporates a driver infrastructure for searches on Eloquent models. This might be possible to use, but I am somewhat skeptical for several reasons:

We want to allow extensions to change things at runtime.
Tight coupling to Eloquent models / queries (possibly not compatible with Toby's approach outlined above)
We want to define interfaces based on Flarum's functional requirements. Laravel's APIs usually change more often.

What might be possible is using Scout's engines (i.e. drivers) so that we have less work in building out the adapters to the various backends. We would still have to take care of the integration into Eloquent / our search infrastructure manually, though. Definitely worth a shot when we get to the implementation phase.

Requirements

These are the filters / gambits supported by our current search layer...

Searching users:
- email
- group name
Searching discussions:
- author name
- tag
- creation date (exact date, date range)
- is hidden
- is locked
- is sticky
- user-specific gambits:
- is (un)read
- subscription status (following / ignoring)

Initial thoughts:

User search by email is based on a permission anyway, so it might be better placed in a user list / admin user management page (slightly related: #283).
User-specific gambits are basically impossible to provide by the search driver. Can we combine driver results with our own database-backed filters (we have to use the IDs from the search driver to get the actual discussion titles etc. anyway). Pagination would be difficult or impossible, though.

We really have to come up with a better conceptual separation between what is search and what is filtering.

Driver capabilities

Once this list is complete, we should be able to derive an interface for driver implementations:

Update the index when a discussion / post / user is changed
- This should be done on the queue, but that's probably not the driver's concern.
Rebuild the entire index
Search the index with a given list of filters / gambits, return a list of matching IDs, paginated

Open questions

Where do we draw the line between search (requiring the backend), and filtering (database-backed)?
Where is the best place for completely hiding certain posts from the index (e.g. those marked as private)?
How do we ensure the driver can satisfy our sort order requirements for the list of results?

Naive implementation

As we don't want to require something like Elasticsearch by default (this would complicate the setup far too much), we have to provide a simple implementation by default.

As an alternative to the current MySQL-based fulltext search, we could implement a search based on a hand-made index, similar to what FluxBB and other forum software do / have done. Now that the application layer has to notify the index when it should be changed, this is possible. The main benefit would be compatibility with other database systems, should we ever get there.

Quick primer on how FluxBB's search works:

When indexing, split posts by words.
Add each word (exactly once!) to a search_words table and give it an ID. (Only do this for new words, obviously.)
For each word occurring in a post, add an entry to the search_matches (word_id, post_id) table. This can also contain metadata about where the match occurred (subject, opening post, body...) and the number of matches.
When searching, get the word IDs from search_words, then get the matching post IDs from the search_matches table.

Possible future changes

We want to keep these things in mind when designing the interface, even if we don't implement them right away...

Stemming - enables finding "playing" when searching for "player" (implementations are language-specific)
Highlighted snippets - the more powerful the driver becomes, the more we will have to rely on its ability to provide highlighted snippets of the results (because it integrates stemming etc.)

This is very likely incomplete right now. I will try my best to keep this post updated, as we discuss things below.

franzliedke on 17 Jan 2020

Currently, we have a DiscussionSearcher and a UserSearcher. I think we should abstract this system down to an AbstractSearcher, which would support extending ANY model, flarum-defined or not, with a searcher. When applied to Posts, this should also remove necessity for the (frankly redundant) ConfigurePostsQuery event.

Within that searcher, filtering is currently supported by gambits. I think this is a good system, and the new AbstractSearcher system should support extending ANY searcher with a gambit (if it exists).

Now, the searching itself it currently handled by a 'FullTextGambit'. This, in my opinion, is the part that we could refactor out into a system that supports various search drivers. A few possible ideas:

Split gambits into AbstractFilterGambit and AbstractSearchGambit
Make FullTextGambit not a gambit, but rather a feature of AbstractSearcher that supports various search drivers (of which the current system could be one). The driver would be configured in admin settings (similar to how mail driver is done).

In my opinion, the refactor I described above should be included by stable. Implementing an extender for custom search drivers could be delayed by 0.2.

askvortsov1 on 28 Mar 2020

Franz and I had a very productive discussion today, here are my takeaways:

Searching and Filtering

One point of uncertainty is where to draw the line between searching and filtering, and how these 2 concepts should fit in/work together. I am in favor of a tightly coupled approach, that is, having filtering and searching together under one system. Here's why:

We would expect that the machinery for executing a filter, executing a search, and executing a filtered search would not be radically different. It would be unexpected that adding a search term to a filtered query would use entirely separate logic to compute. For instance, you would expect that filtering for open PRs by askvortsov and filtering for open PRs by askvortsov with a search term of Policy Refactor would not be processed radically differently, but that the latter would build upon the former.
Filtering and searching are, arguably, extensions of each other: filtering restricts a queryset by certain structured metadata (like author, title, etc). Searching restricts it by an abstract 'full text search'.

Drivers

Our goal in designing the new system should be to eventually support search drivers. This should be extendable, so that extension developers can add new drivers. The issue emerges of how to generalize our implementation, so that filters (currently presented by gambits) will work across various extenders.

What I'm Proposing

The Search extender will allow developers to add 'search driver's, and to add 'filter spec's
- A search driver will handle the actual searching/filtering
- A filter spec allows extension developers who are adding functionality to define somewhat of an interface for filters. These will bridge the gap between available filters and the driver's implementation. These will also be available to the frontend.
Although we generally discourage this, search drivers will be encouraged to create their own extenders that would allow other extensions to add implementations of filters. As each search driver is so different, having an interface for filters that would be used within the search driver would only make everyone's lives more difficult.
Flarum's current search system will become the SimpleFlarumSearch search driver. It will have a SimpleFlarumSearch extender, which will allow people to add gambits, and to set fulltextgambits (or filterers and searchers, respectively if we want to change the names to be clearer).
All current gambits will be moved to the SimpleFlarumSearch driver, and corresponding FilterSpecs will be registered in the core of search.

Extenders

public class Search implements ExtenderInterface {
    // Add search driver
    public function searchDriver(SearchDriverInterface $driver);

    // Add filter spec
    public function filterSpec(AbstractModel $model, FilterSpecInterface $filterSpec);
}

public function SimpleFlarumSearch implements ExtenderInterface {
    // Right now these are 'Gambits'
    public function filter(AbstractModel $model, FilterSpecInterface $filterSpec, SimpleFlarumFilterInterface $filter);

    // Right now these are 'FullTextGambits'
    public function search(AbstractModel $model, SimpleFlarumSearchInterface $fullTextSearcher);
}

Interfaces

public interface SearchDriverInterface {
    // This returns a searcher for a given model, which actually performs the searching/filtering
    public function searcher(AbstractModel $model): SearcherInterface;

    // This will return an array of specs for the filters that this driver implements.
    // This should not be a static list, but rather should be generated as a result of other devs
    // adding filter implementations through the driver's extender.
    public function implementedFilters(): array[FilterSpec];

    // This will be called during resolution, and will provide a list of all existing FilterSpecs to the driver.
    public function addAvailableFilter(AbstractModel $model, array[FilterSpec] $filterSpecs);
}

 public interface SearcherInterface {
     public function search(SearchCriteria $criteria, $limit = null, $offset = 0, array $load = []): SearchResults;
}



md5-03a40be493a9bb378ab94a4a4bd89ead



public interface FilterSpecInterface {
    // Regex representing the filter's pattern
    public function pattern(): string;
}

askvortsov1 on 13 Apr 2020

This sounds like a great plan @askvortsov1 ; I'm extremely curious about the technical implications once we move gambits to the SimpleSearch. Well done!

luceos on 14 Apr 2020

After a bit more thought, I realized I'm missing one method in the SearchDriverInterface:

public function supportedModels(): array[AbstractModel];

This should work similarly to implementedFilters, but I think it's important to have the 2 separately, because we might have filters but not a searcher, or vice versa. However, there might be a bit of overlap, which might be unideal.

One other thing to consider: should filter specs go beyond a regex, and provide, say, a description? or non-regex filter types (equals, contains, between, gt, gte, etc)? In essence, I think that regex might be better as an implementation detail than as the actual interface, but I might be wrong

askvortsov1 on 15 Apr 2020

I feel for the API instead of:

filter[q]=testquery tag:testtag author:testauthor

this would be a lot clearer:

filter[q]=testquery&filter[tag]=testtag&filter[author]=testauthor

That would make it behave like the page parameter page[offset]=20&page[near]=20&page[limit]=20

But since it's not an easy change, the better alternative could be to rename filter[q] to either q or filter, which would also clarify things:

filter=testquery tag:testtag author:testauthor

(And maybe even simplify the page parameters to just offset, near, limit to keep it consistent.)

w-4 on 29 Sep 2020

I don't know that we'd want to rename filter[q] to q, as we want to ensure compliance with JSON-API. That being said, I like splitting up AND queries across multiple request params. We could also look into a broader overhaul, something like https://json-api-dotnet.github.io/JsonApiDotNetCore/usage/filtering.html.

That being said, I feel like we should separate discussion about the format of REST API requests, and the underlying search architecture (which is more focused on establishing a driver-based system)

askvortsov1 on 29 Sep 2020

I've spent a bit more time playing around with the system (using elasticsearch as an example, but only as that), and here's a reiteration of the main issues (shocklingly, pretty much exactly what Franz identified above):

SQL Fulltext search means that we can apply all filtering gambits AND the fulltext gambit in the same query. So we get one "block" of results, which we can then paginate, filter, etc.
Flarum's visibility scoping system is awesome, but this is where it fails: because rules for who can see what can get very complex, we can't reliably convert them into metadata that could be used to filter visibility on the elasticsearch level
Running a query against elasticsearch will return a fixed number of results. Because of (2), we will need to filter THOSE for visibility later, meaning that:
a. We need to request more results from ElasticSearch than we are planning to return. How much more? 2x? 10x? a constant huge number?
b. For proper pagination (no duplicates and no gaps), we'll need to supply an offset. What would that offset be?
It doesn't make sense to use elasticsearch for queries that contain current-user-based filters (unread, filtering) AT ALL, because as per (3), any results we extract from ElasticSearch would be filtered down beyond recognition

So, possible, non-mutually-exclusive approaches to fighting these issues (some of these are quite rough) :

a. Run 2 search systems (Flarum's basic one and elasticsearch/algolia/whatever) at the same time. If a search includes an unsupportable gambit like subscription status or unread, default to the native Flarum driver
b. Implement search drivers by tying gambits to a driver name. So an elasticsearch driver would work by overriding (some of) the default filtration gambits
c. Figure out how to improve Flarum's default search so that it takes discussion titles into account (and works better with CJK characters)
d. Have ElasticSearch return 10,000 or 100,000 search results, and keep all offset / size logic in the flarum portion of the application. This wouldn't be too different from our current method of searching, but could have performance implications (how fast can we filter down by ID if the list of IDs is very large?) Can we maintain order for relevance searches as returned by ElasticSearch?
e. Ensure every gambit is aware of ALL other gambits being applied. Also, allow filtration gambits to pass some kind of metadata to the fulltext gambit so that instead of directly executing the filter, they could prepare some elasticsearch query fragment to be run in the fulltext gambit. The more gambits a driver can replace, the more effectively it'll perform, and the vast majority of gambits are replaceable.

I'm planning to take a better look through the Laravel Scout system and see how they accomplish this (and whether they actually do).

Somewhat concrete proposal:

If we consider typical user behavior, people are unlikely to look beyond the _nth_ page of search results, where n is typically in single digits. Results become less relevant, and people have typically found something that works by then. (cue "The best place to hide a body is the 2nd page of google results" jokes).

Criteria that filter out significant portions of the search space (tags, dates, authors/participants, etc) can all be handled by Elasticsearch As long as we run queries that contain subscription and unread filters through the SQL search, searches we send to ElasticSearch should mostly contain results the user can view. We could further improve that by telling elasticsearch not to include anything from tags/categories the user can't view discussions in, which shouldn't be too performance-intensive, and would further cut down on inaccessible (permission-wise) results returned by ElasticSearch.

With these optimizations in place, if we request (offset + size) * c) results from ElasticSearch (where 2 < c < 10) and proceed as per (d) above, user experience shouldn't be affected in the VAST majority of cases. Furthermore, we might even be able to stick that returned data in redis or something, since if we don't use all of it, it could be recycled when the user next clicks "read more".

It goes without saying that this is an extension-space issue, although we might want to make it bundled (at least at the start) because of how critical an issue it is.

askvortsov1 on 31 Oct 2020

👍1

Re-reading this discussion, I feel like we do need to separate filters and full text search into different API endpoints and implementations.

My preference/proposal would be as follows:

Rename what we currently call "search" to something like "filters"
Drop the ability to have full text filters
Revamp filters to use key-values in the filters[] query string instead of a string query
We keep the system with the search/searcher and gambit manager, just rename it to something different
Refactor search input and results page to use a new extensible javascript API. That API needs to support calling external non-Flarum API endpoints for search
Implement a driver system for this new full text search system with the ability to select a different one for each model (discussion, user, and other models registered into the global search by extensions)
Create a new gambit system for the full text search
- Gambits could be shown as pills in the search field, and never as "hidden" gambits like we do now
- Drivers could expose which gambits they support. If a gambit is not supported, it's not added to the search query and this will be clearly visible in the UI
Create a default Flarum search driver that uses a server-side database search, probably something like described higher with local indexing with word-split
Provide a way for extensions to add gambit support to that default driver
Technically, the default search driver could well be implemented using the existing API endpoints and re-use some of the logic from the filters, but I think a complete separation would help to keep things organized and allow easier refactoring in the future
Other drivers would expose their own extender to let extensions register gambits on them if they wish so
Search drivers would be entirely responsible for the indexing strategy and access control. I think most forums will be fine indexing just the public data so IMO we shouldn't invest too much effort into that. If we let drivers implement the API call on javascript side and add their own API endpoint on Flarum API (if they need it), they basically have access to everything they need to know who the actor is and perform their own logic. The default driver of Flarum could continue using the ScopeVisibilityTrait if we want to offer full search including in restricted tags

So the API could look something like this:

PHP

class FilteredQuery implements ExtenderInterface {
    public function __construct(string $filteredQueryClass);

    // Equivalent to FlarumSearch::addGambit in the PR
    public function addFilter(string $filterClass);

    // Equivalent to FlarumSearch::addSearchMutator in the PR
    public function addQueryMutator(callable $callback);
}

// This extender's only purpose would be to make the drivers and resources known to the admin panel to create a configuration screen. Everything else will be on the client-side
class Search implements ExtenderInterface {
    // Registers a new driver for full text search
    public function driver(string $driverClass);

    // Registers a new full text search resource that this extension provides (for example, pages or albums)
    public function resource(string $name);
}

// This is the extender's for the base search driver provided by Flarum
class SimpleTextSearch implements ExtenderInterface {
    // Add a gambit for the default search driver. Those could work very similarly to what we have now, but would be separate from the filtered query filters
    public function addGambit(string $gambitClass);

    // Allows altering the SQL query of the default search driver, just like we can with filtered queries
    public function addQueryMutator(callable $callback);
}

// The server-side search driver class would only serve to build the admin configuration panel. Flarum needs to get a name/description, and know which resources are compatible with this driver. The list of resources isn't hard-coded as an extension could use this search driver's own extenders to add functionality
class AbstractSearchDriver {
    abstract function name();

    abstract function compatibleResources(): array;
}

JS:

// The extension would be responsible to use this extender to register one class for each resource they are compatible with
class SearchExtender {
    addDriver(resource: string, class implements AbstractSearchDriver);
}

// The search driver would work similarly to the current SearchSource objects, but it would also be used for paginated results in the discussions search results
class AbstractSearchDriver {
    // This method lets Flarum know which gambits it can offer to this driver
    // Flarum would take the list of suggested gambits for the context and compare it against this before suggesting it
    abstract supportedGambits(resource: string);

    // Performs the search asynchronously. We could wrap the result in another object that gives pagination hints
    // The data returned does not need to come from Flarum's API, but must contain models with some minimal attributes/relationships that we could spec out
    async abstract search(resource: string, query: string, page: number): Model[];
}

// We could add a new method to the global search state that can be extended by extensions to provide gambit suggestions
// This would replace the current GlobalSearchState.prototype.params used by extensions like Tags
app.search.suggestedGambits(): string[];

In the UI, I suggest making it so gambits get offered under the search field when focused.

So for example you click the search bar, and the following appears:
[ :mag: Search Forum ]
Suggested filters: [Tag General] [User Admin]

Clicking a gambit would add it visually to the search, either in text form or ideally in a fashion similar to the tag picker:
[ :mag: (Tag General) Search Forum ]
Suggested filters: [User Admin]

clarkwinkelmann on 4 Nov 2020

👍2

2 completely separated systems works even better! I like pretty much everything about this proposal. We'll probably have some duplicate code between the filters and SimpleTextSearch gambits, especially towards the start. I think we should just keep it duplicate without any forced abstractions to unite the two: as you mentioned, these are more complex systems and unnecessary complexity doesn't help.

The biggest challenge here will probably need to be implementation, as the PRs would be bulky. Possible split of work:

PR 1A

Copy-paste new endpoints for searching users and groups, and tweak the frontend for the search dropdowns and search results page to point there instead of to the filter.
Modify the current filter endpoints for groups to be a bit simpler: all we need to do is scope visibility, run it through the filters (based on key-value parameters), run it through all filter mutators, apply offset/size, and return. A lot of abstractions needed for search can probably be eliminated here
Modify frontend calls to filter endpoints to send filters as key-value, not as a single string

PR 1B

Copy-paste the #2104 extender, set it up for filters
Copy-paste the #2104 extender, set it up for the simple flarum search driver (essentially exactly what it is now, but with different naming and invocation)

PR 2

Allow registering new drivers in the frontend and backend
Allow different drivers to be used for different resources (a resource search endpoint might call $driver = SearchDriverManager::getDriver(RESOURCE_NAME)
Add a frontend UI for configuring which driver is used for each resource

PR 3

Add a frontend system for adding/removing gambits in a search (IE revamp frontend search API).

PR 4

Add support for searching posts within a discussion?

PR 5

Switch the MYSQL fulltext driver to something like the FluxBB one

So, that begs the question: what do we need when? PR 1A and 1B I think needs to be included for stable so we can deprecate the old events (preferably in beta 15, instead of #2104, which I think is feasible).

PR 2 would also be really nice to have sooner than later to enable foreign language search. I would also like to have this for stable, but maybe as a beta 16 thing.

PRs 3, 4, and 5 can be done when developers are available, but are not THAT urgent. PR 3 is the most important IMO.

askvortsov1 on 5 Nov 2020

I love the idea of separating fulltext vs filters. This would also allow retrieving private discussions using the fulltext search and then running filtering afterwards. We only need to tackle some basic issues like retrieving more entries than we need etc.

This would also mean we could look at laravel scout again, potentially.. Well established and easy to replace.

I personally don't think that we need to allow search drivers per resource. I think that would over-complicate core as most (if not all) implementation will use one driver.

Changing search, adding an advanced search page and introducing better ux for gambits/filters but I think it doesn't make sense to add that to the scope before stable. For the issues with search that UTF-8 needs fixed, we can possibly go for a simpler, temporary solution?

luceos on 10 Nov 2020

I personally don't think that we need to allow search drivers per resource. I think that would over-complicate core as most (if not all) implementation will use one driver.

I think it would be simpler to register drivers to resources, not even from the perspective of ALLOWING search per resources as for registering and accessing those drivers in the first place. Since each resource would have different gambits, trying to shoehorn search for ALL resources under one block of logic would be messy IMO.

Most of what I bring up in Phase 1A about custom pages isn't for an "advanced search UI". It's just that we need the index page to hit the filtering API endpoint, and the search page to hit the search API endpoint. This might be possible to integrate into the index page, but I need to play around with it first.

askvortsov1 on 10 Nov 2020

I can't see a solution where there isn't one driver per resource and where extensions can define their own search endpoints.

Unless we standardize a way for extensions to provide the data that must be indexed, but this kind of goes against the flexibility that could be offered by having radically different drivers.

My use case for drivers per resources is as follows:

Core adds user and discussion search
Core adds basic driver for user and discussion
Extension A adds new advanced driver for user and discussion
Extension B adds picture search
Extension B adds picture support to basic driver

If we force the forum to use basic for everything since it's the only driver that supports all search types, we will create lots of incompatibilities or just discourage extensions from implementing search in their own data.

We could force selecting the same driver for both user and discussion however since the two will always be used together and can't be disabled.

clarkwinkelmann on 10 Nov 2020

👍1

This would also allow retrieving private discussions using the fulltext search and then running filtering afterwards. We only need to tackle some basic issues like retrieving more entries than we need etc.

This actually wouldn't happen. Retrieving more entries than we need is the exact issue I brought up above. I'm envisioning these as 2 parallel, but maximally detached systems,

askvortsov1 on 12 Nov 2020

I made another attempt at #2453, and came across some concerns:

Shared Endpoints

After some more thinking, I don't see why search and filter couldn't share one endpoint. My main reasoning behind this is that regardless of whether a query is searched or filtered, the frontend expects a consistent format of results. The distinction between which method is used is essentially:

If the filter param consists of just a q string which is made of gambits and a fulltext query, it should be searched
If the filter param consists of key-value filter pairs, it should be filtered

Either method should return an Eloquent collection, which can then be marked as paginated (the actual application of sort, offset, and size happens inside the filtering / searching logic), have related objects / includes added to it, and so on. IMO, https://github.com/flarum/core/blob/master/src/Search/SearchResults.php#L14-L14, which is currently returned by DiscussionSearcher and UserSearcher, is pretty much what we'd want both the SearchDriverInterface and the Filterer to return.

Gambits vs Filters code duplication

With the filtering / search split, everything that is currently a gambit for the SimpleFlarumSearchDriver (in core and extensions) could/should be supported as a filter for the Filterer. However, the formats in which they are provided are different:

filter parameters would be provided as key-value pairs
search gambits would be provided as key:value substrings of the filter[q] param

This means we'd have 2 copies of code for all filters / SimpleFlarumSearch gambits. A few ideas:

What if we had filters and SimpleFlarumSearch gambits implement both FilterInterface and GambitInterface?
We could provide some utility that would apply filters as gambits SearchUtil::filterFromGambit, but I don't like that as it would be magicky and could constrain functionality / format
We could implement SimpleFlarumSearch to use Filterer for its gambits (kind of the previous option, but better), but that's still hacky

Another concern is preserving backwards compatibility. With the approach I'm taking in #2453, it's completely preserved for the Search logic (since that is completely unchanged), but it seems like it'll be a breaking change for the filter endpoints, as we can't really apply the gambits from the ConfigureXGambit events, since they won't comply with FilterInterface. However, we could set off Searching events in the Filterer basec on the resource being filtered.

askvortsov1 on 13 Nov 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

[Subscriptions] System setting to receive email notifications for every new discussion

IvoPereira · 24Comments

Using SSO authentication to login to an existing un/pw account

mlyle · 27Comments

Usernames cannot be numeric

luceos · 33Comments

User list

tobyzerner · 34Comments

Add screen name functionality

tobyzerner · 25Comments