I have enabled the search_all_users option for the user directory search (and applied the #2831 patch to fix the internal server error). I've found that the user directory search works very well except when searching for the single letters a, s, and t. Searching for other single letters of the alphabet produces the correct results.
POST to https://matrix.floydcounty.tv/_matrix/client/r0/user_directory/search with the proper authentication and the following JSON body:
{
    "search_term": "t"
}
I expect to receive a list of users whose display name or username starts with the letter t (there would be at least 10 results on my server). Instead, I get an empty result:
{
    "limited": false,
    "results": []
}
POSTing with some other letter, for example j, will return a list of users as expected.
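For anyone who wants to reproduce this, here is a minimal sketch of the request in Python using only the standard library. The URL is the one from this report; the access token is a placeholder, and the helper names `build_search_request` and `search_users` are made up for illustration. It assumes the token is sent as an `Authorization: Bearer` header.

```python
import json
import urllib.request

# Endpoint taken from the report above; swap in your own homeserver.
SEARCH_URL = "https://matrix.floydcounty.tv/_matrix/client/r0/user_directory/search"


def build_search_request(search_term, access_token, url=SEARCH_URL):
    """Build (but do not send) the user directory search POST request."""
    body = json.dumps({"search_term": search_term}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Placeholder token: supply a real access token for your account.
            "Authorization": "Bearer " + access_token,
        },
    )


def search_users(search_term, access_token):
    """Send the request and return the decoded JSON response."""
    request = build_search_request(search_term, access_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `search_users("t", token)` and then `search_users("j", token)` against the same server should make the difference between the two responses easy to compare.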
The homeserver log doesn't show anything unusual for this particular request.
This sounds eerily similar to https://github.com/vector-im/riot-web/issues/4950
@turt2live: I think you're onto something!
Here are my results when I search for wil:

...and here's what happens when I search for will:

Seems like the database is filtering search terms for 'common' words.
EDIT: I should also mention that I'm using PostgreSQL as my DB. My guess is that Postgres's full-text search feature is being 'helpful' and removing common words from the search. Is there a way to still take advantage of the weighted results of the full-text search without common words?
The relevant portion seems to be here: https://github.com/matrix-org/synapse/blob/18e3a16e8b2303e6b638f679b5b8533e329cbe7a/synapse/storage/user_directory.py#L664-L695
Indeed, if I check the English stopwords list (which lives at /usr/share/postgresql/9.5/tsearch_data/english.stop on Ubuntu 16.04.3 LTS), the letters a, s, and t are listed, as well as the word will.
I did a bit of research and it's possible to do full-text search queries without using stopwords, but it involves creating a new dictionary, creating a configuration that uses that dictionary, and possibly creating an index for the new configuration. Once all of that is done, the first parameter of to_tsquery in the above-referenced lines would change from the default english configuration to the name of the configuration with the stopwords removed.
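A rough sketch of what the dictionary and configuration steps could look like, wrapped in Python so the statements can be run through any DB-API cursor. The names `english_stem_nostop` and `english_nostop` are hypothetical, the `%s` placeholders assume a psycopg2-style driver, and none of this is taken from Synapse itself. The index step is left out because it depends on the table and column Synapse actually indexes.

```python
# Hypothetical object names: english_stem_nostop, english_nostop.
NOSTOP_SETUP_SQL = [
    # Snowball stemmer declared without a StopWords parameter,
    # so no words are discarded as stopwords.
    """
    CREATE TEXT SEARCH DICTIONARY english_stem_nostop (
        TEMPLATE = snowball,
        Language = english
    )
    """,
    # Start from a copy of the built-in english configuration.
    "CREATE TEXT SEARCH CONFIGURATION english_nostop (COPY = pg_catalog.english)",
    # Route the word-like token types through the stopword-free dictionary.
    """
    ALTER TEXT SEARCH CONFIGURATION english_nostop
        ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                          hword, hword_part, word
        WITH english_stem_nostop
    """,
]

# Parses the same term with both configurations so the results can be compared.
COMPARE_SQL = (
    "SELECT to_tsquery('english', %s), to_tsquery('english_nostop', %s)"
)


def create_nostop_config(cursor):
    """Run the setup statements once, e.g. from a schema migration."""
    for statement in NOSTOP_SETUP_SQL:
        cursor.execute(statement)


def compare_configs(cursor, term):
    """Return the tsquery produced by each configuration for `term`."""
    cursor.execute(COMPARE_SQL, (term, term))
    return cursor.fetchone()
```

Any existing full-text index built with the `english` configuration would still need to be rebuilt against the new one before queries using it could benefit.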
Given all the work that fixing this would involve, I wonder if full-text search is the right solution for user directory searches.
I agree, the full-text search features, like derived words etc., are not really useful when we search for user names. This should be changed to simple pattern matching in the relevant columns.
@ara4n
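To illustrate the pattern-matching idea, here is a small self-contained sketch using an in-memory SQLite database. The table `user_directory_demo`, its columns, and the function names are invented for the example and do not reflect Synapse's real schema. It does a case-insensitive prefix match on the display name and on the user ID, and it does not attempt to reproduce the weighted ranking that full-text search provides.

```python
import sqlite3


def escape_like(term):
    """Escape LIKE wildcards so user input is matched literally."""
    return (
        term.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )


def search_directory(conn, term, limit=10):
    """Return users whose display name or user ID starts with `term`."""
    prefix = escape_like(term.lower()) + "%"
    rows = conn.execute(
        """
        SELECT user_id, display_name
        FROM user_directory_demo
        WHERE lower(display_name) LIKE ? ESCAPE '\\'
           OR lower(user_id) LIKE ? ESCAPE '\\'
        ORDER BY display_name
        LIMIT ?
        """,
        # User IDs start with "@", so the prefix is applied after it.
        (prefix, "@" + prefix, limit),
    )
    return rows.fetchall()


# Demo data (made up) including names that are English stopwords.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_directory_demo (user_id TEXT, display_name TEXT)")
conn.executemany(
    "INSERT INTO user_directory_demo VALUES (?, ?)",
    [
        ("@tom:example.org", "Tom"),
        ("@will:example.org", "Will"),
        ("@jane:example.org", "Jane"),
    ],
)
print(search_directory(conn, "t"))
print(search_directory(conn, "will"))
```

Because nothing here goes through a text search parser, single letters and words such as "will" are treated like any other prefix.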
This is pretty interesting. Is this going to be fixed sometime soon? Is it a lot of work? It seems like a different method should be used instead of to_tsquery, but I'm not sure that's an easy task...