Received a report from a 2.11 user on our FB page that category search is case-sensitive for her, which means that sometimes she'll type the right category in the search field but nothing will show up.
AFAIK the MW API that we use is inherently case-sensitive, but the upload wizard seems to be able to find a way around that and produces the same category suggestions regardless of case.
Maybe if we convert everything to lowercase then the server performs a case-insensitive search? That's just a hypothesis; I have not tried it.
@nicolas-raoul Possible! We'll try it out with a direct query first.
Can I take this issue?
@ankit-kumar-dwivedi please feel free!
@misaochan Is this issue free to be worked upon? if so can i take it?
Hey! Yes, sure, you should start working on it as I'm not working on it right now.
Thank you:)
I'm re-opening this as I believe there's a problem with how this issue was fixed.
@kbhardwaj123 Can you clarify a doubt that I have regarding your PR #3326? In the description you say:
Tested from the MW api fuzzy search url that the category suggestions would deliver the desired results no matter what case you sent in the api call so to fix the issue api call has been converted to lower case.
Are you sure the API really doesn't care about the case of the category name given to it? I'm doubtful about that for several reasons. Here are a couple:
In case you're wondering why the test case didn't fail, here's the catch:
The page title is case-sensitive except the first character.
From Manual:Page title - MediaWiki
I think the quote speaks for itself. I'll share the actual problem w.r.t. the app in the next comment.
@sivaraam While I was working on this I went with what @nicolas-raoul suggested, so I ensured that all the category strings passed to the OkHttpClient are converted to lower case. I wrote new tests for that and they worked fine, but I guess I must have missed something. I will take a look at it again.
Ok. Here's the issue with respect to the app: category search doesn't return any categories with a prefix that has an upper case character in it (other than the first one, of course). See #3582 for proof.
In case the issue is not clear to you from #3582, here's another example.
Here's what I get when I search for categories with "COVID" (mind the case) in the app (version: 2.12.3.629~a63a358):

Now, consider the linked example query, which returns 25 categories that have "COVID" as their prefix. Here are the categories that the query returns:
Category:COVID-19 guidelines in Brazil
Category:COVID-19 guidelines in Argentina
Category:COVID-19 guidelines in Albania
Category:COVID-19
Category:COVID-19 guidelines by country
Category:COVID-19 guidelines in Czechia
Category:COVID-19 guidelines
Category:COVID-19 guidelines in Denmark
Category:COVID-19 guidelines in Esperanto
Category:COVID-19 Clinical Cohort Research Conference, March 18, 2019, National Medical Center, Republic of Korea
Category:COVID-19 coronavirus
Category:COVID-19 guidelines by language
Category:COVID-19 guidelines in Arabic
Category:COVID-19 guidelines in English
Category:COVID-19 guidelines in Basque
Category:COVID-19 guidelines in China
Category:COVID-19 guidelines in Estonian
Category:COVID-19 guidelines in Bengali
Category:COVID-19 guidelines in Bangladesh
Category:COVID-19 guideline cartoons by Anika Nawar Eeha and Abdullah Al Mamun in Bengali
Category:COVID-19 guidelines in Bengali by Anika Nawar Eeha and Abdullah Al Mamun
Category:COVID-19 guidelines in Catalan
Category:COVID-19 guidelines in East Timor
Category:COVID-19 DIY
Category:COVID-19 guidelines in Breton
As you can see, none of the above categories are shown in the category suggestions.
@misaochan Given that we've now accidentally reduced the category search space rather than increasing it, you might want to ensure we fix this before releasing the next version.
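For anyone following along, here is a minimal sketch of why lowercasing the query empties the suggestions. It assumes the prefix lookup behaves as the Manual:Page title quote describes (case-sensitive except for the first character). `prefix_search` is a hypothetical stand-in for the server-side behaviour, not the app's code or the real API, and the sample list is a handful of the categories listed above.

```python
# A few of the category titles returned by the example query above.
CATEGORIES = [
    "COVID-19",
    "COVID-19 DIY",
    "COVID-19 guidelines",
    "COVID-19 coronavirus",
]


def prefix_search(prefix, titles):
    """Hypothetical model of a prefix lookup that is case-sensitive
    except for the first character (assumption based on the quote)."""
    # Only the first character is normalised to upper case.
    normalized = prefix[:1].upper() + prefix[1:]
    return [title for title in titles if title.startswith(normalized)]


# The query as typed by the user matches all four titles.
as_typed = prefix_search("COVID", CATEGORIES)

# The same query after lowercasing becomes "Covid" once normalised,
# which is not a prefix of any of the titles.
lowercased = prefix_search("COVID".lower(), CATEGORIES)

print(as_typed)
print(lowercased)
```

Under that assumption, lowercasing helps only when the real title is lowercase after its first character, which is the reduction in search space described above.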
Added to the release list, thanks for the heads up!
Hi @kbhardwaj123 , are you currently still working on this? Please do keep us updated, thanks!
@misaochan sure I'm on it, will update ASAP
Thanks @kbhardwaj123 ! As we are planning to include this in v2.13, when you submit your PR could you please rebase and submit it on the 2.13-release branch?
I investigated the problem and here are my findings.
Suppose I want to find the category Temple of Ishtar at Mari by entering temple of ishtar.
These are the results:
With generator=allcategories in the URL, the API result is apparently for prefix search and is case-sensitive.
With generator=search in the URL, we get an API result which is case-insensitive and gives us the required Temple of Ishtar at Mari category.
Now, on reading the logs, I realized that the method searchAll() in CategoriesModel was calling the prefix search, and that is where the problem is. When I fix that by calling both the prefix and search APIs and combining the results, we finally get a case-insensitive search.
But there's a catch
We are using the beta flavor of the APIs which give the following results
Possible Solution
AFAIK there are two ways
@misaochan @sivaraam @nicolas-raoul @maskaravivek I need your opinions on my investigation in order to fix this for v2.13. In particular, are we going to use the production flavor of the APIs in v2.13?
@kbhardwaj123 Thanks for the analysis. I'll look into it and share my comments soon. I have a quick doubt about one particular thing:
We are using the beta flavor of the APIs which give the following results
What do you mean by beta flavor of API? Do you mean the API hosted in the beta server (https://commons.wikimedia.beta.wmflabs.org/w/api.php) as opposed to the production server (https://commons.wikimedia.org/w/api.php)?
For category search (and really any testing that does not involve actually uploading), please use the prodDebug flavor of the app. The beta server is unusable for most testing.
@sivaraam Yes, that's exactly what I meant. The beta server https://commons.wikimedia.beta.wmflabs.org/w/api.php has the same API but is unable to give the required result, whereas the production API https://commons.wikimedia.org/w/api.php gives the expected result, as shown by the links in my previous comment.
@nicolas-raoul does prodDebug flavor use https://commons.wikimedia.org/w/api.php APIs ?
@kbhardwaj123 Yes, prod* flavors use the production APIs, for instance https://commons.wikimedia.org/w/api.php . Sorry that our beta servers are not representative of production :'-(
@nicolas-raoul sure then the problem is solved already, I will create the pull request
@sivaraam yes that's exactly what i meant
Thanks for the clarification.
... the beta server commons.wikimedia.beta.wmflabs.org/w/api.php has the same API but is unable to give the required result, whereas the production API commons.wikimedia.org/w/api.php gives the expected result, as shown by the links in my previous comment.
Let me now clarify something. The API hosted in the beta servers and the production servers _would not_ differ in a functional manner. I'm glossing over a little but I believe it's fine for the case in question. You see different results when you use the beta server _only because_ the categories that are present in the prod server _are not all present_ in the beta server. So, forget the idea that the APIs in the production servers are case-sensitive and the APIs in the beta servers are not, because both of them behave the same way. You can read more about the beta cluster in the following wiki page: Beta Cluster - MediaWiki
sure then the problem is solved already, I will create the pull request
Can you explain how you're going to fix this? I'm asking this to ensure everyone's on the same page. Also, I would suggest that you not rush this. I say this because making the category search case insensitive seems to be a lot more complicated than it looks. It's better to know our options and choose the most appropriate one. If we have the release coming up soon, we can always just revert the changes done in PR #3326 (which we would have to do anyway) and move on with the release. We can then make the change after that. @misaochan can comment better about the deadline.
You see different results when you use the beta server only because the categories that are present in the prod server are not all present in the beta server. So, forget the idea that the APIs in the production servers are case-sensitive and the APIs in the beta servers are not, because both of them behave the same way.
Ok. Here's a proof for the fact that the Beta server behaves just the same way as the production server.
This returns the Category:TestCat despite the search term being testcat. So, the beta server's generator=search is case-insensitive too.
Let me now clarify something. The API hosted in the beta servers and the production servers would not differ in a functional manner. I'm glossing over a little but I believe it's fine for the case in question. You see different results when you use the beta server only because the categories that are present in the prod server are not all present in the beta server. So, forget the idea that the APIs in the production servers are case-sensitive and the APIs in the beta servers are not, because both of them behave the same way. You can read more about the beta cluster in the following wiki page: Beta Cluster - MediaWiki
@sivaraam Initially, what I meant was that the beta servers don't have all the categories (that both APIs work the same way was clear from their documentation). This is what I wanted to show:
Suppose I want the category Temple of Ishtar at Mari by typing only temple of ishtar.
If the prodDebug APIs are used, they give what one expects:
With generator=allcategories, the result doesn't include the required category because it only does a prefix search.
With generator=search, the result includes the required category due to its case insensitivity.
But the beta servers have a problem: they contain the category Temple of Ishtar at Mari, as shown using generator=allcategories (see here), but generator=search is incapable of returning the category when we provide it with temple of ishtar (see here).
In summary, the beta server's case-sensitive API (generator=allcategories) delivers a category which the case-insensitive API is not able to return, and no such problem exists with the prodDebug APIs.
How I intend to solve this: the searchAll method is the one at fault here. It only calls the prefixSearch API for searching categories, so if we also make a call using generator=search and combine both the prefix and normal search results, our problem would be solved.
And yes we need to use the prodDebug APIs because of the point i just mentioned above.
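The "combine both prefix and normal search results" step could look roughly like the sketch below. It is written in Python for brevity rather than in the app's own language, and `merge_results` is a hypothetical helper, not the actual searchAll implementation. It assumes both calls return plain lists of category titles and that prefix matches should be shown first.

```python
def merge_results(prefix_results, search_results):
    """Combine two result lists, keeping first-seen order and dropping
    titles that appear in both (hypothetical helper for illustration)."""
    seen = set()
    merged = []
    for title in [*prefix_results, *search_results]:
        if title not in seen:
            seen.add(title)
            merged.append(title)
    return merged


# Example inputs using category names mentioned in this thread.
from_prefix = ["Category:Temple of Ishtar at Mari"]
from_search = [
    "Category:Temple of Ishtar at Mari",
    "Category:Astarte (goddess)",
]

combined = merge_results(from_prefix, from_search)
print(combined)
```

Putting the prefix results first is only one possible ordering. Ranking the merged list differently would be a separate design decision.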
@sivaraam Yes, I agree that the beta ones are case insensitive, but they don't seem to return Category:Temple of Ishtar at Mari (using generator=search), while the case-sensitive beta API (generator=allcategories) returns the category (see result).
So I implemented the solution with the beta server's APIs, and this is how it looks (with screenshots):
using category suggested by @sivaraam Category:TestCat

Now with Category:Temple of Ishtar at Mari (here i am showing that it exists on beta server):

But from the following screenshot it is visible that generator=search doesn't return that which leaves this category as case sensitive

And as soon as I change the flavor of the APIs to prodDebug, all these problems disappear.
@kbhardwaj123 Thanks for your explanations. I see your problem now.
But from the following screenshot it is visible that generator=search doesn't return that which leaves this category as case sensitive
It's prudent to explore more before coming to conclusions. AFAIK, you can't just make some categories case sensitive and others case insensitive. It doesn't even make any sense, does it? Anyways, I'll try to clarify what's going on here. Here's the description of the search API from API:Search - MediaWiki [emphasis mine]:
GET request to search for a title or text in a wiki.
Just assume the search API does not search the titles for now; I'll come back to the reason for that assumption later. Note that the search API looks for the text in the wiki pages. So, any query you send to generator=search looks for the search text in the contents of the wiki page (the category pages are the wiki pages, in our case). So, the results you get from the beta and production servers depend not just on the presence of the categories but also on what content is present in the category pages. Let's take your case of the "Category:Temple of Ishtar at Mari".
Category:Temple of Ishtar at Mari - Wikimedia Commons
This is the category page in the production server. As you can see, the category page has content such as info boxes which contain your search text "temple of Ishtar". This could have been the reason for that category being included in the results of your API query to the production server: query link.
Also, the fact that the search API searches the content is very clear from the results of the above query which include categories such as Category:Astarte (goddess), Category:Passing lion Babylon (Louvre, AO21118)
I'm not very sure about how/why a page is included in the result as the algorithm seems to be more involved. Relevant quote from the "Additional notes" section in the API:Search page:
Depending on which search backend is in use, how srsearch is interpreted may vary. On Wikimedia wikis which use CirrusSearch, see Help:CirrusSearch for information about the search syntax.
Coming to why I assume the search API doesn't search the title: try the following query:
https://commons.wikimedia.org/w/api.php?action=query&format=json&formatversion=2&generator=search&gsrnamespace=14&gsrsearch=temple%20of%20ishtar&gsrlimit=25&gsroffset=0&gsrwhat=title
I've just added gsrwhat=title to the query, which tells it to search just the title. As you can see, it clearly says: "title" search is disabled. Thus my assumption. See also: https://stackoverflow.com/q/14337219/5614968
I hope I've clarified your confusion about beta server not returning the results you expect, now. Let me know if I have not.
To conclude, the search API does more than what's needed (a category title search) and particularly doesn't seem to be searching the title at all. I don't think that would be a good choice. So, as I mentioned earlier we'll have to explore the proper way to achieve a case insensitive search. Here are a couple of related API pages:
Also, I believe we could ask the wikitech-l mailing list about this.
@sivaraam Thanks for such a comprehensive explanation :).
I agree with you that the search generator could be overkill; as you pointed out, searching temple of ishtar returns some completely unrelated categories because they contain that term in their wikitext body. So what I am thinking is that the workaround one person gave in the Stack Overflow question you mentioned, using intitle as:
srsearch=intitle:temple%20of%20ishtar
could solve our issue and return only those categories with the required search term.
Kindly give your opinions on this
@sivaraam i tried it and it returns exactly what we want, checkout the following link:
https://commons.wikimedia.org/w/api.php?action=query&format=json&formatversion=2&generator=search&gsrnamespace=14&gsrsearch=intitle:temple%20of%20ishtar&gsrlimit=25&gsroffset=0
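For reference, the query above can be assembled from its parameters instead of being hand-written. This is a small sketch using only Python's standard library. `build_intitle_query` is a hypothetical helper name; the parameter names and values are copied from the URL above, and nothing here is the app's actual networking code.

```python
from urllib.parse import quote, urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_intitle_query(term, limit=25, offset=0):
    """Build the generator=search URL restricted to titles via intitle:
    (hypothetical helper; parameters copied from the link above)."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": 2,
        "generator": "search",
        "gsrnamespace": 14,  # 14 is the Category namespace
        "gsrsearch": f"intitle:{term}",
        "gsrlimit": limit,
        "gsroffset": offset,
    }
    # quote_via=quote encodes spaces as %20; safe=":" keeps the colon
    # in "intitle:" unescaped so the URL matches the hand-written one.
    return f"{API_ENDPOINT}?{urlencode(params, quote_via=quote, safe=':')}"


url = build_intitle_query("temple of ishtar")
print(url)
```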
I feel that there's a tradeoff. On one hand, if we search only the title of the category, it gives quite relevant results but might eliminate some less obviously relevant yet possibly better suited categories. On the other hand, searching the content may also suggest some completely irrelevant results, as you pointed out:
Also, the fact that the search API searches the content is very clear from the results of the above query which include categories such as Category:Astarte (goddess), Category:Passing lion Babylon (Louvre, AO21118)
I need opinions on: whether the category search should be restricted to the title (of its wiki page) only
The search URL above looks better than what we currently have indeed, but still not perfect, I think Commons has a better one for us.
Users will type and should see results appear as they type.
For instance, let's say I take a picture of a supermarket in Tokyo. I start typing "supermarkets in to"

How about using the API that sits behind that website search box? Is there any reason why it is not good enough?
I need opinions on: whether the category search should be restricted to the title (of its wiki page) only
In a word: yes. It's best to keep the search title only to ensure that the results are predictable and straightforward. Also, searching more than just the title is out of scope for this issue which is about making category search case insensitive. We can discuss enhancing the category search separately and focus on just making the category title search case insensitive for now.
How about using the API that sits behind that website search box?
Good idea. We would have to find how it works.
Is there any reason why it is not good enough?
I think we can answer this only after knowing how that works :)
@sivaraam @nicolas-raoul so i will make it title only search and create a separate issue for improving our category search functionality in favour of something similar to that of website search
so i will make it title only search ...
If you are thinking of using the intitle: part of the search API, then here's a problem I noticed with that. It only seems to return categories for which a category page exists (just as expected). Examples:
https://commons.m.wikimedia.beta.wmflabs.org/w/api.php?action=query&format=json&formatversion=2&generator=search&gsrnamespace=14&gsrsearch=intitle:temple%20of%20ishtar&gsrlimit=25&gsroffset=0
This query doesn't return Category:Temple of Ishtar at Mari which is a valid category in the beta server that should be returned.
https://commons.wikimedia.org/w/api.php?action=query&format=json&formatversion=2&generator=search&gsrnamespace=14&gsrsearch=intitle:rose%20factory&gsrlimit=25&gsroffset=0
This doesn't return Category:"Rose" factory, which is a valid category in the production server that should be returned. A non-exhaustive list of such categories can be found at Special:WantedCategories.
This might not be a problem if we use the results of the generator=search API as a supplement to the results from the allcategories API (which does not have such a problem). But I just wonder if there is a better way to properly achieve this case insensitive category title search. That's why I was suggesting that we ask the wikitech-l mailing list. We could get a reliable answer of how to go about doing this.
@sivaraam sure in that case I agree that we should ask on the mail list, I will hold my PR on this issue
Any luck with the mailing list? We are holding 2.13 for this at the moment. :)
Any luck with the mailing list?
Apologies. I never got around to sending the e-mail to the mailing list. Got hung up with other things. I'll try to send it by tomorrow if no one else beats me to it :)
We are holding 2.13 for this at the moment. :)
You don't have to hold it anymore ;). I've created #3636, which reverts the changes done in PR #3326. You can merge that and move on with the release. We can handle the case insensitivity in the next release :)
@sivaraam i agree this would be our best option for now and i am really keen to see what mailing list would suggest regarding this issue :)
I did some searching on Phabricator and came to know that case insensitive category title search is a long standing feature request that is yet to be addressed [[ref 1](https://phabricator.wikimedia.org/T59302)] [[ref 2](https://phabricator.wikimedia.org/T187342)]. The linked comment is a nice TL;DR of the status quo.
It seems we really can't use search API for the reason outlined in the comment I referred to previously and another comment in the same ticket which I'm quoting here:
Is it not possible to use the article search engine with (invisible) category: prefix instead ?
Wouldn't that search for pages in the category namespace, rather than actual categories? Some categories don't have associated pages, and you can create pages in the category namespace for non-existent categories.
That's right. To add to that, here's another reason why the search API is not a proper fit. There's something called hidden categories [[ref 1](https://commons.wikimedia.org/wiki/Commons:Categories#Categories_marked_with_%22HIDDENCAT%22)] [[ref 2](https://www.mediawiki.org/wiki/Help:Categories#Hidden_categories)] in MediaWiki (the wiki engine behind Commons). My understanding of them is that these hidden categories aren't meant to be added by users directly. An example of such a hidden category in Commons is Category:Uses of Wikidata Infobox - Wikimedia Commons. There's a way to identify such hidden categories using the allcategories API while the search API doesn't have such an option. [side note: we should think about filtering away hidden categories before showing category suggestions. That's for another issue though :)]
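To make the side note concrete, filtering hidden categories out of a result list could be as simple as the sketch below. The entry shape (a dict with a "category" name and an optional truthy "hidden" flag) is an assumption made for illustration and has not been checked against a real allcategories response. `visible_categories` is a hypothetical helper.

```python
def visible_categories(entries):
    """Return the names of entries that are not flagged as hidden.
    The entry shape is assumed, not taken from a real API response."""
    return [entry["category"] for entry in entries if not entry.get("hidden")]


# Sample data using category names mentioned in this thread.
sample = [
    {"category": "Temple of Ishtar at Mari"},
    {"category": "Uses of Wikidata Infobox", "hidden": True},
]

suggestions = visible_categories(sample)
print(suggestions)
```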
Despite all this, I sent an e-mail to the mailing list just to confirm if my understanding is correct.
To conclude, it seems we really can't provide a case-insensitive category title search for now :(
What we could do is mention our category title search use case on the following Phabricator ticket, to clarify that category search is not as "niche" a feature as they think it is.
https://phabricator.wikimedia.org/T187342
That might help us get an API that we can use soon.
@sivaraam thanks for such elaborate insights :). I have a doubt regarding the linked comment which you mentioned, it says:
Our regular search feature (aka "prefix index"), used for the main search field and used for the input field when creating an article link, is case-insensitive in most cases
Is the "prefix index" being referred to here the allcategories API's prefix search? If so, then how is it case-insensitive in most cases? I am a little confused by this.
You don't have to hold it anymore ;). I've created #3636, which reverts the changes done in PR #3326.
Awesome, thank you!
@sivaraam thanks for such elaborate insights :). I have a doubt regarding the linked comment which you mentioned, it says:
Our regular search feature (aka "prefix index"), used for the main search field and used for the input field when creating an article link, is case-insensitive in most cases
I would quote that fully, as it makes sense only when it is complete. Here it is for the sake of discussion:
Our regular search feature (aka "prefix index"), used for the main search field and used for the input field when creating an article link, is case-insensitive in most cases. On Wikimedia wikis this comes from CirrusSearch. On other wikis (and on WMF until recently) this was provided by the TitleKey extension. The search feature has a namespace filter as well. Which would allow us to do case-insensitive search of page titles in the Category namespace.
Read that fully before reading further.
Is the "prefix index" being referred to here the allcategories API's prefix search? If so, then how is it case-insensitive in most cases? I am a little confused by this.
I'm reasonably confident that the comment refers either to API:Prefixsearch or to some other API. It is definitely not referring to the allcategories API, as it mentions a namespace filter, which allcategories doesn't have (and doesn't have any need for).
Hope that clarifies your doubt.
Despite all this, I sent an e-mail to the mailing list just to confirm if my understanding is correct.
And here's our confirmation of my observation:
https://lists.wikimedia.org/pipermail/wikitech-l/2020-April/093295.html
Re-opening as this issue seems to have been closed by mistake.
@sivaraam This has been fixed via #3913. Does that not fix the issue for you?