One of the KEY points that is missing in the documentation (sorry if I missed it), is that SQL queries involving strings are case-sensitive. For example, the following query:
SELECT * c WHERE c.email = "joe.[email protected]"
Will NOT return any records if the email address stored is "joe.[email protected]" (note the capital T).
In order to perform a case-insensitive query, you must either normalize your data (to upper or lower case) or use LOWER() or UPPER() in the WHERE clause, for example:
SELECT * FROM c WHERE LOWER(c.email) = "joe.[email protected]"
However, be warned - using LOWER() or UPPER() will cause your Request Units (RUs) to spike and will likely cause 429 errors !
In my opinion there should be NO RU charge for using LOWER() or UPPER() and this would solve the problem of case-insensitive queries without getting clobbered by RU charges and 429 errors.
⚠Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
@bdcoder2 Thanks for the feedback! I have assigned the issue to the content author to evaluate and update as appropriate.
@bdcoder2 Thanks for the feedback! Yes you should use Upper and Lower functions to consider case-insensitivity. The stackoverflow page - https://stackoverflow.com/questions/30512806/how-to-do-a-case-insensitive-search-on-azure-documentdb responded by MSFT team has couple of examples on how to use upper and lower.
We have a new version of indexing mechanism coming soon (no timelines yet) to support this functionality and it will address the RU consumption too. Hope this helps, we will update you when the feature is available. Let us know if you have any additional questions!
@bdcoder2 we are closing this issue now, if you have any further questions, feel free to reopen it or create a new issue.
@bdcoder2 We will now proceed to close this thread. If there are further questions regarding this matter, please reopen it and we will gladly continue the discussion.
Is there any news on the index mechanism changes? Having to store an upper cases version of the data to avoid the RUs costs a lot of development time.
Just tried the "Lower" function on a single field of a document db Cosmos unit. WOrks great with a small number of records but our production env, with over a million docs, immediately returns a 429 error and shuts down. Not what you'd expect for the price. So @SnehaGunda when will this be fixed? As @simonvane said...avoiding RUs is costly. It will double the size of our storage and impacts TCO too. @SnehaGunda please reopen this request. No MODERN db should be case sensitive.
Any word on this? It's currently preventing our migration.
@simonvane @esfitsMack @imjuststeve This is all great feedback and a valid concern. Since the purpose of this channel is to address documentation issues, the best means for providing this feedback is via the Azure Cosmos DB UserVoice where you will be able to engage the product team directly. I did a search and did not find an existing entry. It would be advantageous to have this topic created and as more customers up-vote and comment on this subject, the product group will know it is an item that needs attention. Please let me know if you have any additional questions with this. Thank you!
@Mike-Ubezzi-MSFT I have added to this use voice post which seems to cover the feature requirement - https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/36119482-support-case-insensitive-indexing
However, it would be very helpful if the documentation covered best practices for handling case insensitive issues in a performance manner rather than everybody having to trawl the internet to find advice and everyone coming up with a different solution.
Thinking about it, where the are deficiencies / feature gaps in general it would save developers so much time if best practice workarounds were documented officially.
@simonvane Thank you for posting to the UserVoice. I am reopening this Git Issue as a doc-enhancement to address the documentation part of the request.
@SnehaGunda and @LalithaMV for awareness.
The best practice for this today is to store the data in a separate property all lower case and then use that for queries with values passed through lower().
We are currently working on providing case-insensitive indexing but do not have an ETA to announce at this time.
@Mike-Ubezzi-MSFT , please assign this to @timsander1 Tim, can you please include a note on this article to provide guidance in this scenario?
Thanks.
@markjbrown Is it safer to store as upper case because of the Turkish I problem? - https://haacked.com/archive/2012/07/05/turkish-i-problem-and-why-you-should-care.aspx/
Wasn't aware of this. Sure if you expect to have Turkish users then yes, use upper case.
@markjbrown I was just keen to make sure the note that @Mike-Ubezzi-MSFT / @timsander1 were going to do gives good advice and recommends storing as uppercase to avoid issues with text in different languages. It is not restricted to Turkish, that's just an example.
We've recently published an updated article on query performance troubleshooting in which we provide the following guidance.
UPPER/LOWER
Instead of using the system function to normalize data each time for comparisons, instead normalize the casing upon insertion. Then a query such as SELECT * FROM c WHERE UPPER(c.name) = 'BOB' simply becomes SELECT * FROM c WHERE c.name = 'BOB'
Thanks for raising this issue.
@markjbrown Thanks for the tip on the better query performance. While that is very helpful it doesn't really solve the issue as it requires a duplication of data. Does closing this issue mean that it's 'closed and won't fix'?
The issue from a documentation perspective is closed as we've explained the behavior.
We have a backlog item to provide case-insensitive search and will release this functionality in a future release. You can track that from here. https://feedback.azure.com/forums/263030/suggestions/36119482
Thanks.
@markjbrown Thanks for clearing that up!
Most helpful comment
Is there any news on the index mechanism changes? Having to store an upper cases version of the data to avoid the RUs costs a lot of development time.