Migrated from: meteor/meteor#7946
We should note here that this is probably mainly a minimongo problem.
Our project needs the following enhancement, since we are dealing with multilingual content:
Enhance Minimongo.Sorter to accept and respect collation options, possibly passing the options on to an implementation of String.prototype.localeCompare
Hey @brylie, I have implemented a stopgap solution for a couple of projects that needed collated sorting, and perhaps you might be interested in implementing something similar.
The idea is to mimic collation and store normalized sortable values in the database. For example:
{
  "_id" : "3FkDk39BCx3yyM2Fc",
  "name" : "Ali̇ Özcan"
}
becomes
{
  "_id" : "3FkDk39BCx3yyM2Fc",
  "name" : "Ali̇ Özcan",
  "collations" : {
    "name" : {
      "tr": "00000140101101000000000001800031010020100001016010"
    }
  }
}
To be honest, I got the idea from https://derickrethans.nl/mongodb-collation.html, which was later revised as https://derickrethans.nl/mongodb-collation-revised.html, and it explains the idea very well.
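As a rough illustration of what a stored, plainly sortable value could look like, here is a minimal sketch. The `TR_ALPHABET` table and the `toSortKey` helper are hypothetical names, not part of any library, and the two-digit encoding is an assumption that is far simpler than a real collation sort key (it has no secondary weights for accents or case and treats everything outside the table the same way).

```javascript
// Hypothetical, simplified Turkish letter order. A real collator (ICU/CLDR)
// handles far more than this table does.
const TR_ALPHABET = 'abcçdefgğhıijklmnoöprsştuüvyz';

// Build a string whose plain binary order follows the alphabet above.
function toSortKey(value, alphabet = TR_ALPHABET, locale = 'tr') {
  return Array.from(value.toLocaleLowerCase(locale))
    .map((ch) => {
      // indexOf returns -1 for characters outside the alphabet, so they
      // become "00" and sort before every letter in this sketch.
      const rank = alphabet.indexOf(ch) + 1;
      return String(rank).padStart(2, '0');
    })
    .join('');
}

// Plain comparison puts "ç" after "z"; the sort keys put it between "c" and "d".
console.log('çam' < 'dam');                       // false
console.log(toSortKey('çam') < toSortKey('dam')); // true
```

Because the keys are plain ASCII digit strings, both MongoDB and minimongo would order them the same way without any collation support.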
Since JavaScript does not expose a collator sort-key algorithm by default (or not in any easy way that I could find), you can:
Collator class to obtain a sortCode
Then, you can use aldeed:collection2's autoValue or matb33:collection-hooks's before.insert, before.update and before.upsert (_or something similar that you can build on top of the Mongo.Collection prototype/instance_) to calculate the sort codes on the fly and attach them to your original document.
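The "calculate on the fly and attach" step could be sketched roughly as below. `attachCollations` is a hypothetical helper, and the `toSortKey(value, locale)` function it receives is assumed to be whatever sort-key implementation you settle on; the `People` collection in the comment is also hypothetical.

```javascript
// Return a copy of `doc` with collations.<field>.<locale> sort keys attached.
// `toSortKey(value, locale)` is supplied by the caller (hypothetical helper).
function attachCollations(doc, fields, locales, toSortKey) {
  const collations = { ...(doc.collations || {}) };
  for (const field of fields) {
    if (typeof doc[field] !== 'string') continue; // skip missing/non-string values
    collations[field] = { ...(collations[field] || {}) };
    for (const locale of locales) {
      collations[field][locale] = toSortKey(doc[field], locale);
    }
  }
  return { ...doc, collations };
}

// Possible wiring with matb33:collection-hooks (shown as a comment because it
// needs a running Meteor app; `People` is a hypothetical collection):
//
//   People.before.insert(function (userId, doc) {
//     Object.assign(doc, attachCollations(doc, ['name'], ['tr'], toSortKey));
//   });
```

An update hook would need the same treatment applied to the modifier rather than to the document, which this sketch does not cover.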
Now, if you want to get even more creative, you can use collection-hooks' before.find and before.findOne to modify the options parameter of your queries. That parameter contains the sort option (_and the fields option, if you are used to limiting returned fields for performance/security_), which you can alter to point to the collated sort field (or one of them, if you've implemented them for multiple languages). In effect, this modifies all your client and server find calls and your publication cursors to produce consistently collated sort output.
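A minimal sketch of that options rewrite, assuming the `collations.<field>.<locale>` layout from the example document. `useCollatedSort` is a hypothetical helper, and it only handles the object form of `sort` (array-style sort specifiers are passed through untouched).

```javascript
// Rewrite a find() options object so that sorting on a collated field uses the
// stored sort key instead. Only the object form of `sort` is handled here.
function useCollatedSort(options = {}, locale, collatedFields) {
  if (!options.sort || Array.isArray(options.sort)) return options;
  const sort = {};
  for (const [field, direction] of Object.entries(options.sort)) {
    const key = collatedFields.includes(field)
      ? `collations.${field}.${locale}`
      : field;
    sort[key] = direction;
  }
  return { ...options, sort };
}

// Possible wiring with matb33:collection-hooks (comment only, needs Meteor;
// `People` is a hypothetical collection):
//
//   People.before.find(function (userId, selector, options) {
//     if (options && options.sort) {
//       options.sort = useCollatedSort(options, 'tr', ['name']).sort;
//     }
//   });
```

If a query also restricts `fields`, the collated path would presumably have to be included there too, so that minimongo on the client still has the value it is asked to sort by.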
Now I know this has some drawbacks:
iLib because it will download the complete CLDR database
Yet, I've had good mileage with this.
Edit: If you would like to create an atmosphere package (_and a complementary npm one_) that does this and write some nice documentation, I can collaborate with you, at least provide you guidance.
I will propose this idea to our Product Owner (@bajiat) to see if we can get organizational support to maintain a new package.
At the recommendation of @bajiat, I offer an example of our current 'dynamic sorting' solution. We can discuss the merits and similarities of possible solutions, and decide whether to co-maintain a package.
In general, we are doing locale-aware sorting in our code like so:
// Make sure sorted by name
if (sort.name) {
  // Get the language
  const language = TAPi18n.getLanguage();
  // Use custom sort function with i18n support;
  // sort.name acts as the direction multiplier (e.g. 1 or -1)
  documents.sort((a, b) => {
    return a.name.localeCompare(b.name, language) * sort.name;
  });
}
In this way, we have avoided storing the internationalized sortby strings in our collections. What are your thoughts?
@brylie this can only be used after fetching the documents (with non-collated sort) and sorting the resulting array. If you have the complete dataset both on the client and the server (_which rarely is the case_), this solution will work as expected. But as soon as you begin limiting the number of documents that are sent to the client, your lists will in effect be non-collated until you reach the accented characters at the end of your published dataset.
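To make the limit problem concrete, here is a small standalone sketch with made-up names. It assumes a JavaScript runtime with ICU locale data, so that `localeCompare` honors the `'tr'` locale.

```javascript
const names = ['Zeynep', 'Çağla', 'Ahmet', 'Özge', 'Burak'];
const limit = 3;

// What a non-collated, limited publication would send: binary order, then limit.
const published = [...names].sort().slice(0, limit);

// Sorting that page on the client cannot bring back names that were never sent.
const clientSorted = [...published].sort((a, b) => a.localeCompare(b, 'tr'));

// What the user expects: collate first, then limit.
const expected = [...names]
  .sort((a, b) => a.localeCompare(b, 'tr'))
  .slice(0, limit);

console.log(clientSorted); // [ 'Ahmet', 'Burak', 'Zeynep' ]
console.log(expected);     // [ 'Ahmet', 'Burak', 'Çağla' ]
```

"Çağla" only shows up once the limit grows large enough to reach the accented names at the end of the binary-sorted dataset.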
Does anybody have a workaround for a reactive find with server side collation?
@mitar @serkandurusoy @filipenevola any updates on this?
Sorry, not that I know of. You can still use the raw collection on the server side to query by collation and then do a locale aware compare on the client side to maintain sort order.
Alternatively, you can create sorted hashes for your query criteria based on collator sort keys and save them alongside the data, directly sorting by those on both server and client.
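The first suggestion (raw collection on the server, locale-aware compare on the client) might look roughly like this. `findCollated` and `byLocale` are hypothetical helper names; the `collation` option is passed to the MongoDB Node.js driver's `find`, and the Meteor method in the comment uses made-up names.

```javascript
// Server side: query through the raw MongoDB driver collection with a collation.
// `rawCollection` is what Mongo.Collection#rawCollection() returns.
function findCollated(rawCollection, selector, { locale, sort, limit }) {
  return rawCollection
    .find(selector, { collation: { locale }, sort, limit })
    .toArray(); // resolves to an array of documents
}

// Client side: keep the same order with a locale-aware comparator.
function byLocale(field, locale, direction = 1) {
  const collator = new Intl.Collator(locale);
  return (a, b) => collator.compare(a[field], b[field]) * direction;
}

// Possible use inside a Meteor method (comment only; `People` is hypothetical):
//
//   Meteor.methods({
//     'people.page'() {
//       return findCollated(People.rawCollection(), {},
//         { locale: 'tr', sort: { name: 1 }, limit: 20 });
//     },
//   });
//
// and on the client: rows.sort(byLocale('name', 'tr'));
```

The server and client orderings should agree in most cases, but they come from two different collation implementations, so edge cases may differ.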
Thanks for the prompt response, @serkandurusoy! I'm trying to avoid the additional sort key data, so I will continue to explore the raw collection collation approach. Obviously, locale-aware sorting on the client alone is trivial if you are not using limit in your subscription, but with 30,000+ documents that's not an option :)
Would be nice if we could bring it to regular use instead of having to hide it behind the raw collection.
Is it possible to use rawCollection() in a publication and have it return a Meteor find cursor?
@StorytellerCZ the reason it is hard to bring this to regular use in Meteor collections and minimongo is that it requires not only changes to comparisons, which would make regular minimongo sorts slower, but also full CLDR data on the server, plus the parts of it relevant to the collation on the client. This is too much overhead for a seemingly unpopular requirement that has a known workaround. I was not surprised or upset when this did not make it into core.
@camslice what you'll need to do is pass your collated query to Mongo through the raw collection, and then mimic the same sort on the client side with localeCompare. It would give you consistent results in most cases.
That being said, this is trivial with methods but not with subscriptions, where you'd need to manually set up publications on the raw collection, which might be too much work. You might use something like redis-oplog to facilitate publications, though.
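For a sense of what "manually set up publications on the raw collection" involves, here is a minimal, non-reactive sketch built on Meteor's low-level publish API (`added`, `ready`, `error`). `publishCollatedOnce` is a hypothetical helper, the names in the comment are made up, and keeping the published set up to date as data changes is exactly the part this sketch leaves out.

```javascript
// Publish the result of a collated raw query once (not reactive).
// `sub` is the `this` of a Meteor.publish handler.
function publishCollatedOnce(sub, rawCollection, collectionName, selector, options) {
  return rawCollection
    .find(selector, options)
    .toArray()
    .then((docs) => {
      for (const { _id, ...fields } of docs) {
        // Assumes string-like ids; ObjectID ids would need extra care.
        sub.added(collectionName, String(_id), fields);
      }
      sub.ready();
    })
    .catch((err) => sub.error(err));
}

// Possible wiring (comment only, needs Meteor; `People` is hypothetical):
//
//   Meteor.publish('people.collated', function () {
//     publishCollatedOnce(this, People.rawCollection(), 'people', {},
//       { collation: { locale: 'tr' }, sort: { name: 1 }, limit: 20 });
//   });
```

The documents still arrive in minimongo without an order, so the client would sort them again with a locale-aware compare.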