Meteor-feature-requests: Add mongodb collation support

Created on 7 Jun 2017 · 13 Comments · Source: meteor/meteor-feature-requests

Migrated from: meteor/meteor#7946

Minimongo Mongo Driver

All 13 comments

We should note here that this probably is mainly a minimongo problem.

Our project needs the following enhancement, since we are dealing with multilingual content:

Enhance Minimongo.Sorter to accept and respect collation options, possibly passing the options on to an implementation of String.prototype.localeCompare
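Here is a minimal sketch of a collation-aware comparator of that kind. It assumes MongoDB-style collation options (`{ locale, strength, numericOrdering }`) can be mapped onto `Intl.Collator`. The name `makeCollatedComparator` and the strength-to-sensitivity mapping are illustrative assumptions, not part of Minimongo. Only top-level fields are handled, and non-string values fall back to a plain `<` / `>` comparison.

```javascript
// Hypothetical sketch, not Minimongo's actual API: build a comparator from a
// Mongo-style sort specifier plus a MongoDB-style collation document,
// delegating string comparison to Intl.Collator.

// Approximate mapping from MongoDB collation strength to Intl sensitivity
// (1 = base letters only, 2 = also accents, 3 = also case; MongoDB's default is 3).
const STRENGTH_TO_SENSITIVITY = { 1: "base", 2: "accent", 3: "variant" };

function makeCollatedComparator(sortSpec, collation = {}) {
  // One collator, reused for every comparison
  const collator = new Intl.Collator(collation.locale, {
    sensitivity: STRENGTH_TO_SENSITIVITY[collation.strength ?? 3] ?? "variant",
    numeric: Boolean(collation.numericOrdering),
  });
  const keys = Object.entries(sortSpec); // e.g. [["name", 1], ["age", -1]]

  return (a, b) => {
    for (const [field, direction] of keys) {
      const x = a[field];
      const y = b[field];
      const result =
        typeof x === "string" && typeof y === "string"
          ? collator.compare(x, y) // locale-aware comparison for strings
          : x < y ? -1 : x > y ? 1 : 0; // plain comparison otherwise
      if (result !== 0) return result * direction;
    }
    return 0; // equal on every sort key
  };
}
```

For example, with `{ locale: "en" }` the names "Zeynep", "émile" and "Cem" sort as Cem, émile, Zeynep. A plain code-unit sort puts "émile" last. The sketch reuses one `Intl.Collator` instead of calling `localeCompare` per comparison, which is what MDN recommends when sorting large arrays.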

Hey @brylie, I have implemented a stopgap solution for a couple of projects that needed collated sorting; perhaps you might be interested in implementing something similar.

The idea is to mimic collation and store normalized sortable values in the database. For example:

{
  "_id" : "3FkDk39BCx3yyM2Fc",
  "name" : "Ali̇ Özcan"
}

becomes

{
  "_id" : "3FkDk39BCx3yyM2Fc",
  "name" : "Ali̇ Özcan",
  "collations" : {
    "name" : {
      "tr": "00000140101101000000000001800031010020100001016010"
    }
  }
}

To be honest, I got the idea from https://derickrethans.nl/mongodb-collation.html, which was later revised as https://derickrethans.nl/mongodb-collation-revised.html, but the original still explains the idea very well.

Since JavaScript does not expose the collator's sort key algorithm by default (or at least not in any easy way that I could find), you can:

  • either use iLib to tap into its Collator class to obtain a sortCode
  • or use the CLDR data yourself to come up with your sort code implementation.
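The sketch below makes the second option concrete. It is deliberately simplified and assumes Turkish and primary-strength ordering only. `turkishSortKey` and its two-digit encoding are hypothetical: this is neither the Unicode Collation Algorithm nor iLib's sortCode. It does show the property the approach relies on: a plain binary sort of the stored keys reproduces the collated order.

```javascript
// Hypothetical, heavily simplified sort-key generator for Turkish. It is NOT
// the Unicode Collation Algorithm or iLib's sortCode: it only assigns each
// letter of the 29-letter Turkish alphabet a fixed-width code so that a
// plain binary sort of the keys follows Turkish alphabetical order.
const TR_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz";

function turkishSortKey(text) {
  const folded = text
    .normalize("NFC") // compose e.g. "c" + U+0327 into "ç"
    .toLocaleLowerCase("tr") // Turkish casing: "I" -> "ı", "İ" -> "i"
    .replace(/[\u0300-\u036f]/g, ""); // drop leftover combining marks

  return Array.from(folded)
    .map((ch) => {
      if (ch === " ") return "00"; // spaces sort before any letter
      const index = TR_ALPHABET.indexOf(ch);
      // Characters outside the alphabet (digits, punctuation) sort last
      return index === -1 ? "99" : String(index + 1).padStart(2, "0");
    })
    .join("");
}
```

The key is an ordinary string. Both MongoDB and Minimongo can therefore sort on a field like `collations.name.tr` without any collation support of their own.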

Then, you can use aldeed:collection2's autoValue or matb33:collection-hooks's before.insert, before.update and before.upsert (_or something similar that you can build on top of the Mongo.Collection prototype/instance_) to calculate the sort codes on the fly and attach them to your original document.
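The hook side could look like the following sketch. It assumes matb33:collection-hooks' argument order for `before.insert` (userId, doc) and `before.update` (userId, doc, fieldNames, modifier, options). `makeCollationHooks` is a hypothetical helper. The `sortKey` function is injected, so any generator (iLib, your own CLDR tables) can be plugged in. It only covers top-level `$set` updates; `$unset`, upserts and whole-document replacement would need extra handling.

```javascript
// Hypothetical helper producing handlers with the argument order used by
// matb33:collection-hooks. `sortKey` is whatever key generator you use.
function makeCollationHooks(field, locale, sortKey) {
  const keyPath = `collations.${field}.${locale}`;

  return {
    // Attach the sort key to a document before it is inserted
    beforeInsert(userId, doc) {
      if (typeof doc[field] === "string") {
        doc.collations = doc.collations || {};
        doc.collations[field] = {
          ...doc.collations[field],
          [locale]: sortKey(doc[field]),
        };
      }
    },
    // Keep the key in sync when an update $sets the source field
    beforeUpdate(userId, doc, fieldNames, modifier) {
      const value = modifier.$set && modifier.$set[field];
      if (typeof value === "string") {
        modifier.$set[keyPath] = sortKey(value);
      }
    },
  };
}

// Registration on the server (requires matb33:collection-hooks;
// computeSortKey is a hypothetical key generator):
// const hooks = makeCollationHooks("name", "tr", computeSortKey);
// People.before.insert(hooks.beforeInsert);
// People.before.update(hooks.beforeUpdate);
```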

Now, if you want to get even more creative, you can use collection hooks' before.find and before.findOne to modify your queries' options parameter, which contains the sort option (_and the fields option, if you are used to limiting returned fields for performance/security_). Alter it to point to the collated sort field (or one of them, if you've implemented them for multiple languages). In effect, this will make all your client and server finds, as well as your publication cursors, use a consistently collated sort order.
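The query-rewriting step might look like this sketch. It assumes a before.find/before.findOne handler receives (userId, selector, options) and that `options.sort` is an object specifier; the array form is left untouched. `rewriteSortForCollation` is a hypothetical name. Whenever the source field is explicitly included in `options.fields`, it also includes the sort key, so the client receives what it needs to reproduce the order.

```javascript
// Hypothetical helper for a before.find / before.findOne handler:
// point sorts on collated fields at collations.<field>.<locale>.
function rewriteSortForCollation(options, locale, collatedFields) {
  if (!options || !options.sort || Array.isArray(options.sort)) return;

  const newSort = {};
  for (const [field, direction] of Object.entries(options.sort)) {
    const key = collatedFields.includes(field)
      ? `collations.${field}.${locale}`
      : field;
    newSort[key] = direction; // insertion order keeps sort priority
  }
  options.sort = newSort;

  // With an inclusion projection, also publish the sort key itself
  if (options.fields) {
    for (const field of collatedFields) {
      if (options.fields[field] === 1) {
        options.fields[`collations.${field}.${locale}`] = 1;
      }
    }
  }
}

// People.before.find((userId, selector, options) =>
//   rewriteSortForCollation(options, "tr", ["name"]));
```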

Now I know this has some drawbacks:

  • It is admittedly hacky
  • Sort codes are long beasts, so they take up some space in the database
  • You won't be able to insert/update data from outside of your application, otherwise sort codes won't be generated/updated
  • The server bundle will grow significantly if you're using iLib because it will download the complete CLDR database
  • Your sort code definitions may become outdated if you use CLDR data manually instead of iLib (simply because it is kind of hard to download updates by hand, whereas iLib automates that for you)
  • iLib version changes might cause some headaches. Whatever version you're using must be pinned, and upgrades might require you to regenerate the sort codes across your whole database.
  • Sort code generation has to happen on the server! You cannot use it on the client, so you need to create client-mock functions (_or omit them completely on the client_) for code generation. This means there will be a brief window of wrong sorting on the client until the data gets updated on the server and the change gets published back to the client. Yes, you can work around this by hiding docs that don't contain sort codes or by coming up with some sort of best-effort sorting. In any case, your optimistic UI will have to be a little too optimistic.
  • This approach is not documented in detail (_I wish I had time to write a nice blog post about this or perhaps even create an Atmosphere package. I had intended to, but never got around to completing either_) and it will take some fiddling to get it right.

Yet, I've had good mileage with this.

Edit: If you would like to create an Atmosphere package (_and a complementary npm one_) that does this and write some nice documentation, I can collaborate with you, or at least provide guidance.

I will propose this idea to our Product Owner (@bajiat) to see if we can get organizational support to maintain a new package.

At the recommendation of @bajiat, I offer an example of our current 'dynamic sorting' solution. We can discuss the merits and similarities of possible solutions, and decide whether to co-maintain a package.

In general, we are doing locale-aware sorting in our code like so:

// Make sure the documents are sorted by name
if (sort.name) {
  // Get the current language from tap:i18n
  const language = TAPi18n.getLanguage();

  // Use a custom sort function with i18n support;
  // sort.name is assumed to be 1 (ascending) or -1 (descending)
  documents.sort((a, b) => {
    return a.name.localeCompare(b.name, language) * sort.name;
  });
}

In this way, we have avoided storing the internationalized sort-by strings in our collections. What are your thoughts?

@brylie this can only be used after fetching the documents (with non-collated sort) and sorting the resulting array. If you have the complete dataset both on the client and the server (_which rarely is the case_), this solution will work as expected. But as soon as you begin limiting the number of documents that are sent to the client, your lists will in effect be non-collated until you reach the accented characters at the end of your published dataset.
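A small illustration of this point follows. Plain arrays stand in for a server query and a client cache, and the names and limit are made up. The server-side sort uses JavaScript's default code-unit ordering, in which "Ç" (U+00C7) comes after "Z". The client then re-sorts only what was published. The output comments assume a runtime with full ICU locale data, as in default Node.js builds.

```javascript
// Server: non-collated (code-unit) sort, then limit
const names = ["Zeynep", "Çelik", "Cem", "Ali"];
const limit = 3;
const published = [...names].sort().slice(0, limit);

// Client: locale-aware sort of the published subset only
const clientView = [...published].sort((a, b) => a.localeCompare(b, "tr"));

// What a collated server-side sort + limit would have returned
const collatedPage = [...names]
  .sort((a, b) => a.localeCompare(b, "tr"))
  .slice(0, limit);

console.log(clientView); // [ 'Ali', 'Cem', 'Zeynep' ]
console.log(collatedPage); // [ 'Ali', 'Cem', 'Çelik' ]
```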

Does anybody have a workaround for a reactive find with server side collation?

@mitar @serkandurusoy @filipenevola any updates on this?

Sorry, not that I know of. You can still use the raw collection on the server side to query with a collation, and then do a locale-aware compare on the client side to maintain the sort order.
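One way to keep the two sides aligned is sketched below. It assumes the Node.js MongoDB driver's `find(filter, { sort, limit, collation })` options on the server and `String.prototype.localeCompare` on the client. `collatedQuerySettings` is a hypothetical helper. MongoDB's collation and the browser's Intl implementation are not guaranteed to agree on every string, so treat the client order as an approximation. Collation strength 2 compares base letters and diacritics but ignores case, which roughly matches Intl's "accent" sensitivity.

```javascript
// Hypothetical helper: one collation setting shared by the server query
// and the client-side comparator.
function collatedQuerySettings(locale, sortField, limit) {
  return {
    // Node.js driver options, e.g. on the server:
    //   People.rawCollection().find({}, settings.driverOptions).toArray()
    driverOptions: {
      sort: { [sortField]: 1 },
      limit,
      collation: { locale, strength: 2 }, // ignore case, respect accents
    },
    // Matching client comparator ("accent" ~ strength 2)
    compare: (a, b) =>
      a[sortField].localeCompare(b[sortField], locale, { sensitivity: "accent" }),
  };
}
```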

Alternatively, you can create sortable hashes for your query criteria based on collator sort keys and save them alongside the data, sorting directly by them on both server and client.

On Jul 3, 2020, at 13:16, Campbell McGuiness wrote:

@serkandurusoy any updates on this?

Thanks for the prompt response, @serkandurusoy! I'm trying to avoid the additional sort key data, so I will continue to explore the raw collection collation approach. Obviously, locale-aware sorting on the client only is trivial if you are not using a limit in your subscription, but with 30,000+ documents that's not an option :)

It would be nice if we could bring this into regular use instead of having to hide it behind the raw collection.

Is it possible to use rawCollection() in a publication and have it return a Meteor find cursor?

@StorytellerCZ the reason it is hard to bring this into regular use in Meteor Collections and Minimongo is that it requires not only changes to comparisons, making regular Minimongo sorts slower, but also the full CLDR data on the server, plus the parts relevant to the collation on the client. This is too much overhead for a seemingly unpopular requirement that has a known workaround. I was not surprised or upset when this did not make it into core.

@camslice what you'll need to do is pass your collated query to Mongo through the raw collection, and then mimic the same sort on the client side with localeCompare. It would give you consistent results in most cases.

That being said, this is trivial with methods, but not with subscriptions, where you'd need to manually set up publications on the raw collection, which might be too much work. You might use something like redis oplog to facilitate publications, though.
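For the subscription case, a manually assembled publication might look like the minimal sketch below. It assumes the raw Node.js driver collection and Meteor's low-level publication API (`this.added`, `this.ready`, `this.error`). `publishCollatedSnapshot` is a hypothetical helper. The result is a one-off snapshot: unlike a regular cursor publication, later changes in MongoDB are not observed. That is the gap something like redis oplog would have to fill.

```javascript
// Hypothetical sketch of a non-reactive publication fed from the raw
// collection. `sub` is the publication's `this`.
async function publishCollatedSnapshot(sub, rawCollection, collectionName, options) {
  const docs = await rawCollection
    .find({}, {
      sort: options.sort,
      limit: options.limit,
      collation: options.collation,
    })
    .toArray();

  for (const { _id, ...fields } of docs) {
    // String _ids assumed; Mongo ObjectIds would need converting
    sub.added(collectionName, _id, fields);
  }
  sub.ready();
}

// Server usage (inside Meteor):
// Meteor.publish("people.collated", function (limit) {
//   publishCollatedSnapshot(this, People.rawCollection(), "people", {
//     sort: { name: 1 }, limit, collation: { locale: "tr" },
//   }).catch((err) => this.error(err));
// });
```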
