Orchardcore: Transliteration feature

Created on 4 Oct 2019 · 17 Comments · Source: OrchardCMS/OrchardCore

We are currently looking for a new modular CMS for a government portal. Since our team has good .NET Core experience, one of our favorites is Orchard.

One of the requested features is transliteration from the Cyrillic to the Latin alphabet. There are exact rules for transliterating each character (and vice versa), for example:

ж -> ž
ц -> c
ш -> š
...

and so on.
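A per-character rule table like the one above can be sketched as a simple lookup. This is a minimal illustration in Python, not Orchard code: the table follows the standard Serbian Cyrillic/Latin letter pairs, and the function name `to_latin` and the casing heuristic for digraphs are assumptions made for this example.

```python
# Serbian Cyrillic -> Latin, one entry per lowercase letter.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic to Latin; other characters pass through."""
    out = []
    for i, ch in enumerate(text):
        lower = ch.lower()
        lat = CYR_TO_LAT.get(lower)
        if lat is None:
            out.append(ch)          # not a Serbian Cyrillic letter
        elif ch == lower:
            out.append(lat)         # lowercase letter
        else:
            # Uppercase letter. A digraph becomes "LJ" inside an all-caps
            # word and "Lj" otherwise (heuristic chosen for this sketch).
            prev = text[i - 1] if i > 0 else ""
            nxt = text[i + 1] if i + 1 < len(text) else ""
            all_caps = nxt.isupper() or (not nxt.isalpha() and prev.isupper())
            out.append(lat.upper() if all_caps else lat.capitalize())
    return "".join(out)
```

Because every Cyrillic letter has exactly one Latin counterpart, this direction needs no context beyond the casing of neighbouring letters.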

The questions are:
Is this supported by Orchard?
If not, what are the plans for supporting it?
Are there any architecture guidelines for developing this feature, in case our team is willing to help implement it?

localization

All 17 comments

@neman Orchard Core supports both Serbian Cyrillic and Serbian Latin through the cultures sr and sr-CS. I think you can use OrchardCore.Translations.sr-CS to achieve what you want without adding new rules. You can also check out the translations on Crowdin.

FYI, no one has contributed to those cultures yet, so I encourage you and your team, if possible, to start working on these translations. I would be thankful for your support of Orchard Core Translations and the community around it.

Thanks

If I understand you correctly: if I have original content in Serbian Cyrillic, Orchard can transliterate it to Latin out of the box (not translate it)?
I'm not concerned about translating Orchard's labels. Of course, we can help translate the existing resources you pointed to.

Also bear in mind that switching back to the original script should not transliterate again, but show the original content.

Latin to Cyrillic is not always a one-to-one conversion.
For example:

lj -> љ or лј
nj -> њ or нј
dž -> џ or дж

Note that the digraphs dž, lj, and nj are considered single letters in Serbian Latin. In practice, the original content is stored in Cyrillic and transliterated to Latin on demand.
Switching back to Cyrillic shows the original content (transliteration is skipped).

This is how it's done in our current solution, an old custom ASP.NET CMS.
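The digraph ambiguity described above can be sketched as follows. This is an illustrative Python example, not Orchard code: the names `to_latin` and `to_cyrillic` are made up for it, only lowercase input is handled, and the reverse direction uses a naive "longest match first" rule.

```python
# Serbian alphabet in Cyrillic order, with the matching Latin letters.
CYR = "абвгдђежзијклљмнњопрстћуфхцчџш"
LAT = "a b v g d đ e ž z i j k l lj m n nj o p r s t ć u f h c č dž š".split()

CYR_TO_LAT = dict(zip(CYR, LAT))
LAT_TO_CYR = dict(zip(LAT, CYR))


def to_latin(text: str) -> str:
    """Cyrillic -> Latin: each letter has exactly one counterpart."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)


def to_cyrillic(text: str) -> str:
    """Latin -> Cyrillic, greedily treating lj, nj and dž as single letters."""
    out, i = [], 0
    while i < len(text):
        for size in (2, 1):                 # try the digraph first
            chunk = text[i:i + size]
            if chunk in LAT_TO_CYR:
                out.append(LAT_TO_CYR[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])             # unknown character, keep as-is
            i += 1
    return "".join(out)


original = "инјекција"          # spelled with н + ј, not њ
latin = to_latin(original)       # "injekcija"
restored = to_cyrillic(latin)    # greedy match reads "nj" as њ
print(original == restored)      # False: the round trip is lossy
```

Since the Latin form alone cannot tell н + ј from њ, keeping the Cyrillic original and skipping transliteration when switching back avoids the problem entirely.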

You should create the correct cultures for Serbia on Crowdin.

The correct ones are:

  • Serbian Latin sr-Latn-RS

  • Serbian Cyrillic sr-Cyrl-RS

CS is for the former state of Serbia and Montenegro; it is a legacy culture.

https://www.iso.org/obp/ui/#search/code/

FYI, Orchard Core has localization and content localization modules, which means you can enter your content in either Cyrillic or Latin, but no conversion is involved. You could extend the existing module to convert from Latin to Cyrillic based on rules that you know better than I do :)

@agriffard could you please handle this one

@neman hey - Orchard 1.x supported transliteration (https://github.com/OrchardCMS/Orchard/blob/dev/src/Orchard.Web/Modules/Orchard.Localization/Services/TransliterationService.cs) - I don't think Orchard Core has it built in.

There is no reason it couldn't be ported over, I just wasn't aware anyone was using it!

@Jetski5822 this is cool. Let me check, I think Seb pointed to it somewhere...

I'm not wrong, here it is: https://github.com/OrchardCMS/OrchardCore/issues/4151. I will close this as a duplicate; @sebastienros, feel free to reopen it if there's a reason to.

Thanks for the clarification.
I do not think that #4151 is the same as a transliteration service (they are certainly related, but not the same PBI).

@neman I understand what transliteration is, but I don't understand in what context it could be used in the CMS. Could you give an example of actual usage, and what issue it would fix?

Or are you just looking for a service, like what @Jetski5822 has done, that you could use in your own modules? I doubt it.

Also, you might not fully understand the referenced issue in the context of OC. The idea is that a Liquid filter provides a way to script how Unicode content is processed on the front end. For instance, it would be used to build an ASCII URL from a Unicode title, to build logical identifiers for the data, or to index the transliterated value.
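As a rough illustration of the "ASCII URL from a Unicode title" use case, here is a Python sketch. It is not the Liquid filter from the referenced issue: the function name `slugify`, the Serbian-only table, and the choice to fold đ to "dj" are all assumptions made for this example.

```python
import re
import unicodedata

# Serbian Cyrillic -> Latin table (lowercase only; input is lowercased first).
CYR = "абвгдђежзијклљмнњопрстћуфхцчџш"
LAT = "a b v g d đ e ž z i j k l lj m n nj o p r s t ć u f h c č dž š".split()
TABLE = {ord(c): l for c, l in zip(CYR, LAT)}


def slugify(title: str) -> str:
    """Build an ASCII URL segment from a (possibly Cyrillic) title."""
    latin = title.lower().translate(TABLE)
    # đ has no Unicode decomposition, so fold it explicitly (assumed "dj").
    latin = latin.replace("đ", "dj")
    # Split letters such as ž into base letter + combining mark, drop marks.
    decomposed = unicodedata.normalize("NFKD", latin)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse every run of other characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only).strip("-")
```

The same transliterated value could also serve as a logical identifier or be fed to a search index, which are the other two uses mentioned above.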

Here is the use case (maybe it is part of the service mentioned by @Jetski5822; I'm not sure, which is why I asked here).
Let's say the complete content of the portal is in Cyrillic, which is the default (for example, news and other parts of the page).
There is a drop-down to choose Cyrillic or Latin. When Latin is selected, the content should somehow be intercepted before rendering (maybe by middleware or an Orchard module, I'm not sure until I get into the code) and transliterated by predefined rules.
There should not be double content for the same language in the database.
I'm just looking how it is achievable with Orchard Core and what would be the right approach

I see. Because you don't want to manage two different content sets, you can't use our Content Localization feature. So it's more like an automatic UI conversion. I think this could be done with a custom controller: your drop-down would just point to this controller with the current content item, which would then be transliterated. Or it could be done with a custom template, so you could pick each piece of data you want to convert, using our Liquid filter in this case.

Do you have a site url to share where I could test the feature?

The current approach is to use two translations, but if you want to avoid that, I can think of a few other ways:

  1. If you develop with source you can change the PortableObjectStringLocalizer

  2. Without source we can:

a. Extend PortableObjectStringLocalizer

b. Add some events like localization loaded, and subscribe to it (if we support events)

c. Create a TransliterateInterceptor that intercepts the loaded translations and performs the conversion

Sorry, I closed this by mistake from my mobile.

You could try for example here https://www.euprava.gov.rs

The English site is a different subportal, but the other languages are translations. Only Serbian is transliterated.

The last resort is to use client-side JavaScript, but I do not like that solution.

I think you could easily write a middleware that would intercept the content, parse the HTML (AngleSharp) and transliterate any text. This can be shipped as a module that would be triggered by a custom query string. But there is nothing specific to OC in this case; it could be used in any ASP.NET app.
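The parse-and-transliterate step can be sketched like this. It is a Python illustration using the standard library's `html.parser` as a stand-in for AngleSharp, not ASP.NET middleware; the class name, the `render` function, and the `script=latin` query parameter are hypothetical.

```python
import html
from html.parser import HTMLParser
from urllib.parse import parse_qs

# Serbian Cyrillic -> Latin table, lowercase and capitalized forms.
CYR = "абвгдђежзијклљмнњопрстћуфхцчџш"
LAT = "a b v g d đ e ž z i j k l lj m n nj o p r s t ć u f h c č dž š".split()
TABLE = {ord(c): l for c, l in zip(CYR, LAT)}
TABLE.update({ord(c.upper()): l.capitalize() for c, l in zip(CYR, LAT)})


class TextTransliterator(HTMLParser):
    """Re-emits the markup unchanged and transliterates only text nodes."""

    SKIP = {"script", "style"}      # never touch code or CSS

    def __init__(self):
        super().__init__()          # convert_charrefs=True by default
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())   # raw tag, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth:
            self.out.append(data)
        else:
            # Text arrives unescaped, so escape it again after converting.
            self.out.append(html.escape(data.translate(TABLE), quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def render(page: str, query_string: str = "") -> str:
    """Transliterate the page only when ?script=latin is present (assumed name)."""
    if parse_qs(query_string).get("script") != ["latin"]:
        return page                 # Cyrillic original is served untouched
    parser = TextTransliterator()
    parser.feed(page)
    parser.close()
    return "".join(parser.out)
```

Attribute values such as `title` or `alt` are left as they are in this sketch; a real implementation would need to decide which attributes carry visible text.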

OK, thanks. I was thinking about middleware at first, but was curious to see whether I could plug in somewhere within OC, or whatever you recommend.
So basically I can create a middleware that would be an OC module. Great, I must dive deep into OC as soon as possible.

All the best @neman, feel free to ask if you have any questions.
