If you create a new page, the input converter (title to URL) is a bit useless for Germans. It needs to convert 'ä' to 'ae' instead of just 'a'.
Example of a simple German map:
var GERMAN_MAP = {
    'Ä': 'AE', 'Ö': 'OE', 'Ü': 'UE',
    'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
    'ß': 'ss'
};
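For illustration, here is a minimal sketch of how such a character map could be applied when building a slug. The `transliterate` helper is hypothetical and not the actual input.preset.js API:

```javascript
var GERMAN_MAP = {
    'Ä': 'AE', 'Ö': 'OE', 'Ü': 'UE',
    'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
    'ß': 'ss'
};

// Replace each mapped character, leaving everything else untouched.
function transliterate(text, map) {
    return text.split('').map(function (ch) {
        return map.hasOwnProperty(ch) ? map[ch] : ch;
    }).join('');
}
```

With this map, `transliterate('Müller Straße', GERMAN_MAP)` yields `'Mueller Strasse'`.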
We need something like a German map inside input.preset.js. Or can we update the Latin map? Any ideas?
Yes, perhaps submit a PR for this to be added. Add it specifically as a GERMAN_MAP so we can pick and choose based on the locale in future.
All the chars @codebeauty mentions are already in the LATIN_MAP, so adding a GERMAN_MAP would override the LATIN_MAP definitions.
These are missing from the LATIN_MAP though:
'œ': 'oe', 'ō': 'o'
I will open a PR for those to show as:

I think the problem is that it converts to "a" instead of "ae", something distinct to German by the look of it. @alxy can you comment?
@daftspunk you are right. As in my example map above, in German 'ä' needs to become 'ae', 'ü' becomes 'ue', and so on.
| Converter Input | Converter Output |
| --- | --- |
| ä | ae |
| ü | ue |
| ö | oe |
| ß | ss |
So it needs its own MAP, then we should use the backend-locale to determine which map to use.
<meta name="backend-locale" content="en">
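For illustration, here is a DOM-free sketch of reading that tag. The `getBackendLocale` helper is hypothetical; the actual preset code may use jQuery instead:

```javascript
// Read the backend locale from the page's meta tag,
// falling back to 'en' when the tag is absent.
function getBackendLocale(doc) {
    var meta = doc.querySelector('meta[name="backend-locale"]');
    return meta ? meta.getAttribute('content') : 'en';
}
```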
Yes, @daftspunk, that looks like a good way. The converter can check the backend-locale information and build the map accordingly.
It seems @codebeauty's substitutions are the ones most (officially) used for European-to-ASCII transliteration. They are mentioned as alternatives on wiktionary.org, and are also used for passports:
https://help.cbp.gov/app/answers/detail/a_id/1142/~/esta---name-containing-alphabet-characters-that-are-not-in-the-american-english
Maybe just updating the LATIN_MAP with @codebeauty's mapping would be better?
Interesting, I wonder why the LATIN_MAP is not set up this way already?
Safest option would probably be to add the GERMAN_MAP before the LATIN_MAP (for preservation) and make sure it is processed in this order: 1. GERMAN -> 2. LATIN
If nobody complains then we can merge GERMAN + LATIN as a single definition later.
It comes from this (based on the commit message):
https://github.com/django/django/blob/master/django/contrib/admin/static/admin/js/urlify.js#L5-L16
which is a mapping from non-ASCII characters to a single ASCII char (or two chars for ligatures), trying to keep the length and readability as close to the original as possible.
The @codebeauty and wiktionary substitutions seem to be based on the phonetic representation instead.
It seems like a choice of simplicity vs. accuracy. The author of the Perl module that inspired the Python version mentions German users asking the same question in the docs (telling them to patch their code instead of the lib):
http://search.cpan.org/~sburke/Text-Unidecode-1.27/lib/Text/Unidecode.pm (paragraph entitled WHEN YOU DON'T LIKE WHAT UNIDECODE DOES)
As @daftspunk proposed, to satisfy people who use a language that needs a specific transliteration (like German), the solution could be to apply a specific mapping based on the configured locale, defaulting to the maps as they are now.
var SPECIFIC_MAPS = {
    'de': {
        'Ä': 'AE', 'Ö': 'OE', 'Ü': 'UE',
        'ä': 'ae', 'ö': 'oe', 'ü': 'ue'
    }
};
...
Downcoder.chars = [];
var locale = $('meta[name="backend-locale"]').attr('content');
if (typeof SPECIFIC_MAPS[locale] === 'object') {
    ALL_MAPS.push(SPECIFIC_MAPS[locale]);
}
for (var i = 0; i < ALL_MAPS.length; i++) {
...
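The merge order above can be sketched as a standalone function. This is only an illustration of "locale-specific map overrides LATIN_MAP"; the LATIN_MAP excerpt and the `buildMap` helper are assumptions, not the actual PR code:

```javascript
// Excerpt of the default map, for illustration only.
var LATIN_MAP = { 'ä': 'a', 'ö': 'o', 'ü': 'u' };

var SPECIFIC_MAPS = {
    'de': { 'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss' }
};

// Build the final lookup table: locale-specific entries are merged
// last, so they override the LATIN_MAP defaults.
function buildMap(locale) {
    var maps = [LATIN_MAP];
    if (typeof SPECIFIC_MAPS[locale] === 'object') {
        maps.push(SPECIFIC_MAPS[locale]);
    }
    var merged = {};
    for (var i = 0; i < maps.length; i++) {
        for (var key in maps[i]) {
            if (maps[i].hasOwnProperty(key)) {
                merged[key] = maps[i][key];
            }
        }
    }
    return merged;
}
```

With a locale that has no specific map, the LATIN_MAP defaults apply unchanged; for 'de' the German entries win.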
Implemented in PR #2031
Tested by switching from French to German: 'ä' converts to 'a' for French and to 'ae' for German.
@daftspunk I think the solution you found here is alright. "ä" should indeed be converted to "ae".
The solution looks good. I see the changes inside input.preset.js. I used a fresh clone of the develop branch, and the locale is also correct. But "ü" is still converted to "u". Any idea?


@codebeauty assets need to be recompiled to pick up the changes:
php artisan october:util compile assets
It should mark storm-min.js as changed. Clear the cache just in case and you should see:

Thank you @gabsource. It works like a charm.