Howdy!
A recurring theme I've noticed in most chat programs today (Telegram included 😢) is that the text direction of a given message is determined by the first (relevant) character, which is a shame because:
So I propose a better algorithm for detecting the desired direction of a message, one that isn't much more complicated: count the number of characters in each language/direction (excluding links), and the direction with more characters wins. More formally:
This algorithm can be seen in action in Google Hangouts (the only chat app I know of that actually uses a smarter algorithm than "look at the first character").
Of course, this is a proposal and the concrete algorithm is open to change, but I think that it's a very good compromise between code complexity and correctness.
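A minimal sketch of the counting rule, assuming Python's `unicodedata.bidirectional` as the source of character directions (strong LTR is class `L`, strong RTL is `R` or `AL`). The URL regex, the function name `majority_direction`, and the tie-breaking `default` are illustrative assumptions, not Telegram's code or the linked fiddle:

```python
import re
import unicodedata

# Assumption: "links" are http(s) URLs; a real client would reuse its own
# entity parser instead of this simple regex.
URL_RE = re.compile(r"https?://\S+")


def majority_direction(text: str, default: str = "ltr") -> str:
    """Pick 'rtl' or 'ltr' by counting strong characters of each direction."""
    text = URL_RE.sub("", text)  # links are excluded from the count
    ltr = rtl = 0
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":  # strong left-to-right (Latin, Cyrillic, ...)
            ltr += 1
        elif bidi in ("R", "AL"):  # strong right-to-left (Hebrew, Arabic, ...)
            rtl += 1
        # Digits, punctuation, spaces and emoji are weak/neutral: not counted.
    if rtl > ltr:
        return "rtl"
    if ltr > rtl:
        return "ltr"
    return default  # tie or no strong characters: behaviour is an assumption


# The first strong character is Latin, but the 8 Hebrew letters outnumber "hi".
print(majority_direction("hi שלום עולם"))  # rtl
```

The proposal does not say what happens on a tie, so the sketch simply falls back to a caller-chosen default.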
This would be slow, though, and not easy to implement.
Also, what about emoji, or numbers, or other symbols used in both layouts?
Once per message (or even once per keystroke) doesn't sound too slow. It's not something you need to do continually, every frame, and if you choose the more UX-y per-keystroke approach, you can just "remember" the current counters for the message on the side and increment the relevant one.
As for numbers/emoji/media/symbols, they count for neither RTL nor LTR (just as today: if the first character is a number, the second character is checked, and so on).
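The per-keystroke variant can keep running counters so that each typed character costs O(1). A sketch, again assuming `unicodedata.bidirectional` for classification; `DirectionCounter` and its `insert`/`delete` hooks are hypothetical names for wherever a client observes edits to the draft:

```python
import unicodedata
from collections import Counter
from typing import Optional


def strong_direction(ch: str) -> Optional[str]:
    """Return 'ltr'/'rtl' for strong characters, None for neutral ones."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "ltr"
    if bidi in ("R", "AL"):
        return "rtl"
    return None  # numbers, emoji, symbols, whitespace: ignored


class DirectionCounter:
    """Hypothetical per-draft helper keeping running LTR/RTL counts."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def insert(self, text: str) -> None:
        # Called with the typed character (or a pasted string).
        for ch in text:
            d = strong_direction(ch)
            if d is not None:
                self.counts[d] += 1

    def delete(self, text: str) -> None:
        # Called with the character(s) removed from the draft.
        for ch in text:
            d = strong_direction(ch)
            if d is not None:
                self.counts[d] -= 1

    def direction(self, default: str = "ltr") -> str:
        ltr, rtl = self.counts["ltr"], self.counts["rtl"]
        if rtl == ltr:
            return default
        return "rtl" if rtl > ltr else "ltr"


counter = DirectionCounter()
for ch in "hi שלום":
    counter.insert(ch)
print(counter.direction())  # rtl (4 Hebrew letters vs 2 Latin)
counter.delete("שלו")  # user erases three Hebrew letters
print(counter.direction())  # ltr (2 vs 1)
```

Neutral characters never touch the counters, which matches the rule that numbers, emoji and symbols count for neither side.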

Can you point me towards the relevant piece of code where the direction is selected? I'm willing to PR this.
I can try to alter the way it works for messages in bubbles, but not in the message input field. Is the problem there as well?
The problem is there as well, yes. And I do think it would be good to change it there too, but the bubbles are more prominent (you write a message once, but it is read many times and by many people). I can prepare an example of how that can be done in a fiddle if you'd like; I don't think it would be more than 10 LoC.
Here's an example illustrating what I'm after: https://jsfiddle.net/cv45ku2s/.
This could be further optimized to only consider one character at a time, etc. If you have proper abstractions for the input/transcript combo, it might be a tiny bit trickier, but likely not by much.
It is standardized that the base direction of a given message is determined by the first strong character. If you do not like the direction in a specific case, you can fix it by adding a U+200E LEFT-TO-RIGHT MARK or U+200F RIGHT-TO-LEFT MARK. See Unicode Standard Annex #9 or the German Wikipedia article on bidirectional control characters for other possibilities.
The maintainers should check whether the Unicode Bidirectional Algorithm is implemented correctly instead of inventing their own. After all, it is inevitable that the people at Unicode have planned more shrewdly than any chat application developer ever can, and there are also libraries for displaying BiDi text well. It is quite disappointing that you can run ldd on the Telegram executable and not find any references to HarfBuzz or Pango in it.
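For comparison, the standard paragraph-direction rule (UAX #9 rules P2 and P3) can be sketched in Python. This assumes a single paragraph and uses `unicodedata.bidirectional`; `first_strong_direction` is an illustrative name, not any library's API. P2 skips text inside isolates (LRI/RLI/FSI ... PDI), and P3 defaults to LTR when no strong character is found:

```python
import unicodedata

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def first_strong_direction(text: str) -> str:
    """Paragraph direction per UAX #9 P2/P3, assuming one paragraph."""
    depth = 0  # nesting depth of isolates currently being skipped
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ISOLATE_INITIATORS:
            depth += 1
        elif bidi == "PDI":
            if depth > 0:  # an unmatched PDI is simply ignored
                depth -= 1
        elif depth == 0:
            if bidi == "L":
                return "ltr"
            if bidi in ("R", "AL"):
                return "rtl"
    return "ltr"  # P3: no strong character found -> level 0 (LTR)


print(first_strong_direction("hi שלום עולם"))  # ltr
print(first_strong_direction("\u200fhello"))  # rtl: RLM is a strong R character
```

Because U+200E and U+200F are themselves strong characters (classes `L` and `R`), prefixing one of them is exactly what lets a user override the detected direction under this rule.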
And I am serious in this case; go check it, for I have been rather unlucky using the bidirectional control characters from the General Punctuation block inside Telegram.
Other possibilities would arise if Telegram supported full HTML (not only Markdown), at least when explicitly enabled, because HTML has tags and attributes for controlling bidirectional layout.
@Socialdarwinist HarfBuzz should be used internally by Qt for text shaping, so Telegram should be using it as well; perhaps there are no references because everything is statically linked into the executable.
@Socialdarwinist Your links are broken (Connection Refused), which is a bit alarming on unicode.org. Aside from that,
> If you do not like the direction in a specific case you can fix it by adding a U+200E LEFT-TO-RIGHT MARK or U+200F RIGHT-TO-LEFT MARK
Really? "Enter a character that's not found in any keyboard" is your solution? What about mobile (if/when it reaches there)?
> it is inevitable that the people at Unicode have planned more shrewdly than any chat application developer
How did you come to that conclusion? The people in unicode are omniscient now?
I completely disagree that just because there's a standard you have to implement it, even if it's suboptimal. Especially in a chat environment, it doesn't necessarily make sense to only take the first (relevant) character into account.
> How did you come to that conclusion? The people in unicode are omniscient now?
For the same reason open-source software is supposed to be better: if many people have an interest in it working, many people look at it. And for Unicode, very many proficient people review things at many stages before publication. If there is something wrong in Unicode, the whole world is to blame. I note that you call the bidirectionality standard suboptimal while offering no better one, at least none visible from your side, which is cocky. If you know an improvement, you can surely initiate the Unicode process to adopt it.
> Enter a character that's not found in any keyboard
As you might know, keyboards do not contain characters, but keyboard layouts do. /usr/share/X11/xkb/symbols/ara can easily get a new layout, especially as the default keyboard layout has plenty of free room. For some weeks I have been toying with the idea of adding bidi characters to it. If somebody wants to beat me to it, this is what I have collected so far to be added to the symbols/ara file in XKB:
U+066B ARABIC DECIMAL SEPARATOR, U+066C ARABIC THOUSANDS SEPARATOR, U+0640 ARABIC TATWEEL, U+200C ZERO WIDTH NON-JOINER
U+202A LEFT-TO-RIGHT EMBEDDING, U+202B RIGHT-TO-LEFT EMBEDDING, U+202C POP DIRECTIONAL FORMATTING, U+200E LEFT-TO-RIGHT MARK, U+200F RIGHT-TO-LEFT MARK, U+061C ARABIC LETTER MARK
U+2066 LEFT-TO-RIGHT ISOLATE, U+2067 RIGHT-TO-LEFT ISOLATE, U+2068 FIRST STRONG ISOLATE, U+2069 POP DIRECTIONAL ISOLATE
U+2019 RIGHT SINGLE QUOTATION MARK, U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK, U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
Additionally, direct Unicode input is possible in GTK+ (Ctrl+Shift+U) and, even better, in input methods such as IBus and Fcitx, which support searching by character name.
macOS also lets you add keyboard layouts. On Windows, people are doomed for using that system, as keyboard layouts there are binary. One can find keyboard layouts installable on Windows, but their compatibility does not appear to last. People choose to depend on Microsoft's grace, and that is what they get.
Using the percentage of RTL or LTR characters to determine the direction is unpredictable and very bad UX unless you are working with large text paragraphs. This can be seen on Twitter, which seems to implement a similar algorithm: you can't really tell (without counting the characters in your head) whether a tweet will end up left-to-right or right-to-left.
The first-strong-character algorithm is at least predictable, and it can be controlled without rewriting the text to change its character count. The accessibility of control characters should not be an issue: the application can easily offer an RTL/LTR button, shortcut, or similar that inserts an RLM/LRM in front of the text before rendering it.
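Such a button could be as small as the following sketch. `force_direction` is a hypothetical handler name, not an existing Telegram function; it relies only on LRM (U+200E) and RLM (U+200F) being strong `L` and `R` characters, so a first-strong renderer picks the requested base direction while the visible text stays unchanged:

```python
LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK (bidi class L)
RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK (bidi class R)


def force_direction(text: str, direction: str) -> str:
    """Hypothetical RTL/LTR button handler: prefix an invisible direction mark."""
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"unknown direction: {direction!r}")
    # Drop leading marks (e.g. from a previous press) so toggling doesn't
    # stack them; note this also removes marks the user typed at the start.
    body = text.lstrip(LRM + RLM)
    return (RLM if direction == "rtl" else LRM) + body


print(repr(force_direction("hello", "rtl")))  # '\u200fhello'
```

Stripping existing leading marks before adding a new one is a design choice in this sketch, so that pressing the button repeatedly toggles the direction instead of piling up invisible characters.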
@khaledhosny That's actually a really good option as well. Some button or a control to insert the appropriate Unicode character to control direction, assuming those are supported by all major systems, is something I can definitely get behind.
I have now poured out my draft of a new default Arabic keyboard layout. Contriving it took my whole day, and I still have to put the actual Arabic characters into the comments instead of (or in addition to?) the transcriptions used now. But thanks to my experience with XKB layouts it worked on the first try, so I have published it this evening. I will hold it for a few days to digest it and to give you all the opportunity to evaluate it; the new version of xkeyboard-config is scheduled for the 31th of September.
@khaledhosny @behdad, or whoever else: call your polyglot mates to have a look at it! I have mapped the bidirectional control characters onto it, except the overriding ones (I don't think LRO and RLO are meant to be used regularly in text?). And as there was plenty of unused room on four levels, I have additionally mapped all characters used in the Arabic scripts of Pashto, Sindhi, Punjabi, Urdu, Kashmiri, Turkic and other languages, next to the Arabic and Persian letters that were already in the layout before my involvement (the same way I have mapped virtually all of Cyrillic onto a Russian-based layout).
I think you can comment on that gist with specific remarks about the layout, as those would be beyond the topic here.
As for this issue, once that keyboard layout ships, the issue is solved on Linux. As I noticed just now while writing this comment, the default Persian layout already includes the embedding and override characters, the default Hebrew one includes the RIGHT-TO-LEFT MARK and the LEFT-TO-RIGHT MARK, and my edition of the default Arabic layout adds them too.
I dare assume that the suggestion of typing bidirectional characters directly on the keyboard is quite persuasive. But according to his profile the OP is from Israel, which makes it even more amusing to hear him complain about propositions to "Enter a character that's not found in any keyboard", as the Hebrew base layout contains:
```
key <AE09> { [ 9, parenright, U200E ]}; // LRM; Paren Mirrored
key <AE10> { [ 0, parenleft, U200F ]}; // RLM; Paren Mirrored
```
What, people can't help themselves because they use Windows or macOS? Then this issue has to be closed, because it is an operating system issue.
Hey there!
We're automatically closing this issue since there has been no activity in it for 398 days. We therefore assume that the user has lost interest or resolved the problem on their own. Closed issues that remain inactive for a long period may get automatically locked.
Don't worry though; if this is in error, let us know with a comment and we'll be happy to reopen the issue.
Thanks!
(Please note that this is an automated comment.)