May be related to #698
7.3. Shaping Across Element Boundaries
https://drafts.csswg.org/css-text-3/#boundary-shaping
Sometimes a picture is used instead of a letter in a word. For example, a suggestive image in an advertisement or logo etc.


(Taken from https://www.youtube.com/watch?v=MV9-7hxMdb8, at the very beginning)
The large Arabic text in the picture above is "أجي تفهم", meaning "come to understand".
The letter Teh ت joins with the Feh ف, which is replaced by a kind of light bulb (to suggest a tip or idea).
In HTML, the picture could be an emoji or, more likely, an <img> element. However, an <img> is an inline element with zero margin/border/padding (m/b/p). One must use explicit markup, e.g. a ZWJ or a tatweel sign, to get the joined shape.
Should this case (image as letter) be considered at the specification level, or is it an "effective change in formatting", and thus up to the user to try to work around it?
Interesting. I would have expected this to be possible by using Unicode zero-width joiner characters (aka the &zwj; entity) between the letters and the letters-as-image. The ZWJ is supposed to be treated as a "word" character and therefore it should trigger the connected versions of the glyph before/after it. As I understand, this is a common structure in many Indic scripts. But I couldn't find any existing examples of using it for Arabic.
I found that this approach worked in Chrome, but only if there was markup on the other side of the joiner. The ZWJ couldn't trigger a medial form between the Arabic letter and another Unicode character, such as an emoji.
In other browsers, the ZWJ didn't have any effect. Although on the plus side, Firefox and Edge do use medial forms even when letters are separated by markup.
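A minimal sketch of the markup under discussion, built as a Python string so the invisible code points stay visible in source. The helper name `image_as_letter` and the file name `bulb.png` are hypothetical, and whether a given browser honours the joiners is exactly what varies in the results reported above.

```python
ZWJ = "\u200D"  # U+200D ZERO WIDTH JOINER


def image_as_letter(before: str, img_html: str, after: str) -> str:
    """Place a ZWJ on each side of an inline image that has adjacent text.

    The ZWJ only requests a joining form from the neighbouring letter;
    the image itself takes no part in shaping.
    """
    left = before + ZWJ if before else ""
    right = ZWJ + after if after else ""
    return left + img_html + right


# Hypothetical image; the alt text carries the letter it stands for (Feh).
IMG = '<img src="bulb.png" alt="\u0641">'

# Teh, then the image in place of Feh, then Heh and Meem.
markup = image_as_letter("\u062A", IMG, "\u0647\u0645")

# List the non-ASCII code points so the joiners can be checked by eye.
print([f"U+{ord(c):04X}" for c in markup if ord(c) > 0x7F])
```

Writing the joiner as an escape (or as &zwj; in hand-written HTML) rather than as a literal character makes it harder to lose when the markup is copied or edited.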
Samples (also available as a CodePen):
Unicode does have designated characters for each Arabic glyph form, so you could use those to force the desired rendering. But that does seem sub-optimal.
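For reference, a small sketch using Python's standard `unicodedata` module to list the four presentation-form code points for Teh. These are compatibility characters, so NFKC normalization folds each of them back to the ordinary letter U+062A, which is one reason hard-coding them is fragile.

```python
import unicodedata

TEH = "\u062A"  # ARABIC LETTER TEH

# Presentation forms of Teh in the Arabic Presentation Forms-B block.
teh_forms = {
    "isolated": "\uFE95",
    "final": "\uFE96",
    "initial": "\uFE97",
    "medial": "\uFE98",
}

for label, ch in teh_forms.items():
    # Each form is a separate code point with its own character name.
    print(f"{label:9} U+{ord(ch):04X} {unicodedata.name(ch)}")
    # Compatibility normalization maps the form back to the base letter.
    assert unicodedata.normalize("NFKC", ch) == TEH
```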
If you want to pursue the use of ZWJ for forcing Arabic connected forms, the correct forum would be Unicode + browser bug trackers.
For handling the issue in CSS: Are you thinking of a property that could be set on an element (e.g., a <span> or an <img>) to indicate that the element should be treated as "word" characters and therefore trigger joining forms on either side? Would you want joining to be the default behavior?
Forgot to cite this relevant guidance by @r12a (who can offer much more guidance than I about what "should" happen):
Likewise, if you need to display a joining form where there would not normally be one, such as in the Persian phrase ۱۳۹۵ ه.ش., you would use U+200D ZERO WIDTH JOINER (in this case, immediately after U+0647 ARABIC LETTER HEH).
https://github.com/w3c/alreq/wiki/Should-I-use-the-Arabic-Presentation-Forms-provided-in-Unicode%3F
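The quoted guidance can be sketched in code points as follows. This is only an illustration of where the joiner goes; the variable names are made up for the example, and the rendered result still depends on the font and the shaping engine.

```python
ZWJ = "\u200D"  # U+200D ZERO WIDTH JOINER
HEH = "\u0647"  # U+0647 ARABIC LETTER HEH
SHEEN = "\u0634"  # U+0634 ARABIC LETTER SHEEN

year = "\u06F1\u06F3\u06F9\u06F5"  # Extended Arabic-Indic digits 1, 3, 9, 5

# The joiner sits immediately after Heh, before the full stop.
phrase = f"{year} {HEH}{ZWJ}.{SHEEN}."

# Print every code point so the invisible joiner is easy to locate.
for ch in phrase:
    print(f"U+{ord(ch):04X}")
```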
Although on the plus side, Firefox and Edge do use medial forms even when letters are separated by markup.
I believe this is the desired behavior: a plain span should have nothing to do with text runs. Chrome's failure to do so is an implementation bug.
But yeah, if ZWJ should do the work, then browser bugs should be filed for that, and ideally we may want some web-platform test to cover such behavior.
Agreed that this should be solved by using ZWJ, and that markup should have no effect on joining behavior.
CSS Text has a normative reference to Unicode, which pulls in the Arabic shaping and ZWJ handling requirements here: https://drafts.csswg.org/css-text-3/#text-encoding
It also specifies how markup affects shaping here: https://drafts.csswg.org/css-text-3/#boundary-shaping
And @frivoal is working on test cases for ZWJ's effect on Arabic shaping here: https://github.com/web-platform-tests/wpt/pull/14673
I think that means this issue is closed spec-wise. Some advice from anyone with insight on font interactions with shaping (@litherum / @upsuper / @jfkthame?) might be helpful in the testcase thread, though.
@ntounsi Does this work for you?
Thanks @fantasai, works for me.
Anyway, I'm still confused about where (in 7.3. Shaping Across Element Boundaries) this case, an image as a letter, falls:
1) Is it "an inline box boundary" where m/b/p are zero (then shaping must not break),
or is it:
2) an "effective change in formatting" (could break shaping), or
3) an "impossible shaping across boundaries"?
My guess is 3), in which case shaping must break.
If true, I would suggest adding this example at the end of EXAMPLE 22.
I pulled together some quick tests that show that failures to respect zwj behaviour may be due to:
See the two comments starting at https://github.com/web-platform-tests/wpt/pull/14673#issuecomment-453519975 for links and more details.
On the separate issue of whether markup interrupts cursive joining, see our tests and results at https://w3c.github.io/i18n-tests/results/css-text-shaping
My personal opinion is that text should not automatically connect with images, since the browser has no way of knowing whether or not that is appropriate – zwj should be used by the content author for the (probably vanishingly rare) occurrences where it should. The spec text, however, doesn't seem to explicitly proscribe it. Should it perhaps say something like "shaping must not be broken across inline box boundaries with text on either side when" (highlighted just to show the change)?
hth
Anyway, I'm still confused about where (in 7.3. Shaping Across Element Boundaries) this case, an image as a letter, falls:
1) Is it "an inline box boundary" where m/b/p are zero (then shaping must not break),
or is it:
2) an "effective change in formatting" (could break shaping), or
3) an "impossible shaping across boundaries"?
My guess is 3), in which case shaping must break.
If true, I would suggest adding this example at the end of EXAMPLE 22.
The box boundaries the spec is talking about are those that "separate two typographic character units". An image is not a typographic character unit. Even if it happens to be an image of a letter.
Where the spec talks about shaping across inline boundaries, it is referring to situations where there is text on each side of the boundary (and so there could reasonably be a question as to whether text shaping is continuous across the boundary, or applies separately to the two fragments).
In the case of an inline image, there's no question: we have a boundary between text and non-text (image). Images are not text, and do not participate in shaping.
So if the desired result is that the text before (and/or after) the image should be shaped as though the image were actually a letter, and participated in shaping (such as cursive Arabic joining), this must be explicitly controlled by the author (using ZWJ to trigger joining forms where appropriate).
this must be explicitly controlled by the author (using ZWJ to trigger joining forms where appropriate).
This answers my first question: "Should this case (image as letter) be considered at the specification level, or is it [...] to the user to try to work around it?"