In the glossary of css-ruby, the terms are marked as lang="zh/ko/ja", although they're written in English.
Source file: https://github.com/w3c/csswg-drafts/blob/c5115eaa46799f50d0798ac9f1f5a094d589d4e5/css-ruby-1/Overview.bs#L1487-L1529
Yeah... I see no reason why they need to be marked that way.
They are not really English, though. Strictly speaking, they are romanizations of words in those languages. Not sure how they should be marked.
You could use zh-Latn or zh-Latn-pinyin I guess.
(For the first one, at least. I didn't look up in http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry what script names to use in the other languages.)
I believe zh-Latn, ja-Latn and ko-Latn would be correct.
I think this is one of the scenarios where language tagging is not clear-cut, and very much falls back on the question: "What effect are you hoping to achieve by tagging for language?" (Or, actually, here perhaps what effect do you not want to produce?) Here's my 2p:
If you label these terms as described above, you'll find that the browser applies different CJK fonts to the text unless overridden by styling (not the case here), which to me looks quite odd because they aren't usually written that way in the native language either. For myself, I see these as technical terms used in English that happen to be derived from CJK terms. I wouldn't put any language tag on them, because:
a. I don't know what that would achieve,
b. it can actually be detrimental in terms of styling.
Arguably, UAs ought to be responding to the full language tag, including the writing system, and not switch to Chinese fonts when they see lang="zh-Latn". However, even if that were true, I now think it would still be wrong to tag them that way: if we think of a screen reader instead, we wouldn't expect it to switch to a Chinese voice when reading "bopomofo". The document is in English and is meant to be read or listened to by English speakers, and there's no benefit in switching to Chinese phonetics.
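To make the font-selection point concrete, here is a rough Python sketch of the two behaviours. It is an illustration only, not how any actual browser picks fonts: the function names (`split_tag`, `naive_uses_cjk_font`, `script_aware_uses_cjk_font`) are made up for this comment, and the tag parsing is deliberately simplified (it only looks for a four-letter script subtag directly after the language subtag and ignores extlang, region and variant handling).

```python
from typing import Optional, Tuple

# Hypothetical set of primary language subtags that trigger CJK font selection.
CJK_LANGUAGES = {"zh", "ja", "ko"}


def split_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split a simple language tag into (primary language, script or None).

    Simplification: a script subtag is assumed to be a four-letter subtag
    immediately following the primary language subtag.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    script = None
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        script = parts[1].title()
    return language, script


def naive_uses_cjk_font(tag: str) -> bool:
    """Look only at the primary language subtag and ignore the script."""
    language, _ = split_tag(tag)
    return language in CJK_LANGUAGES


def script_aware_uses_cjk_font(tag: str) -> bool:
    """Honour the script subtag: Latin-script text does not get a CJK font."""
    language, script = split_tag(tag)
    return language in CJK_LANGUAGES and script != "Latn"
```

With these definitions, `naive_uses_cjk_font("zh-Latn")` is true while `script_aware_uses_cjk_font("zh-Latn")` is false. Neither variant addresses the screen-reader concern, which is about the choice of voice rather than the choice of font.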
So I agree with @r12a: these are better seen as technical terms used in English that happen to be derived from CJK terms, and are best left without a language tag.