A space that is followed by Chinese/Japanese/Korean (CJK) text should not be a breaking space.
This is also how it works in the DOM:


Currently, a space is always a breaking space, no matter what the following character is. That works correctly for English, but not for Chinese/Japanese/Korean.

`TextMetrics.isBreakingSpace` may need to check the next character as well.
If the next character is CJK, the space should not be a breaking space.
```ts
static isBreakingSpace(char: string, nextChar?: string): boolean
{
    let isUnbreakSpace = false;
    if (typeof nextChar === 'string')
    {
        // Next character is kana or a CJK ideograph: keep the space unbreakable.
        isUnbreakSpace = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/.test(nextChar);
    }
    return (TextMetrics._breakingSpaces.indexOf(char.charCodeAt(0)) >= 0) && !isUnbreakSpace;
}
```
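To check the proposed behaviour outside PixiJS, here is a minimal self-contained sketch. `BREAKING_SPACES` is an assumed subset standing in for `TextMetrics._breakingSpaces`, and `CJK_PATTERN` reuses the ranges from the snippet above. Because `nextChar` is optional, a one-argument call keeps today's behaviour.

```typescript
// Hypothetical stand-in for TextMetrics._breakingSpaces (assumed subset only).
const BREAKING_SPACES: number[] = [0x0009, 0x0020, 0x3000];

// Same ranges as the snippet above: Hiragana/Katakana, CJK Extension A,
// CJK Unified and Compatibility Ideographs, half-width Katakana.
const CJK_PATTERN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;

function isBreakingSpace(char: string, nextChar?: string): boolean
{
    const isSpace = BREAKING_SPACES.indexOf(char.charCodeAt(0)) >= 0;

    // Keep the space non-breaking when the next character is CJK.
    if (isSpace && nextChar !== undefined && CJK_PATTERN.test(nextChar))
    {
        return false;
    }

    return isSpace;
}

console.log(isBreakingSpace(' ', 'a'));  // true
console.log(isBreakingSpace(' ', 'テ')); // false
console.log(isBreakingSpace(' '));       // true (one-argument call, old behaviour)
console.log(isBreakingSpace('a', 'b'));  // false
```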
```js
const style = new PIXI.TextStyle({
    breakWords: true,
    fontSize: 13,
    fontWeight: "bold",
    lineJoin: "bevel",
    stroke: "#896161",
    whiteSpace: "pre-line",
    wordWrap: true,
    wordWrapWidth: 285
});
const text = new PIXI.Text('テストテキスト テストテキスト テストテキスト テストテキスト ', style);
```
pixi.js version: _e.g. 5.1.6_

Reproduction: https://codepen.io/sukantpal/full/dyXKWKy
The bounding box is a little smaller (^) than the box you show under "current behavior".
CJK isn't supported out of the box at the moment, and there are a number of issues with it. I was going to make a plugin to add support a long time ago, and some refactoring of Text and TextMetrics made that addition easy, but it just never got done, as I suck.
Not sure if it'd be something natively included in PixiJS itself, as it requires knowing, via regex, what to do with certain symbols in certain languages.
I need to take another look. I have a method in my bespoke version of PixiJS, but there I know outright what language the game is in, so I don't have to be as smart about things.
@SukantPal
Thank you for the reproduction. I didn't use the exact same code, so the width is smaller😉
@themoonrat
Allowing add-ons for Text and TextMetrics sounds like a great idea. It is hard to solve the problem perfectly for every language, so it might be easier to let developers solve it themselves through such an add-on.
This issue actually affects my current project. If there is anything I can help with, please let me know😊
@huang-yuwei Since I'm really struggling to contribute properly these days, the following is the contents of a patch file. It would be interesting to know whether it solves your issues.
It _does_ rely on looking for certain characters, which I believe are accurate, and it relies on you creating a new LANGUAGE property in settings and setting it to the language in use, for example `PIXI.settings.LANGUAGE = "zh-TW"`.
The best way would be to auto-detect the language, but I'm unsure whether running regexes every time would start to cause performance penalties. So in this version, I state what the language actually is from my app and go from there.
```diff
diff --git "a/packages/text/src/TextMetrics.ts" "b/packages/text/src/TextMetrics.ts"
index 95e94453..26a368d1 100644
--- "a/packages/text/src/TextMetrics.ts"
+++ "b/packages/text/src/TextMetrics.ts"
@@ -1,3 +1,4 @@
+import { settings } from '@pixi/settings';
 import { TextStyle, TextStyleWhiteSpace } from './TextStyle';
 interface IFontMetrics {
@@ -8,6 +9,18 @@ interface IFontMetrics {
 type CharacterWidthCache = { [key: string]: number };
+/* eslint-disable no-control-regex */
+const regexBasicLatin = /[\u0000-\u00ff]/;
+const regexCannotStartZhCn = /[,!%),.:;>?\]}¢¨°·ˇˉ―‖’”„‟†‡›℃∶、。〃〆〈《「『〕〗〞︵︹︽︿﹃﹘﹚﹜!"%'),.:;?\]`|}~⦅]/;
+const regexCannotEndZhCn = /[,$(*,£¥·‘“〈《「『【〔〖〝﹗﹙﹛$(.[{£¥]/;
+const regexCannotStartZhTw = /[,!),.:;?\]}¢·–—’”•‥„‧ †╴ 、。〆〈《「『〕〞︰︱︲︳︵︷︹︻︽︿﹁﹃﹏﹐﹑﹒﹔﹕﹖﹘﹚﹜!),.:;?\]|}、]/;
+const regexCannotEndZhTw = /[,([{£¥‘“‵々〇〉》」〔〝︴︶︸︺︼︾﹀﹂﹗﹙﹛({]/;
+const regexCannotStartJaJp = /[!%),.:;?\]}¢°’”‟†‡℃、。〄〆〈《「『〕゛゜ゝゞ・ゝゞ!%),.:;?\]}。 」 、 ・ ゙ ゚ ⦅]/;
+const regexCannotEndJaJp = /[$([\{£¥‘“々〇〉》」〔$([{「 ⦆¥]/;
+const regexCannotStartKoKr = /[!%),.:;?\]}¢°’”†‡℃〆〈《「『〕!%),.:;?\]}⦅]/;
+const regexCannotEndKoKr = /[$([\{£¥‘“々〇〉》」〔$([{⦆¥₩]/;
+/* eslint-enable no-control-regex */
+
 /**
  * The TextMetrics object represents the measurement of a block of text with a specified style.
  *
@@ -574,7 +587,11 @@ export class TextMetrics
      */
     static canBreakWords(_token: string, breakWords: boolean): boolean
     {
-        return breakWords;
+        const isCJK = settings.LANGUAGE.indexOf('zh-') !== -1
+            || settings.LANGUAGE.indexOf('ja-') !== -1
+            || settings.LANGUAGE.indexOf('ko-') !== -1;
+
+        return breakWords || isCJK;
     }
     /**
@@ -595,6 +612,59 @@ export class TextMetrics
     static canBreakChars(_char: string, _nextChar: string, _token: string, _index: number,
         _breakWords: boolean): boolean
     {
+        const isCJK = settings.LANGUAGE.indexOf('zh-') !== -1
+            || settings.LANGUAGE.indexOf('ja-') !== -1
+            || settings.LANGUAGE.indexOf('ko-') !== -1;
+
+        if (isCJK)
+        {
+            if (_nextChar)
+            {
+                if (_char === ' ')
+                {
+                    return true;
+                }
+
+                if (regexBasicLatin.exec(_char))
+                {
+                    return false;
+                }
+
+                let regexCannotStart;
+                let regexCannotEnd;
+
+                if (settings.LANGUAGE === 'zh-CN')
+                {
+                    regexCannotStart = regexCannotStartZhCn;
+                    regexCannotEnd = regexCannotEndZhCn;
+                }
+                else if (settings.LANGUAGE === 'zh-TW')
+                {
+                    regexCannotStart = regexCannotStartZhTw;
+                    regexCannotEnd = regexCannotEndZhTw;
+                }
+                else if (settings.LANGUAGE === 'ja-JP')
+                {
+                    regexCannotStart = regexCannotStartJaJp;
+                    regexCannotEnd = regexCannotEndJaJp;
+                }
+                else if (settings.LANGUAGE === 'ko-KR')
+                {
+                    regexCannotStart = regexCannotStartKoKr;
+                    regexCannotEnd = regexCannotEndKoKr;
+                }
+
+                if (regexCannotEnd.exec(_char) || regexCannotStart.exec(_nextChar))
+                {
+                    return false;
+                }
+            }
+            else
+            {
+                return false;
+            }
+        }
+
         return true;
     }
```
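As posted, the patch reads `settings.LANGUAGE` directly: if it is unset, `settings.LANGUAGE.indexOf` throws, and a CJK tag that has no rules of its own (for example `zh-HK`) leaves `regexCannotEnd` undefined, so the `.exec` call throws. A table lookup with a fallback avoids both. This is only a sketch of the rule-selection step: `KINSOKU_RULES` and `canBreakBetween` are hypothetical names, and the character classes are small illustrative subsets, not the full lists from the patch.

```typescript
interface KinsokuRules
{
    cannotStart: RegExp; // characters that must not begin a line
    cannotEnd: RegExp;   // characters that must not end a line
}

// Illustrative subsets only; the full per-language lists are in the patch.
const KINSOKU_RULES: Record<string, KinsokuRules> = {
    'ja-JP': {
        cannotStart: /[\u3001\u3002\u300d\u300f\uff01\uff1f\uff09]/, // 、。」』！？）
        cannotEnd: /[\u300c\u300e\uff08]/,                           // 「『（
    },
    'zh-CN': {
        cannotStart: /[\uff0c\u3002\uff01\uff1f\u3001\u300d\u300f\uff09]/, // ，。！？、」』）
        cannotEnd: /[\u300c\u300e\uff08]/,                                 // 「『（
    },
};

// Returns true if a line break is allowed between `char` and `nextChar`.
function canBreakBetween(char: string, nextChar: string, language?: string): boolean
{
    const rules = language !== undefined ? KINSOKU_RULES[language] : undefined;

    // Unknown or unset language: apply no CJK rules instead of throwing.
    if (!rules)
    {
        return true;
    }

    return !rules.cannotEnd.test(char) && !rules.cannotStart.test(nextChar);
}

console.log(canBreakBetween('テ', '\u3002', 'ja-JP')); // false: 。 cannot start a line
console.log(canBreakBetween('\u300c', 'テ', 'ja-JP')); // false: 「 cannot end a line
console.log(canBreakBetween('テ', 'ス', 'ja-JP'));     // true
console.log(canBreakBetween('香', '港', 'zh-HK'));     // true: no rules, no exception
console.log(canBreakBetween('a', 'b'));                // true: language unset
```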
@themoonrat
Thank you for providing the code. I have tried it in my environment.
I replaced canBreakChars and canBreakWords with your code, and the breaking-space problem is still there.
Also, after checking use cases in Japanese and Chinese, I found that the text is not always in a single language, so setting one fixed language may not be useful. For example:
- PIXI株式会社, Kenさん.

- テキストテキストテキストテキストテキストテキスト
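For strings like these, the auto-detection idea mentioned earlier could run once per string instead of once per character, which keeps the regex cost small. The sketch below is an assumption, not PixiJS API: `detectScript` and `Script` are hypothetical names, and the Unicode ranges are coarse. It also shows the limitation above: `PIXI株式会社, Kenさん.` still gets one label for the whole string, and Han characters alone cannot tell Simplified from Traditional Chinese, which the patch's rule tables distinguish.

```typescript
type Script = 'latin' | 'ja' | 'ko' | 'han';

// Precompiled once; each call runs at most three tests over the string.
const KANA = /[\u3040-\u30ff\uff66-\uff9f]/;
const HANGUL = /[\u1100-\u11ff\u3130-\u318f\uac00-\ud7af]/;
const HAN = /[\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff]/;

// Hypothetical helper: classify a whole string once, before word-wrapping it.
function detectScript(text: string): Script
{
    if (KANA.test(text)) return 'ja';   // kana are (almost) exclusive to Japanese
    if (HANGUL.test(text)) return 'ko';
    if (HAN.test(text)) return 'han';   // Chinese, or Japanese without any kana
    return 'latin';
}

console.log(detectScript('PIXI株式会社, Kenさん.')); // ja
console.log(detectScript('PIXI株式会社'));          // han
console.log(detectScript('Hello world'));          // latin
console.log(detectScript('안녕하세요'));             // ko
```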

Even though the patch didn't work for me, it gave me a direction to think about this issue. I didn't know that canBreakChars existed.
I will also try it myself, and hope to share my progress with you soon😀
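Because one string can mix scripts, the break decision can instead be made for each pair of adjacent characters, which needs no language setting at all. A minimal sketch, assuming the only rule is "never break between two Latin-1 characters" (so `PIXI` and `Ken` stay whole); `breakPositions` is a hypothetical helper, and kinsoku (line-start/line-end) rules and spaces are deliberately left out.

```typescript
// Latin-1 range, the same as regexBasicLatin in the patch.
// eslint-disable-next-line no-control-regex
const BASIC_LATIN = /[\u0000-\u00ff]/;

// Hypothetical helper: code-point indices i where a line may break before chars[i].
function breakPositions(text: string): number[]
{
    const chars = Array.from(text); // split by code point, not by UTF-16 unit
    const positions: number[] = [];

    for (let i = 1; i < chars.length; i++)
    {
        const bothLatin = BASIC_LATIN.test(chars[i - 1]) && BASIC_LATIN.test(chars[i]);

        // Keep Latin words intact; allow a break next to any CJK character.
        if (!bothLatin)
        {
            positions.push(i);
        }
    }

    return positions;
}

console.log(breakPositions('PIXI株式会社')); // [ 4, 5, 6, 7 ]
console.log(breakPositions('Kenさん'));      // [ 3, 4 ]
```

In a real wrapper this pairwise check would be combined with the kinsoku rules from the patch, much as `canBreakChars` receives both `char` and `nextChar`.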
@huang-yuwei I've had a dig through old PRs and remembered https://github.com/pixijs/pixi.js/pull/4447
That PR made changes to TextMetrics, but at the time we were also making API changes that exposed canBreakChars and canBreakWords, which could in theory make it possible to write a plugin that overrides those functions and uses them to make CJK text work. So his PR was good, but we were hoping to update it to use the new functions and never got around to it.
I converted my own hacky version of CJK text to use these newly exposed functions, but it's still not perfect, not a plugin, and not good enough for public consumption.
If you could be the missing link that brings all this together and gets CJK working, that'd be amazing! That other PR has some unit tests that might help, too.
If you need any help, let me know :)
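The plugin approach described above amounts to replacing static methods on `TextMetrics` when the plugin loads. A minimal sketch of that pattern follows; to stay self-contained it patches a stand-in class (`FakeTextMetrics`, hypothetical) rather than the real `PIXI.TextMetrics`, and `installCJKPlugin` is an illustrative name, not an existing API. Returning an uninstall function is one design choice that makes it easy to compare wrapping with and without the plugin.

```typescript
// Stand-in with the same static hook signature as TextMetrics.canBreakChars.
class FakeTextMetrics
{
    static canBreakChars(_char: string, _nextChar: string, _token: string,
        _index: number, _breakWords: boolean): boolean
    {
        return true;
    }
}

type CanBreakChars = typeof FakeTextMetrics.canBreakChars;

// Hypothetical installer: wraps the existing hook with a CJK rule, returns an undo function.
function installCJKPlugin(target: { canBreakChars: CanBreakChars }, cannotStart: RegExp): () => void
{
    const original = target.canBreakChars;

    target.canBreakChars = (char, nextChar, token, index, breakWords) =>
        !cannotStart.test(nextChar) && original(char, nextChar, token, index, breakWords);

    return () =>
    {
        target.canBreakChars = original;
    };
}

const uninstall = installCJKPlugin(FakeTextMetrics, /[\u3001\u3002]/); // 、。

console.log(FakeTextMetrics.canBreakChars('テ', '\u3002', 'テ。', 0, true)); // false
uninstall();
console.log(FakeTextMetrics.canBreakChars('テ', '\u3002', 'テ。', 0, true)); // true
```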
@themoonrat Thank you for sharing. I will take a look at the PR 😊
@themoonrat
Hi, I have figured out some methods to get CJK working, and would like to hear your feedback 🙏
My methods aim at three purposes.
The changes I have made are:
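The `canBreakChars` override from the plugin:

```js
// regexCannotStart* / regexCannotEnd* are the per-language constants from the patch above.
// Joining them into one pattern applies every language's rules, so no LANGUAGE setting is needed.
const regexCannotStart = new RegExp(
    `${regexCannotStartZhCn.source}|${regexCannotStartZhTw.source}|${regexCannotStartJaJp.source}|${regexCannotStartKoKr.source}`,
);
const regexCannotEnd = new RegExp(
    `${regexCannotEndZhCn.source}|${regexCannotEndZhTw.source}|${regexCannotEndJaJp.source}|${regexCannotEndKoKr.source}`,
);

PIXI.TextMetrics.canBreakChars = function canBreakChars(char, nextChar) {
    if (nextChar) {
        // No break after a character that cannot end a line,
        // or before one that cannot start a line.
        if (regexCannotEnd.exec(char) || regexCannotStart.exec(nextChar)) {
            return false;
        }
    }
    return true;
};
```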
The `isBreakingSpace` override from the plugin. … we need to receive the `nextChar` in `isBreakingSpace` as well. Therefore, I suggest applying this change in PIXI. (If you are okay with it, I will push a PR later.)

```js
// regexBasicLatin is the constant from the patch above.
PIXI.TextMetrics.isBreakingSpace = function isBreakingSpace(char, nextChar) {
    if (typeof char !== 'string') {
        return false;
    }
    const isBreakingSpace =
        PIXI.TextMetrics._breakingSpaces.indexOf(char.charCodeAt(0)) >= 0;
    if (isBreakingSpace && nextChar) {
        // A space followed by a non-Latin-1 character (e.g. CJK) does not break.
        const unbreakableSpace = !regexBasicLatin.exec(nextChar);
        if (unbreakableSpace) return false;
    }
    return isBreakingSpace;
};
```
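The `canBreakChars` override in the plugin joins every language's rules into one pattern through `.source`. Here is a self-contained sketch of that union with small illustrative rule subsets; the variable names and character choices are assumptions, not the plugin's full lists.

```typescript
// Illustrative subsets only; the full per-language lists are in the patch.
const cannotStartJa = /[\u3001\u3002\u309d\u309e]/; // 、。ゝゞ
const cannotStartZh = /[\uff0c\u3002\uff01\uff1f]/; // ，。！？
const cannotEndJa = /[\u300c\u300e]/;               // 「『
const cannotEndZh = /[\u300c\u300e\uff08]/;         // 「『（

// Union of every language's rules, built with `.source` like the plugin snippet.
const cannotStart = new RegExp(`${cannotStartJa.source}|${cannotStartZh.source}`);
const cannotEnd = new RegExp(`${cannotEndJa.source}|${cannotEndZh.source}`);

function canBreakChars(char: string, nextChar?: string): boolean
{
    if (nextChar && (cannotEnd.test(char) || cannotStart.test(nextChar)))
    {
        return false;
    }

    return true;
}

console.log(canBreakChars('テ', '\u309d')); // false: ゝ cannot start a line (Japanese rule)
console.log(canBreakChars('中', '\uff01')); // false: ！ cannot start a line (Chinese rule)
console.log(canBreakChars('\uff08', '中')); // false: （ cannot end a line
console.log(canBreakChars('中', '文'));     // true
```

The trade-off of the union is that a mark forbidden at a line boundary in only one language becomes forbidden for all of them, which may be acceptable for mixed-language text where no single language can be assumed.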
You can test it here as well: https://codepen.io/huang-yuwei/pen/GRqbEmm
Thank you very much for the code and for sharing #4447 with me. Both have been very helpful.
> we need to receive the nextChar in isBreakingSpace as well. Therefore, I suggest applying this change in PIXI.
I'm happy for this change to take place, yes. It doesn't affect current behaviour, and it'll follow the same parameter order as a different function that already requires it, canBreakChars.
I don't see anything wrong at a glance with what you've done. Once the plugin has a repo or a PR, I can easily replace my hacky version with your nice version, experiment with a number of games I have, and see if I can spot any issues :)
This is exciting!
@themoonrat thank you for the comment and all of the help🙌
It does sound exciting!
I have created PR #7023, and will try the plugin later.
Will keep in touch with you 😉