Twig: Method "trim" corrupts UTF-8 string

Created on 27 Jun 2018 · 2 Comments · Source: twigphp/Twig

The trim filter corrupts a UTF-8 string on one specific character, "л":

{{ 'ж'|trim('»') }} – ok, outputs "ж"
{{ 'л'|trim('»') }} – not ok, outputs "�"

Here's a fiddle https://twigfiddle.com/z0thnd

Most helpful comment

Because » is a multibyte character (\xC2\xBB in UTF-8) and the second parameter of trim is a character list, which PHP treats as a set of individual bytes.
In UTF-8, ж is \xD0\xB6 and л is \xD0\xBB.

So what you are actually doing is

<?php

// Double quotes are required so that PHP interprets the \x escapes as bytes.
trim("\xD0\xB6", "\xC2\xBB"); // trim('ж', '»');
trim("\xD0\xBB", "\xC2\xBB"); // trim('л', '»');

which means any leading or trailing \xC2 and \xBB bytes are removed, resulting in

"\xD0\xB6" // trim('ж', '»') => 'ж'
"\xD0" // trim('л', '»') => invalid UTF-8 sequence
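
The byte-level effect can be checked directly. This is a minimal sketch, assuming the standard UTF-8 encodings of the three characters (» = C2 BB, ж = D0 B6, л = D0 BB); the variable names are illustrative only.

```php
<?php
// Double-quoted strings so PHP turns the \x escapes into raw bytes.
$chars = "\xC2\xBB"; // UTF-8 bytes of "»"

// trim() treats $chars as a list of single bytes, not as one character.
$ok  = trim("\xD0\xB6", $chars); // "ж": neither byte is in the list
$bad = trim("\xD0\xBB", $chars); // "л": the trailing BB byte is in the list

// An empty pattern with the /u flag matches only if the subject is valid UTF-8.
$isValid = preg_match('//u', $bad) === 1;

echo bin2hex($ok), "\n";  // d0b6
echo bin2hex($bad), "\n"; // d0
var_dump($isValid);       // bool(false)
```

The lone D0 byte left over is the first half of a two-byte sequence, which is why the output shows the replacement character "�".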

If you want to trim the real » character, use an mb_* or preg_* function instead.
I'm not sure whether there is a built-in Twig way to deal with it, though.
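
One possible preg_* approach is sketched below. The helper name utf8_trim is hypothetical (it is not part of PHP or Twig), and the sketch assumes both arguments are valid UTF-8.

```php
<?php
/**
 * Hypothetical helper: strips whole UTF-8 characters listed in $chars
 * from both ends of $text, instead of stripping individual bytes.
 */
function utf8_trim(string $text, string $chars): string
{
    if ($chars === '') {
        return $text; // nothing to trim; also avoids an empty character class
    }

    // preg_quote escapes regex metacharacters; the /u flag makes PCRE treat
    // pattern and subject as UTF-8, so the class matches code points, not bytes.
    $class = '[' . preg_quote($chars, '/') . ']';

    // \A and \z anchor to the absolute start and end of the string.
    $result = preg_replace('/\A' . $class . '+|' . $class . '+\z/u', '', $text);

    // preg_replace returns null on error (e.g. invalid UTF-8 input).
    return $result ?? $text;
}

echo utf8_trim('л', '»'), "\n";   // л
echo utf8_trim('»л»', '»'), "\n"; // л
```

In a Twig project, a helper like this could presumably be exposed as a custom filter; that wiring is left out here because it depends on the Twig version in use.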



All 2 comments

Closing as @jfcherng gave the answer
