Xterm.js: Selection with search and unicode

Created on 13 Sep 2018 · 27 Comments · Source: xtermjs/xterm.js

Combining, surrogate or fullwidth chars in the line and/or the search string lead to weird selection offset problems. Steps to repro:

insert into demo: echo -en 'combining: ééé\nfullwidth: ￥￥￥\nsurrogate: 𓂀𓂀𓂀\n'
search for 'ééé', '￥￥￥' and '𓂀𓂀𓂀'

The selection is kinda off for all 3 types, and it gets even worse if the line contains any of these before their occurrence. It seems the renderer and the selection manager do not agree on the character widths and lengths.
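To make the mismatch concrete, here is a minimal sketch (not xterm.js code) that compares the UTF-16 length and the code point count of the three repro strings. The helper name `countCodePoints` is made up for illustration, and the strings are assumed to be well-formed UTF-16 with the accents stored as combining marks.

```typescript
// Repro strings written with explicit escapes so the encoding is unambiguous.
const combining = 'e\u0301e\u0301e\u0301';                // 'e' + U+0301 COMBINING ACUTE ACCENT, three times
const fullwidth = '\uffe5\uffe5\uffe5';                   // U+FFE5 FULLWIDTH YEN SIGN, three times
const surrogate = '\ud80c\udc80\ud80c\udc80\ud80c\udc80'; // U+13080 as UTF-16 surrogate pairs, three times

// Hypothetical helper: counts code points by skipping the low half of each surrogate pair.
// Assumes well-formed input (every high surrogate is followed by a low surrogate).
function countCodePoints(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff && i + 1 < s.length) {
      i++; // consume the low surrogate
    }
    count++;
  }
  return count;
}

console.log(combining.length, countCodePoints(combining)); // 6 6
console.log(fullwidth.length, countCodePoints(fullwidth)); // 3 3
console.log(surrogate.length, countCodePoints(surrogate)); // 6 3
```

On screen these strings are expected to take 3, 6 and 3 cells respectively, so neither count matches the cell count for all three types.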

Since I had a similar problem with the linkifier, it might be fixable the same way (#1678).

area/selection · good first issue · help wanted · type/bug

jerch

All 27 comments

Seems to work fine for me on mac/master, let me know if you still see it.

Tyriar on 17 Sep 2018

@Tyriar Nope, it's not gone, still the same here. Maybe it's a platform issue?

Looks like this atm:
[screenshot]
[screenshot]

Looks like the accent char is counted as 2 halfwidth chars by the selection, while the ￥ symbol gets treated as one halfwidth char.

Found this in the code:
https://github.com/xtermjs/xterm.js/blob/9e446a9a0e9f62899c450b28d78877b81a19724d/src/addons/search/SearchHelper.ts#L210

Imho the last argument should be the sum of wcwidth instead of the string length (not tested yet).
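A rough sketch of what "sum of wcwidth instead of the string length" means. The width lookup below is a toy stand-in covering only a few ranges, not the real wcwidth table, and both function names are hypothetical.

```typescript
// Hypothetical, heavily simplified width lookup. NOT the real wcwidth table.
function toyWcwidth(codePoint: number): 0 | 1 | 2 {
  if (codePoint >= 0x0300 && codePoint <= 0x036f) return 0; // combining diacritical marks
  if (codePoint >= 0xff01 && codePoint <= 0xff60) return 2; // fullwidth ASCII variants
  if (codePoint >= 0xffe0 && codePoint <= 0xffe6) return 2; // fullwidth signs, e.g. U+FFE5
  return 1;
}

// Sums the cell width of every code point in the string.
// Assumes well-formed UTF-16 input.
function cellWidth(s: string): number {
  let width = 0;
  for (let i = 0; i < s.length; i++) {
    let code = s.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff && i + 1 < s.length) {
      // Combine the surrogate pair into a single code point.
      const low = s.charCodeAt(++i);
      code = (code - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
    }
    width += toyWcwidth(code);
  }
  return width;
}

console.log(cellWidth('e\u0301e\u0301e\u0301'));                // 3 cells, string length is 6
console.log(cellWidth('\uffe5\uffe5\uffe5'));                   // 6 cells, string length is 3
console.log(cellWidth('\ud80c\udc80\ud80c\udc80\ud80c\udc80')); // 3 cells, string length is 6
```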

jerch on 24 Sep 2018

@jerch are you on Linux?

Tyriar on 24 Sep 2018

Yes Ubuntu 16 here.

jerch on 24 Sep 2018

I guess we need to have a setting for this stuff like you were suggesting before. Still not sure of the best way to query the platform for these character widths though. I doubt we can rely on all Linux distros being the same and macOS being a different case.

Tyriar on 24 Sep 2018

Same underlying issue as https://github.com/xtermjs/xterm.js/issues/1059?

Tyriar on 24 Sep 2018

Nope, this time it's not wcwidth's fault, changing the argument I mentioned above fixes the problems (tested a few minutes ago).

jerch on 24 Sep 2018

👍1

Currently blocked by #1707 and #1709.

jerch on 26 Sep 2018

Some background on this:
The way the start and end positions of the selection are determined still does not work for all combinations of surrogate and fullwidth chars. If any of those appear in the line before the match, or in the match itself, the start and end positions can be offset.

This can be fixed the same way I had to fix the linkifier underlining in #1769, by mapping a string index back to the buffer index:
https://github.com/xtermjs/xterm.js/blob/c7fa89da8e97e907cdfb72b23eabf5c3a5d1bb9e/src/Linkifier.ts#L223

If done twice (match start and end) the selection will correctly point to the underlying cells.
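A minimal sketch of that back-mapping on a toy cell model. `ToyCell` and `stringIndexToColumn` are hypothetical names, not the Linkifier code linked above. The sketch assumes every content cell holds at least one char and that a fullwidth char is followed by a spacer cell of width 0 with no content.

```typescript
// Toy cell model: a fullwidth char is a width-2 cell followed by an empty width-0 spacer cell.
interface ToyCell {
  chars: string;
  width: number;
}

// Maps an index into the line's string back to the column of the cell containing it.
// Returns cells.length when the index is at or past the end of the string.
function stringIndexToColumn(cells: ToyCell[], stringIndex: number): number {
  let consumed = 0;
  for (let col = 0; col < cells.length; col++) {
    if (cells[col].width === 0) continue; // spacer cell contributes nothing to the string
    const next = consumed + cells[col].chars.length;
    if (stringIndex < next) return col;
    consumed = next;
  }
  return cells.length;
}

// Line content: x, e + combining accent, fullwidth yen, U+13080, y
const cells: ToyCell[] = [
  { chars: 'x', width: 1 },
  { chars: 'e\u0301', width: 1 },
  { chars: '\uffe5', width: 2 },
  { chars: '', width: 0 },
  { chars: '\ud80c\udc80', width: 1 },
  { chars: 'y', width: 1 }
];
const line = cells.map(c => c.chars).join('');
const term = '\uffe5\ud80c\udc80';

const startIndex = line.indexOf(term);
const startCol = stringIndexToColumn(cells, startIndex);             // mapping done once for the start
const endCol = stringIndexToColumn(cells, startIndex + term.length); // and once for the (exclusive) end

console.log(startIndex, startCol, endCol); // 3 2 5
```

The naive approach would select columns 3 to 6 here (string index 3, string length 3), while the mapped range is columns 2 to 5.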

jerch on 16 Dec 2018

This still happens. This is the problem line:

https://github.com/xtermjs/xterm.js/blob/cad9477eef22c0e505337c1f634f9f35a8804edc/addons/xterm-addon-search/src/SearchAddon.ts#L345

result.term.length for ééé is 6. The fix likely involves returning an end row and col from _findInLine instead of the actual term.

Tyriar on 7 Oct 2019

Hello, I would like to join my peer @miggs125 in contributing to xterm by tackling this issue.

I will first attempt to improve selection of strings that include diacritical marks.

Silvyre on 28 Oct 2019

@Silvyre Sure thing. Note that the terminal buffer already accounts diacritical characters into one cell with the main character, thus the issue comes from the string position to cell back-mapping.

jerch on 29 Oct 2019

👍1

@jerch Thanks!

> Note that the terminal buffer already accounts diacritical characters into one cell with the main character

Are you referring to the JoinedCellData type? As far as I can tell, this base type is not currently used within search selection (search selection appears to handle buffer cells as objects of IBufferCell type, which is not part of the ICellData hierarchy).

Silvyre on 30 Oct 2019

> changing the argument I mentioned above fixes the problems (tested a few minutes ago)

Starting to get on the same page. OK, modifying _findInLine to return getStringCellWidth(term) instead of term appears to improve the selection of diacritical characters, e.g. ééé (at least on Ubuntu 18.04; getStringCellWidth() calls wcwidth(), which may perform differently on other platforms?).

I can't imagine this to be a satisfactory solution, considering that, as you mentioned, this does not work for all surrogate/fullwidth character combinations [across various platforms] (e.g. selection of ￥ is still not great, at least on Ubuntu 18.04).

Silvyre on 30 Oct 2019

> selection of ￥ is still not great, at least on Ubuntu 18.04

To clarify, it sometimes works, as shown in this GIF, which I created after replacing every instance of line/term/cell.length with getStringCellWidth(...) in SearchAddon.ts. I'm going to try to tweak the find functions a bit more and see if I can improve behaviour that way.

Silvyre on 30 Oct 2019

@Silvyre Yes working with wcwidth correction is the right way to go here. Imho needed once for the search term itself (in case it contains weird chars) to get the amount of cells taken ("cell length"), then you'd need to correct every start offset found likewise to find the real cell offset. That cell-offset + term-cell-length % cols should give the real start and end position in the buffer.
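The arithmetic could look roughly like this. It is a sketch under the assumption that the cell offset is counted from column 0 of the first row of the (possibly wrapped) line. `CellPosition`, `toPosition` and `matchRange` are hypothetical names, and the end position is exclusive.

```typescript
interface CellPosition {
  row: number;
  col: number;
}

// Converts a cell offset, counted from column 0 of the first row of a wrapped line,
// into a row/col position for a terminal that is `cols` cells wide.
function toPosition(firstRow: number, cellOffset: number, cols: number): CellPosition {
  return {
    row: firstRow + Math.floor(cellOffset / cols),
    col: cellOffset % cols
  };
}

// Start and exclusive end of a match, given its start cell offset and the term's cell width.
function matchRange(
  firstRow: number,
  startCellOffset: number,
  termCellWidth: number,
  cols: number
): { start: CellPosition; end: CellPosition } {
  return {
    start: toPosition(firstRow, startCellOffset, cols),
    end: toPosition(firstRow, startCellOffset + termCellWidth, cols)
  };
}

// A 6-cell term starting at cell 78 of a line that begins on row 5, in an 80-column terminal.
const range = matchRange(5, 78, 6, 80);
console.log(range.start, range.end); // start is row 5, col 78; end is row 6, col 4
```

Pure arithmetic like this does not model a wide character that does not fit into the last column and gets moved to the next row, so a real implementation may still need to walk the cells.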

jerch on 30 Oct 2019

👍1

@jerch Excellent, I'll work on that. Thanks again!

Silvyre on 30 Oct 2019

I have a general question regarding addons and dependencies: how are helper functions in src/common (e.g. getStringCellWidth, wcwidth from CharWidth.ts) imported into addons (e.g. addons/xterm-addon-search)?

Silvyre on 30 Oct 2019

@Silvyre They aren't yet, the public API gets extended on request. Thus you'd have to go with internal refs for now. Maybe open an issue regarding this so we can decide how and where to put it.

jerch on 30 Oct 2019

Sure thing, I'll open an issue.

Silvyre on 30 Oct 2019

> you'd need to correct every start offset found likewise to find the real cell offset

@jerch I'm having a bit of a difficult time determining how and where cell offsetting should be (or is) implemented. Within BufferLine.ts?

Silvyre on 31 Oct 2019

I've also noticed that selectionEnd appears to spend most of its time undefined, while finalSelectionEnd gets defined. A related bug, maybe?

Silvyre on 31 Oct 2019

Maybe, it's meant to be undefined for various types of selection if I remember right though (word, line, select all).

Tyriar on 31 Oct 2019

👍1

> @jerch I'm having a bit of a difficult time determining how and where cell offsetting should be (or is) implemented. Within BufferLine.ts?

Ah yep, that's a bit hidden in the codebase. The code for this is in Buffer.ts and BufferLine.ts; both contain several methods that demonstrate how to walk cells. The easiest starting point might be this: https://github.com/xtermjs/xterm.js/blob/e8153d929d6bb4f7012d3f20aa8c74abc335715d/src/common/buffer/Buffer.ts#L480

Not sure if you can use this method directly. You have to take care where your string index origin is (whether col 0 of wrapped or unwrapped lines).

jerch on 31 Oct 2019

👍1

I'm using VSCode (1.50.0) on macOS Catalina (10.15.6) and this issue is still happening.

JasinYip on 15 Oct 2020

@Tyriar Hi, does the issue have any solution yet? I tried loading xterm-addon-unicode11, but it only fixes the display of emoji chars; searching for Chinese chars still has the issue.

JasinYip on 18 Jan 2021

Been a while since I looked at this code but I think we could expose the active IUnicodeVersionProvider's wcwidth to extensions via IUnicodeHandling.activeProvider.wcwidth or similar to solve this.
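A sketch of how an addon could use such a provider if it were exposed. `WidthProvider` is a local stand-in with a `wcwidth(codepoint)` method returning 0, 1 or 2, redeclared here so the example is self-contained. The two toy providers and `stringCellWidth` are hypothetical, and no `activeProvider` property is assumed to exist in the current API.

```typescript
// Local stand-in for a unicode version provider. Redeclared here, not imported from xterm.js.
interface WidthProvider {
  readonly version: string;
  wcwidth(codepoint: number): 0 | 1 | 2;
}

// Sums cell widths using whatever provider the terminal has active,
// so search and renderer would agree on widths. Assumes well-formed UTF-16 input.
function stringCellWidth(provider: WidthProvider, s: string): number {
  let width = 0;
  for (let i = 0; i < s.length; i++) {
    let code = s.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff && i + 1 < s.length) {
      const low = s.charCodeAt(++i);
      code = (code - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
    }
    width += provider.wcwidth(code);
  }
  return width;
}

// Two toy providers that disagree on a single code point (U+1F600).
const narrowProvider: WidthProvider = {
  version: 'toy-narrow',
  wcwidth: () => 1
};
const wideProvider: WidthProvider = {
  version: 'toy-wide',
  wcwidth: (cp: number) => (cp === 0x1f600 ? 2 : 1)
};

const text = 'a\ud83d\ude00b'; // 'a', U+1F600, 'b'
console.log(stringCellWidth(narrowProvider, text)); // 3
console.log(stringCellWidth(wideProvider, text));   // 4
```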

Tyriar on 19 Jan 2021
