Combining, surrogate or fullwidth chars in the line and/or the search string lead to weird selection offset problems. Steps to repro:
echo -en 'combining: ééé\nfullwidth: ¥¥¥\nsurrogate: 𓂀𓂀𓂀\n'
The selection is off for all three types, and it gets even worse if the line contains any of these characters earlier, before the match. It seems the renderer and the selection manager do not agree on character widths and lengths.
Since I had a similar problem with the linkifier, it might be fixable the same way (#1678).
Seems to work fine for me on mac/master, let me know if you still see it.
@Tyriar Nope, it's not gone, still the same here. Maybe it's a platform issue?
Looks like this atm:


Looks like the accented char is counted as two halfwidth chars by the selector, while the ¥ symbol gets treated as one halfwidth char.
Found this in the code:
https://github.com/xtermjs/xterm.js/blob/9e446a9a0e9f62899c450b28d78877b81a19724d/src/addons/search/SearchHelper.ts#L210
Imho the last argument should be the sum of wcwidth instead of the string length (not tested yet).
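To illustrate why string length and cell width diverge, here is a minimal sketch. `approxWidth` is a hypothetical, heavily simplified stand-in for wcwidth (it only knows a few ranges and is not the table xterm.js uses), and `cellWidth` is an illustrative helper name, not an existing function.

```typescript
// Hypothetical, simplified stand-in for wcwidth. Real tables cover far more ranges.
function approxWidth(codePoint: number): 0 | 1 | 2 {
  // Combining Diacritical Marks take no cell of their own.
  if (codePoint >= 0x0300 && codePoint <= 0x036f) return 0;
  // A few wide ranges: CJK Unified Ideographs and Fullwidth Forms.
  if (
    (codePoint >= 0x4e00 && codePoint <= 0x9fff) ||
    (codePoint >= 0xff01 && codePoint <= 0xff60) ||
    (codePoint >= 0xffe0 && codePoint <= 0xffe6)
  ) {
    return 2;
  }
  return 1;
}

// Sum the width of every code point (not every UTF-16 code unit).
function cellWidth(s: string): number {
  let width = 0;
  for (let i = 0; i < s.length; ) {
    const cp = s.codePointAt(i)!;
    width += approxWidth(cp);
    // Code points above U+FFFF occupy two code units (a surrogate pair).
    i += cp > 0xffff ? 2 : 1;
  }
  return width;
}

const combining = 'e\u0301'.repeat(3);     // e + combining acute, three times
const fullwidth = '\uFFE5'.repeat(3);      // fullwidth yen sign, three times
const surrogate = '\uD80C\uDC80'.repeat(3); // U+13080 as a surrogate pair, three times

console.log(combining.length, cellWidth(combining)); // 6 3
console.log(fullwidth.length, cellWidth(fullwidth)); // 3 6
console.log(surrogate.length, cellWidth(surrogate)); // 6 3
```

In all three cases `length` differs from the number of cells, which is consistent with the selection being too long for combining and surrogate chars and too short for fullwidth ones.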
@jerch are you on Linux?
Yes Ubuntu 16 here.
I guess we need to have a setting for this stuff like you were suggesting before. Still not sure of the best way to query the platform for these character widths though; I doubt we can rely on all Linux distros being the same and macOS being a different case.
Same underlying issue as https://github.com/xtermjs/xterm.js/issues/1059?
Nope, this time it's not wcwidth's fault. Changing the argument I mentioned above fixes the problems (tested a few minutes ago).
Currently blocked by #1707 and #1709.
Some background on this:
The way the start and end positions of the selection are determined still does not work for all combinations of surrogates and fullwidth chars. If any of those appear in the line before the match or in the match itself, the start and end offsets can be wrong.
This can be fixed the same way I had to fix the linkifier underlining in #1769, by mapping a string index back to the buffer index:
https://github.com/xtermjs/xterm.js/blob/c7fa89da8e97e907cdfb72b23eabf5c3a5d1bb9e/src/Linkifier.ts#L223
If done twice (match start and end) the selection will correctly point to the underlying cells.
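A rough sketch of that back-mapping, using a toy cell model rather than the real buffer classes. The `Cell` shape and `stringIndexToCol` are hypothetical names for illustration; the sketch assumes the searched string is the concatenation of each cell's chars, and that a fullwidth char is followed by an empty width-0 cell.

```typescript
// Toy cell: `chars` holds the base char plus any combining marks.
interface Cell {
  chars: string;
  width: number;
}

// Map a UTF-16 string index to the column of the cell that contains it.
// Returns cells.length when the index points past the last char.
function stringIndexToCol(cells: Cell[], index: number): number {
  let consumed = 0; // code units covered by the cells visited so far
  for (let col = 0; col < cells.length; col++) {
    const len = cells[col].chars.length;
    if (len > 0 && index < consumed + len) {
      return col;
    }
    consumed += len;
  }
  return cells.length;
}

const cells: Cell[] = [
  { chars: 'e\u0301', width: 1 },      // combining: 2 code units, 1 cell
  { chars: '\uFFE5', width: 2 },       // fullwidth: 1 code unit, 2 cells
  { chars: '', width: 0 },             // trailing half of the fullwidth char
  { chars: '\uD80C\uDC80', width: 1 }, // surrogate pair: 2 code units, 1 cell
  { chars: 'a', width: 1 },
  { chars: 'b', width: 1 },
];

const line = cells.map(c => c.chars).join('');
const term = 'ab';
const startIndex = line.indexOf(term);                                // 5
const startCol = stringIndexToCol(cells, startIndex);                 // 4
const endCol = stringIndexToCol(cells, startIndex + term.length);     // 6 (exclusive)
console.log(startIndex, startCol, endCol);
```

Here the raw string index (5) is one column off from the real start column (4), which matches the kind of offset described above. Mapping both the start and the end index keeps matches that themselves contain wide or combining chars correct as well.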
This still happens. This is the problem line:
result.term.length for ééé is 6; the fix likely involves returning an end row and col from _findInLine instead of the actual term.
Hello, I would like to join my peer @miggs125 in contributing to xterm by tackling this issue.
I will first attempt to improve selection of strings that include diacritical marks.
@Silvyre Sure thing. Note that the terminal buffer already stores diacritical characters in one cell together with the main character, thus the issue comes from mapping the string position back to a cell.
@jerch Thanks!
> Note that the terminal buffer already accounts diacritical characters into one cell with the main character
Are you referring to the JoinedCellData type? As far as I can tell, this base type is not currently used within search selection (search selection appears to handle buffer cells as objects of IBufferCell type, which is not part of the ICellData hierarchy).
> changing the argument I mentioned above fixes the problems (tested a few minutes ago)
Starting to get on the same page. OK, modifying _findInLine to return getStringCellWidth(term) instead of term appears to improve the selection of diacritical characters, e.g. ééé (at least on Ubuntu 18.04; getStringCellWidth() calls wcwidth(), which may perform differently on other platforms?).
I can't imagine this to be a satisfactory solution, considering that, as you mentioned, this does not work for all surrogate/fullwidth character combinations [across various platforms] (e.g. selection of ¥ is still not great, at least on Ubuntu 18.04).
> selection of ¥ is still not great, at least on Ubuntu 18.04
To clarify, it sometimes works, as shown in this GIF, which I created after replacing every instance of line/term/cell.length with getStringCellWidth(...) in SearchAddon.ts. I'm going to try to tweak the find functions a bit more and see if I can improve behaviour that way.
@Silvyre Yes, working with wcwidth correction is the right way to go here. Imho it's needed once for the search term itself (in case it contains weird chars) to get the number of cells taken ("cell length"); then you'd need to correct every start offset found likewise to find the real cell offset. The cell offset, and the cell offset plus the term's cell length, each taken modulo cols, should then give the real start and end positions in the buffer.
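The arithmetic suggested here can be sketched as follows. `offsetToPos`, `selectionRange` and `firstRow` are hypothetical names; the sketch assumes the cell offset is counted from column 0 of the first row of the (possibly wrapped) line, and that integer division by `cols` gives the row. It ignores the case where a fullwidth char is pushed to the next row at the wrap boundary, which would need extra handling.

```typescript
interface BufferPos {
  row: number;
  col: number;
}

// Convert a cell offset within a wrapped line into a row/col position.
function offsetToPos(cellOffset: number, cols: number, firstRow = 0): BufferPos {
  return {
    row: firstRow + Math.floor(cellOffset / cols),
    col: cellOffset % cols,
  };
}

// Start is inclusive, end is exclusive (one past the last selected cell).
function selectionRange(
  startCellOffset: number,
  termCellLength: number,
  cols: number,
  firstRow = 0
): { start: BufferPos; end: BufferPos } {
  return {
    start: offsetToPos(startCellOffset, cols, firstRow),
    end: offsetToPos(startCellOffset + termCellLength, cols, firstRow),
  };
}

// A 6-cell-wide match starting at cell 78 of a line that begins on row 10, with 80 columns.
const range = selectionRange(78, 6, 80, 10);
console.log(range.start); // { row: 10, col: 78 }
console.log(range.end);   // { row: 11, col: 4 }
```

Both inputs are in cells, not string indices, so the term's cell length and the corrected start offset have to be computed first.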
@jerch Excellent, I'll work on that. Thanks again!
I have a general question regarding addons and dependencies: how are helper functions in src/common (e.g. getStringCellWidth, wcwidth from CharWidth.ts) imported into addons (e.g. addons/xterm-addon-search)?
@Silvyre They aren't yet; the public API gets extended on request. Thus you'd have to go with internal refs for now. Maybe open an issue regarding this so we can decide how and where to put it.
Sure thing, I'll open an issue.
> you'd need to correct every start offset found likewise to find the real cell offset
@jerch I'm having a bit of a difficult time determining how and where cell offsetting should be (or is) implemented. Within BufferLine.ts?
I've also noticed that selectionEnd appears to spend most of its time undefined, while finalSelectionEnd gets defined. A related bug, maybe?
Maybe, it's meant to be undefined for various types of selection if I remember right though (word, line, select all).
> @jerch I'm having a bit of a difficult time determining how and where cell offsetting should be (or is) implemented. Within BufferLine.ts?
Ah yep, that's a bit hidden in the codebase. The code regarding this is in Buffer.ts and BufferLine.ts; both contain several methods that demonstrate how to walk cells. The easiest starting point might be this: https://github.com/xtermjs/xterm.js/blob/e8153d929d6bb4f7012d3f20aa8c74abc335715d/src/common/buffer/Buffer.ts#L480
Not sure if you can directly use this method, you have to take care where your string index origin is (whether col 0 of wrapped or unwrapped lines).

I'm using VSCode (1.50.0) on macOS Catalina (10.15.6) and this issue is still happening.
@Tyriar Hi, does this issue have a solution yet? I tried loading xterm-addon-unicode11, but it only fixes the display of emoji chars; searching for Chinese chars still has the issue.
Been a while since I looked at this code but I think we could expose the active IUnicodeVersionProvider's wcwidth to extensions via IUnicodeHandling.activeProvider.wcwidth or similar to solve this.