Notepad-plus-plus: [Feature Request] Make ZERO WIDTH SPACE have some visual component

Created on 19 May 2020 · 51 Comments · Source: notepad-plus-plus/notepad-plus-plus

Try sorting this:

https://pbs.twimg.com/media/
https://twitter.com/
​https://pbs.twimg.com/media/

Somehow, the last URL is positioned wrong.

Wait... there is a zero-width character on the third line (before the h in https), which is "​" (select to see that character). I think NP++'s show symbol should also show such characters.
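
For readers who want to reproduce the effect outside the editor, here is a minimal Python sketch. It assumes a plain code-point ordering, which may differ in detail from how Notepad++ sorts lines; the list simply mirrors the three lines from the report.

```python
# The three lines from the report; the third starts with U+200B (ZERO WIDTH SPACE).
lines = [
    "https://pbs.twimg.com/media/",
    "https://twitter.com/",
    "\u200bhttps://pbs.twimg.com/media/",
]

# Python compares strings code point by code point, so U+200B (8203)
# ranks after "h" (104) and the line carrying it sorts last.
sorted_lines = sorted(lines)

for line in sorted_lines:
    # repr() makes the otherwise invisible character show up as \u200b.
    print(repr(line))
```

The two visually identical pbs.twimg.com lines end up separated because the hidden character takes part in the comparison.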

accepted scintilla dependent

All 51 comments

The character is U+200B (Unicode name: ZERO WIDTH SPACE).

I think NP++'s show symbol should also show such characters.

What's your opinion on what it should look like?

A yellow line, much like when you make it show spaces and tabs.

@GhbSmwc I took the liberty of renaming your issue; hopefully that is okay with you. :-)

This is going to depend upon Scintilla.

This was "addressed" by the Scintilla project back in 2013: https://sourceforge.net/p/scintilla/feature-requests/980/

Their response was to add an API so that you can represent any character you want, "specially":
https://www.scintilla.org/ScintillaDoc.html#SCI_SETREPRESENTATION

The representation mechanism is the "white text on black box" variety, much like line-endings look when visually enabled in Notepad++:

image

Here's an example of what it would look like to set the representation to "" (an empty string), the narrowest possible thing, which is harmonious with a "zero-width" character:

image

I suppose it is "workable" but I'd say it isn't ideal.

This request has come up more than once if you look back through the Issues history. I encountered this problem when "screen scraping" data out of certain HTML pages.

If this is not taken up, then maybe add a caution to the NPP interface to say that the option "View-> Show Symbol -> Show All Characters" does NOT show characters such as ZeroWidthSpace and friends.

The only way to detect a Zero Width Space is to move the cursor along a line of text while watching the character position number in the status bar.
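
As an alternative to stepping the cursor by hand, a small script can report where such characters sit. This is only a sketch: the function name `find_format_chars` is made up for illustration, and it assumes that Unicode general category "Cf" (format characters) is a reasonable stand-in for "invisible". That category covers U+200B but not, for example, the non-breaking space.

```python
import unicodedata


def find_format_chars(text):
    """Yield (line, column, code point, name) for each category-Cf character."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for column, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                yield (line_no, column, "U+%04X" % ord(ch),
                       unicodedata.name(ch, "<unnamed>"))


sample = "https://twitter.com/\n\u200bhttps://pbs.twimg.com/media/"
hits = list(find_format_chars(sample))

for hit in hits:
    print(hit)
```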

@Daksol

This request has come up more than once

And note that it remains an "open" request

"View-> Show Symbol -> Show All Characters" does NOT show characters such as ZeroWidthSpace and friends.

The "Show All Characters" verbiage is historical and comes from a time when "space", "tab" and "end-of-line" characters were the only things that weren't shown. Today that menu item might be "Show space, tab, CR, LF".

The author of N++ has declined my proposal for making such things as you mention visible as a core Notepad++ optional feature, but has kept the door somewhat open by this: https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5062#issuecomment-699563782

You can have these characters visible RIGHT NOW if you are willing to use a Pythonscript to control them.

@sasumner

The author of N++ has declined my proposal for making such things as you mention visible as a core Notepad++ optional feature, but has kept the door somewhat open by this: #5062 (comment)

The issue https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5062 talked about the BOM, which is visible, but removed by Notepad++, so I won't reopen it since it's not an issue. OTOH, I agree the invisible Unicode character is an issue.
So let's add Show Zero Width Non Breaking Space menu command between Show End Of Line & Show All Characters, and make the Show All Characters command also show ZWNBS chars.

@donho

Correct about BOM. BOM should never be visible, except I guess HexEditor plugin? (Not our problem!)

5062 was also used as a springboard for the general discussion of invisible characters.

let's add Show Zero Width Non Breaking Space menu command

But...there are many more invisible UTF-8 characters than that, that users could want.
Hmmm...
Let me come up with a proposal for how to best handle this?

Correct about BOM. BOM should never be visible, except I guess HexEditor plugin? (Not our problem!)

Yes, HexEditor does some dirty hacking to show these 3 bytes.

But...there are many more invisible UTF-8 characters than that, that users could want.

Then let's add a "Show all invisible characters" menu command. I don't think users care about which kind of invisible character is being shown.

Let me come up with a proposal for how to best handle this?

OK. Please keep it simple & stupid.

@sasumner. Thanks for getting this back on the program, so to speak.

Some thoughts on what would be best way of handling hidden characters - this derived from the usecases I was encountering when I first discovered zero width space and friends.

My real-world interaction with hidden chars has often been when troubleshooting web pages, URLs et al. What I have often found myself doing is putting a string into (say) Excel and adding formulae to show the Unicode code for each item. That then shows up the non-breaking spaces, zero-width spaces etc. Not very convenient, to say the least.
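
The spreadsheet approach described above can be approximated in a few lines of Python. This is a rough sketch and the helper name `code_points` is hypothetical; it just lists the code point of every character so that anything unexpected stands out.

```python
def code_points(text):
    """Return the U+XXXX label of every character in text, in order."""
    return ["U+%04X" % ord(ch) for ch in text]


# A string with a non-breaking space (U+00A0) and a zero width space (U+200B)
# hidden between ordinary letters.
labels = code_points("a\u00a0b\u200bc")
print(labels)
```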

  • For the most common hidden, unprintable and whitespace characters
    • Examples: CR, LF, Tab, Space as at present (suggest adding NonBreakingSpace here)
    • With Show-All-Chars:
      • show as descriptive block, exactly what happens in Notepad++ currently
  • For other unprintable chars
    • Examples: Zero width space
    • With Show-All-Chars:
      • either show as a dark-shaded rectangular block, and with onmouseover show the hex value
      • or as a highlighted group of characters including the Unicode value, e.g. _%200B_

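The second option in the list above (showing the Unicode value in place of the character) could look roughly like this when applied to plain text. It is only an illustration of the idea, not how Notepad++ or Scintilla draws anything; the `[U+XXXX]` tag format and the function name `reveal` are assumptions made for this sketch.

```python
import unicodedata


def reveal(text):
    """Replace each category-Cf (format) character with a visible [U+XXXX] tag."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            out.append("[U+%04X]" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


revealed = reveal("\u200bhttps://pbs.twimg.com/media/")
print(revealed)
```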

@Daksol said

Some thoughts on what would be best way of handling hidden characters...

Notepad++ only has limited control over the display; the display is primarily controlled by Scintilla.
The only possible thing that N++ can do with these characters is what is shown with CR and LF in my previous posting HERE.
So any such characters would have to look like those.

@sasumner But the indicator for these not_visible_chars must be the orange dot (same as for a standard space)? I would prefer to somehow distinguish a normal space from the other characters.

@ArkadiuszMichalski said:

must be the orange dot

Scintilla controls the "orange dot" behavior, but only for the U+0020 character.
The best Notepad++ can do for making any new characters visible is like what it does for CR and LF.
In the default theme this means whiteish characters on a black background.

So here's a sampling of what could be done, stolen from a forgotten discussion thread on the Community site:

image

OK thx, looks good, at least for me.

Here's the complete list as of now, that I plan to implement:

image

For some of these, I just made up an abbreviation so that the representation, the white letters on the black background, doesn't become too big/wide. I don't know if there is a standard abbreviation...does anyone?

Comments are most welcome.

More "simple and stupid" spec'cing out of this feature to come...

@sasumner
Are they (in the list above) all not displayed currently in Notepad++ ?

Are they (in the list above) all not displayed currently in Notepad++ ?

Correct.
I do not know if the list is complete, but all of those currently in the above list are currently invisible in N++.
Unless someone says "You forgot about 'foo' !" we can start with the above list.

@donho

Here's the file the above image was made from, if you want to "see" how N++ displays the invisible characters:

invisible_utf8_chars.txt

BTW, here is what is displayed for the invisible_utf8_chars.txt in Windows' notepad.exe :

image

@sasumner
For me FF, NEL, LS & PS are displayed:

image

My debug info:

Notepad++ v7.9.1   (32-bit)
Build time : Nov  2 2020 - 01:03:56
Path : C:\Program Files (x86)\Notepad++\notepad++.exe
Admin mode : OFF
Local Conf mode : OFF
OS Name : Windows 10 Enterprise (64-bit) 
OS Version : 2004
OS Build : 19041.630
Current ANSI codepage : 1252
Plugins : DSpellCheck.dll HexEditor.dll mimeTools.dll NppConverter.dll NppExport.dll NppXmlTreeviewPlugin.dll 

Notice U+202E.
There may be some additional effort for Notepad++ (Scintilla) to actually respect something like the "right-to-left override" and actually display sections of a file in RTL form.

Here's what the Scintilla demo editor SciTE/Sc1 (v.4.3.0) shows for the invisible_utf8_chars.txt file:

image

@donho said:

For me FF, NEL, LS & PS are displayed

I can confirm that -- Oops, my error on those.
We can drop them from the list.

Notice U+202E.

Disregard what I said earlier about this.
It appears to already be working (reference @donho 's earlier screenshot -- the line with U+202E is in RTL).
I'm not sure yet why it isn't showing RTL for me, or in Scintilla demo editor!

@sasumner

For some of these, I just made up an abbreviation so that the representation, the white letters on the black background, doesn't become too big/wide

So with one command, will all the invisible Unicode chars be shown as your screenshot (https://github.com/notepad-plus-plus/notepad-plus-plus/issues/8284#issuecomment-734023781) ?

Here is what I see in N++ 7.9.1 -- why don't I see the same as @donho for 2060, 2066 - 2069 ??

image

@donho said

So with one command, will all the invisible Unicode chars be shown as your screenshot

I didn't get to that part of the "simple & stupid" spec yet :-)
But yes, one "user" command.
As to Scintilla commands, it takes one SCI_SETREPRESENTATION call for each UTF-8 character you want to set a representation for.
Or, alternatively, one SCI_CLEARREPRESENTATION call per character to turn it off.
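
To make the "one call per character" point concrete, here is a hedged Python sketch. It does not talk to Scintilla: `set_representation` is a stand-in callable supplied by the caller (in a real script it would wrap whatever sends SCI_SETREPRESENTATION), and the table holds only a few code points picked for illustration, labelled with Unicode's own abbreviations rather than the ones in the screenshot above.

```python
# Illustrative table only: code point -> short label to draw in its place.
REPRESENTATIONS = {
    0x200B: "ZWSP",  # ZERO WIDTH SPACE
    0x200C: "ZWNJ",  # ZERO WIDTH NON-JOINER
    0x200D: "ZWJ",   # ZERO WIDTH JOINER
    0x200E: "LRM",   # LEFT-TO-RIGHT MARK
    0x200F: "RLM",   # RIGHT-TO-LEFT MARK
}


def apply_representations(set_representation, table=REPRESENTATIONS):
    """Call set_representation(encoded_char, label) once per table entry.

    The character is passed as its UTF-8 byte sequence, on the assumption
    that the document is UTF-8 encoded.
    """
    for code_point, label in sorted(table.items()):
        set_representation(chr(code_point).encode("utf-8"), label)


# Stand-in for the real editor call: just record what would be sent.
calls = []
apply_representations(lambda encoded, label: calls.append((encoded, label)))

for encoded, label in calls:
    print(encoded, label)
```

Turning the feature off would be the mirror image: one clearing call per entry in the same table.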

@donho

In case you can't tell, I really like the "simple & stupid" thing :-)

Here is what I see in N++ 7.9.1 -- why don't I see the same as @donho for 2060, 2066 - 2069 ??

I have tested on my other laptop; both give the same result. I guess you are under the OS older than mine ?

image

Notepad++ v7.9.1 (64-bit)
Build time : Nov 2 2020 - 01:07:46
Path : C:\Program Files\Notepad++\notepad++.exe
Admin mode : OFF
Local Conf mode : OFF
OS Name : Windows 10 Pro (64-bit)
OS Version : 2004
OS Build : 19041.630
Current ANSI codepage : 1252
Plugins : HexEditor.dll JSMinNPP.dll mimeTools.dll NppConverter.dll NppExport.dll

@donho says:

I guess you are under the OS older than mine ?

Slightly older:

OS Name : Windows 10 Enterprise (64-bit) 
OS Version : 1809
OS Build : 17763.1518
Current ANSI codepage : 1252

This introduces the complication of how well this entire feature works for all users of N++, under so many various versions of Windows...

I probably should do a larger survey of how other editors display this invisible_utf8_chars.txt file...

VS2019 text editor does absolutely nothing special in showing this file. It is like the invisible characters are not even there.

I got this:
image

@ArkadiuszMichalski @donho So there's 3 of us in our sampling, and all 3 get different results.
Maybe attempting to do something for this feature is just asking for trouble, what with all the variation?

Uploading a new version of the file, this time with a byte-order-mark, to ensure that editors it is tried upon know that they are opening a UTF-8 encoded file:

invisible_utf8_chars_2_w_bom.txt

@sasumner
I'm confused... now 2060, 2066, 2067, 2068 and 2069 in both files you provided don't display on my 1st laptop.

5 Unicode chars in both files you provided do display on my 2nd laptop.

With BOM I got this:
image

@ArkadiuszMichalski I see no changes in your two screenshots, except the lines that have been removed from the second file. This is not a problem, I'm just stating it for the record.

@donho said:

I'm confused... now 2060, 2066, 2067, 2068 and 2069 in both files you provided don't display on my 1st laptop.

5 Unicode chars in both files you provided do display on my 2nd laptop.

That confuses me as well. :-(

config.xml matters!

@sasumner
use this one to check:
config.zip

@donho said:

config.xml matters!

I presume that writeTechnologyEngine="0" in that config.xml is what is important?


So this N++ preference setting is going to impact how these characters appear:

image

To hopefully be very clear, this is how the invisible_utf8_chars_2_w_bom.txt file appears for me with "direct write" unticked:

image

And here's what it looks like for me with "direct write" ticked:

image

I'm not at all sure what this means for the future of this feature.

@donho How long is the "direct write" preference setting going to remain? Forever?
I turned it on and it has remained on forever for me.
No downside, only advantages.
To get good data on users that it wouldn't work well for, it should have been made to be on by default, and users could turn it off.
The way it was done, nobody notices it to turn it on and try.
Thus we get no feedback about users that it causes a problem for.

@sasumner

How long is the "direct write" preference setting going to remain? Forever?

I did read feedback that in some environments there is a performance issue when "direct write" is ON, so our experience may not be representative of the whole community. Furthermore, I don't want to sacrifice an existing option (which may matter for the performance issue) for a new feature which benefits fewer people.

The workaround for me is to just treat 2060, 2066, 2067, 2068 and 2069 as invisible chars, so there's no ambiguity whether "direct write" is ON or not. What do you think?

The workaround for me is to just treat 2060, 2066, 2067, 2068 and 2069 as invisible chars, so there's no ambiguity whether "direct write" is ON or not

Yes, that can work.
Or, we detect when direct-write is "on" and then don't include those in the list of characters to set a new representation for.

@sasumner

we detect when direct-write is "on" and then don't include those in the list of characters to set a new representation for.

Yes, it could work if Scintilla has a message for that. Notepad++'s "direct write" option in the Parameters class cannot be counted on, because a change to it is only applied in the next session.

if Scintilla has a message for that.

SCI_GETTECHNOLOGY
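
A sketch of the "detect direct write, then skip those characters" idea, written as plain Python so it can be tried without an editor. It assumes the caller has already obtained the value that SCI_GETTECHNOLOGY returns; the function name `code_points_to_represent` is hypothetical, and the rule (skip the five code points whenever the technology is anything other than the default) is one reading of the suggestion above, not a confirmed design.

```python
# Scintilla's value for the default (non-DirectWrite) drawing technology.
SC_TECHNOLOGY_DEFAULT = 0

# Code points that this thread found to render differently depending on
# whether "direct write" is enabled.
DIRECT_WRITE_SENSITIVE = {0x2060, 0x2066, 0x2067, 0x2068, 0x2069}


def code_points_to_represent(candidates, technology):
    """Return the sorted code points that should get a custom representation.

    technology is the value reported by SCI_GETTECHNOLOGY; any value other
    than SC_TECHNOLOGY_DEFAULT is treated here as "direct write is on".
    """
    if technology == SC_TECHNOLOGY_DEFAULT:
        return sorted(candidates)
    return sorted(cp for cp in candidates if cp not in DIRECT_WRITE_SENSITIVE)


candidates = {0x200B, 0x2060, 0x2066}
with_default = code_points_to_represent(candidates, SC_TECHNOLOGY_DEFAULT)
with_direct_write = code_points_to_represent(candidates, 1)
print(with_default, with_direct_write)
```

Querying the editor at the moment the feature is toggled sidesteps the next-session problem with the stored preference mentioned above.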
