Try sorting this:
https://pbs.twimg.com/media/
https://twitter.com/
​https://pbs.twimg.com/media/
Somehow, the last URL is positioned wrong.
Wait, there is a zero-width character on the third line (before the h in https), which is "​" (select to see that character). I think NP++'s show symbol should also show such characters.
The character is U+200B (Unicode name: ZERO WIDTH SPACE).
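The misplacement is what a plain code-point comparison would produce. Here is a minimal Python sketch of the effect; it only illustrates the ordering and is not a claim about how Notepad++ sorts lines internally:

```python
# The three lines from the example; the last one starts with U+200B.
urls = [
    "https://pbs.twimg.com/media/",
    "https://twitter.com/",
    "\u200bhttps://pbs.twimg.com/media/",
]

# Python compares strings code point by code point.
# U+200B (8203) is greater than "h" (104), so the prefixed line sorts last.
ordered = sorted(urls)

for url in ordered:
    print(repr(url))  # repr() makes the hidden character visible as \u200b
```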
I think NP++'s show symbol should also show such characters.
What's your opinion on what it should look like?
A yellow line, much like when you make it show spaces and tabs.
@GhbSmwc I took the liberty of renaming your issue; hopefully that is okay with you. :-)
This is going to depend upon Scintilla.
This was "addressed" by the Scintilla project back in 2013: https://sourceforge.net/p/scintilla/feature-requests/980/
Their response was to add an API so that you can represent any character you want, "specially":
https://www.scintilla.org/ScintillaDoc.html#SCI_SETREPRESENTATION
The representation mechanism is the "white text on black box" variety, much like line-endings look when visually enabled in Notepad++:
Here's an example of what it would look like to set the representation to "" (an empty string), the narrowest possible thing, which is harmonious with a "zero-width" character:
I suppose it is "workable" but I'd say it isn't ideal.
This request has come up more than once if you look back through the Issues history. I encountered this problem when "screen scraping" data out of certain HTML pages.
If this is not taken up, then maybe add a Caution to NPP interface to say that the option "View-> Show Symbol -> Show All Characters" does NOT show characters such as ZeroWidthSpace and friends.
The only way to detect a Zero Width Space is to move the cursor along a line of text while watching the character position number in the status bar.
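Outside the editor, the same check can be scripted. This is a rough Python sketch that reports the column of every character in Unicode category "Cf" (format characters, which includes U+200B); the function name is made up for illustration, and "Cf" is my assumption of a reasonable filter, not a list agreed on in this thread:

```python
import unicodedata


def find_format_chars(line):
    """Return (column, code point, name) for each 'Cf' character in line."""
    hits = []
    for column, ch in enumerate(line, start=1):
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((column, "U+%04X" % ord(ch), name))
    return hits


print(find_format_chars("\u200bhttps://pbs.twimg.com/media/"))
```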
Other issues raised covering the same topic
@Daksol
This request has come up more than once
And note that it remains an "open" request
"View-> Show Symbol -> Show All Characters" does NOT show characters such as ZeroWidthSpace and friends.
The "Show All Characters" verbiage is historical and comes from a time when "space", "tab" and "end-of-line" characters were the only things that weren't shown. Today that menu item might be "Show space, tab, CR, LF".
The author of N++ has declined my proposal for making such things as you mention visible as a core Notepad++ optional feature, but has kept the door somewhat open by this: https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5062#issuecomment-699563782
You can have these characters visible RIGHT NOW if you are willing to use a PythonScript script to control them.
@sasumner
The author of N++ has declined my proposal for making such things as you mention visible as a core Notepad++ optional feature, but has kept the door somewhat open by this: #5062 (comment)
The issue https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5062 talked about the BOM, which is visible, but removed by Notepad++, so I won't reopen it since it's not an issue. OTOH, I agree the invisible Unicode character is an issue.
So let's add a "Show Zero Width Non Breaking Space" menu command between "Show End Of Line" and "Show All Characters", and make the "Show All Characters" command also show ZWNBS chars.
@donho
Correct about BOM. BOM should never be visible, except I guess HexEditor plugin? (Not our problem!)
5062 was also used as a springboard for the general discussion of invisible characters.
let's add Show Zero Width Non Breaking Space menu command
But...there are many more invisible UTF-8 characters than that, that users could want.
Hmmm...
Let me come up with a proposal for how to best handle this?
Correct about BOM. BOM should never be visible, except I guess HexEditor plugin? (Not our problem!)
Yes, HexEditor does some dirty hacking to show these 3 bytes.
But...there are many more invisible UTF-8 characters than that, that users could want.
Then let's add "Show all invisible character" menu command - I don't think users care about showing which kind of invisible characters.
Let me come up with a proposal for how to best handle this?
OK. Please keep it simple & stupid.
@sasumner. Thanks for getting this back on the program, so to speak.
Some thoughts on what would be best way of handling hidden characters - this derived from the usecases I was encountering when I first discovered zero width space and friends.
My real world interaction with hidden chars has often been when troubleshooting web pages, urls et al. And what I have often found myself doing is putting a string into (say Excel) and adding formulae to show the Unicode code for each item. That then shows up the non-breaking spaces, zero-width spaces etc. Not very convenient to say the least.
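For what it's worth, the spreadsheet trick can be replaced by a few lines of Python that list every character with its code point. This is only a sketch of the idea, and the function name is hypothetical:

```python
import unicodedata


def describe(text):
    """List each character as 'U+XXXX NAME', visible or not."""
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# A no-break space hiding between two letters.
for entry in describe("a\u00a0b"):
    print(entry)
```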
@Daksol said
Some thoughts on what would be best way of handling hidden characters...
Notepad++ only has limited control over the display; the display is primarily controlled by Scintilla.
The only possible thing that N++ can do with these characters is what is shown with CR and LF in my previous posting HERE.
So any such characters would have to look like those.
@sasumner But must the indicator for these non-visible characters be the orange dot (same as for a standard space)? I would prefer to somehow distinguish a normal space from the other characters.
@ArkadiuszMichalski said:
must be the orange dot
Scintilla controls the "orange dot" behavior, but only for the U+0020 character.
The best Notepad++ can do for making any new characters visible is like what it does for CR and LF.
In the default theme this means whiteish characters on a black background.
So here's a sampling of what could be done, stolen from a forgotten discussion thread on the Community site:
OK thx, looks good, at least for me.
Here's the complete list as of now, that I plan to implement:
For some of these, I just made up an abbreviation so that the representation, the white letters on the black background, doesn't become too big/wide. I don't know if there is a standard abbreviation...does anyone?
Comments are most welcome.
More "simple and stupid" spec'cing out of this feature to come...
@sasumner
Are they (in the list above) all not displayed currently in Notepad++ ?
Are they (in the list above) all not displayed currently in Notepad++ ?
Correct.
I do not know if the list is complete, but all of those currently in the above list are currently invisible in N++.
Unless someone says "You forgot about 'foo' !" we can start with the above list.
@donho
Here's the file the above image was made from, if you want to "see" how N++ displays the invisible characters:
BTW, here is what is displayed for the invisible_utf8_chars.txt in Windows' notepad.exe :
@sasumner
For me FF, NEL, LS & PS are displayed:
My debug info:
Notepad++ v7.9.1 (32-bit)
Build time : Nov 2 2020 - 01:03:56
Path : C:\Program Files (x86)\Notepad++\notepad++.exe
Admin mode : OFF
Local Conf mode : OFF
OS Name : Windows 10 Enterprise (64-bit)
OS Version : 2004
OS Build : 19041.630
Current ANSI codepage : 1252
Plugins : DSpellCheck.dll HexEditor.dll mimeTools.dll NppConverter.dll NppExport.dll NppXmlTreeviewPlugin.dll
Notice U+202E.
There may be some additional effort for Notepad++ (Scintilla) to actually respect something like the "right-to-left override" and actually display sections of a file in RTL form.
Here's what the Scintilla demo editor SciTE/Sc1 (v.4.3.0) shows for the invisible_utf8_chars.txt file:
@donho said:
For me FF, NEL, LS & PS are displayed
I can confirm that -- Oops, my error on those.
We can drop them from the list.
Notice U+202E.
Disregard what I said earlier about this.
It appears to already be working (reference @donho 's earlier screenshot -- the line with U+202E is in RTL).
I'm not sure yet why it isn't showing RTL for me, or in Scintilla demo editor!
@sasumner
For some of these, I just made up an abbreviation so that the representation, the white letters on the black background, doesn't become too big/wide
So with one command, will all the invisible Unicode chars be shown as your screenshot (https://github.com/notepad-plus-plus/notepad-plus-plus/issues/8284#issuecomment-734023781) ?
Here is what I see in N++ 7.9.1 -- why don't I see the same as @donho for 2060, 2066 - 2069 ??
@donho said
So with one command, will all the invisible Unicode chars be shown as your screenshot
I didn't get to that part of the "simple & stupid" spec yet :-)
But yes, one "user" command.
As to Scintilla commands, it takes one SCI_SETREPRESENTATION call for each UTF-8 character you want to set a representation for.
Or, alternatively, one SCI_CLEARREPRESENTATION call to turn it off.
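To make the "one call per character" point concrete, here is a small Python sketch. The table is purely illustrative (four code points with labels I picked; it is not the list from the screenshot above), and `set_representation` is a caller-supplied function. In a PythonScript session one might pass `editor.setRepresentation`, but that binding and the argument type it expects are assumptions to check against the PythonScript documentation:

```python
# Illustrative only: code point -> short label to draw in the black box.
ILLUSTRATIVE_REPRESENTATIONS = {
    0x200B: "ZWSP",  # ZERO WIDTH SPACE
    0x200C: "ZWNJ",  # ZERO WIDTH NON-JOINER
    0x200D: "ZWJ",   # ZERO WIDTH JOINER
    0x2060: "WJ",    # WORD JOINER
}


def apply_representations(set_representation, table=ILLUSTRATIVE_REPRESENTATIONS):
    """Issue one call per character, passing its UTF-8 bytes and its label."""
    for code_point in sorted(table):
        encoded = chr(code_point).encode("utf-8")
        set_representation(encoded, table[code_point])


# Stand-in for the editor: record the calls instead of sending messages.
calls = []
apply_representations(lambda encoded, label: calls.append((encoded, label)))
print(calls)
```

Turning the feature off would be the mirror image: one clear call per entry in the same table.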
@donho
In case you can't tell, I really like the "simple & stupid" thing :-)
Here is what I see in N++ 7.9.1 -- why don't I see the same as @donho for 2060, 2066 - 2069 ??
I have tested on my other laptop; both give the same result. I guess you are on an OS older than mine?
Notepad++ v7.9.1 (64-bit)
Build time : Nov 2 2020 - 01:07:46
Path : C:\Program Files\Notepad++\notepad++.exe
Admin mode : OFF
Local Conf mode : OFF
OS Name : Windows 10 Pro (64-bit)
OS Version : 2004
OS Build : 19041.630
Current ANSI codepage : 1252
Plugins : HexEditor.dll JSMinNPP.dll mimeTools.dll NppConverter.dll NppExport.dll
@donho says:
I guess you are on an OS older than mine?
Slightly older:
OS Name : Windows 10 Enterprise (64-bit)
OS Version : 1809
OS Build : 17763.1518
Current ANSI codepage : 1252
This introduces the complication of how well this entire feature works for all users of N++, under so many various versions of Windows...
I probably should do a larger survey of how other editors display this invisible_utf8_chars.txt file...
VS2019 text editor does absolutely nothing special in showing this file. It is like the invisible characters are not even there.
I got this:
@ArkadiuszMichalski @donho So there's 3 of us in our sampling, and all 3 get different results.
Maybe attempting to do something for this feature is just asking for trouble, what with all the variation?
Uploading a new version of the file, this time with a byte-order-mark, to ensure that editors it is tried upon know that they are opening a UTF-8 encoded file:
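For anyone who wants to build a similar test file, this Python sketch shows what "with a byte-order-mark" means at the byte level. It relies on the standard `utf-8-sig` codec; the sample text and function name are made up:

```python
import codecs


def with_bom(text):
    """Encode text as UTF-8 with the 3-byte BOM (EF BB BF) in front."""
    return text.encode("utf-8-sig")


data = with_bom("U+200B:\u200b\n")
print(data[:3] == codecs.BOM_UTF8)      # the first three bytes are the BOM
print(repr(data.decode("utf-8-sig")))   # decoding strips the BOM again
```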
@sasumner
I'm confused... now 2060, 2066, 2067, 2068 and 2069 in both files you provided don't display on my 1st laptop.
5 Unicode chars in both files you provided do display on my 2nd laptop.
With BOM I got this:
@ArkadiuszMichalski I see no changes in your two screenshots, except the lines that have been removed from the second file. This is not a problem, I'm just stating it for the record.
@donho said:
I'm confused... now 2060, 2066, 2067, 2068 and 2069 in both files you provided don't display on my 1st laptop.
5 Unicode chars in both files you provided do display on my 2nd laptop.
That confuses me as well. :-(
config.xml matters!
@sasumner
use this one to check:
config.zip
@donho said:
config.xml matters!
I presume that writeTechnologyEngine="0" in that config.xml is what is important?
So this N++ preference setting is going to impact how these characters appear:
To hopefully be very clear, this is how the invisible_utf8_chars_2_w_bom.txt file appears for me with "direct write" unticked:
And here's what it looks like for me with "direct write" ticked:
I'm not at all sure what this means for the future of this feature.
@donho How long is the "direct write" preference setting going to remain? Forever?
I turned it on and it has remained on forever for me.
No downside, only advantages.
To get good data on users that it wouldn't work well for, it should have been made to be on by default, and users could turn it off.
The way it was done, nobody notices it to turn it on and try.
Thus we get no feedback about users that it causes a problem for.
@sasumner
How long is the "direct write" preference setting going to remain? Forever?
I did read feedback that in some environments there is a performance issue when "direct write" is ON. So our experience cannot be representative of the whole community. Furthermore, I don't want to sacrifice an existing option (which could matter for the performance issue) for a new feature which benefits fewer people.
The workaround for me is to just treat 2060, 2066, 2067, 2068 and 2069 as invisible chars so there's no ambiguity whether or not "direct write" is ON. What do you think?
The workaround for me is to just treat 2060, 2066, 2067, 2068 and 2069 as invisible chars so there's no ambiguity whether or not "direct write" is ON
Yes, that can work.
Or, we detect when direct-write is "on" and then don't include those in the list of characters to set a new representation for.
@sasumner
we detect when direct-write is "on" and then don't include those in the list of characters to set a new representation for.
Yes, it could work if Scintilla has a message for that. Notepad++'s "direct write" option in the Parameters class cannot be counted on because a change to it is only applied in the next session.
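A sketch of the "detect and exclude" idea, in Python for brevity. Scintilla's documentation lists a SCI_GETTECHNOLOGY message, which looks like a candidate for that query; whether it reflects the live setting inside Notepad++ is not confirmed in this thread. The function below just takes the technology value as an argument, and its name is hypothetical:

```python
# Scintilla documents 0 as the default (non-DirectWrite) technology value.
SC_TECHNOLOGY_DEFAULT = 0

# The five code points that showed up differently depending on "direct write".
FONT_DEPENDENT = {0x2060, 0x2066, 0x2067, 0x2068, 0x2069}


def code_points_to_represent(candidates, technology):
    """Drop the font-dependent code points when direct write is active."""
    if technology == SC_TECHNOLOGY_DEFAULT:
        return sorted(candidates)
    return sorted(set(candidates) - FONT_DEPENDENT)


print([hex(cp) for cp in code_points_to_represent([0x200B, 0x2060, 0x2066], 1)])
```

The simpler alternative from above, always treating those five as invisible, avoids the query entirely.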
if Scintilla has a message for that.
Another user request: https://community.notepad-plus-plus.org/topic/20428/highlight-non-breaking-spaces