In a Notepad++ document that is encoded as UTF-8 (no BOM), many Unicode characters are not displayed; a hollow square appears in their place. If a displayable Unicode character is added to a line containing undisplayable Unicode characters, the undisplayable ones suddenly appear. Removing the "good" one makes the others revert to hollow squares. A simple example:
ββ¬ββ ββ§β¨
Paste that line into NP++ and you will see all the characters. Remove the leading star β and the others become squares. Restore the star and the others re-appear.
All of the characters should always appear.
They only appear if an always-acceptable Unicode character is on the same line. If an always-acceptable Unicode character is in the document but not on the same line, certain Unicode characters, such as, but not limited to, the ones shown above, will not be displayed properly.
Notepad++ v7.5.1 (32-bit)
Build time : Aug 29 2017 - 02:35:41
Path : C:\Program Files (x86)\Notepad++\notepad++.exe
Admin mode : OFF
Local Conf mode : OFF
OS : Windows 10 (64-bit)
Plugins : ComparePlugin.dll mimeTools.dll NppConverter.dll NppExport.dll NppFTP.dll NppTextFX.dll PluginManager.dll SpellChecker.dll
This occurs with characters from many of the Unicode blocks.
I was able to reproduce this as well.
I tested with the Default Style in Style Configurator set to Courier New, Consolas, Arial and Times New Roman. The file was a TXT file and I tested encoding in UTF-8, UTF-8 BOM, UCS-2 BE BOM and UCS-2 LE BOM. All of them showed the same result.
I believe this issue would happen any time you enter a character that is NOT contained in the selected font and then add/remove on the same line a character which IS contained in the selected font.
IMHO seems like something not quite right with the font-substitution routines. This was in a TXT file encoded with
Notepad++ v7.5.9 (64-bit)
Build time : Oct 14 2018 - 15:19:55
Path : C:\Program Files\Notepad++\notepad++.exe
Admin mode : OFF
Local Conf mode : OFF
OS : Windows 10 (64-bit)
Plugins : DSpellCheck.dll mimeTools.dll NppConverter.dll
I believe this issue would happen any time you enter a character that is NOT contained in the selected font
This may be, but it also appears to occur in other circumstances as well. For example, "β·" (U+23B7 "RADICAL SYMBOL BOTTOM") is present in Consolas, Courier New, DejaVu Sans Mono, and Lucida Console, but if you put that in a new text file, it won't show up with any of those fonts.
it also appears to occur in other circumstances as well
Cannot confirm this example on my system.
The U+23B7 character is not in my _DejaVu Sans Mono_. There is U+23AE, followed by U+23CE.
The U+23B7 character is not in my _Courier New_ either. There is U+2321, followed by U+2500.
The same holds for my _Consolas_ (U+2321 is followed by U+2460) and for my _Lucida Console_, which ends at U+0433.
So this example does not seem to poke a hole in the theory that only characters unavailable in the current font are affected.
The U+23B7 character is not in my DejaVu Sans Mono.
The same here. There is something magical about β, β (line 1 on my gif), π, β, β (but not β): when one of them is present on the same line, it fixes the issue?? But even when it is fixed, you can break it if
a) you highlight the bracket
b) the fixing character is on the same side of the bracket!
I will paste my funny gif and close my issue #8305 about β symbol.
Duplicate of #442, #671, #675, #813, #870, #1621, #3458, #4056, #4086, #4490, #5513, #8305, #8756, and maybe many more.
So, we need to test Notepad++ 7.6.6, as it was good (?) in #4490. Also, all of this (the brackets too) is already described in this comment, except that it is wrong that only a character placed before fixes the display; one placed after works too. https://github.com/notepad-plus-plus/notepad-plus-plus/issues/1621#issuecomment-260655014
And, yes it totally does not happen in MS Gothic. Strange.
It might be, somehow, related to SCI_SETTECHNOLOGY configuration.
@Ekopalypse https://trac.wxwidgets.org/ticket/17804#comment:34 Yep.
@ValZapod
but I don't see the issue with the larger autocompletion box.
Maybe this was already fixed with the scintilla version used by npp.
@Ekopalypse
with the larger autocompletion box.
You mean brackets? I think you are not using the Courier New font? It is bad in it, and good in DejaVu Sans Mono. You can try from #442
@Ekopalypse,
the SCI_SETTECHNOLOGY approach looks very promising on my system too.
I added a call to execute(SCI_SETTECHNOLOGY, <n>); in ScintillaEditView::init to test it.
Technology 0:
Technologies 1, 2 and 3:
Both screenshots show the same file automatically loaded after start, the only thing I did was moving the cursor to the right bracket of the two marked brackets.
I used _Courier New_ here.
The new technologies seem to size the substituted chars better than technology 0. _Edit:_ But it has nothing to do with "fixed font" anymore; the substitutions seem to have quite variable widths.
I found that this symbol β (U+2198) is rendered in Consolas (which does not have this symbol) as a ? in a square, not just a square. But apart from that it is all the same.
@ValZapod
You mean brackets? I think you are not using the Courier New font? It is bad in it, and good in DejaVu Sans Mono. You can try from #442
No, I mean the screenshot and discussion you linked to
https://trac.wxwidgets.org/attachment/ticket/17804/17804-SetTechnology1.png
There it has been reported that the words in an autocompletion box are bigger using DirectWrite instead of the default. I'm using the RobotoMono font.
@ValZapod @Uhf7
I don't seem to be able to get the results you get with the Courier New font.
The β is always displayed, so I assume that something else has an additional effect.
Downloaded 7.8.6 x64 and did a retest
The "bracket issues" doesn't seem to happen for me. (??)
@Ekopalypse
I can make the β visible with the _Courier New_ font now too, using technology 0 and some hand-configured font linking, which actually looks a little off:
What I did: There is a registry entry
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontLink\SystemLink
Under it, there are many multi-string values, named after fonts. There was no value named Courier New. So I created a multi-string value named Courier New and copied the data of the Lucida Sans Unicode value into it. A shot in the dark. Immediately afterwards nothing improved, but after I rebooted my system, the missing characters became visible. And: if I move the cursor to the famous right bracket, then the small β mutates to a big β instead of an empty frame. So it certainly depends on the font linking setup too. And if I could set up the font linking in such a way that the font-linked β looks the same as the "normal" β (wherever it comes from), then everything would be fine with technology 0.
@Uhf7 Is not set for me.
The "bracket issues" doesn't seem to happen for me. (??)
@Ekopalypse The font never actually changed, you need to press on Enable global font. Windows 7?? It is EOL...
@Ekopalypse
then certainly another trick exists. Running out of thoughts here. On my system, the 32 bit version works exactly like the 64 bit version.
A major difference is the system itself: I use
OS Name : Windows 10 Pro (64-bit)
OS Version : 1607
OS Build : 14393.576,
you use Windows 7. I try it on my Windows 7 system ...
Ok, under Windows 7, the β works fine, with and without bracket highlighting, but some other characters don't. Using technology 0.
With the 64 bit version of Npp, the same characters are missing.
Using technology 1:
What else could you ask for? Looks perfect to me.
Technology 1, Windows 7, 64 bit version of Npp:
I dream, if you ask me. Compared to the current state.
@Uhf7 Now remove β and β. Oops. Can't we somehow force the rendering that is used when you type in β?
@ValZapod
The font never actually changed, you need to press on Enable global font
No, it was started with these settings. I switched to global overwrite to show that no other font is defined. If global override is NOT checked, the default setting takes precedence.
Yes, I still use Windows 7 why not? I don't do any "mission critical transaction" with windows OS anyway.
Can't we somehow force the rendering that is used when you type in β?
Npp has no setting for this yet. What you can do is use one of the scripting-language plugins, like PythonScript or LuaScript; even NppExec can be used to set the technology to DirectWrite.
@Uhf7 - hmm :-D what should I say - Windows 10 broke it :-(
Thanks for testing and btw. thank you for your contributing work. Much appreciated.
@ValZapod
I see no negative effect when removing the β, if I use technology 1:
or the β:
I certainly start to get messed up with my screenshot names here, that's why I don't post any pictures from re-inserting the β and the β successfully, but for me, with technology 1 everything works fine. Under Windows 7 and under Windows 10, I believe.
My knowledge of fonts, rendering, directwriting... refers to what I have posted.
I can't say for sure if it's a problem in Scintilla or Windows or the font used or ...,
but if using DirectWrite offers a way to either work around or solve this, then I would
vote to add something to the settings that would allow the user to set it.
@Ekopalypse
using a plugin or NppExec to make the characters display correctly is not exactly what I call a solution to the problem. It's a possible work-around, but wouldn't it be nice if the characters were displayed correctly without additional actions?
And, @ValZapod, how long would it take until we have a new Scintilla version? (Perhaps the Scintilla developers will say: Use technology 1 or higher! That would be interesting.)
I would feel better if Npp itself switched the technology to a working one. Maybe it can be included in the configuration somehow, so that there is a safe fallback if the technology switch doesn't work on some systems.
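A minimal sketch of what such a "switch with a safe fallback" could look like, written as standalone C++ so it can be tried without Notepad++. The numeric values mirror SCI_SETTECHNOLOGY, SCI_GETTECHNOLOGY, SC_TECHNOLOGY_DEFAULT and SC_TECHNOLOGY_DIRECTWRITE from Scintilla.h; the SciSend type and the ApplyTechnology function are hypothetical names invented for this illustration. It assumes that Scintilla leaves the technology unchanged when Direct2D cannot be loaded, which should be verified against the Scintilla version in use.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Values mirror Scintilla.h (SCI_SETTECHNOLOGY, SCI_GETTECHNOLOGY,
// SC_TECHNOLOGY_DEFAULT, SC_TECHNOLOGY_DIRECTWRITE). Renamed here so the
// sketch does not clash with the real macros.
constexpr unsigned int kSciSetTechnology = 2630;
constexpr unsigned int kSciGetTechnology = 2631;
constexpr int kTechnologyDefault = 0;      // GDI
constexpr int kTechnologyDirectWrite = 1;  // DirectWrite

// Hypothetical stand-in for ScintillaEditView::execute or SendMessage.
using SciSend =
    std::function<std::intptr_t(unsigned int, std::uintptr_t, std::intptr_t)>;

// Requests `wanted`, then reads the technology back. If the editor did not
// switch (assumption: it keeps the old value when Direct2D is unavailable),
// GDI is selected explicitly. Returns the technology that is in effect.
int ApplyTechnology(const SciSend& send, int wanted) {
    send(kSciSetTechnology, static_cast<std::uintptr_t>(wanted), 0);
    int active = static_cast<int>(send(kSciGetTechnology, 0, 0));
    if (active != wanted) {
        send(kSciSetTechnology, static_cast<std::uintptr_t>(kTechnologyDefault), 0);
        active = static_cast<int>(send(kSciGetTechnology, 0, 0));
    }
    return active;
}
```

In Npp, the send callback would simply forward to the existing execute function, and the wanted value would come from the configuration.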
@Uhf7 - 100% correct :-D
https://sourceforge.net/p/scintilla/bugs/1393/ is our bug. But there is this problem about brackets... That is not there.
An issue close to this one is #2287. It is the same problem they describe there, existing since 2016, and it is solved there by setting the technology to DirectWrite with the help of a plug-in.
Thank you for that solution, but this is something for insiders. For a new user, or for a user who is just editing files without caring about development, this solution is very hard to find.
So I would fully support what @jefflomax said in #2287:
Notepad++ should support ligatures out of the box, not thru hacks or adding plugins users neither need nor want.
So I will try to push it to the master now, with a PR. If we don't do this now, the next people will come along in two years and waste their time testing it again and again and again.
Found an old Windows Vista in my virtual machine park, the following screenshots support the necessity to make the DirectWrite technology feature configurable. That Scintilla can load Direct2D does not mean automatically that this produces better results on old systems.
Vista, Technology 0, Courier New
Vista, Technology 1, Courier New
Vista, Technology 0, DejaVu Sans Mono
Vista, Technology 1, DejaVu Sans Mono
That is just because there was no support for Unicode that far in Vista?
Maybe. But Unicode itself was already there under Vista. What bugs me more is that technology 1 under this Vista seems to wreck "normal" characters near the β character, sometimes.
technology 1 under this Vista seems to wreck "normal" characters near the β character, sometimes
Screenshot?
@ValZapod
You wrote 2 days ago
https://sourceforge.net/p/scintilla/bugs/1393/ is our bug.
The Unicode character U+25C6 (β) displays in Npp with and without DirectWrite technology. Even in Windows 7.
So I cannot verify that this is exactly "our" bug. And it was 2012. And he used Windows XP. And I'm sure there are many effects which can lead to empty frames instead of correct characters. I simply don't believe that it's promising to go to them and ask them to fix exactly this issue now.
Screenshot?
The second screenshot of my Vista screenshots, headlined "Vista, Technology 1, Courier New".
Most "normal" characters in line 3 don't look like _Courier New_ anymore.
So I cannot verify that this is exactly "our" bug. And it was 2012. And he used Windows XP.
Okay, maybe open another issue?? Maybe also lets try @nyamatongwe?
Valerii Zapodovnikov:
Okay, maybe open another issue?? Maybe also lets try @nyamatongwe?
For Scintilla bug #1393, text shaping for East Asian text can be influenced by the locale used so displaying in a Japanese context may differ from a Chinese context. There are other bugs about this like https://sourceforge.net/p/scintilla/bugs/2027/.
Problems with displaying particular symbol characters may be different. They seem to occur when the specified font does not include some characters so Windows tries to use glyphs from backup fonts. Scintilla does not have much control over this.
For GDI (technology 0) you could try experimenting with the font creation setup call in SetLogFont inside win32/PlatWin.cxx. It is possible that the lfQuality and lfCharSet parameters will influence the behaviour.
DirectWrite was originally implemented for Windows Vista but that early version had some problems and DirectWrite has improved over time. Applications could default to using DirectWrite from Windows 7 if there are too many problems with Vista or add an option that users can select. Some people prefer GDI's less anti-aliased (blurry) text.
Neil
@nyamatongwe Wow. Paste ββ ββ§β¨ in your notepad3, it will get broken! Nice, I will open an issue there. P.S. Or it is not yours? https://github.com/rizonesoft/Notepad3/issues/2404
Wow. Paste ββ ββ§β¨ in your notepad3, it will get broken! Nice, I will open an issue there. P.S. Or it is not yours?
Nope, The owner of Notepad3 is "Derick Payne" π
Wow, look here @Uhf7 https://github.com/rizonesoft/Notepad3/issues/2404#issuecomment-640456912 this is genius. One can choose the technology used for drawing.
@ValZapod - seems you misunderstood most of the thread.
This is what I suggested 13 days ago and what @Uhf7 is working on.
Well, it will be without a UI?
I don't think so, if you check his PR then you will see that he added it to the preference dialog.
Well, 4 variants vs 2 and need to restart to preserve plugin behaviours...
Also maybe try this advice from Sci author?
For GDI (technology 0) you could try experimenting with the font creation setup call in SetLogFont inside win32/PlatWin.cxx. It is possible that the lfQuality and lfCharSet parameters will influence the behaviour.
Too much noise for my taste, I'm out.
@ValZapod I saw the UI already, but had no real opinion about it, because it doesn't belong to this project, so it does not help me here. My opinion regarding the technology settings in the screenshot: two options too many. The difference in text rendering is between the Windows GDI TextOut function on one side and the DirectWrite equivalent on the other side. The rest is about how to bring the rendering result of DirectWrite to the screen.
Somebody just proposed a patch for this! https://github.com/notepad-plus-plus/notepad-plus-plus/issues/8756#issuecomment-679320347
Change
to
auto TLen = text.length();
if(TLen>1)TLen++;
if (0) { //unicodeMode
    tlen = static_cast<int>(UTF16FromUTF8(text, buffer, TLen));
} else {
    tlen = ::MultiByteToWideChar(codePage, 0, text.data(), static_cast<int>(TLen),
        buffer, static_cast<int>(TLen));
}
Valerii Zapodovnikov:
Somebody just proposed a patch for this! #8756 (comment) https://github.com/notepad-plus-plus/notepad-plus-plus/issues/8756#issuecomment-679320347
We already know that changing the text changes the presentation of this bug. The patch is not a fix.
Neil
This "fix" is at least a hint where the problem comes from: It comes directly from the Windows GDI text output functions for wide characters. I did some experiments based on this information.
The Windows GDI functions, which are used by Scintilla and which do not work correctly, are:
ExtTextOutW
GetTextExtentPoint32W
GetTextExtentExPointW
The common error of these functions seems to be, that they use squares instead of characters for some 'bad' Unicode characters, as long as there is no 'good' Unicode character in the text string.
I have no list of 'good' or 'bad' Unicode characters, this is only a term for it I invented here. But I can name two 'good' Unicode characters: 0x0000 and 0x200B. If one of those two characters is in the text, all other Unicode characters are displayed correctly. The 0x0000 character has been used by @KnIfER for the "fix". Unfortunately, it has a width, when we use it with the Windows GDI functions.
So I went for the 0x200B character (Zero width space) in my experiments. A possible fix is to append the 0x200B character silently to all text strings passed to the Windows functions mentioned above. Then they produce the correct character widths and the correct output.
To make this experiment fly without additional text copy operations, I modified the TextWide class in a sneaky way. The VarBuffer is now one character longer than the actual text and this additional character is the Zero width space. tlen remains as it is, to avoid any behavior modifications.
class TextWide : public VarBuffer<wchar_t, stackBufferLength> {
public:
    int tlen; // Using int instead of size_t as most Win32 APIs take int.
    TextWide(std::string_view text, bool unicodeMode, int codePage=0) :
        VarBuffer<wchar_t, stackBufferLength>(text.length() + 1) {
        if (unicodeMode) {
            tlen = static_cast<int>(UTF16FromUTF8(text, buffer, text.length()));
        } else {
            // Support Asian string display in 9x English
            tlen = ::MultiByteToWideChar(codePage, 0, text.data(), static_cast<int>(text.length()),
                buffer, static_cast<int>(text.length()));
        }
        // Append the zero width space; it is deliberately not counted in tlen.
        buffer[tlen] = 0x200b;
    }
};
After modifying the TextWide class this way, I can use tlen+1 as character count for the ExtTextOutW call and for all GetTextExtentPoint32W calls, to smuggle in the 'good' Unicode character.
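The buffer layout described above can be tried outside of Scintilla with a small, portable sketch. PaddedWideText and MakePaddedWideText are hypothetical names invented for this illustration, and the UTF-8 decoder assumes well-formed input and does no validation; it is not the Scintilla code, only a model of "tlen counts the real text, the buffer holds one extra U+200B".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

constexpr char16_t kZeroWidthSpace = 0x200B;

// Hypothetical stand-in for the modified TextWide: the buffer holds the
// UTF-16 text plus one trailing U+200B that is NOT counted in tlen.
struct PaddedWideText {
    std::u16string buffer;
    int tlen = 0;
};

// Converts UTF-8 to UTF-16 and appends the zero width space.
// Assumption: the input is well-formed UTF-8; nothing is validated.
PaddedWideText MakePaddedWideText(std::string_view utf8) {
    PaddedWideText out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t len = 1;
        if (lead < 0x80) { cp = lead; len = 1; }
        else if ((lead >> 5) == 0x6) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0xE) { cp = lead & 0x0F; len = 3; }
        else { cp = lead & 0x07; len = 4; }
        // Fold in the continuation bytes, six payload bits each.
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        i += len;
        if (cp >= 0x10000) {
            // Outside the BMP: encode as a surrogate pair.
            cp -= 0x10000;
            out.buffer.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.buffer.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.buffer.push_back(static_cast<char16_t>(cp));
        }
    }
    out.tlen = static_cast<int>(out.buffer.size());
    out.buffer.push_back(kZeroWidthSpace); // the smuggled-in 'good' character
    return out;
}
```

A caller that wants the workaround passes tlen + 1 code units to the output or measuring function; a caller that wants the old behavior keeps passing tlen.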
What remains here is the GetTextExtentExPointW call in SurfaceGDI::MeasureWidths. Here, I had to increase the size of the poses buffer, and I had to set the result parameter fit to the actual length of the text. This can be done without side effects, because the maxWidthMeasure parameter is equal to INT_MAX, so I assume that all characters always fit into this width.
const TextWide tbuf(text, unicodeMode, codePage);
TextPositionsI poses(tbuf.tlen + 1);
if (!::GetTextExtentExPointW(hdc, tbuf.buffer, tbuf.tlen + 1, maxWidthMeasure, &fit, poses.buffer, &sz)) {
    // Failure
    return;
}
fit = tbuf.tlen;
This experimental fix runs on my system without assertions in debug mode and displays the correct characters using the Windows GDI functions.
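The effect of resetting fit can be modeled without any Windows call. In the sketch below, FittedExtents and DropSentinelExtent are hypothetical names made up for this illustration; the input vector stands for the cumulative extents that a measuring function reported for tlen + 1 code units, and the result keeps only the entries that belong to the real text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type, not part of Scintilla.
struct FittedExtents {
    int fit;                  // number of real code units that were measured
    std::vector<int> extents; // cumulative extents of those code units
};

// `measured` holds cumulative extents for the real text plus the trailing
// U+200B sentinel. Only the first tlen entries are kept, which corresponds
// to setting fit back to tlen after the measuring call.
FittedExtents DropSentinelExtent(const std::vector<int>& measured, int tlen) {
    const std::size_t wanted = static_cast<std::size_t>(std::max(tlen, 0));
    const std::size_t keep = std::min(measured.size(), wanted);
    FittedExtents out;
    out.fit = static_cast<int>(keep);
    out.extents.assign(measured.begin(),
                       measured.begin() + static_cast<std::ptrdiff_t>(keep));
    return out;
}
```

Since the zero width space is expected to contribute no width, its entry should equal the one before it, so dropping it would not change where the real text ends.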
I don't know whether such a solution would be accepted by Scintilla, but perhaps there is someone who wants to try it this way too ...
squares instead of characters for some 'bad' Unicode characters
Those are not squares. They are the font's .notdef glyphs (yes, .notdef, not .null, U+0000). You can see it in Fontlab. That is why Consolas shows not just a square, but a "?" in a square! https://docs.microsoft.com/en-us/typography/opentype/otspec170/recom#shape-of-notdef-glyph
Here's another way to reproduce this, from #3747 originally reported with #813
Open a new Notepad++ file, set the encoding to UTF-8 and paste these symbols (Double Arrow Unicode characters) on the first empty line
ββββββββββ
Position the cursor before the last two characters and enter a newline, like this
ββββββββ
ββ
The last two characters should turn into blocks.
This comment has a nice video showing the issue
https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5513#issuecomment-482701890
Here's the solution:
https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5513#issuecomment-700255578
Why did you close it if #8756 is still open and this issue is still not fixed??? Did you report it upstream?