PowerShell: Format-Hex should not try to render Unicode control characters that affect the display

Created on 13 Sep 2018 · 16 Comments · Source: PowerShell/PowerShell

Format-Hex currently filters out control characters in the ASCII range that would affect the console display. Needs to be updated to handle Unicode control characters. Also consider replacing use of period for non-printable characters with the Unicode symbol for non-printable to avoid confusion with actual periods.

Area-Cmdlets-Utility First-Time-Issue Resolution-Fixed Up-for-Grabs

All 16 comments

Is "`u{00}" the expected output format?

Any unprintable character (beyond the control characters) should be replaced with the box with question mark symbol.

@iSazonov I think @SteveL-MSFT means this one:

U+FFFD � REPLACEMENT CHARACTER used to replace an unknown, unrecognized or unrepresentable character

from https://en.wikipedia.org/wiki/Specials_(Unicode_block)

@ThreeFive-O Thanks for clarifying.

Will this symbol display properly in the Windows 7 console with the default configuration?

With raster fonts (the default on Windows 7) the replacement symbol is not displayed.
On other Windows versions (by default) the symbol is displayed as an empty square.

It does not look good.

We might have to special-case Win7 and do something like detect whether the font can display the symbol, and if not, maybe just show a question mark.

Determining the capabilities of a font at runtime looks impossible.
We could add a new parameter, ReplacementCharacter, with U+FFFD � REPLACEMENT CHARACTER as the default.

@iSazonov does U+FFFD render as whitespace on Win7? That might be good enough since the telemetry shows that only a minor % of customers are on Win7 and it's probably a small set of those customers using format-hex.

> does U+FFFD render as whitespace on Win7?

A raster font is the _default_ in the Windows 7 console, and there the symbol is displayed as whitespace. Users have to select a TrueType font to see the symbol. I personally always do this.
If Unix consoles use TrueType fonts by default, I think we could use the symbol.

Also, I tried Char.IsControl() in the code you linked and got surprising results. It is again a problem on Windows 7. I don't know whether we can accept this for newer Windows versions and Unix systems.

I think this should be pretty straightforward: I expect the Unicode control characters are documented online and just need to be added to the existing filter list.

Not sure if it's an ideal solution, but I've added a simple Unicode detect statement to ByteCollection.ToString() - https://github.com/PowerShell/PowerShell/pull/9762

Unicode control characters https://en.wikipedia.org/wiki/Unicode_control_characters

The control characters U+0000–U+001F and U+007F come from ASCII. Additionally, U+0080–U+009F were used in conjunction with ISO 8859 character sets (among others). They are specified in ISO 6429 and often referred to as C0 and C1 control codes respectively.

So the right fix is: (1) convert the byte to a Unicode char, (2) check char.IsControl, (3) replace control characters with U+FFFD.
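
For readers following along, the three steps can be sketched roughly as below. This is an illustration in Python, not the Format-Hex implementation (which is C#). The function name `render_text_column` is hypothetical, and the sketch assumes each byte maps one-to-one to the code point with the same value. The `unicodedata.category(ch) == "Cc"` test stands in for the `char.IsControl` check and covers the C0 and C1 ranges quoted above (U+0000–U+001F and U+007F–U+009F).

```python
import unicodedata

REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def render_text_column(data: bytes) -> str:
    """Build the right-hand text column for one row of a hex dump (hypothetical helper)."""
    out = []
    for b in data:
        # (1) Treat the byte as the Unicode code point with the same value (assumption).
        ch = chr(b)
        # (2) General category "Cc" is the set of control characters:
        #     U+0000-U+001F and U+007F-U+009F.
        if unicodedata.category(ch) == "Cc":
            # (3) Never send a control character to the console; show U+FFFD instead.
            out.append(REPLACEMENT)
        else:
            out.append(ch)
    return "".join(out)


# BEL, ESC and the C1 control 0x85 are replaced; printable bytes pass through.
print(render_text_column(b"Hi\x07\x1b[0m\x85!"))
```

Because a real period is left untouched and only control characters become U+FFFD, the two can no longer be confused in the text column.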

Here it is, but I think replacing all control chars with U+FFFD makes the right-hand column hard to read if the byte collection is mostly Unicode.

image

... a slight alteration made to display 0 values as a space.

image
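
The zero-as-space alteration can be sketched as follows. Again this is a Python illustration under stated assumptions rather than the actual C# change: the function name and the `replacement` parameter are hypothetical (the parameter loosely mirrors the ReplacementCharacter idea floated earlier in the thread), and bytes are mapped one-to-one to code points.

```python
import unicodedata


def render_text_column_zero_as_space(data: bytes, replacement: str = "\uFFFD") -> str:
    """Like a plain control-character filter, but shows 0x00 as a space (hypothetical helper)."""
    out = []
    for b in data:
        ch = chr(b)
        if b == 0:
            # NUL bytes dominate UTF-16 text, so render them as blanks to cut the noise.
            out.append(" ")
        elif unicodedata.category(ch) == "Cc":
            # Any other control character is still replaced.
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)


# UTF-16LE encodes "Hi\n" as b"H\x00i\x00\n\x00": every second byte is zero.
print(render_text_column_zero_as_space("Hi\n".encode("utf-16-le")))
```

With this variant a UTF-16 string shows as letters separated by spaces, and only the remaining control bytes, such as the line feed here, appear as the replacement character.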

Thoughts?

The Win10 raster font displayed U+FFFD correctly. Does it display correctly on other systems? If Windows 7 raster fonts display U+FFFD as a space, then that would be acceptable to me.

@paulbailey1979 About Windows 7, see my comment above: https://github.com/PowerShell/PowerShell/issues/7777#issuecomment-423795715. I think it is not a problem.

> display 0 values as a space

LGTM. /cc @SteveL-MSFT Thoughts?

0x00 displayed as space makes sense to me as I believe other hex renderers do this.

@SteveL-MSFT this was actually done during #8674 as you can see here:
https://github.com/vexx32/PowerShell/blob/a2ae1684bd8e523fb3c2e17d273d75b16ced8059/src/Microsoft.PowerShell.Commands.Utility/commands/utility/UtilityCommon.cs#L260-L290

Closing this one for now; feel free to reopen if you think I missed anything with that implementation. Going to add a note to the PR and the associated doc issue.
