PowerShell: Drop BOM

Created on 14 Jan 2019 · 8 comments · Source: PowerShell/PowerShell

PowerShell silently inserts a Unicode BOM at the beginning of output streams such as with echo, causing downstream processes to fail. For example, if you try to use echo to author a pip.ini file, pip will be unable to load the file due to the BOM.
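The pip failure described above can be reproduced without pip itself. pip reads its configuration with an INI parser; Python's configparser (used here as a stand-in — whether pip uses exactly this class is an assumption) rejects a file whose first section header is preceded by a decoded BOM character:

```python
import configparser

# "\ufeff" is what a UTF-8 BOM (bytes EF BB BF) decodes to when the file
# is read back as plain UTF-8. The index-url value is a placeholder.
with_bom = "\ufeff[global]\nindex-url = https://example.invalid/simple\n"

parser = configparser.ConfigParser()
try:
    parser.read_string(with_bom)
    print("parsed OK")
except configparser.MissingSectionHeaderError:
    print("BOM broke parsing")  # this branch is taken
```

The same content without the leading `\ufeff` parses without error.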

Labels: Issue-Question, Resolution-Answered


All 8 comments

I thought PowerShell Core uses largely the BOM-less UTF8 variant for a majority of its standard encodings. :thinking:

Can you please verify this is happening with PowerShell Core, and not Windows PowerShell?

@mcandre can you provide a short sample that reproduces the behavior you describe?

@vexx32 the help for Set-Content says the default encoding is ASCII?

@chuanjiao10 text files don't need a BOM. Byte order is meaningless in UTF-8, and a BOM isn't even recommended for it. In fact, if the OP were correct that _streams_ were getting a BOM, that would be an egregious error. For instance, if echo (aka Write-Output) were adding a BOM, that would be very bad.

However, I'm pretty sure that's not the case. Additionally PowerShell has been using utf-8 without a preamble (aka BOM) as the default $OutputEncoding since the 6.0 release candidate PR #5369, so my guess is that @mcandre is dealing with Windows PowerShell 5.x or older.

There's a lot of talk of war in here, but I still don't see any code to reproduce the issue originally described - @mcandre can you enlighten us?

@chuanjiao10 Listen. UTF-8 is the modern standard for text encoding, and because its byte order is fixed by design, it does not need a BOM. A lot of programs work just fine using heuristics to correctly identify Unicode text in UTF-8 strings, and UTF-8 is backwards-compatible with ASCII.

On the other hand, a lot of Windows programs mistakenly insert a BOM into UTF-8 text, Notepad included, and other Windows programs such as Command Prompt will actually fail to correctly parse UTF-8 strings because they'd try to parse a BOM as if it were normal text, thus breaking a command.

Programs that _rely_ on using a BOM as a "magic number" to identify UTF-8 text, to put it simply, are doing it wrong. It is a good thing to develop a program that can correctly identify a BOM inside of a UTF-8 string (and correctly identify a unicode string by the presence of a BOM), but for the most part, it is simply bad practice to rely on that alone, as opposed to going the extra mile and actually using heuristics to reliably detect UTF-8 encoding _with or without_ the presence of a BOM.
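The "accept a BOM but don't require one" approach described above can be sketched in a few lines of Python (the function name is mine, not from any library):

```python
from typing import Optional

def decode_utf8_lenient(data: bytes) -> Optional[str]:
    """Decode as UTF-8, accepting text with or without a BOM; None otherwise."""
    try:
        # "utf-8-sig" behaves like plain UTF-8 except that it strips
        # one leading BOM if present.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return None

print(decode_utf8_lenient(b"\xef\xbb\xbfhello"))  # hello (BOM stripped)
print(decode_utf8_lenient(b"hello"))              # hello
print(decode_utf8_lenient(b"\xff\xfe\x00h"))      # None (not valid UTF-8)
```

Real-world detectors do more (e.g. falling back to legacy code pages), but the key point stands: the BOM is optional input, never a requirement.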

P.S.: Please don't reference controversial political figures in here. Also, nobody here is saying that other languages shouldn't exist, so please don't go comparing removal of the UTF-8 BOM to "English first". That's bordering on hate speech.
P.P.S.: The English army is from England, the term you'd be looking for is American army.

Do you have a single fact to back that up?

With all due respect, you're being arrogant, stubborn, and needlessly hostile. I'm only going to politely ask you one more time to stop your harassment.

P.S.: And for the last time, STOP BRINGING POLITICS INTO THIS!

Back on subject, I agree with the original poster that PowerShell (and Notepad as well) needs to stop inserting a BOM when encoding UTF-8, but PowerShell also needs the ability to ignore a BOM when it encounters one in a UTF-8 string.
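Until tooling ignores BOMs consistently, a downstream consumer can defensively drop one itself. A minimal Python sketch (the function name is mine) that rewrites a file in place without its leading UTF-8 BOM:

```python
import codecs

def strip_utf8_bom(path: str) -> None:
    """Remove a leading UTF-8 BOM from a file, if present; otherwise no-op."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        with open(path, "wb") as f:
            f.write(data[len(codecs.BOM_UTF8):])
```

The function is idempotent, so it is safe to run on files that never had a BOM.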

Please respect our Code of Conduct and stay on topic.

As for this issue, PSCore6 defaults to UTF-8 BOM-less encoding for output, so unless a repro can be produced, this seems to be about Windows PowerShell which is not serviced by this repo.
