Sdk: Can files be generated without a BOM?

Created on 5 May 2018 · 14 Comments · Source: dotnet/sdk

Steps to reproduce

dotnet new xxx

Expected behavior

All files are generated without a BOM.

Actual behavior

All files are generated with a BOM.

Environment data

dotnet --info output:

Product Information:
Version: 2.1.104
Commit SHA-1 hash: 48ec687460

Runtime Environment:
OS Name: Windows
OS Version: 10.0.16299
OS Platform: Windows
RID: win10-x64
Base Path: C:\Program Files\dotnet\sdk\2.1.104\

Microsoft .NET Core Shared Framework Host

Version : 2.0.6
Build : 74b1c703813c8910df5b96f304b0f2b78cdf194d

Source

seanmars

Most helpful comment

FWIW at least for C# to quote from the ECMA-334 5th Edition (https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-334.pdf):

Conformance (PDF Page 25):

A conforming implementation of C# shall interpret characters in conformance with
the Unicode Standard. Conforming implementations shall accept Unicode source
files encoded with the UTF-8 encoding form.

7.1 Programs (PDF Page 35):

Conforming implementations shall accept Unicode source files encoded with the
UTF-8 encoding form (as defined by the Unicode standard), and transform them
into a sequence of Unicode characters. Implementations can choose to accept and
transform additional character encoding schemes (such as UTF-16, UTF-32, or
non-Unicode character mappings).

Nothing in here says that it has to contain the BOM, so if you are looking for the end all be all it will not be found in the standard...

That being said, in every Visual Studio version we have ever used, the templates have always contained the BOM. We have commit hooks that enforce it for us internally because of some of the issues @sharwell has mentioned. In our case, a portion of code containing some exotic characters required by a third-party library was garbled by text editors that did not respect the fact that the file was indeed UTF-8. As he says, having the BOM avoids more issues than it causes. YMMV.
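A minimal sketch of the kind of check such a commit hook might run. It is not the hook described above; the function name `files_missing_bom` and the `*.cs` pattern are assumptions made for illustration.

```python
import codecs
import tempfile
from pathlib import Path


def files_missing_bom(root, pattern="*.cs"):
    """Return paths under `root` matching `pattern` that do not start with the UTF-8 BOM."""
    missing = []
    for path in sorted(Path(root).rglob(pattern)):
        # Only the first three bytes matter, so avoid reading whole files.
        with path.open("rb") as handle:
            prefix = handle.read(len(codecs.BOM_UTF8))
        if prefix != codecs.BOM_UTF8:
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Demo on throwaway files; a real hook would scan the staged files instead.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "WithBom.cs").write_bytes(codecs.BOM_UTF8 + b"using System;\n")
        Path(tmp, "NoBom.cs").write_bytes(b"using System;\n")
        print([p.name for p in files_missing_bom(tmp)])
```

A hook built on this would exit with a non-zero status whenever the returned list is non-empty.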

aolszowka on 7 Oct 2019

👍2

All 14 comments

I believe this has been fixed in 2.1.300. @peterhuene, I remember you looked into something similar in the past. Can you confirm?

livarcocc on 6 May 2018

An issue with modifying solution files via dotnet sln was fixed by dotnet/cli#8199 for 2.1.300.

@seanmars is there a particular generated file where you're expecting to see the BOM? That is to say, what is the exact command you're running?

peterhuene on 6 May 2018

I think the .cs, .json, .css ... files should not have a BOM; only the sln file needs one, right?
But now, if I use dotnet new mvc (or webapi), all the files are generated with a BOM.


seanmars on 6 May 2018

I do see UTF-8 BOMs with a lot of source files in both https://github.com/aspnet/templating and https://github.com/dotnet/templating.

I found these related issues:
https://github.com/aspnet/templating/issues/500
https://github.com/dotnet/templating/pull/477

Originally it seems that this was done to force Visual Studio to treat the files as UTF-8, but that might have been before charset in .editorconfig was respected. Thus, it might be worth raising the issue again with both of the above repos to see if the time has come to remove the BOMs from the templates.
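For reference, a sketch of what a per-extension charset rule in .editorconfig could look like. The globs are assumptions for illustration; `utf-8` and `utf-8-bom` are standard EditorConfig charset values, but whether a given editor version honors them is not confirmed in this thread.

```ini
# Hypothetical .editorconfig; adjust the globs to the project.
root = true

# C# sources: keep the BOM, matching the current templates.
[*.cs]
charset = utf-8-bom

# Formats that commonly omit the BOM.
[*.{json,js,css}]
charset = utf-8
```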

peterhuene on 7 May 2018

👍2

Is there any news?

seanmars on 22 Aug 2018

I am running into this:

C:\testapps\threeapp>type Program.cs
∩╗┐using System;

namespace threeapp
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
        }
    }
}

C:\testapps\threeapp>dotnet --version
3.0.100-preview4-010345
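A small sketch of why the BOM shows up as three glyphs in the output above. It assumes the console was using code page 437, which is not stated in the comment but matches the characters shown.

```python
import codecs

# The first bytes of a BOM-prefixed source file.
source = codecs.BOM_UTF8 + b"using System;\n"

# Misread as code page 437: the three BOM bytes (EF BB BF) become visible glyphs.
as_cp437 = source.decode("cp437")

# Read as UTF-8 with signature handling: the BOM is consumed silently.
as_utf8 = source.decode("utf-8-sig")

print(repr(as_cp437))
print(repr(as_utf8))
```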

richlander on 28 Feb 2019

Do we just need someone to go through the templates and remove the BOMs?

richlander on 28 Feb 2019

@vijayrkn @mlorbetske to comment. These templates live in dotnet/templating.

livarcocc on 28 Feb 2019

❗️ Source files need to be generated _with BOM_. Otherwise, certain editors will treat them in a non-uniform manner and eventually someone will accidentally save the file with question marks (the encoding error fallback character). Normally I see this when author names in files get messed up, but recently we found a curly quote in dotnet/winforms which was incorrectly saved. These errors are easy to miss and (in many cases) hard to fix, so we create the file with BOM to avoid it altogether.
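A sketch of one plausible way a curly quote gets damaged when a BOM-less UTF-8 file is handled with the wrong encoding. The sample string and the choice of Windows-1252 and ASCII are assumptions; this is not necessarily what happened in dotnet/winforms.

```python
import codecs

original = "Don\u2019t"                 # contains a curly quote (U+2019)
utf8_bytes = original.encode("utf-8")   # saved as UTF-8 without a BOM

# An editor that guesses a legacy code page misreads the three-byte sequence.
misread = utf8_bytes.decode("cp1252")

# Saving to an encoding that cannot represent the character falls back to '?'.
lossy = original.encode("ascii", errors="replace")

# With a BOM, a signature-aware reader recovers the text unchanged.
round_trip = (codecs.BOM_UTF8 + utf8_bytes).decode("utf-8-sig")

print(repr(misread), lossy, round_trip == original)
```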

A secondary benefit is that the BOM triggers an early exit in the automatic encoding detection algorithm in .NET, so editors like Visual Studio load files faster. It's a small win and not really significant compared to the problem above, but I find it interesting. 😄
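A rough illustration of the early-exit idea, written as a generic BOM sniffer. This is not .NET's detection algorithm; the function name `sniff_encoding` and the fallback are assumptions.

```python
import codecs

# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(data, fallback="utf-8"):
    """Return a codec name for `data`, exiting early when a BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name  # early exit: no content heuristics needed
    # A real detector would inspect the content here; this sketch just guesses.
    return fallback


print(sniff_encoding(codecs.BOM_UTF8 + b"using System;"))
print(sniff_encoding(b"using System;"))
```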

sharwell on 28 Feb 2019

👍2

Given how contentious this has always been, I'd prefer to leave things as they are unless there's a compelling reason to change the content to exclude the BOM. It is worth noting that the presence or absence of the BOM is determined by the source content, so while, say, CS files have a BOM, it's not required that JS or RB files do - this is a choice that a template author can make to best suit their audience. With the comment by @sharwell, it seems like the most prudent thing to do for tools built on .NET consuming these content files is to leave the BOM in the content - is this agreeable?

If we want to have a longer discussion on this, I can move the issue to the dotnet/templating repo.
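For anyone who still wants BOM-free output, a minimal post-processing sketch that strips the BOM from a generated file. The helper name `strip_utf8_bom` is hypothetical, and this is a workaround outside the templates, not something the SDK provides.

```python
import codecs
import tempfile
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from `path` in place; return True if one was removed."""
    path = Path(path)
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


if __name__ == "__main__":
    # Demo on a throwaway file standing in for a generated Program.cs.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp, "Program.cs")
        target.write_bytes(codecs.BOM_UTF8 + b"using System;\n")
        print(strip_utf8_bom(target), target.read_bytes())
```

Running it over a project directory would amount to calling the helper on each path returned by `Path(project_dir).rglob("*.cs")`.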

mlorbetske on 28 Feb 2019

That sounds reasonable to me.

peterhuene on 28 Feb 2019

I think there is no compelling reason either to include or to exclude the BOM, and I agree with what @sharwell says. But I believe that other files (like js, css...) also encounter the same problem. Why are they not using the BOM? It's really interesting! 🤪

seanmars on 28 Feb 2019

@seanmars Some file formats do not allow the BOM (e.g. JSON), and others have a history of not supporting it well (e.g. many Java-based tools, since BOM handling there is manual and often overlooked). C# templates have long contained the BOM so tooling surrounding these files is well-equipped to handle it correctly.
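A short sketch of the JSON case, using Python's standard json module as one example of a parser that rejects a leading BOM in text input. Other parsers may behave differently, and the sample payload is made up.

```python
import codecs
import json

payload = codecs.BOM_UTF8 + b'{"name": "threeapp"}'

# Decoding as plain UTF-8 keeps the BOM as U+FEFF at the start of the string,
# which json.loads refuses to parse.
text_with_bom = payload.decode("utf-8")
try:
    json.loads(text_with_bom)
    rejected = False
except json.JSONDecodeError:
    rejected = True

# Decoding with "utf-8-sig" drops the BOM first, so parsing succeeds.
parsed = json.loads(payload.decode("utf-8-sig"))

print(rejected, parsed)
```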

sharwell on 28 Feb 2019

👍1

