From https://github.com/microsoft/vscode/issues/95685
TypeScript Version: 3.9.0-dev.20200420
Search Terms:
Code
For the JS
var ftext = [β©"Und", "dann", "eines"];
var a = 1;
Note there is an invisible paragraph separator character after the opening[
= on the second line.Bug:
TS starts reporting bogus errors:

It looks the paragraph character messes up the parsing and causes the error line numbers to be off by one
[Trace - 22:31:41.88] <semantic> Event received: semanticDiag (0).
Data: {
"file": "/Users/matb/projects/san/test.js",
"diagnostics": [
{
"start": {
"line": 2,
"offset": 8
},
"end": {
"line": 2,
"offset": 12
},
"text": "Cannot find name 'dann'.",
"code": 2304,
"category": "error"
},
{
"start": {
"line": 2,
"offset": 16
},
"end": {
"line": 2,
"offset": 21
},
"text": "Cannot find name 'eines'.",
"code": 2304,
"category": "error"
}
]
}
In JavaScript, Paragraph Separator is a line terminator; however, in the edits described above, VS Code says that edits are occur on line 2.
From TypeScript's perspective, line 2 occurs immediately after the Paragraph Separator. Ultimately, editors need to account for each language's newline settings if they intent to use line/offset encodings for commands, so I don't think we can fix this on the TypeScript side.
See also:
Here's the edit we send to TSServer when deleting the space after the = in a = 1:
[Trace - 23:20:02.22] <syntax> Sending request: updateOpen (12). Response expected: yes. Current queue length: 0
Arguments: {
"changedFiles": [
{
"fileName": "/Users/matb/projects/san/test.ts",
"textChanges": [
{
"newText": "",
"start": {
"line": 2,
"offset": 8
},
"end": {
"line": 2,
"offset": 9
}
}
]
}
],
"closedFiles": [],
"openFiles": []
The indexes one based so they look correct to me.
@RyanCavanaugh @DanielRosenwasser Does this also match what you see?
Yeah, that's what I'm seeing as well, but if I understand correctly the edit needs to happen at line 3 because the paragraph separator is a line terminator.
VS Code does not consider the character a line break. Looks like VS does though
@alexdima should be able to say if this is intentional or not. If you have this character in your code and it's not inside a string though, it's probably a bug
This issue has been marked as 'External' and has seen no recent activity. It has been automatically closed for house-keeping purposes.
Currently, VS Code considers the following characters/sequences to be line terminators: CRLF, CR or LF.
I feel that this is a difficult topic to get wide agreement on, since we need to work with multiple programming languages, each one with its own definition of what a line terminator is. So, if a compiler creates a diagnostic at line 100, it appears that line number is defined according to the programming language's own specification.
From my reading of https://www.unicode.org/reports/tr14/tr14-32.html , Unicode defines a mandatory line break after the following:
BK). In the BK class there are the following:000C - FORM FEED (FF)000B - LINE TABULATION (VT) - [OPTIONAL]2028 - LINE SEPARATOR (LS)2029 - PARAGRAPH SEPARATOR (PS)CR followed by LF, as well as CR, LF, and NL as hard line breaks:000Dx000A- CARRIAGE RETURN (CR) x LINE FEED (LF)000D - CARRIAGE RETURN (CR)000A - LINE FEED (LF)0085 - NEXT LINE (NEL)Here I'm looking at the most popular languages used in VS Code for which I could find a specification or where the decision is not pushed out to specific implementations (looking at you, C++).
It looks like the JS spec Table 33 defines the following as line terminators. I suppose TS uses the same:

Interestingly, the C# spec A.1.1 Line terminators defines the same line terminators:

HTML however defines newlines as just CR, LF and CRLF:

Python also defines just CR, LF and CRLF:

PHP also defines just CR, LF and CRLF:

Java defines LF, CR and CRLF:

YAML section 3.1.4 Line Breaks uses the following:

To sum up:
| char/seq | Unicode | JS | C# | HTML | Python | PHP | Java | YAML |
|----------|---------|----|----|------|--------|-----|------|------|
| CRLF | β
| β
| β
| β
| β
| β
| β
| β
|
| CR | β
| β
| β
| β
| β
| β
| β
| β
|
| LF | β
| β
| β
| β
| β
| β
| β
| β
|
| LS | β
| β
| β
| | | | | β
|
| PS | β
| β
| β
| | | | | β
|
| NEL | β
| | | | | | | β
|
| FF | β
| | | | | | | |
| VT | β
| | | | | | | |
I am not sure what would be best, since it would be very difficult to make VS Code end-of-line sequence dependent on the file's language and changing things now might break language extensions which follow our definition.
@dbaeumer Is the end-of-line sequence something that is specified in the Language Server Protocol?
Actually, a straight forward way to tackle this would be for us to prompt users and ask for permission to "fix" their files. Most likely, LS, PS or NEL are inserted by accident in files by copy-pasting and they are unwanted there anyways. I've opened https://github.com/microsoft/vscode/issues/96142 to track this.
@alexdima yes it is specified here: https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocuments
The LSP support \r, \n and \r\n
Most helpful comment
Currently, VS Code considers the following characters/sequences to be line terminators:
CRLF,CRorLF.I feel that this is a difficult topic to get wide agreement on, since we need to work with multiple programming languages, each one with its own definition of what a line terminator is. So, if a compiler creates a diagnostic at line 100, it appears that line number is defined according to the programming language's own specification.
From my reading of https://www.unicode.org/reports/tr14/tr14-32.html , Unicode defines a mandatory line break after the following:
BK). In theBKclass there are the following:000C- FORM FEED (FF)000B- LINE TABULATION (VT) - [OPTIONAL]2028- LINE SEPARATOR (LS)2029- PARAGRAPH SEPARATOR (PS)CRfollowed byLF, as well asCR,LF, andNLas hard line breaks:000Dx000A- CARRIAGE RETURN (CR) x LINE FEED (LF)000D- CARRIAGE RETURN (CR)000A- LINE FEED (LF)0085- NEXT LINE (NEL)Here I'm looking at the most popular languages used in VS Code for which I could find a specification or where the decision is not pushed out to specific implementations (looking at you, C++).
It looks like the JS spec Table 33 defines the following as line terminators. I suppose TS uses the same:

Interestingly, the C# spec A.1.1 Line terminators defines the same line terminators:

HTML however defines newlines as just

CR,LFandCRLF:Python also defines just

CR,LFandCRLF:PHP also defines just

CR,LFandCRLF:Java defines

LF,CRandCRLF:YAML section 3.1.4 Line Breaks uses the following:

To sum up:
| char/seq | Unicode | JS | C# | HTML | Python | PHP | Java | YAML |
|----------|---------|----|----|------|--------|-----|------|------|
| CRLF | β | β | β | β | β | β | β | β |
| CR | β | β | β | β | β | β | β | β |
| LF | β | β | β | β | β | β | β | β |
| LS | β | β | β | | | | | β |
| PS | β | β | β | | | | | β |
| NEL | β | | | | | | | β |
| FF | β | | | | | | | |
| VT | β | | | | | | | |
I am not sure what would be best, since it would be very difficult to make VS Code end-of-line sequence dependent on the file's language and changing things now might break language extensions which follow our definition.
@dbaeumer Is the end-of-line sequence something that is specified in the Language Server Protocol?