Toml: Clarify multiline string ending in backslash

Created on 29 May 2021 · 4 Comments · Source: toml-lang/toml

The spec seems a bit ambiguous about a particular edge case: a multi-line basic string that ends in a backslash.

Example:

s="""\"""

On one hand, the spec says

When the last non-whitespace character on a line is an unescaped \, it will be trimmed along with all whitespace (including newlines) up to the next non-whitespace character or closing delimiter.

Based on this, I'd interpret the example as a valid (empty) string.

On the other hand, the spec also says

Any Unicode character may be used except those that must be escaped: backslash and the control characters other than tab, line feed, and carriage return

Since no other character follows a string-ending backslash, one could argue that it does not start an escape sequence and is therefore itself an unescaped (illegal) backslash.

It would be nice to have clarification on whether the last character of a multiline basic string is allowed to be a backslash or not.

All 4 comments

Although you say that you have a backslash at the end of your string, that is certainly not the case. In your example, you have """ which opens a multiline basic string, a sequence \" which represents an escaped double-quote character, a pair of double quotes "", and then... nothing. The example is simply invalid. Three quote marks without context don't necessarily end a multiline basic string.

So an illustration of an empty string value using the backslash would be something like this:

s="""\

  """

When the spec talks about backslashes at the end of lines inside multiline basic strings, the "lines" are actual _physical lines_. The backslash is immediately followed by optional whitespace then an end-of-line sequence. That is clearly defined by the text and backed up by the ABNF. See mlb-escaped-nl for the pattern, and compare that to allowed quote patterns and escape sequences.
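
The mlb-escaped-nl rule mentioned above can be transcribed into a regular expression to make the "physical line" reading concrete. This is only a sketch: the rule text in the comment is quoted from the TOML v1.0.0 ABNF, while the names MLB_ESCAPED_NL and is_line_ending_backslash are hypothetical helpers invented for this illustration.

```python
import re

# Transcription of the TOML v1.0.0 ABNF rule
#   mlb-escaped-nl = escape ws newline *( wschar / newline )
# escape is a backslash, wschar is space or tab, newline is LF or CRLF.
MLB_ESCAPED_NL = re.compile(r"\\[ \t]*(?:\r?\n)(?:[ \t]|\r?\n)*")


def is_line_ending_backslash(text: str, pos: int) -> bool:
    """Return True if the backslash at text[pos] is a line-ending backslash."""
    return text[pos] == "\\" and MLB_ESCAPED_NL.match(text, pos) is not None


# Backslash, newline, blank line, indentation, then the closing delimiter.
print(is_line_ending_backslash('\\\n\n  """', 0))
# Backslash directly before the quotes: no newline follows, so no match.
print(is_line_ending_backslash('\\"""', 0))
# Backslash, a space, then the quotes: still no newline, so no match.
print(is_line_ending_backslash('\\ """', 0))
```

The pattern requires at least one newline after the backslash and its optional trailing whitespace, which is why a backslash that sits directly before the closing delimiter never qualifies.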

Ah, so my interpretation of "a line" in this context was wrong, I guess. I was thinking of lines in the string, whereas the spec means a line in the TOML source. Thanks for clarifying!

"That is clearly defined by the text"

I'm not sure that the spec is too obvious about this, or maybe you can point me to the right location? I don't think it would hurt to extend the already existing example about these whitespace escapes a bit:

# The following strings are byte-for-byte equivalent:
str1 = "The quick brown fox jumps over the lazy dog."

str2 = """
The quick brown \


  fox jumps over \
    the lazy dog."""

str3 = """\
       The quick brown \
       fox jumps over \
       the lazy dog.\
       """

# INVALID: The last backslash is not the last non-whitespace character on its line
# str4 = """
# The quick brown \
# jumps over \
# lazy dog.\       """

Note that I only added the invalid str4. The other stuff is already in spec. What do you think?

I think these are invalid for the real intention of the spec:

"""\"""
"""\ """

An extra example shouldn't be necessary, because of how TOML documents are defined. Before we can talk about strings, especially multiline strings, we have to talk about what lines are. The spec defines newlines early on. So once we get to multiline basic strings and line-ending backslashes in the spec, it's clear what a line is. That's why the examples put backslashes before newlines: a backslash before a string delimiter does not end a line.
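
The cases discussed in this thread can also be tried against an actual parser. The sketch below assumes Python 3.11+, where tomllib is in the standard library (with the tomli backport as a fallback on older versions). The helper parse_or_error and the constant names are hypothetical, made up for this illustration. A parser's behaviour is not the spec, so this only shows how one implementation reads it.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed


def parse_or_error(doc: str):
    """Return the parsed value of key 's', or the decode error that was raised."""
    try:
        return tomllib.loads(doc)["s"]
    except tomllib.TOMLDecodeError as exc:
        return exc


# Backslash at the end of a physical line, followed by a blank line and indentation.
EMPTY_VIA_BACKSLASH = 's = """\\\n\n  """\n'
# Backslash directly before the quotes: read as an escaped quote, string never closes.
BACKSLASH_THEN_DELIMITER = 's = """\\"""\n'
# Backslash, one space, then the quotes: no newline follows the backslash.
BACKSLASH_SPACE_DELIMITER = 's = """\\ """\n'

for name, doc in [
    ("empty via backslash", EMPTY_VIA_BACKSLASH),
    ("backslash then delimiter", BACKSLASH_THEN_DELIMITER),
    ("backslash, space, delimiter", BACKSLASH_SPACE_DELIMITER),
]:
    print(f"{name}: {parse_or_error(doc)!r}")
```

Under these assumptions the first document parses to an empty string and the other two are rejected, matching the conclusion reached in the comments above.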
