TOML: why can a multiline basic string now end with " inside, while a multiline literal string still can't end with '?

Created on 12 Apr 2020 · 8 Comments · Source: toml-lang/toml

I see that this is now valid in v1.0.0-rc.1, unlike in v0.5.0:

str = """she said: "...""""

But what's the rule? How should I write the parser, and how do I remember it while writing a TOML config file? Is this valid too?

str = """..."""""

even:

str = """...""""""

?

If they are all valid, then why is this invalid?

str = '''15: ''''''''''''''''''

I saw "You can write 1 or 2 single quotes anywhere within a multi-line literal string, but sequences of three or more single quotes are not permitted." in the spec.

Does this mean that both multiline basic and multiline literal strings can end with quote marks, as long as there are fewer than three of them? If so, I don't think the current documentation makes this clear, either in its explanations or in its examples.

A PR would be easy, but I want to make sure first.

All 8 comments

The intention is to illustrate that the first time the sequence ''' is seen inside a multi-line literal string it terminates the string, so for practical purposes you can't have three of them consecutively unless you intend to end the string.

You're right to point out that it's not the same as regular ML strings, though. I have no idea why a distinction is made between the two on this point. @pradyunsg, any input here?

Since I wrote the original PR, I'll do my best to explain. If we do need better examples than the ones in the README, then go ahead and post them.

Sorry @marzer, but triple quotes don't immediately end a string if there are quotes at the end of the string. I know that was the original plan. But it's slightly different now.

Sequences of one or two 's are allowed within multiline literal strings. By extension, a single ''' is not necessarily the end of the string, because 1 or 2 quotes are allowed at the end. So a run of 3, 4, or 5 quotes terminates an MLL string that ends with 0, 1, or 2 quotes, respectively. And in MLBs, you could even have 6 in a row, if the first one is escaped:

this-in-quotes-plus-two-more = """"THIS\""""""
#                                 ^^^^^-^^^
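For parser writers, the rule above can be sketched as a small scanner. This is only an illustration in Python, not code from the TOML project or from any existing parser; the function name scan_ml_body and its signature are made up for this example. It assumes the opening delimiter has already been consumed, leaves escape sequences undecoded, and does not trim a newline that directly follows the opening delimiter.

```python
from typing import Tuple


def scan_ml_body(src: str, pos: int, quote: str) -> Tuple[str, int]:
    """Scan a multi-line string body (hypothetical helper).

    src[pos] is the first character after the opening delimiter.
    Returns (raw_body, index_just_after_closing_delimiter).
    """
    is_basic = quote == '"'  # only basic strings have backslash escapes
    body = []
    i = pos
    while i < len(src):
        ch = src[i]
        if is_basic and ch == "\\":
            # Keep the escape pair raw so an escaped quote never joins a run.
            if i + 1 >= len(src):
                raise ValueError("dangling backslash")
            body.append(src[i:i + 2])
            i += 2
            continue
        if ch != quote:
            body.append(ch)
            i += 1
            continue
        # Measure the run of consecutive unescaped quote characters.
        j = i
        while j < len(src) and src[j] == quote:
            j += 1
        run = j - i
        if run <= 2:
            # 1 or 2 quotes followed by something else: ordinary content.
            body.append(quote * run)
            i = j
        elif run <= 5:
            # 3, 4 or 5 quotes: the last three close the string,
            # the first 0, 1 or 2 belong to the content.
            body.append(quote * (run - 3))
            return "".join(body), j
        else:
            raise ValueError("run of %d quotes is not allowed" % run)
    raise ValueError("unterminated multi-line string")


# Example 1 from this thread; the body starts at index 3.
body, end = scan_ml_body('"""she said: "...""""', 3, '"')
```

The design choice here is to look at the whole run of quotes before deciding, instead of stopping at the first three. A run of 6 or more unescaped quotes is rejected outright, because whatever is left after the closing delimiter could never be valid directly after a value.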

@LongTengDao, let me point out the strings in your examples. I'll keep highlighting turned off, because as of now, GitHub's TOML highlighting is out of date.

# Example 1
str = """she said: "...""""
#        ^^^^^^^^^^^^^^^
# Four quote marks at the end;
# the first one is part of the string,
# and the last three terminate the string.

# Example 2 is VALID.
str = """..."""""
#        ^^^^^
# Same rule: 5 quotes, 2 on the inside.

# Example 3 is NOT VALID.
#str = """...""""""
##        ^^^^^!
# 6 quotes in a row are not allowed.
# You can have 1 or 2 quotes in a row on the inside, but not 3.

# So Example 4 is also NOT VALID.
#str = '''15: ''''''''''''''''''
#         ^^^^^^!!!!!!!!!!!!!!!!

The original discussion is here: #640. This is an example of formatting that is intended to be obvious to users, even if it isn't immediately obvious or necessarily convenient for parser developers.

@eksortso Yeah, I know how the string ending works (i.e. that """ doesn't end the string if there are more " after it), but I was referring to the documentation of the ML literal strings, which upon re-reading seemed ambiguous, though I guess it says the same thing.

@eksortso Do you mean this?

quot0=""" """ # valid
quot1=""" """" # valid
quot2=""" """"" # valid
quot3=""" """""" # invalid
apos0=''' ''' # valid
apos1=''' '''' # valid
apos2=''' ''''' # valid
apos3=''' '''''' # invalid

@marzer @eksortso Thank you! I read the earlier issue and PR that I had missed, and it's clear now.

@eksortso Just to make sure I understand this properly: essentially, after you strip the three single or double quotes from the beginning and end of a multi-line string, the part that's left can contain single or double quotes respectively, but no more than two consecutively (or three in a basic string, if the first is escaped).

Taking @LongTengDao's example a little further gives:

quot0=""" """ # valid
quot1=""" """" # valid
quot2=""" """"" # valid
quot3=""" """""" # invalid
apos0=''' ''' # valid
apos1=''' '''' # valid
apos2=''' ''''' # valid
apos3=''' '''''' # invalid

quot4="""""" # valid (6 in a row, empty string)
quot5="""" """ # valid
quot6=""""" """ # valid
quot7="""""" """ # invalid
apos4='''''' # valid (6 in a row, empty string) 
apos5='''' ''' # valid
apos6=''''' ''' # valid
apos7='''''' ''' # invalid

quot8="""""\"""""" # valid (5 at start, 6 at end)
quot9="""""\"""\"""""" # valid (5 at start, 3 in the middle, 6 at end)

A little extreme maybe, but handy to test parsers.
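The cases above can also be checked mechanically. The following Python sketch transcribes the rule into two regular expressions and rebuilds the table programmatically, which avoids quoting headaches in the source. All names (ML_BASIC, ML_LITERAL, is_valid_ml_string, thread_cases) are hypothetical, and this is an approximation rather than a full TOML validator: it checks a bare value only (not the key = value line), it skips over escape sequences in basic strings without validating them, and it does not reject control characters.

```python
import re

# Body: any non-quote character, an escape pair (basic strings only), or
# 1-2 quotes NOT followed by another quote. Then up to 2 quotes may sit
# directly in front of the closing delimiter.
ML_BASIC = re.compile(r'"""(?:[^"\\]|\\.|"{1,2}(?!"))*"{0,2}"""', re.DOTALL)
ML_LITERAL = re.compile(r"'''(?:[^']|'{1,2}(?!'))*'{0,2}'''", re.DOTALL)


def is_valid_ml_string(value: str) -> bool:
    """True if the whole value is one multi-line basic or literal string."""
    return bool(ML_BASIC.fullmatch(value) or ML_LITERAL.fullmatch(value))


def thread_cases():
    """Rebuild the quot0..quot9 / apos0..apos7 table as (value, expected)."""
    cases = []
    for q in ('"', "'"):
        cases += [
            (q * 3 + " " + q * 3, True),   # quot0 / apos0
            (q * 3 + " " + q * 4, True),   # quot1 / apos1
            (q * 3 + " " + q * 5, True),   # quot2 / apos2
            (q * 3 + " " + q * 6, False),  # quot3 / apos3
            (q * 6, True),                 # quot4 / apos4 (empty string)
            (q * 4 + " " + q * 3, True),   # quot5 / apos5
            (q * 5 + " " + q * 3, True),   # quot6 / apos6
            (q * 6 + " " + q * 3, False),  # quot7 / apos7
        ]
    bs = "\\"
    cases.append(('"' * 5 + bs + '"' * 6, True))                   # quot8
    cases.append(('"' * 5 + bs + '"' * 3 + bs + '"' * 6, True))    # quot9
    return cases


# Collect every case where the regex verdict differs from the table.
mismatches = [(v, e) for v, e in thread_cases() if is_valid_ml_string(v) != e]
```

An empty mismatches list means the two expressions agree with every verdict in the table. The same (value, expected) pairs could be fed to a real parser as regression tests.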

@abelbraaksma @LongTengDao Those examples would serve as really good tests, I think!

Thanks @abelbraaksma, you just helped me fix some bugs! :D
