Are the following cases valid?
x="""""""""
y="""X""""""
Y"""
Personally, I don't think so. According to the spec, I would say that triple quotation marks inside a multi-line string ought to be escaped to prevent confusion with the closing triple quotation marks. The ABNF definition, however, allows this syntax. It is also ambiguous: instaparse returns two possible parses for this input.
When I change the input to
x="""\"\"\""""
y="""X\"\"\"\"\"\"
Y"""
then the ABNF is still ambiguous. One of the parse results is a key/value pair with key "x" whose value is the rest of the document, so the key "y" is swallowed as part of the value of "x" as well.
The minimal case for triggering the ambiguous behavior is this one:
x=""""""
y=""""""
Triplet quotation marks inside multiline strings are clearly forbidden, hence that seems to be a problem with the ABNF.
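To make the ambiguity concrete, here is a minimal Python sketch. It does not use instaparse or the real toml.abnf; it assumes a deliberately permissive rule ("the body may contain any characters, and the string ends at any `"""` that is followed by a newline or the end of input") and lists every body that rule could assign to `x`. The function name `candidate_bodies` is made up for this illustration.

```python
def candidate_bodies(text: str, start: int) -> list[str]:
    """List every body a permissive rule could assign to the multi-line
    basic string whose opening delimiter begins at index `start`.

    Assumption: the body may contain anything, including quotation marks,
    and a closing delimiter must be followed by a newline or end of input.
    """
    delim = '"""'
    assert text.startswith(delim, start)
    body_start = start + len(delim)
    bodies = []
    for pos in range(body_start, len(text) - len(delim) + 1):
        if not text.startswith(delim, pos):
            continue
        end = pos + len(delim)
        if end == len(text) or text[end] == "\n":
            bodies.append(text[body_start:pos])
    return bodies


doc = 'x=""""""\ny=""""""'
bodies = candidate_bodies(doc, doc.index('"""'))
for body in bodies:
    print(repr(body))
```

Under that assumed rule the minimal case yields two readings: an empty string for `x`, or a value for `x` that swallows the whole `y` line.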
As an outsider to the core TOML community, I hardly think it is my place to make a comment on this, but here goes:
I am not sure whether the triple-quote problem can be resolved without some kind of backtracking in the ABNF parser. While this is certainly possible in a technical sense, it obviously introduces its own set of issues. More fundamentally, however, it appears that trying to specify "any sequence of characters not including the triple-quote delimiter" would violate a rule of context-free grammars, though I must say that I am no expert in formal languages.
And now, for my next trick, a completely unwarranted proposal.
Though v0.5.0 has the stated goal of being "close to final", I think this issue is problematic enough to necessitate a breaking change to the TOML specification. In this regard, the back-tick seems like a good alternative to triple-quotes for delimiting multi-line strings.
Thank you for your time and consideration.
I don't think triple-quoted strings are particularly challenging to express in ABNF. Here is a rough attempt at how it might be done, for strings enclosed in three double quotation marks (adaptation to three single quotation marks is easy, except that in TOML those are "multi-line literal strings" which cannot contain escape sequences) and without trying to get all the minor details of the ABNF syntax right (I'm not an expert):
tq-string = 3quotation-mark tq-content 3quotation-mark
quotation-mark = %x22 ; " (copied from toml.abnf)
tq-content = [quotefree] *quote-block
quotefree = 1*(raw-char / escaped)
raw-char = ... ; any character that's not a quotation-mark or otherwise forbidden in strings
escaped = ... ; see current toml.abnf
quote-block = 1*2quotation-mark quotefree
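For readers who want to try the sketch above against the inputs from the top of this thread, here is a rough Python translation into a regular expression. The `raw-char` and `escaped` rules are elided in the sketch, so the character classes below are placeholder assumptions (any character except a quotation mark or backslash, and a backslash followed by any one character); the real toml.abnf rules are stricter. The names `QUOTEFREE`, `QUOTE_BLOCK`, `TQ_STRING`, and `is_tq_string` are invented for this example.

```python
import re

# Assumption: raw-char = anything except '"' and '\';
# escaped = a backslash followed by any single character.
QUOTEFREE = r'(?:[^"\\]|\\.)+'           # quotefree = 1*(raw-char / escaped)
QUOTE_BLOCK = r'"{1,2}' + QUOTEFREE      # quote-block = 1*2quotation-mark quotefree
TQ_STRING = re.compile(
    r'"""'                               # 3quotation-mark
    + r'(?:' + QUOTEFREE + r')?'         # [quotefree]
    + r'(?:' + QUOTE_BLOCK + r')*'       # *quote-block
    + r'"""',                            # 3quotation-mark
    re.DOTALL,
)


def is_tq_string(text: str) -> bool:
    """Return True if the whole text is one tq-string under the sketch."""
    return TQ_STRING.fullmatch(text) is not None


print(is_tq_string('"""a"b""c"""'))                     # True
print(is_tq_string('"""X""""""\nY"""'))                 # False
print(is_tq_string('"""X\\"\\"\\"\\"\\"\\"\nY"""'))     # True
print(is_tq_string('""""a"""'))                         # True
print(is_tq_string('"""a""""'))                         # False
```

Because every `quote-block` must end in `quotefree`, a run of three or more unescaped quotation marks can only ever be the closing delimiter, which removes the ambiguity. As written, though, the sketch accepts one or two quotation marks directly after the opening delimiter but rejects them directly before the closing one.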
@ChristianSi Yes, that looks sane to me from a logical standpoint.
@onalant A backtick could work, but then you'd need yet another character for multi-line literal strings too. That would not make the configuration any easier for humans to work with. Right now, it's double quotes for basic strings, single quotes for literal strings, and tripling them means "multi-line". That's easier to remember than having four distinct single-character string delimiters, IMO.
More importantly, this would break backward compatibility quite a bit, since the multi-line strings using the triple quotes have been in the specification for a long time now.
The issue might be problematic for those who base a parser or validator on the ABNF, but let's fix it where the problem lies: in the ABNF, not in the specification.
That makes perfect sense. Thank you for your time.
This looks pretty good. We'll need two variants for single-quote and double-quote versions. We'll also need to account for a single instance of quotes and a double instance of quotes, since quotefree always requires at least one non-quote character.
Let me see what I can do tonight.
Two weeks later, finally got something that works. See the PR for details.
I'd been stuck on an implementation that worked for multiline basic strings but got stuck in Instaparse for multiline literal strings. I wanted to get both right.
How's it look to you all?
Or more to the point, would this work?
Haven't spent much time thinking about this, but here's a hot-take style thought: unless there are some "major" backward compatibility concerns around it, anything that's not exactly 1 or 3 quotes should probably be invalid (unless escaped).
For double-quote marks and multiline basic strings, I'm arguing on #648 that it's better for end-users to allow one or two quotes just inside the delimiters.
pradyunsg: But surely you don't want to forbid one or two quotes at the start of a string, provided they are followed by a non-quote character? Those are allowed in Python, and considering that they are allowed anywhere else in a multi-line string (provided they are followed by a non-quote character), it would be highly illogical to forbid them at the start.
""""This is just a pointless statement," she said."""
# "This is just a pointless statement," she said.
"""""Why would anyone enclose my thoughts in double double quotation marks?"" she wondered."""
# ""Why would anyone enclose my thoughts in double double quotation marks?"" she wondered.
@mojombo ^
I'm gonna have to defer to you here to decide. This issue is about how we want to handle additional quotes in triple-quoted strings. It's super vague and mostly just a polishing of the current specification. However, to make the ABNF canonical, we should have this figured out.
Bumping this to critical path.
I spent a bit of time playing around in a text editor removing and adding \ and quotes -- I think @eksortso's suggestion of "one or two quotes inside is OK" sounds like a good approach that's easy to explain and understand. :)
Since #648 has merged, this can be closed. :)