Toml: Proposal: Regular Expression literals

Created on 21 Feb 2021 · 4 Comments · Source: toml-lang/toml

I'd love to see some sort of Regular Expression syntax in TOML, whether (preferably) as its own data type or as an alternate string syntax.

I suggest the seemingly ubiquitous slash-delimited ECMAScript syntax. As far as I can tell, nothing else in the `val` subtree of the grammar begins with a forward slash (unless I've missed something), so it slots nicely into the current syntax definition.
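
To illustrate why a leading slash is easy to slot in, here is a minimal Python sketch of how a parser might scan such a literal. The name `scan_regex_literal` and the rules it applies (backslash escapes, an unescaped `/` allowed inside a character class, no flags, single line only) are assumptions loosely borrowed from ECMAScript, not anything the TOML grammar defines.

```python
def scan_regex_literal(src: str, pos: int = 0):
    """Scan a hypothetical /.../ literal starting at src[pos].

    Returns (body, end) where end is the index just past the closing slash.
    Raises ValueError if the literal is missing or unterminated.
    """
    if pos >= len(src) or src[pos] != "/":
        raise ValueError("not a regex literal")
    i = pos + 1
    in_class = False  # inside [...] a slash does not close the literal
    while i < len(src):
        ch = src[i]
        if ch == "\n":
            break  # assumed single-line, like basic strings
        if ch == "\\" and i + 1 < len(src) and src[i + 1] != "\n":
            i += 2  # skip the escaped character, whatever it is
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        elif ch == "/" and not in_class:
            return src[pos + 1:i], i + 1
        i += 1
    raise ValueError("unterminated regex literal")


body, end = scan_regex_literal(r"/a\/b[/]c/ # trailing comment")
print(body, end)
```

The scanner only finds where the literal ends. It never interprets the body, so it stays independent of any particular regex flavor.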

Pros

Big ones:

- It catches regular expression syntax errors (which are plentiful in config files) at the validation step for parsers
  - In the text editor as well, to at least some degree
- As its own data type: It enables specifying either a regular expression or a literal string - a common and powerful pattern useful when, for instance, specifying file paths

Small ones:

- It enables syntax highlighting in text editors
- ~~It could support the common regex flags, like i for case insensitivity, which is sometimes nice to have for power users~~
- As its own data type: Parser APIs can return a native regex type directly
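
As a rough illustration of the "regex or literal string" pattern, here is a small Python sketch. It assumes a parser that hands back a compiled `re.Pattern` for regex values and a plain `str` for string values. The function name `path_matches` is made up for this example.

```python
import re
from typing import Union

# A rule is either an exact string or a compiled regular expression.
Rule = Union[str, "re.Pattern[str]"]


def path_matches(rule: Rule, path: str) -> bool:
    """Return True if path satisfies the rule.

    A compiled pattern must match the whole path; a plain string must be
    equal to it, so characters like '+' or '.' need no escaping.
    """
    if isinstance(rule, re.Pattern):
        return rule.fullmatch(path) is not None
    return rule == path


# The same key could hold either kind of value.
print(path_matches("src/a+b.txt", "src/a+b.txt"))
print(path_matches(re.compile(r"src/a+b\.txt"), "src/aab.txt"))
```

With plain strings only, the application would need a separate key or a naming convention to tell the two cases apart.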

Cons

Seeing as there's no one specific regex standard, I imagine TOML would have to take one of these paths to accomplish this:

1. Define its own regex flavor (Yikes)
2. Pigeonhole itself to one specific regex flavor (perhaps abandoning the goal of being widely usable)
3. Disregard the internal syntax of the regex literal, proceed with the understanding that the regex could be interpreted differently on different platforms (Introduces complexity for the users, also prevents complete validation in a language-agnostic environment like a text editor)
4. Allow the flavor to be specified within the syntax somehow (Introduces complexity for the language and its developers; generally kinda jank)

Number 3 above seems to be the most feasible solution to me - the language spec can leave the specifics to the implementation, text editor validators can check for common things like matching brackets while leaving the rest to the parser, and each parser only has to worry about the regex flavor relevant to it.
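
The kind of flavor-agnostic check an editor could run might look like the Python sketch below. It is a heuristic under stated assumptions: only parentheses and character classes are checked, a backslash escapes the next character, and a stray `]` outside a class is tolerated. The name `brackets_balanced` is hypothetical, and flavor-specific quoting such as `\Q...\E` is not handled.

```python
def brackets_balanced(body: str) -> bool:
    """Heuristic check that groups and character classes are closed."""
    depth = 0         # number of currently open '(' groups
    in_class = False  # True while inside a [...] character class
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if in_class:
            # Inside a class, only ']' is structural.
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return False  # closing a group that was never opened
            depth -= 1
        i += 1
    return depth == 0 and not in_class


for sample in [r"(a|b)[)]", r"(a|b", r"a\(b", r"[abc"]:
    print(sample, brackets_balanced(sample))
```

Anything this check accepts can still be rejected later by the parser's own regex engine, which is the division of labor described above.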

If that's good enough, I think this could be a powerful (yet easy to implement) feature for the language.

new-syntax wontfix

Source

jacobconley

Most helpful comment

TOML does not have a blessed regex format, and as you've noted, it doesn't make sense for it to have one. Different regex engines have very different behaviors and I _really_ don't like the idea of making something that fundamental to a literal (i.e. how it is interpreted) as ambiguous as this would be.

If this is for highlighting/completion in text editors, I don't see why this needs to be in the TOML standard -- folks are very welcome to write plugins and syntaxes for TOML that special-case the literal strings to be treated a certain way. MagicPython makes such a choice for raw literals in Python, and the same can be done in a TOML highlighter as well. This obviously does mean that someone's making opinionated choices, which someone might disagree with, but that's fine for something that's opt-in. When a change is made in a standard that's implemented across various programming languages and whatnot, such opinionated choices do need to have a strong unambiguous behaviour, which this would not.

Overall, I think this addition would not justify the sheer complexity it'd bring to TOML. It adds an ambiguous literal, pushes additional complexity onto TOML's parser implementations and doesn't really bring much value over what is already possible. I don't think this needs to happen at the "TOML standard" level.

pradyunsg on 21 Feb 2021

👍3

All 4 comments

Why are literal strings not enough for this?

Literal strings are surrounded by single quotes. Like basic strings, they must appear on a single line:
# What you see is what you get.
winpath  = 'C:\Users\nodejs\templates'
winpath2 = '\\ServerX\admin$\system32\'
quoted   = 'Tom "Dubs" Preston-Werner'
regex    = '<\i\c*\s*>'
Since there is no escaping, there is no way to write a single quote inside a literal string enclosed by single quotes. Luckily, TOML supports a multi-line version of literal strings that solves this problem.
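
With literal strings, the application can still validate the pattern itself once the file is parsed. A rough Python sketch follows, where `compile_regex_fields` and the key names are made up for illustration and the plain dict stands in for whatever a TOML parser returns:

```python
import re


def compile_regex_fields(table: dict, keys: list) -> dict:
    """Return a copy of table with the named string values compiled.

    Raises ValueError naming the offending key if a pattern is invalid
    for Python's `re` flavor.
    """
    compiled = dict(table)
    for key in keys:
        try:
            compiled[key] = re.compile(table[key])
        except re.error as exc:
            raise ValueError(f"{key}: invalid regular expression: {exc}") from exc
    return compiled


# Stand-in for parsed TOML such as:  pattern = '\d+'
config = compile_regex_fields({"name": "demo", "pattern": r"\d+"}, ["pattern"])
print(config["pattern"].fullmatch("123") is not None)
```

Note that the check is tied to one engine: Python's `re` rejects unknown letter escapes such as `\i`, so the `regex` sample above would fail here even though it is valid in the flavor it was written for.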

pradyunsg on 21 Feb 2021

Literal strings don't really do either of these things:

It catches regular expression syntax errors (which are plentiful in config files) at the validation step for parsers
In the text editor as well, to at least some degree

As its own data type: It enables specifying either a regular expression or a literal string - a common and powerful pattern useful when, for instance, specifying file paths

jacobconley on 21 Feb 2021