I'd love to see some sort of Regular Expression syntax in TOML, whether (preferably) as its own data type or as an alternate string syntax.
I suggest the seemingly-ubiquitous slash-delimited ECMAScript syntax as the forward slash is a unique initial character in the entire val subtree (unless I've missed something), so it slots nicely into the current syntax definition.
Big ones:
Small ones:
- i for case insensitivity, which is sometimes nice to have for power users

Seeing as there's no one specific regex standard, I imagine TOML would have to take one of these paths to accomplish this:
Number 3 above seems to be the most feasible solution to me - the language spec can neglect the specifics of the implementation, text editor validators can check for the common things like matching brackets while leaving the specifics up to the parser, and the parser only has to worry about the regex flavor relevant to it.
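To make the "check the common things" idea concrete, here is a minimal sketch of a flavor-agnostic check an editor validator might run. The function name `brackets_balanced` is hypothetical and not part of any TOML tooling. It only tracks parentheses and `[...]` character classes, and it assumes a backslash always escapes the next character. Flavor-dependent corner cases, such as a literal `]` placed first inside a class, are deliberately not handled.

```python
def brackets_balanced(pattern: str) -> bool:
    """Rough, flavor-agnostic check that ( ) and [ ] are balanced."""
    depth = 0          # number of currently open parentheses
    in_class = False   # True while inside a [...] character class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2     # skip the escaped character, whatever it is
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return False   # closing parenthesis with nothing open
            depth -= 1
        i += 1
    return depth == 0 and not in_class
```

Anything beyond this (quantifier syntax, supported escapes, flags) would still be left to whichever regex engine the consuming parser uses.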
If that's good enough, I think this could be a powerful (yet easy to implement) feature for the language.
Why are literal strings not enough for this?
Literal strings are surrounded by single quotes. Like basic strings, they must appear on a single line:

    # What you see is what you get.
    winpath = 'C:\Users\nodejs\templates'
    winpath2 = '\\ServerX\admin$\system32\'
    quoted = 'Tom "Dubs" Preston-Werner'
    regex = '<\i\c*\s*>'

Since there is no escaping, there is no way to write a single quote inside a literal string enclosed by single quotes. Luckily, TOML supports a multi-line version of literal strings that solves this problem.
Literal strings don't really do either of these things:
- It catches regular expression syntax errors (which are plentiful in config files) at the validation step for parsers
  - In the text editor as well, to at least some degree
- As its own data type: It enables specifying either a regular expression or a literal string - a common and powerful pattern useful when, for instance, specifying file paths
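For illustration, this is roughly how the "regex or literal string" pattern tends to be approximated at the application level today, when all the parser hands over is a plain string. It is only a sketch: the `/.../` convention and the name `as_matcher` are assumptions made up for this example, not anything defined by TOML.

```python
import re
from typing import Callable


def as_matcher(value: str) -> Callable[[str], bool]:
    """Hypothetical convention: '/.../' means regex, anything else is literal."""
    if len(value) >= 2 and value.startswith("/") and value.endswith("/"):
        pattern = re.compile(value[1:-1])  # strip the delimiting slashes
        return lambda text: pattern.search(text) is not None
    # Not slash-delimited: compare as an exact literal string.
    return lambda text: text == value


is_python_file = as_matcher(r"/\.py$/")
is_readme = as_matcher("README.md")
```

The weakness of this workaround is that a literal value such as `/usr/` becomes ambiguous, since the application cannot tell whether it was meant as a path or as a pattern.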
TOML does not have a blessed regex format, and as you've noted, it doesn't make sense for it to have one. Different regex engines have very different behaviors and I _really_ don't like the idea of making something that fundamental to a literal (i.e. how it is interpreted) as ambiguous as this would be.
If this is for highlighting/completion in text editors, I don't see why this needs to be in the TOML standard -- folks are very welcome to write plugins and syntaxes for TOML that special case the literal strings to be treated a certain way. MagicPython makes such a choice for raw literals in Python, and the same can be done in a TOML highlighter as well. This obviously does mean that someone's making opinionated choices, which someone might disagree with, but that's fine for something that's opt-in. When a change is made in a standard that's implemented across various programming languages and whatnot, such opinionated choices do need to have a strong unambiguous behaviour, which this would not.
Overall, I think this addition would not justify the sheer complexity it'd bring to TOML. It adds an ambiguous literal, pushes additional complexity onto TOML's parser implementations and doesn't really bring much value over what is already possible. I don't think this needs to happen at the "TOML standard" level.
Ah, I was expecting that answer. Thanks anyways for your quick response @pradyunsg