Toml: Few ideas

Created on 24 Jan 2015 · 10 Comments · Source: toml-lang/toml

Let me begin with a little story: About half a year ago I was inspired by TOML to make another simple language for storing structured data. I modified an existing TOML parser and even wrote a spec, but never actually used it... Until now. Recently I decided to reconsider using TOML, so I came here and reread the spec. It still doesn't have all the features that I need for more complicated structures, but it's much closer to my needs. So close, actually, that I decided to make my language a _strict superset of TOML_. Then I thought to myself: Why not just try to propose these changes to TOML itself? And here I am.

You can play with this web-based TOML<->JSON converter to test all of these features (multi-line strings and dates are not yet implemented).
But first, take a look at this:

# part of actual config for my RESTful API generator
[user.property.id]
type = str
validate.match = /[0-9A-Z_a-z]+/
normalize.strip_whitespace

[user.property.email]
type = email

[user.property.avatar]
type = picture
validate.filesize.max = 5M
normalize.transcode.small = { w=32, h=32, type="png" }
normalize.transcode.big =   { w=64, h=64, type="png", no_zoom }

Here's a list of things that I'd like to add (ordered by importance, most needed first):

  1. Allow keys in key-value pairs to actually be paths like a.b.c = 42. Very useful when you need to define an object that just happens to have only one or two properties.
    Of course you could just use underscores (validate_filesize_max) and split it on the application side, but I think it should be handled by the parser, so the example application can just take all the properties of the validate object (which is usually only one) and interpret them as it wishes.
  2. Inline Objects. Seriously, we need this. It fits the language definition: it's "obvious", since people are already familiar with such notation, and it definitely is minimal. Adding this to an existing TOML parser written in PEG took me literally 5 seconds; yeah, I was surprised as well.
    There are situations (like user.property.avatar.normalize.transcode in the example), that just can't be handled in TOML without repeating tons of variable names.
  3. First-class regex. Do I need to justify that? It would be very cool to have. Regexes could go between slashes, and a "bare regex" would just be a full regex beginning with ^ and ending with $.
  4. Allow bare strings as values to make keys and strings more consistent. Not sure about multi-line strings as keys, though. That would be silly, but it would basically unify the definitions of key and string.
  5. Extra suffixes for numbers, in particular k, M, G and %, equivalent to e3, e6, e9 and e-2 respectively, plus ki, Mi and Gi, which are similar but multiply the value by 2^10, 2^20 and 2^30 (more appropriate for file sizes). So 1ki would mean 1024.
  6. Drop the true/false keywords. Allow value-less keys (meaning true) and interpret the lack of a key as false (not backwards-compatible).
    The cool feature this enables is sets:

set = { "foo", "bar", "quix" } # duplicate values are an error
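To make item 5 concrete, here is a minimal sketch of how a parser might evaluate such suffixes, assuming the number token has already been isolated as a string. The function name parse_magnitude is hypothetical and not part of TOML or any existing parser; decimal suffixes are rewritten into the exponent form described above, and binary suffixes multiply by the matching power of two.

```python
# Hypothetical helper, not part of any TOML spec or library.
# Decimal suffixes map onto the exponent notation from item 5.
DECIMAL = {"k": "e3", "M": "e6", "G": "e9", "%": "e-2"}
# Binary suffixes multiply by 2^10, 2^20 and 2^30.
BINARY = {"ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_magnitude(text: str) -> float:
    """Parse a number with an optional magnitude suffix, e.g. '5M' or '1ki'."""
    text = text.strip()
    for suffix, factor in BINARY.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    for suffix, exponent in DECIMAL.items():
        if text.endswith(suffix):
            # '5M' -> '5e6', '50%' -> '50e-2'
            return float(text[: -len(suffix)] + exponent)
    return float(text)  # no suffix: plain number


for token in ["5M", "1ki", "50%", "3k", "42"]:
    print(token, parse_magnitude(token))
```

With this, "5M" evaluates to 5000000.0 and "1ki" to 1024.0. Returning a float everywhere keeps the sketch short; a real implementation would probably want to keep integer inputs as integers.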

Before somebody points out that it's not minimal: note that my parser is currently just 200 lines of code!
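Under item 6, a set like the one above is simply an inline object whose keys carry no values, each key meaning true. A minimal sketch of that interpretation follows; parse_set is a hypothetical name, and it assumes only simple double-quoted entries without escapes or embedded commas.

```python
import re


def parse_set(body: str) -> dict[str, bool]:
    """Turn '{ "foo", "bar" }' into {"foo": True, "bar": True}.

    Hypothetical sketch of item 6: a value-less key means true.
    Only plain double-quoted entries (no escapes, no commas) are handled.
    """
    inner = body.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("set must be wrapped in braces")
    result: dict[str, bool] = {}
    for item in inner[1:-1].split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty sets and trailing commas
        match = re.fullmatch(r'"([^"\\]*)"', item)
        if match is None:
            raise ValueError(f"unsupported set entry: {item!r}")
        key = match.group(1)
        if key in result:
            raise ValueError(f"duplicate set entry: {key!r}")  # values must be unique
        result[key] = True
    return result


print(parse_set('{ "foo", "bar", "quix" }'))
```

Mapping the set onto a table of booleans means applications can test membership with an ordinary key lookup; the duplicate check implements the "when values are not unique we get error" rule.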

All 10 comments

There are some good points you have made here, but I think it might be difficult to discuss your proposal as a whole. Because your ideas are not directly related to each other and can be added independently, I suggest that you open an individual issue for each of them.
Inline tables are already discussed here: #235.

I'm currently writing a spec for my superset of TOML. Of course I'm gonna name it NOML (I'm so original, I know). I think that splitting some extra features into a separate language is actually a better idea. Leaving open for now.

Hey, @phaux, thanks for these thoughts. There are some interesting ideas in here. I'll address them each quickly and maybe explore some of them via PRs.

  1. Allow keys in key-value pairs to actually be paths. This might be nice. I need to spend some time exploring the ramifications of this, but I don't hate it. =)
  2. Inline Objects. Thanks for the feedback, this is still likely to go in.
  3. First-class regex. I understand where you're coming from, and I've put first-class regexen in specs before (BERT), but I just don't see it as a common enough use case to add to TOML.
  4. Allow bare string as values. I don't like this at all. One of the primary reasons I created TOML in the first place was because the ambiguity/non-obviousness of bare strings in YAML drives me absolutely bonkers.
  5. Extra suffixes for numbers. I could be convinced that these would be a valuable addition.
  6. Drop true/false keywords. I find explicit booleans extremely valuable, and I'm not quite sure what you're getting at with your explanation. Can you elaborate?

I vote for (1) and (2).

And (7): Allow value-less keys, but not for inline objects like @phaux said in (6). e.g.

key = #nothing, default: null or false or an empty object? or ignore the key?

Now I'm going to roll my own parser. :wink:

I've gone back to using CSON for my config files. My TOML parser still works, though; I moved it there from CodePen.

Please allow keys in key-value pairs to actually be paths. At the moment there is no readable syntax available for the case of multiple small tables embedded at a deeper level of the hierarchy:

  • Inline Table syntax does not fit because it is restricted to a single line;
  • Table syntax (absolute path in square brackets) does not fit because of the excessive repetition of the fully-qualified name of the super-table. Bits of useful information are hard to spot in this flood.
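A minimal sketch of what "keys as paths" asks of a parser: each dotted key walks (and, where needed, creates) nested tables before assigning its value, and reusing a plain value as a table is rejected. The helper name expand_dotted is hypothetical, and the sketch splits on every dot, so quoted keys that contain dots are not supported.

```python
from typing import Any


def expand_dotted(pairs: list[tuple[str, Any]]) -> dict[str, Any]:
    """Build nested tables from dotted keys, e.g. ("a.b.c", 42) -> {"a": {"b": {"c": 42}}}.

    Hypothetical sketch: keys are split on every '.', so quoted keys
    containing dots are not handled.
    """
    root: dict[str, Any] = {}
    for dotted_key, value in pairs:
        *parents, leaf = dotted_key.split(".")
        table = root
        for part in parents:
            child = table.setdefault(part, {})  # create missing intermediate tables
            if not isinstance(child, dict):
                raise ValueError(f"{dotted_key!r}: {part!r} is already a value, not a table")
            table = child
        if leaf in table:
            raise ValueError(f"duplicate key {dotted_key!r}")
        table[leaf] = value
    return root


avatar = expand_dotted([
    ("type", "picture"),
    ("validate.filesize.max", "5M"),
    ("normalize.transcode.small", {"w": 32, "h": 32, "type": "png"}),
])
print(avatar)
```

Each small sub-table costs one line, with no repeated fully-qualified header, which is the readability gain the request is after.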

Regarding (6), I would actually go with allowing the equals sign to be optional, i.e.

    key

and

    key=

would both result in the same thing.
Internally I would prefer an empty or null or some such value. I find that config files do not always need an explicit true/false; in a flags file, for example, I would prefer to just check for the presence or absence of a flag. I do think it should not be the same as a value of one of the other types, though, such as defaulting to false or true (I support keeping the keywords but also allowing a value-less key).
I would not support having the absence of a value mean false, though, as there is a difference between declaring a value as false and the absence of a value, which would normally be interpreted as "use the default value", and that default may be false, true, or a non-Boolean value.
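The distinction drawn above (value-less key, explicit false, and absent key as three different things) can be sketched with a tiny flags-file reader. Everything here is hypothetical: parse_flags is not a TOML API, and PRESENT is an assumed sentinel standing in for the "empty" value, deliberately neither true nor false.

```python
# Hypothetical sentinel: a value-less key is "present" but is neither true nor false.
PRESENT = object()


def parse_flags(text: str) -> dict[str, object]:
    """Parse a minimal flags file allowing "key", "key=" and "key = true/false"."""
    flags: dict[str, object] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # skip blank and comment-only lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not value:  # covers both "key" and "key="
            flags[key] = PRESENT
        elif value in ("true", "false"):
            flags[key] = value == "true"
        else:
            raise ValueError(f"unsupported value in line {raw!r}")
    return flags


flags = parse_flags("""
verbose
color=
cache = false   # explicitly disabled
""")
# An absent key means "use the default", which is not necessarily false.
use_cache = flags.get("cache", True)
strict = flags.get("strict", True)
```

Here use_cache ends up false because the file says so explicitly, while strict falls back to its default of true because the key is missing.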

Introducing a new "null" value in TOML would open a huge can of worms. Let's never have nulls in configuration files.

I'd be open to hearing proposals to list keys in ways that would set them all true. Could be obvious, could be clear, could be useful, but could be taken up in the future.

  • Keys as paths would be neat. I'm a little worried what it says about your data model that you want it, though.

  • We've got inline objects now. 🎉

  • Let's not require every implementation of TOML to incorporate a PCRE parser. You can't just parse the slashes, by the way, because these need to work:

    re_a = /(/)/
    re_b = /\//
    re_c = /[/]/
    re_d = /"/
    

    (I know this reasoning also applies to dates/times)

  • Barewords would be nice, but I understand that they're not added because barewords make it hard to add new syntax without breaking existing files.

  • Suffixes are an example of the barewords problem. If barewords get turned into strings, then this file will be considered valid, but parsed differently, by a TOML implementation that supports magnitude suffixes and one that doesn't:

    mst = 3k

    A parser that supports barewords but not magnitude suffixes will get this:

    mst = "3k"

    If the parser adds support for magnitudes, it'll turn into this:

    mst = 3000

  • Let's not add null. Typically, you can just use false, or the absence of a key, to communicate the same thing.

  • I would really like the ability to have functionality that is enabled by default and that the user can explicitly turn off. So let's not remove explicit booleans (even if implicit ones are added). Besides, false makes a good stand-in for null where people need it.
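To see why the slashes alone are not enough, here is a deliberately naive scanner that ends a /.../ literal at the first unescaped slash. The name naive_regex_literal is hypothetical; it understands backslash escapes but nothing else about regex syntax, which is exactly the gap the regex examples above exploit.

```python
def naive_regex_literal(value: str) -> str:
    """Return the body of a /.../ literal, stopping at the first unescaped '/'.

    Deliberately naive: it knows about backslash escapes but not about
    groups or character classes, so it cuts some literals short.
    """
    if not value.startswith("/"):
        raise ValueError("not a regex literal")
    i = 1
    while i < len(value):
        if value[i] == "\\":
            i += 2  # skip the escaped character
            continue
        if value[i] == "/":
            return value[1:i]
        i += 1
    raise ValueError("unterminated regex literal")


for text in ["/(/)/", r"/\//", "/[/]/", '/"/']:
    print(f"{text!r} -> {naive_regex_literal(text)!r}")
```

re_b and re_d come out right, but re_a and re_c are truncated to "(" and "[", because knowing where the literal ends requires understanding groups and character classes, i.e. parsing the regex itself.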

I reckon this can be closed.

AFAICT, everything except 1 and 5 has been decided upon. 1 got #499 and 5 got #427 for discussion.
