Toml: Revisit array of table syntax

Created on 3 Mar 2015 · 70 Comments · Source: toml-lang/toml

I found the "array of tables" type not especially easy to grasp in TOML, because the syntax for an array of tables differs from the syntax for an array of, say, integers. That is not the case in JSON. So while I find TOML clearer than JSON for tables, I find JSON easier to understand for arrays of tables.

In the end, I wonder whether the "array of tables" type is absolutely necessary in a config file. My point is that how the data is stored (table of tables vs. array of tables) may be of little interest to end users who just want to modify some parameters.

I understand that, compared to "table of table", "array of table" has:

  • ordered elements
  • unnamed elements.

In actual TOML usage, is there a situation where an array of tables is much more efficient than a table of tables?

To be a little more specific, here is a comparison of both syntaxes (not exactly equivalent, of course):

Array of table:

[[products]]
    name = "Hammer"
    sku = 738594937

[[products]]

[[products]]
    name = "Nail"
    sku = 284758393

Table of table:

[products.hammer]
    name = "Hammer"
    sku = 738594937

[products.empty]

[products.Nail]
    name = "Nail"
    sku = 284758393
    color = "gray"

IMO, the advantages of using only tables of tables are:

  • a unique syntax (less confusing)
  • each table has a name (less need of referring to the manual)

My question is certainly not whether arrays of tables should be removed from the spec, but whether a best practice could encourage preferring tables of tables over arrays of tables.

new-syntax

Arrays of tables are most useful when you don't know ahead of time how many things will be present in the array. If you know you're only ever going to have three things, then I absolutely agree with you: you should just represent them as individual tables.

But that's not always the case. Here's a good, real-world use case: I want to allow my users to define a pipeline in their configuration file. I don't know ahead of time how many things could be in the pipeline (it could be 1, 2, 3, ...), but I do know that each thing in the pipeline might have an arbitrarily complex initialization process (so I would like to have a table for each element in the pipeline so as to be able to flexibly specify each element's parameters). Using an array of tables is the most natural thing here, giving you things like

[[analyzers.filter]]
type = "icu-tokenizer"

[[analyzers.filter]]
type = "lowercase"

[[analyzers.filter]]
type = "length"
min = 2
max = 35

This isn't possible with tables of tables (you very well may lose the ordering of the filters, which is really important in a pipeline, depending on your parser's internal storage implementation). I don't think there's a generic guideline against arrays of tables, other than to think about your data types and make your configuration match. Here, what I'm asking people to configure is indeed an ordered list of things, so it makes sense to represent that as an ordered list inside my configuration file.

+1 to what @skystrife said. I use TOML for a similar purpose.

However, I also agree with @maxhaz about the table array confusion. I've been playing around with TOML for a while now, and I still find the syntax with the double brackets oddly annoying. I don't have a better proposition right now though (besides merging the concepts of tables and table arrays, but that might prove difficult or require significant tradeoffs).

Thank you for both answers. This usage is quite convenient indeed, I agree.
I now see an array of tables as a way to add instances of an object (in @skystrife's example, an instance of a filter). Unless I am mistaken, a similar structure is widely used in XML config files.

Then, the available keys in an instance (e.g. type, min, max) could be defined in a doc or a schema (and self-documented by example in the toml file).

@skystrife, to play the devil's advocate, I can easily rewrite your file without losing any information while avoiding arrays of tables.

[analyzers.filter.1]
type = "icu-tokenizer"

[analyzers.filter.2]
type = "lowercase"

[analyzers.filter.3]
type = "length"
min = 2
max = 35

Although this requires the user to explicitly number the tables, it also makes it possible to add properties to tables later (which you always could do with all non-array tables) and, if a smart sort was used, to insert tables into the middle of the array (e.g. if you sorted "1_1" in between "1" and "2" or alternatively you could number them in the good old Basic style, "10", "20" and then insert "15").

It's of course not surprising that you can simulate an array with a table (and vice versa), it's just that specifically in TOML, tables can be manipulated more easily than arrays and with more flexibility. And if TOML is to be a _minimal_ format and if json->toml->json need not round-trip (which I think it already doesn't due to null), then I think @maxhaz has a point.
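The "smart sort" mentioned above matters because numbered table names are strings, and a plain string sort misorders them. This is a minimal sketch in Python of one possible ordering rule; `smart_key` is a hypothetical helper, not something any TOML library provides, and the `1_1` naming follows the example in the comment.

```python
def smart_key(name):
    """Turn a table name like "1_1" into a tuple of ints, e.g. (1, 1)."""
    return tuple(int(part) for part in name.split("_"))

# Table names as they might appear under [analyzers.filter.<name>].
names = ["10", "2", "1_1", "1"]

plain_order = sorted(names)                 # lexicographic string order
smart_order = sorted(names, key=smart_key)  # numeric, with "1_1" after "1"

print(plain_order)  # ['1', '10', '1_1', '2']
print(smart_order)  # ['1', '1_1', '2', '10']
```

So the numbered-table approach only yields an ordered pipeline if the consuming application applies a rule like this itself.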

Arrays of tables look horrible, and would be the main thing pushing me away from using TOML. I think the concept is fine, but the syntax is poor.

Alternative 1:

[analyzers.filter]
  [#]
  type = "icu-tokenizer"

  [#]
  type = "lowercase"

  [#]
  type = "length"
  min = 2
  max = 35

Alternative 2:

[analyzers.filter]
  [#1]
  type = "icu-tokenizer"

  [#2]
  type = "lowercase"

  [#3]
  type = "length"
  min = 2
  max = 35

Alternative 3:

[analyzers.filter]#
type = "icu-tokenizer"

[analyzers.filter]#
type = "lowercase"

[analyzers.filter]#
type = "length"
min = 2
max = 35

(edited to add indentation, which would be optional)

@jodastephen Alternative 3 already reads as "comment at the end of a line"; the second is as good, but I prefer the first one because I can add something in between without incrementing every tag after it.

+1 @jodastephen, the syntax for array of tables is indeed counter-intuitive

I like variant 1 best too. Also the possibility mentioned by @franklinyu to have multi-dimensional arrays of tables, which I will shamelessly copy and paste here:

[nested_array_table]
  [#]
    [##]
    value = 1
    [##]
    value = 0

  [#]
    [##]
    value = 0
    [##]
    value = 1
    comment = "bottom right diagonal element"

Alternative 3 also enables multi-dimensional arrays, but I think alternative 1 is better.

[nested_array_table#]
    [nested_array_table##]
    value = 1
    [nested_array_table##]
    value = 0

[nested_array_table#]
    [nested_array_table##]
    value = 0
    [nested_array_table##]
    value = 1
    comment = "bottom right diagonal element"

Edit: Fixed indentation

@mdickie Hmm, you mean that the second [nested_array_table#] should be indented one step further than the first one?

Oh, sorry, I fixed that. It should be agnostic of indentation, so the unfixed version should also work.

It's interesting that GitHub currently renders it correctly, since nested_array_table# should not yet be a valid bare table name. For a quoted table name in @mdickie's alternative, I guess we can do

[dog."tater.man"#]
    [dog."tater.man"##]
    value = 1
    [dog."tater.man"##]
    value = 0

[dog."tater.man"#]
    [dog."tater.man"##]
    value = 0
    [dog."tater.man"##]
    value = 1
    comment = "bottom right diagonal element"

We're on the cusp of 1.0. Arrays of table syntax isn't changing.

So no multidimensional arrays of tables then? This would mean, everything which starts in JSON with

[[{

will still not be representable in TOML, which is kind of a pity.

I propose you avoid usage of # for this... It is only going to make parsing complicated.

How about you use ? (or *)? Example:

[analyzers.filter.?]
type = "icu-tokenizer"

[analyzers.filter.?]
type = "lowercase"

[analyzers.filter.?]
type = "length"
min = 2
max = 35

A star would be reasonable because Markdown already uses them in lists.

@dejlek I guess you mean that the parser needs to distinguish # in an array from # indicating the beginning of a comment? Then I prefer * over ? for the same reason mentioned by @mdickie.

I'm also not madly in love with the current syntax for complicated scenarios, but it does an admirable job for simple ones. TOML 1.0 is imminent, so things aren't going to change at this point, but we can definitely talk about some changes in this area when it's time to think about 2.0.

@mojombo I have full respect for your decision about this. I think it is a shame that you are closing this issue though, since it has not been solved and you are hiding/losing the useful information posted by the commenters above.

That's a fair point. I'll reopen and label appropriately.

The situation is particularly bad with recursive data structures. Take the following recursive Go struct:

type SinkConfig struct {
    Transform *TransformConfig
    Sinks     []*SinkConfig
    Output    *OutputConfig
}

Here's a TOML representation of a value in this recursive schema:

[Transform]
  TransformType = ""

[[Sinks]]
  [Sinks.Transform]
    TransformType = ""

  [[Sinks.Sinks]]
    [Sinks.Sinks.Transform]
      TransformType = ""

    [[Sinks.Sinks.Sinks]]
      [Sinks.Sinks.Sinks.Transform]
        TransformType = "Prune"
      [Sinks.Sinks.Sinks.Output]
        OutputType = "Stdout"
    [Sinks.Sinks.Output]
      OutputType = "Stderr"

Beautiful. Here the repetition of the array field name and its ancestors really hurts readability. YAML does slightly better:

sinks:
- transform:
    transformtype: ""
  sinks:
  - transform:
      transformtype: ""
    sinks:
    - transform:
        transformtype: Prune
      output:
        outputtype: Stdout
    output:
      outputtype: Stderr

I understand it is a design aim of TOML to include the full path of keys to a table value, but for an array of tables the same path may appear not only at every element of the same array but at different locations in the file in different structures that share the same route. I think either it needs to include a specific index, which is verbose and annoying when editing a file, or we have to lose the context when we enter an array of tables, so that the table naming looks like we started a new root, as if we are in a new TOML file.

This would look something like this (although note these are all 1-element table arrays):

[Transform]
  TransformType = ""

Sinks = [
  [Transform]
    TransformType = ""

  Sinks = [
    [Transform]
      TransformType = ""

    Sinks = [
      [Transform]
        TransformType = "Prune"
      [Output]
        OutputType = "Stdout"
    ]
    [Output]
      OutputType = "Stderr"
  ]
]

There could be a different/better syntax. But I think accepting that elements of a table array are anonymous is a way out of this ugliness for certain cases. Or at least to allow a context-free syntax...

Using inline tables almost gets you there:

Sinks = {Transform = {TransformType = ""}, Sinks = [
  {Transform = {TransformType = ""}, Sinks = [
    {Transform = {TransformType = ""}, Sinks = [
      {Transform = {TransformType = "Prune"}, Output = {OutputType = "Stdout"}}
    ], Output = {OutputType = "Stderr"}}
  ]}
]}

but I agree this is overly ugly and is a sort of hacky workaround for the "inline tables must have no newlines" rule. If you relax that and allow multi-line inline tables, you can get the following:

Sinks = {
  Transform = {TransformType = ""},
  Sinks = [{
    Transform = {TransformType = ""},
    Sinks = [{
      Transform = {TransformType = ""},
      Sinks = [{
        Transform = {TransformType = "Prune"},
        Output = {OutputType = "Stdout"}
      }],
      Output = {OutputType = "Stderr"}
    }]
  }]
}

which I think, while still ugly, is at least serviceable.

So I don't really understand why people are strongly against the array of tables syntax, or why they would prefer to use # symbols. To me, it's simple, easy to read, and easy to write.

While it is unfortunate that it might require some explanation before people know what the double bracket syntax means when reading a config file, reading and understanding the whole TOML spec still only takes 5-10 minutes, which IMO is good enough that it doesn't really need to be immediately understandable. Especially since it's a relatively niche use case which most people can just ignore anyway.

@skystrife you're quite right that does get enough of the way there for me, particularly with relaxed newlines.

@michael-younkin the syntax is not the issue. The issue is that as soon as the key.subkey.subsubkey identifiers become ambiguous as in nested arrays then they lose their value and obfuscate rather than clarify where we are in the structure. I'm not suggesting '#' signs would be any better. And I also don't think that it's a niche use case when TOML is thought of as a JSON/YAML substitute and recursive data structures are frequently used.

> TOML is thought of as a JSON/YAML substitute and recursive data structures are frequently used.

It's not. TOML is a configuration file format. Sometimes, JSON or YAML are used for configuration files, so there are overlapping use cases. TOML is not a general purpose replacement for JSON or YAML.

That makes sense, and changes how I would think about it. In my use case I happen to have a recursive data structure that happens to be config, though it is unlike the rest of the config, where each value is clearly named. I can use an inline table to represent this, though then I'd like to be able to have a specific key marshalled as an inline array rather than a table array.

Also thank you for your excellent library, much appreciated.

My objection to the [[]] syntax is it's not really obvious to me how to represent this:

a: [ { b: 1, c: { d: 2 } } ]

I instinctively want to write something like

[[a]]
b = 1
[[a.c]]
d = 2

but that's wrong. The correct answer is:

[[a]]
b = 1
[a.c]
d = 2

which feels odd. It's not obvious to me that [a.c] is related to [[a]].

I'd be happier with a notation like:

[a[]]
b = 1
[a[].c]
d = 2

which looks way more obvious to me.

The ugliness in syntax of the array of tables is the main reason we're not using it at my company. It's especially bad in deeply nested structures.

I totally agree with @skystrife that an acceptable solution would be to simply allow newlines (and trailing commas) in inline tables. Compare the following:

[[shared_settings.logging.handlers]]
  level = "info"
  name = "default"
  output = "stdout"

[[shared_settings.logging.handlers]]
  level = "error"
  name = "stderr"
  output = "stderr"

[[shared_settings.logging.handlers]]
  level = "info"
  name = "access"
  output = "/var/log/access.log"

[[shared_settings.logging.loggers]]
  handlers = ["default","stderr"]
  level = "info"
  name = "default"

[[shared_settings.logging.loggers]]
  handlers = ["default"]
  level = "info"
  name = "access"

To the one with inline tables with newlines:

[shared_settings.logging]

handlers = [
    {
        name = "default",
        output =  "stdout",
        level = "info"
    }, {
        name = "stderr",
        output =  "stderr",
        level = "error"
    }, {
        name = "access",
        output =  "/var/log/access.log",
        level = "info"
    }
]

loggers = [
    {
        name = "default",
        level = "info",
        handlers = ["default", "stderr"]
    }, {
        name = "access",
        level = "info",
        handlers = ["default"]
    },
]

In my opinion the second one is nicer even though it has a few more lines, mainly because the two different lists of tables are clearly separate at first glance.
Very deeply nested structures (4+ levels deep) would also benefit from this small change to the spec.

I won't talk about ugliness, but in regards to obviousness there is certainly room for improvement.
I think the main issue is what felix9 outlined.
Essentially the syntax [a.b.c] does not do just the obvious thing, but has the added complication of checking if each level is an array and then implicitly selecting the last element of that array. Once you realize that, it becomes simple to use this syntax, but it still has its drawbacks. For example, without this complication any [whatever.section] can be moved around and reordered, while with this array mechanic the order of some sections becomes significant, and they are not marked in any distinguishable way.

So I do agree an alternative syntax is needed. In particular, the currently implicit "select the last array element" operation needs to have an explicit syntax instead.
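The implicit rule described above can be sketched in a few lines. This is a simplified model in Python, not a real parser: `resolve` and `append_element` are hypothetical helpers, paths are given as lists of keys, and none of the error checking a conforming TOML parser performs is included.

```python
def resolve(root, path):
    """Model of a [a.b.c] header: walk the path, and whenever a level
    holds an array, silently step into its last element."""
    node = root
    for key in path:
        child = node.setdefault(key, {})
        if isinstance(child, list):
            child = child[-1]  # the implicit "select last element"
        node = child
    return node

def append_element(root, path):
    """Model of a [[a.b.c]] header: append a new table to the array."""
    parent = resolve(root, path[:-1])
    array = parent.setdefault(path[-1], [])
    array.append({})
    return array[-1]

# [[a]] / n = 1 / [a.c] / d = 1 / [[a]] / n = 2
early = {}
append_element(early, ["a"])["n"] = 1
resolve(early, ["a", "c"])["d"] = 1
append_element(early, ["a"])["n"] = 2

# Same sections, but [a.c] moved after the second [[a]].
late = {}
append_element(late, ["a"])["n"] = 1
append_element(late, ["a"])["n"] = 2
resolve(late, ["a", "c"])["d"] = 1

print(early)  # {'a': [{'n': 1, 'c': {'d': 1}}, {'n': 2}]}
print(late)   # {'a': [{'n': 1}, {'n': 2, 'c': {'d': 1}}]}
```

Moving the `[a.c]` section changes which element it lands in, even though the header text itself is identical in both documents.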

The "add a new array element" operation can continue using its current syntax of doubled brackets or can be changed to a more obvious (at least to PHP programmers 😛) inline syntax. It is possible to even overload the same syntax as "select the last array element" with the presumption that "add a new array element" always happens at the deepest level, while "select the last array element" always happens at an upper level in the hierarchy and never the deepest one. I can not say if such overloading is obvious enough to everyone though.

Some possibilities:

  • current syntax for "add", .# for "select last"
[[a]]
b=1
[a.#.c]
d=2
  • inline [] for "add", .# for "select last"
[a[]]
b=1
[a.#.c]
d=2
  • .# for both, if at the end then means "add" otherwise means "select last"
[a.#]
b=1
[a.#.c]
d=2
  • like the previous but with [] instead of .# (henceforth known as "the felix9 syntax" 😉)
[a[]]
b=1
[a[].c]
d=2

Naturally, using any other symbol instead of # is fine as well. Earlier * and ? have been suggested.

I like @felix9's end-bracket syntax proposal for two reasons:

  1. The brackets identify a table array's elements wherever it is used after its declaration.
  2. The brackets show up at the end of the table array's name, always.

End Brackets Refer To Table Array Elements

Consider the given example:

# Using end-bracket notation, this would be valid.
[a[]]
b=1
[a[].c]
d=2

So [a[]] declares a table array, and specifically that array's first element. And here, [a[].c] refers to c inside a's first element. No confusing a with a simple table, which is great to keep in mind.

Many programming languages work the same way as this. Brackets declare an array, and then brackets refer to the array's elements. The only difference in TOML would be that we'd repeat the header "declaration" to mark the next element of the array.

So [a[]] would start the first element, then a[].c in an assignment would use that first element, and [a[]] again would start a second element.

And if we adopt dotted keys per #499 and #505, then all instances of a[] will refer to that first element until the next [a[]] is encountered.

Reading The Ends Of Identifiers

The importance of looking at the end of the identifier becomes clear if you're working with deeply nested configurations, per @JelteF above. If you're trying to determine which element of which table array you're in, just look at the _end_ of the name. The same principle would apply if we consistently referred to table arrays and their elements with end brackets.

# Using the felix9 notation and looking just at headers:
[shared_settings.logging.handlers[]]    # Start of "handlers[]"
[shared_settings.logging.handlers[]]
[shared_settings.logging.handlers[]]
[shared_settings.logging.loggers[]]     # Start of "loggers[]"
[shared_settings.logging.loggers[]]

Questions Raised

End-bracket syntax does beg a question. Consider this:

first_10_fibonacci = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Can we use brackets on _any_ array's name? We could say yes, optionally. But arguably, we want to reserve end brackets for table arrays and their elements.

# For consistency's sake, should this also be valid?
first_10_fibonacci[] = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Another begged question: how do we declare an _empty_ table array? We could just use a = [] and let the consumer figure it out. Or, given the above, we could use a[] = [], which is perhaps clearer. What do you think?

Can we use repeated bracket references to _declare_ multiple elements with inline tables? I'm against them, but what do you think?

[shared_settings.logging]

# Does this solve a problem? Or would it be ripe for abuse?
handlers[] = {name = "default", level = "info",  output = "stdout"}
handlers[] = {name = "stderr",  level = "error", output = "stderr"}
handlers[] = {name = "access",  level = "info",  output = "/var/log/access.log"}

# Because we can already use multi-line arrays with inline tables:
loggers = [
    {name = "default", level = "info", handlers = ["default", "stderr"]},
    {name = "access",  level = "info", handlers = ["default"]},
]

(Go to #516 to argue about multi-line tables. I'm against them, too.)

Considering backwards compatibility, the existing syntax could be kept as deprecated. The parser can identify table array element references, even if end brackets are not used. But these would still be confusing for humans to read.

As another data point, now that I have more experience working with TOML, I have to say that I've grown to dislike the [[table.subtable]] syntax. It has caused a lot of confusion among people learning TOML, and IMO is difficult to read and write. It's also more difficult to deserialize (depending on what you're using it for), since it usually requires more validation. So I tend to avoid it whenever possible.

That said, I also don't like the [a[]] or [a.#] syntax, since IMO they are also confusing.

Another proposal: you can specify array indexes:

[a.b[1].c]
x = 1

is equivalent to

{ a: { b: [ null, { c: { x: 1 } } ] } }

And there are two special array indexes:

  • last means the last element
  • new means a new element at the end

[a.b[new]]
x = 1
[a.b[last].c]
x = 2
[a.b[new].c]
x = 3

is equivalent to

{ a: { b: [ { x: 1, c: { x: 2 } }, { c: { x: 3 } } ] } }

The informal explanation of this syntax is:

  • Every [bracket] section in the file is a table.
  • The thing in the [brackets] is the address of the table.
  • The address is in familiar programming notation, except for the keywords 'last' and 'new'.
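To make the proposal concrete, here is a minimal sketch in Python of how such addresses could be interpreted. It models only the proposed (not existing) syntax: `open_table` is a hypothetical helper, keys are restricted to simple identifiers for brevity, and `last` on an empty array is left as an error.

```python
import re

# One path segment: a simple key, optionally followed by [N], [last] or [new].
SEGMENT = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)(?:\[(\d+|last|new)\])?$")

def open_table(root, header):
    """Return the table addressed by a header such as 'a.b[new].c'."""
    node = root
    for part in header.split("."):
        match = SEGMENT.match(part)
        if match is None:
            raise ValueError(f"bad segment: {part!r}")
        key, index = match.groups()
        if index is None:
            node = node.setdefault(key, {})
            continue
        array = node.setdefault(key, [])
        if index == "new":
            array.append({})
            node = array[-1]
        elif index == "last":
            node = array[-1]  # IndexError if the array is still empty
        else:
            position = int(index)
            while len(array) <= position:
                array.append(None)  # pad skipped slots, shown as null above
            if array[position] is None:
                array[position] = {}
            node = array[position]
    return node

indexed = {}
open_table(indexed, "a.b[1].c")["x"] = 1
print(indexed)  # {'a': {'b': [None, {'c': {'x': 1}}]}}

keyed = {}
open_table(keyed, "a.b[new]")["x"] = 1
open_table(keyed, "a.b[last].c")["x"] = 2
open_table(keyed, "a.b[new].c")["x"] = 3
print(keyed)  # {'a': {'b': [{'x': 1, 'c': {'x': 2}}, {'c': {'x': 3}}]}}
```

Both results match the JSON-style equivalents given in the proposal.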

I think that arrays of tables is the only feature of TOML that I had issues learning initially, mostly because I encountered them in the cargo manifest before reading the spec. Once I read the spec, things made more sense and it clarified the use case in cargo's manifest. I consider this a huge success for a configuration language (or any language, really).

It seems that TOML's goals are to become widely used in production. Anecdotally, realizing that goal requires a commitment to backwards compatibility once a certain popularity threshold is met. Any future way of specifying arrays of tables only adds complexity without a breaking change. That would suggest that entertaining alternatives at this point could hamper adoption.

I'd also caution against a phenomenon that I see fairly often: people often see something as complex and proceed to suggest solutions which they believe are simpler. I almost never disagree with the problem statement, but usually the suggested solutions don't improve the situation, at least in the case of specifications that have been widely implemented and discussed, with a focus on being minimal.

I think the current syntax is the 'least evil' option at this point and the focus should be on documentation and education.

Setting aside solutions for now, can anyone talk about what confusions they, or their colleagues, have experienced with table arrays or their syntax?

The dotted keys leave me confused in this context. Say you have a subtable x.a and a table array x.b. If you refer to x.a.key, it's a reference to a key inside a. If you refer to x.b.key, it's a reference to a key inside the nth element of b. So that dot is doing double duty. That can get confusing.

@eksortso Exactly what you said - there is an implicit "if it's an array, get the last element of the array" operation at every dot that is very non-obvious. I'd like that to be replaced with an explicit notation. I guess my previous post about it went too much into "tl;dr" category though...
Edit: or if the current syntax is kept, this aspect needs to be explained in the documentation more prominently.

@georgir Your prior post is what moved me to push for the end-bracket notation for table arrays. I recommended deprecating the existing syntax, which is not very obvious and dips below minimal into the realm of literally "confusing." The current syntax would still work if the end-bracket notation took its place.

@ahmedcharles I understand your concerns. But I maintain that the current syntax is more "evil" than you think it is. And forcing users to adhere to a confusing standard is a wicked thing to do. A better (and obviously backwards-compatible) table array syntax is called for, in order for TOML to reach its full potential.

@mbyio You dislike the current syntax, and object to other forms discussed, including end brackets, as still being too confusing. So then, what other syntax proposals would you consider to be _less_ confusing?

#523 has a suggestion of "[table.[ArrayOfTable].leaf] and other combinations, instead of just having [[table.ArrayOfTable.leaf]]", linking to https://github.com/betrixed/toml-zephir as an implementation apparently.

Any updates on this?

@cxw42 It's labeled "post-1.0", so don't expect any updates soon. First TOML 1.0 has to be finalized, and that will take time.

@ChristianSi thanks for pointing that out! I just ask that 1.0 leave syntactic room for some of the options above, or other experimentation in this area.

Why not [[foo].[bar]]? By default creating new elements at the rightmost side, so to create a new element at another level you'd do:

[[foo]]
[[foo].[bar]]

This reads as: table in array 'foo' has table in array 'bar'

(or should it be [[[foo].bar]]?)

Additionally,

[[[foo]]]

reads as: table in array in array 'foo'. And the same thing applies: to create a new entry you drop a pair:

[[foo]]
[[[foo]]]

Bonus points for backwards compatibility!

@SoniEx2 I like the first part. It makes it crystal clear that a name surrounded by extra brackets refers to a table array. I also like how you make it explicit that, if the rightmost key is bracketed, then you're referring to a new element of that table array.

It makes the parts of the README.md example a lot more obvious!

[[fruit]]  # 1st fruit element
  name = "apple"

  [[fruit].physical]  # physical subtable, in 1st fruit element
    color = "red"
    shape = "round"

  [[fruit].[variety]]  # 1st variety element, in 1st fruit
    name = "red delicious"

  [[fruit].[variety]]  # 2nd variety in 1st fruit
    name = "granny smith"

[[fruit]]  # 2nd fruit
  name = "banana"

  [[fruit].[variety]]  # 1st variety in 2nd fruit
    name = "plantain"

But, the triple-bracket [[[foo]]] thing is confusing. Do we actually need a syntax for table-array arrays? For configurations, it's probably not necessary, and it would be a lot more clear to just nest those arrays and use key names, like fruit and variety in the above example (or like foo and bar in your first example) to refer to the outer and inner arrays respectively.

Once TOML v1.0 gets released, I'd love to introduce a version of this syntax that is limited to just two bracket pairs deep. The existing syntax could be deprecated, but still be kept around for backwards compatibility.

the existing syntax?

I thought my thoughts were just an extension/superset of the existing syntax?

(or, at least, that's what I was going for...)

@eksortso I like your example. I would caution that as soon as any syntax is limited to a specific nesting depth, someone will come up with a use case that needs more than that depth :) . I know TOML isn't trying to be a universal serialization language, but I hesitate to add constraints without also providing escape hatches (even if they look uglier).

Thanks, though I can't take credit for the example. I just adapted the example from the Array of Tables section of the readme. But using that example did clarify a few ideas that I had found difficult to express with just my own words.

I still entertain the notion that TOML could become a full-service data description language. In fact I've seen some examples of data sets written in TOML, so it's plausible. But that may have been because it made sense to the person using it to work within TOML's constraints. Its format makes it appealing for describing simply constructed data sets. That may be worth embracing, even if it rules out deeply nested or overly complex data structures.

The two brackets' depth involves using those brackets for different purposes. The outer brackets indicate a new context within the document, i.e. a new table's definition. Those outer brackets surround the entire path. Inner brackets would surround undotted names only, and would indicate a table array and its current element. So the format isn't a limitation. It's what needs to be obvious about the data structures involved.

So I've gotta face facts: TOML discourages deeply nested structures. For people who encounter it, that may be a good thing.

> I thought my thoughts were just an extension/superset of the existing syntax?
>
> (or, at least, that's what I was going for...)

It would be awkward to have two very similar, fully-supported syntaxes for nested table arrays. That's why I was thinking that one syntax would ultimately displace the other.

Fair, so the second option ([[[foo].bar]]) would be better. That one is backwards compatible because the idea is that you put the full path within [].

As you currently do:

[[full.path.here]]

It also should be slightly easier to parse: the first [ indicates that it's a key, the second [ indicates we're adding to an array, and the third and subsequent [ indicate the path to get there, if that makes sense. (If not, I can try to explain it differently.)

Woah, when you said "backwards compatible," I thought that just meant that newer parsers that could parse the new syntax could still parse the old syntax. The new syntax doesn't conflict with the old syntax, so a parser that can read both types is feasible.

But that doesn't mean that the syntaxes ought to act the same in the same situation. Let's say you have a table baz, and you want it to contain a table array qux. The old syntax expresses this as [[baz.qux]]; two brackets surround the _entire path_. But the new syntax, as I interpreted it above, would express this as [baz.[qux]]; brackets surround the _right-most name_, and the double right brackets at the end indicate a new element of the table array baz.qux.

So when I see [[[foo].bar]], it looks like a strange fusion of the two syntaxes. It's confusing, and it doesn't seem to be necessary. I can't tell if that's a table array foo's current element containing a single table bar, or if both foo and bar are table arrays and the newest element of bar is being introduced.


So maybe I've misinterpreted your syntax. Further explanation may be necessary. Can you show me how you would write a single table bar inside the first element of foo? And then, show me how you would write the first element of bar inside a table named foo?

For the first form (a single table bar inside the first element of foo), the old syntax would have:

[[foo]]  # table array
  [foo.bar] # single bar in 1st foo

The syntax as I'm interpreting it would have:

[[foo]]  # table array, same as before
  [[foo].bar]  # single bar in 1st foo

For the second form (the first element of bar inside a table named foo), we have the old syntax:

[foo]  # single foo
  [[foo.bar]]  # 1st element of bar in foo

And my interpretation of the new syntax:

[foo]  # single foo
  [foo.[bar]]  # 1st element of bar in foo

Granted, my interpretation isn't as easy to parse, except that all table sections begin with a line that starts with a left-bracket [. But I think readability is the more valuable aspect of the syntaxes we're considering. Consider that last [foo.[bar]] above. You can see that foo is a single table, that bar names an array in foo, and that (because it's right-most) [bar] is an element of bar. You know what each name stands for, and you can see clearly when elements of table arrays are in play. That readability is what attracts me to this interpretation.

Sorry for the long-winded response. Anyway, can you show me how you'd write the two forms described above?

Uh. I'm actually more confused now.

Can you show me an example using inline tables and arrays, and show me how you'd expect it to look?

(I hadn't thought of the possibility of [[foo].bar] - that kinda breaks what I was going for with "easier to parse". Oh well.)

I'll try. Here's one example.

foo = {bar = [{a=1}, {b=2}]}

foo is a table, and bar is an array of two tables. With the "two brackets deep" syntax, that can be written like this:

[foo.[bar]]
a = 1
[foo.[bar]]
b = 2

This is very close to the data sets I've seen in the wild, which used classical [[foo.bar]] notation.


Here's a second example.

foo = [{bar = {a=1}}, {baz = {b=2}}]

Here, foo is an array of two tables, but each element contains a table value, assigned to bar and baz respectively. That would be equivalent to the following. The first [[foo]] is probably not strictly necessary. But the second [[foo]] is very much necessary, and you can probably figure out why.

[[foo]]
[[foo].bar]
a = 1

[[foo]]
[[foo].baz]
b = 2

Yeah, I'd have used

[[foo.bar]]
a = 1
[[foo.bar]]
b = 2

for the first, and as stated for the second.

In other words, full path must be within []. This makes it recursive but I like it that way tbh and I think it looks neater.

I like @eksortso's proposal, even if it was based on a misunderstanding. I think that enclosing just those dotted name parts that are actually tables in an additional pair of brackets is easy to grasp and read, and fairly easy to write.

I might be completely off here, but instead of repetition, why not have the separator be the first character? At least for me it makes it a bit easier to read, even if I have to backtrack.

[analyzers]
    [[.filter]]
        type = "icu-tokenizer"
    [[.filter]]
        type = "lowercase"
    [[.filter]]
        type = "length"
        min = 2
        max = 35

[Transform]
    TransformType = ""

[[Sinks]]
    [.Transform]
        TransformType = ""

    [[.Sinks]]
        [.Output]
            OutputType = "Stderr"
        [.Transform]
            TransformType = ""
        [[.Sinks]]
            [.Transform]
                TransformType = "Prune"
            [.Output]
                OutputType = "Stdout"

[[fruit]]  # 1st fruit element
  name = "apple"

  [.physical]  # physical subtable, in 1st fruit element
    color = "red"
    shape = "round"

  [[.variety]]  # 1st variety element, in 1st fruit
    name = "red delicious"

  [[.variety]]  # 2nd variety in 1st fruit
    name = "granny smith"

[[fruit]]  # 2nd fruit
  name = "banana"

  [[.variety]]  # 1st variety in 2nd fruit
    name = "plantain"

[foo]
    [[.bar]]
    a = 1
    [[.bar]]
    b = 2

oh... i see the problem now... :(

it's impossible to know if the variety is a property of fruit or fruit.physical :(

@tw1nk That is true. Folks have suggested these sorts of nested dot notations before, but each variant raises this sort of confusion.
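For contrast, current TOML sidesteps this ambiguity by spelling out the full path in every header. A minimal sketch reusing the fruit example above (the last `name` value is hypothetical, added only to illustrate the second reading):

```toml
[[fruit]]                    # 1st fruit element
name = "apple"

[fruit.physical]             # subtable of the 1st fruit element
color = "red"

[[fruit.variety]]            # reading 1: variety is an array directly under fruit
name = "red delicious"

[[fruit.physical.variety]]   # reading 2: variety is an array under fruit.physical
name = "hypothetical"
```

With the leading-dot shorthand, both headers would collapse to `[[.variety]]`, which is exactly the information the repeated path carries.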

But the repetition of keys serves an actual purpose. Over time I've come to accept the idea that, even though deep nesting in TOML is possible, the syntax encourages flattening complex data structures. A relatively flat, hand-written configuration structure makes sense. Deeply nested data types are another story though. So the amount of name repetition is acceptable for practical concerns, even for simple data exchange.

Let me see if I'm on the right track here.

For reasons unrelated to this issue, I've decided to set package.autoexamples to false in my Cargo.toml. In Rust's package manager, this means that I now must create an array of tables for each example. So I currently have the following:

[[example]]
name = "gcd"

[[example]]
name = "merge_sort"

[[example]]
name = "quick_sort"

# ...

However, this is quite ugly, as others here have noted. I also understand that TOML has both arrays and inline tables, which immediately made me think I could already implement this in a more natural way, with something like:

example = [
    { name = "gcd" },
    { name = "merge_sort" },
    { name = "quick_sort" },
    # ...
]

However, there are seemingly two issues, one preventing me from doing this at all, and one minor naming thing:

  1. This doesn't work because I can't seem to assign to the top level (is this a correct understanding of the situation?)
  2. The plurality of the name is now wrong (although depending on convention, this may be a Cargo issue)

I think I'd be perfectly happy with TOML's arrays of tables if I could just use arrays of inline tables like this at the top level. Thoughts?

@nixpulvis The TOML given in your examples _should_ be valid, and should result in the same data structures. There's only going to be issues assigning if you mix methods, e.g.:

example = [
    { name = "merge_sort" },
    { name = "quick_sort" },
]

[[example]] # boom
name = "gcd"

Mind you, by "your examples should be valid", I mean "should" as per the spec. Some parsers treat inline tables vs. regular tables differently, same for [[array of tables]] vs [{table}, {table}]. Ideally they shouldn't be treated differently (because that runs counter to TOML's unambiguous design), but YMMV. I've no idea how cargo handles these things but it might just be a Quality-Of-Implementation thing.

@marzer interesting, I didn't try putting the example literally at the top level. :rofl: When I put it before everything else it works!

The issue is now that when I do example = [...] anywhere after a [...] (as I very much would like to do) it treats it as an entry into that table.

I do not have a solution offhand, but at least in Rust this is very close to what I want.

Ah, well then what you're experiencing is correct TOML behaviour. [tables] and [[array tables]] effectively delimit sections of the document; whatever appears underneath them is a part of them, _except_ another [table] or [[array table]], which starts a new section. A key = value pair will always be a member of whatever the current 'section' is, so the only way to have stuff be a member of the top level of the document (the 'root' table) is to simply list it before any other table headers.

So really the only solution is to re-structure your document.
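As a rough sketch of that restructuring, reusing the `example` and `package` names from the comments above (whether Cargo accepts this exact layout is not something this sketch confirms):

```toml
# Root table: everything before the first header belongs here.
example = [
    { name = "gcd" },
    { name = "merge_sort" },
]

[package]            # starts a new section
name = "foo"
version = "0.0.1"
# Writing `example = [...]` here instead would define package.example,
# not a top-level example array.
```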

@marzer global state bites again :frowning_face:

I generally really like TOML, however this is unfortunate. I think I'd personally solve this with commas and semicolons. For example:

[package]
name = "foo"
version = "0.0.1"

would become:

[package]
name = "foo",
version = "0.0.1";

Although, a bad parser may make this confusing to people, I can imagine.

It just really sucks that I'm forced to move my array to the very top of my document, just because I want to change the format I write it in. This is counterintuitive, and forces a poor configuration structure upon me.

I'm half tempted to suggest [] as the "root":

[package]
name = "foo"

[]
example = [
...
]

[thing]
etc

but I think this is invalid TOML:

[package]
name = "foo"

[other]
thing = "bar"

[package]
version = "0.0.0"

edit: another option would be to allow all bare keys to go at the end of the TOML, using a separator similar to markdown's hr:

[stuff]
[things]
---
extra = {}

but disallow headers there. and if this is used, you can't have bare keys at the start.

What? It's not counter-intuitive at all. Things belong to whatever header they appear under, which is how headings generally work in just about any type of document ever.

It's true that if you're going for a more JSON-like representation then it's a bit awkward in TOML, but that's because TOML is meant to be 'flat'. If you fight against that it will get complex, but that's true of all formats: trying to make them something they're not meant to be is asking for trouble. If you think of TOML more like "INI but less shit" you will have an easier time with it.

Forcing some keys to be at the top (for stylistic reasons), is very counterintuitive to me. I mean, it makes sense when you think about the details of TOML, but it's not how one would expect a config format to behave in my opinion.

Perhaps a better word would be, counterproductive, or just gross.

It's not arbitrarily "forcing some keys to be at the top", it's just the top-level keys go literally at the top-level of the document.

The current syntax is the only thing I find unintuitive about TOML. What about:

[products[]]
    name = "Hammer"
    sku = 738594937

[products[]]

[products[]]
    name = "Nail"
    sku = 284758393

This is similar to array initialization in many languages and would at least give people some hint of what is going on here. PHP looks to be the only language that uses this syntax for appending to arrays.

I was thinking about this just the other day myself :) . Along similar lines, but with different syntax, what about a verb-noun structure in section headers? E.g.:

[next products]
name=foo
[next products]
name=bar

resulting in products = [{name => "foo"}, {name => "bar"}].

next ... would add an array element. We could catch errors this way, too. For example:

[product]
foo=bar   # now product is a table {foo=>'bar'}
[next product]
bat=baz   # now product is an array of tables [{foo=>'bar'}, {bat=>'baz'}]
# ... much later
[product]   # fatal error: trying to turn an array back into a single table
            # Issue an error message that says "Please use '[next product]' to add to the 'product' array"

This would also make room in the syntax for future expansion, by expanding the verb set.

(Apologies if someone already suggested this and I missed it in my review of the thread!)

@mkerost: I don't see that as an improvement. The syntax would be very similar to the current one and it would be harder to remember than the simple rule: "just double the opening and closing bracket". Also, every [products[]] does not initialize an array, but adds a member to it. I don't think that [] is used for that purpose in any (reasonable) programming language – except for PHP, as you say, but

$cart[] = "foo";  // add "foo" to $cart

~~is terrible and certainly not a model to follow!!~~ is strange and doesn't suggest itself as a good and intuitive model to follow.

@cxw42: Your proposal is appreciated, but I'd say it's bad for several reasons. First, it makes arrays of tables look like tables:

[product]  # This seems to be a table
foo="bar"

But later (maybe much later) in the same document:

[next product]  # But now it has been turned into an array. SURPRISE SURPRISE!!!
bat="baz"

Also, TOML is not a programming language and should not look like one. Hence no keywords, please!

Finally, keywords would tie TOML to one specific natural language (English), but it should be language-neutral.

@ChristianSi : You summed up the downside to this approach, but terrible is in the eye of the beholder. I find double brackets surrounding a key to be "terrible" because there is absolutely no intuition what it means. My proposal, to quote myself, "at least give people some hint of what is going on here".

The current table array syntax, and any alternative table array syntax that doesn't use 0,1,2...n labeling, is never going to be completely intuitive. The reason is that single-bracket table keys refer to a single thing and can only be defined once, while table array keys refer to multiple things and will be defined identically multiple times. Newcomers will find this syntax confusing and will need to go to the TOML reference guide to understand what is going on and why some table definitions can appear only once while others can appear multiple times.
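To make the once-versus-many distinction concrete, here is a small sketch in current TOML. The `owner` table and its key are hypothetical; `products` follows the example at the top of the thread.

```toml
[owner]              # plain table: this header may appear only once
name = "example"     # hypothetical key

[[products]]         # table array: each repeat of the header appends an element
name = "Hammer"

[[products]]
name = "Nail"

# A second [owner] header at this point would be an error under the current
# spec, while a third [[products]] would simply add another element.
```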

The difference here is that the current double-bracket syntax is completely novel looking, and there is nothing a programmer can draw on from other programming languages to remember what it means. If anything, double brackets look like a templating/substitution syntax and not related to arrays. So, I'll understand for a minute what it means, but it's likely that I'll come back a week later and have forgotten, because novel patterns are harder to put into long-term memory.

With the syntax I proposed, most programmers will understand the syntax has something to do with arrays. You are right that they may be confused when they see this syntax used multiple times ("hey wait, you can only initialize something once..."). But like I said before, table syntax will never be completely intuitive. A person will always need to go to the TOML reference guide to be certain about what the syntax means. At least with my proposal, the syntax conjures association with arrays and offers a foothold into remembering what it means.

I am only offering my outsider thoughts here and don't mean to get in a back and forth. I've put as much as I want into my argument and am quite OK if you think it has major holes or there's just no way to get around the syntax feeling "terrible". If you feel this way, I don't think it is a good use of your time beyond just saying "nope, terrible".

@mkerost Thanks for sharing your thoughts on this. I can certainly sympathize with any effort to make table arrays more approachable. I've never had to deal with arrays in PHP, so I'm not allergic to a postfix-[] syntax to introduce an array element, just so you know.

But the use of double brackets, in the context of the rest of TOML, does make sense, and users can differentiate between single and double brackets. So I can no longer recommend making an effort to refine the existing syntax when it already does what it should be doing.

One objection you have is having to go back to the reference. I don't believe that users would go back to the reference to remember what double brackets do if they've seen them before. But they could. There's no shame in looking things up if they're not familiar. Just now I went to the spec and found the first instance of [[. It took me straight to the Array of Tables section. That's the essence of Obviousness. In fact, it's our job to make that spec so clear that once you've looked something up, it sticks. I'll come back to that.

The problem is, when dealing with more complicated concepts, we can only make things so clear. A complex data structure, to a newcomer, would need to be revisited from time to time to be fully understood, no matter what. With repeat exposure and with repeated usage, that complex form becomes commonplace, and the pain goes away. But that pain won't go away any faster if we switched to a different syntax. The current syntax can do this job alright. And if we keep hashing out new syntax to use for this complex concept when there's already sharply defined syntax for it then, well, all we're doing is bikeshedding.

I could be wrong. But arrays of tables can be described to users in a way that they can understand what they do and how they work. Maybe that's where we could use some help. If you've got some ideas for _describing_ table arrays more succinctly in the documentation, we'd love for you to share them with us. An alternate syntax won't help much, but an alternate description sure could.
