The only remaining idea from #292 that has not been decided upon and does not have a dedicated issue.
I mean, I don't know how much I like it myself but, hey, this needs discussion so, here's a dedicated issue for it.
[document]
title = "Hello!"
meta.charset = "utf-8"
Compare (this is a slightly modified example from the spec):
[[catalogue."Cash & Carry".fruit]]
name = "apple"
[catalogue."Cash & Carry".fruit.physical]
color = "red"
shape = "round"
[[catalogue."Cash & Carry".fruit.variety]]
name = "red delicious"
[[catalogue."Cash & Carry".fruit.variety]]
name = "granny smith"
[[catalogue."Cash & Carry".fruit]]
name = "banana"
[[catalogue."Cash & Carry".fruit.variety]]
name = "plantain"
versus
[[catalogue."Cash & Carry".fruit]]
name = "apple"
physical.color = "red"
physical.shape = "round"
variety = [
{ name = "red delicious" },
{ name = "granny smith" },
]
[[catalogue."Cash & Carry".fruit]]
name = "banana"
variety = [
{ name = "plantain" },
]
The first version is harder to read because it is cluttered with the repeated (and absolutely meaningless) catalogue."Cash & Carry".fruit prefix.
I believe the proposed feature gives a huge boost in readability for complex, deeply nested configurations.
The proposed feature also enables an intuitive syntax for some simple cases of the array-of-tables issue #309.
Thanks for a nice example @lmna. Also for @dstufft's example from #413:
[a]
value = 1
[a.b]
value = 2
[a.c]
value = 3
[a.c.d]
value = 4
[a.e]
value = 5
It becomes:
[a]
value = 1
b.value = 2
c.value = 3
c.d.value = 4
e.value = 5
Much nicer! ^>^
This could be a very nice and powerful addition to TOML. Let's go through a few ramifications to see if there are any traps.
This would allow any TOML document to be expressed without any bracket-style tables at all. The last example above could also be expressed as:
a.value = 1
a.b.value = 2
a.c.value = 3
a.c.d.value = 4
a.e.value = 5
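To illustrate why the two spellings describe the same data, here is a minimal Python sketch that treats each dotted key as a path into nested tables. It is not taken from any TOML parser: `set_dotted` is a hypothetical helper, the key splitting is naive (quoted keys are not handled), and none of the validity rules discussed in this thread are enforced.

```python
def set_dotted(root, dotted_key, value):
    """Store value under a dotted key, creating intermediate tables on demand."""
    *parents, leaf = dotted_key.split(".")  # naive split: quoted keys are not handled
    table = root
    for name in parents:
        table = table.setdefault(name, {})
    table[leaf] = value
    return root


doc = {}
for key, value in [
    ("a.value", 1),
    ("a.b.value", 2),
    ("a.c.value", 3),
    ("a.c.d.value", 4),
    ("a.e.value", 5),
]:
    set_dotted(doc, key, value)

print(doc)
```

The resulting nested dictionary has the same shape that the bracketed form ([a], [a.b], [a.c], [a.c.d], [a.e]) describes.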
More realistically, you'd be repeating longer key names. Perhaps something like this is better to see what that would feel like in reality:
3dprinter.extruder1.material = "PLA"
3dprinter.extruder1.temp.max = 242
3dprinter.extruder1.temp.min = 238
3dprinter.extruder1.temp.unit = "F"
3dprinter.extruder1.color = "red"
3dprinter.extruder1.feed_rate = "23"
The repetition becomes annoying in this case and it would be natural to switch to bracket tables to reduce that repetition, so I don't think that's a hit against the proposal.
To remain consistent with tables, we would need tables expressed this way to adhere to the same non-re-opening restriction. Thus, the following would be invalid:
a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3 # INVALID - reopens table [a.b]
That's easy enough to say and enforce, no different than tables already behave.
@lmna is absolutely right in that this proposal could be used to work around the confusing quirks of array table syntax and clean those up, which would be very nice because that is indeed TOML's least elegant bit. I'm guessing most situations could be represented cleanly with thoughtful use of "path keys" and inline tables. A big win for TOML.
I can't think of any big downsides. TOML remains unambiguous, as this is simply an alternate table syntax along with regular tables and inline tables. It's quite obvious what's going on and, since "." is already forbidden in keys, it would be backwards compatible with 0.4.0.
Perhaps one could argue that this addition would make TOML less minimal (OMG 3 ways to define tables!!!!), but it would help clean up some TOML docs that would otherwise be more verbose and less obvious, a tradeoff worth serious consideration.
Let me draw up a PR to see what this might look like in the spec/ABNF.
I can't think of any big downsides.
+1
Let me draw up a PR to see what this might look like in the spec/ABNF.
Maybe #446 would come into play here?
a.key = 1
unrelated-table.key = 1
a.b.key = 1
If the above is invalid, which it is IMO, so should its table equivalent.
Aside, https://github.com/pradyunsg/toml/tree/dotted-keys. :)
@pradyunsg Ah, excellent, please submit as a PR, I didn't start on one yet.
This is a pretty neat idea! It may be helpful to take a look at where existing projects may use this to see what the impact could be perhaps? I'm personally most familiar with Cargo, so I'll stick with that :)
The first thing that comes to mind for Cargo is the [dependencies] section:
[dependencies]
libc = "0.2"
serde = { git = "https://github.com/serde-rs/serde" }
my-crate = { path = "path/to/my-crate", version = "0.2" }
Today I (and I think a number of others) like how dependencies tend to be easily scannable top to bottom, one line each. With this extension I could imagine some people may switch idioms to maybe do something (pessimistically) like:
[dependencies]
libc = "0.2"
serde.git = "https://github.com/serde-rs/serde"
my-crate.path = "path/to/my-crate"
my-crate.version = "0.2"
Readability-wise I think that unfortunately a conversion like this is a net-loss (subjectively at least). Scanning the dependency list it's not clear if "serde.git" is the name of a dependency or not, you'd have to have prior knowledge to mentally strip away after the . to know that the dependency name is "serde". Similarly for "my-crate" I think (personally) it looks a little worse as it's now spread over two lines.
Now that of course doesn't mean we shouldn't accept a change like this! This sounds very similar to the old inline tables discussion where some things can definitely get worse, yet many patterns get much better. I remember that way-back-when we basically designed the features of Cargo.toml around the syntax and features of TOML itself, and I'd suspect that most consumers of TOML would do similarly. I think that means for Cargo we wouldn't show examples and otherwise wouldn't recommend syntax like this in the [dependencies] section, and that would probably do us fine!
Now one place where I think Cargo could benefit greatly is the [profile] section:
[profile.dev]
opt-level = 1
[profile.release]
debug = true
lto = true
That I think actually looks better as:
[profile]
dev.opt-level = 1
release.debug = true
release.lto = true
So I do think there's possible areas for us to use this in Cargo!
Overall I'm 👍 on this feature, it seems like a natural extension of the [a.b.c] syntax in table headers and then, like before, the onus is on authors to leverage and recommend TOML patterns for "looking nice", which doesn't mean aggressively using or not using this, just where appropriate!
I think the biggest downside here is specifying when the table closes for modification. This proposal doesn't seem to make that clear.
For example, if we assume this is valid:
[profile]
dev.opt-level = 1
release.debug = true
release.lto = true
Is this also valid:
[profile]
release.debug = true
dev.opt-level = 1
release.lto = true
If it's valid, then why have tables close at all and if it's invalid, then how do you explain that to users effectively? The 2 current ways of specifying tables force locality when defining tables and do so in an obvious way. Exchanging key/value pairs within a table section never changes the validity of a file. In order to keep that invariant and add this functionality, you have to give up the locality of table definitions.
Note, the 'pro' side examples above could be written as:
[[catalogue."Cash & Carry".fruit]]
name = "apple"
physical = { color = "red", shape = "round" }
variety = [
{ name = "red delicious" },
{ name = "granny smith" },
]
[[catalogue."Cash & Carry".fruit]]
name = "banana"
variety = [
{ name = "plantain" },
]
[profile]
dev = { opt-level = 1 }
release = { debug = true, lto = true }
The current specification seems to allow for reasonable readability while avoiding both confusion and the risk of adding a feature without implementation experience.
There are two ways I can see addressing your concerns, @ahmedcharles:
The profile.release table is closed when profile is closed. The general rule would be that tables written with the dotted key syntax are closed when their enclosing table that is not written with dotted key syntax closes.
Require that all dotted key entries with the same prefix appear together, so the second example where dev.opt-level appears between release.debug and release.lto would be illegal. Then the profile.release table would be closed after seeing the last release. entry in the profile section.
The latter approach doesn't violate the principle that sorting a table should not affect its meaning or validity since sorting would keep dotted keys with the same prefix together. It would, however, mean that randomizing the order of key-value pairs could cause it to become illegal if it separates dotted keys with the same prefix. I'm not sure that's a problem though – I can see why being allowed to sort the keys is useful, I have a hard time seeing why randomizing the keys would be useful.
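The claim about sorting in the second approach can be checked with a small Python sketch. The function `prefixes_grouped` is a hypothetical helper that looks only at the first component of each key; it is one possible reading of "dotted keys with the same prefix appear together", not spec wording.

```python
from itertools import groupby


def prefixes_grouped(keys):
    """True if all dotted keys sharing a first component sit next to each other."""
    def head(key):
        return key.split(".")[0] if "." in key else None  # None marks a plain key

    runs = [h for h, _ in groupby(keys, key=head)]  # one entry per consecutive run
    dotted_runs = [h for h in runs if h is not None]
    return len(dotted_runs) == len(set(dotted_runs))


together = ["dev.opt-level", "release.debug", "release.lto"]
interleaved = ["release.debug", "dev.opt-level", "release.lto"]

print(prefixes_grouped(together))
print(prefixes_grouped(interleaved))
print(prefixes_grouped(sorted(interleaved)))
```

Sorting the interleaved keys brings the two release entries back together, which is why this rule stays compatible with sorting but not with arbitrary reordering.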
Note, the 'pro' side examples above could be written as:
[[catalogue."Cash & Carry".fruit]]
name = "apple"
physical = { color = "red", shape = "round" }
variety = [
{ name = "red delicious" },
{ name = "granny smith" },
]
I think it's key to note that this only looks reasonable because the keys and values in the physical table are quite short. If it was this instead, the inline table is less acceptable:
physical = { color = "redredredredredredredredredredredredredredredredredredredredredred", shape = "roundroundroundroundroundroundroundroundroundroundroundroundround" }
Of course, another solution would be to allow multiline inline tables, e.g.:
physical = {
color = "redredredredredredredredredredredredredredredredredredredredredred",
shape = "roundroundroundroundroundroundroundroundroundroundroundroundround"
}
I'm not sure if that's preferable to what's being proposed here, however. For example, it means that you can't scan through a section looking for ^\s*\w+\s*= and be sure that you're finding a key in that table since the shape = line for example looks like that but is actually an entry in a subtable. The physical.shape = syntax doesn't have that problem.
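The scanning concern can be made concrete with Python's `re` module. The sketch below applies the pattern from the paragraph above to a hypothetical multiline inline table (a syntax that is only being floated here, with shortened values) and to the dotted-key spelling.

```python
import re

KEY_LINE = re.compile(r"^\s*\w+\s*=")

# Hypothetical multiline inline table (indented for readability)
multiline_inline = [
    'physical = {',
    '    color = "red",',
    '    shape = "round"',
    '}',
]

# The same data written with dotted keys
dotted = [
    'physical.color = "red"',
    'physical.shape = "round"',
]

hits_inline = [line for line in multiline_inline if KEY_LINE.match(line)]
hits_dotted = [line for line in dotted if KEY_LINE.match(line)]

print(hits_inline)  # includes the color and shape lines, which belong to the subtable
print(hits_dotted)  # empty: "\w+" cannot match across the "."
```

With the multiline form, the pattern reports color and shape as if they were keys of the enclosing section; with dotted keys it reports nothing misleading.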
'Sorting' was the wrong word, I meant 'exchanging'. I think the property that keys can be shuffled within a section while retaining meaning is important, not because one wants to do that but because explaining the errors caused by not doing that no longer fits the definition of being simple. Saying that you can't duplicate section headers or key names is really simple by comparison.
Additionally, the motivation for restricting inline tables to a single line is explicitly because their intended use is for small, simple tables. Larger tables benefit less from inline syntax just as they would from the proposed path syntax. I.e. you don't want related values being dispersed throughout a file, instead, they should exist in relative proximity.
The current spec has two properties:
This proposal forces a choice between those two properties, because you can't keep both.
@mojombo this should be reopened then. =)
Dotted keys have been merged, but we should still clarify when tables close.
you don't want related values being dispersed throughout a file, instead, they should exist in relative proximity
Yep. Related values are to be put in the same table. And any forms of "table reopening" should be forbidden.
explaining the errors caused by not doing that no longer fits the definition of being simple
Is this far from simple? - "Error. Attempt to reopen table [Foo.Bar] at line X. Table [Foo.Bar] was closed at line Y."
Is this far from simple? - "Error. Attempt to reopen table [Foo.Bar] at line X. Table [Foo.Bar] was closed at line Y."
I suppose it depends on your definition of simple. Given what TOML strives to be, yes, this is far from simple, in my opinion.
The notion of "closing" a table applies to non-table assignments. Assigning sub- or super-tables is offered more latitude when standard table definitions are used. After all, it was in this context that this rule applies: "As long as a super-table hasn't been directly defined and hasn't defined a specific key, you may still write to it."
But what if the tables are defined with key-path notation? Or with inline notation, which raises similar questions? In other words, are these valid?
Key-path assignments and subtables
[profile]
dev.opt-level = 1
release.debug = true
release.lto = true
[profile.release.misc] # Is this section valid?
alpha = "A"
beta = "B"
Inline tables and subtables
[profile]
dev.opt-level = 1
release = {debug = true, lto = true}
[profile.release.misc] # Valid? Even though `profile.release` was defined inline?
alpha = "A"
beta = "B"
Inline tables and key-path assignments
[profile]
dev.opt-level = 1
release = {debug = true, lto = true}
release.misc.alpha = "A" # Can we define `profile.release.misc` this way?
release.misc.beta = "B" # Is this valid?
I think all three examples ought to be considered invalid. The first one visually breaks up the set of profile.release assignments. The others gunk up one-liner definitions, which should be kept short and succinct if used at all.
In order to keep things obvious and minimal, we may insist that the definitions of subtables be restricted on these two types of table definitions. Mainly:
These two proposed rules, along with the non-reopening restriction, ought to settle the issue of when tables are "closed," and can be extended to address table arrays.
I find the concept of "closing" a table quite difficult to grasp.
With the dotted key syntax, there are now so many different ways to navigate through tables, it makes it difficult to figure out when you are allowed to append to a table and when not.
If you want a concept of "closing", then why is this allowed?:
[a.b]
c = "a.b.c"
[a]
d = "a.d"
I feel that the concept of "a value can only be assigned once" is much easier to understand and should be sufficient. For primitives it's simple and arrays can be appended to anytime. You should be able to add new keys to a table anytime as well, as long as a key has not been defined before.
The [a.b] and the [a] in the previous example can be interpreted as merely specifying a path creating referenced tables implicitly if needed. Once the key a is a table, it can't be assigned another value. However, it can be referenced and expanded again.
Another point that is not clear to me is how arrays are currently supposed to be handled. The first part in the following example appears to be currently valid. When thinking in terms of paths, any key, including a dotted key, should reference the last element of an array. All versions below would be equivalent:
[[a.b]]
[a]
x0 = "a.x0"
[a.b.c]
d = "a.b[0].c.d"
[[a.b]]
[a]
x1 = "a.x1"
b.c.d = "a.b[1].c.d"
[[a.b]]
[a]
x2 = "a.x2"
[a.b]
c.d = "a.b[2].c.d"
[[a.b]]
[a]
x3 = "a.x3"
b = { c.d = "a.b[3].c.d" }
The only surprise is that the [[]] syntax always creates a new element in an array and does not merely specify a path.
The conclusion of thinking in paths is that the following should be valid as well:
[a]
b = "a.b"
[a.c]
c = "a.c.c"
[a] #currently not possible
c.d = "a.c.d"
[] #currently definitely not possible
a.d = "a.d"
The "assign a value only once" rule is easy to understand, the paths work consistently in all cases and should be equally simple to implement in parsers.
@falcon71 Let me address questions that you had in your examples. A second comment post will follow.
You asked why this was allowed.
[a.b]
c = "a.b.c"
[a]
d = "a.d"
The rules for opening and closing tables are more flexible for table and table-array values. The spec says "As long as a super-table hasn't been directly defined and hasn't defined a specific† key, you may still write to it." That's why you can write a and a.b in either order. This is valid TOML because nothing has been assigned to a yet, except for the table value a.b.
It is ugly. It needs to be sorted for legibility's sake. But it's legal.
†And I ought to put in a PR to re-write the rule in the spec, because "specific" isn't specific enough.
Table arrays are confusing enough as they are. Let me comment the code in your example, because something doesn't seem right about it. Not sure if you realize that each instance of [[a.b]] defines the next element of the table array.
[[a.b]] # Defines table array `a.b`, opens its FIRST element,...
# ...and leaves it empty?
[a] # Opens the table `a`, which already holds the array `a.b`
x0 = "a.x0" # (that's right)
[a.b.c] # Opens a new table `c` in the first element of `a.b`.
d = "a.b[0].c.d" # (that's right)
[[a.b]] # Opens SECOND element of table array `a.b`,...
# ...and leaves it empty?
[a] # INVALID AT THIS POINT. `a` was already defined above.
# Like I said, I'll address your central point in another post.
#...
Does this example clear up how the table array a.b works?
[a] # There's only one table `a`.
x0 = "a.x0"
x1 = "a.x1"
x2 = "a.x2"
x3 = "a.x3"
[[a.b]] # FIRST element of table array `a.b` (index 0, from your POV)
y0 = "a.b[0].y0"
[a.b.c] # This is `c` in FIRST element. `a.b.c` is implicitly `a.b[0].c`.
d = "a.b[0].c.d"
[[a.b]] # SECOND element (index 1)
y1 = "a.b[1].y1"
c.d = "a.b[1].c.d" # We're already in `a.b[1]`.
[[a.b]] # THIRD element (index 2)
y2 = "a.b[2].y2"
c.d = "a.b[2].c.d"
[[a.b]] # FOURTH element (index 3)
y3 = "a.b[3].y3"
c.d = "a.b[3].c.d"
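The way [[a.b]] appends an element while a later [a.b.c] header lands in the most recent element can be modelled in a few lines of Python. This is only a sketch of the behaviour described in the comments above: `open_table` and `open_array_table` are hypothetical helpers, and no validity checks (duplicate tables, reopening) are performed.

```python
def _descend(table, name):
    """Return child `name`, stepping into the last element if it is a table array."""
    child = table.setdefault(name, {})
    return child[-1] if isinstance(child, list) else child


def open_table(root, path):
    """Model a [a.b.c] header: return the table that the following keys go into."""
    table = root
    for name in path.split("."):
        table = _descend(table, name)
    return table


def open_array_table(root, path):
    """Model a [[a.b]] header: append a fresh element and return it."""
    *parents, last = path.split(".")
    table = root
    for name in parents:
        table = _descend(table, name)
    element = {}
    table.setdefault(last, []).append(element)
    return element


doc = {}
open_table(doc, "a")["x0"] = "a.x0"              # [a]
first = open_array_table(doc, "a.b")             # [[a.b]]  first element
first["y0"] = "a.b[0].y0"
open_table(doc, "a.b.c")["d"] = "a.b[0].c.d"     # [a.b.c]  goes into the first element
second = open_array_table(doc, "a.b")            # [[a.b]]  second element
second["y1"] = "a.b[1].y1"
second.setdefault("c", {})["d"] = "a.b[1].c.d"   # c.d = ... inside the second element

print(doc)
```

Each call to `open_array_table` creates a new element, whereas `open_table` only follows a path, which is the asymmetry pointed out earlier in the thread.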
@falcon71 As much as I can appreciate a general "assign a value only once" rule, I think that it would not work in TOML.
A human-readable configuration format does require some restrictions on how flexible it can be, in order to preserve readability. Key paths were introduced for that purpose. Using them improperly could lead to unreadable files, though.
I would prefer that all non-table basic-type assignments in a table be kept in the same place. Note that we have precedent for this. Say we configure a nested table x.a like this:
[x.a]
b = 1
[x.a] # INVALID: The table `x.a` was already defined.
c = 2
We didn't _re-assign_ anything to x.a, but that doesn't matter. The second [x.a] is considered a _re-definition_ of x.a. This has the nice effect of keeping all non-table values in x.a defined in one place, the standard section [x.a]. And it places no limitations on any later-defined subtables, or on the supertable x.
I previously recommended that all inline table assignments be closed to both new basic values _and_ subtables, to keep inline tables entirely self-contained. I stand by that recommendation. Key paths and standard subtable definitions should not touch inline tables.
@mojombo's past statement implies that a table whose basic values are assigned using key-path notation must necessarily have all such assignments grouped together, even if subtables and supertables are defined elsewhere.
But I also recommended that standard table notation should not be used to add subtables to tables defined by key-path assignments. The existing rules close off new basic value additions to key-path-defined tables once they are no longer being referenced, and my recommendation closes off new subtables in the same context.
For the sake of error reporting, all of this put together implies that each table in the configuration is defined in one continuous set of lines. An error message can thus state that "Line N invalid; table x.y.z was defined in lines A-Z." The user can take this hint and transfer line N's contents in between lines A and Z inclusive. For subtable restrictions, a similar message can be provided. Parsers would need to keep track of which lines defined which tables, but each table would always be a continuous range.
Thank you for your answers.
Yes, you are right, my proposal focused on implementation simplicity without providing any value for human users apart from obfuscation.
Based on my understanding of your rules, the following would be a valid toml?:
[a.b] #closes empty, opens a.b
c = "a.b.c"
[a] #closes a.b, opens a
#b.d = "a.b.d" #invalid, a.b is already closed
c = "a.c"
b.d.e = "a.b.d.e" #closes a, opens a.b.d
b.d.f = "a.b.d.f"
#d = "a.d" #invalid, a already closed
d.e "a.d.e" #closes a.b.d, opens a.d
#[[a.d]] #invalid closes a.d, opens it again
d.f = { g = "a.d.f.g"} #a.d.f never opens, a.d still open
#d.f.h = "a.d.f.h" #invalid, a.d.f was never open
d.e = "a.d.e"
[[b.a]] #closes a.d, opens b.a[0]
a = "b.a[0].a"
[b] # closes b.a[0], opens b
#a.c = "b.a[0].c" #invalid, b.a[0] is closed
a.c.d = "b.a[0].c.d" #closes b, opens b.a[0].c
[[b.a]] #closes b.a[0].c, opens b.a[1]
[a.x] #closes b.a[1], opens a.x
Let me start by noting that you could have more than one table open at a time. Two tables can be open at one time if you are using dotted keys. With inline tables, you may have several tables open, if only briefly.
What I have in mind is a hierarchy of the definition styles. Sections contain bare keys, quoted keys, and groups of dotted-key-defined tables. They all can contain inline subtables for values, which may also contain dotted keys in inline subtables.
More explicitly:
[] and [[]], close off basic assignments to the root table.
[a] # This opens the table `a` inside the root.
a1 = "a.a1"
b.c = "a.b.c" # This is the only assignment, basic or otherwise, to `a.b`.
a2 = "a.a2" # This is valid, and closes the table `a.b`.
#b.z = "a.b.z" # INVALID
This is getting very elaborate. But I think it's been an enlightening process so far, and I hope you think so too.
## Here's your original code.
## My comments are double-hashed and refer to prior lines.
[a.b] #closes empty, opens a.b
## Yes. The root table can only accept subtables and subtable arrays from
## this point forward. The section table `a.b` is opened.
c = "a.b.c"
[a] #closes a.b, opens a
## Yes, exactly. Subtables of `a.b` may later be defined.
#b.d = "a.b.d" #invalid, a.b is already closed
## That's right.
c = "a.c"
b.d.e = "a.b.d.e" #closes a, opens a.b.d
## No; section `[a]` keeps table `a` open.
## But Yes; the dotted keys open `a.b.d` here.
b.d.f = "a.b.d.f"
#d = "a.d" #invalid, a already closed
## No; section `[a]` keeps the table `a` open.
## The missing key path would have closed `a.b.d`.
## But since this is commented out, let's move on.
d.e "a.d.e" #closes a.b.d, opens a.d
## INVALID, because you forgot the "=" sign!
#[[a.d]] #invalid closes a.d, opens it again
d.f = { g = "a.d.f.g"} #a.d.f never opens, a.d still open
## The dotted keys close the table `a.b.d` and open `a.d` here.
## The inline table value opens and closes `a.d.f` on a single line.
## `a` is still open for basic assignments.
#d.f.h = "a.d.f.h" #invalid, a.d.f was never open
## Not exactly; the table `a.d.f` is already closed.
d.e = "a.d.e"
## We're at a new section header.
## Open dotted-key tables (`a.d`) are closed.
## The old section table (`a`) is closed. `a` may have subtables defined later.
[[b.a]] #closes a.d, opens b.a[0]
## The section does open `b.a[0]`. But `a.d` was already closed.
## (TOML doesn't guarantee 0-indexing, but I get what you mean.)
a = "b.a[0].a"
[b] # closes b.a[0], opens b
## It is very strange to open a table after opening the first element of an
## array of tables within it. But it's valid.
#a.c = "b.a[0].c" #invalid, b.a[0] is closed
## Yes.
a.c.d = "b.a[0].c.d" #closes b, opens b.a[0].c
## The table `b` isn't closed until the next section header.
## But the key-path table `b.a[0].c` is opened
[[b.a]] #closes b.a[0].c, opens b.a[1]
## The table `b.a[0].c` is closed first, then `b` is closed.
## But Yes, the table `b.a[1]` is opened.
[a.x] #closes b.a[1], opens a.x
## Yes, that's right.
## At EOF, `a.x` is closed, and the root table is closed.
Thank you for taking your time to annotate the example. I indeed find this very enlightening.
The root table can only be accessed between BOF and the first table or arraytable declaration, so I think it can be treated like a normal table declaration (think []).
You would allow this:
a = "a"
b.c = "b.c"
d = "d" #valid, root is still open
#my interpretation of only allowing a single open table would have forbidden this
If I understand you correctly, you would keep track of three open tables:
This would lead to the following being invalid, which might seem confusing:
a.a = "a.a" #opens a, root still open
a.b.c = "a.b.c" #closes a, opens a.b
#a.c = "a.c" #invalid, a already closed
If this was to be allowed, then an arbitrary number of tables would need to be kept open for dotted keys and inline tables with dotted keys (I assume the rules would be exactly the same for inline tables. The order would matter as well).
In any case, while these rules might work, I find them quite far from being "obvious" like the previous rules before dotted keys were introduced. They could simply be remembered as "don't assign [table] twice". Now users will be busy rearranging keys until the parser accepts the file, because sometimes keys need to be grouped together, except for when they don't.
Thanks, I appreciate the feedback. I think you've got the concepts down pat. Both of your examples meet the table scoping standards that I have in mind. And you indeed can treat the top section like a normal table declaration, in which the only root-level basic assignments can be made. That's come up in past discussions, specifically in #456.
The open table tracking is a little more complicated. There'd only ever be one section table open (including root), and at most one dotted-keys table inside that. But inline tables can contain smaller inline tables, and now they can include key-path assignments, too. So an arbitrary number of nested tables could be opened up! Fortunately, line lengths for inline tables tend to be short, and if they're not, they can be expanded into sections or dotted-key assignments.
In this very Douglas Adams-y example, tables are nested seven layers deep. And during parsing, three tables are opened all at once on a single line.
pan.galactic.gargle.blaster = {large.gold.brick = {name="slice", type="lemon"}, quantity = 2}
At 93 characters, that is definitely abusing inline tables something fierce. But it's still legal.
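For reference, here is the structure that line would produce, written out as a Python dictionary, together with a small hypothetical helper that counts how deeply the tables nest. The dictionary is a hand translation of the example, not parser output.

```python
doc = {"pan": {"galactic": {"gargle": {"blaster": {
    "large": {"gold": {"brick": {"name": "slice", "type": "lemon"}}},
    "quantity": 2,
}}}}}


def table_depth(value):
    """Number of nested tables along the deepest path, counting `value` itself."""
    if not isinstance(value, dict):
        return 0
    return 1 + max((table_depth(v) for v in value.values()), default=0)


# Subtract one so that the root table is not counted.
print(table_depth(doc) - 1)
```

The count is seven: four tables from pan.galactic.gargle.blaster and three more from large.gold.brick.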
The use of key paths in TOML is fully intended to make configs more readable. All these rules we've been discussing are intended to prevent config files from being more complicated than they need to be.
To the end user, the scoping principles are still very straightforward:
So we haven't strayed too far from the realm of the obvious, or the minimal.
I like the adage "Don't assign tables twice." That's a good way to state it!
We know that we're not going to break (almost) any valid v0.4 TOML document in v1.0. So, it makes sense to just extend the current rules onto the dotted syntax.
What are the "scoping" rules for a valid TOML file in v0.4?
As long as a super-table hasn't been directly defined and hasn't defined a specific key, you may still write to it.
[snip]
You cannot define any key or table more than once. Doing so is invalid.
How do we extend those rules so that a valid v0.4 file stays a valid v1.0 file?
Simply moving these statements to be in the "Dotted Keys" section and referencing them from table definitions should be good enough.
Simply moving these statements to be in the "Dotted Keys" section and referencing them from table definitions should be good enough.
@eksortso @falcon71 I think you've arrived at something similar to https://github.com/toml-lang/toml/issues/446#issuecomment-388243460 here?
Dotted keys have been merged, but we should still clarify when tables close.
So, personally, I think we can have relaxed rules here which we can then tighten up if they cause issues. As such, I don't think we're going to introduce any restrictions on closing of tables until after 1.0 -- the intention is to keep 1.0 backwards compatible with 0.4.
Since 0.4 doesn't include dotted keys, any rule that doesn't affect TOML files that don't have any dotted keys would not cause 1.0 to become incompatible with 0.4. On the other hand, changing the validity rules in a 1.x version would be a violation of SemVer, would it not?
any rule that doesn't affect TOML files that don't have any dotted keys would not cause 1.0 to become incompatible with 0.4.
Yeps. I did think of restricting only dotted keys but I felt it would be a little unintuitive to me - having different rules governing two equivalent ways to specify keys.
On the other hand, changing the validity rules in a 1.x version would be a violation of SemVer, would it not?
Yes. It'd need a major version bump.
Closing this since I feel this has been resolved. Any additional discussion on restricting keys just becomes #446.
Reopened as https://github.com/toml-lang/toml/issues/446#issuecomment-395405344 is a compelling argument to have restrictions on dotted keys.
If there is consensus that it makes sense to restrict the order of dotted keys, I'd propose to add this sentence (or similar) to the spec:
All dotted keys that define a subtable must be placed together.
And to give an example such as:
mainkey1 = '...'
subtable1.key1 = '...'
subtable1.key2 = '...'
mainkey2 = '...'
#subtable1.key3 = '...' # NOT ALLOWED, 'subtable1' keys must be kept together
subtable2.key1 = '...'
subtable2.subsub.key1 = '...'
subtable2.subsub.key2 = '...'
subtable2.key2 = '...'
#subtable2.subsub.key3 = '...' # NOT ALLOWED, 'subtable2.subsub' keys must be
# kept together
mainkey3 = '...'
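One way to read the proposed sentence is that every table named by a dotted-key prefix must occupy a contiguous run of lines. The Python sketch below checks that reading against the example, with the two commented-out lines switched on. `check_grouping` is a hypothetical helper, the key splitting is naive, and the interpretation is an assumption rather than spec text.

```python
def check_grouping(keys):
    """Return (index, table) pairs for keys that revisit a dotted-key table after leaving it."""
    last_seen = {}  # table path -> index of the most recent key that used it
    problems = []
    for index, key in enumerate(keys):
        parts = key.split(".")
        for depth in range(1, len(parts)):  # every proper prefix names a table
            table = ".".join(parts[:depth])
            if table in last_seen and last_seen[table] != index - 1:
                problems.append((index, table))
            last_seen[table] = index
    return problems


keys = [
    "mainkey1",
    "subtable1.key1",
    "subtable1.key2",
    "mainkey2",
    "subtable1.key3",         # commented out in the example: subtable1 was left
    "subtable2.key1",
    "subtable2.subsub.key1",
    "subtable2.subsub.key2",
    "subtable2.key2",
    "subtable2.subsub.key3",  # commented out in the example: subsub was left
    "mainkey3",
]

print(check_grouping(keys))
```

Only the two lines marked NOT ALLOWED in the example are reported; subtable2.key2 is accepted because the subtable2 run has not been interrupted.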
@ChristianSi Your newest proposal defines a sensible restriction to dotted keys. But could you clarify the following? Sub- and super-tables may be defined in any order. So I'm assuming the following would be valid. Is that correct?
mainkey1 = '...'
subtable1.key1 = '...'
subtable1.key2 = '...'
mainkey2 = '...'
[subtable1.plainsubsubtable] # This is legal, right?
#key1 = '...' #etc.
Also, I'd asked the same thing in #446 about subtables of an inline table that are defined outside the inline value expression, but didn't get any response. Would this definitely be correct?
# Valid in TOML v0.4
mainkey1 = '...'
subtable1 = {key1 = '...', key2 = '...'}
mainkey2 = '...'
[subtable1.plainsubsubtable] # This then would be legal, too?
#key1 = '...' #etc.
@eksortso The proposed restriction only applies to subtables defined in the form of dotted keys, so yes to your first question.
The answer to your second question is also yes, according to my understanding of the TOML spec.
It might help to be explicit about what the purpose of restrictions is. Are they intended to make it easier to implement a TOML parser with a fixed/predictable amount of state?
Are there any rules about how dotted keys interact with normal tables or inline tables? Some examples:
Example 1:
profile.release.opt-level = 3
[profile.dev] # I assume this should be invalid?
opt-level = 1
Example 2:
[profile]
release = {opt-level = 3}
release.debug = true # Is this OK?
Tables can be defined in any order, but any given table can only be defined once.
Example 1 actually is perfectly valid. It defines the table profile.release in full, using dotted key paths, before defining the table profile.dev.
It's equivalent to this:
# Same as Example 1
[profile.release]
opt-level = 3
[profile.dev]
opt-level = 1
Example 2, however, is not valid, because a table is defined twice. Since profile.release is first defined using an inline table, you can't use a dotted key path to go back in and define more non-table values inside of profile.release. You could use all dotted keys, or a single inline table, or even a new header, to define the contents of profile.release. But you can only choose one of these forms.
This would be valid:
#Example 2, with dotted key paths
[profile]
release.opt-level = 3
release.debug = true
And this would be valid.
#Example 2, with an inline table
[profile]
release = {opt-level = 3, debug = true}
There are a few other valid forms, and you can use whichever form works best for your configuration.
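The "choose one of these forms" rule from this comment can be sketched as a small bookkeeping function in Python. `record` is a hypothetical helper that remembers which syntax supplied a table's direct key/value pairs; it models only this comment's reading and ignores the grouping questions raised elsewhere in the thread.

```python
def record(forms, table, form):
    """Note that `form` supplied direct keys of `table`; return an error string or None."""
    previous = forms.get(table)
    if previous is None:
        forms[table] = form
        return None
    if previous == form == "dotted":
        return None  # further dotted-key lines for the same table
    return f"table {table} was already defined ({previous}); cannot add keys via {form}"


# Example 2 from the thread
forms = {}
errors = [
    record(forms, "profile", "header"),          # [profile]
    record(forms, "profile.release", "inline"),  # release = {opt-level = 3}
    record(forms, "profile.release", "dotted"),  # release.debug = true
]

# The same content written with dotted keys only
forms_ok = {}
errors_ok = [
    record(forms_ok, "profile", "header"),          # [profile]
    record(forms_ok, "profile.release", "dotted"),  # release.opt-level = 3
    record(forms_ok, "profile.release", "dotted"),  # release.debug = true
]

print(errors)
print(errors_ok)
```

Only the third statement of Example 2 is rejected, because profile.release already received its keys from an inline table.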
@eksortso Why would header-tables be allowed to extend an existing table, but inline or dotted keys not be allowed? Going from an example above:
subtable1 = {key1 = '...', key2 = '...'}
[subtable1.plainsubsubtable] # Why is this OK to modify `subtable1`?
key1 = '...' #etc.
Compared to:
subtable1 = {key1 = '...', key2 = '...'}
subtable1.plainsubsubtable.key1 = '...' # Why is this not OK to modify `subtable1`?
I've tried a few different parsers on the first example. Some allow it, some don't, it's somewhat inconsistent.
My preference would be that inline tables should not be allowed to be extended by any means (dotted keys or headers).
@ehuss In example 1, the first line defines subtable1, using an inline table. The second line begins the definition of subtable1.plainsubsubtable, using standard table syntax. Two different tables, defined using two different syntaxes. That is perfectly fine.
And the type of syntax doesn't matter either. Example 2 is also perfectly fine. There's something important that you've missed here: the second line begins the definition of the subtable subtable1.plainsubsubtable.
That's basically how dotted keys work. In defining subtable1.plainsubsubtable.key1, we're defining its parent table subtable1.plainsubsubtable and putting key/value pairs into it directly. Again, two different tables, two different syntaxes.
But if example 2 had instead looked like the following, the line with subtable1.key3 would violate the rule that a table may only be defined once (in this case, with an inline table).
# Notice the difference in the last line.
subtable1 = {key1 = '...', key2 = '...'}
subtable1.key3 = '...' # INVALID, because subtable1 is already defined.
However, this is perfectly fine:
# Now assign a table instead of a scalar value.
subtable1 = {key1 = '...', key2 = '...'}
subtable1.key3 = {} # Valid, because subtable1.key3 is a newly-defined subtable.
TOML v0.5-compliant parsers _must_ parse both of your examples, they _must_ fail the first example I provided above, and they _must_ parse my second example. A few more tests may be in order.
I share your preference in limiting inline tables, at least regarding their subtables. Previously, I'd said that key paths and standard subtable definitions should not touch inline tables. I still stand by this recommendation as a sensible stylistic choice. But in the interest of limiting restrictions, I wouldn't require it in the standard.
But if example 2 had instead looked like the following, the line with subtable1.key3 would violate the rule that a table may only be defined once (in this case, with an inline table).
# Notice the difference in the last line.
subtable1 = {key1 = '...', key2 = '...'}
subtable1.key3 = '...' # INVALID, because subtable1 is already defined.
@eksortso, I do not think that it violates the rule that a table may only be defined once. You are not defining the same table twice in the second line of the example above; you are simply continuing to add keys to it. This is explicitly allowed in the spec:
As long as a key hasn't been directly defined, you may still write to it and to names within it.
subtable1.key3 has not been directly defined, so you certainly can write to it. And since you are not defining any table a second time here, this is also not a problem according to the spec.
Logically, a TOML v0.5-compliant parser must parse this example too. Otherwise it would be non-compliant.
Moreover, contrary to what was said above, according to the spec as written the following is also perfectly valid:
a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3
If it's not the intention of the spec, it needs to be reworded to express that, because currently it does not.
Thanks for correcting me, @AndrewSav. I've not tried every parser, but the ones I have tried agree with your interpretation of the spec.
I guess at one point I'd had in mind the notion that tables are defined in one, and only one, distinct location within a config file. With just section headers, this was obvious. But now that dotted keys can be used to inject key/value pairs into any subtable outside of its section, that guarantee is gone.
I think that @ChristianSi was attempting to preserve it with his proposed restriction. And from a stylistic perspective, I agree that dotted keys into the same table should be kept together. I would make a config file with that principle in mind. But I'm waffling on the restriction.
So in the interest of retaining the demonstrated flexibility, I'm afraid that the proposed restriction ought to be dropped. I've made the argument elsewhere that all the latitude that the current (and future) syntax allows could be abused, but in practice, some good configuration templates would prevent abuse from spreading.
At this point, no changes to the spec (i.e. README.md, for now anyway) need to be made.
@eksortso @AndrewSav I believe you are both wrong and using dotted keys to "inject" key/value pairs into tables defined elsewhere is prohibited. But I admit that the spec is not quite clear on this point.
If we have
subtable1 = {key1 = '...', key2 = '...'}
then I would interpret
subtable1.key3 = '...' # INVALID, because subtable1 is already defined.
as an attempt to define subtable1 again which is in violation of the spec as it currently stands. So yes, it should be INVALID!
Likewise, if [outertable.subtable1] was defined as a regular table instead of an inline table, then having the dotted key subtable1.key3 in the [outertable] section would be an attempt to define outertable.subtable1 a second time, which is not allowed. Conceptually, the inline table syntax is one way of defining a table, while using one or several dotted subtable1.keyX entries within one other regular table is a second way, and writing it as a regular table with its own [outertable.subtable1] header is a third way. Each of these ways is perfectly fine on its own, but trying to combine two or more of them for the same table is not allowed because of the "you cannot define any table twice" rule.
If these restrictions did not exist, then the current restriction "you cannot define any table more than once. Doing so is invalid." would be in shambles, since nobody upon seeing either a regular table header or an inline table could have the slightest idea whether what they see in that place is a complete definition of that table or whether some of its key/value pairs are injected from anywhere else. I'm 99+ percent sure that that's against the spirit of TOML, but I admit that the current wording in the spec is not quite clear.
@mojombo @pradyunsg What's your stance?
(Note that all of this only applies to simple key/value pairs. If [outertable.subtable1] is a regular table, then defining [outertable.subtable1.subsubtable] elsewhere is completely normal, and above I argued that the same probably still applies if outertable.subtable1 = { ... } is defined using inline table syntax. Conceptually, nested tables are not really "in" the supertable in the same way as simple key/value pairs live within a specific table, they just take a parent table name and extend it.)
I think parsers must be allowed to extend existing tables with dotted keys. It has a beautiful symmetry which I think is important, and which helps one reason about TOML docs, both when parsing them and when just reading them. To put into words what I mean by "symmetry": the way I think of TOML as a format is basically as a more readable way of writing a flat namespace of key/value pairs. In other words, given the following document:
[myapp]
debug = true
[myapp.logger]
level = "info"
format = "[$level] $message"
[myapp.listeners]
http = { type = "http", port = 8080, host = "localhost" }
myapp.listeners.https = { type = "https", port = 8443, host = "localhost" }
It is semantically equivalent to the following document, and vice versa:
myapp.debug = true
myapp.logger.level = "info"
myapp.logger.format = "[$level] $message"
myapp.listeners.http.type = "http"
myapp.listeners.http.port = 8080
myapp.listeners.http.host = "localhost"
myapp.listeners.https.type = "https"
myapp.listeners.https.port = 8443
myapp.listeners.https.host = "localhost"
If you disallow extending already defined tables via dotted keys, this symmetry is lost. Note that in both cases it is not allowed to redefine keys which are already defined, only to extend the global "table"/namespace (as a way of visualizing it, consider the global namespace to be _, and all keys defined in a document as being children of that key, like _.myapp.debug = true).
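To make the flat-namespace view concrete, here is a minimal Python sketch. The `flatten` helper is hypothetical (it is not part of any TOML library), it works on an already-parsed nested dict, and it assumes no key contains a literal dot, so quoted keys are ignored.

```python
def flatten(table, prefix=""):
    """Map a nested dict to a flat {dotted_key: value} dict."""
    flat = {}
    for key, value in table.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Non-empty tables contribute only their children.
            flat.update(flatten(value, path))
        else:
            # Scalars (and empty tables) become leaf entries.
            flat[path] = value
    return flat


# Hand-written stand-in for a parsed document (an assumption, not parser output).
nested = {
    "myapp": {
        "debug": True,
        "logger": {"level": "info", "format": "[$level] $message"},
        "listeners": {
            "http": {"type": "http", "port": 8080, "host": "localhost"},
        },
    }
}
flat = flatten(nested)
```

Under this model, two documents are equivalent exactly when they flatten to the same mapping, regardless of which mix of headers, inline tables, and dotted keys produced them.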
Doing this also provides a way for one to have "imports" which can be treated as the same logical document, allowing one to have a base config file, and extend it per-environment or whatever. Obviously we don't have imports in TOML right now, but in theory one could implement that in application code without the need for explicit TOML support, and without violating the spec, simply by restricting imports to only extending the base document with new keys.
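As a rough illustration of what such an application-level "import" could look like, here is a Python sketch. TOML itself has no imports, and the `extend` function and the sample dicts are assumptions made up for this example; they stand in for two already-parsed documents.

```python
def extend(base, overlay, path=""):
    """Merge `overlay` into `base` in place, allowing only new keys."""
    for key, value in overlay.items():
        full = f"{path}.{key}" if path else key
        if key not in base:
            # A brand-new key simply extends the namespace.
            base[key] = value
        elif isinstance(base[key], dict) and isinstance(value, dict):
            # Both sides are tables: descend and extend.
            extend(base[key], value, full)
        else:
            # Anything else would redefine an existing key.
            raise ValueError(f"{full!r} is already defined in the base document")
    return base


base = {"myapp": {"debug": True, "logger": {"level": "info"}}}
production = {"myapp": {"logger": {"format": "[$level] $message"}}}
merged = extend(base, production)
```

An overlay that tried to set myapp.debug again would raise, which matches the "extend, never redefine" restriction described above.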
Anyway, that's my two cents, and the parser I wrote for Elixir is designed based on that interpretation - I suspect I'm not the only one to have done so.
@eksortso @AndrewSav I believe you are both wrong and using dotted keys to "inject" key/value pairs into tables defined elsewhere is prohibited. But I admit that the spec is not quite clear on this point.
Whether I'm wrong depends on which date my comment was written, to be fair. My current stance from 3 days ago is wrong by your measuring, @ChristianSi, but I may be persuaded to accept yours with more debate.
That said, I need to highlight a developing problem. Some parsers use the loose interpretation. They're unlikely to change now, since they claim that TOML documents already written might break on their parser if they change their behavior to the correct standard.
Since TOML v1.0 is intended to be backward compatible with the last v0.* release, laying down the law now will require another v0.* release, to which conforming parsers must adhere. Will strong table rules be defined in direct terms in v0.6? Will a run-up to v1.0 include a statement explicitly permitting looser rules? Or will strong rules be dropped into v1.0 with some advance warning to non-conforming projects?
@bitwalker I don't understand your example. Since your myapp.listeners.https inline table sits under the [myapp.listeners] table header, the full paths of the keys it actually defines are myapp.listeners.myapp.listeners.https.type etc.
To actually get the flattened structure you suggest, the last part of your example would simply have to read:
[myapp.listeners]
http = { type = "http", port = 8080, host = "localhost" }
https = { type = "https", port = 8443, host = "localhost" }
Nicely symmetric, but no need to inject anything anywhere.
@eksortso In my understanding, "the loose interpretation" is a misinterpretation of the spec, in other words: a bug. The spec is admittedly not completely innocent, since the current wording allows different interpretations. I would therefore suggest a bug-fix release (v0.5.1) that clarifies the meaning of the spec. No need to bump the version number to 0.6, since no new features are introduced (nor is old, non-buggy behavior disallowed).
Non-compliant parsers would then have to be updated to conform to the spec. Since dotted keys are pretty new (they did not exist before 0.5) and since I don't expect many (if any) documents to use "key injection" in the first place (what would be the point, except to confuse your readers?), I'd expect that to be a relatively painless process.
But it would certainly be a good idea to publish a spec update soon-ishly rather than to wait for a few years before the next version bump takes place. (Considering the pace of new TOML versions in the past, I am a bit worried in that regard.)
@ChristianSi Sorry, I wrote that example off the cuff, all I intended with the last line was to show different types of table usage in the example, but it ended up making it confusing :(
TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics. TOML is designed to map unambiguously to a hash table.
This is the declared goal of TOML. The three key principles there, "minimal", "obvious semantics", and "maps unambiguously to a hash table" are what we should judge all questions by. In my opinion, we need the following:
I think that core principle should be (and is already to some degree): TOML is a syntax for representing a namespace of keys and values; keys are unique and there are no features which cannot be mapped to a flattened representation, where flattened representation is in dotted-key form, e.g. foo.bar = "baz". The syntax of TOML is composable, such that any combination of forms is allowed (within the syntax rules for those forms) and can always be unambiguously mapped back to the flattened representation.
Given that principle, we can define the following simple set of rules:
A table declaration (e.g. [foo.bar]) does not define foo.bar; it sets the context for keys following the declaration up until the next table declaration, or EOF. Given this:
[foo.bar]
baz = "true"
the result is equivalent to foo.bar.baz = "true", _not_:
foo.bar = {}
foo.bar.baz = "true"
Any combination of table declarations, inline tables, and dotted keys is permitted, as long as a key is never defined twice, so the following is allowed:
[foo.bar]
baz = { truthy = true }
baz.falsey = false
[foo]
bar.qux = "whatever"
[foo]
bar.quux = "something else"
and unambiguously maps to:
foo.bar.baz.truthy = true
foo.bar.baz.falsey = false
foo.bar.qux = "whatever"
foo.bar.quux = "something else"
Furthermore, this gives us a framework from which to reason about table arrays. Rather than treating them as an exception to the rule of redefinition, it is simpler to think of each element declaration as defining a new "hidden" key, based on the order of appearance, which a conforming parser infers from the syntax, e.g.
[[products]]
name = "Hammer"
sku = 738594937
[[products]]
[[products]]
name = "Nail"
sku = 284758393
color = "gray"
could be flattened like so:
products._0.name = "Hammer"
products._0.sku = 738594937
products._1 = {}
products._2.name = "Nail"
products._2.sku = 284758393
products._2.color = "gray"
The key guiding principle here is symmetry/composability. We have various forms of syntax for tables and their keys, so it should be possible to combine them in different ways but retain _one_ semantic model. Prohibiting redefinition of keys is distinct from the question of whether you can "reopen" some part of the keyspace to add new keys.
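The "hidden key" view of table arrays can be sketched in Python as follows. This is only an illustration under assumptions: `flatten` is a hypothetical helper, the `_0`, `_1`, ... names follow the example above rather than anything in the spec, and the input is a hand-written stand-in for a parsed document.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts, treating arrays of tables as tables keyed _0, _1, ..."""
    flat = {}
    if isinstance(value, dict):
        items = value.items()
    else:
        # An array of tables: each element gets a hidden positional key.
        items = ((f"_{i}", element) for i, element in enumerate(value))
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else key
        is_table_array = (
            isinstance(child, list)
            and bool(child)
            and all(isinstance(element, dict) for element in child)
        )
        if (isinstance(child, dict) and child) or is_table_array:
            flat.update(flatten(child, path))
        else:
            # Scalars, plain arrays, and empty tables are leaves.
            flat[path] = child
    return flat


doc = {
    "products": [
        {"name": "Hammer", "sku": 738594937},
        {},
        {"name": "Nail", "sku": 284758393, "color": "gray"},
    ]
}
flat_products = flatten(doc)
```

The empty second element survives as the leaf products._1 = {}, mirroring the flattened listing above.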
@eksortso In my understanding, "the loose interpretation" is a misinterpretation of the spec, in other words: a bug. The spec is admittedly not completely innocent, since the current wording allows different interpretations.
What is the guiding principle behind your interpretation? I ask because adding more rules without some unifying principle is not user-friendly; it makes documents harder to write, harder to read, and parsing more complex and thus more error-prone. I agree it is good to add more rules when they provide clarity in the context of some guiding principle/intuition; but it is not necessarily desirable when the clarity comes at the expense of violating one's intuition. If readers of the spec are told to internalize a few simple principles ("TOML is a syntax for representing a namespace of keys", and "it is not allowed to redefine keys"), it gives them an easy way to test if some interpretation of a rule is correct ("does interpretation A imply I can redefine something?") and therefore easy to understand.
Conformity to a single way of doing things _is_ desirable in many cases, it is one reason why code formatters in languages like Go are so nice - when there is a lot of complexity, knowing that some things will always look the same orients you in a new context. That said, TOML is a simple format, there isn't enough complexity to justify forcing that kind of conformity, particularly when it has the potential to roadblock further improvements down the line (such as imports), or put you in a corner with regards to ambiguity. I think it is also a bad sign when there isn't any justification which explains why some rule exists which ties back to the core principles, only that some usage pattern is prohibited arbitrarily (i.e. based on preferences).
In any case, I think it is important for the maintainers of TOML to decide what mental model should be driving these design decisions (or at least make it prominent, if they have already made it known elsewhere), and then consider these questions in that framework. From what I've seen, things are too abstract (i.e. "maps unambiguously to a hash table" is not specific enough, what is the "core", or simplest possible, representation; how do features relate to that representation), and that drives both ambiguity in the spec, and different interpretations of how features should be implemented/represented, because everyone has a different mental model.
I've come back around to the strict interpretation of table definition that @ChristianSi claims is the true one. And as @bitwalker calls for, the mental framework needs to be made clear in the documentation.
The turning point came with some example code posted before, which we must address, but for different reasons. Here is the code:
foo.bar = {}
foo.bar.baz = "true"
The first line foo.bar = {} means that foo.bar is an empty table. The second line makes foo.bar non-empty. The loose interpretation says that this is fine; it just means foo.bar = {baz = "true"}. The strict interpretation considers this code invalid, because foo.bar is defined in whole in the first line and foo.bar.baz in the second line is not a new subtable but a key/value injection.
The strict interpretation is the correct one.
I tried, best as I could, to articulate the mental rules of this interpretation prior to October 25th in my previous comments. Following are the rules as best as I can express them, and we'll need examples to illustrate them all.
Did I miss anything? Could any of these be written better?
@eksortso I think you missed something:
In your reasoning, you stated foo.bar = {} cannot be extended with foo.bar.baz = true, but _could_ be extended with foo.bar.baz = {} (rule 4 in your list, but also rule 1, and is implied by the fact that one can define subtables with [foo.bar.baz]). This is a contradiction with your third paragraph (i.e. the issue you had with my original example), as under the rules you provided (which are inconsistent with regards to keys, but we'll get to that), you are still allowed the following:
foo.bar = {}
foo.bar.baz = true
As stated by the combination of rules 2, 3 and 5. You also have other conflicting rules (3 with 1, 5 with 1). Most importantly though, the rules listed here do not appear to be based around any core mental model from which to reason about them - there is no symmetry or composability to the rules, and it seems primarily motivated by restricting how a table is defined. The mental framework I keep asking for isn't just about defining a list of rules, they have to be based on something, and that _something_ is the framework that I care about.
I think it bears stressing, _tables are not important_, they are not relevant to the ultimate representation of a TOML document (a hash table) _except_ where they indicate the presence of a nested table, which only matters because they are intended to hold more keys, after all, that is what a hash table does. TOML is a convenience format for representing a nested key/value structure, so it ultimately doesn't matter what order tables are defined in, the very presence of a key indicates that the path to that key must consist of tables and nothing else, otherwise there is redefinition. Table declarations should be considered a way to reduce the verbosity of defining nested keys, since they remove the need to prepend the full path to the key for all subsequent keys, but that's it. Anything else and we already have contradictions with the features that exist in the syntax, which is supposed to be one of the things TOML is free of.
While I don't really care what the specific mental model is, I have to second @bitwalker's desire for _some_ consistent mental model. Otherwise the set of rules that gets cobbled together will end up being ad hoc and arbitrary. If it's based on some mental model, whatever that is, then it will be consistent. It can be a strict mental model or a loose one, as long as there is one.
Regarding the
foo.bar = {}
foo.bar.baz = "true"
issue, it seems to me that all values given as the right hand side (RHS) of an = in TOML are generally immutable and not extensible: integers, strings, floats. If you want to build up a structure incrementally, then you generally have to use the structure of sections and key/value pairs to do so. Mixing a presumably immutable RHS value specification with subsequent incremental addition to either tables or arrays seems pretty dicey to me.
it seems to me that all values given as the right hand side (RHS) of an = in TOML are generally immutable and not extensible: integers, strings, floats. If you want to build up a structure incrementally, then you generally have to use the structure of sections and key/value pairs to do so.
I agree, with the exception of tables, as the spec already allows for the following:
a.b.c = "c"
a.c.d = "b"
[foo]
[bar]
thing = true
[bar.baz]
something_else = false
If, as the spec implies, a.b.c = "d" is equivalent to:
[a.b]
c = "d"
# or
[a]
b = { c = "d" }
And if, as the spec implies, [foo] is equivalent to foo = {}, then it follows that these properties must be true:
1.) If a dotted key's path includes components which are not yet defined, they are implicitly defined to be tables
2.) If a dotted key's path includes components which are defined, and are tables, then that table is _extended_ with the new key, not redefined
3.) If a dotted key's path includes components which are defined, and are _not_ tables, then it is obviously an attempt at redefinition, and therefore an error
4.) Declaring a table with [a.b] defines a.b to be a table if it doesn't exist, but if it does, and the declaration has child keys (such as c = "d"), then it must be equivalent to the dotted-key representation, i.e. a.b.c = "d"; in other words, implicitly creating the table if it doesn't exist, otherwise extending that table with new keys.
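Properties 1 to 3 can be expressed as a small Python sketch of the loose interpretation. The `assign` function is hypothetical and deliberately naive: it splits on every dot, so quoted keys are not handled, and it operates on plain nested dicts rather than on any real parser's data structures.

```python
def assign(root, dotted_key, value):
    """Loose-interpretation write of `dotted_key = value` into nested dicts."""
    *parents, leaf = dotted_key.split(".")
    table = root
    for name in parents:
        # Property 1: undefined path components are implicitly created as tables.
        table = table.setdefault(name, {})
        # Property 3: a defined component that is not a table is an error.
        if not isinstance(table, dict):
            raise ValueError(f"{name!r} in {dotted_key!r} is not a table")
    # Property 2: an existing table is extended, but the leaf key must be new.
    if leaf in table:
        raise ValueError(f"{dotted_key!r} is already defined")
    table[leaf] = value


doc = {}
assign(doc, "a.b.value1", 1)
assign(doc, "a.c.value1", 2)
assign(doc, "a.b.value2", 3)  # reopens a.b to add a second key
```

With these rules, the only failure modes are redefining a key and descending into a non-table, which is the whole of the loose model as described here.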
The central problem here is with the fact that the spec is contradictory with regards to dotted keys and their equivalence to the table syntax. On the one hand it has examples which demonstrate 1-3 to be true. On the other, in the table section, it says that the following is invalid:
[a]
b = "c"
[a]
c = "b"
This disallows 4. However, it says the following conflicting example is valid:
physical.color = "orange"
physical.shape = "round"
This is a contradiction, and my take is that it is not possible for one of these examples to be invalid without throwing out any sensible mental model with which to reason about the syntax of tables and dotted keys.
I would agree with @ChristianSi and @eksortso if the dotted key syntax didn't exist as it is defined, but since it does, it just simply doesn't make sense to have this awkward asymmetry to the syntax, where depending on which syntax you use, something is invalid using one syntax, but valid using the other. But as you said, we need a mental model to refer back to in order to decide what makes sense and what doesn't - without that, I'm not sure we can arrive at any kind of consensus.
Let me post this now, because I couldn't finish reading your first response without choking.
@eksortso I think you missed something:
In your reasoning, you stated foo.bar = {} cannot be extended with foo.bar.baz = true, but _could_ be extended with foo.bar.baz = {} (rule 4 in your list, but also rule 1, and is implied by the fact that one can define subtables with [foo.bar.baz]).
That's precisely what I'm saying, yes.
This is a contradiction with your third paragraph (i.e. the issue you had with my original example), as under the rules you provided (which is inconsistent with regards to keys, but we'll get to that), you are still allowed the following:
foo.bar = {}
foo.bar.baz = true
I am _not_ allowing this. I _explicitly_ stated that this code (using "true" instead of true) is invalid. Reread what I wrote about strict interpretation. foo.bar.baz = true is attempting to assign a key/value pair in foo.bar, which is a redefinition of that table, because the first line foo.bar = {} says that foo.bar _contains no key/value pairs_ at its level, _period._ There could be KVPs in foo.bar's subtables, but not in foo.bar itself.
As stated by the combination of rules 2, 3 and 5. You also have other conflicting rules (3 with 1, 5 with 1).
Provide examples of these contradictions, if they exist. We'll get to the bottom of this.
Despite my tone, I do think we share common ground. But I'm dissuaded from reading further into the conversation. I'll follow up in a few hours.
@eksortso Sorry if I came across as attacking you, that's not at all my intent, I absolutely respect your opinion!
I am not allowing this. I explicitly stated that this code (using "true" instead of true) is invalid.
What I meant is that your rules, in the list, _do_ allow it, or at least do not disallow it. You stated the following about strict interpretation:
The strict interpretation considers this code invalid, because foo.bar is defined in whole in the first line and foo.bar.baz in the second line is not a new subtable but a key/value injection.
This implies that if the second line is foo.bar.baz = {}, then it would be allowed, as it is an exception to the strict interpretation as you've stated it. That's also supported by this rule:
- A subtable, or an array of tables, may be defined within the parent table's definition either explicitly (with inline tables) or implicitly (with dotted keys). This allows a subtable to be defined in the middle of its parent.
Assuming that was intentional, it is very surprising that one thing is allowed but the other is not using the same syntax.
I'm also not sure why inline tables are implied to be different than regular tables, there is nothing in the spec that indicates that there is a difference in what they represent, or even that setting keys in a table which was previously defined via dotted keys is disallowed (see the physical.* example I referenced in my last comment).
... the first line foo.bar = {} says that foo.bar contains no key/value pairs at its level, period
Is there something in the spec which supports this? I don't think it says that; but I'm not sure _what_ it says exactly, because the spec is ambiguous about it as far as I can tell. We only have various examples of behavior around tables and dotted keys from which to infer what it may mean.
Provide examples of these contradictions, if they exist. We'll get to the bottom of this.
I'd appreciate it if you'd assume that I'm arguing in good faith, and not that I'm making stuff up; in any case, the following rules are the ones I was referring to:
1.) Tables may be defined in any order. This includes subtables and supertables.
2.) A table is defined when its key/value pairs are explicitly stated.
3.) Key/value pairs of a table are fully specified within one single continuous range of lines, though subtables may also be defined within that range.
5.) A subtable, or an array of tables, may be defined within the parent table's definition either explicitly (with inline tables) or implicitly (with dotted keys). This allows a subtable to be defined in the middle of its parent.
1 and 3, and 1 and 5 are in conflict. Namely, 1 says tables, including subtables/supertables, may be defined in any order, so it follows that subtables can be defined before their parent tables. 1 and 5 are in conflict, or at least 5 is partially redundant, because 5 implies that subtables may only be defined within a parent table's definition, but 1 says they may be defined in any order. A subtable definition implies defining a new key within its parent table, so either 1 or 3 cannot be true, as a subtable defined before a parent table means that the key/value pairs for a table are not necessarily all fully specified within one single continuous range of lines.
If I misunderstood something, definitely let me know; but I think some of these rules are in conflict, or are ambiguous, or there are additional rules missing required to clarify. In any case, my problem with the rules has more to do with the fact that I don't know how to identify what is correct behavior vs what is not, because there is no model from which to reason about them. I agree we all have common ground here, but I don't think any of us know what it is precisely, we're feeling it out in the dark so to speak.
@bitwalker
[foo.bar]
baz = { truthy = true }
baz.falsey = false
Why would you (or anyone) want to write such a thing? Why not write either
[foo.bar]
baz = { truthy = true, falsey = false }
or
[foo.bar]
baz.truthy = true
baz.falsey = false
Either of these alternatives is not only easier and less confusing to read, but also easier to write.
@ChristianSi Yes, of course, but that's also completely not the point of the example. It is not that someone would want to write that example that way, it is about equivalence of forms in the grammar. My other comments go into plenty of detail about why that is important.
@bitwalker Okay, so it seems nobody is arguing in favor of key injection as something that's actually useful and good to have. Good to know.
Your quest for simplicity is honorable but, as I understand you, you propose to achieve it by dropping the rule "You cannot define any table more than once" altogether. That would be simple, admittedly, but the simplicity in the spec would potentially lead to documents that are highly complex, hard to read, understand and reason about. IMHO you are aiming for simplicity in the wrong place. Let's aim for simplicity of the TOML documents; if this requires more complexity in the spec, the trade-off is worth it.
Also, let's face it: The rule "You cannot define any table more than once" will NOT be dropped. So far nobody has even seriously proposed such a thing. If you want to propose it, feel free to open a feature request, but I would be very surprised if the maintainers decided to agree to such a request.
The main point of controversy about the strict interpretation seems to be that it prohibits
foo.bar = {}
foo.bar.baz = true # ILLEGAL key injection attempt
but seems to allow
foo.bar = {}
foo.bar.baz = {} # externally defined subtable
I've argued so myself, but, after rereading the spec, I revise my position and argue that the TOML v0.5 spec actually prohibits BOTH these cases, removing the controversy.
Justification: In the section on tables, the spec says: "As long as a key hasn't been directly defined, you may still write to it and to names within it."
And gives this example:
# THIS IS INVALID
a.b = 1
a.b.c = 2
Note that the spec here says nothing about which values the keys map to. So, we can modify the example, using inline tables as values instead of integers, and from the wording of the spec it clearly follows that the resulting structure is still invalid:
# THIS IS INVALID (TOO)
a.b = {}
a.b.c = {}
Just for the sake of completeness: Order does not matter in TOML (except where arrays are concerned), so obviously the example remains invalid if we reverse the ordering of the keys:
# THIS IS INVALID (TOO)
a.b.c = {}
a.b = {}
So it logically follows from the spec as it currently stands that inline tables must be complete. Any nested subtables must be defined within the inline table itself, defining them externally is not allowed.
# THIS IS FINE
a.b = { c = {} }
So, from the spec itself it follows that the potentially confusing disparity simply does not exist. @bitwalker's honorable quest for consistency can be achieved by the strict interpretation; there is no need to drop nearly all rules and adopt an "Anything goes" model.
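The reading argued for above, that inline tables must be complete, can be sketched in Python like this. Everything here is an assumption for illustration: `assign_strict`, `FrozenTableError`, and the `frozen` set are hypothetical names, keys are split naively on dots, and only dotted-key and inline-table assignments are modeled (no table headers).

```python
class FrozenTableError(ValueError):
    """Raised when a dotted key tries to write into an inline table."""


def assign_strict(root, frozen, dotted_key, value):
    """Write `dotted_key = value`, treating tables given as a value as closed.

    `frozen` is a set of dotted paths whose tables came from an inline table.
    """
    parts = dotted_key.split(".")
    table = root
    for depth, name in enumerate(parts[:-1], start=1):
        table = table.setdefault(name, {})
        path = ".".join(parts[:depth])
        if not isinstance(table, dict):
            raise ValueError(f"{path!r} is not a table")
        if path in frozen:
            # Covers nested inline tables too, since their paths share this prefix.
            raise FrozenTableError(f"{path!r} was defined by an inline table")
    if parts[-1] in table:
        raise ValueError(f"{dotted_key!r} is already defined")
    table[parts[-1]] = value
    if isinstance(value, dict):
        frozen.add(dotted_key)


doc, frozen = {}, set()
assign_strict(doc, frozen, "a.b", {})
rejected = []
for key, value in [("a.b.c", 2), ("a.b.c", {})]:
    try:
        assign_strict(doc, frozen, key, value)
    except FrozenTableError:
        rejected.append((key, value))
```

Both the scalar and the subtable write into a.b are rejected, so the two cases are treated the same way. Writing a.b = { c = {} } in a single assignment still succeeds, because the nested table is part of the value itself.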
I revise my position and argue that the TOML v0.5 spec actually prohibits BOTH these cases, removing the controversy.
I think that any value (table or array) that is given as a RHS should be considered immutable. That way, if you see x = RHS you know, regardless of what else is in the document, that RHS is the value of x. If you want to have an extensible table or array of tables, then you need to use TOML structure. Disallowing "injection" of values but not of subtables seems weirdly incoherent and mixed up.
Okay, so it seems nobody is arguing in favor of key injection as something that's actually useful and good to have. Good to know.
I didn't say that, I suspect it _is_ useful, and I gave an example in one of my earlier comments stating one possible use case, I am sure there are others as well. The discussion in this thread certainly seems to indicate that there is some desire for the capability - or at the very least that the syntax not be self-contradictory.
Your quest for simplicity is honorable but, as I understand you, you propose to achieve it by dropping the rule "You cannot define any table more than once" altogether.
My argument is that the syntax, as it exists today, is contradictory, but implies that dotted keys can reopen tables. That implication further implies that in some situations you can "define a table more than once" (more specifically, the syntax allows you to reopen a table to add more keys with dotted keys, but one cannot do the same with the bracketed table syntax, which is inconsistent).
My proposal is that TOML needs to define a core model for the format, and base its rules around that; if there isn't one, and the rules are arbitrary, then I believe TOML can only become more complex, and so self-defeating in its stated goals. Once a core model is defined, _then_ we can argue about what rules are required, or should be redacted, based on whether they fit the model.
That would be simple, admittedly, but the simplicity in the spec would potentially lead to documents that are highly complex, hard to read, understand and reason about. IMHO you are aiming for simplicity in the wrong place. Let's aim for simplicity of the TOML documents; if this requires more complexity in the spec, the trade-off is worth it.
As I stated before, I don't buy this argument. Yes, in theory someone could write a horrible document, but they have basically no reason to do so, as the syntax of TOML primarily consists of tools for writing clean documents, and is the main draw of the format anyway. You are worried about edge cases that are simply unlikely at best. The benefit of a simple model is that documents are easier to reason about, because there are few rules that one needs to know both to read and write them. Such a model is also easier to extend in the future, as it is flexible, and edge cases (if there are any in such a model) are much less likely to present conflicts with extension.
Complexity in the spec is just as harmful to users of the spec as it is to authors/maintainers of parsers/serializers for the format: it makes it harder to remember the rules and harder to know if something you have written is valid or not, which means you need to have a validator on hand at all times, and the increased complexity means that parsers are more likely to have bugs, such as disallowing valid documents, or allowing invalid ones. Simplicity of the spec is something that benefits the entire ecosystem in the end.
I'm very much curious to see an example of something one is likely to do with looser rules that a stricter interpretation prevents, and which represents a real readability problem in practice. I can concoct some ugly TOML documents easily enough, but I'm not likely to ever do that in practice, because there is no benefit, if anything it is harder to even write such documents than to "do the right thing".
Also, let's face it: The rule "You cannot define any table more than once" will NOT be dropped. So far nobody has even seriously proposed such a thing. If you want to propose it, feel free to open a feature request, but I would be very surprised if the maintainers decided to agree to such a request.
Well you certainly seem certain about that, but I'm not so certain - older parsers already may choke on 0.5 documents (I've seen many myself just in Elixir), and the changes I've talked about are not any more serious than the breaking changes which have already occurred, and I would argue such a change, organized around a core model expressed in the spec as the basis for all syntax rules, would be at least as beneficial as any other change the spec has undergone. In any case, I absolutely would open an issue/PR, but there is no point until the maintainers weigh in on what that model _is_.
I've argued so myself, but, after rereading the spec, I revise my position and argue that the TOML v0.5 spec actually prohibits BOTH these cases, removing the controversy.
Justification: In the section on tables, the spec says: "As long as a key hasn't been directly defined, you may still write to it and to names within it."
And gives this example:
# THIS IS INVALID
a.b = 1
a.b.c = 2
That is clearly invalid because a.b is a non-table value. However, as shown in the dotted keys section, the following _is_ valid:
a.b.c = 1
a.d = 2
Note that the spec here says nothing about which values the keys map to. So, we can modify the example, using inline tables as values instead of integers, and from the wording of the spec it clearly follows that the resulting structure is still invalid:
# THIS IS INVALID (TOO)
a.b = {}
a.b.c = {}
Well, no, the spec does not say that this example is invalid, in fact it almost certainly implies the opposite, because of the a.b.c / a.d example I just showed; a.b.c implicitly creates a table a and a table a.b, so setting a.d is the same as if one had written a = {} followed by a.d = 2. The spec is ambiguous about your specific example though, and is one of the main points of contention here.
Just for the sake of completeness: Order does not matter in TOML (except where arrays are concerned), so obviously the example remains invalid if we reverse the ordering of the keys:
# THIS IS INVALID (TOO)
a.b.c = {}
a.b = {}
I do agree that the spec states this is invalid, because you are redefining a table which was already defined. In my opinion this _should_ remain an error, because the intent is ambiguous due to the ordering; had it been reversed though, it no longer would be an error according to the spec, and shouldn't be, because the intent is clear.
So it logically follows from the spec as it currently stands that inline tables must be complete. Any nested subtables must be defined within the inline table itself, defining them externally is not allowed.
Well, since I'm refuting the assumptions this is built on, we can't say that.
So, from the spec itself it follows that the potentially confusing disparity simply does not exist.
The spec is ambiguous here, so no, it does not follow; the entire conversation we've had so far is about two possible interpretations (with variations in between) of the spec, so it is a given that there is some disparity/confusion, and this is because the spec does _not_ clarify the interaction between implicit and explicit table declarations and the behavior of dotted keys. At a minimum we have to agree that our argument is based on our interpretations of the spec, and how we desire it to be read, but neither of us are in a position to say that the spec is clear about this. My argument thus far is based solely on how I believe the spec should be clarified in the future, aside from specific conclusions drawn from what things are spelled out in the spec.
@bitwalker 's honorable quest for consistency can be achieved by the strict interpretation; there is no need to drop nearly all rules and adopt an "Anything goes" model.
I didn't advocate for dropping all rules or adopting an "anything goes" model - I very clearly expressed the basis from which additional rules would be derived, there are obviously still some basic rules, and the syntactic rules of the grammar. My quest for consistency is ultimately about defining the thing we're trying to be consistent _with_, which you haven't stated yet. Just because a set of rules are consistent with each other doesn't mean they make sense, just that they aren't conflicting with each other; I would assume we all want the rules for TOML to make sense in the context of _something_ - the goals of the project would indicate that the context is how to express hierarchical key/values in a readable and convenient way, for which only a few rules beyond the grammar are necessary.
Is the context instead that TOML is an opinionated format for expressing those key/values in a specific way? Then, probably, dotted keys outside of the bracket syntax should never have been added. Now you have more than one way to express the same thing, so it is hard to argue that it is opinionated, and if it isn't opinionated, why are we debating about _how_ someone should be allowed to express key/values? I don't mean to say that you are necessarily making the above argument, but I would like to know the context from which you are advocating, because otherwise it is difficult to put myself in your shoes.
I think I will wait until we hear from the maintainers for now, unless you have specific things you want me to explain about my argument, or you share the context I mentioned; until then I don't think we can make much progress on this.
I've been AWOL for a bit because a lot of real life has been happening for me.
I see a lot of interesting discussion has taken place here (thanks everyone!) but I genuinely don't have the bandwidth to get up to speed on it currently. I hope to make time to come around to this soon.
I'm still catching up. (For the record, I've _never_ advocated different types of hash tables for the different table syntaxes.)
@ChristianSi The strictest interpretation is very good. But to clarify, would the following still be legal? That is to say, can headers still be written subtable-first (ugly as that may be)? Or would you say that [a.b.c] means that the table a.b is assumed empty except for its subtables, and that the second line makes the document invalid?
[a.b.c] # An empty table
[a.b] # Its parent, with no key/value pairs (not counting a.b.c)
It seems sensible that only tables which are given as right-hand-side literals be considered "closed".
Getting back to the central topic, would the following be legal under the strictest interpretation? I'm inclined to think it's not, but perhaps it actually is. In the latter case, the openness of subtables introduced by dotted key/value pairs is still in play. And in either case, we may need to add language to the spec addressing the ordering of dotted key assignments.
a.ok.a = "Hello"
a.DD = "DISTRACTION"
a.ok.z = "Goodbye"
# And btw, we do need to update TOML syntax highlighting, in jneen/rouge I think.
@StefanKarpinski If that's true, then the above is perfectly valid, since no inline table values are involved.
It seems fine to me since tables are being built up incrementally in any case. What is the purpose of a more strict interpretation? This is a real question. Is the purpose to allow an implementation to "close" a table earlier? Is closing a table early actually a significant benefit in any implementations?
@eksortso:
@ChristianSi The strictest interpretation is very good. But to clarify, would the following still be legal? That is to say, can headers still be written subtable-first (ugly as that may be)? ...
[a.b.c] # An empty table
[a.b] # Its parent, with no key/value pairs (not counting a.b.c)
Sure, that remains legal. Order of table blocks (introduced by [...]) doesn't matter in TOML, except where arrays of tables (introduced by [[...]]) are concerned.
Getting back to the central topic, would the following be legal under the strictest interpretation? ...
a.ok.a = "Hello"
a.DD = "DISTRACTION"
a.ok.z = "Goodbye"
Sure, that remains legal. Order of key/value pairs within a table block doesn't matter in TOML v0.5. (Some months ago there was a discussion about prohibiting such an ordering in future versions of TOML, but that would clearly be an additional restriction which is not yet part of the spec. The strict interpretation, on the other hand, is only about making explicit what's already implicit in the TOML v0.5 spec, not about introducing new restrictions.)
To help clarify things, here is an attempt to explain the strict interpretation in an unambiguous manner and with examples. If this interpretation is accepted as the correct one, a suitable rewrite of this attempt could be incorporated into a future version of the spec (v0.5.1 or so).
Ways of defining tables
TOML has two ways of defining tables: table blocks and inline tables. TOML forbids defining the same table twice, therefore you can use either of these for any table, but you cannot use both for the same table. Moreover, you are not allowed to define the same table in two different table blocks or in two inline table literals.
Table blocks start with a table header line: [table.name] for stand-alone tables, or [[table.name]] for members of a table array. They continue with a (possibly empty) list of key-value pairs and end right in front of the next table header line (or, if there is none, at the end of the document). A special case is the root table block: it contains any key-value pairs between the start of the document and the first table header line; these key-value pairs belong to the (unnamed) root table.
A table block does not only define its main table (whose name is given in its table header line – if there is none, it defines the unnamed root table), but also any nested tables mentioned in dotted keys listed within the table block. To give an example:
# in root table
vals.nums.one = 'One'
vals.nums.two = 'Two'
vals.bools.t = true
vals.bools.f = false
This fragment defines four tables: the root table ('') and the nested tables 'vals', 'vals.nums', 'vals.bools'. (No values are inserted into the 'vals' table directly, but it is nevertheless defined because it appears within a dotted key.)
Tables must not be defined twice, therefore the following table header lines are now ILLEGAL:
[vals] # ILLEGAL, defined in root table!
[vals.nums] # ditto
[vals.bools] # ditto
But tables defined within table blocks are only assumed to be semi-complete: nested tables and table arrays may be defined in other table blocks (obviously, since all tables are direct or indirect children of the root table). So, to return to the above example, all other syntactically correct table header lines which haven't yet been used as keys remain allowed, including
[misc] # another child of the root table
[vals.literals] # a new, not yet defined child of 'vals'
[vals.nums.specials] # a new, not yet defined child of 'vals.nums'
# ... and anything else you can think of, except stuff like
[vals.nums.one] # ILLEGAL, since that's already a key
Alternatively you can define tables as inline table literals. You could rewrite the above example as:
# in root table
vals = { nums = { one = 'One', two = 'Two' }, bools = { t = true, f = false } }
Inline tables, however, are values, and like other values (anything that appears on the right side of an equals sign) they are supposed to be immutable and complete. If you define 'vals' as an inline table, you are therefore NOT allowed to define any nested tables outside the inline table literal (neither as table block nor as another inline table literal).
# still in root table
vals.literals = { ... } # ILLEGAL since 'vals' is an immutable inline table
vals.nums.specials = { ... } # ditto
[vals.literals] # ditto, the chosen syntax doesn't matter
[vals.nums.specials] # ditto
[vals.nums.something.deeply.nested] # ditto
The principle is simple: Anything you want to go inside an inline table must be written into the table literal.
# This is allowed, but the line will probably get too long to be really readable.
vals = { nums = { one = 'One', two = 'Two', specials = { ...} }, bools = { t = true, f = false }, literals = { ... } }
# Consider switching to table block or dotted syntax instead!
Anything said here likewise applies to inline table arrays (including arrays of inline table arrays and so on) which work in exactly the same way as inline tables.
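The rules in "Ways of defining tables" can be approximated in a few lines of Python. This is only a sketch of the strict interpretation as worded in this comment, not code from any TOML parser: the statement tuples, `check_strict`, `is_valid`, and `StrictError` are hypothetical names, the input is assumed to be already tokenized, and table arrays (`[[...]]`) are left out.

```python
class StrictError(ValueError):
    """Raised when a document violates the strict interpretation."""


def check_strict(statements):
    """Validate pre-tokenized statements (a hypothetical input format).

    Each statement is ("header", path) for a [table] line or
    ("kv", path, value) for a key/value pair. Paths are tuples of key
    parts; a dict value stands for an inline table.
    """
    defined_in = {(): 0}  # table path -> number of the block that defined it
    tables = {()}         # every path known to be a table, implicit ones too
    values = set()        # paths holding a value (inline tables included)
    block, current = 0, ()

    for stmt in statements:
        if stmt[0] == "header":
            path = tuple(stmt[1])
            block += 1  # every header line starts a new table block
            for i in range(1, len(path) + 1):
                if path[:i] in values:
                    raise StrictError(f"{'.'.join(path[:i])} is a value and cannot be extended")
            if path in defined_in:
                raise StrictError(f"table {'.'.join(path)} is defined twice")
            defined_in[path] = block
            # Parents named only in a header stay implicit; they may still get a header later.
            tables.update(path[:i] for i in range(1, len(path) + 1))
            current = path
        else:
            _, key, _value = stmt
            full = current + tuple(key)
            # A dotted key defines each intermediate table inside the current block.
            for i in range(len(current) + 1, len(full)):
                prefix = full[:i]
                if prefix in values:
                    raise StrictError(f"{'.'.join(prefix)} is a value and cannot be extended")
                if defined_in.setdefault(prefix, block) != block:
                    raise StrictError(f"table {'.'.join(prefix)} was defined in another block")
                tables.add(prefix)
            if full in values or full in tables:
                raise StrictError(f"{'.'.join(full)} is already defined")
            values.add(full)


def is_valid(statements):
    """Convenience wrapper: True if check_strict raises nothing."""
    try:
        check_strict(statements)
    except StrictError:
        return False
    return True


root_block = [
    ("kv", ("vals", "nums", "one"), "One"),
    ("kv", ("vals", "bools", "t"), True),
]
print(is_valid(root_block + [("header", ("vals", "literals"))]))
print(is_valid(root_block + [("header", ("vals", "nums"))]))
```

The design choice worth noting is that inline tables need no special case here: they are recorded as values, so anything written underneath them fails the same check that rejects `a.b = 1` followed by `a.b.c = 2`.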
We have a good example that would help to clarify the standard regarding dotted keys and when implicitly defined tables are introduced. It's important to resolve this, because between three different Python TOML parsers in PyPI, one of them (uiri/toml) raises an error, and two others (sdispater/tomlkit and alethiophile/qtoml) raise no errors and define both c and d in a.b.
The example comes from sdispater/tomlkit#37. I'm hoping that I am interpreting this right.
a.b.c = 12
[a.b]
d = 34
My take is, this is _invalid_ under TOML v0.5.0, because the table a.b is defined in two different locations: implicitly in the root block with the dotted-key definition, and explicitly in the [a.b] block. The key/value pairs do not conflict with each other, but to be valid, they must be declared in the same block.
I imagine that @ChristianSi would agree with this interpretation and would call for explicit language clearing up all confusion in a future TOML version (and also that the table a is defined in the root block); but that @bitwalker, and maybe @StefanKarpinski, would say that the TOML in the example is valid in v0.5.0, maybe with varying interpretations to allow for "scope merging." But I'm just speculating.
So to anyone interested, what is your take? Is this example valid TOML v0.5.0? What, if anything, belongs in the next version of TOML to clarify what we see happening here?
@eksortso I believe that all arguments in favor of either interpretation have been exchanged, so now would be the time to Make A Decision. Sadly, since TOML's founder is an absentee owner 999 days out of 1000, such a decision is unlikely to be made. Unless somebody else with sufficient decision-making power jumps in – @pradyunsg maybe? – I fear this issue will remain unresolved, leaving the TOML world sadly fragmented :cry:
This is administrative stuff at heart, but it must be addressed. Divergent implementations are not good.
Would it speed things up if a _decision pending_ tag were slapped onto every issue where the only thing necessary going forward is for someone with the rubber stamps like @mojombo or, as was suggested, @pradyunsg, to read the ticket, consider the arguments, and make a binding decision?
I've been swamped by a lot of things in the past bit of time. I'll try to catch up on this over the coming weekend.
@eksortso which issues specifically?
@pradyunsg, I was speaking generally, thinking that having a dedicated tag on issues or PRs might speed up response times on critical issues. Specifically I'm referring to this issue, because we're seeing divergent interpretations in the parsers. Though it could be applied to others like #553 which have been talked through thoroughly but aren't as immediately critical to the standard.
The idea behind this is that our top decision makers could focus on _decision pending_ issues and respond to them first. But depending on what the TOML standard's actual governance model is, such tagging would be redundant.
My OSS time situation isn't good. (pip 19.0 rollout hasn't been "smooth") :/
If someone could summarize the possible positions the specification could take wrt restrictions, as discussed above, it would be greatly appreciated. :)
@pradyunsg I'll summarize my position at least, and let others cover theirs:
In essence, there is ambiguity in the spec regarding reopening/extending tables to define new keys, namely via dotted keys vs bracketed keys, with inline tables in the mix as well.
My argument is that if the core data model is a hash table, then any combination of table syntax should be permitted to define tables, or extend previous definitions of tables as long as the restriction that redefining keys with non-table values is not violated. This keeps implementation straightforward and the rules simple for those writing TOML to remember. As I see it, any other option results in conflicting rules which are arbitrarily resolved, which does not seem to vibe with TOML's stated goal of minimalism.
In my view, the following is valid:
# produces { a = { b = 1, c = { d = 2 } } }
a = {}
a.c = {}
a.c.d = 2 # extends a.c
[a] # only opens the table, reopens if it exists
b = 1
The discussion in this thread is long, but I think is worth the read, because we identify all the issues and possible solutions in detail.
See my comment below for some additional thoughts.
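The permissive, hash-table-first model summarized in this comment can be sketched as a single merge routine. This is an illustration of the proposal only, not of any TOML version's actual rules; `assign` is a hypothetical helper, and each call stands for one key/value pair (or one key under a table header) with its full path already resolved.

```python
def assign(root, path, value):
    """Write value at path, reopening or merging tables as needed.

    Only one thing is rejected: giving a second value to a key that
    already holds a non-table value (or using such a key as a table).
    """
    node = root
    for part in path[:-1]:
        node = node.setdefault(part, {})  # reopen the table or create it
        if not isinstance(node, dict):
            raise ValueError(f"{part} holds a non-table value")
    last = path[-1]
    if isinstance(value, dict):
        # A table value (such as {}) merges into whatever table is already there.
        target = node.setdefault(last, {})
        if not isinstance(target, dict):
            raise ValueError(f"{last} holds a non-table value")
        for key, item in value.items():
            assign(target, (key,), item)
    elif last in node:
        raise ValueError(f"{'.'.join(path)} is already defined")
    else:
        node[last] = value


doc = {}
assign(doc, ("a",), {})           # a = {}
assign(doc, ("a", "c"), {})       # a.c = {}
assign(doc, ("a", "c", "d"), 2)   # a.c.d = 2, extends a.c
assign(doc, ("a", "b"), 1)        # [a] followed by b = 1
print(doc)
```

Under this model the syntax used to reach a table is irrelevant, which is exactly the property the strict interpretation gives up in exchange for treating inline tables as closed values.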
@pradyunsg My point was that while many people in this thread feel that the following:
a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3
should be invalid, the spec explicitly allows this by saying:
As long as a key hasn't been directly defined, you may still write to it and to names within it.
It needs to be clarified if that's not the case.
I would also like to echo @bitwalker by saying that this thread is definitely worth reading in its entirety.
The example given by @AndrewSav reminded me of a point I would like to clarify. If that example or any of the others in this thread are actually supposed to be invalid, then it is not only important to clarify the specification, but clarify _why it is invalid in the first place_, beyond just "we choose to resolve conflicting rules in this specific way".
Cognitive load is just as important a metric as syntactic complexity in my opinion, and having a framework from which to reason about the rules reduces that load, as long as there is _some_ unifying framework.
Put another way, if TOML maps unambiguously to an arbitrarily nested hash table, what do the rules described in the specification do to support that mapping or support the goal of minimalism? If any are contradictory, why? If we want to place restrictions on how the syntax allows you to describe a hash table, users and implementors alike expect those restrictions to come as a trade-off, for a benefit that is worth more than the loss of flexibility. That trade-off should be explained to help both users and implementors of TOML to properly reason about its use. If there is no trade-off, then such restrictions probably should be lifted, or at least reconsidered.
I'll stop posting now to avoid cluttering this thread further, but I feel like the above condenses my thoughts best.
@bitwalker Your example isn't getting you what your comment says. The introduction of [a] means, by your own standard, the code produces the following:
a.b = 1
a.a.c.d = 2
Or, {a = {b=1, a={c={d=2}}}}.
I thoroughly back the position laid out by @ChristianSi in his November 10, 2018 comment. I couldn't express it any more clearly. https://github.com/toml-lang/toml/issues/499#issuecomment-437613979
@eksortso Thanks, I still stand by that position and propose to add something like the text in that comment ("Ways of defining tables") to the next revision of the TOML spec. If further clarification is needed: it's an attempt to explain how dotted keys and inline tables interact with TOML's rule "You cannot define any table more than once".
I believe that such a clarification would not introduce any new restrictions but merely make explicit what's already implicit in the TOML v0.5 spec, as explained in an earlier comment.
Just noting that this is still on my radar -- I've just not been able to make time for this.
I finally managed to come around to reading this and spend some time thinking about this.
Geez y'all. This is a wonderful and dense conversation! Thanks a ton for providing your inputs here everyone! It's much appreciated. :)
Putting down my thoughts in a follow up post.
I was in the "strict" camp before it got a name. ;)
@ChristianSi's well-written "Ways of defining tables" semantics are exactly what I had in mind when writing up the specification for dotted keys.
To reiterate poorly, inline tables are immutable and tables directly defined by a dotted key can not be "redefined" by using the [table] syntax or the inline table syntax.
i.e. The following examples are invalid:
foo.bar = {}
foo.bar.baz = "true" # INVALID
foo.bar.spam = {} # INVALID
vals.nums.one = 'One'
vals.nums.two = 'Two'
vals.nums = { three = 'Three' } # INVALID
[vals.nums] # INVALID
three = 'Three'
The following examples are valid:
vals.nums.one = 'One'
vals.nums.two = 'Two'
[vals.letters]
one = 'A'
two = 'B'
[profile]
release.debug = true
[profile.release.misc]
alpha = "A"
a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3
I never intended that this last example be valid (and neither did @mojombo), but it is as per the language used. Now that I re-read the spec, it is clear to me that the intent to disallow this is not as obvious as I thought it was when I wrote this.
~We're going to have to live with this being valid in TOML 1.0; since I don't want to break compatibility. I do want to disallow this in the future though -- I think we should put advisory language to not do this in the spec.~
@bitwalker It would help to add a clarification in the inline tables section -- inline-tables are basically a fancier "Value" and all values are immutable.
While I do think having some reference/guidance on why certain choices were made is helpful, I don't think adding that would be critical-path for getting to 1.0.
Action items here would be, at least:
If anyone can think of additional things we should do here, please do holler! :)
We're going to have to live with this being valid in TOML 1.0; since I don't want to break compatibility. I do want to disallow this in the future though -- I think we should put advisory language to not do this in the spec.
I'm on the fence on this TBH -- I don't want to break compatibility but I also _really_ want to just straight up disallow this -- I don't see too many usecases where doing this out-of-order makes much sense anyway so maybe the breakage is fine?
I guess we should look into this in a follow up, better scoped, issue.
@pradyunsg
I was in the "strict" camp before it got a name. ;)
Happy to hear it :+1:
If I understand you correctly, you definitively want to prohibit key injection into inline tables in TOML 1.0 (yeah!) but are unsure about whether or not to prohibit out-of-order definition of dotted keys like this?
a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3
While I don't have any strong feelings on the second issue (as opposed to the first one!), my viewpoint is that such out-of-order definitions, though bad style, are harmless and should not be prohibited in TOML 1.x. For one thing, they are clearly allowed in 0.5 and hence covered by our compatibility promise, and moreover, the rule that "order of keys within a single table block" (introduced by [...] or [[...]]) "doesn't matter" is pretty clear-cut and easy to remember.
are unsure about whether or not to prohibit out-of-order definition of dotted keys like this?
Yep and yep.
though bad style, are harmless
Yea, this is basically where I'm split tbh. Allowing them in TOML 1.0 isn't a PITA but it is a quirk that I (really) don't want to have.
We're going to have to live with this being valid in TOML 1.0; since I don't want to break compatibility. I do want to disallow this in the future though -- I think we should put advisory language to not do this in the spec.
Let's just stick with this.
If anyone can think of additional things we should do here, please do holler! :)
No one did.
Opened #630, #631 and #632 as follow-ups. Going to go ahead and close this. Thanks again for the discussion here everyone! :)
Most helpful comment
@ChristianSi Sorry, I wrote that example off the cuff, all I intended with the last line was to show different types of table usage in the example, but it ended up making it confusing :(
This is the declared goal of TOML. The three key principles there, "minimal", "obvious semantics", and "maps unambiguously to a hash table" are what we should judge all questions by. In my opinion, we need the following:
I think that core principle should be (and is already to some degree): TOML is a syntax for representing a namespace of keys and values; keys are unique and there are no features which cannot be mapped to a flattened representation, where flattened representation is in dotted-key form, e.g. foo.bar = "baz". The syntax of TOML is composable, such that any combination of forms is allowed (within the syntax rules for those forms) and can always be unambiguously mapped back to the flattened representation.
Given that principle, we can define the following simple set of rules:
[foo.bar]) does not define foo.bar, it sets the context for keys following the declaration up until the next table declaration, or EOF. Given this:
[foo.bar]
baz = "true"
Is flattened to foo.bar.baz = "true", _not_:
foo.bar = {}
foo.bar.baz = "true"
Any combination of table declarations, inline tables, and dotted keys is permitted, as long as a key is never defined twice, so the following is allowed:
and unambiguously maps to:
Furthermore, this gives us a framework from which to reason about table arrays, in other words, rather than it being an exception to the rule of redefinition, it is simpler to think of each element declaration as defining a new "hidden" key, based on the order of appearance, which a conforming parser infers from the syntax, e.g.
could be flattened like so:
The key guiding principle here is symmetry/composability. We have various forms of syntax for tables and their keys, so it should be possible to combine them in different ways but retain _one_ semantic model. Prohibiting redefinition of keys is distinct from the question of whether you can "reopen" some part of the keyspace to add new keys.
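The flattened representation this comment keeps returning to can be made concrete with a small sketch. It assumes the document has already been parsed into nested Python dicts and lists, that every key is a bare key (a quoted key containing a dot would make the joined form ambiguous), and it uses the position in a table array as the "hidden" key described above. `flatten` is a hypothetical name, not part of any TOML library.

```python
def flatten(value, prefix=()):
    """Map a parsed TOML value to a {dotted key: leaf value} dict."""
    if isinstance(value, dict) and value:
        flat = {}
        for key, item in value.items():
            flat.update(flatten(item, prefix + (key,)))
        return flat
    if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
        # Table array: each element gets a hidden key taken from its position.
        flat = {}
        for index, item in enumerate(value):
            flat.update(flatten(item, prefix + (str(index),)))
        return flat
    # Scalars, plain arrays, and empty tables are kept as leaves.
    return {".".join(prefix): value}


print(flatten({"foo": {"bar": {"baz": "true"}}}))
print(flatten({"fruit": [{"name": "apple"}, {"name": "banana"}]}))
```

Because the result is an ordinary dict, "a key is never defined twice" becomes the only rule the flattened form can express, which is the point of the proposal.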
What is the guiding principle behind your interpretation? I ask because adding more rules without some unifying principle is not user-friendly; it makes documents harder to write, harder to read, and parsing more complex and thus more error-prone. I agree it is good to add more rules when they provide clarity in the context of some guiding principle/intuition; but it is not necessarily desirable when the clarity comes at the expense of violating one's intuition. If readers of the spec are told to internalize a few simple principles ("TOML is a syntax for representing a namespace of keys", and "it is not allowed to redefine keys"), it gives them an easy way to test if some interpretation of a rule is correct ("does interpretation A imply I can redefine something?") and therefore easy to understand.
Conformity to a single way of doing things _is_ desirable in many cases, it is one reason why code formatters in languages like Go are so nice - when there is a lot of complexity, knowing that some things will always look the same orients you in a new context. That said, TOML is a simple format, there isn't enough complexity to justify forcing that kind of conformity, particularly when it has the potential to roadblock further improvements down the line (such as imports), or put you in a corner with regards to ambiguity. I think it is also a bad sign when there isn't any justification which explains why some rule exists which ties back to the core principles, only that some usage pattern is prohibited arbitrarily (i.e. based on preferences).
In any case, I think it is important for the maintainers of TOML to decide what mental model should be driving these design decisions (or at least make it prominent, if they have already made it known elsewhere), and then consider these questions in that framework. From what I've seen, things are too abstract (i.e. "maps unambiguously to a hash table" is not specific enough, what is the "core", or simplest possible, representation; how do features relate to that representation), and that drives both ambiguity in the spec, and different interpretations of how features should be implemented/represented, because everyone has a different mental model.