TOML: The spec language seems to forbid writing to the same defined table using dotted keys

Created on 18 Sep 2020 · 47 Comments · Source: toml-lang/toml

(I may have made some mistakes in this write-up, but I hope I have made the contradiction obvious by the time you get to the end. Please read to the end, because I may be identifying multiple problems, so if you feel I am wrong about one issue you should still consider the rest of this write-up.)

Let's start here:

Dotted keys define everything to the left of each dot as a table. Since tables cannot be defined more than once, redefining such tables using a [table] header is not allowed. Likewise, using dotted keys to redefine tables already defined in [table] form is not allowed.

The contradiction is here:

  1. Dotted keys define everything to the left of each dot as a table.
  2. tables cannot be defined more than once

Here, "define" must mean explicitly defined (because tables can be implicitly defined multiple times by either dotted keys or headers) and no delineation is made between tables "defined in dotted keys" versus tables defined in headers (referred to elsewhere in the spec as "directly defined"). It is stated that any table (explicitly) "defined" cannot be (explicitly) "defined" again. That means this should be invalid:

[fruit]
apple.color = "red" # Here, apple is "defined as a table"
apple.taste.sweet = true # Here apple is "defined" AGAIN as a table

I would also add that the spec should explicitly state that "dotted keys" can NOT refer to keys separated by dots inside headers. Otherwise, that would also mean that defining headers out of order would be invalid.

[fruit.apple]
# `fruit` is implicitly defined as a table
# `apple` is explicitly defined as a table within `fruit`

[fruit]
# fruit is explicitly defined as a table

# of course, this document should be valid!

Now, if you interpret "define" as implicitly defined in this case, that still doesn't work. More on this later.

We see the following elsewhere in the spec:

As long as a key hasn't been directly defined, you may still write to it and to names within it.
```toml
# This makes the key "fruit" into a table.
fruit.apple.smooth = true

# So then you can add to the table "fruit" like so:
fruit.orange = 2
```
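To make the spec's example concrete, here is a minimal Python sketch of how a dotted key walks into (or creates) nested tables before assigning its last part. `set_dotted` is a hypothetical helper, not from the spec or any parser, and it deliberately models only the resulting data, not which tables count as "defined".

```python
from typing import Any, Dict, List

def set_dotted(root: Dict[str, Any], key_parts: List[str], value: Any) -> None:
    """Assign value at key_parts, creating intermediate tables as needed.

    Hypothetical helper: it ignores the created/defined bookkeeping
    that this issue is about.
    """
    table = root
    for part in key_parts[:-1]:
        # Every part before the last names a table; reuse it if present.
        child = table.setdefault(part, {})
        if not isinstance(child, dict):
            raise ValueError(f"{part!r} already holds a non-table value")
        table = child
    last = key_parts[-1]
    if last in table:
        raise ValueError(f"duplicate key {last!r}")
    table[last] = value

doc: Dict[str, Any] = {}
set_dotted(doc, ["fruit", "apple", "smooth"], True)  # fruit.apple.smooth = true
set_dotted(doc, ["fruit", "orange"], 2)               # fruit.orange = 2
print(doc)  # {'fruit': {'apple': {'smooth': True}, 'orange': 2}}
```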

The above is the only place in the spec where the term "directly defined" is used to delineate between the different kinds of "define". (Aside from where it says that in-line tables are "fully defined" at their location and cannot be added to or modified, but that is pretty air-tight so I don't want to focus on it at all.)

The (first) paragraph in question is followed by "The [table] form can, however, be used to define sub-tables within tables defined via dotted keys." I don't like that the first two statements follow naturally from the principle "Since tables cannot be defined more than once" while the third contradicts it. It's fine for TOML to allow all three, but not without delineating between the different kinds of "define".

The contradiction leads me to ask whether the following document, which also seems to contradict that principle, is valid:

Is it valid for "dotted keys" (not in headers) to modify something implicitly defined as a table in a header? My guess is yes, since if you change the order it is valid.

[fruit.apple.taste]
# `fruit` is implicitly defined as a table
# `apple` is implicitly defined as a table in `fruit`
# `taste` is explicitly defined as a table in `apple`
[fruit]
# fruit is explicitly defined as a table
apple.color = "red"
# apple is (indirectly) defined as a table with color set to "red"

While there are places in the spec where you can easily infer which kind of "define" is meant (including in some examples above), the top quotation seems to imply that dotted keys can be implemented by distinguishing only between defined tables and implicitly defined tables (leaving aside arrays and inline tables for now). In fact, that is not possible. You HAVE TO distinguish dotted-key definitions (not in headers) from header definitions. There is no other way to satisfy all the rules.


Here is a sketch of an implementation. It's a bit TypeScript-y; hopefully others can follow what I am getting at.

Define enum ScopeType as having members: { Inferred, Defined }
Define scopeTypes as a Map/Hash of Table -> ScopeType.
Let global be the global scope

A Map pairs keys with values and has the methods "has", "get" and "set". "has" returns whether a mapping exists for a given key, "get" returns what a given key maps to, and "set" maps a given key to a given value. In this case, scopeTypes.has(value) is another way of saying that a given value is not a literal value such as 1 or false.
[a]
# assert global["a"] == undefined or scopeTypes.get(global["a"]) == ScopeType::Inferred
# define `a` as global["a"] or an empty table {}
# scopeTypes.set(a, ScopeType::Defined)

b.c = 1
# assert a["b"] == undefined or scopeTypes.has(a["b"])
# scopeTypes.set(b, ScopeType::Inferred)

[a.b] # should be forbidden
# assert global["a"] == undefined or scopeTypes.has(global["a"])
# define `a` as global["a"] or an empty table {}
# scopeTypes.set(a, ScopeType::Inferred)

# assert a["b"] == undefined or scopeTypes.get(a["b"]) == ScopeType::Inferred
# define `b` as a["b"] or an empty table {}
# scopeTypes.set(b, ScopeType::Defined)
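The pseudocode above can be made runnable. Below is a minimal Python sketch of the same two-state (Inferred/Defined) model, assuming a hypothetical pre-tokenized input of `("header", "a.b")` and `("kv", "b.c")` statements with bare keys only. One deliberate difference: an existing parent keeps its state instead of being reset to Inferred. Run on the `[a]` / `b.c = 1` / `[a.b]` document, it accepts the header that is marked "should be forbidden".

```python
from enum import Enum
from typing import Dict, List, Set, Tuple

KeyPath = Tuple[str, ...]

class ScopeType(Enum):
    INFERRED = "inferred"
    DEFINED = "defined"

def _ensure_table(scope: Dict[KeyPath, ScopeType], leaves: Set[KeyPath], path: KeyPath) -> None:
    """Implicitly create a table at `path`, keeping any state it already has."""
    if path in leaves:
        raise ValueError(f"{'.'.join(path)} is a value, not a table")
    scope.setdefault(path, ScopeType.INFERRED)

def check_two_state(statements: List[Tuple[str, str]]) -> bool:
    """Return True if the two-state model accepts the statements.

    Each statement is ("header", "a.b") for [a.b] or ("kv", "b.c") for b.c = ...
    (a hypothetical pre-tokenized form; bare keys only).
    """
    scope: Dict[KeyPath, ScopeType] = {}
    leaves: Set[KeyPath] = set()
    section: KeyPath = ()  # the current [header] table; () is the root
    try:
        for kind, dotted in statements:
            parts = tuple(dotted.split("."))
            if kind == "header":
                for i in range(1, len(parts)):
                    _ensure_table(scope, leaves, parts[:i])
                # The header's own table must be absent or merely Inferred.
                if parts in leaves or scope.get(parts) is ScopeType.DEFINED:
                    raise ValueError(f"[{dotted}] defined twice")
                scope[parts] = ScopeType.DEFINED
                section = parts
            else:
                full = section + parts
                # Tables left of a dot are only marked Inferred in this model.
                for i in range(len(section) + 1, len(full)):
                    _ensure_table(scope, leaves, full[:i])
                if full in leaves or full in scope:
                    raise ValueError(f"duplicate key {dotted}")
                leaves.add(full)
    except ValueError:
        return False
    return True

# The final [a.b] should be rejected, but this model accepts it.
print(check_two_state([("header", "a"), ("kv", "b.c"), ("header", "a.b")]))  # True
```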

The crux of the problem here is in our [a.b] logic, where b may only be Inferred, in order to allow these cases:

[a.b.c]
[a.b]
# `a` is Inferred
# `b` was Inferred, but is now Defined
[a]
[a.b] # `a` is Defined
# `b` was undefined, is now Defined

And block this case:

[a.b]
# b is defined
[a.b]
# b is defined already! error!

Hence, there is no way to enforce the rule that "redefining such tables [tables defined by dotted keys] using a [table] header is not allowed." E.g.:

[a]
b.c = 1
[a.b] # forbidden?
# `a` is Inferred
# b must be Defined in order to block this

So let's rewrite our logic so that a dotted key marks each table to the left of a dot as Defined (as with b in b.c = 1), to account for the above case:

a.b = 1
# assert global["a"] == undefined or scopeTypes.get(global["a"]) == ScopeType::Defined
# define `a` as global["a"] or empty table {}
# scopeTypes.set(a, ScopeType::Defined)
# set a["b"] = 1

Now we can't block this:

[fruit.apple]
# fruit is Inferred
# apple is Defined

[fruit]
# fruit is Defined

apple.color = "red" # This is acceptable by the above logic

# To block this, we'd need to assert `apple` is not Defined

With only two ScopeType members, there is no way to block the above document without also blocking repeated use of the same dotted-key-defined table:

a.b = 1
# `a` is Defined here

a.c = 2
# We must allow `a` to be Defined for this
# But if we allow `a` to be Defined, we can't block the previous case!

I am trying to get the point across using code here because I don't know of a better way to express this technically. The spec needs to distinguish between dotted-key definitions and header definitions in order to make logical sense.
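As a minimal sketch of that distinction (assuming the created/defined conception from #771 as later comments in this thread describe it, not any official algorithm), the Python below tracks three states instead of two: tables merely created by a header, tables defined by their own header, and tables defined by dotted keys. Statements use a hypothetical pre-tokenized form with bare keys only; inline tables and arrays of tables are not modeled.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple

KeyPath = Tuple[str, ...]

class State(Enum):
    CREATED = "created"        # implied by a [header]; not yet defined
    HEADER_DEFINED = "header"  # defined by its own [header]
    DOTTED_DEFINED = "dotted"  # defined by appearing left of a dot

def check_three_state(statements: List[Tuple[str, str]]) -> bool:
    """Return True if the document is accepted (hypothetical sketch).

    Each statement is ("header", "a.b") for [a.b] or ("kv", "b.c") for b.c = ...
    """
    state: Dict[KeyPath, State] = {}
    leaves: Set[KeyPath] = set()
    section: KeyPath = ()  # the current [header] table; () is the root
    try:
        for kind, dotted in statements:
            parts = tuple(dotted.split("."))
            if kind == "header":
                for i in range(1, len(parts)):
                    if parts[:i] in leaves:
                        raise ValueError(f"{dotted}: not a table")
                    # A header only creates its parents; existing tables keep their state.
                    state.setdefault(parts[:i], State.CREATED)
                # Only an absent or merely created table may be defined by a header.
                if parts in leaves or state.get(parts, State.CREATED) is not State.CREATED:
                    raise ValueError(f"[{dotted}] redefines a table")
                state[parts] = State.HEADER_DEFINED
                section = parts
            else:
                full = section + parts
                for i in range(len(section) + 1, len(full)):
                    prefix = full[:i]
                    if prefix in leaves or state.get(prefix) is State.HEADER_DEFINED:
                        raise ValueError(f"{dotted}: redefines a table")
                    # Absent or merely created: the dotted-key definition begins here.
                    # Already dotted-defined: in this model that only happens within
                    # the same section, so keep adding to it.
                    state[prefix] = State.DOTTED_DEFINED
                if full in leaves or full in state:
                    raise ValueError(f"duplicate key {dotted}")
                leaves.add(full)
    except ValueError:
        return False
    return True
```

With this split, `[fruit.apple]` followed by `[fruit]` and `apple.color = "red"` is rejected because `fruit.apple` is header-defined, `[a]` / `b.c = 1` / `[a.b]` is rejected because `a.b` is dotted-defined, and `a.b = 1` followed by `a.c = 2` is still accepted.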

(I may have made some mistakes in this write-up, but I hope I have made the contradiction obvious by the time you've managed to read this far.)

clarification


All 47 comments

Yes, I think you had the same problem. The concept of "define" was not clear enough for me, and it could use its own paragraph in the spec. I closed my issue #770 because I didn't think this was the place for tech support, but maybe the author of the spec wants to actually make some changes.

I believe this has been sufficiently explained in the comments to #770 and #771 and can therefore be closed. Or is anything still unclear?

If anyone has a specific wording change to suggest, I'm all ears. Otherwise, I'll close this when I come back around to it. :)

I appreciate the conception of the spec given in #771. I think it answers all the problems I identified. In the next few days I'll write something to clarify this in the spec (unless someone else wants to). Having just learned this, I understand which parts aren't as clear as they could be.

I think this was sufficiently addressed by #795 and can therefore be closed.

Indeed!

Sorry for being inactive, but it seems to me like this problem is still present in the latest version of the spec:

Dotted keys create and define a table for each key part before the last one, provided that such tables were not previously created.

fruit.apple.color = "red"
# Defines a table named fruit
# Defines a table named fruit.apple

fruit.apple.taste.sweet = true
# Defines a table named fruit.apple.taste
# fruit and fruit.apple were already created

Since tables cannot be defined more than once, redefining such tables using a [table] header is not allowed. Likewise, using dotted keys to redefine tables already defined in [table] form is not allowed. The [table] form can, however, be used to define sub-tables within tables defined via dotted keys.

IMO this is still contradictory (fruit and fruit.apple were already created AND defined, at least going solely by the language in the spec) and doesn't adequately convey https://github.com/toml-lang/toml/issues/771#issuecomment-696796941

This language "Dotted keys create and define a table for each key part before the last one, provided that such tables were not previously created" has a few problems. What do dotted keys do if such tables were previously created? Are they.... invalid? This still doesn't answer whether a previously created (by [headers]) table can be defined by dotted keys (they can).

[fruit.apple.texture] # creates fruit, fruit.apple
smooth = true

[fruit]
apple.calories = 94.64 # defines fruit.apple

I believe this has been sufficiently explained in the comments to #770 and #771 and can therefore be closed. Or is anything still unclear?

I understand how TOML works now, due to our discussion threads, but this is important behavior to explain in the spec because @marzer/tomlplusplus still gets this wrong... https://godbolt.org/z/K53Tb7 (hence https://toml-parser.com/ gets this wrong too)

I also don't think it's good to keep the term "directly defined" in the spec after introducing the concepts of a table being "created" vs being "defined". Again, what makes something directly defined versus not directly defined? Are things indirectly defined? I think most of my original criticisms still apply to the latest document.

This language "Dotted keys create and define a table for each key part before the last one, provided that such tables were not previously created" has a few problems. What do dotted keys do if such tables were previously created?

_Nothing._ Such tables are already created. Read the example's comment for the extra bit of clarity that I thought was obvious. If that's still inadequate, refer back to my explainer in #795, either version of it, and the conversation that followed.

Are they.... invalid?

_No._

This still doesn't answer whether a previously created (by [headers]) table can be defined by dotted keys (they can).

The text that you quoted explicitly says, "using dotted keys to redefine tables already defined in [table] form is not allowed." So _they cannot._ Find me contradictory language that this doesn't cover.

You could excise the word _directly_ from "directly defined" and what you'd have left would mean the exact same thing, because all the means for defining key/value pairs are included in the spec. If this is still a rough edge too far, then go ahead and open a new issue, or open a PR for a future update. If there's discussion to be had, it's welcome, elsewhere. But this issue is closed.

@Validark: I think that the spec is now reasonably clear, though, of course, implementers still can and will make mistakes. But the behavior you found in tomlplusplus is in clear violation of the spec, which says: "The [table] form can, however, be used to define sub-tables within tables defined via dotted keys" (followed up by an example).

So you can file a bug there and get it fixed, if you didn't already. But that's really a problem of that specific implementation, not of the spec.

This language "Dotted keys create and define a table for each key part before the last one, provided that such tables were not previously created" has a few problems. What do dotted keys do if such tables were previously created?

Nothing. Such tables are already created. Read the example's comment for the extra bit of clarity that I thought was obvious. If that's still inadequate, refer back to my explainer in #795, either version of it, and the conversation that followed.

This is not the correct answer. The answer is that the dotted keys define but do not create such tables.

This still doesn't answer whether a previously created (by [headers]) table can be defined by dotted keys (they can).

The text that you quoted explicitly says, "using dotted keys to redefine tables already defined in [table] form is not allowed." So they cannot. Find me contradictory language that this doesn't cover.

Your quote speaks only of defined tables, and says nothing of created tables.

Again, this is what we're talking about:

[fruit.apple.texture] # creates fruit, fruit.apple
smooth = true

[fruit]
apple.calories = 94.64 # defines fruit.apple

implementers still can and will make mistakes

@marzer has been involved in these discussions though. I don't think he misunderstood the spec as much as the spec lacks some critical points.

https://github.com/marzer/tomlplusplus/issues/61

I think @eksortso is right though that this should be addressed elsewhere.

Ok, I'm way late to this party, apparently. My read of the spec when I wrote that part of toml++ was that the example above should be illegal because dotted keys were just syntactic sugar for nested inline tables, and thus had the same rules about not being mutable via regular [tables]. If I got this wrong I'll happily change it, but I don't think that _should_ be wrong. Dotted KVPs and [tables] being allowed to mix in that manner seems rather stupid.

The [table] form can, however, be used to define sub-tables within tables defined via dotted keys

Huh. Was that added recently?

@Validark I owe you a response. A few bits of confusion need to be addressed, but your example at the end is a real thinker, worthy of some extra consideration. Talk to you again later.

@marzer:

The [table] form can, however, be used to define sub-tables within tables defined via dotted keys

Huh. Was that added recently?

Yeah, it was added by #797. You might want to check that PR to ensure that you didn't miss anything in your implementation. The PR didn't really change the meaning of the TOML spec, but it made things explicit that were formerly more or less implied.

@ChristianSi Roger that, thanks

@marzer:

The [table] form can, however, be used to define sub-tables within tables defined via dotted keys

Huh. Was that added recently?

Yeah, it was added by #797.

No, that text was _preserved_ in #797. It was there before. The original text dates back to August 2019 in README.md.

Oh, sorry for the confusion! But my general advice to read #797 still stands.

Hah. Interesting. Guess I just blew right past that snippet the first time round O_o

After revisiting the implementation and tests for this part of toml++ it appears I _did_ actually write a test case for the language feature in question, and my parser handles the spec's example correctly.

That is:

# "The [table] form can, however, be used to define sub-tables within tables defined via dotted keys."
[fruit]
apple.color = "red"
apple.taste.sweet = true

[fruit.apple.texture]  # you can add sub-tables
smooth = true

Parses correctly and yields the values and structure you'd expect. What is rejected with an error is a reversal of the two [table] headers:

[fruit.apple.texture]
smooth = true

[fruit]
apple.color = "red"
apple.taste.sweet = true

If I'm understanding the language-lawyering going on here, this should in fact be accepted because fruit.apple has only been defined, not fully 'created', right @eksortso?

@marzer I believe so. Thanks for bringing this up.

I think @Validark, who I was responding to when you wrote, was correct in pointing out that the current language does not clearly explain that dotted keys can define a table that was created elsewhere but was not yet defined.

It tears me up that this wasn't perfected by v1.0.0's release. The current spec isn't wrong. Just unclear. And it needs fixing.

I'll submit a PR.

@eksortso Alright, well fortunately that's a pretty easy code change; the current behaviour is an explicit error check and removing the check will yield the correct behaviour.

~For the sake of being comprehensive, it should be the case that an appearance of [fruit.apple] in either of the two examples should cause it to explode, right? e.g.:~

<snip>

Scratch that. The spec has an explicit example for this situation. It's already in my tests 😅

@marzer That's correct.

@marzer @Validark Asking you first. Does this alternative paragraph clarify the issue?

_Dotted keys create and define a table for each key part before the last one. If such a table was previously created but not yet defined, that table instead is used and its definition begins. All dotted key table definitions end with the next table header (or EOF)._
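Read as a procedure, the proposed paragraph boils down to a per-key-part decision. Below is a minimal Python sketch of that reading, not spec text: `State`, `dotted_key_part`, and `table_header_seen` are hypothetical names, and the sketch assumes the parser already tracks whether each table was created by a header, defined by a header, or defined by dotted keys. Inline tables and arrays of tables are out of scope.

```python
from enum import Enum
from typing import Optional, Set, Tuple

KeyPath = Tuple[str, ...]

class State(Enum):
    CREATED = "created"        # implied by a [header]; not yet defined
    HEADER_DEFINED = "header"  # defined by its own [header]
    DOTTED_DEFINED = "dotted"  # defined by dotted keys

def dotted_key_part(path: KeyPath, existing: Optional[State], open_defs: Set[KeyPath]) -> State:
    """Apply the proposed wording to one key part left of a dot.

    `existing` is None if no table exists at `path` yet; `open_defs` holds the
    dotted-key definitions begun since the last table header.
    """
    if existing is None or existing is State.CREATED:
        # New table, or one a header only created: its definition begins here.
        open_defs.add(path)
        return State.DOTTED_DEFINED
    if existing is State.DOTTED_DEFINED and path in open_defs:
        # Same table section: keep writing into the same definition.
        return State.DOTTED_DEFINED
    raise ValueError(f"cannot redefine table {'.'.join(path)}")

def table_header_seen(open_defs: Set[KeyPath]) -> None:
    """All dotted-key table definitions end with the next table header."""
    open_defs.clear()
```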

@eksortso

Dotted keys create and define a table for each key part before the last one.

Hmm? Isn't that the opposite of the intention? i.e.

[fruit.apple.texture] # creates [fruit] and [fruit.apple], 
                      # creates and fully defines [fruit.apple.texture]

[fruit] # provides the definition for the previously-created [apple]

@marzer The second line immediately qualifies the first line. If the table already exists, use it instead.

I can't find a more concise way to express this. So I'm open to suggestions.

My confusion lies with "create and define a table for each key part before the last one"

Which, given the definitions we've established for "create" and "define", is incorrect. It doesn't create and define each of these tables, only creates them. The definition only applies to the last one. Right?

I think the first sentence needs to be more specific. Something like:

_Dotted keys create a table for each key part, and fully define the table for the last key part._

It doesn't create a table for the last key part, which is a key name. It could be the key for an inline table, but that case is already covered.

But thank you @marzer for setting me straight; I need to create or identify a table for each key part left of a dot. And only then talk about defining that table.

@marzer No wait, you missed something. It's not just the last part that's defined.

All those dotted-key tables are, in fact, defined. They cannot be redefined outside of the table section that they appear in.

I'm running on fumes right now. It's 6am and I haven't slept all night. But I'm damned determined to get this done before I fall asleep.

What?? How can that be possible? If that's the case, this should be illegal:

[fruit.apple.texture]
[fruit] # if what you say is true, fruit was defined above, and this should explode

Or are we specifically talking about dotted keys in dotted key-value-pairs, as opposed to just dotted keys more generally? Because if so, the text above is still wrong, but now for different reasons.

@marzer We're only talking about dotted keys in key/value pairs.

Ah right. Well if that's contextually obvious wherever that clarification paragraph would appear, then I can't see an issue with it.

Unfortunately I just noticed an issue. It involves dotted keys inside inline tables.

I need sleep, or breakfast, or something, so I'll let this simmer for a while. So much for that damned determination. I'll be back later, but it'll likely be late in the evening for you.

I'll be back later, but it'll likely be late in the evening for you.

Hah. Time has no meaning in this age of corona self isolation and working from home. It's still March 2020 after all, right?

@marzer The term "dotted keys" in TOML exclusively refers to keys separated by dots in key-value pairs. Headers do not have dotted keys.

It would be nice if this nomenclature was changed so that things which look the same can be referred to similarly (and distinguished by the context in which it appears). However, I'm going by the current definitions for these concepts.

@Validark

The term "dotted keys" in TOML refers to keys separated by dots in key-value pairs. (pre-edit)

No, I have to disagree with that. Tables are identified by keys, and those keys can be dotted keys.

[fruit.apple] # this table's fully-qualified key is fruit.apple

It is important to make the distinction between dotted keys and dotted KVP's.

@marzer From the spec:

Naming rules for tables are the same as for keys (see definition of Keys above).

Header tables are not identified by keys. Keys and dotted keys refer only to that which is found in key-value pairs.

@Validark Nonsense. Conceptually they _are_ keys, even if the spec doesn't name them as such. Consider, also from the spec:

TOML is designed to map unambiguously to a hash table.

The top-level table, also called the root table, starts at the beginning of the document and ends just before the first table header (or EOF).

Dotted keys create and define a table for each key part before the last one [...]

The [table] form can, however, be used to define sub-tables within tables defined via dotted keys

Table headers define the keys which map to the tables they represent. There's a bunch of contradictions in the above if you think otherwise.

Especially given the mixing of table headers and dotted kvps here in this discussion. How can something be a key only when it appears as part of a KVP, but not when it appears between [], if you can use one to make subtables of the other? It makes absolutely no sense.

The spec could be more explicit about this perhaps, but personally I think to suggest they're conceptually different is the height of pedantry regarding something that is otherwise very obvious. The language is supposed to be simple and obvious. They're keys.

I understand what you're getting at, and I'd even support changing the definition of what a TOML key is to include table names.

https://github.com/toml-lang/toml/issues/771#issuecomment-700200910

Should these definitions be adjusted to be more in line with common sense? Sure, but that's a separate issue from what current definitions are. We need to have explicit, rigid definitions for language concepts so we can describe them concretely. Yes, it's pedantic, but through what other lens does one read a specification?

Should these definitions be adjusted to be more in line with common sense? Sure, but that's a separate issue from what current definitions are. We need to have explicit, rigid definitions for language concepts so we can describe them concretely.

Sure, but I won't stand for being corrected on this. That's what's pissed me off here. Not that the spec isn't clear about this, but that you felt you needed to correct me when I wasn't wrong. They're keys, in that a key is a unique identifier, even if the TOML spec doesn't name them as such (despite the fact that the examples I gave above show that it implicitly does).

If the spec changes, great, the terminology should probably be conceptually consistent (i.e. call a key a key since words have meanings), but it doesn't matter to me here; they've always been keys, and any argument to the contrary is time-wasting pedantry and/or being a dick on purpose, frankly.

I was not wrong to suggest there be a distinction between dotted kvps and dotted keys more generally, since a dotted table header is a table header that contains a dotted key.

Yes, it's pedantic, but through what other lens does one read a specification?

Also, we should _avoid_ unnecessary pedantry when it's just downright stupid, like this is. That TOML doesn't have a 100-page specification or a damn ISO committee is a _good thing_. We should be willing to allow a dictionary and/or assumed domain knowledge to do some of the heavy lifting for us, in aid of keeping the language both minimal and obvious. For example, it's obvious that table headers define keys that uniquely identify the table.

From the OED:

Key, n.
[Computing]
a field in a record which is used to identify that record uniquely.

@marzer:

Parses correctly and yields the values and structure you'd expect. What is rejected with an error is a reversal of the two [table] headers:

[fruit.apple.texture]
smooth = true

[fruit]
apple.color = "red"
apple.taste.sweet = true

If I'm understanding the language-lawyering going on here, this should in fact be accepted because fruit.apple has only been defined, not fully 'created', right @eksortso?

Basically yes – order never matters in TOML, except when arrays of tables are involved.

But note that it's actually the other way around: [fruit.apple.texture] CREATES fruit and fruit.apple, but it doesn't DEFINE them (it only defines fruit.apple.texture, which it also creates).

Hence, [fruit] is free to DEFINE fruit later.

Generally, please let's not confuse things needlessly! Table headers and keys are of course related, but they are NOT THE SAME, and it's essential to keep them apart, or else the TOML spec will be impossible to understand.

Cases in point:

  • Dotted keys were only introduced in TOML 0.5 or so, while table headers with dots in them existed from TOML's first beginnings.
  • Dotted keys are only syntactic sugar. It's possible to express every TOML document without using them. Table names with dots in them, on the other hand, are needed to express documents with more than one level of table nesting. (That's not quite true, of course, since now that we have them, one can usually use dotted keys instead of table names if one really insists on making things unreadable. But once arrays of tables come into play, that's no longer possible.)
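To illustrate the "syntactic sugar" point, here is a hypothetical Python helper (`desugar_section` is not part of any TOML library) that rewrites the dotted key/value pairs of a single table section using [table] headers only. It assumes bare keys, values that are already serialized TOML literals, and no arrays of tables or inline tables. The output describes the same data, although a table that only appeared as an intermediate dotted-key part becomes merely created rather than defined.

```python
from typing import Dict, List, Tuple

def desugar_section(header: List[str], pairs: List[Tuple[List[str], str]]) -> str:
    """Rewrite one section's dotted key/value pairs using [table] headers only.

    Hypothetical helper: keys must be bare keys, values are pre-serialized
    TOML literals, and arrays of tables / inline tables are not handled.
    """
    base = tuple(header)
    # Group "last_key = value" lines by the table they end up in, keeping order.
    groups: Dict[Tuple[str, ...], List[str]] = {base: []}
    for key, literal in pairs:
        parent = base + tuple(key[:-1])
        groups.setdefault(parent, []).append(f"{key[-1]} = {literal}")
    chunks = []
    for path, lines in groups.items():
        # Root-level keys get no header; every other table gets its own.
        head = [f"[{'.'.join(path)}]"] if path else []
        chunk = "\n".join(head + lines)
        if chunk:  # skip an empty root section
            chunks.append(chunk)
    return "\n\n".join(chunks) + "\n"

print(desugar_section(["fruit"], [(["apple", "color"], '"red"'),
                                  (["apple", "taste", "sweet"], "true")]))
```

For the `[fruit]` example from the spec, this emits `[fruit]`, then `[fruit.apple]` with `color = "red"`, then `[fruit.apple.taste]` with `sweet = true`. Root-level plain keys are kept first so they stay ahead of every header.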

Semantically and logically, they're both ways of expressing keys in the conceptual sense. Also according to the dictionary. The spec should be amended to note as much, honestly.

@marzer: I can agree with you when thinking about the parsed data structure expressed by TOML documents – the DOM, so to speak. In the DOM, there are only keys and values, and values are either tables (which contain key-value pairs), arrays (which contain values), or primitives (strings, numbers etc.).

In the DOM, it doesn't make sense to speak about "inline tables" – a table is just a table, after all. But the TOML spec doesn't just speak about the DOM, it also (and primarily) speaks about the syntax level – how these things can be written down in a long multiline string, otherwise known as a TOML document. Here it's essential to distinguish "inline tables" from regular (block-level) tables, since they are expressed in different ways. And they are not just expressed in different ways, they also adhere to different models conceptually – inline tables are closed (it's not possible to define subtables elsewhere), while block-level tables are open.

Mere users of parsed TOML data structures don't have to care about these differences, but those writing TOML documents (likely by hand) and those implementing TOML parsers need to keep them in mind. And it's not possible to explain them without speaking about "inline tables", even if at the pure DOM level there is no such thing as an inline table.

The same applies to keys (whether dotted or not), which are written to the left of an equals sign, versus table headers, which are enclosed in brackets. In the DOM, both are identical, but at the syntax level, they differ, and the conceptual models to which they need to adhere differ as well. A table defined in "dotted key" form is less open than one defined using its own table header (it's no longer possible to define a parent table elsewhere if that parent's name already appeared to the left of a dot), but less closed than an inline table (it's still possible to define subtables elsewhere). This is a real conceptual difference, and it's not going away.

So the TOML spec cannot stop talking about table headers and keys as if they were different – because they are, syntactically and conceptually, even if not in the DOM.
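The gradation described above (header-defined tables most open, dotted-key tables in between, inline tables closed) can be written down as a small lookup table. This is a hypothetical Python encoding of the two properties named in that comment; `Origin`, `Openness`, `OPENNESS`, and `openness_rank` are illustrative names, not terms from the spec.

```python
from enum import Enum
from typing import Dict, NamedTuple

class Origin(Enum):
    HEADER = "header"  # [a.b] written with its own table header
    DOTTED = "dotted"  # a.b.c = 1: a and a.b appear to the left of a dot
    INLINE = "inline"  # a.b = { c = 1 }

class Openness(NamedTuple):
    subtables_via_header: bool  # may a later [t.sub] header add a sub-table?
    parent_header_later: bool   # may its parent still get its own [parent] header?

OPENNESS: Dict[Origin, Openness] = {
    # Parents named in the header are only created, so one may still get its own
    # header later (unless something else defined it first).
    Origin.HEADER: Openness(subtables_via_header=True, parent_header_later=True),
    # The parent is either the section's own table or appeared left of a dot;
    # both are already defined, so it cannot get a header.
    Origin.DOTTED: Openness(subtables_via_header=True, parent_header_later=False),
    # Closed: fully defined where written, and its parent can no longer get a header.
    Origin.INLINE: Openness(subtables_via_header=False, parent_header_later=False),
}

def openness_rank(origin: Origin) -> int:
    """Number of the two properties that hold; higher means more open."""
    return sum(OPENNESS[origin])
```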

Sorry for my long absence. I didn't want to weigh in on the keys thing, although my approach is similar to @ChristianSi's take.

Any TOML table has a singular namespace, matching the ubiquitous so-called mapping types in various languages. But keys for KVPs and subtable names do operate under different rules depending on the syntax used. Arrays of tables defined with the [[double-bracket]] syntax have effectively been treated the same way as subtables, though their names mean different things in two different contexts. As much as we can mold them into a table-like state by the syntax, they're still array types once they're parsed.

So the one namespace has three separate partitions at the syntax level. Conceptually, I'm fine with that, but it does lead us in the community to play loose with our words. I'm not comfortable calling table names "keys," per @marzer, even though that's exactly what they become eventually.

And we must keep this partitioning in play in order to discuss certain concepts. I'm interested in introducing a way to sequence subtables, much like @brunoborges' original sequence-of-tables concept before we had to pare it down. At heart this involves preserving the order in which subtables are defined within the document, and the parent table either has this property or it doesn't.

But I digress; I'll bring all that up again someday.

I still need to address @Validark's concern about the language of the spec and how it pertains to defining parent tables created elsewhere. And to be honest, I think this long-closed thread is no longer the right place to do that.
