TOML: Declaring, Creating, and Defining Tables

Created on 16 Dec 2020 · 20 Comments · Source: toml-lang/toml

Regarding #769 and #788, I had started work on the notions of declaring, creating, and defining tables as separate concepts, in order to add some clarity to how and why tables are made with TOML's syntax. I'd like to finish this up by the end of this week, but it's worth hashing out what these concepts mean, so we have a common vocabulary for these things.

_EDIT: Since the original version, the declaration of a table is identical to the creation of the table. See the original version for the details._

Creating a table only happens once. This instantiates the table, though it doesn't necessarily define it yet.

The root table, although nameless, is created before the first line of the document.
Standard table syntax creates a table for each of its names (and nests them appropriately), except for names that already refer to a table. E.g., [x.y] creates tables for x and x.y, and subsequently [x.y.z.w] creates tables for x.y.z and x.y.z.w.
Dotted keys create parent tables for each name to the left of a dot, if no table already exists for that name.
Inline table values immediately create a table with the opening brace. Such tables may be anonymous.
Double-bracket array-of-tables syntax, e.g. [[tables]], creates a table in an array, which is subsequently referred to by the array's name (e.g. tables) until the next [[tables]] header (or EOF). If the double-bracket syntax has a dotted key, then the rightmost name refers to an array of tables, which is first created if it hasn't been already. In this case, all names to the left of a dot are created as supertables (and nested appropriately) if they haven't been created already.

The definition of a table has a specific start and end, and denotes when key/value pairs can be created within it.

The root table is defined up until the first header (or the EOF) is encountered.
All other tables are defined using exactly one type of syntax, as follows.
Standard tables are defined starting with their header and, just like the root table, ending at the next header or EOF. This defines the table with the rightmost name.
Inline tables are fully defined between their braces, as are dotted-key subtables defined within.
Dotted-key subtables outside inline tables are defined starting with their first key/value pair and ending with the root or standard table that they are created in.

Inline tables have an additional restriction: No additional subtables or arrays of tables may be defined within the inline table after the ending brace. This forces inline tables to be entirely self-contained.

Also, an element of an array of tables can no longer be referenced once another double-bracket header creates a new table element in the array.

So I need to take all of this, which in one way or another we've discussed in depth in other issues, and apply it succinctly to the current toml.md.

Any thoughts? Anything I missed?

eksortso

I'm confused by this issue more generally. Personally I don't think there's any meaningful difference between 'declaring' and 'creating' things in TOML, and there's no need to draw a distinction. Surely the solution to this is to just clarify things for the specific person who submitted #769 and then close it? Maybe there's a few opportunities for slight wording tweaks and/or better examples in the spec, but that's the extent of it IMO. Adding compsci-esque wording and definitions is only going to confuse more people than it will help.

marzer on 18 Dec 2020

👍3

All 20 comments

Hello,
This is my first post so please bear with me.
Maybe you can easily clarify my misunderstanding.

So eg. consider the following toml

[x.y.z]

If I get your reasoning correctly then on line 1 a table 'x.y' gets declared but not created.
This is the json representation for the above doc.

{"x":{ "y":{ "z":{}}}}

and here the path x.y needs to exist. This sounds to me like a contradiction since there must have been something created.
What do you think?

komkom on 18 Dec 2020

And according to this statement
"Once a name is declared to be a table, it cannot be used for any other purpose. If a declared name is used as a key and not as a table name, then an error is thrown"
The following toml should be illegal since it reuses the key 'arr.table' right?

[[arr]]
[arr.table]
[[arr]]
arr.table=1

komkom on 18 Dec 2020

@komkom

> If I get your reasoning correctly then on line 1 a table 'x.y' gets declared but not created.

In that example [x.y] will be created if it hadn't been already.

> The following toml should be illegal since it reuses the key 'arr.table' right?

That toml is legal. The key on the last line, fully resolved, is equivalent to arr[1].arr.table, not just arr.table. In JSON:

{
  "arr": [ {
      "table": {}
    },
    {
      "arr": {
        "table": 1
      }
    }
  ]
}

(generated using https://toml-parser.com/)

marzer on 18 Dec 2020

👍3

I also don't get why we're talking about three different things here – "Declaring, Creating, and Defining" – instead of just two (Creating and Defining, say). When I introduced the distinction, I only talked about CREATE and DEFINE. I believe that's sufficient, there is no need for a third category. My CREATE seems to have mutated to "Declaring" in this proposal. I don't care too much about the terminology, though I would say that "create and define" are easier to keep separate than "declare and define", so personally I would stick with my old terminology.

In any case, I don't see the need for what @eksortso calls "Creating" here – I may be wrong, but it seems to me we can drop this category without any terrible loss.

ChristianSi on 18 Dec 2020

To me the concept is still confusing, e.g. the following TOML:

[x.y.z]
[x.y.v]
[x.y]
[x.y.m]

line 1: a table 'x.y' gets created
line 2: a table 'x.y' gets created
line 3: a table 'x.y' gets defined
line 4: a table 'x.y' gets created

What is conceptually happening here IMO is

line 1: a table 'x.y' gets implicitly declared, since it does not exist at this point.
line 2: a table 'x.y' is used but nothing is happening here since it is already declared.
line 3: a table 'x.y' gets explicitly declared
line 4: a table 'x.y' is used but nothing is happening since it is already declared.

So what do you think about the terminology
(IMPLICIT) TABLE DECLARATION

IMPLICIT TABLE DECLARATION happens on the first use of a dotted key within a scope.

TABLE DECLARATION is unique within its scope and is a table statement.

The scope is either the root table or an array.

komkom on 18 Dec 2020

🎉1

See, talking about it in terms of implicit/explicit makes sense to me, and it's terminology I used myself in another discussion, but apparently it was confusing ¯\_(ツ)_/¯

I don't believe it is confusing at all, though. It's a good mental model for explaining how TOML works. That you just independently came up with the same terminology suggests as much.

marzer on 18 Dec 2020

"since it does not exist at this point"

I _do_ disagree with the use of "exist" here, though. [x.y] does exist after you create [x.y.z]; how could it make sense to suggest otherwise? x contains y which contains z; if z (x.y.z) exists, the containing tables must too. Consider:

{
  "x": {
    "y": {
      "z": {}
    }
  }
}

marzer on 18 Dec 2020

👍1

My viewpoint:

[x.y.z]  # x and x.y created; x.y.z created and defined
[x.y.v]  # x.y.v created and defined
[x.y]    # x.y defined
[x.y.m]  # x.y.m created and defined

ChristianSi on 18 Dec 2020

❤1 👍1

The computer that I was drafting my PR on is now dead, and I've lost the modifications that I made that compelled me to make a three-part distinction in the life cycle of TOML tables in general. I feel as if there was good reason to separate declaration from creation (and super-tables can be defined out of order with standard syntax then linked up into a proper tree structure later on). But you'll need to give me a few minutes to recollect my thoughts before I respond. I'm glad the conversation is flowing.

eksortso on 19 Dec 2020

👀1

@eksortso oh no! - seems like this issue is not only mind-boggling ...
I was wondering if we are missing the bigger picture here. Can someone explain to me why it is a good idea to allow defining a table after it has been created ?

komkom on 19 Dec 2020

@komkom TOML allows you to skip defining parent tables if all that your configuration needs is a deeply nested subtable. You can write [w.x.y.z], and you're already 5 levels deep. It's a feature of the language.

Now let's say that you're configuring a software architecture more than once, for multiple systems, and all the settings that you'd need to change lie deep within the standard template's structure. You'd want to put the subtables that you change the most near the top of the template. And TOML lets you do that, without messing up all the parent tables' config settings.

That's just a few naive examples to show why TOML allows subtables to be defined out of order.

eksortso on 19 Dec 2020

👍1

@eksortso the example you make makes sense to me and it's not on me to question this language feature. But then going back to the problem, what about thinking of a table statement more as a unique definition in its scope (root table or array).
The creation happens whenever the table gets first encountered when parsing the document.
This happens either on a dotted key or on a table statement.

komkom on 19 Dec 2020

👍1

After some review, I now agree with @marzer and @ChristianSi that there are only two concepts, not three, in play when tables are constructed. With all due respect to @marzer and to @Validark from #769, I prefer the simpler one-word names "create" and "define" for tables. I may still use the word "declare" for the _names_ of tables, but creating a table and declaring its name are conceptually the same thing.

The biggest change is this: when first referenced, supertables get created if they don't already exist. But they aren't necessarily defined when they're created.

I'll update my first post to reflect this change of mind. I'm not able to post a PR yet, but I will soon.

eksortso on 24 Dec 2020

👍2

@komkom I must confess, the concept of "scope" makes no sense to me. There is only one root table, and all tables that are created are subtables at some depth within the root table. Even within the context presented within a single table section, all keys exist either within the table (for keys with single names) or within subtables of the table (for dotted keys or inline tables). So as far as TOML is concerned, all tables are created within a single root table.

Whether or not those TOML tables are physically created at the time that they are created in TOML is just an implementation detail. In the end, the resulting structure must be nested the same way that the tables in the TOML are nested.

eksortso on 24 Dec 2020

What I mean is that the table declarations only need to be unique within their respective scope (root table or array).
eg this example

[[arr]]
[arr.table]
[[arr]]
[arr.table]

is a valid toml and redeclaration of the table 'arr.table' is ok because they are in different array scopes. I think this is an important concept which needs to be explained somewhere (maybe it already is ?).

komkom on 24 Dec 2020

👍1

@komkom You're right, though at this point I think that that usage of table array names for parent tables is just implied in the spec. With the PR (which will be out in a few hours), I intend to make that explicit. The dual nature of array-of-tables names (they name the array, and they refer to the most recently created table on that array) will be clarified.

eksortso on 24 Dec 2020

Let me point everyone at this confusing paragraph in the Array of Tables section of the spec.

You can create nested arrays of tables as well. Just use the same double bracket
syntax on sub-tables. In nested arrays of tables, each double-bracketed
sub-table will belong to the most recently defined table element. Normal
sub-tables (not arrays) likewise belong to the most recently defined table
element.

I believe that this puts the cart before the horse. Hey you say you can nest an array of tables inside an array of tables (specifically inside one of the table array elements), but then you follow up by saying that subtables belong to the most recent table element. Isn't that the more important point to be made?

We need to start with this: _Any reference to an array of tables always refers to the most recently defined table element of the array._ Only then does the arrays-in-arrays text make complete sense.

eksortso on 24 Dec 2020

👍1

The clarification PR, #797, is now up for review. There are a lot of changes to go over, but they are consistent with what has been discussed here and elsewhere. I invite you to pick it apart.

eksortso on 24 Dec 2020

It seems to me this is largely added complexity for complexity's sake. Look at the information about Inline tables here:

"Inline table values immediately create a table with the opening brace. Such tables may be anonymous."
"Inline tables are fully defined between their braces, as are dotted-key subtables defined within."
"Inline tables have an additional restriction: No additional subtables or arrays of tables may be defined within the inline table after the ending brace. This forces inline tables to be entirely self-contained."

Think about how each of these statements apply to an inline table like so:

vec = { x = 1, y = 2 }

Most of the meat of the above quotations could be summed up with, "inline tables are key-value pairs enclosed by curly braces". That establishes a naming convention, syntax, and implies they are self-contained without opening up any questions like "What's an anonymous table?". Also, "dotted-key subtable"'s don't need special mentioning. It is implicitly understood if you already give an example of an in-line table allowing dotted keys and inline tables being one of the valid values which can appear in key-value pairs.

A lot of what you are saying goes without saying. Ask yourself whether you are meaningfully adding anything before adding anything. Just write a few simple statements to clarify only where the current spec is unclear. There is no need to add fluff in places where the spec is already quite clear. You don't get any points for a higher word count, you only increase the cognitive load unnecessarily and decrease the number of prospective readers.

Validark on 25 Dec 2020

😕2

First, @Validark, you're arguing over an analysis, not over the spec, or the PR. I invited you to pick apart #797, and maybe you already have.

> It seems to me this is largely added complexity for complexity's sake.

I'm not adding the entirety of this to the spec. I'm trying to summarize and clarify in this issue (which would be a "discussion" if the GitHub app could handle discussions; but I wanted this done with the tools at my disposal). The spec wording will hold these concepts in concentrated form.

> Look at the information about Inline tables here:
>
> "Inline table values immediately create a table with the opening brace. Such tables may be anonymous."

Definition of "create" for inline tables.

> "Inline tables are fully defined between their braces, as are dotted-key subtables defined within."

Definition of "define" for inline tables. Dotted keys get a special mention because...

> "Inline tables have an additional restriction: No additional subtables or arrays of tables may be defined within the inline table after the ending brace. This forces inline tables to be entirely self-contained."

Inline tables are the only syntax in TOML with this restriction. It belongs in the summary.

> Think about how each of these statements apply to an inline table like so:
> vec = { x = 1, y = 2 }

vec is a self-contained table. Done.

> Most of the meat of the above quotations could be summed up with, "inline tables are key-value pairs enclosed by curly braces". That establishes a naming convention, syntax, and implies they are self-contained

The naming convention comes from keys. That's already in the spec. I'm not discussing naming conventions.
Inline tables can be anonymous, inside arrays. The spec implies that but I included it in the summary because, like with the root table, it's unusual but not surprising. That's how it'll stay.
It is implied nowhere that inline tables are self-contained. Standard table syntax would otherwise be able to make subtables inside the inline table. This is _expressly forbidden_ in the spec now and is summarized here.

> without opening up any questions like "What's an anonymous table?".

Opened, and shut.

> Also, "dotted-key subtable"'s don't need special mentioning. It is implicitly understood if you already give an example of an in-line table allowing dotted keys and inline tables being one of the valid values which can appear in key-value pairs.

The scope of a dotted key's definition ends with its container. I was making that clear here. And examples don't mean anything without specification to imbue them with meaning. We've stripped examples down before to prevent unnecessary confusion and reinforce what was already stated.