Toml: Allow "[]" to return to global scope (for adding sibling array after a table)

Created on 22 Mar 2017  ·  52Comments  ·  Source: toml-lang/toml

I'm trying to model this Rust struct in TOML:

#[derive(Debug, Clone, PartialEq, Deserialize)]
pub struct WhitelistConfig<T: DeviceIdentifier> {
    appliance: Appliance,
    rules: Vec<Rule>,

    #[serde(default)]
    ignore: PhantomData<T>
}

When writing it out in TOML, I believe the natural way to read/write it would be:

  1. Provide information about the appliance this config file connects to
  2. Provide the list of rules (which could be long)

That would look something like...

[appliance]
host = "teddriggs-dev"
api_key = "MY-API-KEY"

# this isn't valid right now; it will consider the field to be part of `appliance`
rules = [
    { tag = "whitelist" },
    { group_id = 1 }
]

One approach I thought of that could address this would be allowing [] to appear once to return to global scope after entering a table. It would be invalid to use [] if the document started with key-value pairs, avoiding the splitting of a table definition.

This might be too narrow a case, but I think the pattern of 'metadata + a list' seems likely to occur somewhat frequently. This may also dovetail with #309; shorter syntax for arrays-of-tables would make me less concerned about teaching users the [[rules]] construct.

Most helpful comment

Yes, there are the trade offs. The downside of not being able to arbitrarily inject things into the global scope is that you can't order the non-table keys in the global scope in an arbitrary order. I think we can live with that downside. I think the cost of not being able to specify an arbitrary order is quite a bit smaller than the cost of making removing a very simple property of TOML: it's always obvious where a key is defined.

All 52 comments

  1. Empty brackets naturally correspond to an empty key. Special treatment of empty brackets increases overall language complexity, a tiny step from being obvious towards being cryptic.
  2. In general, random switching between tables and/or global scope makes configuration file less friendly to a human reader.
  3. This looks much less json-ish and much more ini-sh:
    [appliance]
    host = "teddriggs-dev"
    api_key = "MY-API-KEY"

    [[rules]]
    tag = "whitelist"

    [[rules]]
    group_id = 1

Isn't empty quotes correspond to an empty key, and empty brackets disallowed?

@guai, empty quotes correspond to an empty _quoted_ key. Quoted things are allowed inside brackets. So i believe that this is valid today:

[""]
foo = "bar"

I see

[]     # INVALID

in readme. So empty brackets already a special case, but it just disallowed and have no special meaning. I think @TedDriggs right, it is a good candidate for marking return to toplevel
We can define child map before parent unless parent already filled with values and kinda 'closed'. This allowed:

[a.b]
c = 1
[a]
d = 2

If we widen this rule on toplevel we'll get something like this:

[b]
c = 1
[]
d = 2

I think this is quite intuitive and will allow to remove this special case handling in implementation code. Which is always a good thing

I see

[]     # INVALID

in readme. So empty brackets already a special case

Not special, this case is exactly the same:

= "no key name"  # INVALID

If you allow

[]

than what is your intuition for

[a..b]

and for

[a.]

?

Empty strings maybe? I don't insist this to be allowed cause it much less usefull.
I'm writing my first toml doc and I already hit this strangeness that I can write child tables before parent's key-value pairs but not on top level. Why?

Consider Tom`s comment in #419:

You can use a dot-separated key-path to unambiguously denote a value:

[a.b.c]
key = "val"

a.b.c.key #=> "val"

In general:

path = table-name + "." + key-name

In particular, for empty brackets:

[]
key = "val"
.key #=> "val"

To get "key" instead of ".key", you have to introduce a special logic.

I can write child tables before parent's key-value pairs but not on top level. Why?

Because there is no natural way to return to a global scope.

Top level in toml is special in many ways. For instance, top level itself is doomed to be a table, while any value below is allowed to be an array, a table, or an immediate string/number/etc. In contrast, top-level "JSON value can be an object, array, number, string, true, false, or null".

So we have a complexity trade-off here. Introduce special treatment of empty brackets for sake of making top-level less special in one particular aspect: "I can write child tables before parent's key-value pairs but not on top level".

.key for an immediate child of top level map is wrong
then for the first example we would have .a.b.c.key
there already is this logic somewhere
btw why should I even care about implementation? I concerned more about how easy it would it be to use toml

And its not path = table-name + "." + key-name
its path = (parent.pathList + [key]).joinWithDot()
where for empty parent.pathList we'll get just [key]

@guai You should care about the parser's implementation because, from the spec: "TOML should be easy to parse into data structures in a wide variety of languages"

But in that case, making [] return to global scope should not be too difficult. I don't know how every Toml parser is made, but mine already treats [] as a very special case and throws an error, so it could instead treat it as something valid and merge the table's content under it with the root table 👍

I'm not for changing the meaning of things like [a..b] or [.] or .key : keep them invalid.

@TheElectronWill, it also states, that it is obvious. For me, its obvious, that applying the same rule everywhere is simpler and better.
The rule is: brackets matched - change current path, then key-value pairs matched - add them at current path. Unless current map already closed, which happens when we switch from the map with at least one primitive inside to another path. Empty path means root
I'm not the fan of these guys [a..b] or [.] or .key too.

I'm not for changing the meaning of things like [a..b] or [.] or .key : keep them invalid.

100% agree; the original proposal deliberately was limited to the "return-to-global" behavior for that reason.

I'm all for improving the representation for top-level arrays... the repeating [[arrayname]] syntax is definitely less ergonomic than most other elements of the spec. That being said, I don't think that this proposition is the way to go. I prefer toml files specifically because the language is very intuitive. I can throw a toml file at a newbie (or non) programmer, and they can immediately comprehend it.

I don't think that [ ] is expressive or intuitive enough given that human readability is the primary selling point of the toml spec (at least for me). Declarations like [tablename] and [table.sub-table] are immediately obvious because they simultaneously reflect structure and meaning, even to the uninitiated. I use toml files extensively, and I would never guess that [ ] meant "return to the top level namespace and treat the next key as an array instead".

@fspm, the proposition is not "return to the top level namespace and treat the next key as an array instead" but "return to toplevel and treat the next key-value as immediate primitive in toplevel map". We can do it for maps at levels > 0. For levels > 0 we can first describe child maps/arrays and then immediate primitives, but not on top level, where we must declare primitives first right now.
Your comment lead me to the thought that we could use [[]] to describe toplevel array. Which makes toml more close to json/yaml

I'm not a fan of the "return to global" idea at all. A nice property of TOML is that it's very obvious which table a key belongs to based on where it's defined. If you want global keys, then you know to always look at the top of the file.

Things like [] or [[]] are completely opaque.

you know to always look at the top of the file

@BurntSushi, but that prevent expressing some domain-specific ordering. Like important things on top, less important - lower. And most important things not required to be toplevel

@guai Then don't use the global namespace for non-table keys.

@BurntSushi, its acceptable when I design the data structure from scratch, but not when I need to express somethins existing in toml

Yes, there are the trade offs. The downside of not being able to arbitrarily inject things into the global scope is that you can't order the non-table keys in the global scope in an arbitrary order. I think we can live with that downside. I think the cost of not being able to specify an arbitrary order is quite a bit smaller than the cost of making removing a very simple property of TOML: it's always obvious where a key is defined.

@BurntSushi, it makes toml less usefull for me. I'm trying to migrate xml configuration files of an existing
large project to toml.
I do not want to change pathes all over the code and I do not want any reordering, cause people got used to current order. This order have some meaning itself.
Wasn't this supposed to be the main reason, why toml even exist - to have simple and nonambiguous format for configs?
The possibility of drop-in replacement would make it much easier to adapt

@BurntSushi, what trade-off are you talking about? I see no gains here, only losses:
less symmetry, more exceptions, more complicated rules, less expressiveness.
The only kinda gain is "we should look both before and after child sections everywhere but toplevel, on toplevel we only should look on top". Not much of a gain imho

I don't understand. Please use examples.

As [] isn't obvious, I think that if you're in a situation where you need to do this, you could use a table named "global" (or whatever makes sense), and put its content where you need it programmatically.

In the example given by @TedDriggs, a "parameters" table makes sense for me:

[appliance]
# ...

[parameters]
rules = ...

Even if it's less straightforward to map this to the data structure you mentioned, it shouldn't be impossible with a good library.

@TheElectronWill but in that case we'll have to prefix every path all over the toml document

BTW, why "[] isn't obvious"?
What another meaning it can have in this context? I see none. Don't treat users as complete idiots.

@BurntSushi, when I was talking about less symmetry and more exceptions I meant this examples

[a.b]
c = 1
[a]
d = 2
[b]
c = 1
[]
d = 2

I can add prefix to nest some section into another one, and I can remove this prefix. Except when its on top level. In this case I need to reorder/reorganize/refactor, change code etc.
One can first define children and then primitives except toplevel.
We can have arrays except toplevel one.
And so on.
And to mention expressiveness: it turns out I cannot express any data stricture just as I want. I have no toplevel array. I can't use the order I want.
I need to take this all into account.
Whereas in json or yaml I can. Toml could be a better and more readable drop-in replacement for this heavy(yaml) and less readable(json) formats in case if we give the green light to this issue.
And then one can decide, if he need this feature in his own document

And if you guys afraid that users won't get it, then we can use comments

[a.b]
c=1
[] # toplevel primitive start here
foo = "bar"

I think in most cases users don't write toml, they change existing one or copy-paste it and then just change values they know the meaning of

And to mention expressiveness: it turns out I cannot express any data stricture just as I want. I have no toplevel array. I can't use the order I want.
Toml could be a better and more readable drop-in replacement for .... json ... in case if we give the green light to this issue.

Toml is less expressive than json in many more ways. Toml imposes restrictions on type of items in array. Toml has no _null_. Things like _nested arrays of tables_ are hard to read.

Toml could be a better and more readable drop-in replacement for this heavy(yaml) and less readable(json) formats in case if we give the green light to this issue.

TOML isn't a replacement for JSON or YAML. It's a configuration file format.

It's a configuration file format.

And I could not just convert my existing xml configuration files into toml without extra efford
Not a good property for a format, made espesially for this purpose.

I think the biggest problem here is that this proposal would violate the "write once" aspect of tables. The root level is a table. It can be written to at the beginning of each TOML document. If we also allow writing to it via [] then we are allowing the re-opening of an existing table. Unless we only allow writing to [] if there are no global entries at the top of the file, which seems fragile and complex.

@mojombo, I propose to use the same rules on any level.
Child map can be introduced if map at current path have no primitives yet
We still must group primitives together, but the restriction that primitives on toplevel must appear before any child goes away.

@guai:

We still must group primitives together, but the restriction that primitives on toplevel must appear before any child goes away.

So, if a [] follows anywhere in a long file, I'm NOT allowed to write global keys at the top of the file (they must go below the []), but if it doesn't, I am? This is somewhat logical, but it would also be utterly confusing and counter-intuitive. People would get surprising parse errors ("Global table re-opened") that appear much farther down below in the file than the place they edited (the top) and would accordingly be hard to understand and resolve. No, that's not gonna fly.

It also contradicts the original proposal:

One approach I thought of that could address this would be allowing [] to appear once to return to global scope after entering a table. [emphasis added]

But the original proposal won't fly either, for the reason stated by @mojombo: It would make the top-level the only one whose entries could be spread over two places instead of being all grouped together.

This issue has been discussed long enough and it's clearly going nowhere, I think. Time to close it?

@ChristianSi, then why is it ok to get "table re-opened" error on levels > 0?

As @guai pointed out, you already get errors when you reopen a table, which could be defined far away from the user edit.

The meaning of [] isn't obvious, but I realized that the syntax of the array of tables... isn't more obvious. At least we know that [] probably declares a table.

Allowing [] to open - but not to reopen - the global scope makes much more sense than the original proposal, and is quite simple for the parser. It allows flexibility, at the cost of introducing a new syntax (actually it's more a special case of the table syntax). Therefore I'm not against it.

@guai:

@ChristianSi, then why is it ok to get "table re-opened" error on levels > 0?

Well, if you add [tablename], disregarding the fact that [tablename] was already defined elsewhere, you shouldn't be totally surprised if you get an error. But if you just add a key-value pair (at the top), either not knowing that there is [] elsewhere or not knowing what it means – that's a quite different matter and shouldn't cause an error.

So defining a [] syntax with such a "spooky action-at-a-distance" effect (forbidding key-value pairs before the first explicitly named table) violates the principle of least surprise and hence shouldn't make it into the spec.

@ChristianSi, this have nothing in common with principle of least surprise. Principle of least surprise means that if user can be already familiar with something with this syntax, we should not change its meaning.
What experience would be reused here?
None
Its impossible to avoid any learning for everyone.
Why even one assume, that he can create new pairs at the top if he know nothing about toml?
If user does not know what [] mean he will learn it from the first error message stating that key-value pairs should be grouped together
If someone does not know toml well then I think he will just edit existing examples and nothing more
And I think user will place new pair along with other pairs he knows the meaning of anyway

What experience would be reused here?
None

The experience of reading most documents ever written: you don't get to "re-open" a section after a new section has started. No one expects to read something like this:


An article about configuration files

Introduction

blah blah blah...

Methods and Data

blah blah blah ..

Introduction

eh?
....


@sgarciac, I never suggested to make sections reopenable, I suggested to make toplevel movable.
Where did the assumption that toplevel is always "the most useful information" came from? In my case toplevel is uncategorized crap which I want to be at the bottom.
And since I want to adapt toml into existing large project, I do not want to add prefix to pathes all over the code. But I do want toml to be an easy drop-in replacement for any configuration library.

@guai:

In my case toplevel is uncategorized crap which I want to be at the bottom.
And since I want to adapt toml into existing large project, I do not want to add prefix to pathes all over the code.

You don't have to add a prefix to every path. Just put your "uncategorized crap" into a table called "General", "Unsorted", "Crap", or whatever, and leave everything else unchanged. You don't even have to change any of your data structures to do so, it's just a tiny change in the way they are mapped to and from TOML.

And should you answer "I cannot do that", I will reply with: "Of course you can!"

@ChristianSi, good idea, but it is still a workaround
You must first find out somewhere what substitution path was hardcoded as a replacement for toplevel for that exact file.
Where will you find it out?
People say that [] is not obvious, but at least it can easily be taught from the manual or from error message.
Not quite obvious feature vs complete lack of information.

To summarize I would say the different standpoints are that:

  • Developers using the format should have it easy to do that
    vs
  • Users using the format should have it easy to do that

I understand that people would like to do as little changes as possible to their codebase to support sth.
This particular change unfortunately seems to have no compromise to not strictly violate the user-orientated nature of TOML.

So why is this still open?

Indeed. This can be closed.

Indeed. This can be closed.

Just another task about this to appear. This is fourth one already, I believe.

@guai Sorry but I can't understand what you have typed.

@pradyunsg, I mean once this task will be closed another one will apper about that same thing. Because that is what some people (me included) see as most obvious.

If we had a wiki here, we could add a FAQ section.

But actually that is a general problem with suggestions, they tend to be redundant.
Leaving it open to make it more obvious to people is not the answer.
(A real fix should probably done on GitHub side)

@guai Oh, okay. That makes sense.

I guess it would be a good idea to link to the existing ones from here?

@TedDriggs @guai Thanks for your impassioned thoughts on this matter, these discussions are always helpful in thinking through these edge cases. For closure on this, I think the proposal is too non-obvious to be included. In TOML, root keys come first, and thus carry some significance because of that. In addition, TOML need not be a perfect replacement for existing config formats. I know it's inconvenient to have to rework some code to incorporate a switch to TOML, but maybe it could be a good opportunity to rethink the config layout. Is it the best idea to have a bunch of "crap" settings uncategorized? Could they fit elsewhere? At the very least, you can communicate their non-importance by bundling them into a [misc] table.

Regarding questions that keep coming up, I'm 100% ok with that. If people constantly bring things up, it means there is an opportunity to clarify the spec or reconsider past decisions based on new evidence.

Was this page helpful?
0 / 5 - 0 ratings