Toml: Version pragma

Created on 6 Feb 2018 · 17 Comments · Source: toml-lang/toml

Back in my Erlang days, when I was working on BERT-RPC, I gave a talk about it at the Erlang Conference in Stockholm. Joe Armstrong was there, and after the speech, during questions, he spoke up and said "It appears you've made the number one biggest mistake in protocol design: not having a way to specify the protocol version number in use." Well, in my defense, almost nobody adopted BERT-RPC and therefore a version number was irrelevant. SO THERE, JOE.

But TOML has already enjoyed more success than my esoteric binary format, and I expect that versions beyond 1.0 will exist. In that future universe, TOML documents of several disparate versions will be lying around, and I must ask the question: do we need an official way to specify which TOML version a document is intended to be parsed as?

I could, indeed, argue that no, it's not necessary, because a TOML document expects to be configuration, and a configuration file will be consumed by an application, and that application will specify and know what version of TOML it is parsing. However, I could then argue that applications also evolve over time, and perhaps a program would like to accept either TOML v1 or v2 or v3, etc, in which case it would rather rely on the document specifying a version, rather than having to use some feature detection heuristics, or coming up with its own application-specific way to document the version in use.

So perhaps we solve this preemptively in v1.0.0 so that future versions may rest easy knowing that a solution exists for when the time comes that v1.1 or v2.0 is released. Maybe it looks like this:

# TOML v1.0

A few questions for consideration:

  1. Optional or required?
  2. Top of file, bottom, anywhere?
  3. What does the v specification mean exactly? Does 1.0 mean any 1.0.x version? Do we need to use a ~ operator and get fancy like npm dependencies?

I have a few thoughts on these matters that I will elaborate, but I'm sitting on a 777 about to blast off and will be sans-internet for the next 11 hours. In the meantime, I'd love any thoughts on the matter or prior art that you've seen solve this problem elegantly.

new-syntax

All 17 comments

In theory this sounds fine, but I don't think we can get away with making it required at this point since it would break all extant uses of TOML. Or am I misunderstanding something?

We could declare that it is optional, but necessary for any future major version bump of TOML. That is, the absence of a version tag forever and always means TOML v1, while the presence of a version tag could indicate TOML v1, v2, etc.

(If possible, it would be nice to stick to v1, v2, v3, ..., and avoid a versioning scheme more complicated than that.)

I'm personally not a huge fan of schema numbers in human readable formats, but I shall explain! Ideally the format is flexible enough that no breaking change is ever needed, but assuming that this does happen one day then all of a sudden a "hello world" TOML document goes from:

hello = 'world!'

to

# TOML v2.0
hello = 'world!'

(or w/e the version marker is).

This to me is the reason why in human readable/writable formats (like TOML) the version becomes cumbersome. The format is in general designed for ergonomics of reading/writing but requiring you to annotate something would be a step backwards (at least in my mind). Although the schema may not be required today it would need to be required tomorrow (I think?) to opt-in to the breaking changes queued up.

I think in other words what I'm getting at is that TOML 1.0 (today) can be generally thought of as "optimized for ergonomics" but TOML 2.0 (tomorrow) would then be "optimized for ergonomics except for that schema declaration".

Now all that being said I do definitely agree that protocol numbers make total sense when humans aren't involved (binary protocols, etc), but I'm mainly thinking about this purely from a "human readable/writable" format.

In any case that's of course just my own thoughts, in Rust/Cargo we'll happily be following the TOML spec regardless!

I'm not completely convinced that this is a good addition. It's basically the same position as @alexcrichton's.

Optional or required?

Optional sounds nicer -- if we add some language saying that parsers should fail if there's a mismatch and holler if it's missing (but continue).

I'd like it not to be required, for the reasons @alexcrichton mentions: it makes more sense for binary, non-human formats than for human ones. I just want to add that a compulsory schema goes against trying to keep the language minimal.

Top of file, bottom, anywhere?

First line only.

Does 1.0 mean any 1.0.x version?

Yep.

Do we need to use a ~ operator and get fancy like npm dependencies?

No.
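A minimal Python sketch of how a parser might apply the answers above, combined with the earlier suggestion that a missing tag always means v1. It assumes a hypothetical first-line comment of the form "# TOML v1.0" (the form from the opening post); the regex, check_pragma, and VersionMismatch are illustrative names, not part of the TOML spec or any existing library.

```python
import re
import warnings
from typing import Tuple

# Hypothetical pragma: "# TOML v<major>.<minor>[.<patch>]", first line only.
_PRAGMA = re.compile(r"^#\s*TOML\s+v(\d+)\.(\d+)(?:\.\d+)?\s*$")


class VersionMismatch(Exception):
    """The document declares a different TOML version than the parser implements."""


def check_pragma(document: str, supported: Tuple[int, int]) -> Tuple[int, int]:
    """Return the (major, minor) version the document is treated as.

    Rules taken from the thread, not from any TOML spec:
    - optional: a missing pragma means v1.0, with a warning, and parsing continues
    - only the first line is inspected
    - "1.0" covers every 1.0.x, so a patch number is accepted but ignored
    - exact match only: no npm-style ~ or ^ ranges
    """
    first_line = document.split("\n", 1)[0].rstrip("\r")
    match = _PRAGMA.match(first_line)
    if match is None:
        warnings.warn("no TOML version pragma on line 1; assuming v1.0")
        declared = (1, 0)
    else:
        declared = (int(match.group(1)), int(match.group(2)))
    if declared != supported:
        raise VersionMismatch(
            f"document is TOML v{declared[0]}.{declared[1]}, "
            f"parser implements v{supported[0]}.{supported[1]}"
        )
    return declared
```

Comparing only (major, minor) is what makes "1.0 means any 1.0.x" fall out for free, and plain equality is the simplest reading of "no ~ operator".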

I also am -1 for adding any kind of version pragma.

The premise of a version pragma is that every tool is designed to understand a particular version of the specification, will check for the presence of the version pragma and gracefully refuse to process a file that uses a newer revision than it understands.

Application authors choose TOML because it's an obvious, minimal language, but application users don't choose TOML, they're just stuck with the format that the application's author picked. Ad-hoc tools built around these applications may use a specific TOML parser, or (if a given application's config file is simple enough) they might re-purpose an INI-file parser, or hack together an awk-and-sed script to pull out the information they want. These ad-hoc tools don't target TOML, they target a specific application's specific use of TOML, and the people who write them may never even see the TOML spec, let alone read its instructions about version pragmas.

Even for somebody who's trying to implement the TOML specification, implementing a version pragma is going to be far down the list of priorities, since there's only one possible value right now, so it's safe to blindly assume all TOML files are v1.0 anyway. So I imagine there are going to be a lot of half-finished TOML parsers, especially for niche languages or from people who hate third-party dependencies, that never get around to implementing the version pragma, regardless of what the specification says.

But let's say somebody writes a library that implements the TOML specification exactly, down to every detail. The library implements version 1.2, but the host application feeds it a file that says "1.3". What should the library do? The TOML specification says "reject", but in practice such a file is probably going to work just fine. Let's say v1.3 added a new data-type like an IPv6 literal: if the file doesn't contain any IPv6 literals, then an implementation of the v1.2 specification won't have any problems, and if the file does contain an IPv6 literal, it's more helpful to have a specific "I don't understand this value on this line" error than a general "format unsupported" error. Therefore, ignoring the version pragma is the practical thing to do.

The biggest problem I have with version pragmas is that if a file-format has a version pragma, that encourages people to make breaking file-format changes without thinking through the ramifications because "we can just bump the format version". Consider the cautionary tale of HTML, which did have version numbers and even doctypes, yet no browser ever enforced them because there was so much existing content that was mislabelled. Likewise, consider TLS 1.3, whose handshake version field is set to match TLS 1.2 for compatibility with existing implementations. Or how Windows 95 reported itself as version 3.95 to old applications.

I too am skeptical. My guess is that TOML parsers will tend to ignore the pragma anyway and files will rarely contain it. Also, unless there is a breaking TOML 2.0 release, it won't serve any useful purpose in the first place. (And I'd perform a dance of happiness if even TOML 1.0 is released while I'm still healthy enough to do so!)

One alternative approach to breaking changes could be a different file extension (I'm looking at you, node.js, with the suggested .mjs extension).

If the whole point is only to annotate v2, v3, etc., then the file extension might as well change to .toml2, .toml3, etc.

I could, indeed, argue that no, it's not necessary, because a TOML document expects to be configuration, and a configuration file will be consumed by an application, and that application will specify and know what version of TOML it is parsing.

I find this argument pretty convincing

However, I could then argue that applications also evolve over time, and perhaps a program would like to accept either TOML v1 or v2 or v3, etc

Do you think this could be solved sufficiently by changing the file extension in future major versions of TOML?

This is important in my opinion, and it should be mandatory. This is how it is done in XML 1.1.
https://www.w3.org/TR/xml11/#sec-prolog-dtd

@ahrvoje : It's worth mentioning that (almost?) nobody uses XML 1.1. And since nobody uses XML 1.1, they have chosen to update the XML 1.0 spec in-place in order to make the (small) changes that really need making. Case in point of being wary of the whole (mandatory) versioning thing.

I think what is more important to define than a version is a general syntax for pragmas.

My own opinion is that all pragmas including version can be implementation defined since toml isn't targeting ipc.

I would be OK with a defined version pragma as long as the default behavior when not specified was left implementation defined so users were never forced to see / write @TOML v2.

Just to add on to what @johannes-scharlach suggested, versioned files (e.g. asdf.toml2) could have their own MIME types too. It might make sense when receiving data from a source without that data having a file extension.

Consider that curl -I http://example.com/read/list may return TOML-serialised data with the HTTP header Content-Type: application/toml. If that data is serialised with an updated version of TOML, whatever is receiving it might have no proper way to tell the difference unless the server returns something like Content-Type: application/toml2.

Other formats have had a mime update with version updates in the past as well:

Ideally, we'd want multiple versions to be backwards compatible to avoid this issue, but in the case of breaking changes, this may be something we have to do.
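A rough Python sketch of the versioned file extension and media type idea from the last few comments. The ".toml2"/".toml3" extensions and "application/toml2" media type are hypothetical (only the unversioned forms appear in practice), and the function names are illustrative.

```python
from pathlib import PurePath
from typing import Optional


def _major_from_suffix(name: str, base: str) -> Optional[int]:
    """Return 1 for base itself, N for base + "N" (N >= 2), else None."""
    if name == base:
        return 1
    rest = name[len(base):]
    if name.startswith(base) and rest.isdecimal() and int(rest) >= 2:
        return int(rest)
    return None


def major_from_filename(filename: str) -> Optional[int]:
    # ".toml" -> 1, hypothetical ".toml2" -> 2, ".toml3" -> 3, ...
    return _major_from_suffix(PurePath(filename).suffix.lower(), ".toml")


def major_from_content_type(header: str) -> Optional[int]:
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = header.split(";", 1)[0].strip().lower()
    # "application/toml" -> 1, hypothetical "application/toml2" -> 2, ...
    return _major_from_suffix(media_type, "application/toml")
```

The appeal of this design is that the version lives outside the document, so a "hello world" file stays a single line; the cost is that the information is lost whenever the content is copied somewhere without its filename or headers.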

I agree that having a version in a config file is not nice for humans, but the files are not just intended for humans. I think Joe is right.

Instead of a comment I would suggest using a pragma key (e.g. starting with @, like @version=1).
This would allow to extend it with other pragmas (like @schema)
For v1 it could be defined as the default, so files wouldn't require a definition. Also systems could specify the version as part of the expected schema and therefore not require the version.

If TOML becomes successful, it will be used across system boundaries and those things will be important.

Thus, +1 to allow pragmas (as special key/value pairs) and to reserve one for the version
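A minimal Python sketch of the @-pragma proposal, assuming (hypothetically) that pragma lines of the form @key=value may only appear before any ordinary TOML content and that a missing @version defaults to 1. The syntax and split_pragmas are illustrative; nothing here is defined by the TOML spec.

```python
from typing import Dict, Tuple


def split_pragmas(document: str) -> Tuple[Dict[str, str], str]:
    """Separate leading "@key=value" lines from the TOML body.

    Assumed rules: pragmas come first, each on its own line, and a
    missing @version defaults to "1".
    """
    pragmas: Dict[str, str] = {}
    lines = document.splitlines(keepends=True)
    body_start = 0
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith("@"):
            break  # first non-pragma line: the TOML body starts here
        key, sep, value = stripped[1:].partition("=")
        if not sep:
            raise ValueError(f"malformed pragma on line {i + 1}: {stripped!r}")
        pragmas[key.strip()] = value.strip()
        body_start = i + 1
    pragmas.setdefault("version", "1")
    return pragmas, "".join(lines[body_start:])
```

Because the pragmas are stripped before the body reaches a TOML parser, an existing v1 parser can consume the remainder unchanged, and other keys such as @schema fit the same mechanism.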

On the human side I could see how versioning (whether that is in the file itself or by extension) can be beneficial.
Should the spec change, then the user must be able to recognise what spec the file is following and which input is allowed.
I don't think V1 needs it yet though. Only once the spec changes, should this be mandatory, because I do think it should be mandatory in that case.

Time heals all wounds. In this case, the wound of indecision. I think we should avoid pragmas, and here's why: my initial argument suggested that because TOML is a config format, it will be consumed by applications that know what version of TOML they support. I counter-argued that maybe they want to support several versions of TOML over time. Indeed this will probably be true, but the solution to this seems easy. An application that wants to support multiple versions can just TRY to read the file as if it were each of those supported versions and see which one is successful. Maybe they both are, in which case you have a cross-compatible TOML file (as long as we are careful not to change the meaning of existing syntax that would be valid in both). If they're not, and one succeeds, then you know which one to pick. If both fail, then you just guess one of them and spit out some errors. Let the computer do the work, not the user. That is the TOML way.
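The "just try each version" approach above can be sketched in a few lines of Python. The parsers argument stands in for real per-version parsers (hypothetical, since no second major version exists); parse_any_version is an illustrative name, not an existing API.

```python
from typing import Any, Callable, Dict, List, Tuple

# A parse function takes the document text and raises on invalid input.
Parser = Callable[[str], Any]


def parse_any_version(document: str, parsers: Dict[str, Parser]) -> Tuple[List[str], Any]:
    """Try the document against every supported TOML version.

    parsers maps a version label to its parse function, in order of preference.
    Returns (labels of every version that accepted the document, result from
    the most preferred one). More than one label means the file is
    cross-compatible. If every parser fails, the preferred parser's error is
    re-raised, i.e. we "guess" that version and report its errors.
    """
    if not parsers:
        raise ValueError("at least one parser is required")
    accepted: List[str] = []
    result: Any = None
    errors: Dict[str, Exception] = {}
    for version, parse in parsers.items():
        try:
            parsed = parse(document)
        except Exception as exc:  # parsers signal failure with their own exception types
            errors[version] = exc
            continue
        if not accepted:
            result = parsed  # keep the most preferred successful result
        accepted.append(version)
    if not accepted:
        raise errors[next(iter(parsers))]
    return accepted, result
```

On Python 3.11 or later, tomllib.loads could serve as the v1 entry. Returning the most preferred result when several versions succeed relies on the caveat in the comment: syntax valid in both versions must not change meaning.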

That is a fair approach. I would, however, reserve the possibility of introducing pragmas later on. Thus all @-words (or select whatever you want as pragma identifier) are reserved and therefore should not be used.

Though (for versioning) this would only be necessary for the first line (or first entry). I don't expect you'd want the version, if any, to be displayed anywhere else in the file.

No current syntax matches a line-opening @ so in this regard, it is already implicitly reserved.
