Toml: Proposal: TOML Schema

Created on 4 Dec 2020 · 27 Comments · Source: toml-lang/toml

Hi all,

I've been designing, along with @aalmiray, a grammar for a TOML Schema document. The proposal can be found in this repository: toml-schema.

The main difference between a TOML document and a TOML Schema document is the existence of key-value pairs with built-in values (keywords). This is one of the points I'd like to get feedback on from the TOML community.

It is not the goal of this proposal to support namespaces. A TOML document cannot embed multiple schemas under nested namespaces. Supporting that would make the schema really, really hard to implement and support, with little to no benefit.

Feel free to comment here and/or create issues on the toml-schema project.

Thanks,
bb.

All 27 comments

Looks good so far, although I feel that the stated namespace restriction shuts down a promising opportunity before it can be explored. Although I'm inclined to avoid so-called "microformats" as being overly broad in scope, I'll set that objection aside for the time being.

It may be possible to treat a subtable like it's its own TOML hash table, assess a separately specified schema against it, and integrate that assessment into the assessment of the original parent document.

This could be done easily within a TOML file simply by giving a table [subtable] its own toml-schema subtable, like this, which would apply only to that table and its nested descendants. The key names would be localized during subtable schema assessment, but that's the only complication I see.

[subtable.toml-schema]
version = 2
location = "<url>"

What would your objections be to allowing such a sub-schema application?

The default key is a little confusing. How would it be used in practice? By that I mean, when the schema is checked and a key with a default isn't present, then what specific process assigns the default to the key in the resulting configuration? The parser? An active schema validator? The application?

Certainly not a standalone schema validator, because no configuration would be constructed when it runs.

But is default, then, just a fancy comment for what the application should do with a missing key? # assign this if missing?

> when the schema is checked and a key with a default isn't present, then what specific process assigns the default to the key in the resulting configuration?

@eksortso my thinking is that the TOML parser should notice the schema reference and validate the document against the schema. For any missing key, it would look for a default in the schema and construct the resulting TOML object with that value.
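A minimal Python sketch of that default-filling step may help make the question concrete. It works on already-parsed data, and the schema layout (a dict of per-key rules with an optional "default" entry) and the name `apply_defaults` are assumptions for illustration only, not the toml-schema format.

```python
import copy


def apply_defaults(document, schema):
    """Return a copy of `document` with missing keys filled from schema defaults.

    `schema` maps each key to a dict of rules; "default" is optional.
    This layout is hypothetical, not taken from the toml-schema project.
    """
    result = copy.deepcopy(document)
    for key, rules in schema.items():
        if key not in result and "default" in rules:
            # Copy the default so later edits to the config cannot alter the schema.
            result[key] = copy.deepcopy(rules["default"])
    return result


# Hypothetical schema and parsed document.
schema = {
    "port": {"type": "integer", "default": 8080},
    "host": {"type": "string"},
}
document = {"host": "example.com"}

config = apply_defaults(document, schema)
```

Here `config` gains a `port` key that never appeared in the file, while `document` is left untouched. That difference is exactly what a parser without schema support would not reproduce.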

@eksortso regarding namespaces, what I found to be challenging is the recursiveness of the schema. I'd be happy to support it if someone can bring a solution to the problem.

Forgive the intrusion, but ...

image

@brunoborges Well, in a sense, there's already limited support for multiple schemas: [toml-schema] has its own schema, and it doesn't need to be repeated in a TOML document for it to be checked. But that does hint at an approach that could be applied throughout a document.

Any part of a TOML configuration will have, at most, one schema to rule over it. The schema over that part would be defined by the toml-schema assigned to it or to its nearest parent. A local schema would completely shadow a more global schema, so there would be no threat of recursion. Any toml-schema table or subtable intrinsically has toml-schema as its schema.
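The shadowing rule above could be sketched roughly as follows in Python, operating on already-parsed data. The function name `resolve_schema`, the "intrinsic" marker, and the sample file names are hypothetical; this is only one reading of the rule, not an agreed design.

```python
SCHEMA_KEY = "toml-schema"


def resolve_schema(document, path):
    """Return the schema reference that governs the key at `path`, or None.

    The nearest enclosing table with its own toml-schema wins and completely
    shadows any schema declared further up, so no recursion is needed.
    """
    # Any toml-schema table is itself governed by the intrinsic schema.
    if SCHEMA_KEY in path:
        return "intrinsic"
    nearest = document.get(SCHEMA_KEY)
    node = document
    for part in path:
        node = node.get(part)
        if not isinstance(node, dict):
            break  # reached a leaf value or a missing key
        if SCHEMA_KEY in node:
            nearest = node[SCHEMA_KEY]
    return nearest


# Hypothetical parsed document with a root schema and one local override.
document = {
    "toml-schema": {"version": 1, "location": "root-schema.toml"},
    "title": {"name": "Example"},
    "subtable": {
        "toml-schema": {"version": 2, "location": "sub-schema.toml"},
        "child": {"key": 1},
    },
}

root_ref = resolve_schema(document, ["title", "name"])
sub_ref = resolve_schema(document, ["subtable", "child", "key"])
```

This sketch does not address arrays of tables, which is the open complication discussed next.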

The only complication in a plan like this is how we could assign a schema to every table in a table sequence. An array has no way to assign a table to a key with no table of its own. Perhaps the first table in the sequence can have a subtable called sequence-toml-schema, or some better name, to assign a schema to all table elements that don't have their own toml-schema. And such a sequence-toml-schema would use the intrinsic toml-schema.

> Perhaps the first table in the sequence can have a subtable called sequence-toml-schema, or some better name, to assign a schema to all table elements that don't have their own toml-schema.

This is what I found to be difficult. The moment extra tables must be added later on in the document to support more metadata, the TOML document starts to lose its appeal.

One idea that did come to mind was a namespace prefix, just like XML/XSD does.

[toml-schema]
version=1
location="url..."

    [toml-schema.cust]
    version=1
    location="url for customer namespace schema"

[title] # this is top-level schema
name="Customers Orders Configuration"

[customers] # part of top-level schema

    [cust:customers.orderSettings] # this one is a customer element
    maxitems=3
    region="North America"
    shipping="UPS"

        [customers.orderSettings.header] # this is still part of the customer namespace as it is a child of a table linked to the 'cust' namespace.
        comment="some comment"

But then, how to reference the customer schema from the top-level, general schema?

Is this not just a slightly-more-fleshed-out duplicate of #629, #76, or #116? On #629 specifically @pradyunsg makes the point that

> That's non trivial and TOML won't be gaining such complexity

I realize this proposal is 'better' than #629 in that the schema is itself a separate TOML document, and I appreciate the amount of thought that has gone into it, but TOML is supposed to be simple and human-oriented; I don't buy any claims that a schema would be solving a real problem. If the TOML is so complex that it requires the parser to perform context-aware validation against a nontrivial schema, maybe TOML was the wrong tool for the job to begin with.

Also consider that by adding a 'url' you're implying the TOML parser needs to either have network awareness built-in (at the very least the ability to do a basic HTTP fetch), _or_ require applications to implement that themselves via callbacks. Neither is a great option, and both are likely to be impossible in many contexts. One of TOML's selling points is its minimalism, both in syntax, _and_ thus the subsequent implementation. The requirements for implementing URL fetching will be a complexity/bloat bridge-too-far for many implementations.

@marzer Here are a few points:

  1. The proposal does not suggest that a schema must always be present, just as an XML document can exist without an XSD or even a DTD file. TOML can and should remain simple and human-oriented, to the point that anyone who chooses not to use a TOML Schema can do so. The proposal would, however, require a special table to be defined in the TOML specification to indicate a schema and, if one is present, to be read by the parser. Not all parsers would have to support schemas either, just as some XML parsers do not support XSD.
  2. The location parameter doesn't have to indicate a remote URL. In fact, it should read URI, and it can be a local file. Parsers may provide an extra feature to map a URL back to a local file for offline processing/validation, just as XML/XSD parsers have supported this scenario for more than a decade.
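The URL-to-local-file mapping in point 2 could look something like this Python sketch, loosely modelled on XML catalogs. The function name `resolve_location`, the catalog dict, and the sample paths are assumptions for illustration; nothing here is defined by the proposal.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_location(location, catalog, base_dir="."):
    """Map a schema `location` URI to a local path without touching the network.

    Returns None when the location could only be satisfied by a remote fetch.
    Simplification: Windows drive-letter paths are not handled.
    """
    # 1. An explicit catalog entry wins, which allows offline validation.
    if location in catalog:
        return Path(catalog[location])
    parsed = urlparse(location)
    # 2. file: URIs point at a local file directly.
    if parsed.scheme == "file":
        return Path(parsed.path)
    # 3. No scheme at all: treat it as a path relative to the document.
    if parsed.scheme == "":
        return Path(base_dir) / location
    # 4. Anything else (http, https, ...) would need a fetch.
    return None


# Hypothetical catalog mapping a published schema URL to a local copy.
catalog = {"https://config.example.com/schema.tosd": "/opt/schemas/schema.tosd"}
local_copy = resolve_location("https://config.example.com/schema.tosd", catalog)
```

A parser with no network support could then report a clear error whenever this returns None, rather than needing an HTTP client.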

So in short: the proposal is to find common ground, without adding complexity to the TOML specification itself, but to ensure the specification recognizes the existence of the TOML Schema, and allows for a standard way for defining a pointer to a schema file. That's all.

@marzer I agree with your essential point:

> One of TOML's selling points is its minimalism, both in syntax, and thus the subsequent implementation.

But at this point, toml-schema is a separate project, and the impression I get from everyone so far is that it'll always be separate from core TOML, even if it's heavily adopted. It imposes nothing on the core standard, and the schemas themselves are fully compliant TOML documents.

I will disagree with you, vehemently, on the matter of complexity. Configurations always start small. But if a configuration is intended to scale up, there may come a time when a little help to keep things in line would be appreciated, especially when that help is a pure add-on with no additional load borne by the standard.

Just adding a little perspective here:

The syntax isn't the issue: the syntax for JSON, YAML, or INI files isn't particularly complex. Heck, the syntax for XML isn't all that complex in most cases.

The issue is knowing what keys are available and the expected/valid values for each key.

Take, for example, Windows Terminal's settings.json file. It's the lifeblood of the Terminal in which one can configure the Terminal's many features. Settings are categorized into four areas: General Settings, Profile Settings, Color Schemes, and Actions.

Without a schema, remembering the names and values for each of the settings is a PITA and having to constantly refer to the docs is not productive.

WITH a schema, editors like VSCode make writing settings a breeze:
image

image

While I think that a TOML schema mechanism is a good idea, I agree with others here that it must be optional: TOML parsers may consider schemas, but they are not required to do so. A logical and in my viewpoint very important conclusion from this is that the absence or presence of a schema must not change the data structure resulting from parsing a valid (and schema-valid) document.

Therefore, a ''default'' key as described above cannot be part of the TOML schema spec, since otherwise a schema-aware parser would parse documents into different data structures (with defaults added) than a schema-ignorant parser. Let's not go down that road, since it would fragment the TOML community.

I don't think anyone is mandating that every TOML doc __must__ have a schema. But we are advocating that TOML should offer/support schemas when presented.

@ChristianSi one more time for the sake of the debate: XML and XSD are two separate specifications. One (XML) recognizes the existence of the other (XSD), but (XML) does not require it (XSD).

Not all XML documents _must_ have a _schema_.

Therefore, the proposal is to discuss, along with the key TOML contributors and the TOML community in general, whether there is room for a TOML Schema specification, how it should work best, and how the TOML specification should recognize its existence in a way that _is_ standardized (e.g. [toml-schema]), but completely optional.

@eksortso right now, the grammar I drafted does not describe a fully compliant TOML document, only a similar one. If you look closely at the ABNF, it suggests a few keywords for built-in types that are not quoted as strings.

Example:

[document.property]
type = array
arraytype = string

What do you think?

@brunoborges

> So in short: the proposal is to find common ground, without adding complexity to the TOML specification itself, but to ensure the specification recognizes the existence of the TOML Schema, and allows for a standard way for defining a pointer to a schema file. That's all.

Well, as long as it remains fully optional, such that a parser can completely ignore a schema URI and still remain compliant, I guess I have no complaint. To that end, I second @ChristianSi's point:

> ''default'' key as described above cannot be part of the TOML schema spec, since otherwise a schema-aware parser would parse documents into different data structures (with defaults added) than a schema-ignorant parser. Let's not go down that road, since it would fragment the TOML community.

@brunoborges Making TOML schemas themselves fully compliant TOML documents sounds like a very good idea. "Eat your own dogfood" and don't proliferate file formats and parser requirements needlessly. Just adding a few quotes here and there seems like a worthwhile price.

I have two questions about the proposed syntax:

1) If the schema refs are to be part of the TOML document structure with a 'magic' table named toml-schema, does it mean that table name is now reserved by the spec, and tables with that name should be validated accordingly by schema-aware parsers?

2) Should schema-aware parsers emit the toml-schema table in the parsed data tree, to keep with older parsers that would treat it as just ordinary, non-magic data?

Note that neither question needs answering at all if the schema is _not_ a part of the TOML document, and instead uses magic comments or similar. Something like:

##! toml-schema = { version = 1, location="url..." }

This also has the upside of appearing visually distinct from regular TOML, though it adds complexity to the language, since it requires changes to the ABNF.
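As a rough illustration of how a schema-aware tool might pick up such a magic comment, here is a small Python sketch. The regular expression and the name `find_schema_pragma` are assumptions, and the fixed key order (version, then location) is a simplification; a real implementation would presumably parse the braces as a TOML inline table.

```python
import re

# Matches the pragma form shown above, with `version` before `location`.
PRAGMA = re.compile(
    r'^##!\s*toml-schema\s*=\s*\{\s*'
    r'version\s*=\s*(?P<version>\d+)\s*,\s*'
    r'location\s*=\s*"(?P<location>[^"]*)"\s*\}\s*$'
)


def find_schema_pragma(text):
    """Return {"version": int, "location": str} for the first pragma line, else None."""
    for line in text.splitlines():
        match = PRAGMA.match(line)
        if match:
            return {
                "version": int(match["version"]),
                "location": match["location"],
            }
    return None


sample = '##! toml-schema = { version = 1, location="url..." }\ntitle = "Example"\n'
pragma = find_schema_pragma(sample)
```

Because the line starts with `#`, a parser that knows nothing about schemas sees only a comment and produces the same data tree either way.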

@brunoborges I apologize, because I've been basing my assessment on the project README only, and that's not in sync with the project's ABNF.

The README does imply that the schema must be a separate document, because the only thing that the TOML document needs to have is a [toml-schema] table with an external reference in location. Theoretically, and in regard to @marzer's points, that schema could be embedded in the TOML, but only if it's fully TOML-compliant.

Now regarding those non-TOML-compliant value keywords. As long as they remain in TOSD docs, then schemas could have special unquoted value keywords in the TOSD format. That could still have a knock-on effect:

  1. Developers adopt TOML Schema and use it with their configurations.
  2. They see a list of options, not strings, that can be left unquoted for type values.
  3. They realize this isn't in TOML and say, "Hey, maybe we should add this to the TOML standard!"
  4. TOML gets option types, due to something that's really convenient and like TOML but isn't TOML.

I'd love to add enumerated values and option types to TOML, but I wouldn't do anything to encourage that, at least not just yet.

@marzer Your comment suggestion reminds me of #522, which was specifically about TOML version pragmas.

Could we use a similar pattern for referring to TOML schemas? Something like the following appearing at the top of the document?

# TOML Schema: v2 https://config.example.com/schema.tosd

@brunoborges Is the version value necessary? Couldn't a separate URI point to the appropriate version of the schema doc?

toml-schema.location = "https://config.example.com/schema_v2.tosd"

> Is the version value necessary? Couldn't a separate URI point to the appropriate version of the schema doc?

@eksortso the idea of having the version is to double-check the intent. If schema.tosd is now v2.1 internally (although still at the same URL), but the TOML document still refers to the same URL, the parser should double-check the version intent and throw an error if it is trying to validate a TOML document that declares schema v2 against a v2.1 schema file.

So, while it is not necessary, it would add some protection.
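A sketch of that double check in Python, assuming (hypothetically) that the schema file records its own version under a top-level `version` key; the exception and function names are made up for this example.

```python
class SchemaVersionMismatch(Exception):
    """Raised when a document and the schema file disagree on the version."""


def check_schema_version(declared, schema_doc):
    """Compare the version declared in the document with the schema's own.

    `schema_doc` is the parsed schema file; where it stores its version is
    an assumption made for this sketch.
    """
    actual = schema_doc.get("version")
    if actual != declared:
        raise SchemaVersionMismatch(
            f"document expects schema v{declared}, but the schema file is v{actual}"
        )


# The document declares v2, and the schema at the same URL still says v2.
check_schema_version(2, {"version": 2})

# The schema was bumped to v2.1 in place, so validation should stop.
try:
    check_schema_version(2, {"version": 2.1})
    mismatch_detected = False
except SchemaVersionMismatch:
    mismatch_detected = True
```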

> I'd love to add enumerated values and option types to TOML, but I wouldn't do anything to encourage that, at least not just yet.

Yeah, I am not a fan of the unquoted enumerated values either. I just really thought they'd make things easier for extensions/plugins and therefore developer experience in general, but I think you are right to say that it is not impossible to add the quotes.

That said, I'll document that a TOML Schema must be a TOML-compliant document. This does raise the question: _Is there still a need for a TOML Schema ABNF grammar?_ I tend to believe that _yes_, it is still needed, to ensure the structure.

@eksortso Any thoughts?

@marzer here are my thoughts on your two questions:

  • If the schema refs are to be part of the TOML document structure with a 'magic' table named toml-schema, does it mean that table name is now reserved by the spec, and tables with that name should be validated accordingly by schema-aware parsers?
  • Should schema-aware parsers emit the toml-schema table in the parsed data tree, to keep with older parsers that would treat it as just ordinary, non-magic data?

It is unfortunate that the TOML specification does not set a meta-table format for information regarding the document type (e.g. the version).

HTML for example has a standard way to do so:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

If the TOML specification allowed for such a standardized construct, then the schema reference could be part of it, along with the TOML specification version, which could inform parsers of other metadata.

But, assuming that such a construct will never be part of the specification, my thinking is that we have a few options to consider:

Reserve the table [toml-schema]

Schema-aware parsers must evaluate this, and validate the document against the referenced schema. This table must not be part of the document tree, unless the parser is instructed to do so (opt-in).

Non schema-aware parsers must ignore this table and not append it to the document tree, unless the parser is instructed to do so (opt-in).

Do not reserve table [toml-schema]

Schema-aware parsers must evaluate this, and validate the document against the referenced schema. This table must not be part of the document tree, unless the parser is instructed to do so (opt-in).

Non schema-aware parsers by default will treat this table as a regular table and append it to the document tree, unless the parser is instructed to ignore it (opt-out).

Use of a special, comment-based format

The proposal of adding a new construct in the TOML specification seems to be the right solution, as long as this is part of the specification and the grammar.

I really like the following, because it is TOML-compliant in both ways: a comment for non-schema-aware parsers, and a [toml-schema] table for schema-aware parsers. This design meets the same intent as DOCTYPE in HTML.

##! toml-schema = { version = 1, location="url..." }

I would vote for this proposal, without any doubt 👍

@brunoborges

> Non schema-aware parsers must ignore this table and not append it to the document tree, unless the parser is instructed to do so (opt-in).

I suppose I should clarify what I took "not schema-aware" to mean here: an old parser that knows nothing about this new feature. If it knows about schemas but chooses to ignore them, then it is schema-aware but also non-enforcing.

Moot point, though; I agree with your points above that having it pragma-style in comments is likely the right direction.

> Therefore, a ''default'' key as described above cannot be part of the TOML schema spec, since otherwise a schema-aware parser would parse documents into different data structures (with defaults added) than a schema-ignorant parser. Let's not go down that road, since it would fragment the TOML community.

@ChristianSi I think you are making a really good point here regarding default.

Ideally, a TOML file should output the same data regardless of what parser was used, as long as the parser is compliant with the version of the TOML specification. And if a schema-aware parser generates a data object that is different because it followed the schema and grabbed a few default values, then ultimately the file is different.

In essence what you are saying is that TOML Schema must not influence/modify the data of a TOML file. A TOML Schema can only dictate the data structure and data types; never data input.
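To illustrate what "structure and types only" might mean in practice, here is a Python sketch of a validator that reports problems but never changes the parsed data. The rule names `type` and `arraytype` follow the earlier example in this thread (as quoted strings); `required`, the type table, and the function names are assumptions for illustration.

```python
# Hypothetical mapping from schema type names to Python types.
TYPE_MAP = {
    "string": str,
    "integer": int,
    "float": float,
    "boolean": bool,
    "array": list,
    "table": dict,
}


def _matches(value, type_name):
    expected = TYPE_MAP[type_name]
    # bool is a subclass of int in Python, so reject it for non-boolean types.
    if isinstance(value, bool) and expected is not bool:
        return False
    return isinstance(value, expected)


def validate(document, schema):
    """Return a list of error messages; `document` is only read, never modified."""
    errors = []
    for key, rules in schema.items():
        if key not in document:
            if rules.get("required", False):
                errors.append(f"missing required key: {key}")
            continue
        value = document[key]
        if not _matches(value, rules["type"]):
            errors.append(f"{key}: expected {rules['type']}")
        elif rules["type"] == "array" and "arraytype" in rules:
            if not all(_matches(item, rules["arraytype"]) for item in value):
                errors.append(f"{key}: expected array of {rules['arraytype']}")
    return errors


schema = {
    "tags": {"type": "array", "arraytype": "string"},
    "port": {"type": "integer", "required": True},
}
document = {"tags": ["a", 1]}
errors = validate(document, schema)
```

With this shape, a missing key is at most an error for the application to handle. The validator never fills the value in, so parsers with and without schema support yield the same data.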

I'm down with that.

> _Is there still a need for a TOML Schema ABNF grammar?_ I tend to believe that _yes_, it is still needed, to ensure the structure.
>
> @eksortso Any thoughts?

@brunoborges Well, I'm a big fan of dogfooding, so my advice would be to write the schema standard as TOML using itself to check it. This will hold a lot more weight once TOML v1.0.0 is finally released. I'm not saying this just to be flippant; after all, ABNF was defined using itself for its first specification.

That said, if you want to keep the ABNF around, would it be possible to use the case-sensitive string syntax introduced in RFC 7405?

Hi all,

I incorporated some of the feedback here, and for now, also decided to not focus on the ABNF grammar, and instead on a set of rules. I believe ABNF may be useful later to generate a parser that validates the overall structure of the TOML Schema document.

It is also starting to seem possible to draft a recursive TOML Schema file to validate the TOML Schema itself.

I'd appreciate it if those interested in this proposal could review the new README documentation.

Thank you

@brunoborges I've left some feedback/nit-picks on your Discussions page: https://github.com/brunoborges/toml-schema/discussions/4

(Is that where you want that sort of thing? Or here?)
