Toml: Clarify what "expected" means

Created on 7 May 2020 · 12 Comments · Source: toml-lang/toml

The spec uses the word "expected" several times, and, as the discussion in #732 shows, the meaning of this word is not quite clear. I would propose to make it clear that these expectations are actually requirements, and reword accordingly.

Specifically, in the definition of the Integer type, change

64 bit (signed long) range expected

to

TOML implementations must be able to losslessly handle at least all
integers in the 64 bit (signed long) range

In the sections on time-related types, change

The precision of fractional seconds is implementation-specific, but at least
millisecond precision is expected.

to

The precision of fractional seconds is implementation-specific, but TOML
implementations must support at least millisecond precision.

(This change needs to be done three times, in the sections on Offset Date-Time, Local Date-Time, and Local Time).

The first wording has been moved out of #732 because the issue has turned out to be controversial. So let's discuss it here!

clarification

ChristianSi

🚀1 👍1

Most helpful comment

my impression is that such a documentation requirement would quickly become too complex

@ChristianSi Haha, I aimed at the opposite.

But since we wouldn't define what constitutes proper documentation, such info mainly helps implementers to understand what is and isn't up to them. Meaning: where does the implementers' freedom stop, and where does the strictness of the spec require certain things? Whether an implementation has an actual website, or even actual documentation, really is always up to them. They can always satisfy "implementation defined" by adding a line on their website: see source.

The more serious implementations may provide more elaborate information, and some already do.

Likewise, we don't currently specify what errors look like or how they are reported. Implementers may ask about this, but by specifying that it's up to them, you make it clear.

I wouldn't do this for the current version 1.0, since it'll take some time to get right. I'd postpone such specified leeway to a future edition. It'll make the overall specification more solid.

Likewise, it may be something that only belongs in an actual spec text, not in the Readme.

abelbraaksma on 17 May 2020

👍2

All 12 comments

In #732, in the case of the integer type's expectations, I took the contrary position that "expected" means, in terms of RFC 2119, SHOULD rather than MUST. Although 64-bit signed integer values ought to be used to remain compliant with the TOML specification, I hold that there are exceptions (in embedded systems, say, or in JavaScript's imprecise float-based numbers) where 64-bit precision cannot be guaranteed.
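The float-based limitation can be seen with IEEE 754 doubles, the format behind JavaScript's Number type. A small Python sketch, using Python's `float` (also a double) as a stand-in; the helper name `survives_double` is made up for this illustration:

```python
# Python's float is an IEEE 754 double, the same format as JavaScript's Number.
MAX_SAFE_INTEGER = 2**53 - 1  # 9007199254740991


def survives_double(n: int) -> bool:
    """Return True if n is unchanged after a round trip through a double."""
    return int(float(n)) == n


print(survives_double(MAX_SAFE_INTEGER))  # True
print(survives_double(2**53 + 1))         # False: rounds to 2**53
print(survives_double(2**63 - 1))         # False: rounds to 2**63
```

Anything above 2^53 is only safe in such a parser if it happens to be exactly representable, which is why the full 64-bit signed range cannot be promised there.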

We may need to do some full rewrites to allow for current exceptions to expected precision. For instance: If an integer value cannot be parsed due to a lack of precision in the parser, then the parser must throw an error. Something like that. Such a requirement gets around how many arbitrary bits are required by forcing parsers to bail if they can't handle a large number.

eksortso on 7 May 2020

👍1

JavaScript with its built-in BigInt type can handle integers of arbitrary sizes – but that type still seems to be missing in a few important browsers such as Safari. Hmm, I'll accept that as a reasonable justification to use "should" in this case.

What about the fractional seconds thing, is "must" acceptable in that case at least? Don't tell me there are computer systems which cannot represent three fractional digits!

ChristianSi on 8 May 2020

👍1

How to proceed on this? @pradyunsg and others: what do you think?

ChristianSi on 14 May 2020

This sounds like a great thing to fix.

I don't want to use the phrase "TOML implementations", and slightly restructuring these sentences would allow us to avoid it, e.g.:

64-bit signed integers (from −2^63 to 2^63−1) should be handled losslessly.

Millisecond precision is required. Further precision of fractional seconds is implementation-specific.
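A minimal sketch of what "at least millisecond precision" can look like in practice, assuming Python's `datetime.time`, which stores microseconds and so exceeds the proposed minimum. The helper name `parse_local_time` is hypothetical, and truncating digits beyond the sixth (rather than rejecting them) is a choice made for this sketch:

```python
from datetime import time


def parse_local_time(text: str) -> time:
    """Parse HH:MM:SS[.fraction], keeping up to six fractional digits."""
    clock, _, fraction = text.partition(".")
    hour, minute, second = (int(part) for part in clock.split(":"))
    # Pad short fractions with zeros and cut long ones off at microseconds.
    micro = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return time(hour, minute, second, micro)


print(parse_local_time("07:32:00.999"))      # milliseconds are preserved
print(parse_local_time("07:32:00.1234567"))  # the seventh digit is dropped
```

An implementation built on a coarser clock type would need at least three fractional digits to meet the "required" wording.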

pradyunsg on 14 May 2020

In terms of presenting an error, I'm OK to add something like:

If an integer value cannot be parsed due to a lack of precision, then an error must be thrown to avoid nasty surprises later.

pradyunsg on 14 May 2020

@pradyunsg, I wouldn't use suggestive language like "nasty" in the spec, though. My suggestion:

If the result of parsing an integer with unlimited precision would result in a valid integer that cannot be represented losslessly, an error must be thrown.

This also separates parsing from abilities of representation: (1) an implementer should detect whether a given value is a valid integer, regardless of its size, and (2) throw an error (i.e., overflow) when it cannot be represented.

The way I do this (for a different language): try to quick-parse (assume valid range and input of token); if it fails, regex over the raw value to give a meaningful error to the user: either an overflow message (too big, but valid) or a parsing error (not consisting of valid digits). That, of course, is an implementation detail, but it ensures speed for proper input, and only falls back to more complex parsing logic when an error is about to be thrown anyway.
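The two-stage idea can be sketched in Python. This is not the commenter's implementation (which is in a different language); `fast_path` and `parse_integer` are hypothetical names, the pattern covers decimal integers only, and here the slow path also accepts valid signed or underscore-separated input instead of only classifying errors:

```python
import re

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
# Simplified decimal pattern; hex, octal, and binary forms are left out.
DECIMAL_INT = re.compile(r"[+-]?(0|[1-9](_?[0-9])*)")


def fast_path(token: str) -> int:
    """Stand-in for a cheap native parse: plain digits that surely fit."""
    plain = token.isascii() and token.isdigit()
    leading_zero = len(token) > 1 and token[0] == "0"
    # Any 18-digit decimal number is below 2**63, so no range check is needed.
    if not plain or leading_zero or len(token) > 18:
        raise ValueError("needs the slow path")
    return int(token)


def parse_integer(token: str) -> int:
    try:
        return fast_path(token)
    except ValueError:
        pass
    # Slow path: decide between "not an integer" and "valid but too big".
    if DECIMAL_INT.fullmatch(token) is None:
        raise ValueError(f"invalid integer: {token!r}")
    value = int(token.replace("_", ""))
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError(f"integer out of 64-bit range: {token}")
    return value
```

With this split, `parse_integer("9223372036854775808")` raises `OverflowError`, while `parse_integer("12ab")` raises a plain `ValueError`, so the user can tell the two failure modes apart.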

Also, we don't seem to say anything about ranges above the Int64 range. The wording suggests that these _MAY_ be represented with loss of precision, but then we also say that if we cannot do that, we should raise an error. If that is intentional (i.e., that no error needs to be thrown outside the range of Int64), I suggest a 3rd option (arguably not necessarily better):

If the result of parsing an integer with unlimited precision would result in a valid integer inside the range of a 64-bit integer that cannot be represented losslessly, an error must be thrown; if it can be represented with loss, and is outside the 64-bit range, an error may be thrown.

abelbraaksma on 15 May 2020

👍1

That sounds good, I'll prepare a PR using the wording as suggested by @pradyunsg and @abelbraaksma.

ChristianSi on 15 May 2020

👍2

If an integer value cannot be parsed due to a lack of precision, then an error must be thrown to avoid nasty surprises later.

@pradyunsg I like the phrase "to avoid nasty surprises," and would love to see that on the website as a caveat to those wondering if they can use a really large number.

That said, should we include a requirement in the standard that would make parsers' documentation state explicitly the ranges that they allow?

eksortso on 16 May 2020

should we include a requirement in the standard that would make parsers' documentation state explicitly the ranges that they allow?

I would welcome this. At W3.org we typically use the term "X is implementation defined" for stuff that must be defined by implementations.

Perhaps, though, in a next version?

Typically, you would do it in two ways:

1. At the entry for the feature (say: integers, decimals)
2. In a summary of all these, in a specific section titled "Implementation defined features"

The latter will also list the obvious, which doesn't warrant being added to the general text. Something like: "implementations may impose maximum sizes for keys, values, or sources. Implementations should (must?) raise errors when these limits are exceeded."

Furthermore, you may want to add a section on "implementation dependent" behavior. This should include stuff like:

- This specification doesn't specify how source files are referenced, read or opened. They don't even have to be physical files; instead they could be a stream of bytes, a memory section, or a network resource. It's _implementation dependent_ how sources are read and parsed. It's _implementation defined_ what kind of sources are supported by an implementation.
- It's _implementation dependent_ how data types are represented by an implementation. They may, for instance, map integers to native 64 bit longs, a big integer data type, or a floating point data type, provided that the minimal requirements for integers are met.
- Strings are specified as UTF-8 in this specification. This ensures equal treatment by different implementations and that the full range of Unicode characters must be supported. This doesn't prevent implementations from supporting different encodings, or from using a different encoding internally. It's _implementation dependent_ how characters and strings are represented, and it's _implementation defined_ what input file encodings an implementation supports.

These are just examples. The idea is to make clear what the spec does and doesn't cover, and to get that clutter out of the way by putting it in its own section. It's important for implementers because it takes certain ambiguities away. And the requirement to list implementation defined things helps, at a later stage, to compare and list implementations.

abelbraaksma on 17 May 2020

Considering @abelbraaksma's comment, my impression is that such a documentation requirement would quickly become too complex. Also, so far we have no documentation requirements whatsoever, so I seriously doubt whether it's a good idea to start them now. Also, as per #739 we will soon require implementations to throw an error if they cannot represent specific integer values losslessly. I think that's better than formulating a documentation requirement, since let's be frank: very few people read the docs anyway.

ChristianSi on 17 May 2020

For documentation I wasn't expecting implementers to write anything more elaborate than good error messages. Something like, "Error on monster-group-order: integer exceeds maximum of 9007199254740991" would suffice.
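A message of that shape is cheap to produce. A minimal sketch, assuming a float-backed parser whose largest safe integer is 2^53 − 1; `check_integer` is a hypothetical helper, and only the upper bound is checked here (a real parser would treat the lower bound the same way):

```python
MAX_SAFE_INTEGER = 2**53 - 1  # 9007199254740991


def check_integer(key: str, value: int) -> int:
    """Return value unchanged, or fail with a message that names the limit."""
    if value > MAX_SAFE_INTEGER:
        raise ValueError(
            f"Error on {key}: integer exceeds maximum of {MAX_SAFE_INTEGER}"
        )
    return value


try:
    check_integer("monster-group-order", 10**54)  # placeholder large value
except ValueError as err:
    print(err)
```

Because the limit appears in the message itself, a user discovers the implementation's range at the moment it matters, without opening any documentation.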

More than anything, implementation parameters need to be discoverable by normal users. It's true that nobody reads the docs, though docs wouldn't be bad to have anyway. What's important is that users can find this stuff when they need it.

So a little "implementation details" template would be nice for implementations to fill out, just to fulfill those sorts of requirements. That would keep things simple, assuming the details are kept up to date. It could also help with implementation specific tests.
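One way such a template could look, purely as an illustration: every field name below is invented for this sketch and nothing like it exists in the TOML project. It records the two limits discussed in this thread and reports where an implementation falls short of them:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ImplementationDetails:
    """Hypothetical template an implementation could fill out."""
    name: str
    integer_min: int
    integer_max: int
    fractional_second_digits: int

    def shortfalls(self) -> List[str]:
        """List the limits that fall below those discussed in this issue."""
        notes = []
        if self.integer_min > -(2**63) or self.integer_max < 2**63 - 1:
            notes.append("integers: narrower than the 64-bit signed range")
        if self.fractional_second_digits < 3:
            notes.append("fractional seconds: below millisecond precision")
        return notes


float_backed = ImplementationDetails(
    name="example-float-parser",  # made-up implementation name
    integer_min=-(2**53 - 1),
    integer_max=2**53 - 1,
    fractional_second_digits=3,
)
print(float_backed.shortfalls())
```

The same record could drive implementation-specific tests, for example by generating boundary values from `integer_min` and `integer_max`.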

eksortso on 18 May 2020

👍1
