The spec uses the word "expected" several times, and, as the discussion in #732 shows, the meaning of this word is not quite clear. I propose making it clear that these expectations are actually requirements, and rewording accordingly.
Specifically, in the definition of the Integer type, change
64 bit (signed long) range expected
to
TOML implementations must be able to losslessly handle at least all
integers in the 64 bit (signed long) range
In the sections on time-related types, change
The precision of fractional seconds is implementation-specific, but at least
millisecond precision is expected.
to
The precision of fractional seconds is implementation-specific, but TOML
implementations must support at least millisecond precision.
(This change needs to be done three times, in the sections on Offset Date-Time, Local Date-Time, and Local Time).
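For illustration, here is a minimal sketch of what "at least millisecond precision" could look like for a parser whose native time type stores microseconds, as Python's `datetime` does. It assumes that digits beyond the supported precision are truncated rather than rounded. `parse_fractional_seconds` is a hypothetical helper, not part of any existing TOML library.

```python
import re

# Python's datetime stores microseconds (6 fractional digits), which
# exceeds the proposed minimum of millisecond (3-digit) precision.
SUPPORTED_DIGITS = 6

# One or more ASCII digits, nothing else.
_FRACTION = re.compile(r"[0-9]+")


def parse_fractional_seconds(digits: str) -> int:
    """Convert the digits after the decimal point (e.g. "5" in 07:32:00.5)
    into microseconds. Assumption: extra precision is truncated, not rounded."""
    if not _FRACTION.fullmatch(digits):
        raise ValueError(f"invalid fractional seconds: {digits!r}")
    # Keep at most SUPPORTED_DIGITS digits and drop the rest (truncation).
    kept = digits[:SUPPORTED_DIGITS]
    # Right-pad with zeros so that, e.g., "5" means 500000 microseconds.
    return int(kept.ljust(SUPPORTED_DIGITS, "0"))
```

With this sketch, `parse_fractional_seconds("5")` yields 500000 and `parse_fractional_seconds("999999999")` yields 999999: nanosecond input is accepted, and the precision beyond microseconds is dropped.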
The first change (the Integer wording) has been moved out of #732 because it turned out to be controversial. So let's discuss it here!
In #732, regarding the integer type's expectations, I took the contrary position that "expected" means, in terms of RFC 2119, SHOULD rather than MUST. Although 64-bit signed integer values ought to be used to remain compliant with the TOML specification, I hold that there are exceptions (in embedded systems, say, or with JavaScript's imprecise float-based numbers) where 64-bit precision cannot be guaranteed.
We may need to do some full rewrites to allow for current exceptions to the expected precision. For instance: if an integer value cannot be parsed due to a lack of precision in the parser, then the parser must throw an error. Something like that. Such a requirement sidesteps the question of how many bits are required by forcing parsers to bail out if they can't handle a large number.
JavaScript with its built-in BigInt type can handle integers of arbitrary sizes – but that type still seems to be missing in a few important browsers such as Safari. Hmm, I'll accept that as a reasonable justification to use "should" in this case.
What about the fractional seconds thing, is "must" acceptable in that case at least? Don't tell me there are computer systems which cannot represent three fractional digits!
How to proceed on this? @pradyunsg and others: what do you think?
This sounds like a great thing to fix.
I don't want to use the phrase "TOML implementations", and slightly restructuring these sentences would allow us to avoid it, e.g.:
64-bit signed integers (from −2^63 to 2^63−1) should be handled losslessly.
Millisecond precision is required. Further precision of fractional seconds is implementation-specific.
In terms of presenting an error, I'm OK to add something like:
If an integer value cannot be parsed due to a lack of precision, then an error must be thrown to avoid nasty surprises later.
@pradyunsg, I wouldn't use suggestive language like "nasty" in the spec, though. My suggestion:
If the result of parsing an integer with unlimited precision would result in a valid integer that cannot be represented losslessly, an error must be thrown.
This also separates parsing from the ability to represent a value: (1) an implementer should detect whether a given value is a valid integer, regardless of its size, and (2) throw an error (i.e., overflow) when it cannot be represented.
The way I do this (for a different language): try to quick-parse (assuming the token is valid input and within range); if that fails, run a regex over the raw value to give the user a meaningful error: either an overflow message (too big, but valid) or a parsing error (not consisting of valid digits). That, of course, is an implementation detail, but it ensures speed for proper input and only falls back to the more complex parsing logic when an error is about to be thrown anyway.
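The two-step approach described above (first decide whether the token is a valid integer at all, then report overflow separately) can be sketched in Python as follows. This is an illustration under stated assumptions, not the commenter's actual code: only TOML's decimal form is handled (hex, octal, and binary are left out), the 18-digit fast path stands in for "quick-parse", and the names `parse_integer`, `IntegerSyntaxError`, and `IntegerOverflowError` are hypothetical.

```python
import re

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# TOML decimal integer: optional sign, no leading zeros, and underscores
# only between digits. Hex/octal/binary forms are out of scope here.
_DEC_INT = re.compile(r"[+-]?(?:0|[1-9](?:_?[0-9])*)")


class IntegerSyntaxError(ValueError):
    """The token is not a valid TOML integer at all (hypothetical name)."""


class IntegerOverflowError(OverflowError):
    """The token is a valid integer, but too large to represent (hypothetical)."""


def parse_integer(token: str) -> int:
    # Fast path: at most 18 plain ASCII digits without a leading zero
    # always fit in a signed 64-bit integer, so no range check is needed.
    if (
        token.isascii()
        and token.isdigit()
        and len(token) <= 18
        and (token == "0" or token[0] != "0")
    ):
        return int(token)

    # Slow path, reached only for signed, underscored, long, or invalid input.
    # Step 1: decide whether the token is a valid integer, regardless of size.
    if not _DEC_INT.fullmatch(token):
        raise IntegerSyntaxError(f"not a valid integer: {token!r}")

    # Step 2: parse with unlimited precision (Python ints are arbitrary-size),
    # then report overflow if the value does not fit in a signed 64-bit int.
    value = int(token.replace("_", ""))
    if not INT64_MIN <= value <= INT64_MAX:
        raise IntegerOverflowError(
            f"integer {token} is outside the signed 64-bit range"
        )
    return value
```

A token such as `9223372036854775808` passes step 1 and fails step 2 (overflow), whereas `12abc` or `007` fails step 1 (syntax error), so the user gets the more specific of the two messages.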
Also, we don't seem to say anything about values above the Int64 range; the wording suggests that these _MAY_ be represented with loss of precision, but then we also say that if we cannot do that, we should raise an error. If that is intentional (i.e., no error needs to be thrown outside the range of Int64), I suggest a third option (arguably not necessarily better):
If the result of parsing an integer with unlimited precision would result in a valid integer inside the range of a 64-bit integer that cannot be represented losslessly, an error must be thrown; if it can be represented with loss, and is outside the 64-bit range, an error may be thrown.
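To make the third option concrete, here is a sketch for an implementation that stores integers as IEEE 754 doubles (the situation of JavaScript's `Number`). It assumes the token has already been parsed exactly. `to_double`, `LossyIntegerError`, and the `reject_lossy_outside_int64` flag are hypothetical names chosen for illustration; the flag models the "may" part of the wording.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


class LossyIntegerError(ValueError):
    """Raised when an integer cannot be stored without losing precision."""


def to_double(value: int, reject_lossy_outside_int64: bool = False) -> float:
    """Store an exactly parsed integer as a double, following the third option:
    - inside the 64-bit range, any loss of precision MUST raise an error;
    - outside that range, raising an error for lossy values is optional (MAY).
    """
    try:
        as_float = float(value)
    except OverflowError:
        # Beyond the largest finite double: not representable even with loss.
        raise LossyIntegerError(f"integer {value} is too large to represent") from None
    # Python compares int and float exactly, so this detects any rounding.
    lossless = as_float == value
    inside_int64 = INT64_MIN <= value <= INT64_MAX
    if not lossless and (inside_int64 or reject_lossy_outside_int64):
        raise LossyIntegerError(f"integer {value} cannot be represented losslessly")
    return as_float
```

Under this policy, 2^53 + 1 (9007199254740993) lies inside the 64-bit range but has no exact double, so it must be rejected. 2^64 + 1 lies outside that range and is, by default, silently rounded to 2^64.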
That sounds good, I'll prepare a PR using the wording as suggested by @pradyunsg and @abelbraaksma.
If an integer value cannot be parsed due to a lack of precision, then an error must be thrown to avoid nasty surprises later.
@pradyunsg I like the phrase "to avoid nasty surprises," and would love to see that on the website as a caveat to those wondering if they can use a really large number.
That said, should we include a requirement in the standard that would make parsers' documentation state explicitly the ranges that they allow?
should we include a requirement in the standard that would make parsers' documentation state explicitly the ranges that they allow?
I would welcome this. At W3.org we typically use the term "X is implementation defined" for stuff that must be defined by implementations.
Perhaps, though, in a next version?
Typically, you would do it two ways:
The latter will also list the obvious, which doesn't warrant being added to the general text. Something like: "Implementations may impose maximum sizes for keys, values, or sources. Implementations should (must?) raise errors when these limits are exceeded."
Furthermore, you may want to add a section on "implementation dependent" behavior. This should include stuff like:
These are just examples. The idea is to make clear what the spec does and doesn't cover, and to get that clutter out of the way by putting it in its own section. It's important for implementers because it removes certain ambiguities. And the requirement to list implementation-defined things helps, at a later stage, to compare and list implementations.
Considering @abelbraaksma's comment, my impression is that such a documentation requirement would quickly become too complex. Also, so far we have no documentation requirements whatsoever, so I seriously doubt whether it's a good idea to start them now. Also, as per #739, we will soon require implementations to throw an error if they cannot represent specific integer values losslessly. I think that's better than formulating a documentation requirement, since, let's be frank, very few people read the docs anyway.
my impression is that such a documentation requirement would quickly become too complex
@ChristianSi Haha, I aimed at the opposite.
But since we wouldn't define what constitutes proper documentation, such info mainly helps implementers understand what is and isn't up to them: where the implementer's freedom stops, and where the strictness of the spec requires certain things. Whether an implementation has an actual website, or even actual documentation, is always up to them. They can always satisfy "implementation defined" by adding a line on their website: see source.
The more serious implementations may provide more elaborate information, and some already do.
Likewise, we don't currently specify what errors look like or how they are reported. Implementers may ask about this, but by specifying that it's up to them, you make it clear.
I wouldn't do this for the current version 1.0, since it'll take some time to get right; I'd postpone such specified leeway to a future edition. It'll make the overall specification more solid.
Likewise, it may be something that only belongs in an actual spec text, not in the Readme.
For documentation I wasn't expecting implementers to write anything more elaborate than good error messages. Something like, "Error on monster-group-order: integer exceeds maximum of 9007199254740991" would suffice.
More than anything, implementation parameters need to be discoverable by normal users. It's true that nobody reads the docs, though having them wouldn't hurt. What's important is that users can find this stuff when they need it.
So a little "implementation details" template would be nice for implementations to fill out, just to fulfill those sorts of requirements. That would keep things simple, assuming the details are kept up to date. It could also help with implementation-specific tests.