Toml: Feature request: Add a duration/timedelta type

Created on 14 Jan 2018 · 27 Comments · Source: toml-lang/toml

I think it would be very useful to have a duration type natively in toml. It's a thing I use a lot in my web service configs, for cache TTL or timeouts. Right now I resort to using integers and making the key include the resolution (e.g. timeout_ms, ttl_hours). This has a couple of disadvantages:

  1. It requires extra code every time to convert the value to the actual language-specific duration type.
  2. If the resolution chosen was wrong you have at least one of these problems:

    1. You have to change the key, resulting in backwards incompatibility

    2. You have to add zeros, which decreases readability

    3. You have to do calculations, e.g. if you have ttl_hours and want 9 days, you have to enter 216. This makes the value (at least to me) not obvious when quickly looking at the config.
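To make the workaround concrete, here is a minimal Python sketch of the conversion code described above. The `config` dict is a hypothetical stand-in for what a TOML parser returns today; the key names are taken from the examples in this post.

```python
from datetime import timedelta

# Hypothetical parsed config: plain integers whose unit lives only in the key name.
config = {"timeout_ms": 1500, "ttl_hours": 216}

# Every key needs its own hand-written conversion to a real duration type.
timeout = timedelta(milliseconds=config["timeout_ms"])
ttl = timedelta(hours=config["ttl_hours"])

print(timeout.total_seconds())  # 1.5
print(ttl.days)                 # 9 (not obvious from reading "216")
```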

I would propose the following basic and IMHO natural syntax (inspired by go duration parsing/formatting):

day = 1d
hour = 1h
minute = 1m
second = 1s
milli = 1ms
micro1 = 1µs # U+00B5 = micro symbol
micro2 = 1μs # U+03BC = Greek letter mu
nano = 1ns

# allows floats
micro3 = 0.1ms

# allows combining
two_and_a_half_hours = 2h30m
# not advised but possible
five_seconds = 2s3s

# can be negative
minus_one_seconds = -1s

# allows underscores
hundred_thousand_hours = 100_000h

This notably doesn't include months and years because they can differ in duration and are quite easily approximated in days. I'm also fine with the following changes:

  1. Taking out/changing the µ for micro seconds. I think it's fine to use 0.1ms in most cases, so it's not strictly needed. I mainly put it in because Go duration parsing and formatting allows/uses it as well.
  2. I'm fine with adding a prefix to make differentiation with numbers easier. For instance D which would result in D2h30m.
  3. Removing the duplication possibility for 2s3s. Again I mainly put this in because the Go duration parsing allows it.
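As an illustration only, the proposed syntax could be parsed along these lines. This is a sketch in Python rather than the Go fork mentioned below; `parse_duration` and the unit table are hypothetical names, a day is assumed to be exactly 24 hours, and `us` is included as an ASCII spelling of microseconds as an assumption. Python's `timedelta` only has microsecond resolution, so nanosecond values are rounded.

```python
import re
from datetime import timedelta

# Seconds per unit for the proposed syntax (assumes a day is exactly 24 hours).
_UNIT_SECONDS = {
    "d": 86400.0, "h": 3600.0, "m": 60.0, "s": 1.0,
    "ms": 1e-3, "us": 1e-6, "\u00b5s": 1e-6, "\u03bcs": 1e-6, "ns": 1e-9,
}
# Longest unit names first so that "ms" is tried before "m".
_UNITS = "|".join(sorted(map(re.escape, _UNIT_SECONDS), key=len, reverse=True))
_TOKEN = re.compile(r"(\d+(?:_\d+)*(?:\.\d+)?)(" + _UNITS + ")")


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '2h30m', '-1s' or '100_000h' (illustrative sketch)."""
    sign, body = (-1, text[1:]) if text.startswith("-") else (1, text)
    pos, seconds = 0, 0.0
    while pos < len(body):
        match = _TOKEN.match(body, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        number, unit = match.groups()
        seconds += float(number.replace("_", "")) * _UNIT_SECONDS[unit]
        pos = match.end()
    if pos == 0:
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(seconds=sign * seconds)


print(parse_duration("2h30m"))  # 2:30:00
```

This sketch accepts repeated and unordered units (so `2s3s` parses as five seconds), matching the permissive variant of the proposal; a stricter grammar would reject them.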

I really hope this is considered for inclusion as it would be really useful to me and my colleagues. (Much more so than the already supported datetime type, which I've never had an actual use for in a config).

PS. I created a modified fork https://github.com/pelletier/go-toml that supports this: https://github.com/JelteF/go-toml (see the last couple of commits)

new-syntax

All 27 comments

A prefix is a good idea, especially if #427 gets accepted. Microseconds could be represented with us. And TOML tends to be pretty strict about formatting (to reduce the chance of confusion when reading a doc), so combining should probably require descending order of units with no duplication.

@NighttimeDriver50000, thanks for the input. The us suffix sounds like a good idea indeed, much easier to type than µs. And the reasoning for the other two points makes sense as well.

The date-time type is derived from RFC3339, which is a subset of ISO8601. It would be great to define any other time types using similar standards. ISO8601 has a duration representation and RFC5545 defines a subset.

This would make your example:

day = P1D
hour = PT1H
minute = PT1M
second = PT1S
milli = PT0.001S
micro = PT0.000001S
nano = PT0.000000001S

# allows floats
micro3 = PT0.001S

# allows combining
two_and_a_half_hours = PT2H30M
# not supported
five_seconds = PT2S3S

# can be negative
minus_one_seconds = -PT1S

# allowing underscores would be a non-standard extension
hundred_thousand_hours = PT100_000H

The benefit is the use of a recognised standard, and that the P prefix makes parsing simpler and keeps more space for other types that may one day be added.

The downsides are that sub-second units are only supported as decimals (while ISO8601 supports decimals for any time unit, RFC5545 doesn't allow them at all), and the decimals do not support underscores.
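For comparison, a rough Python sketch of parsing the day-and-time subset of ISO 8601 durations used in the examples above. `parse_iso_duration` is a hypothetical helper, not part of any TOML library; it treats `D` as exactly 24 hours (an assumption), ignores years, months and weeks, and allows a decimal on any time component.

```python
import re
from datetime import timedelta

_NUM = r"\d+(?:\.\d+)?"
_ISO = re.compile(
    r"(?P<sign>-)?P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>" + _NUM + r")H)?"
    r"(?:(?P<minutes>" + _NUM + r")M)?"
    r"(?:(?P<seconds>" + _NUM + r")S)?)?"
)


def parse_iso_duration(text: str) -> timedelta:
    """Parse e.g. 'P1D', 'PT2H30M', '-PT1S' (illustrative subset only)."""
    match = _ISO.fullmatch(text)
    # Reject a bare 'P' and a trailing 'T' with no time components.
    if match is None or text.endswith(("P", "T")):
        raise ValueError(f"invalid ISO 8601 duration: {text!r}")
    parts = {
        name: float(value)
        for name, value in match.groupdict().items()
        if name != "sign" and value is not None
    }
    duration = timedelta(**parts)
    return -duration if match.group("sign") else duration


print(parse_iso_duration("PT2H30M"))  # 2:30:00
```

With this grammar, `PT2S3S` and `PT100_000H` are both rejected, as in the listing above.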

I would agree that using the previously baked standard would be a very good
idea. Java 8 supports parsing the ISO into Durations, which is a Very Nice
Thing, since I don't have to write the parser.

I'm sure other languages & libraries would also tend to the ISO direction
to some degree.

And TBH, I also agree this would be a good data type to have. Not sure it's
minimal, but certainly useful.

(Sent as an email reply to jongiddy's comment of Sun, May 20, 2018, 12:00 AM, which was quoted in full.)

Good to see some activity on this issue again. Usually I agree that using standards is preferable. However, I don't agree that ISO8601 or RFC5545 would be a better fit for this than something similar to the Go duration parsing, for the following reasons:

  1. Having sub second resolution is really nice for defining short timeouts.
  2. Days and weeks in those standards are defined only relative to the date that you subtract them from. This means P1D is not always 24 hours, so you cannot use the built-in language duration types.
  3. The use of capital letters makes it harder to see the units at a glance.
  4. Requiring a T between the days and time adds extra clutter for no good reason (unless you allow M for months, which is not consistent in number of days)
  5. I've never seen anybody use this standard, which suggests that people don't really like it. (yes this is not scientific of course, feel free to dispute this)

Fixing some of these is of course possible, but that would result in a custom standard. So it would lose the benefit of using the standard.

PS. I'm fine with having the P prefix (or any other one to avoid confusion with the standards). So don't take that as a reason to prefer the standard.

A suggestion: a simple quantity unit may be better than trying to specify all possibilities, and/or a limited subset with arbitrary exclusions.

This notably doesn't include months and years because they can differ in duration and are quite easily specified in days.

But that is exactly why you _can't_ specify them in days. If something is due every 3 months, how do you express that? Or that something has been owned for 5 years?

And time is not the only measurement, they all have units. Whether it be disk space/RAM, distance, volume, temperature, currency, etc.

A more general quantity unit measurement type would cover every possible combination, without having to specify them all or include all the conversion factors. Actually interpreting them (and converting if necessary) isn't really the job of a configuration or data file parser, that belongs to the application interpreting the configuration or data.

I like the idea of the combining - you could express something like 8 lb 5 oz or 2 months 3 days.

But generally, this sort of thing belongs more in the domain of the consuming application, not the data file format. And all measurements could be expressed as strings and handled by the application as appropriate for the application. What is the benefit of making it more complicated?

@Falkon1313 I agree that a general quantity unit should be part of the consuming application. However, there's a big difference between durations and the other quantities you mention: there's a standard library type for durations in almost every programming language (at least the ones that also have a datetime type, which is already part of the toml spec). And like I said before, the main advantage would be to directly generate that type.

But that is exactly why you can't specify them in days. If something is due every 3 months, how do you express that? Or that something has been owned for 5 years?

Yes I messed up there. I meant to say they are quite easily approximated in days. (fixed up that comment now)

Maybe we should just add a type which is a pair (number, string), the unit of the number being represented in the string. It would be very useful for any physical quantity.

length = 1 mm
size = 720 px
duration = 1 day

But the question is how it would be loaded in the program so that we cannot add 1 apple to 2 watermelons...
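One possible answer, sketched in Python: load each value as a small (number, unit) object that refuses to add mismatched units. `Quantity` and `parse_quantity` are hypothetical names for illustration, and no unit conversion is attempted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number tagged with an opaque unit string."""
    value: float
    unit: str

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        if other.unit != self.unit:
            raise TypeError(f"cannot add {self.unit!r} to {other.unit!r}")
        return Quantity(self.value + other.value, self.unit)


def parse_quantity(text: str) -> Quantity:
    """Split '720 px' into a number and a unit string."""
    number, unit = text.split(maxsplit=1)
    return Quantity(float(number), unit)


print(parse_quantity("1 mm") + parse_quantity("2 mm"))  # Quantity(value=3.0, unit='mm')
```

Adding `parse_quantity("1 apple")` to `parse_quantity("2 watermelon")` raises a `TypeError` here; deciding what the units mean is still left to the application.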

I personally think generic quantities are best parsed and considered by language and/or domain specific libraries. Any number with a unit may well get coerced to a native type differently depending on that language's native type, and the application's specific understanding of the units of a given domain. For instance a language's Decimal type might be most appropriate when handling quantities (especially currencies).

An example of a quantity library which I think handles them well is Python's Pint, which keeps original strings around as long as possible (extremely useful for any kind of user input, including configs): https://pint.readthedocs.io/en/0.9/

But because durations are so often representable in standard libraries, and so ubiquitous when configuring services, I think it makes a lot of sense to have an intuitive, obvious format in Toml. For the record, that standard mentioned above is far from obvious to me.

I'm not sure number, string tuples will offer much benefit beyond strings, if the application has to decide how those strings transform the number anyway - parsing numbers is easy, handling units is a fiddle.

The USA standard for characters to use in lieu of μ is mc, as in 150mcg = "150 micrograms" which, while not intuitive, is in fact standard. The precise reason for the USA standard is that people might confuse the mu symbol with a u and be misled as to its meaning. Accepting the SI and USA forms transparently would be fine, but us would be undesirable.

Of course, ISO8601 favors the .000001 second style of notation as of the previous issuance... it was recently revised in 2019 and I am not sure how, exactly, yet.

Need this feature!

Days and weeks in those [ISO8601] standards are defined only relative to the date that you subtract them from. This means P1D is not always 24 hours, so you cannot use the built-in language duration types.

Strictly separating _time_ and _calendar periods_ tends to be a good thing. Using a calendar unit ("day" and everything above) in the context of time is almost always a bug. Though I don't have any numbers on how common actual use cases of calendar units for time are, work experience shows that whenever people used instant.plus(1, ChronoUnit.DAYS) instead of instant.atZone(...).plusDays(1) in Java, it always caused a timezone bug.

Given that ISO8601 makes a clear distinction between periods and time in the form of P<period>T<time>, I am strongly in favor of using it. Also, since the already built-in datetime support (RFC 3339) is ISO8601-compatible, it seems intuitive to stick with it.
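The difference between a calendar day and 24 hours can be shown with a small Python sketch. To stay independent of a timezone database, the two UTC offsets are hard-coded (an assumption for illustration): they correspond to Central European Time before and after the daylight-saving switch on 28 March 2021.

```python
from datetime import datetime, timedelta, timezone

# Hard-coded offsets (assumption): CET before the switch, CEST after it.
CET = timezone(timedelta(hours=1))
CEST = timezone(timedelta(hours=2))

noon_before = datetime(2021, 3, 27, 12, 0, tzinfo=CET)
# Same wall-clock time, one calendar day later, after the clocks moved forward.
noon_after = datetime(2021, 3, 28, 12, 0, tzinfo=CEST)

calendar_day = noon_after - noon_before
print(calendar_day)  # 23:00:00, so "one day" was not 24 hours here
```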

The more I look at ISO 8601's duration standards, the more I like them. The P prefix identifies durations immediately, and T instantly separates the times and dates. And the smallest unit can have a fractional value. They don't have the brevity that @JelteF's original proposal offered. But maybe for the sake of easier configuration writing, we can accommodate a few modest extensions?

Here are some proposals. I tried to cover all the bases touched upon so far, and I hope I didn't stretch things out too far. What do you think?

  • [ ] Allow numbers to have underscores in them, the same way we allow them in integers and floats. This would only be useful for very small fractions like PT0.000_001S, but more on this later.
  • [ ] Allow an underscore between units, and between "P"/"T" and the following unit. That would make, e.g., PT_2H_45M valid. They wouldn't be allowed between the number and its unit. We'll want those to stick together.
  • [ ] Let the letters be case-insensitive. For instance, PT30S would be the same as pt30s. The height difference between the letters and numbers would make those numbers pop out to human readers.
  • [ ] A negative proposal: let's not allow weeks. We don't use the ISO 8601 week-based calendar, and so we wouldn't allow something like _P1W_ to be legal in TOML.
  • [ ] A radical change: Drop the P if there is no date part, so that T5M for five minutes would be valid. (We're going to need that T for time durations.)
  • [ ] Most radically: Introduce 2- or 3-character unit names for sub-second durations. These would follow after S but would work the same way as other units. In order of magnitude, these would be MS for milliseconds, MCS and its variants for micro, and NS for nano. So something like PT100MS for a 100-millisecond interval would be valid.

@eksortso In my view, if we follow the ISO 8601 standard, we should stay close to it. One or two small changes may be fine, but if we're to deviate as far as you suggest, we can as well start from scratch and roll our own solution – maybe as @JelteF suggested or something close to it. Or we look for another fine standard/convention that fits our needs better without requiring as many changes as you suggest.

Fair enough. No need for additional units. Not now, anyway.

But, and I was trying to address some of @JelteF's concerns: I still recommend allowing the underscores, being case-insensitive, and prepending P if T starts a time duration. TOML would gain readability and brevity, and these somewhat intermediate forms can be converted to ISO 8601-compliant durations with trivial string munging.

Strictly separating time and calendar periods tends to be a good thing. Using a calendar unit ("day" and everything above) in the context of time is almost always a bug.

@Felk Thinking about this again, you are absolutely right. What I'm trying to say, though, is that this separation is not very useful in practice if it means you cannot convert the string into the built-in duration type of the language. This is the case for Go and Python at least (and I expect more languages).

@eksortso I agree that if we add a pass that removes all underscores and capitalizes all letters, we can still use parsing libraries for the standard.
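Such a pass could look roughly like the following Python sketch. `to_iso8601` is a hypothetical name; besides stripping underscores and upper-casing, it also prepends the `P` when the value starts with `T`, as suggested in the earlier comment.

```python
def to_iso8601(text: str) -> str:
    """Normalise a relaxed duration such as 'pt_2h_45m' to strict ISO 8601 form."""
    cleaned = text.replace("_", "").upper()
    sign = ""
    if cleaned.startswith(("+", "-")):
        sign, cleaned = cleaned[0], cleaned[1:]
    # Allow the shorthand 'T5M' for 'PT5M'.
    if cleaned.startswith("T"):
        cleaned = "P" + cleaned
    return sign + cleaned


print(to_iso8601("pt_2h_45m"))  # PT2H45M
print(to_iso8601("T5M"))        # PT5M
```

The result can then be handed to any existing ISO 8601 duration parser; this function does no validation of its own.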

Thanks, @JelteF.

The notion to allow T instead of PT for intervals with no date components wasn't done to create a distinction between date-based and time-based durations. It's just that I would confuse myself sometimes, that P5M is five months and PT5M is five minutes. That "T" makes a difference, but I guess I still need to convince folks that letting it stand without the "P" would be valuable.

So I know we're trying to get v1.0 out the door, but since there's little along those lines that I can help with, I'd like to move this along, in anticipation of v1.1.

I've sat on a PR for this for a while. Would it cause any problems (e.g. distract from v1.0 release efforts) if it were submitted for future consideration?

I needed a way to represent durations sooner rather than later. I also did not want to fork the implementations I use, as I rely on both cpptoml and python toml. So for the time being I am using an inline table like so:

delta={count=-15, unit="secs"}

I have a simple C++ utility to convert this into the std::chrono::duration types.
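For readers on the Python side, an equivalent conversion might look like this sketch. It is not the C++ utility mentioned above: `duration_from_table` is a hypothetical helper, the dict stands in for the parsed inline table, and every unit name other than "secs" is an assumption.

```python
from datetime import timedelta

# Maps unit names in the config to timedelta keyword arguments.
# Only "secs" appears in the example above; the others are assumed spellings.
_UNITS = {"msecs": "milliseconds", "secs": "seconds", "mins": "minutes", "hours": "hours"}


def duration_from_table(table: dict) -> timedelta:
    """Convert {'count': -15, 'unit': 'secs'} into a timedelta."""
    return timedelta(**{_UNITS[table["unit"]]: table["count"]})


# What a TOML parser would return for: delta={count=-15, unit="secs"}
delta = duration_from_table({"count": -15, "unit": "secs"})
print(delta.total_seconds())  # -15.0
```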

But I'd love to see first class support for this.

Copied from #717:

But I'd only allow the last expressed unit to have a fractional part, per ISO 8601.

While 8601 allows it on the last part, whichever part that is, I propose to only allow it on seconds. This is also the approach that the standards body of W3.org adopted.

It's non-trivial what it means to have fractional minutes, hours or even days, months or years. You're better off keeping it simple, and users will quickly come to understand that only the seconds part can be decimal (or double).

Btw, 8601 allows weeks, and an abbreviated format (without the letters). I wouldn't use either of those either (but I think there was already consensus on that in the main thread).

even though ISO 8601 doesn't appear to allow them

They don't disallow them, which in standards parlance usually means that they allow them. My suggestion would be, again, to keep it simple: either the whole duration is negative, or the whole duration is positive. Subtracting parts is complicated, and without timezone information not even reliably possible. What's more, you'll get a different number of days depending on the time of year (daylight saving time) if you allow subtractions of parts.

Ending up with a duration that's either years and months, or days and time means you have ordered types. These types are exact. Once you mix these, they mean something else depending on time of year.

That's OK, and ultimately up to implementers, but doing all that for positive or negative durations is already quite some work. If independent parts can be positive and negative it's that much harder. And likewise, that much harder to explain to end users and in spec prose.

Note that my point of limiting scope of individual members is not about date, time, duration calculations in toml, but that it can be reasonably expected to be the main use case where these types will be applied.

(though I can sympathize with an opposing argument that we should be inclusive and allow each duration segment to be negative, many existing implementations of such types don't support such flexibility, but also, those that do either chose to support that the whole duration can be negative, or support that individual segments can be negative, but not both)

Copied from #717:

But I'd only allow the last expressed unit to have a fractional part, per ISO 8601.

While 8601 allows it on the last part, whichever part that is, I propose to only allow it on seconds. This is also the approach that the standards body of W3.org adopted.

The significance of W3.org standards only carries so far. Web technologies operate in second-based time intervals anyway. But not everything does. So I have no problems with using ISO 8601's approach to fractional units, which seems reasonable enough to me.

It's non-trivial what it means to have fractional minutes, hours or even days, months or years. You're better off keeping it simple, and users will quickly come to understand that only the seconds part can be decimal (or double).

I can understand the value of simplicity, but I also want to create a standard that's eminently usable. I wouldn't exclude half-hours for general use when over 8750 hours each year would interpret 0.5 hours the exact same way.

Allowing such niceties creates challenges to devise simple, precise definitions. This is partly done; I already have ABNF code that takes fractions into account. And once I submit a PR (with language that's not on the computer I'm currently typing on), you can assess that for yourself.

My suggestion would be, again, to keep it simple: either the whole duration is negative, or the whole duration is positive.

I agree with you here. Plus or minus the whole duration. That way, we can safely look past the fine points of duration arithmetic.
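As a small Python illustration of the two points above: a half hour has an unambiguous meaning as elapsed time, and the "fraction only on the last expressed unit" rule is easy to check. `fraction_only_on_last` is a hypothetical helper and is not the ABNF from the pending PR; it checks only that one rule and does not validate the rest of the syntax.

```python
import re
from datetime import timedelta

_COMPONENT = re.compile(r"(\d+(?:\.\d+)?)([DHMS])")


def fraction_only_on_last(duration: str) -> bool:
    """True if no component except the last one has a fractional part."""
    numbers = [number for number, _unit in _COMPONENT.findall(duration)]
    return all("." not in number for number in numbers[:-1])


print(timedelta(hours=0.5) == timedelta(minutes=30))  # True
print(fraction_only_on_last("PT2H30.5M"))             # True
print(fraction_only_on_last("PT1.5H30M"))             # False
```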

The significance of W3.org standards only carries so far. Web technologies operate in second-based time intervals anyway.

Perhaps true for http (but that doesn't support durations, iirc), my work was in the xml, xsd, XPath, xslt area, and those transcend the area of just "web technologies".

(and also, the W3 mention was merely as an illustrating example how "some other standards body" did it, I'm fully aware that their approach has been, and often still is, with its own flaws)

But I understand your points. Besides, most discussion in the W3 groups was wrt date, time, tz, era, calendar and duration arithmetic, which can fill a bookshelf by itself ;). It's dauntingly complex...

I now realize that data _manipulation_ is not something toml concerns itself with. I understand you need to be able to support applications that would want to express fractional time units, while other applications might want to prohibit that. Which is kind of in the same league of an application expecting a numeric value that is in the range 1-10, while toml will allow any 64 bit integer.

In other words, @eksortso, I see now why you'd generally prefer a broader definition over a more limiting one, allowing a wider range of potential scenarios.

Btw, would we want to differentiate between time span and duration? The first is defined by a start- and end-datetime, the second by a period without reference to, or bearing on, a given datetime. They are semantically equivalent, but serve different scenarios, and are expressed and interpreted differently. (apologies if this has already been decided).

@abelbraaksma Since time spans really haven't been discussed yet, we can keep them separate. I've not delved into them, though I'm aware that ISO 8601 does have a standard. Not sure how much it's used. But I could say the same about their durations, and note how they differ from existing data type declarations.

Not sure how much it's used.

Not much, as far as I can tell, though I can see its potential. I think the complexity (a duration bound to a date requires date time calculations) is what explains the lack of it. It seems that most systems that need to support date, time and duration/timespan choose one over the other with varying degrees of support for calculations. Some systems, notably .NET, use a time interval, with no option to express years or months (more restrictive than an ISO 8601 duration, but parsers targeting .NET can use NodaTime, which recently improved duration support).

It's probably best to stick to one option first anyway. I'm curious how easy or hard it will prove to be for implementers, esp since each framework uses a different definition.
