Toml: range type

Created on 11 Dec 2019 · 58 Comments · Source: toml-lang/toml

I'm new to TOML and really liking it. The one thing I'd really find helpful is a range type, which implementations could interpret either as a range object (e.g., Python) or as an explicit array, depending on the language. I anticipate "just use a 3-array" or "just provide start, stop, and step attributes" as responses, but if you search you'll find that YAML and JSON users also request ranges from time to time. So I think there is a desirable feature here. I'm not going to suggest syntax but array syntax without commas [0 10 1] or doubled periods 0..10..1 or even Mathematica span style 0;;10;;1 pop into mind.

new-syntax

Most helpful comment

I don't think we need this -- the provided functionality is not compelling enough, to justify the complexity this brings in the syntax + mental model. "YAML has it" is very much not a good reason to add syntax to TOML.

Can someone please point out a real world use case where this is a problem? The premise of this issue seems very hypothetical.

All 58 comments

Can you give an example where putting a hypothetical range value into a TOML document would make more sense than just defining parameters of a range with the value types already defined in the spec?

The proper point of reference is the question, why do so many languages (from Python to Ruby to Mathematica to Matlab) provide simple syntax for range construction? The answer is that it is convenient and expressive.

As an example of usefulness, consider a simulation model where a TOML file is used to represent a collection of simulation experiments. Each experiment is a table, and often a key-value pair in the table will specify a parameter and a range of values. This will be far easier to read in the TOML file if there is a simple syntax for ranges. Additionally, it provides direct guidance (e.g., to a Python parser) to construct the range rather than to construct some object that merely represents the parameters of a range.

I think ranges can also be great as a shortcut for typical common arrays, they save typing, and add clarity to the intention, and remove typos for cases where you create the range by hand.

In scenarios where TOML is used for configuring unit tests, or performance tests, I certainly see the benefit.

Also, they might open the door for infinite sequences, if we were to consider syntax that allows unbounded ranges. However, this would pose a potentially heavy burden on implementers, as such a thing is only possible with lazy evaluation of said range.
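To make the lazy-evaluation point concrete, here is a minimal Python sketch of how an implementation might expose an unbounded range without materializing it. The function name unbounded_range is hypothetical and nothing here is part of TOML or of any TOML parser.

```python
from itertools import count, islice


def unbounded_range(first, step=1):
    """Lazily yield first, first + step, first + 2*step, ... with no upper bound."""
    return count(first, step)


# Only the values that are actually requested are ever computed.
first_five = list(islice(unbounded_range(0, 5), 5))
print(first_five)  # [0, 5, 10, 15, 20]
```

A parser written in a language without generators or iterators would need some other deferred representation, which is the burden on implementers mentioned above.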

Why not use an inline table? E.g. for the simulation model sample:

parameters = [
  { name="alpha", first=2, last=10, step=2 },
  { name="beta", first=1, last=100 },  # default step: 1
  { name="gamma", first=50, last=-50, step=-1 }
]

Unbounded ranges are not a problem either:

range = { first=15, step=3 }

Or, if desired, you might specify the number of repetitions (different values) instead of an upper/last value:

range = { first=4, repetitions=16, step=4 }  # run tests for 4, 8, ..., 60, 64

This use case is too specialized and rare to deserve new syntax (remember what the "M" stands for?), but TOML can easily accomplish it already.
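For reference, the application-side step that this inline-table approach implies could look roughly like the Python sketch below. The helper name expand_range and the conventions it adopts (an inclusive last, a default step of 1, repetitions as an alternative to last) are assumptions made for illustration; TOML itself defines none of them.

```python
def expand_range(spec):
    """Expand a parsed range table (a dict) into a list of integers.

    Assumed conventions, not defined by TOML: 'last' is inclusive,
    'step' defaults to 1, and 'repetitions' may be given instead of 'last'.
    """
    first = spec["first"]
    step = spec.get("step", 1)
    if step == 0:
        raise ValueError("step must be non-zero")
    if "repetitions" in spec:
        return [first + i * step for i in range(spec["repetitions"])]
    if "last" not in spec:
        raise ValueError("unbounded range: give 'last' or 'repetitions'")
    # range() excludes its stop value, so move one unit past 'last'
    # in the direction of travel to make 'last' inclusive.
    stop = spec["last"] + (1 if step > 0 else -1)
    return list(range(first, stop, step))


# The dicts below are what a TOML parser would return for the inline tables above.
print(expand_range({"first": 2, "last": 10, "step": 2}))  # [2, 4, 6, 8, 10]
print(expand_range({"first": 4, "repetitions": 16, "step": 4})[-1])  # 64
```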

Hi Christian. Your proposed solution was anticipated in my original post (above). It is not parsed to produce a range of values, which is desirable. Instead, it is parsed to produce an object that can be converted to a range of values. A key feature of the TOML spec is its insistence on useful type inference (despite the "M"). And please remember the "O".

The need for ranges is neither specialized nor rare, even if you do not need them often. That is why they have been requested in other settings (e.g., YAML, JSON), and that is why they are implemented in MANY programming languages. (I listed some examples above.)

Basically every for.. in... loop, many while loops, for i=x to y loops etc are inherently ranges. So I wouldn't call it 'rare'. In addition, languages like Java, C#, F#, PHP, Perl, Python, and even XPath all have specific syntax for ranges for arrays, linked lists and/or sequences, splicing and steps in ranges. Just to say that these things wouldn't be so abundant if it was 'rare'. ;).

I don't think the point is that it is currently impossible. The point is to have a simple, clear, unambiguous way of expressing ranges that is portable. Ad hoc syntax never is. I personally prefer the .. syntax, as it is clear to the casual reader, even without a programmer's background.

I don't think we need this -- the provided functionality is not compelling enough, to justify the complexity this brings in the syntax + mental model. "YAML has it" is very much not a good reason to add syntax to TOML.

Can someone please point out a real world use case where this is a problem? The premise of this issue seems very hypothetical.

  1. I think that a burden falls on those who say things like "the provided functionality is not compelling enough" not to rely on personal habits but to consider why so many languages have found it compelling to provide a special syntax for ranges. (Related: see Abel's comments.)
  2. I've looked around a bit and think that Haskell's notation is simple and obvious (i.e., easy to understand). As Abel emphasizes, obviousness (the "O" in TOML) is a compelling consideration here. In Haskell notation, the user provides the first two terms of the sequence and an upper limit. So a sequence from 1 to 9 by 2s becomes [1,3..9]. I think this looks good for TOML because it resembles array syntax but nevertheless parses without ambiguity.
  3. As for real-world use cases, these arise whenever value ranges are needed. Abel mentioned unit testing. My example of simulation modeling is not at all hypothetical: TOML is now in use for the specification of simulation models. And again, range notation is much more obvious to a human reader than an actual list of sequence terms or the kinds of indirect workarounds described by Christian.
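As a rough illustration of how the Haskell-style notation in point 2 could be expanded, here is a short Python sketch. It handles integers only, treats the limit as inclusive when the step lands on it, and assumes a step of 1 when the second term is omitted. The function name and the accepted grammar are assumptions, not a proposal for the spec.

```python
import re

# Matches "[first..limit]" or "[first,next..limit]" with integer terms.
_HASKELL_RANGE = re.compile(
    r"\[\s*(-?\d+)\s*(?:,\s*(-?\d+)\s*)?\.\.\s*(-?\d+)\s*\]"
)


def expand_haskell_range(text):
    """Expand '[first,next..limit]' into an explicit list of integers."""
    m = _HASKELL_RANGE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a range literal: {text!r}")
    first = int(m.group(1))
    limit = int(m.group(3))
    # The step is implied by the first two terms; it defaults to 1.
    step = 1 if m.group(2) is None else int(m.group(2)) - first
    if step == 0:
        raise ValueError("first and next terms must differ")
    stop = limit + (1 if step > 0 else -1)  # make the limit inclusive
    return list(range(first, stop, step))


print(expand_haskell_range("[1,3..9]"))   # [1, 3, 5, 7, 9]
print(expand_haskell_range("[10,8..1]"))  # [10, 8, 6, 4, 2]
```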

many languages have found it compelling to provide a special syntax for ranges

Programming, transformation and query languages are mostly irrelevant to TOML's primary objective: to be a minimal configuration file format.

range notation is much more obvious to a human reader than an actual list of sequence terms or the kinds of indirect workarounds described by Christian

range = { first=4, repetitions=16, step=4 } <-- This one is instantly understandable because it is explicit (kinda self-documented).

[1,3..9] <-- This one is cryptic because the average human is not used to this exact notation.

range = { first=4, repetitions=16, step=4 }

Yes, it's directly understandable for a reader of the configuration. Much less obvious how to type it, or what values are valid:

  • can I leave something out?
  • is it case sensitive? Note that 'mere mortals' often assume case insensitivity, without knowing it exists
  • are decimals allowed?
  • what happens with negative steps, or negative other values?
  • is the range inclusive?
  • do you start at the beginning, or is the first step added?
  • what happens if the step doesn't end exactly on the range end, is the last step included, or not?
  • what with these commas and curlies, is that necessary?
  • can I use this syntax on that other app, does it understand it, or do I need to learn new syntax?

And herewith lies the problem: each and every application that supports TOML and needs a range, has to fully specify how it deals with all of these situations.

Just like with other features that are not necessarily used by everyone (nested arrays, I can't get the support engineers to understand them, but that's also true for the json-like syntax: TOML is certainly not for the average user), it is better to specify once and be clear about it, than let each and every configuration define it for themselves.

Even if only 10% is going to use it, or even if it's only useful in a subset of situations, this is true for most features of TOML; rarely will you see config files that use everything. Imo, that shouldn't be the leading argument.

Likewise, I can understand the hesitancy, in that you don't just want to extend the syntax on everyone's whim. Personally, I don't think this is a whim: ranges have had widespread usage in both present and past languages and configuration files. Let's do it right, and help users and designers with a clear addition to the syntax, ready if they need it, ignorable if they don't.

PS: for implementors, I think this is a very trivial thing to add.

Programming, transformation and query languages are mostly irrelevant to TOML's primary objective: to be a minimal configuration file format.

This observation is orthogonal to the point. The point is simplicity and expressiveness.

range = { first=4, repetitions=16, step=4 } <-- This one is instantly understandable because it is explicit (kinda self-documented).

[1,3..9] <-- This one is cryptic because the average human is not used to this exact notation.

This claim is incorrect. Only a programmer would say such a thing, and even then only a programmer who assumes additional context (i.e., this conversation). Arithmetic sequences using dots are introduced in grade school. The notation is not exactly the same, but it is close. This comment also misses a key point: the range syntax should be parsed to produce a range object or an explicit array. That is not what happens with the alternative.

I won't say more because Abel has said it much better than I could.

Arithmetic sequences using dots are introduced in grade school.

Numeric sequences are introduced in school, the notation is like (a1, a2, ..., aN, ...), and the semantics does not by any means imply arithmetic progression. For instance, (1, 3, 9) can describe first terms of geometric progression, or just some arbitrary sequence. Semantics of well-known school notation is pretty far from what you suggest.

This comment also misses a key point: the range syntax should be parsed to produce a range object or an explicit array. That is not what happens with the alternative.

This is not a point at all. Configuration files should be handy for those who read and write them by hand. Shiny parser API cannot be an excuse for increase of amount of syntax features that user must learn.

Like @lmna said earlier: programming languages have tons of stuff which TOML neither has nor needs, since it's not a programming language. More relevant to the issue at hand would be whether other commonly used data serialization or configuration file formats have a built-in syntax for range types. As far as I can tell, that's not the case. Not even YAML (whose M could well mean "Maximal") seems to support it.

I don't doubt that this feature has been "requested" from time to time, but the fact that these requests have apparently all been rejected should tell us something.

As for obviousness: In Ruby, 1..10 creates an inclusive range (from 1 to 10), while 1...10 creates an exclusive range (actually from 1 to 9). That's obvious? Really?

cannot be an excuse for increase of amount of syntax features that user must learn.

I agree, so instead of requiring users to learn the individual specifications of each and every usage of TOML, let's give both readers and writers something they can work with and that's easy to understand and easy to write. Learn once, apply everywhere.

That's obvious? Really?

Not at all, it's good to learn from other's mistakes, and precisely the reason why we should keep it simple and explicit. One syntax, with an obvious meaning.

Configuration files should be handy for those who read and write them by hand.

Yes. That is precisely the point.

I am very confident that if the syntax is [first,next..max], nobody will ever complain that it is hard to read or write. I am also very confident that not a single person will ever complain about writing or reading [1,2..20] instead of [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] -- during the typing of which I had to correct two errors and then double check that there were no others.

One syntax, with an obvious meaning.

The meaning of the [1,3..9] syntax (well, if you manage to guess that the whole construct is about arithmetic progression) is not really obvious because of the following questions:

  • can I leave something out?
  • are decimals allowed?
  • what happens with negative values?
  • is the range inclusive?
  • what happens if the step doesn't end exactly on the range end, is the last step included, or not?
  • what with these commas and brackets and dots, is that necessary?
  • does number of dots really matter?
  • is it allowed to specify more than 3 values?
  • how do i use this feature to configure a print job for pages 3, 7, 12-15, 21-24?

Is it worth it to describe it all in the TOML spec? Will users be truly happy and enthusiastic about reading and remembering all that stuff? Will it be obvious for those who don't bother to even read the spec?

Will it be obvious for those who don't bother to even read the spec?

Certainly more obvious than local time, arrays of tables, or dot notation for supertable generation.

Learn once, apply everywhere.

I do disagree with "learn" part.

In the ideal world, you should learn a lot about a program that you are writing configuration for, but the syntax of configuration file should require no learning at all. I see this as an ultimate goal for evolution of TOML.

In the real world, TOML has some obscure syntax features (arrays of tables, first of all). Despite that, we should do our best to not screw things even further.

In the real world, TOML has some obscure syntax features (arrays of tables, first of all). Despite that, we should do our best to not screw things even further.

TOML is not yet at 1.0. Will you propose to remove arrays of tables before the 1.0 release? Why or why not? How about local time notation? Keep or discard? And why?

how do i use this feature to configure a print job for pages 3, 7, 12-15, 21-24

This is a great question. Here is a possible notation for that: [3, 7, 12-15, 21-24]. What if you want only every other page in the first range? Then [3, 7, 12-15 by 2, 21-24]. What about the example you are discussing? It becomes [1-9 by 2]. I would have no problem with such proposals.
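To show that this print-dialog style notation is mechanically simple, here is a minimal Python sketch of an expander for it. It assumes non-negative integers only (the hyphen would otherwise clash with the minus sign), inclusive spans, and an optional "by N" or "by +N" step. The function name and the exact grammar are assumptions made for illustration.

```python
import re

# One list item: "7", "12-15", "12-15 by 2" or "12-15 by +2".
_ITEM = re.compile(r"(\d+)(?:\s*-\s*(\d+)(?:\s+by\s+\+?(\d+))?)?")


def expand_page_list(text):
    """Expand '[3, 7, 12-15 by 2, 21-24]' into an explicit list of integers."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a bracketed list")
    pages = []
    for item in text[1:-1].split(","):
        m = _ITEM.fullmatch(item.strip())
        if m is None:
            raise ValueError(f"cannot parse item: {item!r}")
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        step = int(m.group(3)) if m.group(3) else 1
        if step == 0 or end < start:
            raise ValueError(f"invalid span: {item!r}")
        pages.extend(range(start, end + 1, step))  # end is inclusive
    return pages


print(expand_page_list("[3, 7, 12-15 by 2, 21-24]"))
# [3, 7, 12, 14, 21, 22, 23, 24]
```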

TOML is not yet at 1.0. Will you propose to remove

Official goal for version 1.0.0 is to be backwards compatible (as much as humanly possible) with version 0.5.0. So removal of existing syntax is not an option any more.

Will you propose to remove arrays of tables before the 1.0 release?

This could be done for 2.0, if someone comes up with an excellent alternative to current arrays-of-tables.

How about local time notation?

The whole date&time thing, not only the "local" aspect, was a very controversial feature. I believe that first-class date&time is not worth its complexity.

if someone comes up with an excellent alternative to current arrays-of-tables

If I may paraphrase, in the absence of such an alternative, in your judgment the cost in readability is repaid by the ease of use. Yes, this is always the correct criterion. (Just fyi, I am pleased to have date-time functionality, although I wish times required a clarifying T prefix.)

If I may paraphrase, in the absence of such an alternative, in your judgment the cost in readability is repaid by the ease of use.

Complexity of arrays-of-tables is justified by expressive power. An alternative should reduce the complexity (make things more obvious & trivial), but not at cost of readability and expressiveness.

An important thing to note is that first-class date&time and first-class ranges do not add anything to readability and expressiveness. You can encode them as TOML strings and then interpret those strings at application level (just like you interpret any other configuration parameter). No sacrifices here.

first-class ranges do not add anything to readability and expressiveness

This claim is obviously incorrect. Prove it to yourself by typing out any long range without ever checking to see if you made an error. A good syntax for ranges adds readability, expressiveness, and ease of use. (Which is exactly why this exists in so many programming languages.)

Of course if I just want to parse everything myself, I could use an INI parser and handle the string values. A key piece of the value added by TOML is elimination of this need in config files.

Prove it to yourself by typing out any long range without ever checking to see if you made an error.

Okay, let's do it once again. range = { first=4, repetitions=16, step=4 } Hope it is long enough?

let's do it once again

  1. You can only easily interpret the meaning of this because you are in this conversation. So, it lacks clear meaning to a reader. (This is a really important point that you are skipping over repeatedly.) This is especially true when readers need not be programmers.
  2. It is not standardized. You simply made up the keys to help you know what on earth you are talking about, which even so would not be evident if you were not in this conversation.
  3. It is parsed to an object that must be converted to an array by a knowledgeable user. So it has reduced functionality.

So in fact the meaning is not obvious at all to a reader who is not in this conversation. You are simply making the point that there are available workarounds, although without any supporting standard. Yes, we all know that. That's what we're doing now. The request is for something less tiresome and more communicative.

Just a quick reminder: it is NOT the case that the party with the highest number of comments wins :wink:

What about the example you are discussing? It becomes [1-9 by 2]. I would have no problem with such proposals.

Even more explicit would be [1-9 by +2].

In the ideal world, you should learn a lot about a program that you are writing configuration for, but the syntax of configuration file should require no learning at all. I see this as an ultimate goal for evolution of TOML.

@lmna It's a great goal. But I've so far spent many hours on learning TOML and understanding the peculiarities of the syntax. I'm still not there, it's a rather complex spec with many caveats. And I have 25+ years experience in various computing and programming fields and have been a co-editor of W3C specifications. I know how to read specs (at least I like to think so ;) ), but TOML, in all its conciseness, is not so KISS anymore.

TOML is also way too complex for support-engineers at my partnering hosting company to write correctly. I just send them an updated file instead of saying: "please update field so and so in the TOML config", as they always make mistakes. But this is also true for any other config language. I think that the target audience is programmers and software engineers, even though we'd like it to be different.

That is not a critique, other config syntaxes are often harder to learn and compared to them I really like TOML and the way it tries to find a balance. But without JSON background and an understanding of arrays, tables etc, you are up for a rather steep learning curve.

Should you then stop adding new features? Stop evolving the syntax to prevent it getting more complex? I'm not sure of the right answer here, but generally I think evolution is good. To a certain degree, obviously.


I agree that writing range = { first=4, repetitions=16, step=4 } is clear, but it doesn't remove the fact that its meaning is implementation-dependent. Unless you suggest that the above syntax is to be translated into [4, 8, 12...etc] by implementers, and not into an object with three fields.

We should, however, first try to answer the question: do we want this in? If the answer is yes, we can come up with an understandable and sufficiently-easy syntax. If not, we don't need to attempt that anyway.

I like the idea, but only if the chosen syntax is obvious for non-programmers. There's plenty of good examples of range constructs in programming languages but they're obviously only succinct and clear to people who know those languages. Requiring some familiarity with how language X does feature Y in order to make a config change defies the spirit of what TOML is intended to be, methinks.

If you _really_ wanted it to be simple, obvious, and unambiguous you could introduce some keywords, e.g. my_range = from 5 to 10 inclusive. Pretty hard to misunderstand what that means but obviously complicates parsing a bit.

@marzer, I like your idea, it's clear, simple and concise. The (slight) extra burden on parsing shouldn't be too hard to tackle.

my_range = from 5 to 10 inclusive

First of all, I have no strong preference on syntax and any such approach would fully meet my needs. But I still have a few comments.

  1. Only programmers are going to be worrying about whether a range is inclusive or exclusive. People ordinarily use inclusive ranges.

  2. As Abel has pointed out, it is very easy to overstate the syntax burden of any of the proposals. If someone is writing TOML, they'll learn an easy syntax after using it once. If someone is reading a TOML file, other things will be much harder for them to guess than the meaning of say, [5,10..100] or [5,10,15,...,100] or [5-100 by +5]. They'll look them up once and be done. Or, the writer can add a comment. All of these syntaxes allow truly trivial mastery.

  3. My request is only (!) for a range, but Alexey's query about printer configuration raises the possibility of an encompassing syntax. Consider the meaning of [3, 7, 12-15 by +2, 21-24]. Will anyone argue this will not be obvious to a non-programmer? It is currently my favorite among the proposals: simple and obvious, and apparently easy to parse.

Will anyone argue this will not be obvious to a non-programmer?

Since that matches the syntax used by Microsoft for decades in their "Print" dialog box to select pages to print from a document (apart from the brackets and by, the latter I can live without), I reckon that proves the point that 'ordinary people will understand it': anyone can print a document, or a selection from it.

I reckon that proves the point that 'ordinary people will understand it'

OK, then there is at least one "obvious" syntax.

In addition, two prominent use cases have been defined: printer configuration, and simulation configuration.

I will only add, because a few participants seem not to understand this, that the need to share simulation configurations across platforms and languages is widespread. Having a language agnostic way to do this is highly desirable. Absence of a range syntax in TOML is a barrier, since it requires sharing not just the TOML file but in addition communicating a convention for representing ranges, which means that a transformation will have to be implemented by the recipient.

@alan-isaac:

I will only add, because a few participants seem not to understand this, that the need to share simulation configurations across platforms and languages is widespread.

Is this only an imaginary use case, or are you really using TOML for this purpose? If the latter, it would be useful if you could give a short sample excerpt, showing (a) how you are currently listing this data (without a built-in range syntax) and (b) how you would wish it looked if your preferred range syntax were adopted.

If the difference between the two syntaxes is indeed significant, this might considerably strengthen your case. If not, I have serious doubts that your proposal will make it into the TOML spec.

@ChristianSi Yes, I am using TOML for the configuration of simulations and for the exchange of these configurations. But even if I were not, it is obvious that a configuration language is needed for this purpose, and it should be obvious that simulations that need configuration are all over the place. This is a role that TOML could fill much more nicely than it does.

The workarounds I've tried are all ugly, so there is no real need to discuss them. I have typed longish ranges by hand or produced them at a console and pasted them in. A language-dependent workaround is to provide the range as a code string that is evaluated to get the range. (Insecure!) I've done this. If the relationship at the other end supports it, the range can be described by a table that is processed by the recipient to produce a range. So a variety of workarounds are possible, but they are all awful. Parameter configurations should be sharable without requiring post-processing to extract the actual parameters.

I think that last observation is the one you are repeatedly skipping over, although both Abel and I have emphasized it. After all, if the question becomes whether there isn't some kind of post-processing that would make the work possible, we can just go back to INI plus clever hacks.

As I said before, I really don't care which range syntax TOML adopts. However, there seems to be agreement that the syntax that resembles printer configuration is "obvious", so that may be the way to go, especially since it could indeed be used to configure print jobs. It would meet my needs.

I have trouble understanding what your objection is once a useful and obvious syntax (that would not be hard to parse) has been discovered. You seem to suspect that it will not actually find much use; is that it? If so, I strongly disagree.

@alan-isaac I don't think you're strengthening your case by refusing to even show a reasonable example. Well, your choice.

@ChristianSi I'm confused. Aside from machine generated files, there are only workarounds. What about my description of the workarounds is unclear? One cannot illustrate a range syntax in TOML when TOML does not support it. There are only workarounds, none of them universal, and that is exactly the problem. If a good workaround existed, I would not be making a feature request.

I suspect I don't understand what you are after. Perhaps you will be interested in the GUI interface on page 3 of this document, showing the ParameterSweep window in Repast Simphony. This example of course is specific to one particular popular simulation toolkit, but illustrates the kinds of simulation configurations that need to be shared in a language-agnostic, cross platform fashion. Similar interfaces are common in many simulation toolkits; I can share more such examples if you need.

If I understand, you are not contesting that an obvious syntax has been found. You are rather dubious that, if introduced, it would find much use. Is that correct? If so, I urge you to do a Google search on "parameter sweep". You will get millions and millions of hits.

He just asked you to give an example snippet of how you currently express ranges and how you would prefer to do so.

Example:

"Currently we write sim = { begin = 1, end = 500, step = 10 }, but I'd like if we could write sim = 1-500;10."

...but with the snippets pasted from your actual use cases instead of being invented by me for the sake of an example.

@marzer So, I really was not kidding in my description, sad as that may be.

A TOML file contains a collection of experiments; call the parsed result xpmts. A single experiment is a TOML table, where parameter names are the keys. Say an experiment is in the table [xpmt01]. When it is just a matter of sharing within a local project and all users are Python users, we can eval values that are strings. (Shudder. But we do it.) Thus in the [xpmt01] table the entry param01="range(0,1001,10)" is post-processed by casting when necessary: if isinstance(xpmt01['param01'], str): xpmt01['param01'] = eval(xpmt01['param01']). If no experiment parameters are strings (not always the case), we can just walk through the experiment dict, replacing each string value in this fashion.

This approach has too many drawbacks to list, but prominent among them is that the TOML file does not actually specify the configuration but rather provides enough information that an informed enough user can produce the actual configuration by post-processing. It would be much better in the [xpmt01] table to be able to write, say, param01=[0-1000 by +10]. (The actual syntax is not what is important here, but rather the ability to produce the actual configuration rather than a proxy for the configuration.) The safer and more language-agnostic approach param01={start=0, stop=1000, step=10} does not fix this. It still means the TOML file cannot simply be shared as a way to share the configuration of the experiment: conditional casting of values by the recipient of the configuration file is still required to produce the actual configuration.
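For readers who are stuck with the string workaround in the meantime, the eval call can at least be replaced by a narrow parser that only accepts integer range(...) strings. The Python sketch below is one hypothetical way to do that; the helper name parse_range_string is an assumption, and the approach remains Python-specific post-processing, so it does not address the portability complaint.

```python
import re

# Accepts "range(stop)", "range(start,stop)" or "range(start,stop,step)"
# with integer arguments only; nothing else is ever evaluated.
_RANGE_CALL = re.compile(
    r"range\(\s*(-?\d+)\s*(?:,\s*(-?\d+)\s*)?(?:,\s*(-?\d+)\s*)?\)"
)


def parse_range_string(value):
    """Turn 'range(0,1001,10)' into a range object; pass other values through."""
    if not isinstance(value, str):
        return value
    m = _RANGE_CALL.fullmatch(value.strip())
    if m is None:
        return value  # an ordinary string parameter, left untouched
    args = [int(g) for g in m.groups() if g is not None]
    return range(*args)


# Stand-in for the parsed [xpmt01] table; "label" is a made-up extra key.
xpmt01 = {"param01": "range(0,1001,10)", "label": "baseline"}
xpmt01 = {key: parse_range_string(val) for key, val in xpmt01.items()}
print(xpmt01["param01"])  # range(0, 1001, 10)
```

Unlike eval, this leaves ordinary string parameters alone, so the "no experiment parameters are strings" restriction no longer applies.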

Am I responding to the question now?

Am I responding to the question now?

Frankly? No. Just suggest a syntax that _would_ work for you, instead of pontificating and complaining about what doesn't/can't.

I _was_ trying to help you - I think a range syntax would be useful - but... ugh. Good luck, I guess.

@marzer I'm again confused; you asked for an actual example of current usage, which I provided. I also included a syntax that would work for me. It is the same one discussed multiple times above. In response to your question, I mentioned the syntax [0-1000 by +10] because that (or some variant) appeared to have some support, particularly since it is tied to printer configuration syntax. My own preference is [0,10..1000], taken straight from Haskell, which I also mentioned above, but there were some objections to that (i.e., claims it was not "obvious" enough). Nobody has claimed the printer configuration syntax is not "obvious", and nobody has claimed it would be hard to parse.

Just to be clear, the syntax you suggested (0-1000;10) would also work just fine for me. But I anticipate objections that it is not obvious enough. Also, if I understood correctly, Abel proposed [0-1000 +10]. This would also be just fine. So would Scala syntax: (0 to 1000 by 10).

Whatever the team decides is most suitable will be perfectly fine with me. I care about the functionality much more than the syntax.

@marzer So which of the syntaxes that I've just mentioned would you choose?

@alan-isaac My impression is that you're not just hoping for a range type in TOML – which would conceptually, regardless of the syntax chosen, encode a triple of the form: range(start at x, stop at y, proceed in steps of z) – but you're also expecting TOML to evaluate the range for you. So instead of, say, range(start at 1, stop at 10, proceed in steps of 3) you're hoping to get the array [1, 4, 7, 10]. Is that correct?

@ChristianSi, I've always thought that was the main aim of this thread. Otherwise, it's essentially the same as using a JSON-style object (apart from the advantage of an unambiguous syntax).

There have been questions of 'how do you do it now' and how it would change. The answers in this same thread coming down to: you can't do it now, so there's no example.

Well, here's how I do it currently.

  • I have a project with thousands of small performance tests, they're logically numbered for ease of calling
  • when I'm working on a particular area, instead of running all (which takes long), I want to run a subset
  • for that, I created illegal syntax, much like [42-120, 1200-1800].
  • thanks to the nature of TOML, I have now different sections in the config for different types of test runs, the combination TOML + ranges feels very natural when using it, and simple to write without errors
  • I pre-process this TOML file and simply expand the ranges to be [42,43,44,45...],you get the idea. This makes a valid TOML file.
  • I then process the expanded TOML as normal
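The pre-processing step in the list above could be sketched in Python roughly as follows. This is not the actual tool described, only an illustration under stated assumptions: spans are non-negative integers written a-b, they appear only inside single-line arrays on the right-hand side of an assignment, and the key name in the sample is made up.

```python
import re

# A single-line, non-nested array on the right-hand side of "key = [...]".
_ARRAY = re.compile(r"=\s*\[[^\[\]\n]*\]")
# An "a-b" span; the lookarounds skip dates (2019-12-23) and negative numbers.
_SPAN = re.compile(r"(?<![\w.-])(\d+)\s*-\s*(\d+)(?![\w.-])")


def _expand_span(match):
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        raise ValueError(f"descending span: {match.group(0)}")
    return ", ".join(str(n) for n in range(start, end + 1))


def expand_spans(toml_text):
    """Rewrite a-b spans inside arrays as explicit lists, yielding valid TOML."""
    return _ARRAY.sub(
        lambda array: _SPAN.sub(_expand_span, array.group(0)), toml_text
    )


source = "[perf-quick]\ntests = [1, 5-8, 20]\n"
print(expand_spans(source))
# [perf-quick]
# tests = [1, 5, 6, 7, 8, 20]
```

The expanded text can then be handed to any ordinary TOML parser. Note that the regular expressions do not understand TOML strings or comments, so a span written inside a quoted string within an array would also be rewritten.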

Obviously, there are other ways of achieving the same effect, but at the time, this seemed simplest. I looked at some existing parsers to amend them for this purpose, until I stumbled upon this thread.

So I waited, in case an agreement could be reached.

I guess implementations could choose to statically expand into an array, or could choose to give an enumerator, or both, depending on their interface. But that's true already for the existing syntax of arrays, though an enumerator may be more applicable in some scenarios. But that's of course an implementation detail, irrelevant for TOML itself.

@ChristianSi

tl;dr: Yes.

The answer by @abelbraaksma nicely captures the core issue. The job of the TOML spec is just to provide an unambiguous meaning to the syntax, not to determine the parser implementation details. (Although recommendations could be made, of course.) For example, for TOML tables, the popular C parser for TOML naturally produces a struct rather than a hash table. The important thing is that I can send a file to a C user or a Python user and just say "use a TOML parser to extract the configuration of this experiment".

The goal is simply to have an obvious syntax that unambiguously indicates that a range of values is produced by a TOML parser, not to constrain how a particular parser might produce that (e.g., as a list, a tuple, an array, or a range object). In fact, Abel's examples have persuaded me (against my original thought) that the type of syntax he describes would be most useful to others (even though I just (!) need ranges). The printer configuration example is what really persuaded me. To meet that need, something like one of the following syntaxes seems most obvious: the printer influenced [1,3, 10-20, 50-100 +2] or the Scala influenced [1, 3, 10 to 20, 50 to 100 by 2]. In each case a list (or other sequence datatype) would be expected to result from parsing.
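To make the printer-influenced notation concrete, here is a minimal sketch of how such a page specification could be turned into a list today if it were stored as a plain TOML string. The function name parse_pages and the exact grammar (a-b inclusive, optional +step) are assumptions read off the example above, not part of any TOML proposal.

```python
import re

# One item: "N", "N-M", or "N-M +S" (S is the step), with optional spaces.
ITEM = re.compile(r"^\s*(\d+)(?:\s*-\s*(\d+)(?:\s*\+\s*(\d+))?)?\s*$")


def parse_pages(spec):
    """Parse e.g. '1,3, 10-20, 50-100 +2' into an explicit list of ints."""
    pages = []
    for item in spec.split(","):
        m = ITEM.match(item)
        if m is None:
            raise ValueError(f"bad page item: {item!r}")
        start = int(m.group(1))
        stop = int(m.group(2)) if m.group(2) is not None else start
        step = int(m.group(3)) if m.group(3) is not None else 1
        if step < 1 or stop < start:
            raise ValueError(f"bad page item: {item!r}")
        pages.extend(range(start, stop + 1, step))  # stop is inclusive
    return pages


print(parse_pages("1,3, 10-12, 50-56 +2"))
```

The Scala-influenced spelling (10 to 20 by 2) would only need a different regular expression; the resulting sequence is the same.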

@alan-isaac:

To meet that need, something like one of the following syntaxes seems most obvious: the printer influenced [1,3, 10-20, 50-100 +2] or the Scala influenced [1, 3, 10 to 20, 50 to 100 by 2]. In each case a list (or other sequence datatype) would be expected to result from parsing.

I see, but let's be honest: that will never happen, since, as pointed out much earlier in this thread, TOML is not a programming language. A TOML parser will parse date strings into date objects and number strings into numbers, but it will never evaluate stuff like "10 days after 2019-12-23" (regardless of the syntax used). I even doubt that stuff like num = 3.2*10^20 + 17 will ever be evaluated by a TOML parser. TOML will never have readable and writeable variables, for loops, or conditionals -- and what you're asking for is essentially of the same scope. It's a programming language construct, and those are outside of TOML's feature set.

On the plus side, you might be able to solve your problem by sending the file through a template engine before parsing it as TOML.

@ChristianSi Are you offering a false dichotomy? I doubt that you can come up with any coherent way of distinguishing parsing 1979-05-27T07:32:00-08:00 to a date-time object and parsing [0-100] to a range. Please suggest how to understand the distinction as you are trying to draw it. Thanks.

@alan-isaac I'll easily parse [0-100] (or, as I would certainly prefer to avoid confusion with the subtraction operation, [0..100]) into "list containing one value: range(from=0, to=100)" for you. But that was decidedly not what you wanted in your preceding comment.

@ChristianSi As I also said in my previous comment, I personally just need ranges. If you are saying that you would be happy to have a range syntax 0..100 that, say, the Python or Ruby parser would parse to a range object, then yes please! That would be extremely helpful!

The rest of the discussion appears separate to me. In that separate discussion, I still believe you are drawing an untenable line between parsing and transforming, as illustrated most nicely by the date-time type. TOML has lots of syntax that provides convenient ways to say what values should be produced by a TOML parser. Indeed, this is a key feature of TOML over INI (where standard parsers produce only strings as keys and values). So I still invite you to try to make concrete the reasoning for rejecting say the printer-configuration syntax, with date-time parsing being the point of reference for the purposes of the discussion.

But again, I would be delighted by the addition of a simple range syntax, and the Haskell influenced double-dot notation would be great.

Perhaps because in this discussion we often refer to how things are done in programming languages, we lose sight of a key thing: this is a concise way of expressing arrays that are a natural sequence of numbers. Just like a time-span is a range of time.

Having such an expression doesn't diminish the declarative nature of TOML, in my opinion, nor does it magically turn it into a programming language, far from it. It's basically just a different way of writing the same thing, but clearer than reading or writing, say, 100 numbers.

The confusion comes perhaps from the idea that ranges are often used in loops in programming languages, but the proposal here does not intend to apply the range result. It is not a branch or loop instruction.

To summarize:

foo = [1,2,3,4,5,6,7,8,9,10]

Is exactly equal to (assuming one variant of the syntax):

foo = [1..10]

The main difference being esthetics, number of keystrokes, being prone or not to errors, clarity of intent, and readability.

I read this issue and I'd like to add my 2 cents. As a toml user, if we can say that, I really like toml because it does what it claims to do and does it very well.

Being so minimal allows it to be used as a drop-in replacement for json/ini/... without many issues. From my point of view, having a range/slice type is a bit similar to having a datetime type. The datetime type is a pain to implement because we live in a world with timezones, offsets and DST. There are so many ways to represent a date, and as many ways to do it wrong, that it can end very badly.

That said, this limitation of JSON doesn't prevent developers from using json. Special types can be handled as a substructure of a json object. The same thing can be done in toml for the range/slice type.

Saying that you want to have range being understood in any language is a nice thing but a file format contains data and how the data is read / interpreted by an application is a whole different thing.

So taking this example:

foo = [1..10]

There are at least 2 possibilities to interpret this:

  1. It's generating a list from 1 to 10
  2. It's generating an Range type that lets you iterate a value from 1 to 10

If you generate a list from 1 to 10, you open toml up to side effects, like a file with a wrong range definition that would expand to a few terabytes of RAM being used. That's not really nice... It would also make storage a bit difficult. So a range should be a dumb object that can be used to create a generator, like in python3. In order to create a list from 1 to 10 in python3 you'd have to create range(1, 11), but in python2 (while it's supposed to be unsupported as of today if I'm not mistaken), range is a function that returns a list, with xrange as the lazy variant. In javascript, it doesn't exist so it would have to be implemented, and in other languages there may be some support for range, but chances are a parser would have to define a custom type in many languages in order to support the extension. Just that means that the file format wouldn't be a file format anymore but would start to get into "programming language" territory, where it would have to define foreign types.
That's one good way of ending up with flaky support of toml in different languages: having multiple versions of toml for the same language because a design choice in one implementation didn't please someone else.
It's not like the Date type which is hardly a foreign type to any language out there.
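As a small illustration of the "dumb object" point for python3 specifically: a range stores only start, stop and step, so it stays tiny however many values it describes. The billion-element bound below is just an illustrative choice, not something taken from the thread.

```python
import sys

# A python3 range object describing one billion integers.
indices = range(1, 10**9 + 1)

print(len(indices))                 # length is computed, not counted
print(indices[41])                  # indexing works without building a list
print(123_456_789 in indices)       # membership is an arithmetic check
print(sys.getsizeof(indices) < 1000)  # the object itself is a few dozen bytes

# list(indices) would instead allocate one billion list slots.
```

Whether other languages offer a comparable lazy type is exactly the portability question raised above.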

In python3, I could handle the range issue with this format for [1,10]:

foo = [1, 11, 1]
ids = [x for x in range(*data['foo'])]

But yes, if you need to use it in another language, you'll have to know what each parameter means, for example whether 1 and 11 are inclusive or not. But as you define the format of your file, that should be part of the file format at the application level, not the file format at the transport level.

But nothing prevents you from having helper methods like:

array_to_range(array) -> range
range_to_array(range) -> array
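A minimal python3 sketch of these two helpers might look like this. It assumes the [start, stop] or [start, stop, step] convention from the example above, with an exclusive stop as in Python's own range; other applications could pick different conventions.

```python
def array_to_range(array):
    """Convert [start, stop] or [start, stop, step] into a range (stop exclusive)."""
    if len(array) not in (2, 3):
        raise ValueError("expected [start, stop] or [start, stop, step]")
    if not all(isinstance(n, int) for n in array):
        raise ValueError("range parameters must be integers")
    return range(*array)


def range_to_array(r):
    """Convert a range back into the [start, stop, step] form for serialisation."""
    return [r.start, r.stop, r.step]


foo = [1, 11, 1]                  # as it would come out of the TOML parser
ids = list(array_to_range(foo))   # the integers 1 through 10
print(ids)
print(range_to_array(array_to_range(foo)))
```

Because the array form keeps start, stop and step, writing the file back out does not lose the original definition.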

This way you can have a consistent way to parse substructures in toml which are specific to your application. Having it be part of the file format could be useful, but it's so trivial to implement that I wonder if it's worth the hassle to have it be part of the language, as it also comes with incompatibilities and sacrifices.

For example, Rust doesn't seem to support a step like python3; in ruby you can include or exclude the last element; in C#, the range takes start and count parameters, so 2,10 would yield [2,3...10,11], which means we can't have negative values for the second parameter. Java support for ranges seems to exist in many forms, but I couldn't find one meant to be used to generate a list as an iterator, so it might be implementation-specific like javascript.

Another thing is that the design choice of how to implement the range could be influenced by its use: is it thread-safe, can it be used as an async iterator? I think it should be left to users to implement it for their application.

Saying that you should be able to read a file across languages is barely an argument: if I open a file x.json in one python app, another ruby app won't comprehend what to do with the file even if json is correctly supported by ruby. So even if toml supported ranges, it wouldn't make all apps magically understand your file; you'd still have to implement your app to expect a range or to parse a range. Handling the conversion in your app puts the maintenance burden on yourself, and having it inside toml puts the burden on every implementer of the toml format. It would suck to be unable to open a file in a certain language because the implementers couldn't decide how to implement ranges in a way that pleases everyone, when you can just put an array or an object and call it a day.
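One way an application could sidestep these per-language differences is to spell the convention out in the data itself. The sketch below assumes a hypothetical inline table such as pages = { start = 2, stop = 10, inclusive = true }, already parsed into a dict; the key names and the function range_from_table are made up for illustration.

```python
def range_from_table(table):
    """Build a range from a dict with start, stop, optional step and inclusive keys."""
    start = table["start"]
    stop = table["stop"]
    step = table.get("step", 1)
    if step == 0:
        raise ValueError("step must not be zero")
    if table.get("inclusive", False):
        # Move the exclusive bound one past the stop, in the direction of travel.
        stop += 1 if step > 0 else -1
    return range(start, stop, step)


print(list(range_from_table({"start": 2, "stop": 10, "inclusive": True})))
print(list(range_from_table({"start": 2, "stop": 10})))
print(list(range_from_table({"start": 10, "stop": 2, "step": -4, "inclusive": True})))
```

A reader in another language then has no need to guess whether the stop value is included.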

@llacroix I suggest that the most basic question is simpler than you appear to believe. The question is simply whether TOML files should have a syntax to more simply describe a certain simple and common type of array, which would otherwise have to be typed out explicitly.

For this basic question, your most powerful argument is that the syntax is so flexible that a TOML file might cause a parser to generate a dismayingly large array. This will not affect parsers that choose to return a range object of some type, rather than explicitly constructing the array. Worrying about such parser decisions is like worrying about whether the current Python parser should return a list, a tuple, or an array. I think it is out of scope?

The need for this is not for sharing across instances of a single application but for sharing certain kinds of configurations across diverse applications. A good point of reference is thinking about how TOML files could be used to configure print jobs by providing an array of page numbers. The configuration should be parsed to the sequence of page numbers, not to an object that could be turned into this sequence by an app here or an app there, but not by all printer apps.

But again, the question is just one of convenient syntax. Just as you do not insist that TOML users represent number literals with hex notation, why should they not more simply represent certain common and simple array literals?

But again, the question is just one of convenient syntax. Just as you do not insist that TOML users represent number literals with hex notation, why should they not more simply represent certain common and simple array literals?

Because ranges aren't literals for arrays, they're pretty much literals for control flow. They are used to avoid loading a complete data structure in memory. You can accumulate all the values or reduce them to a single one.

In other words, range is the functional version of

for(i=start; i<stop; i+=step)

This will not affect parsers that choose to return a range object of some type, rather than explicitly constructing the array. Worrying about such parser decisions is like worrying about whether the current Python parser should return a list, a tuple, or an array. I think it is out of scope?

It's not out of scope: if the parser can return different types, it would make it difficult or even impossible to load certain files in some languages, as range != array.

Let's say you have a file that looks like this:

[job.a]
pages = [1:3]

[job.b]
pages = [1:2, 5:7]

[job.c]
pages = [1:2, 10]

In this case, job.c.pages would be a list of [range, int], which is incompatible if you load it as a List<Range>, for example. The others would be lists of ranges only. If you wanted to explode them into lists you'd have a List<List<int>>, and in the last example you'd still have a bare int in conflict, so you'd have to write this instead:

[job.c]
pages = [1:2, [10]]

To be able to do something like this:

for page_ranges in data['pages']:
   for page in page_ranges:
      do_something(...)

But let's put aside the memory limit and OOM killer issue and let's say we want to explode ranges and join the lists together, so that [1:2, 5, 7:10] explodes into [1,2,5,7,8,9,10]. Then we could in theory loop over all the elements as if it were a list... But like I said earlier, if you do that, you're not capable of serializing it back into a range.
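A sketch of both directions, assuming a hypothetical parser that hands back a mixed list of python3 range objects and ints (the names explode and compress are made up for illustration):

```python
def explode(items):
    """Flatten a mixed list of ranges and ints into a plain list of ints."""
    flat = []
    for item in items:
        if isinstance(item, range):
            flat.extend(item)
        else:
            flat.append(item)
    return flat


def compress(numbers):
    """Group consecutive ints into inclusive (first, last) runs."""
    runs = []
    for n in numbers:
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n          # extend the current run
        else:
            runs.append([n, n])      # start a new run
    return [tuple(run) for run in runs]


# [1:2, 5, 7:10] with inclusive ends, as range objects plus a bare int.
pages = explode([range(1, 3), 5, range(7, 11)])
print(pages)
print(compress(pages))
```

Runs can be recovered from the flat list, but only in a canonical form: a file that originally said [1:2, 3:4] would come back as a single run from 1 to 4, so the original spelling is lost.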

With that file:

[stars]
indices = [1:100000000000000000]

Let's imagine you do that:

# load the file and explode ranges as lists
import toml
x = toml.load("file.toml")
x['stars']['good'] = [1, 2, 3]
# toml.dump takes the data first, then the open file object
with open('file.toml', 'w') as f:
    toml.dump(x, f)

If those things are loaded as array literals, they're going to be saved as arrays of ints. Though the data didn't really change, you go from a file of a few bytes to a far larger one, just because expansion loses information: once the range is expanded, the writer can't know what range was previously stored in the file.

And that would be difficult to handle correctly: if you expand it, you're breaking the RAM (you're certainly going to run out of memory); otherwise you need to keep it as a range and handle it as a completely different type, since ranges aren't array literals, and in that case you get hit by platform limitations in implementation-specific ways.
I mean, even if we could set a limit in the parser to cap one expansion at 1000 elements, it doesn't prevent a malicious user from inputting 1000 ranges that each expand to 1000 elements. And all those checks add complexity to a parser, and you only need to forget one case to wish it had never been implemented.
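For what it's worth, a per-document budget rather than a per-range limit is one possible answer to the "1000 ranges of 1000 elements" case. The sketch below is hypothetical (expand_with_budget and RangeTooLarge are invented names) and only shows the bookkeeping such a parser would need; it does not argue that the extra complexity is worth it.

```python
class RangeTooLarge(ValueError):
    """Raised when expanding would exceed the total element budget."""


def expand_with_budget(ranges, max_total=1000):
    """Expand ranges into one list, refusing once max_total elements are used up."""
    remaining = max_total
    out = []
    for r in ranges:
        size = len(r)  # O(1) for a range; nothing is materialised yet
        if size > remaining:
            raise RangeTooLarge(f"expansion exceeds budget of {max_total} elements")
        remaining -= size
        out.extend(r)
    return out


print(expand_with_budget([range(1, 4), range(10, 12)], max_total=5))

try:
    # 1000 ranges of 1000 elements each: the second one already breaks the budget.
    expand_with_budget([range(1000)] * 1000, max_total=1000)
except RangeTooLarge as exc:
    print("rejected:", exc)
```

The size check happens before any element is produced, so an oversized document is rejected without allocating the large list.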

Also it's not very typical to manipulate ranges directly in code. For example, I don't see why in code I'd build something like this:

pages = [range(1, 10), 1, range(13, 45)]

The only way I see how it could make sense is if you received an input text as toml to be parsed to [range(1, 10), 1, range(13, 45)] but as you suggest, it would return a list of int anyway so if you had

pages = pages_from_toml()

It would always store a list of int, and your output file would always have the exploded version of an array literal. So in order for a software to output ranges you'd have to manually do something like this

pages = []
for rarr in ranges:
    pages.append(range(rarr.start, rarr.stop))

How is that easier to write than:

pages = []
for rarr in ranges:
    pages.append([rarr.start, rarr.stop])

Programming wise, it's not particularly different, a range is really just a start stop and possibly increment.

  1. Your (@llacroix) first comment completely misses the point. A range syntax will be an array literal if TOML says so. It is that simple. This is a very simple point. I am not understanding why it is repeatedly ignored. @abelbraaksma has explained this multiple times, and this observation is no more than a standard CS use of the term "literal". It is just a matter of convenient syntax.
  2. I will repeat the other simple but apparently misunderstood point. The goal is not to have a TOML file plus schema that together can be used to produce a configuration. (I am not just noting the lack of a TOML schema framework.) The goal is to have a TOML file that directly represents the configuration. That is, it is to facilitate the direct use of handwritten TOML files as configuration files, which is a common use for them. Each suggested workaround completely misses the point. Both @abelbraaksma and I have tried very hard to draw this distinction, but neither you nor @ChristianSi have offered any indication of understanding what we are trying to say. That may be the fault of our communication skills -- I for one do not have formal CS training -- but surely you can overcome our shortcomings in that area.
  3. The worry about exploding array sizes is something of a red herring. Right now, a TOML file has no restriction on array sizes, so I can already send a file with enormous arrays. Of course the difference is that right now the TOML file would have to be correspondingly large. If this difference is seen as significant, then the size of arrays specified with a range notation could be limited (e.g., to 1000 items). But seriously, if array size is a concern, then parsers should protect against that no matter what the source, so that discussion should be entirely separate.

Similar to #428 ?

I think this is the case where a good feature might be missed due to some partisan entrenchment that has bubbled up over the course of the discussion.

As someone with no horse in this race, I think that a range should only be added if it is truly a datatype.

That is, the TOML parser should not evaluate the range and provide an array of numbers. It should provide a range object, which the implementation would then know how to deal with. If this was to be the case, the discussion reduces to whether TOML should provide a clearly defined range type, much like it provides a datetime type.

Assuming that range is now a type that is not evaluated by TOML, but merely parsed, this also requires the implementation to recognize that in almost all cases the field could be EITHER a range or an array of numbers. This likely wouldn't be much of an issue, but does add a bit of complexity to the language.

I think it then also becomes important to distinguish between bounded and unbounded ranges. These would be two separate types, ideally. It is important that an application can know whether a range is bounded or unbounded, since in many cases an unbounded range might not be appropriate and could lead to non-terminating execution.
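To sketch what "two separate types" could mean on the application side, here is a hypothetical python3 pair of dataclasses. The class names, the inclusive stop, and the restriction to positive steps are all assumptions made for the example, not anything TOML defines.

```python
from dataclasses import dataclass
from itertools import count, islice


@dataclass(frozen=True)
class BoundedRange:
    start: int
    stop: int      # inclusive upper end (positive steps assumed)
    step: int = 1

    def __iter__(self):
        return iter(range(self.start, self.stop + 1, self.step))


@dataclass(frozen=True)
class UnboundedRange:
    start: int
    step: int = 1

    def __iter__(self):
        return count(self.start, self.step)  # never terminates on its own


def materialise(r):
    """Build a list, but only from a range that is known to end."""
    if isinstance(r, UnboundedRange):
        raise TypeError("refusing to materialise an unbounded range")
    return list(r)


print(materialise(BoundedRange(1, 10, 3)))
print(list(islice(UnboundedRange(5, 5), 3)))  # take only the first three values
```

With distinct types, code that would loop forever on an unbounded range can reject it up front instead of discovering the problem at run time.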

@JeppeKlitgaard Bounded ranges would be great. Personally, I have no need for unbounded ranges or non-integer ranges.
