Toml: Suggestions to enhance TOML documentation and specification

Created on 5 Feb 2020 · 14 Comments · Source: toml-lang/toml

Hello,

I have prepared a Russian translation of TOML (#700). While doing so I ran into several ambiguities. Here are my notes on what I think should be corrected.

Notes that call for a better explanation or example:

  • [x] 1. The spec defines Whitespace and Newline, but the multi-line strings section uses the term "whitespace characters", which covers both.
  • [x] 6. The date-time sections contain a lot of duplication that could, I suppose, be expressed in a better way (related PR #696)
  • [x] 7. An ambiguity: the spec says "A TOML file must be a valid UTF-8 encoded Unicode document",
    but the Whitespace character class is defined very narrowly. What if
    other Unicode whitespace characters appear before a key?

Work-in-Progress:

  • [x] 3. Some paragraphs contain more than one idea, and it would be better to split them to improve readability. For example, the definition of multi-line strings also enumerates their properties, which could be separated out and given dedicated link anchors. (WIP #705)
  • [x] 5. The Integer definition says, as I understand it, "64 bit (signed long) range expected ...", but it does not clearly state whether an implementation must use a signed long encoding. (WIP #705; there is also discussion in #688)

Closed notes:

  • [X] 2. It would be useful to use more heading levels (#, ##, ###, etc.). This makes the table of contents more structured and makes it easier to refer to an exact part when required. (Note: it would also help PDF rendering with pandoc, as an example.) (This topic was touched on in #702 and moved to #703 and #704.)
  • [X] 4. Table should be renamed to Hash Table or Dictionary. The word creates a wrong association with CSV and spreadsheets (which is what I initially expected). (With context, "Table" as a name is OK; see comment.)
  • [X] 5.1 You try to define (or recommend) hardware limits for numbers (such as 64 bit or IEEE binary64). But, as has been mentioned many times, this is really hardware specific. Maybe you should describe more clearly a template for the relationship between TOML and hardware, and provide a "recommended" filled-in template that is appropriate for most cases (i.e. x64). (Discussion in #538)
  • [X] 8. Do you have any recommendations for keeping a TOML document compatible with the INI format, or with a shell file (variables only)? It would be good practice to prepare a path for migrating from INI to TOML. (Clearly described in comment)
  • [X] 9. This example (from README.md) is really unexpected (WIP: #702, now #706):
# As I understand it, this is valid
fruit.apple.smooth = true
fruit.orange = 2
# THIS IS INVALID (why not?)
fruit.apple = 1
fruit.apple.smooth = true
  • [X] 10. I also suggest defining a .tml extension. I program an embedded device with FAT32
    and am limited to the DOS 8.3 filename format, so I have to use plain .txt (which is actually more desirable in my case). (discussed in #573)
clarification question

All 14 comments

You may want to split up all these points into separate issues. For now, I only have time to address one or two.

Regarding points 1 and 7, don't confuse TOML whitespace, which is cleanly defined, with Unicode's whitespace characters. When the README says the word "whitespace," it consistently refers to CHARACTER TABULATION (U+0009) and SPACE (U+0020). If you believe that the spec needs to acknowledge other characters that are considered whitespace in Unicode, you'll need to make a technical case for explicitly including those characters in the spec.
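To make the distinction concrete, here is a minimal Python sketch that flags characters a Unicode-aware tool would treat as whitespace but that fall outside the two TOML whitespace characters named above. The function name `non_toml_whitespace` is made up for illustration and is not part of any TOML library; the sketch expects a single line with its newline already stripped.

```python
import unicodedata

# TOML whitespace as described above: tab (U+0009) and space (U+0020) only.
TOML_WHITESPACE = {"\t", " "}

def non_toml_whitespace(line: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each character that Python's
    str.isspace() accepts as whitespace but that is not TOML whitespace.
    Hypothetical helper, for illustration only."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(line)
        if ch.isspace() and ch not in TOML_WHITESPACE
    ]

# A no-break space before the key is Unicode whitespace, not TOML whitespace.
print(non_toml_whitespace("\u00a0key = 1"))  # [(0, 'NO-BREAK SPACE')]
# Tabs and spaces are fine.
print(non_toml_whitespace("\t key = 1"))     # []
```

A check like this could run as a linter step before parsing, so that a stray no-break space is reported with its name rather than as a generic parse error.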

Regarding point 10, this was discussed on issue #573. You could use .tml if you wanted, though it seems like you'd have less confusion if you used .tom. Just my opinion, there.

Regarding point 6 in particular, I have assumed that, before v1.0 is released, somebody will apply consistent formatting across the spec, including "refactoring" the language for dates, times, and timestamps. I have a pending PR (#696) that would be affected by such an effort. But I welcome efforts like these.

I'm a little biased, though. I'd love for TOML to have a standard as tightly defined as ECMA-404 is for JSON. But because TOML was originally forged in flames of urgency, I wish for the self-righteous confidence expressed in the original spec to shine through in v1.0. Again, this is just my opinion. I'll admit, though: selling TOML to those who haven't considered using it before shouldn't be the end goal of a technical specification.

This sort of touches upon point 8 as well. There's never been an official INI spec, and TOML certainly comes closer to an official specification than any INI format has.

Hello, @eksortso. Yep, I also don't have much time to fix this right now. So let's keep it as one note so that no information gets lost. If someone can fix something, it would be better to start a linked pull request with discussion rather than a new issue.

  1. I am just afraid that other Unicode whitespace characters can confuse users.

  10. .TML stands for "Toml Markup Language", but yes, there are collisions (TOML is not a markup language; it is more of a data-structure/object language, so TOL or TDL could be considered).

Honestly, I have been thinking about building specification-editor/author skills, so this is a really good way to practice. But I am limited by time and work, so you will see a PR if it happens.

This sort of touches upon point 8 as well. There's never been an official INI spec, and TOML certainly comes closer to an official specification than any INI format has.

Point 8 is not about official versus unofficial. It is about guidelines for migration: for those who use the INI format, there should be a straightforward way to start using TOML. JUST AN IDEA: for shell users, we could create a parser, export-toml-to-shell, which parses TOML and modifies environment variables accordingly. With a shebang (#!/.??./bin/export-toml-to-shell), a user could import the configuration with the source command.
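As a rough illustration of that idea, the following Python sketch turns already-parsed TOML data into `export` lines that a shell could `source`. Everything here is an assumption for illustration: the function name `to_shell_exports`, the upper-casing of names, and the use of `_` to join nested table keys are not taken from any existing tool. It takes a plain dict so that it stays self-contained; on Python 3.11+ such a dict could come from the standard library's `tomllib.loads`.

```python
import shlex

def to_shell_exports(config: dict, prefix: str = "") -> list[str]:
    """Flatten parsed TOML data into shell `export` lines.
    Hypothetical helper: nested tables become PREFIX_KEY names.
    Arrays and date-times are simply stringified, and keys that are not
    valid shell identifiers are not handled."""
    lines = []
    for key, value in config.items():
        name = f"{prefix}{key}".upper().replace("-", "_")
        if isinstance(value, dict):
            # A sub-table: recurse, using its name as the new prefix.
            lines.extend(to_shell_exports(value, prefix=f"{name}_"))
        elif isinstance(value, bool):
            # TOML booleans are lowercase; keep that spelling for the shell.
            lines.append(f"export {name}={'true' if value else 'false'}")
        else:
            # shlex.quote makes the value safe to paste into a shell.
            lines.append(f"export {name}={shlex.quote(str(value))}")
    return lines

parsed = {"title": "my app", "server": {"port": 8080, "debug": True}}
print("\n".join(to_shell_exports(parsed)))
```

The output of such a script could be consumed with something like `eval "$(export-toml-to-shell config.toml)"`, where the command name is the one proposed above and does not exist yet.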

The point 8 is not about official/unofficial.

I just meant that one of the features of TOML that I've heard expressed by outsiders is that it does have a well-defined specification. There's no INI spec.

There was talk about appealing to current users of INI, under a long-closed issue. But in the end, folks thought that TOML would be used more for new applications (like the import-into-shell feature you described), and not so much for existing apps. Older programs were written to use their unique INI file dialects, after all.

If we're doing things right, then the benefits of a TOML configuration will be obvious to curious users. We do encourage folks to try the format and to offer feedback.

Point 2: I'm only a little familiar with basic Markdown, just enough to understand the #, ##, etc. syntax that you're advocating. Of course, we already use # for comments.

But I think you're asking for something entirely different: using TOML to define documents that can be converted into PDFs. Writing a script to make TOML compatible with pandoc could be a productive project, and interesting at least.

But I'm inclined to think that there are better formats for writing documents than TOML. You'd have to wrap every paragraph with quote marks, for instance.

Correct me if I'm wrong here.

Point 9:

# THIS IS INVALID (and let me show you why)

# This sets fruit.apple to an integer value.
fruit.apple = 1

# But this tries to make fruit.apple a table, containing "smooth = true". You
# cannot make a key's value an integer on one line and then change it into a
# table on another line.
fruit.apple.smooth = true
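The rule in the comments above can be sketched in a few lines of Python. This is only a simplified model, not how any particular TOML parser works: the helper `assign_dotted` is hypothetical, and it splits keys on `.` without handling quoted keys.

```python
def assign_dotted(root: dict, dotted_key: str, value) -> None:
    """Assign a value under a dotted key, creating tables along the way.
    Hypothetical helper modelling the rule above: a key that already holds
    a non-table value cannot later be used as a table, and a key cannot be
    defined twice."""
    *parents, last = dotted_key.split(".")
    table = root
    for part in parents:
        table = table.setdefault(part, {})
        if not isinstance(table, dict):
            raise ValueError(f"{part!r} already holds a non-table value")
    if last in table:
        raise ValueError(f"{last!r} is already defined")
    table[last] = value

# The valid example: fruit.apple is only ever used as a table.
valid = {}
assign_dotted(valid, "fruit.apple.smooth", True)
assign_dotted(valid, "fruit.orange", 2)
print(valid)

# The invalid example: fruit.apple is an integer, then treated as a table.
invalid = {}
assign_dotted(invalid, "fruit.apple", 1)
try:
    assign_dotted(invalid, "fruit.apple.smooth", True)
except ValueError as err:
    print("invalid:", err)
```

The same check also rejects the reverse order (defining `fruit.apple.smooth` first and `fruit.apple = 1` afterwards), because `apple` would then already exist as a table.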

Point 4: I can understand having clear terminology for our sets of key-value pairs. The word "table" is already fairly common for such a thing. Some programming languages, e.g. Lua, even officially call them tables. And things like lookup tables arguably have been around longer than computers. "Table" for this concept is a legitimate term, and when used with key-value pairs, makes perfect sense.

It's true that two-dimensional arrays are also called "tables" in English. But there's no confusion between the concepts in the context of key-value pairs and their organization.

Point 5 does raise some important topics. Issue #538 overlaps with this somewhat. But just like that issue, this overall topic seems to be a post-v1.0 issue.

Point 2: ...
Correct me if I'm wrong here.

You have been misled. Look at this commit and comment. I was talking about the markup of the TOML documentation itself.

@eksortso re https://github.com/toml-lang/toml/issues/701#issuecomment-584981257:

That sounds like a good explanation – maybe you could turn it into a little PR? (I would skip the "(and let me show you why)", but the rest is fine.)

The discussion in #702 has turned toward a possibly more thorough solution, but until that happens (and it probably won't be soon) such a succinct and understandable comment would be very helpful.

@eksortso re #701 (comment):
That sounds like a good explanation – maybe you could turn it into a little PR? (I would skip the "(and let me show you why)", but the rest is fine.)

Actually, #702 was initially opened for this purpose. We came to the same understanding, just with a small time shift. We only need to choose the appropriate words.

That sounds like a good explanation – maybe you could turn it into a little PR? (I would skip the "(and let me show you why)", but the rest is fine.)

@ChristianSi Sure thing. I wanted to shorten it, but it probably works best as it is. Without that parenthetical comment, sure.

Edit: Actually, let me try this shorter version instead. Do you think it would work better?

# THIS IS INVALID

# This defines the value of fruit.apple to be an integer.
fruit.apple = 1

# But then this treats fruit.apple like it's a table.
# You can't turn an integer into a table.
fruit.apple.smooth = true