As first discussed in https://github.com/akrennmair/newsbeuter/issues/158, moving urls file from plain text to a more sophisticated format like YAML will let us have per-feed settings (as requested in https://github.com/akrennmair/newsbeuter/issues/248, https://github.com/akrennmair/newsbeuter/issues/326, https://github.com/akrennmair/newsbeuter/issues/465, https://github.com/newsboat/newsboat/issues/77, and probably touched on in https://github.com/akrennmair/newsbeuter/issues/176). #77 is fairly recent, and there is also a related discussion on our mailing list, so I'm creating this tracking issue here now.
The first order of business is to decide on the format. Requirements (or rather, wishes) are:
As discussed on IRC, I took a look around the world of configuration languages. There's... not much to choose from: INI and YAML are the only languages I found that fit our criteria, so I'm just going to break them down so we can decide:
I'm going to comment with the mindset that they will be used only for the urls file, meaning we need to have a feed identifier (url) and some associated data whose data format varies (strings, lists etc).
INI/TOML sample urls file (at least how I imagine it looking):
# feed without any tags
[https://github.com/newsboat/newsboat/releases.atom]
# feed with tags and name
[https://newsboat.org/news.atom]
name="newsboat.org"
tags="newsboat,news"
#- or, with TOMLs lists
tags=["software", "updates"]
YAML sample urls file:
# feed without any tags
- url: https://github.com/newsboat/newsboat/releases.atom
# feed with tags and name
- url: https://newsboat.org/news.atom
  name: "newsboat.org"
  tags: "newsboat,news"
  #- or
  tags:
    - newsboat
    - news
Rejected languages:
In my opinion YAML is the best bet for customization and future expandability. INI/TOML is still a good choice and a safe bet, but it's not nearly as powerful.
As much as I normally prefer toml, I believe yaml is better for this.
Thank you for the analysis! I'll go read the YAML spec to see what pitfalls it has for the non-technical user. The only such pitfall I can see right now is the "no tabs for indentation" rule; we'll need to make sure that our parser reacts to this with a friendly message.
Just for the record. It probably won't affect us very much as the feed config doesn't really benefit from them, but multi-line strings in YAML are a bit… let's say: scary (you may also want to read through those comments). Nevertheless, YAML still seems like the best choice suiting our needs.
Until now I've never heard of CSON and it indeed looks even better than YAML.
Hey, why not use XML? Newsboat already has a parser for that! :-> No, please don't.
I finally read the Preview chapter of the spec.
After reading about structures in YAML, I changed my focus from "what pitfalls await the user" to "how many potentially useless things YAML contains" (useless for Newsboat's urls file). I think the answer is "quite a lot": aforementioned structures, complex mapping keys, myriad ways to escape/multiline strings, explicit typing. The only remaining features are structures, scalars, and tags.
TOML has everything but the tags, but we can emulate them in code by adding "settings groups" or something:
[group.frequent]
reload-time = 1
reload-retries = 1
[group.broken_ssl]
ssl-verifypeer = false
['https://example.com/feed.atom']
groups = [ 'frequent', 'broken_ssl' ]
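A rough sketch of how such "settings groups" could be resolved in code, in Python purely for illustration. The `config` dictionary is hand-written to mirror what a TOML parser might return for the sample above. The function name and the override order (later groups win over earlier ones, and a feed's own settings win over groups) are assumptions, not something decided in this thread.

```python
# Hypothetical data: roughly what a TOML parser could return for the sample.
config = {
    "group": {
        "frequent": {"reload-time": 1, "reload-retries": 1},
        "broken_ssl": {"ssl-verifypeer": False},
    },
    "https://example.com/feed.atom": {"groups": ["frequent", "broken_ssl"]},
}


def resolve_feed_settings(config, url):
    """Merge the settings of every group a feed lists, then its own settings."""
    groups = config.get("group", {})
    feed = config[url]
    merged = {}
    for name in feed.get("groups", []):
        if name not in groups:
            raise KeyError(f"feed {url} refers to unknown group {name!r}")
        merged.update(groups[name])  # assumed: later groups override earlier ones
    # Assumed: settings written directly on the feed win over group settings.
    merged.update({k: v for k, v in feed.items() if k != "groups"})
    return merged


settings = resolve_feed_settings(config, "https://example.com/feed.atom")
```

With the sample above, `settings` ends up holding all three options from both groups.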
After that, I read TOML spec with the same focus. It's better; potentially useless features are: special handling for dates and time, arrays of tables. There are only four types of strings, which is acceptable I guess.
One inconvenience is that tables use keys for names, and bare keys can only contain ASCII alphanumerics, underscores, and dashes. This means @tsipinakis's example would look slightly different:
# feed without any tags
['''https://github.com/newsboat/newsboat/releases.atom''']
# feed with tags and name
['''https://newsboat.org/news.atom''']
name="newsboat.org"
tags="newsboat,news"
#- or, with TOMLs lists
tags=["software", "updates"]
One set of quotes might be enough (i.e. 'https://newsboat.org/news.atom'), but not always; since URLs might contain single quotes (they don't have to be percent-encoded IIRC), it's better to always use triple quotes. Unfortunately for us, this is part of the format, so the parser would hide this detail from us, and we wouldn't be able to enforce it. In other words, that's a pitfall for the user.
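If Newsboat ever wrote this file itself, the quoting could be automated. Below is a minimal sketch in Python of a hypothetical helper `toml_table_header`. It assumes the TOML 1.0 grammar, where (as far as I can tell) a quoted key may be a basic string (`"..."`) or a literal string (`'...'`) but not one of the multi-line forms, so it falls back to a basic string with escapes when the URL contains a single quote. It also assumes the URL contains no control characters.

```python
def toml_table_header(url):
    """Quote a URL so that it can serve as a TOML table name."""
    if "'" not in url:
        # Literal string: nothing inside it needs escaping.
        return f"['{url}']"
    # Basic string: backslashes and double quotes have to be escaped.
    escaped = url.replace("\\", "\\\\").replace('"', '\\"')
    return f'["{escaped}"]'


plain = toml_table_header("https://newsboat.org/news.atom")
tricky = toml_table_header("https://example.com/it's")
```

This doesn't help users who edit the file by hand, of course; the pitfall described above remains for them.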
I also didn't see any requirement on the order of the tables, i.e. it's not guaranteed that a parser will return URLs in the same order as they appear in the file. This is a problem because we depend on that order (by default, the feedlist shows feeds in the same order as they're in the file).
I invite everyone to think hard about things that we might add to urls file and which can't be (conveniently) expressed in TOML.
@gregf, @der-lyse, I'd be glad to see expanded versions of your comments to see what makes you prefer YAML in this case.
To clarify: "useless" features worry me because a user might inadvertently trigger them and either get a confusing error message (most likely), or, worse yet, get a valid file that doesn't do what they think it does (highly unlikely, and I struggle to come up with an example that would be both valid YAML and acceptable from Newsboat's standpoint). Less is more.
To start off with, the advantages and disadvantages on the YAML vs. TOML topic are very minimal in my opinion. But here are my thoughts (later points are just brainstorming, probably not helpful at all, but who knows):
url key it's even longer than TOML, so I'm obviously biased as I'm more used to YAML than to TOML (in fact, I've only used INI so far).
Speaking of that – watch out, now it's getting sketchy – we could maybe just allow an abbreviation for that by allowing strings instead of objects:
# feed without any tags, just a string and no object
- https://github.com/newsboat/newsboat/releases.atom
# feed with tags and name
- url: https://newsboat.org/news.atom
  name: newsboat.org
  tags: [software, updates] # I definitely prefer using arrays here
But of course this makes editing the file harder if one decides to finally add some settings. :-(
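To illustrate the "string or object" abbreviation, here is a small Python sketch that normalizes both spellings into one shape after the YAML has been loaded. `parsed` is hand-written to stand in for the output of whatever YAML parser would be used; `normalize_entry` and the shape of its result are hypothetical.

```python
def normalize_entry(entry):
    """Accept either a bare URL string or a mapping with a `url` key."""
    if isinstance(entry, str):
        return {"url": entry, "name": None, "tags": []}
    if isinstance(entry, dict) and "url" in entry:
        tags = entry.get("tags", [])
        if isinstance(tags, str):  # also accept the "a,b" spelling
            tags = [t.strip() for t in tags.split(",") if t.strip()]
        return {"url": entry["url"], "name": entry.get("name"), "tags": list(tags)}
    raise ValueError(f"unsupported feed entry: {entry!r}")


# Hypothetical parser output for the sample above.
parsed = [
    "https://github.com/newsboat/newsboat/releases.atom",
    {
        "url": "https://newsboat.org/news.atom",
        "name": "newsboat.org",
        "tags": ["software", "updates"],
    },
]
feeds = [normalize_entry(e) for e in parsed]
```

The rest of the program would then only ever see the mapping form, so the abbreviation costs one branch in the loader.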
Yeah, a lot of quoting is indeed bad. Let's not do that. I also agree that most feeds will have no custom settings.
I think if we adopt TOML or YAML, we'll have to provide an interface to edit them :)
Thinking of a custom format, the first thing that comes to mind is a mix of the current format and our config format:
https://example.com/atom.xml
https://newsboat.org/news.atom "awesome software" "buggy software"
- max-items 40
https://github.com/newsboat/newsboat/releases.atom
- use-proxy no
- download-full-page yes
In this new format, a line either:
Pros:
Cons:
This is just a proposal. It feels like a half-measure, and I'm not sure if it's enough or if I'm just delaying the inevitable.
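To get a feel for how much parsing this would take, here is a minimal Python sketch. It assumes that a line is either a feed (a URL optionally followed by double-quoted tags, as in the current format) or a `- setting value` pair that applies to the feed above it, and that blank lines and `#` comments are skipped. `parse_urls` is a hypothetical name, and `shlex` is used only as a shortcut for splitting quoted tags; it would choke on a URL containing an unbalanced quote.

```python
import shlex


def parse_urls(text):
    """Return a list of (url, tags, settings) tuples in file order."""
    feeds = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("- "):
            if not feeds:
                raise ValueError(f"line {lineno}: setting appears before any feed")
            key, _, value = line[2:].strip().partition(" ")
            feeds[-1][2][key] = value.strip()  # attach to the most recent feed
        else:
            url, *tags = shlex.split(line)
            feeds.append((url, tags, {}))
    return feeds


sample = '''\
https://example.com/atom.xml
https://newsboat.org/news.atom "awesome software" "buggy software"
- max-items 40
https://github.com/newsboat/newsboat/releases.atom
- use-proxy no
- download-full-page yes
'''

feeds = parse_urls(sample)
```

Because the result is a list, the order of feeds in the file is preserved, which the feedlist relies on.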
I like that idea a lot, this is absolutely great! It marries both the _urls_ and _config_ in a mostly seamless manner (just the dashes are truly new), I'm really amazed. :-)
Speaking of dashes, maybe just say that the per-feed config must be indented by at least one whitespace character, so no leading dashes at all. Indentation is a common thing to do when organizing things hierarchically; even non-technical people do this, I reckon. To go even further, we should also be able to recognize settings without any indentation or dash prefixes, because they look quite different from regular URLs, query and exec feeds. However, some dedicated marking (indentation and/or dash) of per-feed settings certainly doesn't hurt. I'm not arguing against the dashes, I'm totally fine with them; it's just another idea to (maybe) further simplify the new urls file format.
Regarding reusable configuration blocks we could add them, too. E.g. if the line starts with @ (or whatever else) it's recognized as a named block definition which can be used anywhere else:
https://example.com/atom.xml
https://newsboat.org/news.atom "awesome software" "buggy software"
- @news
- max-items 40
https://github.com/newsboat/newsboat/releases.atom
- @news
@news
- reset-unread-on-update yes
- reload-time 60
On a side note: it seems that special care has to be taken with some of the settings, like reset-unread-on-update which takes a list of URLs at the moment but in the _urls_ file should probably take a boolean instead.
And as the example above illustrates, I'd like it if one could use a reusable block even before it is defined. Defining it a second time would be an error.
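A rough Python sketch of those two rules (use before definition is fine, a second definition is an error). It reads the whole file first and expands `- @name` references in a second pass. `parse_with_blocks` is a hypothetical name; tags are ignored, blocks that reference other blocks are not handled, and later entries simply override earlier ones, which is an assumption.

```python
def parse_with_blocks(text):
    """Return [(url, settings)] with every `- @name` reference expanded."""
    feeds, blocks = [], {}
    current = None  # the list that `- ...` lines are currently appended to
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- "):
            if current is None:
                raise ValueError(f"line {lineno}: setting without a feed or block")
            current.append(line[2:].strip())
        elif line.startswith("@"):
            if line in blocks:
                raise ValueError(f"line {lineno}: block {line} defined twice")
            current = blocks[line] = []
        else:
            current = []
            feeds.append((line.split()[0], current))  # tags ignored in this sketch

    # Second pass: every block is known now, so forward references resolve.
    resolved = []
    for url, entries in feeds:
        settings = {}
        for entry in entries:
            if entry.startswith("@"):
                if entry not in blocks:
                    raise ValueError(f"{url}: unknown block {entry}")
                items = blocks[entry]
            else:
                items = [entry]
            for item in items:
                key, _, value = item.partition(" ")
                settings[key] = value.strip()
        resolved.append((url, settings))
    return resolved


sample = '''\
https://example.com/atom.xml
https://newsboat.org/news.atom "awesome software" "buggy software"
- @news
- max-items 40
https://github.com/newsboat/newsboat/releases.atom
- @news
@news
- reset-unread-on-update yes
- reload-time 60
'''

resolved = parse_with_blocks(sample)
```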
We may also consider introducing a tags setting (which is valid in the _urls_ file only), so that even the tags could be reused quite easily. But I don't use them, so I'm not sure if that's actually something useful or even worth the hassle when it's per-feed only.
If a custom format is the way forward, and it'll require indentation, I suggest limiting it to spaces or tabs. I hate to bring up that old argument but allowing both can create drama.
@der-lyse, good points, I'm still thinking :)
@sungo, what kind of drama do you mean? I understand how this leads to holy wars between programmers, but the urls file is not source code: it's used by a single person, and there is no collaboration on a config. As a result, every user can pick their own style, and no one will get mad because no one else uses that particular file.
@Minoru agreed, mostly. The issue has arisen for me mostly because of bad editor defaults that don't visually distinguish between hard tabs and spaces. I have ended up with personal drama because a file ended up with mixed spaces and tabs. Sure, that's my fault for accepting bad defaults but it caused an issue, regardless.
On the flip side of my own argument, YAML's fussiness around indentation, as @tsipinakis noted, is one of its worst qualities.
If the intention is to only ever allow a single level of indentation, mandating spaces over tabs over allowing mixed doesn't matter all that much. If there might be a possibility for multiple levels of indentation, it might be something to consider.
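Whichever rule is picked, the parser could at least point at the offending lines instead of silently misreading them. A minimal Python sketch, assuming the rule is "use either tabs or spaces, but be consistent within one file"; `check_indentation` and the message wording are hypothetical.

```python
def check_indentation(text):
    """Return friendly messages for lines with inconsistent indentation."""
    problems = []
    style = None  # "tabs" or "spaces", taken from the first indented line
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank and whitespace-only lines
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if not indent:
            continue
        if " " in indent and "\t" in indent:
            problems.append(f"line {lineno}: indentation mixes tabs and spaces")
            continue
        this = "tabs" if "\t" in indent else "spaces"
        if style is None:
            style = this
        elif this != style:
            problems.append(
                f"line {lineno}: indented with {this}, but earlier lines use {style}"
            )
    return problems


problems = check_indentation("feed\n  setting\n \tsetting\n\tsetting\n")
```

For that input, the third line is reported for mixing both characters and the fourth for switching from spaces to tabs.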
Yeah. When I wrote the proposal, I had a single level in mind. My thinking was that more levels mean more complex structures, and if we need that, we'd definitely be better off reusing some existing format. OTOH, limiting ourselves to one level might prevent us from implementing something in the future.
I guess the next step is to find out where we could use more than one level of indentation. I'd go through the list of options we already have and see if any of them would benefit from being turned into arrays or dictionaries (which can be handily represented as a list with some indentation).
To clarify, I think you can preserve the order of a TOML dictionary with serde by deserializing into a Vec<(String, Settings)>. If you want to write your own parser, I would recommend looking at nom (https://github.com/Geal/nom). I can provide a Rust parser for the format you suggested if there's interest.
Would be cool if this new format would support per-feed proxy settings, that's the only thing I wish newsboat had.