Libelektra: Classification of storage plugins

Created on 29 Sep 2020 · 11 Comments · Source: ElektraInitiative/libelektra

I think we need some classification of storage plugins. At the very least we need to distinguish between "general purpose" (e.g. TOML, quickdump, yamlcpp) and "special purpose" (e.g. hosts, fstab, passwd) plugins. A "general purpose" plugin should be able to store any KeySet, while a "special purpose" plugin only needs to store KeySets that originally came from the plugin itself or have a compatible structure.

In general, a storage plugin should be able to read all possible files of the underlying format into some sort of KeySet and then turn that KeySet back into the same file. For "general purpose" plugins, this might be relaxed if it conflicts with the goal of serialising all possible KeySets. Such a deviation from the format spec must, however, be stated clearly near the top of the storage plugin's README.

Possible classification:

infos/features/storage with the following tags:

[ ] read
[ ] write
[ ] spec (+version?)
[ ] preserves/order
[ ] preserves/empty/lines
[ ] preserves/comments
[ ] preserves/indentation
[ ] nested
[ ] arbitrary/metadata
[ ] directory/value
[ ] type
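
As a rough illustration of how tooling might consume such tags, the sketch below parses a whitespace-separated tag string and rejects unknown tags. The tag names come from the list above; the string format, the `KNOWN_TAGS` set and the function name `parse_storage_features` are assumptions made for this example, not an existing Elektra convention.

```python
from __future__ import annotations

# Tags taken from the proposed list above; the set itself is illustrative only.
KNOWN_TAGS = {
    "read", "write", "spec",
    "preserves/order", "preserves/empty/lines",
    "preserves/comments", "preserves/indentation",
    "nested", "arbitrary/metadata", "directory/value", "type",
}


def parse_storage_features(value: str) -> set[str]:
    """Split a whitespace-separated tag string and reject unknown tags."""
    tags = set(value.split())
    unknown = tags - KNOWN_TAGS
    if unknown:
        raise ValueError(f"unknown storage feature tags: {sorted(unknown)}")
    return tags


# Hypothetical value of infos/features/storage for some plugin.
features = parse_storage_features("read write preserves/comments nested")
print("nested" in features)
print("arbitrary/metadata" in features)
```

A fixed vocabulary like this is what would give the tags identical meaning across plugins.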

Source

kodebach

Most helpful comment

The main use case is the test suite.

Good point. See my new proposal below.

We could even render README.md from an README.md.in to make it nice when reading the actual file (or at GitHub), similar to the man pages.

Please don't. I find the solution for manpages absolutely horrible, because it involves committing auto-generated files into the git repository. I can't remember how often I had to revert the manpage files, because new versions were generated that only changed the date.

we tried very hard in the hosts plugin to preserve the formatting but e.g. the whitespaces between the aliases are still not preserved.

I see... You have a different understanding of "preserves file structure". I didn't think about whitespace (outside of comments or strings). I was thinking more on a syntactical level than on a byte level. In a sense you could say I meant "preserves the abstract syntax tree".

Preserving the file on a byte level would be useful for committing files into git or forms of diffing. For humans editing the file "preserving the AST" would probably be enough most of the time.
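
The difference between the two notions can be sketched with a hosts-style line. The comparison below is a crude stand-in for "same abstract syntax tree": it only splits on whitespace and ignores comments. The sample lines and the helper `tokens` are made up for this example.

```python
def tokens(line: str) -> list:
    """Split a hosts-style line into fields, ignoring the amount of whitespace."""
    return line.split()


# Same address and aliases, but different whitespace between the fields.
original = "127.0.0.1   localhost\tlo"
rewritten = "127.0.0.1 localhost lo"

same_bytes = original == rewritten                      # byte-level comparison
same_structure = tokens(original) == tokens(rewritten)  # token-level comparison

print(same_bytes)
print(same_structure)
```

A plugin that rewrites the first line as the second would fail a byte-level check but pass the structural one.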

To sum up this discussion, I think we should add a new infos/features for machine use-cases. Each of the flags should have an associated test-suite. Only if the plugin passes these automated tests, can it have a feature flag.
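
A minimal sketch of the "flag only if the test passes" idea, under heavy assumptions: a KeySet is modelled as a plain dict from key name to a `(value, metadata)` pair, the plugin is a toy `name=value` writer and reader, and `CHECKS` and `earned_flags` are hypothetical names. None of this is Elektra's actual test infrastructure.

```python
def toy_write(keys):
    """Toy storage plugin: writes name=value lines and drops metadata."""
    return "\n".join(f"{name}={value}" for name, (value, _meta) in keys.items())


def toy_read(text):
    """Toy storage plugin: reads name=value lines; metadata is always empty."""
    keys = {}
    for line in text.splitlines():
        name, value = line.split("=", 1)
        keys[name] = (value, {})
    return keys


def check_order(write, read):
    """Passes if keys come back in the order they were written."""
    sample = {"z": ("1", {}), "a": ("2", {})}
    return list(read(write(sample))) == ["z", "a"]


def check_metadata(write, read):
    """Passes if arbitrary metadata survives a write/read cycle."""
    sample = {"a": ("1", {"comment": "hello"})}
    return read(write(sample)) == sample


# One automated check per flag; a flag is granted only if its check passes.
CHECKS = {"preserves/order": check_order, "arbitrary/metadata": check_metadata}


def earned_flags(write, read):
    return sorted(flag for flag, check in CHECKS.items() if check(write, read))


print(earned_flags(toy_write, toy_read))
```

The toy plugin earns `preserves/order` but not `arbitrary/metadata`, because it silently drops the metadata.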

In addition, there should be a human-readable description for each of the flags listed in a central place (e.g. storage plugin tutorial). This human-readable list can also include additional plugin features that have no automated test-suites and therefore no flags. Both the features with flags and those without should also be described in the prose part of the README, so that new users don't have to look up the definitions of the flags.

For current plugins, with unknown feature sets, we could leave the infos/features empty and also set infos/status = experimental.

kodebach on 30 Sep 2020

👍2

All 11 comments

Thank you, this is a very good idea! I fully agree with the goal but I didn't have a good idea how to do such a classification.

In #666 there are some ideas, but in the end I needed to drop them all, as they over-complicate infos/status, which needs to be simplified and not made more complicated.

My original idea was that storage plugins describe their structure via configuration for the struct plugin (if there is any limitation in structure). But due to limitations of the struct plugin and also the tedious work to describe the structure of every plugin, this never happened. In its current form, the struct plugin can really only be retired.

Although such a deviation from the format spec, must be stated clearly near the top of the storage plugin's README.

It is not at the top of the README but something like this can be found in the section Limitation (often the last section) in most of the storage plugins. Obviously, documentation can be improved a lot.

markus2330 on 29 Sep 2020

The best idea is probably to either add a new infos/??? tag that is used just for storage plugins, or even simpler just define some categories and mention it at the beginning of the actual README. This kind of classification won't be used in any automatic process for a long time (that seems far too complicated), so it doesn't have to be a machine readable classification.

It is not at the top of the README but something like this can be found in the section Limitation (often the last section) in most of the storage plugins. Obviously, documentation can be improved a lot.

It's fine for the Limitations section to be at the bottom, as long as there is a visible reference to it at the top. At least for bigger limitations in storage plugins (e.g. if there was a JSON plugin that doesn't understand arrays). Otherwise #3472 will happen all over again...

kodebach on 29 Sep 2020

👍1

Any suggestions for the categories?

I think @sanssecours always did a very good job in describing the limitations (also in the beginning), e.g. www.libelektra.org/plugins/mini

markus2330 on 30 Sep 2020

Any suggestions for the categories?

Not really. I am also starting to think that (named) categories may be the wrong approach.

Maybe we should define a standard list of features that storage plugins may support and recommend/require a description of supported features in the README.

These features could include:

Can read files
Can write files
Supports all files following the standard spec for the format, if not list limitations
Retains exact file structure
Can store arbitrary KeySets
Supports arrays
Supports nested structure (more a feature of the underlying format)
Supports metadata
Supports non-leaf keys with value
etc.

Then we could also determine which of these features are needed for a default storage plugin

I think sanssecours always did a very good job in describing the limitations

Yes, mini is a very good example. It describes important limitations of the format upfront. But I'm still missing that mini doesn't support metadata. This is only mentioned at the end. At least a link "further limitations below" would be nice, so new users don't have to read the whole file to find the limitations.

kodebach on 30 Sep 2020

👍1

Yes, this is along the lines we already tried with infos/status. Maybe the mistake was that we wanted to squeeze all the information into the same field.

What about having an infos/features, which describes the features and not the status of the development?

Can read files Can write files

Yes, this is already in infos/status but actually fits in infos/features much better.

Supports all files following the standard spec for the format, if not list limitations

Maybe also the version of the standard?

Retains exact file structure

This is a huge topic and only formal approaches (like Augeas) have any chance to really do this without exceptions. I think for now the following is useful:

ordering
empty lines
comments
leading whitespaces

Can store arbitrary KeySets

This is also a huge topic, including some of the other things mentioned below.

Supports nested structure (more a feature of the underlying format)

I think that intuitively it is quite clear what is meant by nesting. Of course there are some underlying formats that do not support it, but even the ones that do can always be serialized in a flat way. So it is a feature if we serialize nested (as opposed to flat).
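
To make "nested as opposed to flat" concrete, here is a small sketch that serializes the same keys both ways as JSON. The key names and the helper `nest` are made up for this example, and it assumes that no key is both a parent and a leaf (i.e. no non-leaf keys with values).

```python
import json


def nest(flat):
    """Build a nested dict from slash-separated key names (leaf values only)."""
    root = {}
    for name, value in flat.items():
        node = root
        *parents, leaf = name.split("/")
        for part in parents:
            # Descend into the parent level, creating it on first use.
            node = node.setdefault(part, {})
        node[leaf] = value
    return root


flat = {"server/host": "localhost", "server/port": "8080", "debug": "0"}

print(json.dumps(flat))        # flat serialization: one entry per full key name
print(json.dumps(nest(flat)))  # nested serialization: hierarchy mirrors the key names
```

Both outputs are valid JSON and carry the same information; only the second one would count as "nested" in the sense discussed here.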

Supports metadata

Also a huge topic. There is already infos/metadata. I assume you mean to serialize arbitrary metadata that is not interpreted semantically (like comments or ordering)?

Supports non-leaf keys with value

:wink:

At least a link "further limitations below" would be nice, so new users don't have to read the whole file to find the limitations.

The question for me is: is this a limitation or only a missing feature?

E.g. in #3472 there were no expectations that metadata is serialized.

The other question is: should we separate limitations and features in the classification?

I listed what we have now in the top post, to be further edited.

markus2330 on 30 Sep 2020

I also added "type" at the end. If plugins do not have this, they would serialize everything as string and ignore the meta-data type. But this information is actually already in infos/metadata...

@sanssecours any further input? Would you do this classification for your plugins?

Is it too elaborate? Any ideas for simplification? (One goal of this issue is for me to simplify #666.)

markus2330 on 30 Sep 2020

A separate infos/features field would certainly be an improvement. But I suspect that it will have a similar problem to infos/status. Since this is just a list of flags that are automatically interpreted, it doesn't give a lot of details. It's also not very human-friendly and even less beginner-friendly. Unless we have a specific use-case where we need to interpret these features, I don't think this should be in a machine-focused format.

I was thinking more along the lines of a section of questions in the storage plugin tutorial, similar to the checklists in the PR template. These questions should then be answered near the top of the README. Whether it is incorporated into the text:

This is a JSON plugin for Elektra. It supports the full feature set of JSON and is fully compliant with the JSON spec. The plugin also supports the full functionality of Elektra's KeySets. It can store metadata, binary data, non-leaf keys with values and correctly translates between Elektra's arrays and JSON arrays, as well as between the two type systems.

Or is some sort of checklist that even a newcomer would understand doesn't really matter, IMO.

In my mind the main goal was to give a quick idea of what the plugin does to a human user, who can investigate further if something is unclear, and not to give a precise specification for automated use.

this is along the lines we already tried with infos/status. Maybe the mistake was that we wanted to squeeze all the information into the same field.

infos/status certainly tries to do too much. But I also think, the rating that is attached to infos/status (and is very non-obvious at first) was a mistake. (see also #666)

This is a huge topic and only formal approaches (like Augeas) have any chance to really do this without exceptions.

Again, I think you interpreted my idea far too formally. I had thought of a very informal description of features. If the description of a plugin says "preserves file structure", most people wouldn't be mad if there is a tiny change (hopefully they'd file a bug report). The README could even just say "tries to preserve file structure".

Also in some cases, it is actually possible to guarantee file structure is preserved without a lot of formal proofs. A simple example is mmapstorage, or a version of mini that uses order. If there is not a lot of structure, it is not hard to preserve it.

Also a huge topic. There is already infos/metadata. I assume you mean to serialize arbitrary metadata that is not interpreted semantically (like comments or ordering)?

Yes, I meant "this plugin can store the metadata associated with keys". How it's done or whether it is interpreted or not is irrelevant, as long as kdb export /somewhere pluginX | kdb import /somewhere pluginX doesn't change anything.
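
The export/import criterion can be sketched as a round-trip property. The code below does not call kdb; it is an in-memory analogy in which a KeySet is modelled as a dict of `{"value": ..., "meta": {...}}` entries, `json.dumps`/`json.loads` stand in for a plugin that keeps metadata, and `lossy_write`/`lossy_read` are hypothetical stand-ins for a plugin that drops it.

```python
import json


def round_trips(keys, write, read):
    """True if writing the keys and reading them back yields an identical key set."""
    return read(write(keys)) == keys


def lossy_write(keys):
    """Hypothetical plugin: writes name=value lines and ignores metadata."""
    return "\n".join(f"{name}={key['value']}" for name, key in keys.items())


def lossy_read(text):
    """Hypothetical plugin: reads name=value lines; metadata is always empty."""
    result = {}
    for line in text.splitlines():
        name, value = line.split("=", 1)
        result[name] = {"value": value, "meta": {}}
    return result


keys = {"user/name": {"value": "alice", "meta": {"comment": "login name"}}}

print(round_trips(keys, json.dumps, json.loads))
print(round_trips(keys, lossy_write, lossy_read))
```

Only the first stand-in would qualify as "can store the metadata associated with keys" under this criterion.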

The question for me is: is this a limitation or only a missing feature?

That's exactly the point of this issue. Currently there is no standard set of features and therefore nobody mentions if a feature is missing.

The other question is: should we separate limitations and features in the classification?

In a formal specification we can only have features, because a limitation, to me, means that the plugin supports only part of a feature.

I also added "type" at the end. If plugins do not have this, they would serialize everything as string and ignore the meta-data type. But this information is actually already in infos/metadata...

This is also not very human-friendly. Just seeing infos/metadata = type could be interpreted as a lot of things...

kodebach on 30 Sep 2020

we have a specific use-case where we need to interpret these features

The main use case is the test suite. Furthermore, in the longer run kdb tools can also be updated to show and use these flags. E.g. it would be nice to be able to specify specific features during mounting.

But the main feature of such flags is that they would have identical semantics across plugins and thus make plugins comparable for someone searching for the best plugin. (Even when done manually!)

It's also not very human-friendly and even less beginner-friendly.

I agree that the rendering of the infos/* entries in README.md should be a nice list with a copy from the explanation in CONTRACT.ini. But this is a nice-to-have which can be done at any time post-1.0. We could even render README.md from a README.md.in to make it nice when reading the actual file (or at GitHub), similar to the man pages. Prerendering actually makes a lot of sense because then all tools like qt-gui can already start from something more suitable to be read by humans.

A simple example is mmapstorage,

In binary formats where everything is fixed it is of course trivial to preserve formatting :wink:

or a version of mini that uses order. If there is not a lot of structure, it is not hard to preserve it.

I also thought so. E.g. we tried very hard in the hosts plugin to preserve the formatting but e.g. the whitespaces between the aliases are still not preserved. In mini you have similar problems around the =. There are so many sneaky little places that a formal approach simply makes more sense if you really want to completely preserve the formatting. So probably we should simply not make it a goal to preserve whitespace in general but only indentation.

That's exactly the point of this issue. Currently there is no standard set of features and therefore nobody mentions if a feature is missing.

Thank you for stepping forward :sparkling_heart:

As clarification to others: To simply improve the documentation (e.g. better describe limitations) no proposal is needed. Anyone can go ahead, nobody will object. So this discussion here can only be about something we want for all plugins (like a minimum standard).

markus2330 on 30 Sep 2020

I started now documenting all the decisions relevant for 1.0 in #3514.

markus2330 on 14 Oct 2020

@sanssecours any further input?

Not really. I like the proposal by Klemens (the part below the horizontal ruler), since it allows automatic assignment based on tests.

Would you do this classification for your plugins?

If the classification is easy and I have time to do it, then sure.

sanssecours on 7 Nov 2020

❤1
